GitHub already has a program that scans for keys; accidentally publishing a Discord token, for example, used to get the token immediately revoked, along with a DM from a system account explaining why.
I thought there were many first- and third-party services looking for this kind of thing (AWS, GitHub, GWS, crypto tokens, etc.). It seems weird that an F500 company's repo was not receiving the regular scanning, let alone the extra-deep scanning, that could have trivially found these.
There was a recent post from someone who realized that most of these scanning services only investigate the main branch. There's extra gold in them hills if you also consider development branches.
For things pushed to GitHub, GitHub has quite sophisticated secret scanning. They maintain a huge list of providers for which they will automatically verify whether a potential key is real and revoke it automatically [2], and a smaller list of generic patterns they try to match if you enable matching of "non-provider patterns".
This seems to be a case of someone accidentally publishing their GitHub token somewhere else. I'm not sure how GitHub could cheaply and easily prevent that, though there are third-party tools that scan your web presence for secrets, including trying wordlists of common files and directories.
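For the curious, the pattern-matching half of this is simple in principle. A minimal sketch below, with made-up illustrative regexes rather than GitHub's actual matching rules: classic personal access tokens do start with `ghp_` and fine-grained ones with `github_pat_`, but the length bounds here are approximate.

```python
import re

# Illustrative patterns only, not GitHub's official rules.
TOKEN_PATTERNS = [
    re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),             # classic personal access token
    re.compile(r"\bgithub_pat_[A-Za-z0-9_]{22,255}\b"),  # fine-grained token
]

def find_candidate_tokens(text: str) -> list[str]:
    """Return substrings that look like GitHub tokens."""
    hits: list[str] = []
    for pattern in TOKEN_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits
```

The hard part the provider partnerships solve isn't the matching, it's the verification and revocation step: a regex hit alone can't tell a live token from an example key in documentation.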
GitHub wants to sell a service, and keys are convenient. Better alternatives for authorization and authentication exist, and GitHub is well aware of them; they even offer and facilitate them (see OIDC, for example). But many users either want keys because they're used to them, or GitHub is sure they do, so GitHub continues to offer them to avoid friction. The alternatives require more parameters, more thought, and coordination between services.
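To make the OIDC point concrete: in GitHub Actions, a job with `permissions: id-token: write` can exchange ambient runner credentials for a short-lived ID token instead of storing a long-lived key. A rough sketch of that exchange (the two `ACTIONS_*` environment variables are injected by the Actions runner; everything else here is illustrative):

```python
import json
import os
import urllib.request

def id_token_request(url: str, bearer: str, audience: str) -> urllib.request.Request:
    """Build the request the runner's token endpoint expects."""
    return urllib.request.Request(
        f"{url}&audience={audience}",
        headers={"Authorization": f"Bearer {bearer}"},
    )

def fetch_oidc_token(audience: str) -> str:
    """Exchange the job's ambient credentials for a short-lived ID token."""
    req = id_token_request(
        os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"],
        os.environ["ACTIONS_ID_TOKEN_REQUEST_TOKEN"],
        audience,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["value"]
```

The coordination cost is visible even in this sketch: the cloud provider has to be configured to trust GitHub's issuer and check the audience claim, which is exactly the extra "parameters, thought, and coordination" that keys let you skip.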
GitHub has deprecated classic tokens, but the new tokens are not backwards compatible, and the deprecated tokens have continued to be available for some time. Real security professionals will tell you flatly that tokens are bad, and they're right: they're leakable attack vectors. The tokens themselves are the problem, and discontinuation is the solution; scanning merely treats the symptom. Given what I know about Microsoft culture, I doubt that's going to change soon or quickly.
They do scan, but they miss a lot. The frequency has decreased since GitHub started scanning all repositories, but I still report leaked secrets to bug bounty programs pretty often.
Unfortunately, Home Depot doesn't have a bug bounty program, so I don't scan them.
They at least scan GitHub for all kinds of exposed tokens in public repositories, and even have partnerships with the companies those tokens connect to (SaaS, PaaS...) to verify the tokens are valid and even revoke them automatically if necessary.
I think there are crawlers that do that. I somehow accidentally had a commit with an OpenAI key in it, and when I published an open source repo containing that commit, within ~20 seconds I got an email from OpenAI saying my exposed key had been retired.
The article doesn’t say where the Home Depot token was published. Almost certainly not on GitHub or it would have been invalidated. But AFAIK GitHub doesn’t crawl other sites looking for GitHub tokens. I suppose Microsoft could provide GitHub a feed of GitHub tokens found by their Bing crawlers.
They definitely do have automation to scan for this already. I've seen plenty of alerts (fortunately all false positives that triggered on example keys that weren't real). I don't know how comprehensive it is, but it does exist.
I've got friends who tell me they'd never consider applying for YC because they are on a classic H-1B with a tech company. What typically happens with folks on an H-1B who make it into YC?
That makes sense ... is there a typical path? Basically: are they right to think it's a high-risk move, or should I encourage them to take a shot?
My understanding is that their time spent on the company before YC accepts them is the biggest risk factor?
The general rule with computer fans is that the bigger they are, the quieter they are (i.e. 140mm > 120mm, etc.). It's a weird market gap that large "commercial" air-moving fans are so loud.
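The back-of-envelope reason, heavily simplified (it ignores hub size, blade design, and static pressure): airflow scales roughly with swept area times blade speed, so a larger fan can spin proportionally slower for the same airflow, and slower usually means quieter.

```python
# Rough comparison of a 140 mm vs. 120 mm fan at equal airflow.
area_ratio = (140 / 120) ** 2          # ~1.36x the swept area
rpm_needed = 1 / area_ratio            # fraction of the 120 mm fan's RPM
print(f"{area_ratio:.2f}x area -> ~{rpm_needed:.0%} of the RPM for the same airflow")
```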
I was pretty excited for Onyx as a way to stand up a useful open source RAG + LLM at small scale but as of two weeks ago it was clearly full of features ticked off a list that nobody has actually tried to use. For example, you can scrape sites and upload docs but you can’t really keep track of what’s been processed within the UI or map back to the documents cleanly.
It’s nice to see an attempt at an end-to-end stack (for all that it seems "obvious" … there are not that many functional options), but wow, we’ve forgotten the basics of making useful products. I’m hoping it gets enough time to bake.
Really appreciate the feedback (and glad to hear the core concept resonated with you).
The admin side of the house has been missing a bit of love, and we have a large overhaul coming soon that I'm hoping addresses some (most?) of your concerns. For now, if you'd like to view documents that have been processed, you can check out the `Explorer` panel on the left.
In general, I'd love to hear more about what gives it that "unbaked" feel for you if you're up for a quick chat.
I'm sure you guys are thinking about this, but please just go through the steps of setting up via Docker, uploading, say, a grad student's worth of papers and docs, scraping a small topic off Wikipedia, then try to use it for three days and take a look at the ergonomics. It's not easy to regroup sets of documents, get results that link back to the source document for viewing after indexing for RAG, etc. etc. etc.
In general there are a lot of low-hanging RAG optimizations you could do to make this usable for people who don't want to write their own bits of code to make it usable. I ended up fiddling a bit more with AnythingLLM which, while having fewer features, understands the workflows a bit better.
I would normally figure this out by reading motherboard manuals, which, for SKUs you can buy standalone, tend to be on the manufacturer's site with no account or paywall. They tend to include all the "if you populate this slot you lose xyz" language, along with how to change PCIe lane bifurcation in the BIOS if necessary.
I was genuinely impressed by the Antigravity browser plugin for "agentic" work.
I ran into a neat website and asked it to generate a similar UX with Astro, and it did a decent-ish job of seeing how the site handled scrolling, both visually and in code, and replicating it in a tidy repo.
I mean, it's even simpler. Almonds are entirely non-essential to the food supply (there are many more water-efficient nuts), and in California they consume more water than the entire industrial sector, and a bit more than all residential usage (~5 million acre-feet of water).
Add a datacenter tax of 3x to water sold to datacenters and use it to improve water infrastructure all around. Water is absolutely a non-issue medium term, and is only a short term issue because we've forgotten how to modestly grow infrastructure in response to rapid changes in demand.
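To put the almond figure above in more familiar units (taking the comment's ~5 million acre-feet at face value; the acre-foot-to-gallon conversion is standard):

```python
# 1 acre-foot is about 325,851 US gallons.
GALLONS_PER_ACRE_FOOT = 325_851

almond_use_af = 5_000_000  # rough annual figure from the comment above
almond_use_gal = almond_use_af * GALLONS_PER_ACRE_FOOT
print(f"~{almond_use_gal / 1e12:.2f} trillion gallons per year")
```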
It seems like a cheap and simple thing to offer your customers a little extra safety.
Anybody interested in starting a platform agnostic service to do this?