Hacker News | rao-v's comments

I’m surprised that GitHub, OpenAI, etc. don’t have automation to scan the usual surfaces for hashes of their access tokens.

It seems like a cheap and simple thing to offer your customers a little extra safety.

Anybody interested in starting a platform agnostic service to do this?


GitHub already has a program to scan for keys: publishing a Discord token by mistake used to get the token immediately revoked, plus a DM from the system account explaining why.

I thought there were many first- and third-party services looking for this kind of thing (AWS, GitHub, GWS, crypto, etc. tokens). Seems weird that a Fortune 500 company's repo was not receiving the regular, let alone extra-deep, scanning that could have trivially found these.

There was a recent post from someone who realized that most of these scanning services only investigate the main branch. There's extra gold in them hills if you also consider development branches.
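A scanner that walks every branch instead of just main could look roughly like this sketch (the token patterns and the toy repo layout are illustrative; real scanners use provider-published formats and read actual git refs):

```python
import re

# Illustrative token patterns (GitHub classic PATs do start with ghp_;
# the OpenAI shape here is a rough approximation, not an official spec).
TOKEN_PATTERNS = {
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9]{48}\b"),
}

def scan_branches(branches):
    """Scan file contents from *every* branch, not just main.

    `branches` maps branch name -> {path: file text}; returns a list of
    (branch, path, pattern_name) hits.
    """
    hits = []
    for branch, files in branches.items():
        for path, text in files.items():
            for name, pat in TOKEN_PATTERNS.items():
                for _ in pat.finditer(text):
                    hits.append((branch, path, name))
    return hits

# A leaked key that sits only on a dev branch: a main-only scan misses it.
repo = {
    "main": {"app.py": "token = os.environ['GH_TOKEN']"},
    "dev/feature-x": {"app.py": "token = 'ghp_" + "a" * 36 + "'"},
}
print(scan_branches(repo))  # hit only on dev/feature-x
```

Against a real repository you would feed this the output of `git rev-list --all` plus `git show` per blob, so deleted-but-still-reachable commits get scanned too.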


For things pushed to GitHub, GitHub has quite sophisticated secret scanning. They have a huge list of providers for which they will automatically verify whether a potential key is real and revoke it automatically [2], and a smaller list of generic patterns they try to match if you enable the matching of "non-provider patterns".

This seems to be a case of someone accidentally publishing their GitHub token somewhere else. I'm not sure how GitHub could cheaply and easily prevent that, though there are third-party tools that scan your web presence for secrets, including trying wordlists of common files and directories.

1: https://docs.github.com/en/code-security/secret-scanning/int...

2: https://docs.github.com/en/code-security/secret-scanning/int...
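The verify-then-revoke step the comment above describes could be sketched like this (the `api.github.com/user` endpoint and 401-on-bad-credentials behavior are real, but the overall flow is a plausible simplification, not GitHub's actual partner protocol):

```python
import urllib.request
import urllib.error

def classify_status(code: int) -> str:
    """Map an HTTP status from the auth probe to a verdict."""
    if code == 200:
        return "live"      # token authenticated successfully
    if code in (401, 403):
        return "dead"      # bad credentials: revoked or never valid
    return "unknown"       # rate limit, outage, etc.

def check_github_token(token: str) -> str:
    """Probe whether a candidate GitHub token is real.

    The partner program works along these lines: a matched string is
    forwarded to the provider, which verifies it and can revoke it.
    """
    req = urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "User-Agent": "secret-scan-demo",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        return classify_status(err.code)

print(classify_status(200), classify_status(401))  # live dead
```

The important design point is that only the matched candidate string needs to be checked, so verification is cheap relative to the scan itself.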


GitHub wants to sell a service. Keys are convenient. Better alternatives for authorization and authentication exist, and GitHub is very aware of them; they even offer and facilitate them. For example, see OIDC. But many users want keys because they're used to them, or at least GitHub is sure they do, so it continues to offer them to avoid friction. The alternatives require more parameters, thought, and coordination between services.

GitHub has deprecated classic tokens, but the new tokens are not backwards compatible, and the deprecated tokens have continued to be available for some time. Real security professionals will tell you flatly that "tokens are bad", and they're right: they're leakable attack vectors. The tokens are the problem and discontinuing them is the solution. Scanning merely treats the symptom, and given what I know about Microsoft culture, I doubt that's going to change soon or quickly.


They do scan, but they miss a lot. The frequency decreased after GitHub started scanning all repositories, but I still report leaked secrets to bug bounty programs pretty often. Unfortunately, Home Depot doesn't have a bug bounty program, so I don't scan them.

Where was this token found, in an open source repo? There are numerous ways to scan commits, for free even in open source repos: https://docs.github.com/en/code-security/secret-scanning/int...

They at least scan GitHub for all kinds of exposed tokens in public repositories, and even have partnerships with the companies those tokens connect to (SaaS, PaaS...) to verify they're valid and even revoke them automatically if necessary.

I think there are crawlers that do that. Somehow I accidentally had a commit with an OpenAI key in it, and when I published an open source repo containing that commit, within ~20 seconds I got an email from OpenAI saying my exposed key had been revoked.

The article doesn’t say where the Home Depot token was published. Almost certainly not on GitHub or it would have been invalidated. But AFAIK GitHub doesn’t crawl other sites looking for GitHub tokens. I suppose Microsoft could provide GitHub a feed of GitHub tokens found by their Bing crawlers.

They definitely do have automation to scan for this already. I've seen plenty of alerts (fortunately all false positives that triggered on example keys that weren't real). I don't know how comprehensive it is, but it does exist.

GitHub does! They tell you when you pushed something dangerous almost right away.

GitHub Advanced Security blocks the push, I believe.


I've got friends who tell me they'd never consider applying for YC because they are on a classic H1B with a tech company. What typically happens with folks on a H1B who make it into YC?

That's hard to respond to other than to say that there are multiple visas that allow participation in accelerators and training programs.

That makes sense ... is there a typical path? Basically - are they right in thinking it's a high risk thing for them or should I encourage them to take a shot?

My understanding is that their time spent on the company before YC accepts them is the biggest risk factor?


No, I don't think it's high risk at all, because even if the path were something other than H-1B, they could always go back to H-1B.

As a non-lawyer, it seems that you'd have to leave your company to join YC, meaning you'd forfeit your H1B visa.

I’d rather give an LLM the earnings report for a stock and the next day’s S&P 500 open and see if it can predict the opening price.

Expecting an LLM to magically beat the efficient market hypothesis is a bit silly.

Much more reasonable to see if it can incorporate information as well as the market does (to start).


The general rule about computer fans is that the bigger they are, the quieter they are (i.e. 140mm > 120mm, etc.). It's a weird market gap that large "commercial" air-moving fans are so loud.


Unironically this


Windows registry just sort of hovering in the backdrop


Something that is still inheritable, between “there is one and it is global” and “there is a separate copy for each process”.
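Environment variables are the classic example of that middle ground: each process gets its own copy, but the copy is inherited from the parent at spawn time. A quick Python illustration:

```python
import os
import subprocess
import sys

# Set a variable in the parent; any child spawned after this point
# inherits a snapshot of the parent's environment.
os.environ["DEMO_FLAG"] = "from-parent"

child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_FLAG'])"],
    capture_output=True,
    text=True,
)
print(child.stdout.strip())  # from-parent

# The inheritance is one-way: nothing a child does to its copy ever
# flows back, so the parent's value is untouched.
print(os.environ["DEMO_FLAG"])  # from-parent
```

So it is neither "one global" (the child can diverge freely) nor "a fully separate copy" (the child starts from whatever the parent had).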


I was pretty excited for Onyx as a way to stand up a useful open source RAG + LLM at small scale, but as of two weeks ago it was clearly full of features ticked off a checklist that nobody has actually tried to use. For example, you can scrape sites and upload docs, but you can't really keep track of what's been processed within the UI or map results back to the documents cleanly.

It’s nice to see an attempt at an end-to-end stack (for all that it seems this is "obvious" … there are not that many functional options), but wow, we’ve forgotten the basics of making useful products. I’m hoping it gets enough time to bake.


Really appreciate the feedback (and glad to hear the core concept resonated with you).

The admin side of the house has been missing a bit of love, and we have a large overhaul coming soon that I'm hoping addresses some (most?) of your concerns. For now, if you'd like to view documents that have been processed, you can check out the `Explorer` panel on the left.

In general, I'd love to hear more about what gives it that "unbaked" feel for you if you're up for a quick chat.


Hey - good response!

I'm sure you guys are thinking about this, but please just go through the steps of setting up via Docker, uploading, say, a grad student's worth of papers and docs, scraping a small topic off Wikipedia, and trying to use it for three days, then take a look at the ergonomics. It's not easy to regroup sets of documents, get results that link back to a viewable document after indexing for RAG, etc.

In general there are a lot of low-hanging RAG optimizations you could do to make this usable for people who don't want to write their own bits of glue code. I ended up fiddling a bit more with AnythingLLM which, while having fewer features, understands the workflows a bit better.


I’ve been struggling to find an AM5 board that can run three MI50s at 4x. This is perfect, thank you.

Hmm, are you sure about some of the PCIe slots? I think some marked as 4x get downgraded to 1x on these boards…

Further edit: this may be accurate. How are you getting this / confirming it?


I would normally figure this out by reading motherboard manuals, which for SKUs you can buy standalone tend to be on the manufacturer's site with no account/paywall. They tend to include all the "if you populate this slot you lose xyz" language, along with how to change PCIe lane bifurcation in the BIOS if necessary.


I was genuinely impressed by the Antigravity browser plugin for "agentic" work.

I ran into a neat website and asked it to generate a similar UX with Astro, and it did a decent-ish job of seeing how the site handled scrolling, visually and in code, and replicating it in a tidy repo.


I mean it's even simpler. Almonds are entirely non essential (many other more water efficient nuts) to the food supply and in California consume more water than the entire industrial sector, and a bit more than all residential usage (~5 million acre-feet of water).

Add a datacenter tax of 3x to water sold to datacenters and use it to improve water infrastructure all around. Water is absolutely a non-issue medium term, and is only a short term issue because we've forgotten how to modestly grow infrastructure in response to rapid changes in demand.
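For a sense of scale, here's a quick conversion of the ~5 million acre-feet figure cited above (the acre-feet number comes from the comment; the unit conversion factors are standard):

```python
# Back-of-envelope conversion of the ~5 million acre-feet figure.
ACRE_FOOT_M3 = 1233.48     # cubic metres per acre-foot
ACRE_FOOT_GAL = 325_851    # US gallons per acre-foot (approx.)

almond_af = 5_000_000      # claimed annual almond irrigation, acre-feet

print(f"{almond_af * ACRE_FOOT_M3:.2e} m^3")   # ~6.2 billion cubic metres
print(f"{almond_af * ACRE_FOOT_GAL:.2e} gal")  # ~1.6 trillion gallons
```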

