
What's the benchmark?


I don't think it would be a good idea to publish it on a prime source of training data.


He could post an encrypted version and post the key with it to avoid it being trained on?


What makes you think it wouldn't end up in the training set anyway?


I wouldn't underestimate the intelligence of AI agents, despite how stupid they are today.


Every AI corp has people reading HN.


This sounds like paranoia to me to be honest. Please tell me I'm wrong.

I could just as easily have made the same claim myself. Without seeing the benchmark, it might as well not exist.

Maybe if we weren't anonymous and your profile led to credentials showing that you have experience in this field. Otherwise I don't believe it without seeing and testing it myself.


But they've asked all the AI models this question. Whatever you tell an AI model is also in its training data.


Good personal benchmarks should be kept secret :)


Why?


Avoiding contamination is very useful when you want an honest evaluation of something.


NIBBLES.BAS maybe [1]

If you make some assumptions about the species of the snake, it can count as a basic Python benchmark ;)

[1] https://en.wikipedia.org/wiki/Nibbles_(video_game)


Nice try!


You already sent the prompt to the Gemini API, and they likely recorded it, so in a way they can access it anyway. Posting it here or not wouldn't matter in that respect.



