What's the benchmark?

ahmedfromtunis · 2025-11-18T15:59:49 1763481589

I don't think it would be a good idea to publish it on a prime source of training data.

Hammershaft · 2025-11-18T16:06:08 1763481968

He could post an encrypted version along with the key, to keep it from being trained on?

benterix · 2025-11-18T16:14:12 1763482452

What makes you think it wouldn't end up in the training set anyway?

rs186 · 2025-11-18T18:08:39 1763489319

I wouldn't underestimate the intelligence of agentic AI, however stupid it is today.

stefs · 2025-11-18T23:01:33 1763506893

Every AI corp has people reading HN.

mlrtime · 2025-11-19T12:33:06 1763555586

This sounds like paranoia to me, to be honest. Please tell me I'm wrong.

I could just as easily have made the same claim. Without seeing the benchmark, it doesn't exist.

Maybe if we weren't anonymous and your profile showed credentials and experience in this field, I'd believe it. Otherwise, I won't believe it without seeing or testing it myself.

shawabawa3 · 2025-11-19T10:44:44 1763549084

But they've asked all the AI models this question. Whatever you tell an AI model is also in its training data.

petters · 2025-11-18T15:58:36 1763481516

Good personal benchmarks should be kept secret :)

mlrtime · 2025-11-19T12:33:56 1763555636

pclmulqdq · 2025-11-19T13:47:52 1763560072

Avoiding contamination is very useful when you want an honest evaluation of something.

GuB-42 · 2025-11-19T12:48:54 1763556534

NIBBLES.BAS maybe [1]

If you make some assumptions about the species of the snake, it can count as a basic python benchmark ;)

[1] https://en.wikipedia.org/wiki/Nibbles_(video_game)

prodigycorp · 2025-11-18T16:01:03 1763481663

nice try!

ankit219 · 2025-11-18T18:42:59 1763491379

You already sent the prompt to the Gemini API, and they likely recorded it, so in a way they can access it anyway. Whether you post it here or not wouldn't matter in that respect.