
Hard to do for an industry benchmark, since running the test in such a mode requires sending the questions to the LLM, which basically puts them into a public training set.

This has been tried multiple times by multiple people, and over time it hasn't held up well in terms of staying immune to “cheating”.




