axpy906, 4 months ago, on: Evals in 2025: going beyond simple benchmarks to b...
My thinking was this: the moment a benchmark is public, it's probably in the training data set. The real evals are the ones you have to build yourself, for the problem you're trying to solve and the data you're using.