These bar charts are getting more disingenuous every day. This one makes it seem like Gemma 3 ranks second on the arena, just behind the full DeepSeek R1, but they simply cut out everything that ranks higher. In reality, R1 currently sits in sixth place by Elo. It's still impressive for such a small model to compete with much bigger ones, but at this point you can't trust any publication by anyone who has skin in model development.
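
For anyone unfamiliar with how the arena numbers are produced: the Elo ratings come from pairwise human votes, updated much like chess ratings (LMSYS has since moved to fitting a Bradley-Terry model, which yields similar orderings). A minimal sketch of the classic Elo update, with purely illustrative starting ratings:

    def elo_update(r_a, r_b, score_a, k=32):
        # score_a: 1.0 if model A wins the head-to-head vote, 0.0 if it loses, 0.5 for a tie.
        expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # A's expected score given current ratings
        new_a = r_a + k * (score_a - expected_a)
        new_b = r_b + k * ((1 - score_a) - (1 - expected_a))
        return new_a, new_b

    # Illustrative ratings: one vote for Gemma 3 over R1 shifts both.
    r_gemma, r_r1 = 1200.0, 1250.0
    r_gemma, r_r1 = elo_update(r_gemma, r_r1, score_a=1.0)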


The chart isn't claiming to be an overview of the best-ranking models - it's an evaluation of this particular model, and it wouldn't be improved by piling in loads of unrelated models, even if that would have kept you from misreading its point.


How are better-ranking models unrelated? They are explicitly comparing open and closed, small and large foundation models. Leaving the best ones out is just plain disingenuous. There's no way to sugarcoat this.


The most disturbing thing is that in the chart it ranks higher than V3. Test a few prompts against DeepSeek V3 and Gemma 3 and they come out at two totally different levels: one is a SOTA model, the other is a small LLM that can perhaps be useful for certain vertical tasks.
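
If you want to run that comparison yourself, here is a minimal sketch. It assumes DeepSeek's OpenAI-compatible API and a local Gemma 3 served through Ollama's OpenAI-compatible endpoint; the base URLs and model identifiers ("deepseek-chat", "gemma3") are assumptions to verify against your provider's docs:

    from openai import OpenAI

    # Assumed endpoints and model names -- adjust to your own setup.
    deepseek = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_DEEPSEEK_KEY")
    local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # Ollama default

    prompt = "Summarize the trade-offs between dense and mixture-of-experts LLMs."

    for name, client, model in [("DeepSeek V3", deepseek, "deepseek-chat"),
                                ("Gemma 3", local, "gemma3")]:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"--- {name} ---\n{resp.choices[0].message.content}\n")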


The Open LLM Leaderboard [0] is probably a good way to compare open-weights models across many different benchmarks. I wish they also included some closed-source models, just to see the relative ranking of the best open-weights models against them. They haven't updated it for Gemma 3 yet, though.

[0] https://huggingface.co/spaces/open-llm-leaderboard/open_llm_...


Beware that they use very narrow metrics, which is also why you only see fine-tunes over there gaming narrow benchmarks. If your edge case fits one of those, great. If not and you just want a good general-purpose model, you'll have to look elsewhere.



