These bar charts are getting more disingenuous every day. This one makes it seem like Gemma 3 ranks second on the arena, just behind the full DeepSeek R1, but they simply cut out everything that ranks higher. In reality, R1 currently sits in sixth place by Elo. It's still impressive for such a small model to compete with much bigger ones, but at this point you can't trust any publication by anyone who has skin in model development.
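
For anyone unfamiliar with how the arena numbers are produced: the Elo ratings come from pairwise human votes, updated much like chess ratings (LMSYS has since moved to fitting a Bradley-Terry model, which yields similar orderings). A minimal sketch of the classic Elo update, with purely illustrative starting ratings:

    def elo_update(r_a, r_b, score_a, k=32):
        # score_a: 1.0 if model A wins the head-to-head vote, 0.0 if it loses, 0.5 for a tie.
        expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # A's expected score given current ratings
        new_a = r_a + k * (score_a - expected_a)
        new_b = r_b + k * ((1 - score_a) - (1 - expected_a))
        return new_a, new_b

    # Illustrative ratings: one vote for Gemma 3 over R1 shifts both.
    r_gemma, r_r1 = 1200.0, 1250.0
    r_gemma, r_r1 = elo_update(r_gemma, r_r1, score_a=1.0)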


The chart isn't claiming to be an overview of the best-ranking models - it's an evaluation of this particular model, and it wouldn't be improved by piling in loads of unrelated models, even if that would have kept you from misreading its point.


How are better-ranking models unrelated? They are explicitly comparing open and closed, small and large foundation models. Leaving the best ones out is just plain disingenuous. There's no way to sugarcoat this.


The most disturbing thing is that in the chart it ranks higher than V3. Test a few prompts against DeepSeek V3 and Gemma 3 and they come out at two totally different levels: one is a SOTA model, the other is a small LLM that can perhaps be useful for certain vertical tasks.
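
If you want to run that comparison yourself, here is a minimal sketch. It assumes DeepSeek's OpenAI-compatible API and a local Gemma 3 served through Ollama's OpenAI-compatible endpoint; the base URLs and model identifiers ("deepseek-chat", "gemma3") are assumptions to verify against your provider's docs:

    from openai import OpenAI

    # Assumed endpoints and model names -- adjust to your own setup.
    deepseek = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_DEEPSEEK_KEY")
    local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # Ollama default

    prompt = "Summarize the trade-offs between dense and mixture-of-experts LLMs."

    for name, client, model in [("DeepSeek V3", deepseek, "deepseek-chat"),
                                ("Gemma 3", local, "gemma3")]:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"--- {name} ---\n{resp.choices[0].message.content}\n")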


The Open LLM Leaderboard [0] is probably a good way to compare open-weights models across many different benchmarks. I wish they also included some closed-source models, just to see the relative ranking of the best open-weights models against them. They haven't updated it for Gemma 3 yet, though.

[0] https://huggingface.co/spaces/open-llm-leaderboard/open_llm_...


Beware that they use very narrow metrics, which is also why you only see fine-tunes over there gaming narrow benchmarks. If your edge case fits one of those, great. If not and you just want a good general-purpose model, you'll have to look elsewhere.



