Hacker News

It performs worse than 8B Llama 3, so you probably don't need that much.


Where do you see that? This comparison[0] shows it outperforming Llama-3-8B on 5 out of 6 benchmarks. I'm not going to claim this model looks incredible, but it's not so easily dismissed given it has the compute cost of a 17B model.

[0]: https://www.snowflake.com/wp-content/uploads/2024/04/table-3...



