
Really impressive performance from the Moondream model, but looking at the results from the big 3 labs, it's absolutely wild how poorly Claude and OpenAI's models perform. Gemini isn't as good as Moondream, but it's clearly the only one that's even halfway decent at these vision tasks. I didn't realize how big the performance gap was.


Funnily enough, Gemini is also the only one able to read a D20. ChatGPT consistently gets it wrong, and Claude mostly argues it can't read the face of the die that's facing up because it's obstructed (it's not lol).


I'm not sure why they haven't been acquired by one of the big players yet, since Moondream is clearly pretty good! It definitely seems like something Anthropic/OpenAI/whoever would want to fold into their platforms. Everyone involved in creating it should probably be swimming in money, and with the reach of the big orgs, visual use cases for LLMs would become far less useless.


Gemini is really fantastic at anything that's OCR-adjacent, and it promptly falls over on most other image-related tasks.



