
The claim of "strongest" (what does that even mean?) seems moot. I don't think a multimodal model is the way to go on single home GPUs.

I would much rather have specific, tailored models for different scenarios that could be loaded into the GPU when needed. It's a waste of parameters to have half of the VRAM filled with the parts of the model targeting image generation when all I want to do is write code.



That's interesting. Are they often an amalgam of image and text tokens? Because, yeah, image generation is not interesting to me at all.


Perhaps the model performs better (is more capable) if it is trained on a more diverse set of topics?



