There are also CPU extensions like AVX512-VNNI and AVX512-BF16. Maybe the idea of communicating out to a card that holds your model will eventually go away. Inference is not too memory bandwidth hungry, right?
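
To make the idea concrete, here is a minimal sketch of the kind of inner-loop kernel these extensions accelerate: an int8 dot product using the AVX512-VNNI instruction VPDPBUSD via its intrinsic. It assumes an AVX512F + AVX512VNNI CPU, GCC or Clang with flags like -mavx512f -mavx512vnni, and a length that is a multiple of 64; it is illustrative, not a tuned inference kernel.

    /* Sketch only: u8 x s8 dot product with AVX512-VNNI.
     * Assumes n is a multiple of 64 (one 512-bit register = 64 bytes). */
    #include <immintrin.h>
    #include <stdint.h>
    #include <stddef.h>

    int32_t dot_u8_s8_vnni(const uint8_t *act, const int8_t *wgt, size_t n)
    {
        __m512i acc = _mm512_setzero_si512();
        for (size_t i = 0; i < n; i += 64) {
            __m512i a = _mm512_loadu_si512((const void *)(act + i));
            __m512i w = _mm512_loadu_si512((const void *)(wgt + i));
            /* VPDPBUSD: multiply u8 x s8 pairs, sum groups of four,
             * and accumulate into the 16 int32 lanes of acc. */
            acc = _mm512_dpbusd_epi32(acc, a, w);
        }
        /* Horizontal sum of the 16 int32 lanes. */
        return _mm512_reduce_add_epi32(acc);
    }

Whether this closes the gap with a discrete accelerator depends less on the ALUs than on feeding them: each generated token has to stream the active weights through the memory hierarchy, which is why inference tends to be bandwidth-bound in practice.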

