
> For clarity, this is ONLY the forward pass of the model. There's no training code, batching, kv cache for efficiency, GPU support, etc ...

Neat, but please add one-line comments/docstrings where these missing bits would go.
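
Roughly what I have in mind, as a toy sketch and not the project's actual code: `forward`, `embed`, and `unembed` are made-up names, and the mean-pooling step is only a stand-in for the real layers. The point is the placement of the one-line markers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def forward(token_ids, embed, unembed):
    """Forward pass only: token ids -> next-token probabilities.

    Not implemented: training, batching, KV cache, GPU support.
    (Hypothetical toy model, for illustrating comment placement.)
    """
    # Batching would go here: this handles one sequence, not [batch, seq].
    # GPU support would go here: everything below is plain Python on the CPU.
    vectors = [embed[t] for t in token_ids]

    # KV cache would go here: reuse per-token keys/values across calls
    # instead of recomputing the whole sequence each time.
    # Stand-in for the transformer blocks: mean of the token embeddings.
    dim = len(vectors[0])
    hidden = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    logits = [sum(h * w for h, w in zip(hidden, row)) for row in unembed]

    # Training would go here: compare against targets, compute loss, backprop.
    return softmax(logits)
```

Even a handful of markers like these would make it obvious where a reader should start if they want to extend it.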


