
One H100 has only 80GB of HBM. I guess you mean a server with 4x H100 is one node?


This is essentially 400B params, i.e. roughly 400GB of weights in FP8. Comparing with Grok's 320B model, which needs about 320GB of VRAM at 8-bit, I think what the OP meant is actually 8x H100.

Which is... a lot, to say the least.

And all that is optimizing for latency, not throughput, because with 8x H100 you could easily host 4 replicas of a 70B model.
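
For a back-of-the-envelope check, here's a minimal sketch of the VRAM arithmetic. It counts weights only (KV cache, activations, and serving overhead all add on top), and the bytes-per-param table and 80GB-per-H100 figure are the assumptions:

    import math

    # Assumption: weights-only footprint; KV cache and activations not counted.
    BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}
    H100_GB = 80  # HBM capacity of a single H100

    def weights_vram_gb(params_billions, dtype):
        """Rough GB of VRAM needed just for the weights."""
        return params_billions * BYTES_PER_PARAM[dtype]

    for params_b, dtype in [(400, "fp8"), (70, "fp16"), (70, "fp8")]:
        gb = weights_vram_gb(params_b, dtype)
        gpus = math.ceil(gb / H100_GB)
        print(f"{params_b}B @ {dtype}: ~{gb:.0f} GB -> at least {gpus} H100(s)")

By this rough count, 8x H100 (640GB total) fits 400B weights in FP8 with room to spare, and four 70B replicas fit even at FP16 (4 x 140GB = 560GB), before accounting for KV cache.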


Thanks for the correction, it is indeed 8x H100 per node: https://developer.nvidia.com/blog/introducing-nvidia-hgx-h10...



