Hacker News

  Mac Studio max spec: ~120 TFLOPS (fp16?), 384GB RAM, 3x the bandwidth, $9499.
512GB.

The DGX has 256GB/s of bandwidth, so it wouldn't offer the most tokens/s.
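Token generation on these machines is typically memory-bandwidth-bound: each decoded token requires streaming roughly the model's weights from memory, so an upper bound on decode speed is bandwidth divided by model size. A back-of-envelope sketch (the model size is an illustrative assumption, not a benchmark):

```python
# Rough upper bound for memory-bandwidth-bound decoding:
# tokens/s ~= memory bandwidth / bytes read per token
# (for a dense model, bytes per token is roughly the quantized weight size).
def est_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Illustrative: a ~70B-parameter dense model at 4-bit is ~40GB of weights.
dgx_spark = est_tokens_per_s(256, 40)  # 256GB/s, from the thread
m3_ultra = est_tokens_per_s(800, 40)   # 800GB/s, from the thread
```

Real throughput is lower (MoE models, KV-cache reads, and compute limits all change the picture), but the bandwidth ratio explains why the 256GB/s machine trails on tokens/s.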



Perhaps they are referring to the default GPU allocation, which is 75% of the unified memory, but it is trivial to increase.
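On Apple Silicon the GPU's wired-memory cap can be raised with the `iogpu.wired_limit_mb` sysctl (the setting resets on reboot). A sketch, with the 460GB figure as an illustrative choice, not a recommendation:

```shell
# Raise the GPU wired-memory limit (value in MB; resets on reboot).
# 471040 MB ~= 460 GB, an illustrative value leaving ~52GB for the OS
# on a 512GB machine.
sudo sysctl iogpu.wired_limit_mb=471040
```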


The GPU memory allocation refers to how capacity is allotted, not bandwidth. Sounds like the same 256-bit, quad-channel LPDDR5X-8000 you can get today with Strix Halo.


384GB is 75% of 512GB. The M3 Ultra's bandwidth is over 800GB/s, though potentially less in practice.

Using an M3 Ultra, I think the performance is pretty remarkable for inference, and concerns about prompt processing being slow in particular are greatly exaggerated.

Maybe the advantage of the DGX Spark will be for training or fine tuning.


I very consistently see people say prompt processing is slow at larger context sizes ("notoriously slow"), something that is much less of an issue with, e.g., CUDA setups.


Depends on the model. gpt-oss-120b will easily crunch large prompts in a few seconds. It's remarkable. It's gpt-4-mini at home.


tokens/s/$ then.
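A tokens-per-second-per-dollar comparison is just estimated throughput over purchase price. A sketch with assumed numbers (the $9499 Mac Studio price is from the thread; the DGX Spark price and both throughput figures are illustrative assumptions):

```python
# tokens/s per dollar = throughput / purchase price.
def tok_s_per_dollar(tokens_per_s: float, price_usd: float) -> float:
    return tokens_per_s / price_usd

# Illustrative bandwidth-bound estimates for a hypothetical ~40GB model:
mac_studio = tok_s_per_dollar(800 / 40, 9499)  # $9499, from the thread
dgx_spark = tok_s_per_dollar(256 / 40, 3999)   # price assumed for illustration
```

On these assumptions the machines land much closer on tokens/s/$ than on raw tokens/s, which is presumably the parent's point.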



