There are also CPU extensions like AVX512-VNNI and AVX512-BF16. Maybe the idea of communicating out to a card that holds your model will eventually go away. Inference is not too memory bandwidth hungry, right?
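
To make the idea concrete, here is a minimal sketch of the kind of inner-loop kernel these extensions accelerate: an int8 dot product using the AVX512-VNNI instruction VPDPBUSD via its intrinsic. It assumes an AVX512F + AVX512VNNI CPU, GCC or Clang with flags like -mavx512f -mavx512vnni, and a length that is a multiple of 64; it is illustrative, not a tuned inference kernel.

    /* Sketch only: u8 x s8 dot product with AVX512-VNNI.
     * Assumes n is a multiple of 64 (one 512-bit register = 64 bytes). */
    #include <immintrin.h>
    #include <stdint.h>
    #include <stddef.h>

    int32_t dot_u8_s8_vnni(const uint8_t *act, const int8_t *wgt, size_t n)
    {
        __m512i acc = _mm512_setzero_si512();
        for (size_t i = 0; i < n; i += 64) {
            __m512i a = _mm512_loadu_si512((const void *)(act + i));
            __m512i w = _mm512_loadu_si512((const void *)(wgt + i));
            /* VPDPBUSD: multiply u8 x s8 pairs, sum groups of four,
             * and accumulate into the 16 int32 lanes of acc. */
            acc = _mm512_dpbusd_epi32(acc, a, w);
        }
        /* Horizontal sum of the 16 int32 lanes. */
        return _mm512_reduce_add_epi32(acc);
    }

Whether this closes the gap with a discrete accelerator depends less on the ALUs than on feeding them: each generated token has to stream the active weights through the memory hierarchy, which is why inference tends to be bandwidth-bound in practice.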

