> "...It matches the full-precision (i.e., FP16 or BF16)
Wait... WHAT?!
When did //HALF PRECISION// become //FULL PRECISION//?
FWIW, I cannot find where you're quoting from. I cannot find "matches" on TFA nor the GitHub link. And in the paper I see
3.2 Inference Accuracy
The bitnet.cpp framework enables lossless inference for ternary BitNet b1.58 LLMs. To evaluate inference accuracy, we randomly selected 1,000 prompts from WildChat [ ZRH+24 ] and compared the outputs generated by bitnet.cpp and llama.cpp to those produced by an FP32 kernel. The evaluation was conducted on a token-by-token basis, with a maximum of 100 tokens per model output, considering an inference sample lossless only if it exactly matched the full-precision output.
When did //HALF PRECISION// become //FULL PRECISION//?
FWIW, I cannot find where you're quoting from. I cannot find "matches" on TFA nor the GitHub link. And in the paper I see