
LZ is even better. Neither arithmetic nor Huffman will compress when the probability of all symbols is the same, but LZ will find repetitions easily. LZ also decompresses extremely quickly; speeds faster than memcpy are often cited.
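Both halves of this claim can be checked in a small sketch (assuming Python, with `zlib` standing in for an LZ compressor): a random block repeated many times has nearly uniform byte statistics, so a zero-order entropy coder sees close to 8 bits per byte, yet LZ compresses the repetitions away.

```python
import math
import os
import zlib
from collections import Counter

block = os.urandom(4096)   # uniform-looking byte statistics
data = block * 100         # but massively repetitive

# Zero-order (per-byte) entropy in bits per byte: the best Huffman or
# arithmetic coding can do with a model that only looks at byte frequencies.
counts = Counter(data)
n = len(data)
entropy = -sum(c / n * math.log2(c / n) for c in counts.values())

compressed = zlib.compress(data)

print(f"zero-order entropy: {entropy:.2f} bits/byte")  # close to 8.00
print(f"zlib ratio: {len(compressed) / n:.3f}")        # far below 1.0
```

The zero-order model sees no structure at all, while the LZ match finder inside zlib spots that each block repeats and replaces it with back-references.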


> LZ is even better. Neither arithmetic nor Huffman will compress when the probability of all symbols is the same

Comparing LZ to arithmetic encoding is a category error. LZ and Huffman are combined modeling+encoding methods, whereas arithmetic coding is just an encoding method that can be combined with any modeling technique. Arithmetic coding plus a suitable modeling technique will achieve compression as good as LZ, Huffman, or any other scheme. The PAQ8 compressors, and I believe their successors in the Hutter Prize ranking, combine arithmetic coding with a very advanced modeling scheme.
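The model/coder separation can be made concrete with a toy "infinite-precision" arithmetic coder (a sketch using Python's `Fraction` for exactness, not a practical implementation). The fixed zero-order model here is an assumption for the demo; any model that supplies cumulative probability intervals plugs into the same encoder unchanged.

```python
from fractions import Fraction

def intervals(probs):
    """Map each symbol to its cumulative [low, high) interval.
    This is the model's only job: supplying these intervals."""
    out, low = {}, Fraction(0)
    for sym, p in probs.items():
        out[sym] = (low, low + p)
        low += p
    return out

def encode(msg, probs):
    """Narrow [low, high) once per symbol; any number in the final
    interval identifies the whole message."""
    low, high = Fraction(0), Fraction(1)
    iv = intervals(probs)
    for sym in msg:
        s_lo, s_hi = iv[sym]
        width = high - low
        low, high = low + width * s_lo, low + width * s_hi
    return low

def decode(code, length, probs):
    iv = intervals(probs)
    out = []
    for _ in range(length):
        for sym, (s_lo, s_hi) in iv.items():
            if s_lo <= code < s_hi:
                out.append(sym)
                code = (code - s_lo) / (s_hi - s_lo)  # rescale and repeat
                break
    return "".join(out)

# Hypothetical zero-order model for the demo.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
msg = "abacab"
code = encode(msg, probs)
assert decode(code, len(msg), probs) == msg
```

Notice that `encode` and `decode` never look at the data directly; they only consume the intervals the model hands them, which is why an advanced model like PAQ8's can be swapped in behind the same coder.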

http://prize.hutter1.net/hfaq.htm#paq8


Indeed, but it's worth noting that LZ is a modelling scheme, whilst Huffman is a coding technique.

That is, LZ determines, dynamically as it goes, which elements to encode and their probabilities. Then you need a coder, like Huffman, to actually encode them.
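A minimal sketch of that split (a hypothetical greedy matcher, not any particular LZ implementation): LZ77 turns the input into literal and back-reference tokens, and those tokens are the "elements" a coder such as Huffman would then entropy-code.

```python
def lz77_tokens(data, window=4096, min_match=3):
    """Greedy LZ77 tokenizer producing ('lit', byte) and
    ('ref', offset, length) tokens. O(n * window), fine for a sketch."""
    tokens, i, n = [], 0, len(data)
    while i < n:
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < n and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_match:
            tokens.append(("ref", best_off, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens

def lz77_decode(tokens):
    out = bytearray()
    for t in tokens:
        if t[0] == "lit":
            out.append(t[1])
        else:
            _, off, length = t
            for _ in range(length):  # byte-by-byte copy handles overlaps
                out.append(out[-off])
    return bytes(out)

data = b"banana banana banana"
tokens = lz77_tokens(data)
assert lz77_decode(tokens) == data
```

The tokens themselves are still uncompressed; a real compressor would feed the literals, offsets, and lengths to a coder (Huffman in DEFLATE's case) to turn their probabilities into bits.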

In the post I used a semi-static zero-order byte-based model: I counted the byte occurrences first and used those counts as the probabilities throughout all of the encoding. Then I used Huffman codes to translate those probabilities into bits.
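That semi-static scheme can be sketched roughly like this (a from-scratch illustration assuming Python's `heapq`, not the post's actual code): count byte frequencies once, build a Huffman code from those counts, then encode with the fixed code table.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix-free code table {byte: bitstring} from byte counts."""
    counts = Counter(data)
    # Heap entries: (frequency, tiebreak, {symbol: partial code}).
    # The tiebreak keeps tuple comparison from ever reaching the dict.
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate single-symbol input
        return {sym: "0" for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def encode(data, codes):
    return "".join(codes[b] for b in data)

def decode(bits, codes):
    rev = {v: k for k, v in codes.items()}
    out, cur = bytearray(), ""
    for bit in bits:
        cur += bit
        if cur in rev:
            out.append(rev[cur])
            cur = ""
    return bytes(out)

data = b"this is an example of huffman coding"
codes = huffman_codes(data)
bits = encode(data, codes)
print(len(bits), "bits vs", len(data) * 8, "uncompressed")
```

Because the counts are taken once up front, the code table is fixed for the whole message, which is what makes the model "semi-static" rather than adaptive.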

But I'm considering writing a follow-up replacing this static model with an LZ77 one, as I think that would be fun.



