For all readers, arithmetic codes are better in nearly all ways. They can be imp...

lifthrasiir · on July 4, 2024

If you do have an option to switch from Huffman, rANS is now the way to go, not a clasical arithmetic coding.

londons_explore · on July 4, 2024

Two slight benefits of Huffman codes over arithmetic:

* They usually self synchronize when some data is corrupted (but not guaranteed, does not apply where the Huffman table is dynamic)

* Neither Huffman nor arithmetic codes are easy to parallelize the decoding of, but Huffman is slightly easier.

kqr · on July 4, 2024

I was under the impression that arithmetic codes are guaranteed to be at least one bit less efficient than Huffman codes per input block. What makes you say they have better compression ratio?

Are you thinking of pre-defined Huffman tables that aren't adapted to the input? Because the latter ought to be as good as it gets.

(I agree with the other benefits. Since arithmetic coding tables are built in a streaming fashion rather than constructing the codebook up front, they are more memory-efficient while working.)

lifthrasiir · on July 4, 2024

Huffman codes are conceptually isomorphic to arithmetic codes where all probabilities are 2^-k with k integer, so they have an obvious disadvantage due to more inaccurate symbol distribution.

SassyBird · on July 4, 2024

Hopefully k is natural. ;)

lifthrasiir · on July 4, 2024

Implied because any symbol distribution which probabilities do not sum to 1 is invalid anyway ;-)

hcs · on July 4, 2024

Huffman codes are less efficient per symbol since each symbol is a bit string, arithmetic coding effectively smears symbols across bits more finely. Whether you use a dynamic or static probability model is a different issue applying to either coding method. (Emotionally though I prefer Huffman codes, they're just so neat)

lazamar · on July 4, 2024

There is one way in which Huffman codes are better: they are easier to explain and simpler to implement.

I went for simplicity of exposition in the post, but arithmetic coders can indeed get arbitrarily close to the entropy, which is not quite the case with Huffman.

nottorp · on July 4, 2024

> easier to explain

I think Huffman is the one compression algorithm that compresses stuff significantly that can fit on the proverbial napkin, so it's a good start.

The others require the whole napkin stack at the table.

userbinator · on July 4, 2024

LZ is even better. Neither arithmetic nor Huffman will compress when the probability of all symbols is the same, but LZ will find repetitions easily. LZ also decompresses extremely quickly --- faster than memcpy is often mentioned.

d_burfoot · on July 4, 2024

> LZ is even better. Neither arithmetic nor Huffman will compress when the probability of all symbols is the same

Comparing LZ to arithmetic encoding is a category error. LZ and Huffman are combined modeling+encoding methods, while arithmetic is just an encoding method, and it can be combined with any modeling technique. Arithmetic plus a suitable modeling technique will achieve compression as good as LZ, Huffman, or any other scheme. The PAQ8 compressors, and I believe its successors in the Hutter Prize ranking, use arithmetic plus a very advanced modeling scheme.

http://prize.hutter1.net/hfaq.htm#paq8

lazamar · on July 4, 2024

Indeed, but worth noting that LZ is a modelling scheme, whilst Huffman is a coding technique.

That is, LZ determines, dynamically as it goes, what are all the elements we want to encode and their probabilities. Then you need a coder, like Huffman, to actually encode it.

In the post I used a semi-static zero-order byte-based model. Which means I counted the byte occurrences first and just used that count for the probabilities throughout all of the encoding. Then I used Huffman codes to translate those probabilities into bits.

But I'm considering writing a follow-up changing this static model for an LZ77 one as I think that would be fun.