For all readers, arithmetic codes are better in nearly all ways. They can be implemented in less RAM and code, they compress and decompress to a better ratio, and the probabilities of different symbols appearing can be dynamically updated during the stream far more easily.
The only reason Huffman codes are used is they were invented first and arithmetic codes were patented. That patent has now expired, so we should use the better design.
I was under the impression that arithmetic codes are guaranteed to be at least one bit less efficient than Huffman codes per input block. What makes you say they have better compression ratio?
Are you thinking of pre-defined Huffman tables that aren't adapted to the input? Because the latter ought to be as good as it gets.
(I agree with the other benefits. Since arithmetic coding tables are built in a streaming fashion rather than constructing the codebook up front, they are more memory-efficient while working.)
Huffman codes are conceptually isomorphic to arithmetic codes where all probabilities are 2^-k with k integer, so they have an obvious disadvantage due to more inaccurate symbol distribution.
Huffman codes are less efficient per symbol since each symbol is a bit string, arithmetic coding effectively smears symbols across bits more finely. Whether you use a dynamic or static probability model is a different issue applying to either coding method. (Emotionally though I prefer Huffman codes, they're just so neat)
There is one way in which Huffman codes are better: they are easier to explain and simpler to implement.
I went for simplicity of exposition in the post, but arithmetic coders can indeed get arbitrarily close to the entropy, which is not quite the case with Huffman.
LZ is even better. Neither arithmetic nor Huffman will compress when the probability of all symbols is the same, but LZ will find repetitions easily. LZ also decompresses extremely quickly --- faster than memcpy is often mentioned.
> LZ is even better. Neither arithmetic nor Huffman will compress when the probability of all symbols is the same
Comparing LZ to arithmetic encoding is a category error. LZ and Huffman are combined modeling+encoding methods, while arithmetic is just an encoding method, and it can be combined with any modeling technique. Arithmetic plus a suitable modeling technique will achieve compression as good as LZ, Huffman, or any other scheme. The PAQ8 compressors, and I believe its successors in the Hutter Prize ranking, use arithmetic plus a very advanced modeling scheme.
Indeed, but worth noting that LZ is a modelling scheme, whilst Huffman is a coding technique.
That is, LZ determines, dynamically as it goes, what are all the elements we want to encode and their probabilities. Then you need a coder, like Huffman, to actually encode it.
In the post I used a semi-static zero-order byte-based model.
Which means I counted the byte occurrences first and just used that count for the probabilities throughout all of the encoding. Then I used Huffman codes to translate those probabilities into bits.
But I'm considering writing a follow-up changing this static model for an LZ77 one as I think that would be fun.
The only reason Huffman codes are used is they were invented first and arithmetic codes were patented. That patent has now expired, so we should use the better design.