For all readers: arithmetic codes are better in nearly all ways. They can be implemented in less RAM and with less code, they achieve better compression ratios, and the probabilities of different symbols can be updated dynamically during the stream far more easily.
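
To make the "dynamically updated" point concrete, here is a toy adaptive arithmetic coder using exact rationals. This is illustrative only: real coders use fixed-width integer state with renormalization, and all names here are made up. Note how the model update is a single line.

```python
from fractions import Fraction

def encode(msg, alphabet):
    # Adaptive model: every symbol starts with count 1, bumped after each use.
    counts = {s: 1 for s in alphabet}
    lo, hi = Fraction(0), Fraction(1)
    for sym in msg:
        total = sum(counts.values())
        span = hi - lo
        # Cumulative count of symbols ordered before `sym`.
        cum = sum(counts[s] for s in alphabet[:alphabet.index(sym)])
        lo, hi = (lo + span * Fraction(cum, total),
                  lo + span * Fraction(cum + counts[sym], total))
        counts[sym] += 1            # the dynamic update is just this
    return (lo + hi) / 2            # any number inside [lo, hi) identifies msg

def decode(x, n, alphabet):
    # Mirrors the encoder's model updates exactly, so it stays in sync.
    counts = {s: 1 for s in alphabet}
    lo, hi = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        total = sum(counts.values())
        span = hi - lo
        cum = 0
        for s in alphabet:
            s_lo = lo + span * Fraction(cum, total)
            s_hi = lo + span * Fraction(cum + counts[s], total)
            if s_lo <= x < s_hi:    # x falls in this symbol's subinterval
                out.append(s)
                lo, hi = s_lo, s_hi
                counts[s] += 1
                break
            cum += counts[s]
    return "".join(out)
```

Doing the same with Huffman would mean rebuilding the whole code tree on every count change.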

The only reason Huffman codes are used is they were invented first and arithmetic codes were patented. That patent has now expired, so we should use the better design.



If you do have the option to switch away from Huffman, rANS is now the way to go, not classical arithmetic coding.
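
For anyone unfamiliar with rANS, here is a minimal sketch with assumed static frequencies and no renormalization (Python's big integers stand in for the streamed-out state; a real implementation keeps the state in a machine word and flushes bits):

```python
FREQS = {'a': 3, 'b': 1}             # assumed toy frequencies, total M = 4
M = sum(FREQS.values())
CUM, acc = {}, 0
for s, f in FREQS.items():           # cumulative frequency table
    CUM[s] = acc
    acc += f

def rans_encode(msg):
    x = 1                            # initial state
    for s in reversed(msg):          # rANS decodes LIFO, so encode in reverse
        f = FREQS[s]
        x = (x // f) * M + CUM[s] + (x % f)
    return x

def rans_decode(x, n):
    out = []
    for _ in range(n):
        slot = x % M                 # which symbol's slice of [0, M) are we in?
        s = next(k for k in FREQS if CUM[k] <= slot < CUM[k] + FREQS[k])
        x = FREQS[s] * (x // M) + slot - CUM[s]
        out.append(s)
    return "".join(out)
```

The encode/decode steps are exact inverses, which is the core trick: the state grows by roughly log2(1/p) bits per symbol, like arithmetic coding, but with only integer multiply/divide in the hot loop.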


Two slight benefits of Huffman codes over arithmetic:

* They usually self-synchronize when some data is corrupted (though this is not guaranteed, and it does not apply when the Huffman table is dynamic)

* Neither Huffman nor arithmetic codes are easy to decode in parallel, but Huffman is slightly easier.


I was under the impression that arithmetic codes are guaranteed to be at least one bit less efficient than Huffman codes per input block. What makes you say they have better compression ratio?

Are you thinking of pre-defined Huffman tables that aren't adapted to the input? Because a table adapted to the input ought to be as good as it gets.

(I agree with the other benefits. Since arithmetic coding tables are built in a streaming fashion rather than by constructing the codebook up front, they are more memory-efficient while working.)


Huffman codes are conceptually isomorphic to arithmetic codes in which all probabilities are 2^-k with k an integer, so they have an obvious disadvantage: the symbol distribution is represented less accurately.
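
A quick illustration of that cost, with made-up numbers: for a skewed source with p = (0.9, 0.05, 0.05), Huffman is forced onto the nearest dyadic code lengths (1, 2, 2), i.e. it behaves as if p were (1/2, 1/4, 1/4), while an arithmetic coder can approach the entropy:

```python
from math import log2

probs = [0.9, 0.05, 0.05]
huff_lengths = [1, 2, 2]                     # only whole-bit lengths allowed

# Shannon entropy: what an ideal (e.g. arithmetic) coder can approach.
entropy = -sum(p * log2(p) for p in probs)   # about 0.569 bits/symbol

# Expected Huffman code length for the same source.
huff_avg = sum(p * l for p, l in zip(probs, huff_lengths))  # 1.1 bits/symbol
```

Here Huffman pays almost double the entropy per symbol; the gap shrinks as the distribution gets closer to dyadic and vanishes when every probability is exactly 2^-k.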


Hopefully k is natural. ;)


Implied, because any symbol distribution whose probabilities do not sum to 1 is invalid anyway ;-)


Huffman codes are less efficient per symbol, since each symbol maps to a whole bit string; arithmetic coding effectively smears symbols across bits more finely. Whether you use a dynamic or static probability model is a separate issue that applies to either coding method. (Emotionally, though, I prefer Huffman codes; they're just so neat.)


There is one way in which Huffman codes are better: they are easier to explain and simpler to implement.

I went for simplicity of exposition in the post, but arithmetic coders can indeed get arbitrarily close to the entropy, which is not quite the case with Huffman.


> easier to explain

I think Huffman is the one compression algorithm that both compresses stuff significantly and fits on the proverbial napkin, so it's a good start.

The others require the whole napkin stack at the table.


LZ is even better. Neither arithmetic nor Huffman will compress when the probability of all symbols is the same, but LZ will find repetitions easily. LZ also decompresses extremely quickly; "faster than memcpy" is often mentioned.
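
A sketch of that point: in a string like "abcabcabc..." every symbol has the same frequency, so an order-0 entropy coder gains nothing, yet even a naive greedy LZ77-style matcher (a toy, not any particular LZ variant) collapses it to a handful of tokens:

```python
def lz77(data, window=4096):
    # Greedy tokenizer: emit (offset, length) for matches of 3+, else literals.
    tokens, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            l = 0
            # Compare forward; j + l may run past i, which allows the
            # classic overlapping match on periodic input.
            while i + l < len(data) and data[j + l] == data[i + l]:
                l += 1
            if l > best_len:
                best_len, best_off = l, i - j
        if best_len >= 3:
            tokens.append(('match', best_off, best_len))
            i += best_len
        else:
            tokens.append(('lit', data[i]))
            i += 1
    return tokens
```

On "abc" repeated 50 times this emits three literals and a single overlapping match covering the remaining 147 characters, even though a, b, and c each occur with probability exactly 1/3.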


> LZ is even better. Neither arithmetic nor Huffman will compress when the probability of all symbols is the same

Comparing LZ to arithmetic encoding is a category error. LZ and Huffman are combined modeling+encoding methods, while arithmetic is just an encoding method, and it can be combined with any modeling technique. Arithmetic plus a suitable modeling technique will achieve compression as good as LZ, Huffman, or any other scheme. The PAQ8 compressors, and I believe its successors in the Hutter Prize ranking, use arithmetic plus a very advanced modeling scheme.

http://prize.hutter1.net/hfaq.htm#paq8


Indeed, but worth noting that LZ is a modelling scheme, whilst Huffman is a coding technique.

That is, LZ determines dynamically, as it goes, which elements we want to encode and their probabilities. Then you need a coder, like Huffman, to actually encode them.

In the post I used a semi-static, zero-order, byte-based model: I counted the byte occurrences first and used those counts as the probabilities throughout all of the encoding. Then I used Huffman codes to translate those probabilities into bits.
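
A sketch of that kind of semi-static setup (my guess at its shape, not the post's actual code): count bytes once, then build one Huffman code table from the counts.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    # Zero-order model: one pass to count byte occurrences.
    counts = Counter(data)
    # Heap entries: (weight, unique tiebreak, {symbol: code-so-far}).
    heap = [(n, i, {sym: ''}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol input
        (_, _, codes), = heap
        return {s: '0' for s in codes}
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing '0' and '1'.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in c1.items()}
        merged.update({s: '1' + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]
```

Encoding is then just concatenating `codes[b]` for each input byte; the counts (or the code lengths, canonically) get shipped in the header so the decoder can rebuild the same table.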

But I'm considering writing a follow-up changing this static model for an LZ77 one as I think that would be fun.



