Andre Rzym
Posted on Friday, 03 October, 2003 - 08:23 pm:
Consider two extremes:
(i) We have a file which we know contains 1 million digits, each of which is equally probable - random and uncorrelated. Then no compression is possible.
(ii) We have a file which we know contains the first 1 million digits of pi. Now the digits may be a bit of a pain to calculate, but they are perfectly deterministic, and therefore the file actually contains no information - we can compress it to zero! Decompression for this (zero-length) compressed file is simply a digit generator for pi!
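To make case (ii) concrete, here is one possible sketch of such a "decompressor": a generator for the decimal digits of pi. The method shown (Gibbons' unbounded spigot algorithm) is my own choice for illustration, not something specified above.

```python
def pi_digits(n):
    """Return the first n decimal digits of pi.

    Uses Gibbons' unbounded spigot algorithm: a purely integer
    state machine that streams digits one at a time.
    """
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    out = []
    while len(out) < n:
        if 4 * q + r - t < m * t:
            # The next digit m is now certain: emit it and rescale.
            out.append(m)
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:
            # Not enough precision yet: consume one more series term.
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                (q * (7 * k + 2) + r * x) // (t * x), x + 2)
    return out

print(pi_digits(10))  # the familiar 3, 1, 4, 1, 5, 9, ...
```

The generator itself is the entire "decompressed" content - no stored digits needed.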
The point is, the amount of compression achievable depends upon the number of possible messages and their individual probabilities.
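As a quick sanity check on case (i), Shannon's entropy formula gives the best achievable rate in bits per symbol; this small sketch (the names are mine) evaluates it for uniformly random decimal digits:

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform, uncorrelated decimal digits: each of 10 values has p = 0.1.
uniform = [0.1] * 10
bits_per_digit = entropy(uniform)            # = log2(10), about 3.32 bits
total_bytes = 1_000_000 * bits_per_digit / 8  # roughly 415,000 bytes

print(bits_per_digit, total_bytes)
```

So even the "incompressible" 1-million-digit file shrinks from 1,000,000 bytes of ASCII to about 415,000 bytes simply by packing digits at log2(10) bits each - but no further, because each digit carries that much genuine information.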
Once you know this, then Huffman Encoding will do the
trick for you. Optimally.
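For illustration (a minimal sketch, not a reference implementation), here is a Huffman coder in Python built on a min-heap; the function and variable names are my own:

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Optimal prefix code for a symbol -> frequency map (needs >= 2 symbols)."""
    # Heap entries: [subtree weight, tiebreak index, {symbol: code-so-far}].
    heap = [[w, i, {s: ""}] for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # lightest subtree: prepend bit "0"
        hi = heapq.heappop(heap)   # next lightest: prepend bit "1"
        for s in lo[2]:
            lo[2][s] = "0" + lo[2][s]
        for s in hi[2]:
            hi[2][s] = "1" + hi[2][s]
        lo[2].update(hi[2])
        heapq.heappush(heap, [lo[0] + hi[0], count, lo[2]])
        count += 1
    return heap[0][2]

codes = huffman_code(Counter("abracadabra"))
print(codes)  # the most frequent symbol, 'a' (5 of 11), gets the shortest code
```

Repeatedly merging the two least probable subtrees is exactly what makes the code optimal: rare symbols end up deep in the tree with long codes, common ones near the root with short ones.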
This is a rather terse reply because it is a rather big
subject, but let me know if you want more details or examples.
Andre