Compressibility Example
Posted
by user285726
on Stack Overflow
See other posts from Stack Overflow
or by user285726
Published on 2010-06-10T13:39:21Z
Indexed on
2010/06/10
13:42 UTC
Read the original article
Hit count: 353
compression
|huffman-encoding
From my algorithms textbook:
The annual county horse race is bringing in three thoroughbreds who have never competed against one another. Excited, you study their past 200 races and summarize these as prob- ability distributions over four outcomes: first (“first place”), second, third, and other.
Outcome Aurora Whirlwind Phantasm
0.15 0.30 0.20
first
0.10 0.05 0.30
second
0.70 0.25 0.30
third
0.05 0.40 0.20
other
Which horse is the most predictable? One quantitative approach to this question is to look at compressibility. Write down the history of each horse as a string of 200 values (first, second, third, other). The total number of bits needed to encode these track- record strings can then be computed using Huffman’s algorithm. This works out to 290 bits for Aurora, 380 for Whirlwind, and 420 for Phantasm (check it!). Aurora has the shortest encoding and is therefore in a strong sense the most predictable.
How did they get 420 for Phantasm? I keep getting 400 bytes, as so:
Combine first, other = 0.4, combine second, third = 0.6. End up with 2 bits encoding each position.
Is there something I've misunderstood about the Huffman encoding algorithm?
Textbook available here: http://www.cs.berkeley.edu/~vazirani/algorithms.html (page 156).
© Stack Overflow or respective owner