How should I compress a file with multiple bytes that are the same with Huffman coding?

Posted by Omega on Programmers See other posts from Programmers or by Omega
Published on 2012-11-22T01:30:07Z Indexed on 2012/11/22 5:11 UTC
Read the original article Hit count: 295

Filed under:

On my great quest for compressing/decompressing files with a Java implementation of Huffman coding (http://en.wikipedia.org/wiki/Huffman_coding) for a school assignment, I am now at the point of building a list of prefix codes. Such codes are used when decompressing a file. Basically, the code is made of zeroes and ones, that are used to follow a path in a Huffman tree (left or right) for, ultimately, finding a byte.

In this Wikipedia image, to reach the character m the prefix code would be 0111 enter image description here

The idea is that when you compress the file, you will basically convert all the bytes of the file into prefix codes instead (they tend to be smaller than 8 bits, so there's some gain).

So every time the character m appears in a file (which in binary is actually 1101101), it will be replaced by 0111 (if we used the tree above).

Therefore, 1101101110110111011011101101 becomes 0111011101110111 in the compressed file.

I'm okay with that. But what if the following happens:

In the file to be compressed there exists only one unique byte, say 1101101.
There are 1000 of such byte.

Technically, the prefix code of such byte would be... none, because there is no path to follow, right? I mean, there is only one unique byte anyway, so the tree has just one node.

Therefore, if the prefix code is none, I would not be able to write the prefix code in the compressed file, because, well, there is nothing to write.

Which brings this problem: how would I compress/decompress such file if it is impossible to write a prefix code when compressing? (using Huffman coding, due to the school assignment's rules)

This tutorial seems to explain a bit better about prefix codes: http://www.cprogramming.com/tutorial/computersciencetheory/huffman.html but doesn't seem to address this issue either.

Developer IT

How should I compress a file with multiple bytes that are the same with Huffman coding? - Developer IT

How should I compress a file with multiple bytes that are the same with Huffman coding?

algorithms

algorithm-analysis

Related posts about algorithms

Finding a problem in some task [closed]

Genetic algorithms

understanding evaluation function

How to know whether to create a general system or to hack a solution

How to implement a genetic algorithm with distance, time, and cost

Related posts about algorithm-analysis

Algorithm Analysis tool for java

algorithm analysis - orders of growth question

Sublinear Extra Space MergeSort

Help with algorithmic complexity in custom merge sort implementation

How meaningful is the Big-O time complexity of an algorithm?

Categories cloud