Compression algorithm for IEEE-754 data

Posted by David Taylor on Stack Overflow
Published on 2010-02-10T17:05:44Z Indexed on 2010/03/28 22:53 UTC

Does anyone have a recommendation for a good compression algorithm that works well with double-precision floating-point values? We have found that the binary representation of floating-point values compresses very poorly with common compression programs (e.g. Zip, RAR, 7-Zip).
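One common workaround (my suggestion, not something from the question) is byte shuffling, the idea behind HDF5's shuffle filter: store byte 0 of every value, then byte 1, and so on. For values confined to a narrow range, the sign/exponent bytes and the top mantissa bytes barely change, so after transposition they form long runs that a general-purpose compressor such as zlib can exploit, while the noisy low-order bytes stay together. A minimal sketch; the function names are hypothetical:

```python
import struct
import zlib

def shuffle_bytes(values):
    """Byte-transpose doubles: byte 0 of every value, then byte 1, ... byte 7."""
    raw = struct.pack(f"<{len(values)}d", *values)  # little-endian IEEE-754 binary64
    # raw[k::8] collects the k-th byte of each 8-byte value
    return b"".join(raw[k::8] for k in range(8))

def unshuffle_bytes(shuffled, count):
    """Invert shuffle_bytes for `count` doubles."""
    raw = bytearray(8 * count)
    for k in range(8):
        # scatter the k-th byte plane back into every 8th position
        raw[k::8] = shuffled[k * count:(k + 1) * count]
    return list(struct.unpack(f"<{count}d", bytes(raw)))

def compress_doubles(values):
    """Lossless: shuffle, then deflate with zlib at maximum level."""
    return zlib.compress(shuffle_bytes(values), 9)

def decompress_doubles(blob, count):
    """Inverse of compress_doubles; `count` must be stored separately."""
    return unshuffle_bytes(zlib.decompress(blob), count)
```

The transform is exactly reversible, so nothing is lost; the value count has to be kept alongside the compressed blob (for example in a small header).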

The data we need to compress is a one-dimensional array of 8-byte values sorted in monotonically increasing order. The values represent temperatures in Kelvin, typically spanning less than 100 degrees. The number of values ranges from a few hundred to at most 64K.
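Because the values are non-negative, finite and sorted, their IEEE-754 bit patterns read as unsigned 64-bit integers are sorted as well (for non-negative doubles, integer order of the bit patterns matches numeric order). A minimal lossless sketch, assuming no NaNs, infinities or negative values, is to difference those integers and write the small gaps as variable-length integers before handing them to zlib. The function names and the LEB128-style varint layout are my own choices, not a standard format:

```python
import struct
import zlib

def encode_sorted_doubles(values):
    """Losslessly pack sorted, non-negative, finite doubles.

    Assumption: input is sorted ascending, so the bit patterns are sorted
    too and every successive difference is a non-negative integer.
    """
    bits = [struct.unpack("<Q", struct.pack("<d", v))[0] for v in values]
    out = bytearray()
    prev = 0
    for b in bits:
        delta = b - prev  # non-negative because the input is sorted
        prev = b
        # LEB128-style varint: 7 payload bits per byte, high bit = "more follows"
        while True:
            byte = delta & 0x7F
            delta >>= 7
            if delta:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                break
    return zlib.compress(bytes(out), 9)

def decode_sorted_doubles(blob):
    """Inverse of encode_sorted_doubles; restores the exact bit patterns."""
    data = zlib.decompress(blob)
    values = []
    prev = 0
    delta = 0
    shift = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # continuation byte: keep accumulating
        else:
            prev += delta       # varint complete: undo the differencing
            values.append(struct.unpack("<d", struct.pack("<Q", prev))[0])
            delta = 0
            shift = 0
    return values
```

Closely spaced temperatures give gaps far smaller than 2^64, so most gaps need fewer than 8 varint bytes; how many fewer depends on the spacing of the data.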

Clarifications

  • All values in the array are distinct, though repetition does exist at the byte level due to the way floating-point values are represented.

  • A lossless algorithm is desired since this is scientific data. Conversion to a fixed-point representation with sufficient precision (~5 decimal places) might be acceptable, provided it gives a significant improvement in storage efficiency.
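The fixed-point option can be sized up front: at a resolution of 10^-5 K, a span under 100 K covers fewer than 100 * 10^5 = 10^7 steps, and since 2^24 = 16,777,216 > 10^7, each value fits in 24 bits relative to the first one, before any further compression. A sketch, assuming rounding to 5 decimal places is acceptable and the input is non-empty. Note that this is lossy: the round-trip error is at most 0.5 * 10^-5 K, and distinct values closer together than that can collapse to the same integer. The names and the header layout are hypothetical:

```python
import struct
import zlib

SCALE = 10**5  # ~5 decimal places, the precision mentioned in the question

def encode_fixed_point(values):
    """Lossy: quantize sorted values to 1e-5 K, then delta-encode the integers."""
    ints = [round(v * SCALE) for v in values]  # round() on a float returns an int
    base = ints[0]                             # assumes at least one value
    deltas = [b - a for a, b in zip(ints, ints[1:])]  # >= 0 because input is sorted
    # Assumption: every gap fits in an unsigned 32-bit field (true for a
    # span under 100 K, since 100 * 10**5 = 10**7 < 2**32).
    payload = struct.pack(f"<qI{len(deltas)}I", base, len(deltas), *deltas)
    return zlib.compress(payload, 9)  # zlib squeezes out the unused high bytes

def decode_fixed_point(blob):
    """Inverse of encode_fixed_point, up to the quantization error."""
    payload = zlib.decompress(blob)
    base, count = struct.unpack_from("<qI", payload)
    deltas = struct.unpack_from(f"<{count}I", payload, struct.calcsize("<qI"))
    ints = [base]
    for d in deltas:
        ints.append(ints[-1] + d)  # running sum restores the quantized values
    return [i / SCALE for i in ints]
```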

Update

I found an interesting article on this subject, though I'm not sure how applicable its approach is to my requirements.

http://users.ices.utexas.edu/~burtscher/papers/dcc06.pdf
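For comparison with the linked paper, here is a deliberately simplified predictor-and-XOR coder of the general kind used by predictive floating-point compressors. It is not the paper's algorithm: as an assumption it uses the most trivial predictor (the previous value), XORs the 64-bit patterns of prediction and actual value, and stores a one-byte length followed by only the residual's nonzero low-order bytes. Because neighboring sorted temperatures share their sign, exponent and leading mantissa bits, the residuals start with zero bytes that this layout drops. The function names are hypothetical:

```python
import struct

def _bits(x):
    """IEEE-754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def xor_encode(values):
    """Toy coder: predict each value as the previous one and keep the XOR residual."""
    out = bytearray()
    prev = 0
    for v in values:
        bits = _bits(v)
        residual = bits ^ prev  # shared high-order bits cancel to zero
        prev = bits
        nbytes = (residual.bit_length() + 7) // 8  # significant bytes, 0..8
        out.append(nbytes)  # a real coder would pack this count into a few bits
        out += residual.to_bytes(nbytes, "little")
    return bytes(out)

def xor_decode(data):
    """Inverse of xor_encode."""
    values = []
    prev = 0
    i = 0
    while i < len(data):
        nbytes = data[i]
        residual = int.from_bytes(data[i + 1:i + 1 + nbytes], "little")
        i += 1 + nbytes
        prev ^= residual  # XOR with the prediction recovers the exact bits
        values.append(struct.unpack("<d", struct.pack("<Q", prev))[0])
    return values
```

A better predictor produces smaller residuals, and the output can still be passed through a general-purpose compressor afterwards.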

© Stack Overflow or respective owner
