Random access gzip stream
Posted by jkff on Stack Overflow
Published on 2010-03-26T21:39:33Z
I'd like to be able to do random access into a gzipped file. I can afford to do some preprocessing on it (say, build some kind of index), provided that the result of the preprocessing is much smaller than the file itself.
Any advice?
My thoughts were:
- Hack on an existing gzip implementation and serialize its decompressor state every, say, 1 megabyte of compressed data. Then, to do random access, deserialize the decompressor state and read from the megabyte boundary. This seems hard, especially since I'm working with Java and I couldn't find a pure-Java gzip implementation :(
- Re-compress the file in chunks of 1 MB and do the same as above. This has the disadvantage of doubling the required disk space.
- Write a simple parser for the gzip format that doesn't do any decompression and only detects and indexes block boundaries (if there even are any blocks: I haven't read the gzip format description yet).
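To make the second idea concrete, here is a rough sketch of what I have in mind, using only `java.util.zip`. Each chunk is written as its own gzip member and the index is just the compressed offset where each member starts. The class name `ChunkedGzip` and its methods are made up for illustration, and everything is kept in memory to keep the example short; for a real file I assume one would store `long` offsets and seek with `RandomAccessFile` instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: chunked gzip with an offset index (in-memory sketch).
class ChunkedGzip {
    final int chunkSize;     // uncompressed bytes per chunk
    final byte[] compressed; // concatenated gzip members
    final int[] offsets;     // offsets[i] = start of member i; last entry = total length

    ChunkedGzip(int chunkSize, byte[] compressed, int[] offsets) {
        this.chunkSize = chunkSize;
        this.compressed = compressed;
        this.offsets = offsets;
    }

    // Compress each chunk as an independent gzip member and record where it starts.
    static ChunkedGzip compress(byte[] data, int chunkSize) throws IOException {
        int chunks = (data.length + chunkSize - 1) / chunkSize;
        int[] offsets = new int[chunks + 1];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < chunks; i++) {
            offsets[i] = out.size();
            int pos = i * chunkSize;
            GZIPOutputStream gz = new GZIPOutputStream(out);
            gz.write(data, pos, Math.min(chunkSize, data.length - pos));
            gz.finish(); // ends this member without closing the underlying stream
        }
        offsets[chunks] = out.size();
        return new ChunkedGzip(chunkSize, out.toByteArray(), offsets);
    }

    // Decompress only member i, by limiting the input to that member's bytes.
    byte[] decompressChunk(int i) throws IOException {
        int start = offsets[i];
        int length = offsets[i + 1] - start;
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(compressed, start, length))) {
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
            return plain.toByteArray();
        }
    }

    // Read len uncompressed bytes starting at uncompressed position pos.
    byte[] read(long pos, int len) throws IOException {
        byte[] result = new byte[len];
        int done = 0;
        while (done < len) {
            long at = pos + done;
            int chunk = (int) (at / chunkSize);
            int skip = (int) (at % chunkSize);
            if (chunk >= offsets.length - 1) {
                throw new EOFException("read past end of data");
            }
            byte[] plain = decompressChunk(chunk);
            int n = Math.min(len - done, plain.length - skip);
            if (n <= 0) {
                throw new EOFException("read past end of data");
            }
            System.arraycopy(plain, skip, result, done, n);
            done += n;
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "The quick brown fox jumps over the lazy dog"
                .getBytes(StandardCharsets.US_ASCII);
        ChunkedGzip cg = ChunkedGzip.compress(data, 8); // tiny chunks for the demo
        System.out.println(new String(cg.read(4, 11), StandardCharsets.US_ASCII));
    }
}
```

The index costs one offset per chunk, so with 1 MB chunks it stays tiny compared to the file. As far as I understand the format, a concatenation of gzip members is itself a valid gzip file, so ordinary tools should still be able to decompress the whole thing; the price is somewhat worse compression, since each chunk starts with an empty dictionary.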