Looking for a fast, compact, streamable, multi-language, strongly typed serialization format

Posted by sanity on Stack Overflow
Published on 2009-07-28T02:17:59Z Indexed on 2010/04/27 14:23 UTC

I'm currently using gzip-compressed JSON in my Java project, where I need to store a large number of objects (hundreds of millions) on disk. I store one JSON object per line and disallow line breaks within an object, so I can stream the data off disk line by line without reading the entire file at once.
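
For illustration, the current setup looks roughly like the sketch below. The class and method names (`GzipLineStream`, `writeLines`, `forEachLine`) are made up for this post, and the JSON parsing step is left out; each line is simply handed to a callback.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: one record per line inside a gzip-compressed file.
class GzipLineStream {

    // Writes each record on its own line; rejects records containing line breaks.
    static void writeLines(Path file, Iterable<String> records) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8))) {
            for (String record : records) {
                if (record.indexOf('\n') >= 0 || record.indexOf('\r') >= 0) {
                    throw new IllegalArgumentException("record contains a line break");
                }
                out.write(record);
                out.write('\n');
            }
        }
    }

    // Decompresses on the fly and passes one line at a time to the handler,
    // so only the current line (plus stream buffers) is held in memory.
    static void forEachLine(Path file, Consumer<String> handler) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                handler.accept(line);
            }
        }
    }
}
```

In the real project the handler would parse each line as a JSON object.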

It turns out that parsing the JSON (using http://www.json.org/java/) costs more than either reading the raw data off disk or decompressing it (which I do on the fly).

Ideally, I'd like a strongly typed serialization format in which I can specify, for example, "this object field is a list of strings"; because the system knows what to expect, it can deserialize the data quickly. I could also describe the format to someone else just by giving them its "type".
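
To make the idea concrete, here is a minimal hand-rolled sketch of schema-driven decoding, not an existing format. It assumes a hypothetical record type with a 64-bit `id` and a list-of-strings `tags` field, encoded as big-endian integers and length-prefixed UTF-8. The names `TypedRecordCodec` and `Item` are invented for the example. Because the reader knows the field order and types in advance, it never has to tokenize or guess.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical schema: { id: int64, tags: list<string> }
class TypedRecordCodec {

    static final class Item {
        final long id;
        final List<String> tags;

        Item(long id, List<String> tags) {
            this.id = id;
            this.tags = tags;
        }
    }

    // Layout: id (8 bytes), tag count (4 bytes), then per tag:
    // byte length (4 bytes) followed by the UTF-8 bytes.
    static void write(DataOutputStream out, Item item) throws IOException {
        out.writeLong(item.id);
        out.writeInt(item.tags.size());
        for (String tag : item.tags) {
            byte[] utf8 = tag.getBytes(StandardCharsets.UTF_8);
            out.writeInt(utf8.length);
            out.write(utf8);
        }
    }

    // Reads the next record, or returns null when the stream ends at
    // (or inside) the first field. EOF later in a record propagates
    // as an EOFException.
    static Item read(DataInputStream in) throws IOException {
        long id;
        try {
            id = in.readLong();
        } catch (EOFException endOfStream) {
            return null;
        }
        int count = in.readInt();
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            tags.add(new String(utf8, StandardCharsets.UTF_8));
        }
        return new Item(id, tags);
    }
}
```

Calling `read` in a loop until it returns null streams the records one at a time. What I'm after is a format that does this kind of thing from a declared type, with implementations in several languages, rather than code I maintain by hand.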

It would also need to be cross-platform. I use Java, but work with people using PHP, Python, and other languages.

So, to recap, it should be:

  • Strongly typed
  • Streamable (i.e., read a file bit by bit without having to load it all into RAM at once)
  • Cross-platform (including Java and PHP)
  • Fast
  • Free (as in speech)

Any pointers?
