Large scale storage for incrementally-appended documents?
Posted by Ben Dilts on Stack Overflow
Published on 2011-01-03T01:43:17Z
I need to store hundreds of thousands of documents right now (potentially many millions later). They start out empty and are appended to frequently, but are never otherwise updated or deleted. These documents are not interrelated in any way, and just need to be accessed by some unique ID.
Reads fetch a subset of a document, almost always starting midway through at some indexed location and running to the end (e.g. "document #4324319, save #53 to the end").
These documents start very small, at several KB. They typically reach a final size around 500KB, but many reach 10MB or more.
I'm currently using MySQL (InnoDB) to store these documents. Each incremental save is just inserted into one big table along with the ID of the document it belongs to, so reading part of a document looks like "select * from saves where document_id=14 and save_id > 53 order by save_id", followed by manually concatenating the rows in code.
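For concreteness, here is a minimal sketch of that layout and read path. It is an illustration under assumptions, not the production code: Python's built-in sqlite3 stands in for MySQL (InnoDB) so the example runs anywhere, the `data` column and the helper names `append_save` and `read_from` are hypothetical, and the way `save_id` is assigned is only one possible choice.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL/InnoDB so the sketch is self-contained.
# Assumption: the payload column is called `data`; the real column names are not shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE saves (
        document_id INTEGER NOT NULL,
        save_id     INTEGER NOT NULL,
        data        BLOB    NOT NULL,
        PRIMARY KEY (document_id, save_id)
    )
    """
)


def append_save(conn, document_id, data):
    """Append one incremental save to a document and return its save_id."""
    # Hypothetical numbering scheme: next save_id = highest existing one + 1.
    row = conn.execute(
        "SELECT COALESCE(MAX(save_id), 0) + 1 FROM saves WHERE document_id = ?",
        (document_id,),
    ).fetchone()
    save_id = row[0]
    conn.execute(
        "INSERT INTO saves (document_id, save_id, data) VALUES (?, ?, ?)",
        (document_id, save_id, data),
    )
    conn.commit()
    return save_id


def read_from(conn, document_id, after_save_id):
    """Return every save newer than after_save_id, concatenated in save order."""
    rows = conn.execute(
        "SELECT data FROM saves "
        "WHERE document_id = ? AND save_id > ? ORDER BY save_id",
        (document_id, after_save_id),
    )
    return b"".join(r[0] for r in rows)


# Usage: three appends to document 14, then read everything after save #1.
for chunk in (b"first ", b"second ", b"third"):
    append_save(conn, 14, chunk)
tail = read_from(conn, 14, 1)  # b"second third"
```

The composite key on (document_id, save_id) is what lets the range read touch only the rows for one document, in order.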
Ideally, I'd like the storage solution to be easy to scale horizontally, with redundancy across servers (e.g. each document stored on at least 3 nodes) and easy recovery when a server crashes.
I've looked at CouchDB and MongoDB as possible replacements for MySQL, but I'm not sure that either of them makes much sense for this particular application, though I'm open to being convinced.
Any input on a good storage solution?
© Stack Overflow or respective owner