Document-oriented vs Column-oriented database fit

Posted by user1007922 on Programmers
Published on 2012-10-01T21:49:35Z Indexed on 2012/10/02 3:55 UTC
Filed under: cassandra

I have a data-intensive application that desperately needs a database makeover.

The general data model: There are records with record IDs (RIDs), grouped together by group IDs (GIDs). Each record has arbitrary data fields (maybe 5-15), a few of them mandatory and the rest optional, and thus sparse.
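To make the data model concrete, here is a minimal Python sketch of one record. The field names `created_at` and `source` are hypothetical stand-ins for the mandatory fields, and `extra` is an assumed container for the optional, sparse ones; none of these names come from the actual application.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Record:
    rid: str          # record ID
    gid: str          # group ID the record belongs to
    created_at: str   # hypothetical mandatory field
    source: str       # hypothetical mandatory field
    # Optional fields are stored only when present, so absent ones cost nothing.
    extra: Dict[str, Any] = field(default_factory=dict)


# A record with one optional field set, and one with none.
full = Record("r1", "g1", "2012-10-01T21:49:35Z", "sensor-a", {"temp": 21.5})
bare = Record("r2", "g1", "2012-10-01T21:50:00Z", "sensor-b")
```

Keeping the optional fields in a map rather than as fixed attributes mirrors how both document stores and wide-row stores tend to handle sparse data: a missing field is simply not written.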

The general usage model: There are LOTS and LOTS of writes. Millions to billions of records are stored. Very often they are associated with new GIDs, but sometimes they are associated with existing GIDs.

There aren't as many reads, but when they happen, they need to be pretty fast, or at least take roughly constant time regardless of the database size. Each read will need to retrieve all the records/RIDs with a certain GID.

I don't have a need to search by the record field values. Primarily, I will need to query by the GID and maybe RID.

What database implementation should I use?

I did some initial research comparing document-oriented and column-oriented databases, and it seems the document-oriented ones are a good fit, model-wise. I could store all the records of a group together in one document, using the GID as the document key. But I don't really have any use for their ability to search the document contents.
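The document-per-GID idea can be sketched with a plain in-memory dictionary standing in for the document store. This is only an illustration under that assumption: `GroupDocumentStore` and its methods are hypothetical names, not the API of MongoDB or any other product.

```python
from collections import defaultdict
from typing import Any, Dict


class GroupDocumentStore:
    """In-memory stand-in: one 'document' per GID, holding all its records."""

    def __init__(self) -> None:
        # gid -> {rid: fields}
        self._docs: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)

    def write(self, gid: str, rid: str, fields: Dict[str, Any]) -> None:
        # A new GID creates a document; an existing GID grows its document.
        self._docs[gid][rid] = dict(fields)

    def read_group(self, gid: str) -> Dict[str, Dict[str, Any]]:
        # One key lookup returns every record in the group.
        return dict(self._docs.get(gid, {}))


store = GroupDocumentStore()
store.write("g1", "r1", {"temp": 21.5})
store.write("g1", "r2", {})
store.write("g2", "r3", {"temp": 19.0})
group = store.read_group("g1")
```

The read is a single lookup by GID, which matches the access pattern. The cost sits on the write side: every write to an existing GID modifies one ever-growing document, and real document stores usually cap document size (MongoDB, for example, limits a single document to 16 MB), so groups with hundreds of thousands of records might need to be split.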

I like the simplicity and scalability of column-oriented databases like Cassandra, but how should I model my data in this paradigm for optimal performance? Should my key be the GID, with a column for each record/RID? (There may be thousands or hundreds of thousands of records in a group/GID.) Or should my key be the RID, with each row having a column for the GID value? Which results in faster writes and reads under this model?
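The two candidate layouts can be compared with a small in-memory model. This is a sketch under loud assumptions: plain dictionaries stand in for rows and columns, the class names are hypothetical, and nothing here is Cassandra's actual API or storage behavior.

```python
from collections import defaultdict
from typing import Any, Dict


class GidKeyedStore:
    """Layout A: row key = GID, one column per RID (a wide row)."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)

    def write(self, gid: str, rid: str, fields: Dict[str, Any]) -> None:
        self.rows[gid][rid] = dict(fields)

    def read_group(self, gid: str) -> Dict[str, Dict[str, Any]]:
        # A single row lookup returns the whole group.
        return dict(self.rows.get(gid, {}))


class RidKeyedStore:
    """Layout B: row key = RID, with the GID stored as an ordinary column."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}

    def write(self, gid: str, rid: str, fields: Dict[str, Any]) -> None:
        self.rows[rid] = {"gid": gid, **fields}

    def read_group(self, gid: str) -> Dict[str, Dict[str, Any]]:
        # Without a secondary index on GID, every row must be examined.
        return {rid: row for rid, row in self.rows.items() if row["gid"] == gid}


a, b = GidKeyedStore(), RidKeyedStore()
for s in (a, b):
    s.write("g1", "r1", {"temp": 21.5})
    s.write("g1", "r2", {})
    s.write("g2", "r3", {"temp": 19.0})
```

In this model both layouts accept a write as one keyed insert, so the difference shows up on reads: Layout A answers "all records for a GID" with one row lookup, while Layout B has to scan every row or maintain a separate index from GID to RIDs. Since the reads described above are always by GID, Layout A matches the access pattern more directly, though how a real cluster copes with very wide rows would need to be measured.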

Tags: database, mongodb, database-design, cassandra
