Data aggregation mongodb vs mysql
Posted
by Dimitris Stefanidis
on Stack Overflow
See other posts from Stack Overflow
or by Dimitris Stefanidis
Published on 2010-05-12T11:53:56Z
Indexed on
2010/05/12
18:34 UTC
Read the original article
Hit count: 398
I am currently researching on a backend to use for a project with demanding data aggregation requirements. The main project requirements are the following.
Store millions of records for each user. Users might have more than 1 million entries per year so even with 100 users we are talking about 100 million entries per year.
Data aggregation on those entries must be performed on the fly. The users need to be able to filter on the entries by a ton of available filters and then present summaries (totals , averages e.t.c) and graphs on the results. Obviously I cannot precalculate any of the aggregation results because the filter combinations (and thus the result sets) are huge.
Users are going to have access on their own data only but it would be nice if anonymous stats could be calculated for all the data.
The data is going to be most of the time in batch. e.g the user will upload the data every day and it could like 3000 records. In some later version there could be automated programs that upload every few minutes in smaller batches of 100 items for example.
I made a simple test of creating a table with 1 million rows and performing a simple sum of 1 column both in mongodb and in mysql and the performance difference was huge. I do not remember the exact numbers but it was something like mysql = 200ms , mongodb = 20 sec.
I have also made the test with couchdb and had much worse results.
What seems promising speed wise is cassandra which I was very enthusiastic about when I first discovered it. However the documentation is scarce and I haven't found any solid examples on how to perform sums and other aggregate functions on the data. Is that possible ?
As it seems from my test (Maybe I have done something wrong) with the current performance its impossible to use mongodb for such a project although the automated sharding functionality seems like a perfect fit for it.
Does anybody have experience with data aggregation in mongodb or have any insights that might be of help for the implementation of the project ?
Thanks, Dimitris
© Stack Overflow or respective owner