How best to merge/sort/page through tons of JSON arrays?
- by Joshiatto
Here's the scenario: Say you have millions of JSON documents stored as text files. Each JSON document is an array of "activity" objects, each of which contains a "created_datetime" attribute. What is the best way to merge/sort/filter/page through these activities via a web UI? For example, say we want to take a few thousand of the documents, merge them into one gigantic array, sort that array by the "created_datetime" attribute in descending order, and then page through it 10 activities at a time.
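To make the merge/sort/page step concrete, here is a minimal Python sketch of the naive in-memory version. It assumes "created_datetime" is an ISO 8601 string in a single timezone, so that string order matches chronological order; the function names and the "name" field used in the usage example are made up for illustration.

```python
import heapq
import json
from itertools import islice


def sorted_desc(doc_text):
    """Parse one JSON document (an array of activities), newest first."""
    activities = json.loads(doc_text)
    # Assumption: ISO 8601 strings, so plain string comparison is chronological.
    return sorted(activities, key=lambda a: a["created_datetime"], reverse=True)


def merged_page(doc_texts, page, page_size=10):
    """Return one page of the merged, newest-first activity stream."""
    streams = [sorted_desc(text) for text in doc_texts]
    # heapq.merge does a lazy k-way merge; with reverse=True each input
    # must already be sorted from largest to smallest.
    merged = heapq.merge(
        *streams, key=lambda a: a["created_datetime"], reverse=True
    )
    start = page * page_size
    return list(islice(merged, start, start + page_size))


if __name__ == "__main__":
    docs = [
        '[{"name": "a", "created_datetime": "2012-01-03T10:00:00"},'
        ' {"name": "b", "created_datetime": "2012-01-01T09:00:00"}]',
        '[{"name": "c", "created_datetime": "2012-01-02T08:00:00"}]',
    ]
    for activity in merged_page(docs, page=0, page_size=2):
        print(activity["name"], activity["created_datetime"])
```

Every page request here re-parses and re-sorts all of the selected documents, which is exactly the cost I would like to avoid at this scale.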
Also keep in mind that roughly 25% of these JSON documents are updated every day, and updates have to make it into the view within 5 minutes.
My first thought is to parse all of the documents into an RDBMS table, after which paging becomes a simple query such as "select top 10 name, created_datetime from Activity where user_id=12345 order by created_datetime desc".
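A rough sketch of the RDBMS idea, using Python's built-in sqlite3 module purely for illustration. It assumes every activity object carries "user_id" and "name" fields, and it introduces a hypothetical "doc_id" column so that an updated document can replace its old rows. SQLite uses LIMIT/OFFSET where the query above uses TOP.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Create the Activity table and an index matching the paging query."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS Activity (
               doc_id TEXT NOT NULL,            -- hypothetical: source document
               user_id INTEGER NOT NULL,        -- assumed to be on each activity
               name TEXT,
               created_datetime TEXT NOT NULL   -- assumed ISO 8601 string
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_activity_user_created "
        "ON Activity (user_id, created_datetime DESC)"
    )
    return conn


def load_document(conn, doc_id, doc_text):
    """Replace all rows that came from one JSON document."""
    rows = [
        (doc_id, a["user_id"], a.get("name"), a["created_datetime"])
        for a in json.loads(doc_text)
    ]
    with conn:  # delete and insert commit together, or roll back together
        conn.execute("DELETE FROM Activity WHERE doc_id = ?", (doc_id,))
        conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", rows)


def page(conn, user_id, page_number, page_size=10):
    """Return one page of (name, created_datetime), newest first."""
    cur = conn.execute(
        "SELECT name, created_datetime FROM Activity "
        "WHERE user_id = ? ORDER BY created_datetime DESC "
        "LIMIT ? OFFSET ?",
        (user_id, page_size, page_number * page_size),
    )
    return cur.fetchall()
```

Calling load_document again for a changed file is how the daily updates would reach the view: only the rows from that one document are rewritten.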
Some have suggested I use NoSQL techniques such as Hadoop or MapReduce instead. How exactly would this work?
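As far as I understand it, the map/reduce shape of this problem would look roughly like the single-process Python sketch below. This is only my guess at the structure, not Hadoop code: the function names are made up, it assumes each activity has a "user_id" field, and a real framework would run the map and reduce steps across many machines.

```python
import heapq
import json
from collections import defaultdict


def map_document(doc_text):
    """Map step: emit one (user_id, activity) pair per activity."""
    for activity in json.loads(doc_text):
        yield activity["user_id"], activity


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_latest(activities, n=10):
    """Reduce step: keep the n newest activities for one user, newest first."""
    # Assumption: ISO 8601 strings, so string order is chronological order.
    return heapq.nlargest(n, activities, key=lambda a: a["created_datetime"])


def run(doc_texts, n=10):
    """Run map, shuffle and reduce over a list of JSON document strings."""
    pairs = (pair for text in doc_texts for pair in map_document(text))
    return {
        user: reduce_latest(acts, n) for user, acts in shuffle(pairs).items()
    }
```

What I don't see is how a batch job like this would serve arbitrary page requests from a web UI, or pick up changed documents within the 5 minute window.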
For more background, see: Why is NoSQL better for this scenario?