NoSQL as a movement is an interesting beast. I kinda like that it’s negatively defined (I happen to belong myself to at least one other such a-community). It’s not in its roots about proposing one specific new silver bullet to kill an old problem. it’s about challenging the consensus.
Actually, blindly and systematically replacing relational databases with object databases would just replace one set of issues with another.
No, the point is to recognize that relational databases are not a universal answer -although they have been used as one for so long- and recognize instead that there’s a whole spectrum of data storage solutions out there.
Why is it so hard to recognize, by the way? You are already using some of those other data storage solutions every day. Let me cite a few:
- The file system
- Active Directory
- XML / JSON documents
- The Web
- e-mail
- Logs
- Excel files
- EXIF blobs in your photos
- Relational databases
- And yes, object databases
It’s just a fact of modern life. Notice by the way that most of the data that you use every day is unstructured and thus mostly unsuitable for relational storage. It really is more a matter of recognizing it: you are already doing NoSQL.
So what happens when for any reason you need to simultaneously query two or more of these heterogeneous data stores? Well, you build an index of sorts combining them, and that’s what you query instead.
Of course, there’s not much distance to travel from that to realizing that querying is better done when completely separated from storage.
So why am I writing about this today?
Well, that’s something I’ve been giving lots of thought, on and off, over the last ten years. When I built my first CMS all that time ago, one of the main problems my customers were facing was to manage and make sense of the mountain of unstructured data that was constituting most of their business. The central entity of that system was the file system because we were dealing with lots of Word documents, PDFs, OCR’d articles, photos and static web pages. We could have stored all that in SQL Server. It would have worked. Ew. I’m so glad we didn’t.
Today, I’m working on Orchard (another CMS ;). It’s a pretty young project but already one of the questions we get the most is how to integrate existing data.
One of the ideas I’ll be trying hard to sell to the rest of the team in the next few months is to completely split the querying from the storage. Not only does this provide great opportunities for performance optimizations, it gives you homogeneous access to heterogeneous and existing data sources. For free.