DocumentDB - Another Azure NoSQL Storage Service

Posted by Shaun on Geeks with Blogs
Published on Mon, 25 Aug 2014 00:26:29 GMT


Originally posted on: http://geekswithblogs.net/shaunxu/archive/2014/08/25/documentdb---another-azure-nosql-storage-service.aspx

Microsoft just released a bunch of new Azure features on the 22nd, and the one I was most interested in is DocumentDB, a document-oriented NoSQL database service in the cloud.

 

Quick Look at DocumentDB

We can try DocumentDB from the new Azure preview portal. Just click the NEW button and select the item named DocumentDB to create a new account.

[Screenshot: creating a new DocumentDB account from the NEW menu in the preview portal]

Specify the name of the DocumentDB account, which will become the endpoint we connect to later. Then select the capacity units, resource group and subscription. In the resource group section we can select the region where our DocumentDB account will be located.

As with other Azure services, select the same location as the consumers of the DocumentDB account, for example your website or web services.

[Screenshot: DocumentDB account settings (name, capacity units, resource group, subscription)]

After several minutes the DocumentDB account will be ready. Click the KEYS button to find the URI and primary key, which will be used when connecting.

[Screenshot: the KEYS blade showing the URI and primary key]

Now let's open Visual Studio and try the DocumentDB account we have just created. Create a new console application and install the DocumentDB .NET client library from NuGet by searching for the keyword "DocumentDB".

You need to select "Include Prerelease" in the NuGet Package Manager window, since this library has not been released yet.

[Screenshot: NuGet Package Manager with "Include Prerelease" selected]

Next we will create a new database and document collection under our DocumentDB account. The code below creates an instance of DocumentClient with the URI and primary key we just copied from the Azure portal, then creates a database and a collection. It also prints the database and collection link strings, which will be used later to insert and query documents.

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents;        // Database, DocumentCollection
    using Microsoft.Azure.Documents.Client; // DocumentClient

    // the methods below live inside the console application's Program class
    static void Main(string[] args)
    {
        // URI and primary key copied from the KEYS blade in the portal
        var endpoint = new Uri("https://shx.documents.azure.com:443/");
        var key = "<your primary key>";

        var client = new DocumentClient(endpoint, key);
        Run(client).Wait();

        Console.WriteLine("done");
        Console.ReadKey();
    }

    static async Task Run(DocumentClient client)
    {
        // create the database, then a collection inside it
        var database = new Database() { Id = "testdb" };
        database = await client.CreateDatabaseAsync(database);
        Console.WriteLine("database link = {0}", database.SelfLink);

        var collection = new DocumentCollection() { Id = "testcol" };
        collection = await client.CreateDocumentCollectionAsync(database.SelfLink, collection);
        Console.WriteLine("collection link = {0}", collection.SelfLink);
    }

Below is the result from the console window. We need to copy the collection link string for later use.

[Screenshot: console output showing the database and collection links]

If we go back to the portal now, we will find a database listed with the name we specified in the code.

[Screenshot: the new database listed in the portal]

Next we will insert a document into the database and collection we have just created. In the code below we paste the collection link copied in the previous step and create a dynamic object with several properties. Besides simple properties such as strings and integers, we can also add complex properties, for example an array, a dictionary or an object reference, as long as they can be serialized to JSON.

    using System;
    using System.Dynamic;                   // ExpandoObject
    using Microsoft.Azure.Documents.Client; // DocumentClient

    static void Main(string[] args)
    {
        var endpoint = new Uri("https://shx.documents.azure.com:443/");
        var key = "<your primary key>";

        var client = new DocumentClient(endpoint, key);

        // collection link pasted from the result in the previous demo
        var collectionLink = "dbs/AAk3AA==/colls/AAk3AP6oFgA=/";

        // document we are going to insert into the database
        dynamic doc = new ExpandoObject();
        doc.firstName = "Shaun";
        doc.lastName = "Xu";
        doc.roles = new string[] { "developer", "trainer", "presenter", "father" };

        // insert the document
        InsertADoc(client, collectionLink, doc).Wait();

        Console.WriteLine("done");
        Console.ReadKey();
    }

The insert code is very simple, as shown below: just provide the collection link and the object we are going to insert.

    using Newtonsoft.Json; // JsonConvert, Formatting

    static async Task InsertADoc(DocumentClient client, string collectionLink, dynamic doc)
    {
        // create the document and print the service response as indented JSON
        var document = await client.CreateDocumentAsync(collectionLink, doc);
        Console.WriteLine(JsonConvert.SerializeObject(document, Formatting.Indented));
    }

Below is the result after the object has been inserted.

[Screenshot: console output after inserting the document]
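The document above only uses strings and an array. As a rough illustration of the other kinds of properties mentioned earlier (an integer, a dictionary and an object reference), here is a small sketch that builds a similar ExpandoObject and serializes it to JSON. It uses System.Text.Json from modern .NET purely for illustration; the DocumentDB .NET SDK relies on Json.NET, so the JSON it actually sends may differ in details such as property casing. The Owner and DocShapes classes and all property names and values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Text.Json;

// Hypothetical class used as an "object reference" property.
public class Owner
{
    public string Name { get; set; } = "";
}

public static class DocShapes
{
    // Builds a dynamic document mixing simple and complex properties.
    // All names and values are made up for illustration.
    public static ExpandoObject BuildDoc()
    {
        dynamic doc = new ExpandoObject();
        doc.title = "sample";                                            // string
        doc.count = 42;                                                  // integer
        doc.tags = new[] { "a", "b" };                                   // array
        doc.settings = new Dictionary<string, int> { ["retries"] = 3 };  // dictionary
        doc.owner = new Owner { Name = "Alice" };                        // object reference
        return doc;
    }

    // Serializes the document to JSON; ExpandoObject is treated as a
    // string-keyed dictionary, and each value by its runtime type.
    public static string ToJson(ExpandoObject doc) =>
        JsonSerializer.Serialize(doc, new JsonSerializerOptions { WriteIndented = true });

    public static void Main() => Console.WriteLine(ToJson(BuildDoc()));
}
```

Every property ends up as plain JSON (a number, an array, a nested object), which is exactly the condition for it to be storable in a collection.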

Finally, we will query the documents from the collection. Similar to the insert code, we just need to specify the collection link (with "docs/" appended), and the .NET SDK will retrieve all documents in it.

    using System;
    using System.Linq;                      // ToList()
    using Microsoft.Azure.Documents.Client; // DocumentClient

    static void Main(string[] args)
    {
        var endpoint = new Uri("https://shx.documents.azure.com:443/");
        var key = "<your primary key>";

        var client = new DocumentClient(endpoint, key);

        var collectionLink = "dbs/AAk3AA==/colls/AAk3AP6oFgA=/";

        SelectDocs(client, collectionLink);

        Console.WriteLine("done");
        Console.ReadKey();
    }

    static void SelectDocs(DocumentClient client, string collectionLink)
    {
        // the collection link plus "docs/" addresses the documents in the collection
        var docs = client.CreateDocumentQuery(collectionLink + "docs/").ToList();
        foreach (var doc in docs)
        {
            Console.WriteLine(doc);
        }
    }

Since there is only one document in my collection, below is the result when I executed the code. As you can see, all properties, including the array, were retrieved at the same time. DocumentDB also attached some properties we did not specify, such as "_rid", "_ts" and "_self", which are controlled by the service.

[Screenshot: console output of the query, including the service-generated properties]
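To tell the service-controlled properties apart from our own in code, a simple heuristic is to look for the underscore prefix. The sketch below assumes the document has already been read back as a JSON string, and it also converts "_ts", which holds the last-modified time as seconds since the Unix epoch. The SystemProps class and the sample values are hypothetical. Note that "id" has no underscore, so this heuristic counts it as a user property even when the service generated it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class SystemProps
{
    // Splits top-level property names into user-defined ones and the
    // underscore-prefixed ones added by the service (_rid, _ts, _self, ...).
    public static (List<string> UserDefined, List<string> ServiceAdded) Split(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var names = doc.RootElement.EnumerateObject().Select(p => p.Name).ToList();
        return (names.Where(n => !n.StartsWith('_')).ToList(),
                names.Where(n => n.StartsWith('_')).ToList());
    }

    // _ts is the last-modified time in seconds since the Unix epoch.
    public static DateTimeOffset LastModified(string json)
    {
        using var doc = JsonDocument.Parse(json);
        return DateTimeOffset.FromUnixTimeSeconds(doc.RootElement.GetProperty("_ts").GetInt64());
    }

    public static void Main()
    {
        // Hypothetical document shaped like the query result above; values are made up.
        var json = @"{ ""firstName"": ""Shaun"", ""lastName"": ""Xu"", ""roles"": [""developer""],
                       ""id"": ""sample-id"", ""_rid"": ""sample-rid"", ""_ts"": 1408838400,
                       ""_self"": ""sample-self-link"" }";
        var (userDefined, serviceAdded) = Split(json);
        Console.WriteLine("user-defined:  " + string.Join(", ", userDefined));
        Console.WriteLine("service-added: " + string.Join(", ", serviceAdded));
        Console.WriteLine("last modified: " + LastModified(json).ToString("u"));
    }
}
```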

 

DocumentDB Benefits

DocumentDB is a document NoSQL database service. Unlike a traditional relational database, a document database is truly schema-free. In a nutshell, you can save anything in the same database and collection as long as it can be serialized to JSON.

When you query a document database, all sub-documents are retrieved at the same time, so you don't need the joins across tables that a traditional database would require. A document database is very useful when we build a high-performance system with a hierarchical data structure.

For example, assume we need to build a blog system. There will be many blog posts, each containing its content and comments, and comments can have their own replies as well. If we were using a traditional database, say SQL Server, the database schema might be defined as below.

[Figure: SQL Server schema with Posts and Comments tables]

When we need to display a post, we have to load the post content from the Posts table as well as its comments from the Comments table. We also need to build the comment tree based on the CommentID field.
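The last step, rebuilding the tree, is work the application has to do itself in the relational design. Below is a minimal in-memory sketch. The schema is only available as an image, so the row shape is an assumption: each hypothetical CommentRow carries its own Id plus a ParentId pointing at the CommentID of the comment it replies to (null for top-level comments).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row as it might come back from a query on the Comments table.
public record CommentRow(int Id, int? ParentId, string Content);

public class CommentNode
{
    public CommentRow Row { get; }
    public List<CommentNode> Replies { get; } = new List<CommentNode>();
    public CommentNode(CommentRow row) => Row = row;
}

public static class CommentTree
{
    // Two passes: index every row by Id, then attach each node to its parent,
    // or to the root list when it has no (known) parent.
    public static List<CommentNode> Build(IEnumerable<CommentRow> rows)
    {
        var nodes = rows.ToDictionary(r => r.Id, r => new CommentNode(r));
        var roots = new List<CommentNode>();
        foreach (var node in nodes.Values)
        {
            if (node.Row.ParentId is int parentId && nodes.TryGetValue(parentId, out var parent))
                parent.Replies.Add(node);
            else
                roots.Add(node);
        }
        return roots;
    }

    // Prints the tree with two spaces of indentation per nesting level.
    public static void Print(IEnumerable<CommentNode> nodes, int depth = 0)
    {
        foreach (var n in nodes)
        {
            Console.WriteLine(new string(' ', depth * 2) + n.Row.Content);
            Print(n.Replies, depth + 1);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var rows = new[]
        {
            new CommentRow(1, null, "first comment"),
            new CommentRow(2, null, "second comment"),
            new CommentRow(3, 2, "reply to second"),
            new CommentRow(4, 3, "reply to the reply"),
        };
        CommentTree.Print(CommentTree.Build(rows));
    }
}
```

Indexing by Id first keeps the build linear in the number of comments, but it is still an extra step on top of the two table reads.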

But if we were using DocumentDB, we would save the post as a document containing a list of all its comments, and each comment would hold its sub-comments as a list of its own. To display the post we just query the post document, and the content and all comments are loaded in the proper structure.

    {
        "id": "xxxxx-xxxxx-xxxxx-xxxxx",
        "title": "xxxxx",
        "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
        "postedOn": "08/25/2014 13:55",
        "comments": 
        [
            {
                "id": "xxxxx-xxxxx-xxxxx-xxxxx",
                "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
                "commentedOn": "08/25/2014 14:00",
                "commentedBy": "xxx"
            },
            {
                "id": "xxxxx-xxxxx-xxxxx-xxxxx",
                "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
                "commentedOn": "08/25/2014 14:10",
                "commentedBy": "xxx",
                "comments":
                [
                    {
                        "id": "xxxxx-xxxxx-xxxxx-xxxxx",
                        "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
                        "commentedOn": "08/25/2014 14:18",
                        "commentedBy": "xxx",
                        "comments":
                        [
                            {
                                "id": "xxxxx-xxxxx-xxxxx-xxxxx",
                                "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
                                "commentedOn": "08/25/2014 18:22",
                                "commentedBy": "xxx"
                            }
                        ]
                    },
                    {
                        "id": "xxxxx-xxxxx-xxxxx-xxxxx",
                        "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
                        "commentedOn": "08/25/2014 15:02",
                        "commentedBy": "xxx"
                    }
                ]
            },
            {
                "id": "xxxxx-xxxxx-xxxxx-xxxxx",
                "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
                "commentedOn": "08/25/2014 14:30",
                "commentedBy": "xxx"
            }
        ]
    }
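On the .NET side, a document like this maps naturally onto a recursive class, and a single deserialization gives us the post together with every level of comments. The sketch below uses System.Text.Json for illustration (the DocumentDB SDK itself uses Json.NET); the Post, Comment and PostDocument names are hypothetical, and dates are kept as strings to match the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public class Comment
{
    [JsonPropertyName("id")] public string Id { get; set; } = "";
    [JsonPropertyName("content")] public string Content { get; set; } = "";
    [JsonPropertyName("commentedOn")] public string CommentedOn { get; set; } = "";
    [JsonPropertyName("commentedBy")] public string CommentedBy { get; set; } = "";
    // Replies nest recursively; stays empty when the JSON has no "comments".
    [JsonPropertyName("comments")] public List<Comment> Comments { get; set; } = new();
}

public class Post
{
    [JsonPropertyName("id")] public string Id { get; set; } = "";
    [JsonPropertyName("title")] public string Title { get; set; } = "";
    [JsonPropertyName("content")] public string Content { get; set; } = "";
    [JsonPropertyName("postedOn")] public string PostedOn { get; set; } = "";
    [JsonPropertyName("comments")] public List<Comment> Comments { get; set; } = new();
}

public static class PostDocument
{
    public static Post Parse(string json) => JsonSerializer.Deserialize<Post>(json)!;

    // Counts comments at every depth by walking the already-loaded tree.
    public static int CountComments(IEnumerable<Comment> comments)
    {
        var total = 0;
        foreach (var c in comments)
            total += 1 + CountComments(c.Comments);
        return total;
    }
}

public static class Program
{
    public static void Main()
    {
        var json = @"{
          ""id"": ""p1"", ""title"": ""Hello"", ""content"": ""..."", ""postedOn"": ""08/25/2014 13:55"",
          ""comments"": [
            { ""id"": ""c1"", ""content"": ""nice"", ""commentedOn"": ""08/25/2014 14:00"", ""commentedBy"": ""a"" },
            { ""id"": ""c2"", ""content"": ""question"", ""commentedOn"": ""08/25/2014 14:10"", ""commentedBy"": ""b"",
              ""comments"": [ { ""id"": ""c3"", ""content"": ""answer"", ""commentedOn"": ""08/25/2014 14:18"", ""commentedBy"": ""c"" } ] }
          ]
        }";
        var post = PostDocument.Parse(json);
        Console.WriteLine(PostDocument.CountComments(post.Comments)); // 3
    }
}
```

Because Comment contains a list of Comment, the nesting depth is unlimited, mirroring the way replies nest in the JSON document.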

 

DocumentDB vs. Table Storage

DocumentDB and Table Storage are both NoSQL services in Microsoft Azure. One common question is "when should we use DocumentDB rather than Table Storage?" Here are some ideas from me and some MVPs.

First of all, they are different kinds of NoSQL database. DocumentDB is a document database, while Table Storage is a key-value database.

Second, Table Storage is cheaper. During the preview period DocumentDB can scale out from one to five capacity units, and each capacity unit provides 10 GB of local SSD storage. The price is $0.73 per capacity unit per day, including the 50% preview discount. For Table Storage the highest price is $0.061 per GB per month, which is less than a tenth of DocumentDB's daily price for one capacity unit.

Third, Table Storage provides locally redundant, geo-redundant and read-access geo-redundant replication, while DocumentDB does not support these options.

Fourth, there is a local emulator for Table Storage but none for DocumentDB, so we have to connect to DocumentDB in the cloud even when developing locally.

But DocumentDB supports some cool features that Table Storage doesn't have. It supports stored procedures, triggers and user-defined functions. It supports rich indexing, while Table Storage only indexes the partition key and row key. It supports transactions; Table Storage does too, but only within the scope of an Entity Group Transaction. Finally, note that Table Storage is generally available (GA), while DocumentDB is still in preview.
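The two prices in the pricing comparison above are quoted in different units: per capacity unit per day for DocumentDB, and per GB per month for Table Storage. The sketch below normalizes them under loudly stated assumptions: a 30-day month, a capacity unit whose 10 GB are fully used, and no credit for the throughput a capacity unit also includes. Under those assumptions DocumentDB works out to about $2.19 per GB per month, and Table Storage's $0.061 is under 3% of that. The CostSketch class and its constant names are hypothetical.

```csharp
using System;

public static class CostSketch
{
    // Figures quoted in the post (August 2014 preview pricing, USD).
    public const double DocDbPricePerCuPerDay = 0.73;         // incl. 50% preview discount
    public const double DocDbStoragePerCuGb = 10.0;           // SSD storage per capacity unit
    public const double TableStoragePricePerGbMonth = 0.061;  // highest Table Storage tier

    // Assumption: a 30-day month and a capacity unit whose 10 GB are fully used;
    // the throughput bundled into a capacity unit is ignored.
    public static double DocDbPricePerGbMonth(int daysPerMonth = 30) =>
        DocDbPricePerCuPerDay * daysPerMonth / DocDbStoragePerCuGb;

    public static void Main()
    {
        var docDb = DocDbPricePerGbMonth();
        Console.WriteLine($"DocumentDB:    ~${docDb:F2} per GB per month");
        Console.WriteLine($"Table Storage:  ${TableStoragePricePerGbMonth:F3} per GB per month");
        Console.WriteLine($"Ratio:         ~{TableStoragePricePerGbMonth / docDb:P1}");
    }
}
```

Since a capacity unit buys throughput as well as storage, this only compares the storage side; a workload that needs DocumentDB's query and indexing features is paying for more than gigabytes.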

 

Summary

In this post I gave a quick introduction to and demonstration of the new DocumentDB service in Azure. It is very easy to use from .NET, and it also supports a REST API, a Node.js SDK and a Python SDK.

I then explained the concept and benefits of using a document database, and compared DocumentDB with Table Storage.

 

Hope this helps,

Shaun

All documents and related graphics, codes are provided "AS IS" without warranty of any kind.
Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

© Geeks with Blogs or respective owner
