DocumentDB - Another Azure NoSQL Storage Service
- by Shaun
Originally posted on: http://geekswithblogs.net/shaunxu/archive/2014/08/25/documentdb---another-azure-nosql-storage-service.aspx
Microsoft just released a bunch of new features for Azure on the 22nd, and the one I was most interested in is DocumentDB, a document NoSQL database service in the cloud.
Quick Look at DocumentDB
We can try DocumentDB from the new Azure preview portal. Just click the NEW button and select the item named DocumentDB to create a new account. Specify the name of the DocumentDB account, which will become the endpoint we connect to later. Then select the capacity unit, resource group and subscription. In the resource group section we can select the region where our DocumentDB will be located. As with other Azure services, select the same location as the consumers of the DocumentDB, for example the website, web services, etc.
After several minutes the DocumentDB will be ready. Click the KEYS button to find the URI and primary key, which will be used when connecting.
Now let's open Visual Studio and try the DocumentDB we have just created. Create a new console application and install the DocumentDB .NET client library from NuGet with the keyword "DocumentDB". You need to select "Include Prerelease" in the NuGet Package Manager window since this library has not been released yet.
Next we will create a new database and document collection under our DocumentDB account. The code below creates an instance of DocumentClient with the URI and primary key we just copied from the Azure portal, then creates a database and a collection. It also prints the database and collection link strings, which will be used later to insert and query documents.
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static void Main(string[] args)
{
    // URI and primary key copied from the KEYS blade in the portal
    var endpoint = new Uri("https://shx.documents.azure.com:443/");
    var key = "LU2NoyS2fH0131TGxtBE4DW/CjHQBzAaUx/mbuJ1X77C4FWUG129wWk2oyS2odgkFO2Xdif9/ZddintQicF+lA==";

    var client = new DocumentClient(endpoint, key);
    Run(client).Wait();

    Console.WriteLine("done");
    Console.ReadKey();
}

static async Task Run(DocumentClient client)
{
    // create the database and print its link
    var database = new Database() { Id = "testdb" };
    database = await client.CreateDatabaseAsync(database);
    Console.WriteLine("database link = {0}", database.SelfLink);

    // create the collection under that database and print its link
    var collection = new DocumentCollection() { Id = "testcol" };
    collection = await client.CreateDocumentCollectionAsync(database.SelfLink, collection);
    Console.WriteLine("collection link = {0}", collection.SelfLink);
}
Below is the result from the console window. We need to copy the collection link string for later use.
Now if we go back to the portal, we will find a database listed with the name we specified in the code.
Next we will insert a document into the database and collection we have just created. In the code below we paste the collection link copied in the previous step and create a dynamic object with several properties. As you can see, we can add simple properties such as strings and integers, and we can also add complex properties such as an array, a dictionary or an object reference, as long as they can be serialized to JSON.
// also requires: using System.Dynamic;
static void Main(string[] args)
{
    var endpoint = new Uri("https://shx.documents.azure.com:443/");
    var key = "LU2NoyS2fH0131TGxtBE4DW/CjHQBzAaUx/mbuJ1X77C4FWUG129wWk2oyS2odgkFO2Xdif9/ZddintQicF+lA==";

    var client = new DocumentClient(endpoint, key);

    // collection link pasted from the result in previous demo
    var collectionLink = "dbs/AAk3AA==/colls/AAk3AP6oFgA=/";

    // document we are going to insert to database
    dynamic doc = new ExpandoObject();
    doc.firstName = "Shaun";
    doc.lastName = "Xu";
    doc.roles = new string[] { "developer", "trainer", "presenter", "father" };

    // insert the document
    InsertADoc(client, collectionLink, doc).Wait();

    Console.WriteLine("done");
    Console.ReadKey();
}
The insert code is very simple, as shown below: just provide the collection link and the object we are going to insert.
// also requires: using Newtonsoft.Json;
static async Task InsertADoc(DocumentClient client, string collectionLink, dynamic doc)
{
    var document = await client.CreateDocumentAsync(collectionLink, doc);
    // print the stored document as indented JSON
    // (synchronous SerializeObject; SerializeObjectAsync is obsolete in later Json.NET versions)
    Console.WriteLine(JsonConvert.SerializeObject(document, Formatting.Indented));
}
Below is the result after the object has been inserted.
Finally we will query the document from the database and collection. Similar to the insert code, we just need to specify the collection link and the .NET SDK will retrieve all documents in it.
// also requires: using System.Linq;
static void Main(string[] args)
{
    var endpoint = new Uri("https://shx.documents.azure.com:443/");
    var key = "LU2NoyS2fH0131TGxtBE4DW/CjHQBzAaUx/mbuJ1X77C4FWUG129wWk2oyS2odgkFO2Xdif9/ZddintQicF+lA==";

    var client = new DocumentClient(endpoint, key);

    // collection link pasted from the result in the first demo
    var collectionLink = "dbs/AAk3AA==/colls/AAk3AP6oFgA=/";

    SelectDocs(client, collectionLink);

    Console.WriteLine("done");
    Console.ReadKey();
}

static void SelectDocs(DocumentClient client, string collectionLink)
{
    // query the documents feed of the collection and load every document
    var docs = client.CreateDocumentQuery(collectionLink + "docs/").ToList();
    foreach (var doc in docs)
    {
        Console.WriteLine(doc);
    }
}
Since there's only one document in my collection, below is the result when I executed the code. As you can see, all properties, including the array, were retrieved at the same time. DocumentDB also attached some properties we didn't specify, such as "_rid", "_ts" and "_self", which are controlled by the service.
DocumentDB Benefit
DocumentDB is a document NoSQL database service. Unlike a traditional database, a document database is truly schema-free. In a nutshell, you can save anything in the same database and collection as long as it can be serialized to JSON.
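To make "schema-free" concrete, here is a minimal sketch that serializes two objects with completely different shapes to JSON, which is the only requirement for both of them to live in the same collection. It is only an illustration: the class name SchemaFreeDemo and the sample values are made up, and System.Text.Json stands in for whatever serializer the SDK uses internally, so nothing here talks to DocumentDB.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the DocumentDB SDK.
public static class SchemaFreeDemo
{
    // Serializes two differently shaped objects to JSON strings.
    public static List<string> ToJsonDocuments()
    {
        var items = new object[]
        {
            // a "person" shaped document
            new { id = "1", firstName = "Shaun", roles = new[] { "developer", "trainer" } },
            // a "blog post" shaped document with unrelated properties
            new { id = "2", title = "Hello DocumentDB", postedOn = "08/25/2014 13:55", commentCount = 0 },
        };

        var result = new List<string>();
        foreach (var item in items)
        {
            // Serialize using the runtime type so all properties are included.
            result.Add(JsonSerializer.Serialize(item, item.GetType()));
        }
        return result;
    }
}
```

Both strings are valid JSON documents even though they share no properties other than id, so no table definition or migration is needed before storing the second shape.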
When you query the document database, all sub-documents are retrieved at the same time. This means you don't need to join other tables as you would with a traditional database. A document database is very useful when we build a high-performance system with a hierarchical data structure.
For example, assume we need to build a blog system. There will be many blog posts, and each of them contains the content and comments. Comments can be commented on as well. If we were using a traditional database, let's say SQL Server, the database schema might be defined as below.
When we need to display a post, we need to load the post content from the Posts table as well as the comments from the Comments table. We also need to build the comment tree based on the CommentID field.
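The tree-building step in the relational approach can be sketched roughly as follows. This is an illustration under assumptions, not the schema from the post: the CommentRow shape and its ParentId column (null for a top-level comment) are hypothetical names for the flat rows a Comments table query would return.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape for a Comments table; ParentId is null for a
// top-level comment. These names are illustrative only.
public class CommentRow
{
    public int Id { get; set; }
    public int? ParentId { get; set; }
    public string Content { get; set; }
}

// In-memory tree node built from the flat rows.
public class CommentNode
{
    public CommentRow Row { get; set; }
    public List<CommentNode> Replies { get; } = new List<CommentNode>();
}

public static class CommentTree
{
    // Turns flat rows into a tree and returns the top-level comments,
    // each carrying its nested replies. Row order is preserved.
    public static List<CommentNode> Build(IEnumerable<CommentRow> rows)
    {
        var list = rows.ToList();

        // Index every row by id so that a parent can be found directly.
        var nodes = list.ToDictionary(r => r.Id, r => new CommentNode { Row = r });
        var roots = new List<CommentNode>();

        foreach (var row in list)
        {
            var node = nodes[row.Id];
            if (row.ParentId is int parentId && nodes.TryGetValue(parentId, out var parent))
            {
                parent.Replies.Add(node);   // reply to another comment
            }
            else
            {
                roots.Add(node);            // top-level comment, or parent not loaded
            }
        }
        return roots;
    }
}
```

This is the extra work a page has to do on every post view on top of the two queries; with a document database the same structure is already stored in nested form.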
But if we were using DocumentDB, all we need to do is save the post as a document with a list that contains all of its comments. Under each comment, all sub-comments are stored as a list inside it. When we display this post we just need to query the post document, and the content and all comments will be loaded in the proper structure.
{
  "id": "xxxxx-xxxxx-xxxxx-xxxxx",
  "title": "xxxxx",
  "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
  "postedOn": "08/25/2014 13:55",
  "comments":
  [
    {
      "id": "xxxxx-xxxxx-xxxxx-xxxxx",
      "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
      "commentedOn": "08/25/2014 14:00",
      "commentedBy": "xxx"
    },
    {
      "id": "xxxxx-xxxxx-xxxxx-xxxxx",
      "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
      "commentedOn": "08/25/2014 14:10",
      "commentedBy": "xxx",
      "comments":
      [
        {
          "id": "xxxxx-xxxxx-xxxxx-xxxxx",
          "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
          "commentedOn": "08/25/2014 14:18",
          "commentedBy": "xxx",
          "comments":
          [
            {
              "id": "xxxxx-xxxxx-xxxxx-xxxxx",
              "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
              "commentedOn": "08/25/2014 18:22",
              "commentedBy": "xxx"
            }
          ]
        },
        {
          "id": "xxxxx-xxxxx-xxxxx-xxxxx",
          "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
          "commentedOn": "08/25/2014 15:02",
          "commentedBy": "xxx"
        }
      ]
    },
    {
      "id": "xxxxx-xxxxx-xxxxx-xxxxx",
      "content": "xxxxx, xxxxxxxxx. xxxxxx, xx, xxxx.",
      "commentedOn": "08/25/2014 14:30",
      "commentedBy": "xxx"
    }
  ]
}
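Once a post document like the one above has been loaded, walking its comments is a plain recursion over the nested "comments" arrays, with no joins or tree building. The sketch below counts all comments at any depth. It is an illustration only: the PostReader class is a made-up name, and System.Text.Json is used here just to parse the JSON text, independent of the DocumentDB SDK.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, not part of the DocumentDB SDK.
public static class PostReader
{
    // Counts the comments under a post or comment element, following
    // the nested "comments" arrays to any depth.
    public static int CountComments(JsonElement element)
    {
        var total = 0;
        if (element.TryGetProperty("comments", out var comments)
            && comments.ValueKind == JsonValueKind.Array)
        {
            foreach (var comment in comments.EnumerateArray())
            {
                // one for the comment itself plus everything nested below it
                total += 1 + CountComments(comment);
            }
        }
        return total;
    }

    // Convenience overload that parses the post JSON text first.
    public static int CountComments(string postJson)
    {
        using (var doc = JsonDocument.Parse(postJson))
        {
            return CountComments(doc.RootElement);
        }
    }
}
```

Rendering the comment thread would follow the same recursive shape, emitting markup for each comment instead of adding to a counter.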
DocumentDB vs. Table Storage
DocumentDB and Table Storage are both NoSQL services in Microsoft Azure. One common question is "when should we use DocumentDB rather than Table Storage?" Here are some ideas from me and some MVPs.
First of all, they are different kinds of NoSQL database. DocumentDB is a document database while table storage is a key-value database.
Second, table storage is cheaper. During the preview period DocumentDB can scale out from one capacity unit to five, and each capacity unit provides 10GB of local SSD storage. The price is $0.73/day, which includes a 50% discount. For the storage service the highest price is $0.061/GB, which is almost 10% of the DocumentDB price.
Third, table storage provides local replication, geo-replication and read-access geo-replication, none of which DocumentDB supports.
Fourth, there is a local emulator for table storage but none for DocumentDB. We have to connect to DocumentDB in the cloud when developing locally.
On the other hand, DocumentDB supports some cool features that table storage doesn't have. It supports stored procedures, triggers and user-defined functions. It supports rich indexing, while table storage only supports indexing on the partition key and row key. It supports transactions; table storage does as well, but only within the scope of an Entity Group Transaction. Finally, keep in mind that table storage is GA while DocumentDB is still in preview.
Summary
In this post I gave a quick demonstration of and introduction to the new DocumentDB service in Azure. It's very easy to work with from .NET, and it also supports a REST API, a Node.js SDK and a Python SDK.
Then I explained the concept and benefits of a document database, and compared DocumentDB with table storage.
Hope this helps,
Shaun
All documents and related graphics, codes are provided "AS IS" without warranty of any kind.
Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.