Distribute Sort Sample Service
- by kaleidoscope
How it works? Using the front-end of the service, a user can specify a size in MB for the input data set to sort. Algorithm CreateAndSplit The CreateAndSplit task generates the input data and stores them as 10 blobs in the utility storage. The URLs to these blobs are packaged as Separate work items and written to the queue. · Separate The Separate task reads the blobs with the random numbers created in the CreateAndSplit task and places the random numbers into buckets. The interval of the numbers that go into one bucket is chosen so that the expected amount of numbers (assuming a uniform distribution of the numbers in the original data set) is around 100 kB. Each bucket is represented as a blob container in utility storage. Whenever there are 10 blobs in one bucket (i.e., the placement in this bucket is complete because we had 10 original splits), the separate task will generate a new Sort task and write the task into the queue. · Sort The Sort task merges all blobs in a single bucket and sorts them using a standard sort algorithm. The result is stored as a blob in utility storage. · Concat The concat task merges the results of all Sort tasks into a single blob. This blob can be downloaded as a text file using this Web page. As the resulting file is presented in text format, the size of the file is likely to be larger than the specified input file. Anish