Search Results

Search found 11409 results on 457 pages for 'large teams'.

Page 70/457 | < Previous Page | 66 67 68 69 70 71 72 73 74 75 76 77  | Next Page >

  • Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5

    - by Shaun
    Originally posted on: http://geekswithblogs.net/shaunxu/archive/2013/07/01/upload-file-to-windows-azure-blob-in-chunks-through-asp.net.aspxMany people are using Windows Azure Blob Storage to store their data in the cloud. Blob storage provides 99.9% availability with easy-to-use API through .NET SDK and HTTP REST. For example, we can store JavaScript files, images, documents in blob storage when we are building an ASP.NET web application on a Web Role in Windows Azure. Or we can store our VHD files in blob and mount it as a hard drive in our cloud service. If you are familiar with Windows Azure, you should know that there are two kinds of blob: page blob and block blob. The page blob is optimized for random read and write, which is very useful when you need to store VHD files. The block blob is optimized for sequential/chunk read and write, which has more common usage. Since we can upload block blob in blocks through BlockBlob.PutBlock, and them commit them as a whole blob with invoking the BlockBlob.PutBlockList, it is very powerful to upload large files, as we can upload blocks in parallel, and provide pause-resume feature. There are many documents, articles and blog posts described on how to upload a block blob. Most of them are focus on the server side, which means when you had received a big file, stream or binaries, how to upload them into blob storage in blocks through .NET SDK.  But the problem is, how can we upload these large files from client side, for example, a browser. This questioned to me when I was working with a Chinese customer to help them build a network disk production on top of azure. The end users upload their files from the web portal, and then the files will be stored in blob storage from the Web Role. My goal is to find the best way to transform the file from client (end user’s machine) to the server (Web Role) through browser. In this post I will demonstrate and describe what I had done, to upload large file in chunks with high speed, and save them as blocks into Windows Azure Blob Storage.   Traditional Upload, Works with Limitation The simplest way to implement this requirement is to create a web page with a form that contains a file input element and a submit button. 1: @using (Html.BeginForm("About", "Index", FormMethod.Post, new { enctype = "multipart/form-data" })) 2: { 3: <input type="file" name="file" /> 4: <input type="submit" value="upload" /> 5: } And then in the backend controller, we retrieve the whole content of this file and upload it in to the blob storage through .NET SDK. We can split the file in blocks and upload them in parallel and commit. The code had been well blogged in the community. 1: [HttpPost] 2: public ActionResult About(HttpPostedFileBase file) 3: { 4: var container = _client.GetContainerReference("test"); 5: container.CreateIfNotExists(); 6: var blob = container.GetBlockBlobReference(file.FileName); 7: var blockDataList = new Dictionary<string, byte[]>(); 8: using (var stream = file.InputStream) 9: { 10: var blockSizeInKB = 1024; 11: var offset = 0; 12: var index = 0; 13: while (offset < stream.Length) 14: { 15: var readLength = Math.Min(1024 * blockSizeInKB, (int)stream.Length - offset); 16: var blockData = new byte[readLength]; 17: offset += stream.Read(blockData, 0, readLength); 18: blockDataList.Add(Convert.ToBase64String(BitConverter.GetBytes(index)), blockData); 19:  20: index++; 21: } 22: } 23:  24: Parallel.ForEach(blockDataList, (bi) => 25: { 26: blob.PutBlock(bi.Key, new MemoryStream(bi.Value), null); 27: }); 28: blob.PutBlockList(blockDataList.Select(b => b.Key).ToArray()); 29:  30: return RedirectToAction("About"); 31: } This works perfect if we selected an image, a music or a small video to upload. But if I selected a large file, let’s say a 6GB HD-movie, after upload for about few minutes the page will be shown as below and the upload will be terminated. In ASP.NET there is a limitation of request length and the maximized request length is defined in the web.config file. It’s a number which less than about 4GB. So if we want to upload a really big file, we cannot simply implement in this way. Also, in Windows Azure, a cloud service network load balancer will terminate the connection if exceed the timeout period. From my test the timeout looks like 2 - 3 minutes. Hence, when we need to upload a large file we cannot just use the basic HTML elements. Besides the limitation mentioned above, the simple HTML file upload cannot provide rich upload experience such as chunk upload, pause and pause-resume. So we need to find a better way to upload large file from the client to the server.   Upload in Chunks through HTML5 and JavaScript In order to break those limitation mentioned above we will try to upload the large file in chunks. This takes some benefit to us such as - No request size limitation: Since we upload in chunks, we can define the request size for each chunks regardless how big the entire file is. - No timeout problem: The size of chunks are controlled by us, which means we should be able to make sure request for each chunk upload will not exceed the timeout period of both ASP.NET and Windows Azure load balancer. It was a big challenge to upload big file in chunks until we have HTML5. There are some new features and improvements introduced in HTML5 and we will use them to implement our solution.   In HTML5, the File interface had been improved with a new method called “slice”. It can be used to read part of the file by specifying the start byte index and the end byte index. For example if the entire file was 1024 bytes, file.slice(512, 768) will read the part of this file from the 512nd byte to 768th byte, and return a new object of interface called "Blob”, which you can treat as an array of bytes. In fact,  a Blob object represents a file-like object of immutable, raw data. The File interface is based on Blob, inheriting blob functionality and expanding it to support files on the user's system. For more information about the Blob please refer here. File and Blob is very useful to implement the chunk upload. We will use File interface to represent the file the user selected from the browser and then use File.slice to read the file in chunks in the size we wanted. For example, if we wanted to upload a 10MB file with 512KB chunks, then we can read it in 512KB blobs by using File.slice in a loop.   Assuming we have a web page as below. User can select a file, an input box to specify the block size in KB and a button to start upload. 1: <div> 2: <input type="file" id="upload_files" name="files[]" /><br /> 3: Block Size: <input type="number" id="block_size" value="512" name="block_size" />KB<br /> 4: <input type="button" id="upload_button_blob" name="upload" value="upload (blob)" /> 5: </div> Then we can have the JavaScript function to upload the file in chunks when user clicked the button. 1: <script type="text/javascript"> 1: 2: $(function () { 3: $("#upload_button_blob").click(function () { 4: }); 5: });</script> Firstly we need to ensure the client browser supports the interfaces we are going to use. Just try to invoke the File, Blob and FormData from the “window” object. If any of them is “undefined” the condition result will be “false” which means your browser doesn’t support these premium feature and it’s time for you to get your browser updated. FormData is another new feature we are going to use in the future. It could generate a temporary form for us. We will use this interface to create a form with chunk and associated metadata when invoked the service through ajax. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: if (window.File && window.Blob && window.FormData) { 4: alert("Your brwoser is awesome, let's rock!"); 5: } 6: else { 7: alert("Oh man plz update to a modern browser before try is cool stuff out."); 8: return; 9: } 10: }); Each browser supports these interfaces by their own implementation and currently the Blob, File and File.slice are supported by Chrome 21, FireFox 13, IE 10, Opera 12 and Safari 5.1 or higher. After that we worked on the files the user selected one by one since in HTML5, user can select multiple files in one file input box. 1: var files = $("#upload_files")[0].files; 2: for (var i = 0; i < files.length; i++) { 3: var file = files[i]; 4: var fileSize = file.size; 5: var fileName = file.name; 6: } Next, we calculated the start index and end index for each chunks based on the size the user specified from the browser. We put them into an array with the file name and the index, which will be used when we upload chunks into Windows Azure Blob Storage as blocks since we need to specify the target blob name and the block index. At the same time we will store the list of all indexes into another variant which will be used to commit blocks into blob in Azure Storage once all chunks had been uploaded successfully. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10:  11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: var blockSizeInKB = $("#block_size").val(); 14: var blockSize = blockSizeInKB * 1024; 15: var blocks = []; 16: var offset = 0; 17: var index = 0; 18: var list = ""; 19: while (offset < fileSize) { 20: var start = offset; 21: var end = Math.min(offset + blockSize, fileSize); 22:  23: blocks.push({ 24: name: fileName, 25: index: index, 26: start: start, 27: end: end 28: }); 29: list += index + ","; 30:  31: offset = end; 32: index++; 33: } 34: } 35: }); Now we have all chunks’ information ready. The next step should be upload them one by one to the server side, and at the server side when received a chunk it will upload as a block into Blob Storage, and finally commit them with the index list through BlockBlobClient.PutBlockList. But since all these invokes are ajax calling, which means not synchronized call. So we need to introduce a new JavaScript library to help us coordinate the asynchronize operation, which named “async.js”. You can download this JavaScript library here, and you can find the document here. I will not explain this library too much in this post. We will put all procedures we want to execute as a function array, and pass into the proper function defined in async.js to let it help us to control the execution sequence, in series or in parallel. Hence we will define an array and put the function for chunk upload into this array. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4:  5: // start to upload each files in chunks 6: var files = $("#upload_files")[0].files; 7: for (var i = 0; i < files.length; i++) { 8: var file = files[i]; 9: var fileSize = file.size; 10: var fileName = file.name; 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: ... ... 14:  15: // define the function array and push all chunk upload operation into this array 16: blocks.forEach(function (block) { 17: putBlocks.push(function (callback) { 18: }); 19: }); 20: } 21: }); 22: }); As you can see, I used File.slice method to read each chunks based on the start and end byte index we calculated previously, and constructed a temporary HTML form with the file name, chunk index and chunk data through another new feature in HTML5 named FormData. Then post this form to the backend server through jQuery.ajax. This is the key part of our solution. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: blocks.forEach(function (block) { 15: putBlocks.push(function (callback) { 16: // load blob based on the start and end index for each chunks 17: var blob = file.slice(block.start, block.end); 18: // put the file name, index and blob into a temporary from 19: var fd = new FormData(); 20: fd.append("name", block.name); 21: fd.append("index", block.index); 22: fd.append("file", blob); 23: // post the form to backend service (asp.net mvc controller action) 24: $.ajax({ 25: url: "/Home/UploadInFormData", 26: data: fd, 27: processData: false, 28: contentType: "multipart/form-data", 29: type: "POST", 30: success: function (result) { 31: if (!result.success) { 32: alert(result.error); 33: } 34: callback(null, block.index); 35: } 36: }); 37: }); 38: }); 39: } 40: }); Then we will invoke these functions one by one by using the async.js. And once all functions had been executed successfully I invoked another ajax call to the backend service to commit all these chunks (blocks) as the blob in Windows Azure Storage. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.series(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); That’s all in the client side. The outline of our logic would be - Calculate the start and end byte index for each chunks based on the block size. - Defined the functions of reading the chunk form file and upload the content to the backend service through ajax. - Execute the functions defined in previous step with “async.js”. - Commit the chunks by invoking the backend service in Windows Azure Storage finally.   Save Chunks as Blocks into Blob Storage In above we finished the client size JavaScript code. It uploaded the file in chunks to the backend service which we are going to implement in this step. We will use ASP.NET MVC as our backend service, and it will receive the chunks, upload into Windows Azure Bob Storage in blocks, then finally commit as one blob. As in the client side we uploaded chunks by invoking the ajax call to the URL "/Home/UploadInFormData", I created a new action under the Index controller and it only accepts HTTP POST request. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: } 8: catch (Exception e) 9: { 10: error = e.ToString(); 11: } 12:  13: return new JsonResult() 14: { 15: Data = new 16: { 17: success = string.IsNullOrWhiteSpace(error), 18: error = error 19: } 20: }; 21: } Then I retrieved the file name, index and the chunk content from the Request.Form object, which was passed from our client side. And then, used the Windows Azure SDK to create a blob container (in this case we will use the container named “test”.) and create a blob reference with the blob name (same as the file name). Then uploaded the chunk as a block of this blob with the index, since in Blob Storage each block must have an index (ID) associated with so that finally we can put all blocks as one blob by specifying their block ID list. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var index = int.Parse(Request.Form["index"]); 9: var file = Request.Files[0]; 10: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 11:  12: var container = _client.GetContainerReference("test"); 13: container.CreateIfNotExists(); 14: var blob = container.GetBlockBlobReference(name); 15: blob.PutBlock(id, file.InputStream, null); 16: } 17: catch (Exception e) 18: { 19: error = e.ToString(); 20: } 21:  22: return new JsonResult() 23: { 24: Data = new 25: { 26: success = string.IsNullOrWhiteSpace(error), 27: error = error 28: } 29: }; 30: } Next, I created another action to commit the blocks into blob once all chunks had been uploaded. Similarly, I retrieved the blob name from the Request.Form. I also retrieved the chunks ID list, which is the block ID list from the Request.Form in a string format, split them as a list, then invoked the BlockBlob.PutBlockList method. After that our blob will be shown in the container and ready to be download. 1: [HttpPost] 2: public JsonResult Commit() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var list = Request.Form["list"]; 9: var ids = list 10: .Split(',') 11: .Where(id => !string.IsNullOrWhiteSpace(id)) 12: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 13: .ToArray(); 14:  15: var container = _client.GetContainerReference("test"); 16: container.CreateIfNotExists(); 17: var blob = container.GetBlockBlobReference(name); 18: blob.PutBlockList(ids); 19: } 20: catch (Exception e) 21: { 22: error = e.ToString(); 23: } 24:  25: return new JsonResult() 26: { 27: Data = new 28: { 29: success = string.IsNullOrWhiteSpace(error), 30: error = error 31: } 32: }; 33: } Now we finished all code we need. The whole process of uploading would be like this below. Below is the full client side JavaScript code. 1: <script type="text/javascript" src="~/Scripts/async.js"></script> 2: <script type="text/javascript"> 3: $(function () { 4: $("#upload_button_blob").click(function () { 5: // assert the browser support html5 6: if (window.File && window.Blob && window.FormData) { 7: alert("Your brwoser is awesome, let's rock!"); 8: } 9: else { 10: alert("Oh man plz update to a modern browser before try is cool stuff out."); 11: return; 12: } 13:  14: // start to upload each files in chunks 15: var files = $("#upload_files")[0].files; 16: for (var i = 0; i < files.length; i++) { 17: var file = files[i]; 18: var fileSize = file.size; 19: var fileName = file.name; 20:  21: // calculate the start and end byte index for each blocks(chunks) 22: // with the index, file name and index list for future using 23: var blockSizeInKB = $("#block_size").val(); 24: var blockSize = blockSizeInKB * 1024; 25: var blocks = []; 26: var offset = 0; 27: var index = 0; 28: var list = ""; 29: while (offset < fileSize) { 30: var start = offset; 31: var end = Math.min(offset + blockSize, fileSize); 32:  33: blocks.push({ 34: name: fileName, 35: index: index, 36: start: start, 37: end: end 38: }); 39: list += index + ","; 40:  41: offset = end; 42: index++; 43: } 44:  45: // define the function array and push all chunk upload operation into this array 46: var putBlocks = []; 47: blocks.forEach(function (block) { 48: putBlocks.push(function (callback) { 49: // load blob based on the start and end index for each chunks 50: var blob = file.slice(block.start, block.end); 51: // put the file name, index and blob into a temporary from 52: var fd = new FormData(); 53: fd.append("name", block.name); 54: fd.append("index", block.index); 55: fd.append("file", blob); 56: // post the form to backend service (asp.net mvc controller action) 57: $.ajax({ 58: url: "/Home/UploadInFormData", 59: data: fd, 60: processData: false, 61: contentType: "multipart/form-data", 62: type: "POST", 63: success: function (result) { 64: if (!result.success) { 65: alert(result.error); 66: } 67: callback(null, block.index); 68: } 69: }); 70: }); 71: }); 72:  73: // invoke the functions one by one 74: // then invoke the commit ajax call to put blocks into blob in azure storage 75: async.series(putBlocks, function (error, result) { 76: var data = { 77: name: fileName, 78: list: list 79: }; 80: $.post("/Home/Commit", data, function (result) { 81: if (!result.success) { 82: alert(result.error); 83: } 84: else { 85: alert("done!"); 86: } 87: }); 88: }); 89: } 90: }); 91: }); 92: </script> And below is the full ASP.NET MVC controller code. 1: public class HomeController : Controller 2: { 3: private CloudStorageAccount _account; 4: private CloudBlobClient _client; 5:  6: public HomeController() 7: : base() 8: { 9: _account = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("DataConnectionString")); 10: _client = _account.CreateCloudBlobClient(); 11: } 12:  13: public ActionResult Index() 14: { 15: ViewBag.Message = "Modify this template to jump-start your ASP.NET MVC application."; 16:  17: return View(); 18: } 19:  20: [HttpPost] 21: public JsonResult UploadInFormData() 22: { 23: var error = string.Empty; 24: try 25: { 26: var name = Request.Form["name"]; 27: var index = int.Parse(Request.Form["index"]); 28: var file = Request.Files[0]; 29: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 30:  31: var container = _client.GetContainerReference("test"); 32: container.CreateIfNotExists(); 33: var blob = container.GetBlockBlobReference(name); 34: blob.PutBlock(id, file.InputStream, null); 35: } 36: catch (Exception e) 37: { 38: error = e.ToString(); 39: } 40:  41: return new JsonResult() 42: { 43: Data = new 44: { 45: success = string.IsNullOrWhiteSpace(error), 46: error = error 47: } 48: }; 49: } 50:  51: [HttpPost] 52: public JsonResult Commit() 53: { 54: var error = string.Empty; 55: try 56: { 57: var name = Request.Form["name"]; 58: var list = Request.Form["list"]; 59: var ids = list 60: .Split(',') 61: .Where(id => !string.IsNullOrWhiteSpace(id)) 62: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 63: .ToArray(); 64:  65: var container = _client.GetContainerReference("test"); 66: container.CreateIfNotExists(); 67: var blob = container.GetBlockBlobReference(name); 68: blob.PutBlockList(ids); 69: } 70: catch (Exception e) 71: { 72: error = e.ToString(); 73: } 74:  75: return new JsonResult() 76: { 77: Data = new 78: { 79: success = string.IsNullOrWhiteSpace(error), 80: error = error 81: } 82: }; 83: } 84: } And if we selected a file from the browser we will see our application will upload chunks in the size we specified to the server through ajax call in background, and then commit all chunks in one blob. Then we can find the blob in our Windows Azure Blob Storage.   Optimized by Parallel Upload In previous example we just uploaded our file in chunks. This solved the problem that ASP.NET MVC request content size limitation as well as the Windows Azure load balancer timeout. But it might introduce the performance problem since we uploaded chunks in sequence. In order to improve the upload performance we could modify our client side code a bit to make the upload operation invoked in parallel. The good news is that, “async.js” library provides the parallel execution function. If you remembered the code we invoke the service to upload chunks, it utilized “async.series” which means all functions will be executed in sequence. Now we will change this code to “async.parallel”. This will invoke all functions in parallel. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallel(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); In this way all chunks will be uploaded to the server side at the same time to maximize the bandwidth usage. This should work if the file was not very large and the chunk size was not very small. But for large file this might introduce another problem that too many ajax calls are sent to the server at the same time. So the best solution should be, upload the chunks in parallel with maximum concurrency limitation. The code below specified the concurrency limitation to 4, which means at the most only 4 ajax calls could be invoked at the same time. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallelLimit(putBlocks, 4, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: });   Summary In this post we discussed how to upload files in chunks to the backend service and then upload them into Windows Azure Blob Storage in blocks. We focused on the frontend side and leverage three new feature introduced in HTML 5 which are - File.slice: Read part of the file by specifying the start and end byte index. - Blob: File-like interface which contains the part of the file content. - FormData: Temporary form element that we can pass the chunk alone with some metadata to the backend service. Then we discussed the performance consideration of chunk uploading. Sequence upload cannot provide maximized upload speed, but the unlimited parallel upload might crash the browser and server if too many chunks. So we finally came up with the solution to upload chunks in parallel with the concurrency limitation. We also demonstrated how to utilize “async.js” JavaScript library to help us control the asynchronize call and the parallel limitation.   Regarding the chunk size and the parallel limitation value there is no “best” value. You need to test vary composition and find out the best one for your particular scenario. It depends on the local bandwidth, client machine cores and the server side (Windows Azure Cloud Service Virtual Machine) cores, memory and bandwidth. Below is one of my performance test result. The client machine was Windows 8 IE 10 with 4 cores. I was using Microsoft Cooperation Network. The web site was hosted on Windows Azure China North data center (in Beijing) with one small web role (1.7GB 1 core CPU, 1.75GB memory with 100Mbps bandwidth). The test cases were - Chunk size: 512KB, 1MB, 2MB, 4MB. - Upload Mode: Sequence, parallel (unlimited), parallel with limit (4 threads, 8 threads). - Chunk Format: base64 string, binaries. - Target file: 100MB. - Each case was tested 3 times. Below is the test result chart. Some thoughts, but not guidance or best practice: - Parallel gets better performance than series. - No significant performance improvement between parallel 4 threads and 8 threads. - Transform with binaries provides better performance than base64. - In all cases, chunk size in 1MB - 2MB gets better performance.   Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

    Read the article

  • Advanced TSQL Tuning: Why Internals Knowledge Matters

    - by Paul White
    There is much more to query tuning than reducing logical reads and adding covering nonclustered indexes.  Query tuning is not complete as soon as the query returns results quickly in the development or test environments.  In production, your query will compete for memory, CPU, locks, I/O and other resources on the server.  Today’s entry looks at some tuning considerations that are often overlooked, and shows how deep internals knowledge can help you write better TSQL. As always, we’ll need some example data.  In fact, we are going to use three tables today, each of which is structured like this: Each table has 50,000 rows made up of an INTEGER id column and a padding column containing 3,999 characters in every row.  The only difference between the three tables is in the type of the padding column: the first table uses CHAR(3999), the second uses VARCHAR(MAX), and the third uses the deprecated TEXT type.  A script to create a database with the three tables and load the sample data follows: USE master; GO IF DB_ID('SortTest') IS NOT NULL DROP DATABASE SortTest; GO CREATE DATABASE SortTest COLLATE LATIN1_GENERAL_BIN; GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest', SIZE = 3GB, MAXSIZE = 3GB ); GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest_log', SIZE = 256MB, MAXSIZE = 1GB, FILEGROWTH = 128MB ); GO ALTER DATABASE SortTest SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE SortTest SET AUTO_CLOSE OFF ; ALTER DATABASE SortTest SET AUTO_CREATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_SHRINK OFF ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS_ASYNC ON ; ALTER DATABASE SortTest SET PARAMETERIZATION SIMPLE ; ALTER DATABASE SortTest SET READ_COMMITTED_SNAPSHOT OFF ; ALTER DATABASE SortTest SET MULTI_USER ; ALTER DATABASE SortTest SET RECOVERY SIMPLE ; USE SortTest; GO CREATE TABLE dbo.TestCHAR ( id INTEGER IDENTITY (1,1) NOT NULL, padding CHAR(3999) NOT NULL,   CONSTRAINT [PK dbo.TestCHAR (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestMAX ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAX (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestTEXT ( id INTEGER IDENTITY (1,1) NOT NULL, padding TEXT NOT NULL,   CONSTRAINT [PK dbo.TestTEXT (id)] PRIMARY KEY CLUSTERED (id), ) ; -- ============= -- Load TestCHAR (about 3s) -- ============= INSERT INTO dbo.TestCHAR WITH (TABLOCKX) ( padding ) SELECT padding = REPLICATE(CHAR(65 + (Data.n % 26)), 3999) FROM ( SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1 FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) AS Data ORDER BY Data.n ASC ; -- ============ -- Load TestMAX (about 3s) -- ============ INSERT INTO dbo.TestMAX WITH (TABLOCKX) ( padding ) SELECT CONVERT(VARCHAR(MAX), padding) FROM dbo.TestCHAR ORDER BY id ; -- ============= -- Load TestTEXT (about 5s) -- ============= INSERT INTO dbo.TestTEXT WITH (TABLOCKX) ( padding ) SELECT CONVERT(TEXT, padding) FROM dbo.TestCHAR ORDER BY id ; -- ========== -- Space used -- ========== -- EXECUTE sys.sp_spaceused @objname = 'dbo.TestCHAR'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAX'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestTEXT'; ; CHECKPOINT ; That takes around 15 seconds to run, and shows the space allocated to each table in its output: To illustrate the points I want to make today, the example task we are going to set ourselves is to return a random set of 150 rows from each table.  The basic shape of the test query is the same for each of the three test tables: SELECT TOP (150) T.id, T.padding FROM dbo.Test AS T ORDER BY NEWID() OPTION (MAXDOP 1) ; Test 1 – CHAR(3999) Running the template query shown above using the TestCHAR table as the target, we find that the query takes around 5 seconds to return its results.  This seems slow, considering that the table only has 50,000 rows.  Working on the assumption that generating a GUID for each row is a CPU-intensive operation, we might try enabling parallelism to see if that speeds up the response time.  Running the query again (but without the MAXDOP 1 hint) on a machine with eight logical processors, the query now takes 10 seconds to execute – twice as long as when run serially. Rather than attempting further guesses at the cause of the slowness, let’s go back to serial execution and add some monitoring.  The script below monitors STATISTICS IO output and the amount of tempdb used by the test query.  We will also run a Profiler trace to capture any warnings generated during query execution. DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TC.id, TC.padding FROM dbo.TestCHAR AS TC ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; Let’s take a closer look at the statistics and query plan generated from this: Following the flow of the data from right to left, we see the expected 50,000 rows emerging from the Clustered Index Scan, with a total estimated size of around 191MB.  The Compute Scalar adds a column containing a random GUID (generated from the NEWID() function call) for each row.  With this extra column in place, the size of the data arriving at the Sort operator is estimated to be 192MB. Sort is a blocking operator – it has to examine all of the rows on its input before it can produce its first row of output (the last row received might sort first).  This characteristic means that Sort requires a memory grant – memory allocated for the query’s use by SQL Server just before execution starts.  In this case, the Sort is the only memory-consuming operator in the plan, so it has access to the full 243MB (248,696KB) of memory reserved by SQL Server for this query execution. Notice that the memory grant is significantly larger than the expected size of the data to be sorted.  SQL Server uses a number of techniques to speed up sorting, some of which sacrifice size for comparison speed.  Sorts typically require a very large number of comparisons, so this is usually a very effective optimization.  One of the drawbacks is that it is not possible to exactly predict the sort space needed, as it depends on the data itself.  SQL Server takes an educated guess based on data types, sizes, and the number of rows expected, but the algorithm is not perfect. In spite of the large memory grant, the Profiler trace shows a Sort Warning event (indicating that the sort ran out of memory), and the tempdb usage monitor shows that 195MB of tempdb space was used – all of that for system use.  The 195MB represents physical write activity on tempdb, because SQL Server strictly enforces memory grants – a query cannot ‘cheat’ and effectively gain extra memory by spilling to tempdb pages that reside in memory.  Anyway, the key point here is that it takes a while to write 195MB to disk, and this is the main reason that the query takes 5 seconds overall. If you are wondering why using parallelism made the problem worse, consider that eight threads of execution result in eight concurrent partial sorts, each receiving one eighth of the memory grant.  The eight sorts all spilled to tempdb, resulting in inefficiencies as the spilled sorts competed for disk resources.  More importantly, there are specific problems at the point where the eight partial results are combined, but I’ll cover that in a future post. CHAR(3999) Performance Summary: 5 seconds elapsed time 243MB memory grant 195MB tempdb usage 192MB estimated sort set 25,043 logical reads Sort Warning Test 2 – VARCHAR(MAX) We’ll now run exactly the same test (with the additional monitoring) on the table using a VARCHAR(MAX) padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TM.id, TM.padding FROM dbo.TestMAX AS TM ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query takes around 8 seconds to complete (3 seconds longer than Test 1).  Notice that the estimated row and data sizes are very slightly larger, and the overall memory grant has also increased very slightly to 245MB.  The most marked difference is in the amount of tempdb space used – this query wrote almost 391MB of sort run data to the physical tempdb file.  Don’t draw any general conclusions about VARCHAR(MAX) versus CHAR from this – I chose the length of the data specifically to expose this edge case.  In most cases, VARCHAR(MAX) performs very similarly to CHAR – I just wanted to make test 2 a bit more exciting. MAX Performance Summary: 8 seconds elapsed time 245MB memory grant 391MB tempdb usage 193MB estimated sort set 25,043 logical reads Sort warning Test 3 – TEXT The same test again, but using the deprecated TEXT data type for the padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TT.id, TT.padding FROM dbo.TestTEXT AS TT ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query runs in 500ms.  If you look at the metrics we have been checking so far, it’s not hard to understand why: TEXT Performance Summary: 0.5 seconds elapsed time 9MB memory grant 5MB tempdb usage 5MB estimated sort set 207 logical reads 596 LOB logical reads Sort warning SQL Server’s memory grant algorithm still underestimates the memory needed to perform the sorting operation, but the size of the data to sort is so much smaller (5MB versus 193MB previously) that the spilled sort doesn’t matter very much.  Why is the data size so much smaller?  The query still produces the correct results – including the large amount of data held in the padding column – so what magic is being performed here? TEXT versus MAX Storage The answer lies in how columns of the TEXT data type are stored.  By default, TEXT data is stored off-row in separate LOB pages – which explains why this is the first query we have seen that records LOB logical reads in its STATISTICS IO output.  You may recall from my last post that LOB data leaves an in-row pointer to the separate storage structure holding the LOB data. SQL Server can see that the full LOB value is not required by the query plan until results are returned, so instead of passing the full LOB value down the plan from the Clustered Index Scan, it passes the small in-row structure instead.  SQL Server estimates that each row coming from the scan will be 79 bytes long – 11 bytes for row overhead, 4 bytes for the integer id column, and 64 bytes for the LOB pointer (in fact the pointer is rather smaller – usually 16 bytes – but the details of that don’t really matter right now). OK, so this query is much more efficient because it is sorting a very much smaller data set – SQL Server delays retrieving the LOB data itself until after the Sort starts producing its 150 rows.  The question that normally arises at this point is: Why doesn’t SQL Server use the same trick when the padding column is defined as VARCHAR(MAX)? The answer is connected with the fact that if the actual size of the VARCHAR(MAX) data is 8000 bytes or less, it is usually stored in-row in exactly the same way as for a VARCHAR(8000) column – MAX data only moves off-row into LOB storage when it exceeds 8000 bytes.  The default behaviour of the TEXT type is to be stored off-row by default, unless the ‘text in row’ table option is set suitably and there is room on the page.  There is an analogous (but opposite) setting to control the storage of MAX data – the ‘large value types out of row’ table option.  By enabling this option for a table, MAX data will be stored off-row (in a LOB structure) instead of in-row.  SQL Server Books Online has good coverage of both options in the topic In Row Data. The MAXOOR Table The essential difference, then, is that MAX defaults to in-row storage, and TEXT defaults to off-row (LOB) storage.  You might be thinking that we could get the same benefits seen for the TEXT data type by storing the VARCHAR(MAX) values off row – so let’s look at that option now.  This script creates a fourth table, with the VARCHAR(MAX) data stored off-row in LOB pages: CREATE TABLE dbo.TestMAXOOR ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAXOOR (id)] PRIMARY KEY CLUSTERED (id), ) ; EXECUTE sys.sp_tableoption @TableNamePattern = N'dbo.TestMAXOOR', @OptionName = 'large value types out of row', @OptionValue = 'true' ; SELECT large_value_types_out_of_row FROM sys.tables WHERE [schema_id] = SCHEMA_ID(N'dbo') AND name = N'TestMAXOOR' ; INSERT INTO dbo.TestMAXOOR WITH (TABLOCKX) ( padding ) SELECT SPACE(0) FROM dbo.TestCHAR ORDER BY id ; UPDATE TM WITH (TABLOCK) SET padding.WRITE (TC.padding, NULL, NULL) FROM dbo.TestMAXOOR AS TM JOIN dbo.TestCHAR AS TC ON TC.id = TM.id ; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAXOOR' ; CHECKPOINT ; Test 4 – MAXOOR We can now re-run our test on the MAXOOR (MAX out of row) table: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) MO.id, MO.padding FROM dbo.TestMAXOOR AS MO ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; TEXT Performance Summary: 0.3 seconds elapsed time 245MB memory grant 0MB tempdb usage 193MB estimated sort set 207 logical reads 446 LOB logical reads No sort warning The query runs very quickly – slightly faster than Test 3, and without spilling the sort to tempdb (there is no sort warning in the trace, and the monitoring query shows zero tempdb usage by this query).  SQL Server is passing the in-row pointer structure down the plan and only looking up the LOB value on the output side of the sort. The Hidden Problem There is still a huge problem with this query though – it requires a 245MB memory grant.  No wonder the sort doesn’t spill to tempdb now – 245MB is about 20 times more memory than this query actually requires to sort 50,000 records containing LOB data pointers.  Notice that the estimated row and data sizes in the plan are the same as in test 2 (where the MAX data was stored in-row). The optimizer assumes that MAX data is stored in-row, regardless of the sp_tableoption setting ‘large value types out of row’.  Why?  Because this option is dynamic – changing it does not immediately force all MAX data in the table in-row or off-row, only when data is added or actually changed.  SQL Server does not keep statistics to show how much MAX or TEXT data is currently in-row, and how much is stored in LOB pages.  This is an annoying limitation, and one which I hope will be addressed in a future version of the product. So why should we worry about this?  Excessive memory grants reduce concurrency and may result in queries waiting on the RESOURCE_SEMAPHORE wait type while they wait for memory they do not need.  245MB is an awful lot of memory, especially on 32-bit versions where memory grants cannot use AWE-mapped memory.  Even on a 64-bit server with plenty of memory, do you really want a single query to consume 0.25GB of memory unnecessarily?  That’s 32,000 8KB pages that might be put to much better use. The Solution The answer is not to use the TEXT data type for the padding column.  That solution happens to have better performance characteristics for this specific query, but it still results in a spilled sort, and it is hard to recommend the use of a data type which is scheduled for removal.  I hope it is clear to you that the fundamental problem here is that SQL Server sorts the whole set arriving at a Sort operator.  Clearly, it is not efficient to sort the whole table in memory just to return 150 rows in a random order. The TEXT example was more efficient because it dramatically reduced the size of the set that needed to be sorted.  We can do the same thing by selecting 150 unique keys from the table at random (sorting by NEWID() for example) and only then retrieving the large padding column values for just the 150 rows we need.  The following script implements that idea for all four tables: SET STATISTICS IO ON ; WITH TestTable AS ( SELECT * FROM dbo.TestCHAR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id = ANY (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAX ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestTEXT ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAXOOR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; All four queries now return results in much less than a second, with memory grants between 6 and 12MB, and without spilling to tempdb.  The small remaining inefficiency is in reading the id column values from the clustered primary key index.  As a clustered index, it contains all the in-row data at its leaf.  The CHAR and VARCHAR(MAX) tables store the padding column in-row, so id values are separated by a 3999-character column, plus row overhead.  The TEXT and MAXOOR tables store the padding values off-row, so id values in the clustered index leaf are separated by the much-smaller off-row pointer structure.  This difference is reflected in the number of logical page reads performed by the four queries: Table 'TestCHAR' logical reads 25511 lob logical reads 000 Table 'TestMAX'. logical reads 25511 lob logical reads 000 Table 'TestTEXT' logical reads 00412 lob logical reads 597 Table 'TestMAXOOR' logical reads 00413 lob logical reads 446 We can increase the density of the id values by creating a separate nonclustered index on the id column only.  This is the same key as the clustered index, of course, but the nonclustered index will not include the rest of the in-row column data. CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestCHAR (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAX (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestTEXT (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAXOOR (id); The four queries can now use the very dense nonclustered index to quickly scan the id values, sort them by NEWID(), select the 150 ids we want, and then look up the padding data.  The logical reads with the new indexes in place are: Table 'TestCHAR' logical reads 835 lob logical reads 0 Table 'TestMAX' logical reads 835 lob logical reads 0 Table 'TestTEXT' logical reads 686 lob logical reads 597 Table 'TestMAXOOR' logical reads 686 lob logical reads 448 With the new index, all four queries use the same query plan (click to enlarge): Performance Summary: 0.3 seconds elapsed time 6MB memory grant 0MB tempdb usage 1MB sort set 835 logical reads (CHAR, MAX) 686 logical reads (TEXT, MAXOOR) 597 LOB logical reads (TEXT) 448 LOB logical reads (MAXOOR) No sort warning I’ll leave it as an exercise for the reader to work out why trying to eliminate the Key Lookup by adding the padding column to the new nonclustered indexes would be a daft idea Conclusion This post is not about tuning queries that access columns containing big strings.  It isn’t about the internal differences between TEXT and MAX data types either.  It isn’t even about the cool use of UPDATE .WRITE used in the MAXOOR table load.  No, this post is about something else: Many developers might not have tuned our starting example query at all – 5 seconds isn’t that bad, and the original query plan looks reasonable at first glance.  Perhaps the NEWID() function would have been blamed for ‘just being slow’ – who knows.  5 seconds isn’t awful – unless your users expect sub-second responses – but using 250MB of memory and writing 200MB to tempdb certainly is!  If ten sessions ran that query at the same time in production that’s 2.5GB of memory usage and 2GB hitting tempdb.  Of course, not all queries can be rewritten to avoid large memory grants and sort spills using the key-lookup technique in this post, but that’s not the point either. The point of this post is that a basic understanding of execution plans is not enough.  Tuning for logical reads and adding covering indexes is not enough.  If you want to produce high-quality, scalable TSQL that won’t get you paged as soon as it hits production, you need a deep understanding of execution plans, and as much accurate, deep knowledge about SQL Server as you can lay your hands on.  The advanced database developer has a wide range of tools to use in writing queries that perform well in a range of circumstances. By the way, the examples in this post were written for SQL Server 2008.  They will run on 2005 and demonstrate the same principles, but you won’t get the same figures I did because 2005 had a rather nasty bug in the Top N Sort operator.  Fair warning: if you do decide to run the scripts on a 2005 instance (particularly the parallel query) do it before you head out for lunch… This post is dedicated to the people of Christchurch, New Zealand. © 2011 Paul White email: @[email protected] twitter: @SQL_Kiwi

    Read the article

  • Is INT_MIN-1 an underflow or overflow?

    - by Johannes Schaub - litb
    I seem to remember that I was reading that underflow means you have a too small magnitude that cannot be presented anymore in a type overflow means you have a too large magnitude that cannot be presented anymore in a type However, in practice I perceive that the terms are used such that underflow means you have a too small value that cannot be presented anymore in a type overflow means you have a too large value that cannot be presented anymore in a type What is the correct meaning to use here? Are the terms defined differently for integer and floating point types?

    Read the article

  • Fraud Detection with the SQL Server Suite Part 2

    - by Dejan Sarka
    This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

    Read the article

  • SQL SERVER – SSIS Look Up Component – Cache Mode – Notes from the Field #028

    - by Pinal Dave
    [Notes from Pinal]: Lots of people think that SSIS is all about arranging various operations together in one logical flow. Well, the understanding is absolutely correct, but the implementation of the same is not as easy as it seems. Similarly most of the people think lookup component is just component which does look up for additional information and does not pay much attention to it. Due to the same reason they do not pay attention to the same and eventually get very bad performance. Linchpin People are database coaches and wellness experts for a data driven world. In this 28th episode of the Notes from the Fields series database expert Tim Mitchell (partner at Linchpin People) shares very interesting conversation related to how to write a good lookup component with Cache Mode. In SQL Server Integration Services, the lookup component is one of the most frequently used tools for data validation and completion.  The lookup component is provided as a means to virtually join one set of data to another to validate and/or retrieve missing values.  Properly configured, it is reliable and reasonably fast. Among the many settings available on the lookup component, one of the most critical is the cache mode.  This selection will determine whether and how the distinct lookup values are cached during package execution.  It is critical to know how cache modes affect the result of the lookup and the performance of the package, as choosing the wrong setting can lead to poorly performing packages, and in some cases, incorrect results. Full Cache The full cache mode setting is the default cache mode selection in the SSIS lookup transformation.  Like the name implies, full cache mode will cause the lookup transformation to retrieve and store in SSIS cache the entire set of data from the specified lookup location.  As a result, the data flow in which the lookup transformation resides will not start processing any data buffers until all of the rows from the lookup query have been cached in SSIS. The most commonly used cache mode is the full cache setting, and for good reason.  The full cache setting has the most practical applications, and should be considered the go-to cache setting when dealing with an untested set of data. With a moderately sized set of reference data, a lookup transformation using full cache mode usually performs well.  Full cache mode does not require multiple round trips to the database, since the entire reference result set is cached prior to data flow execution. There are a few potential gotchas to be aware of when using full cache mode.  First, you can see some performance issues – memory pressure in particular – when using full cache mode against large sets of reference data.  If the table you use for the lookup is very large (either deep or wide, or perhaps both), there’s going to be a performance cost associated with retrieving and caching all of that data.  Also, keep in mind that when doing a lookup on character data, full cache mode will always do a case-sensitive (and in some cases, space-sensitive) string comparison even if your database is set to a case-insensitive collation.  This is because the in-memory lookup uses a .NET string comparison (which is case- and space-sensitive) as opposed to a database string comparison (which may be case sensitive, depending on collation).  There’s a relatively easy workaround in which you can use the UPPER() or LOWER() function in the pipeline data and the reference data to ensure that case differences do not impact the success of your lookup operation.  Again, neither of these present a reason to avoid full cache mode, but should be used to determine whether full cache mode should be used in a given situation. Full cache mode is ideally useful when one or all of the following conditions exist: The size of the reference data set is small to moderately sized The size of the pipeline data set (the data you are comparing to the lookup table) is large, is unknown at design time, or is unpredictable Each distinct key value(s) in the pipeline data set is expected to be found multiple times in that set of data Partial Cache When using the partial cache setting, lookup values will still be cached, but only as each distinct value is encountered in the data flow.  Initially, each distinct value will be retrieved individually from the specified source, and then cached.  To be clear, this is a row-by-row lookup for each distinct key value(s). This is a less frequently used cache setting because it addresses a narrower set of scenarios.  Because each distinct key value(s) combination requires a relational round trip to the lookup source, performance can be an issue, especially with a large pipeline data set to be compared to the lookup data set.  If you have, for example, a million records from your pipeline data source, you have the potential for doing a million lookup queries against your lookup data source (depending on the number of distinct values in the key column(s)).  Therefore, one has to be keenly aware of the expected row count and value distribution of the pipeline data to safely use partial cache mode. Using partial cache mode is ideally suited for the conditions below: The size of the data in the pipeline (more specifically, the number of distinct key column) is relatively small The size of the lookup data is too large to effectively store in cache The lookup source is well indexed to allow for fast retrieval of row-by-row values No Cache As you might guess, selecting no cache mode will not add any values to the lookup cache in SSIS.  As a result, every single row in the pipeline data set will require a query against the lookup source.  Since no data is cached, it is possible to save a small amount of overhead in SSIS memory in cases where key values are not reused.  In the real world, I don’t see a lot of use of the no cache setting, but I can imagine some edge cases where it might be useful. As such, it’s critical to know your data before choosing this option.  Obviously, performance will be an issue with anything other than small sets of data, as the no cache setting requires row-by-row processing of all of the data in the pipeline. I would recommend considering the no cache mode only when all of the below conditions are true: The reference data set is too large to reasonably be loaded into SSIS memory The pipeline data set is small and is not expected to grow There are expected to be very few or no duplicates of the key values(s) in the pipeline data set (i.e., there would be no benefit from caching these values) Conclusion The cache mode, an often-overlooked setting on the SSIS lookup component, represents an important design decision in your SSIS data flow.  Choosing the right lookup cache mode directly impacts the fidelity of your results and the performance of package execution.  Know how this selection impacts your ETL loads, and you’ll end up with more reliable, faster packages. If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SSIS

    Read the article

  • Agile PLM 9.3 Service Pack 2 (SP2 or 9.3.0.2) is released along with AUT 1.6.2.0 and AutoVue 20 for

    - by Shane Goodwin
    Oracle released Agile PLM 9.3 SP2 on June 14 and the Agile installer for AutoVue 20 for Agile PLM on April 30. Also available are the new versions of AUT and Averify - 1.6.3 for both tools. 9.3 SP2 is a combined English and NLS release for use on any version of 9.3.0. SP2 contains many bug fixes and rolls up several Hot Fixes - please review the Readme for all the details. In addition, this release also addresses some scalability issues when working with very large Exports and Reports. When exporting very large BOMs, the export module will now release objects more efficiently to reduce the amount of memory consumed on the Application Server. Adminstrators can also control the maximum row limits for Users verses system processes, like ACS. Several out of the box BOM reports have also been changed to use a new row limit option. The combination of all these changes will provide more stability on the application server for customers managing very large datasets. 9.3 SP2 also adds support for Oracle Database 11gR2 for Windows, Oracle Internet Directory (OID) and Oracle Access Manager (OAM). Please note that currently the Variant Patch is not intended to be released for SP2. Customers running the Variant Patch should remain on 9.3.0.0 or 9.3.0.1. Back in April, we also released the AutoVue 20 for Agile PLM installer. AutoVue 20 has many new features which will help Agile PLM customers. Large multi-page Word documents and 2D CAD documents will open more quickly to the first page or first rendition. Memory usage is less when working with 3D Models. There are many new formats supported for MCAD, 2D Cad, and EDA. AutoVue 20 is immediately available for Windows and Linux platforms. The new software can be found in Edelivery or Metalink / Oracle Support: - AutoVue 20 for Agile PLM is on E-Delivery with part number B58963-01 - Oracle Agile PLM 9.3 Service Pack 2 (9.3.0.2) My Oracle Support Patch ID 9782736 - AVERIFY 1.6.3 My Oracle Support Patch ID 9791892 - AUT 1.6.3 My Oracle Support Patch ID 9791908 - Agile PLM 9.3 SP2 Documentation is available on the OTN Agile Documentation Page

    Read the article

  • Windows Azure Recipe: Big Data

    - by Clint Edmonson
    As the name implies, what we’re talking about here is the explosion of electronic data that comes from huge volumes of transactions, devices, and sensors being captured by businesses today. This data often comes in unstructured formats and/or too fast for us to effectively process in real time. Collectively, we call these the 4 big data V’s: Volume, Velocity, Variety, and Variability. These qualities make this type of data best managed by NoSQL systems like Hadoop, rather than by conventional Relational Database Management System (RDBMS). We know that there are patterns hidden inside this data that might provide competitive insight into market trends.  The key is knowing when and how to leverage these “No SQL” tools combined with traditional business such as SQL-based relational databases and warehouses and other business intelligence tools. Drivers Petabyte scale data collection and storage Business intelligence and insight Solution The sketch below shows one of many big data solutions using Hadoop’s unique highly scalable storage and parallel processing capabilities combined with Microsoft Office’s Business Intelligence Components to access the data in the cluster. Ingredients Hadoop – this big data industry heavyweight provides both large scale data storage infrastructure and a highly parallelized map-reduce processing engine to crunch through the data efficiently. Here are the key pieces of the environment: Pig - a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Mahout - a machine learning library with algorithms for clustering, classification and batch based collaborative filtering that are implemented on top of Apache Hadoop using the map/reduce paradigm. Hive - data warehouse software built on top of Apache Hadoop that facilitates querying and managing large datasets residing in distributed storage. Directly accessible to Microsoft Office and other consumers via add-ins and the Hive ODBC data driver. Pegasus - a Peta-scale graph mining system that runs in parallel, distributed manner on top of Hadoop and that provides algorithms for important graph mining tasks such as Degree, PageRank, Random Walk with Restart (RWR), Radius, and Connected Components. Sqoop - a tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases. Flume - a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large log data amounts to HDFS. Database – directly accessible to Hadoop via the Sqoop based Microsoft SQL Server Connector for Apache Hadoop, data can be efficiently transferred to traditional relational data stores for replication, reporting, or other needs. Reporting – provides easily consumable reporting when combined with a database being fed from the Hadoop environment. Training These links point to online Windows Azure training labs where you can learn more about the individual ingredients described above. Hadoop Learning Resources (20+ tutorials and labs) Huge collection of resources for learning about all aspects of Apache Hadoop-based development on Windows Azure and the Hadoop and Windows Azure Ecosystems SQL Azure (7 labs) Microsoft SQL Azure delivers on the Microsoft Data Platform vision of extending the SQL Server capabilities to the cloud as web-based services, enabling you to store structured, semi-structured, and unstructured data. See my Windows Azure Resource Guide for more guidance on how to get started, including links web portals, training kits, samples, and blogs related to Windows Azure.

    Read the article

  • Google I/O 2012 - Managing Google Compute Engine Virtual Machines Through Google App Engine

    Google I/O 2012 - Managing Google Compute Engine Virtual Machines Through Google App Engine Alon Levi, Adam Eijdenberg Google Compute Engine provides highly efficient and scalable virtual machines for large scale data processing operations. Integration with Google App Engine provides an orchestration framework to manage large virtual machine clusters used for data processing. This session will talk demonstrate integration and discuss future use cases of the two technologies. For all I/O 2012 sessions, go to developers.google.com From: GoogleDevelopers Views: 0 0 ratings Time: 51:06 More in Science & Technology

    Read the article

  • What does your Technical Documentation look like?

    - by Rachel
    I'm working on a large project and I would like to put together some technical documentation for other members of the team and for new programmers joining the project. What sort of documentation should I have? Just /// code comments or some other file(s) explaining the architechure and class design? I've never really done documentation except the occasional word doc to go with smaller apps, and I think this project is too large to doc in a single word file.

    Read the article

  • Agile Testing Days 2012 – Day 3 – Agile or agile?

    - by Chris George
    Another early start for my last Lean Coffee of the conference, and again it was not wasted. We had some really interesting discussions around how to determine what test automation is useful, if agile is not faster, why do it? and a rather existential discussion on whether unicorns exist! First keynote of the day was entitled “Fast Feedback Teams” by Ola Ellnestam. Again this relates nicely to the releasing faster talk on day 2, and something that we are looking at and some teams are actively trying. Introducing the notion of feedback, Ola describes a game he wrote for his eldest child. It was a simple game where every time he clicked a button, it displayed “You’ve Won!”. He then changed it to be a Win-Lose-Win-Lose pattern and watched the feedback from his son who then twigged the pattern and got his younger brother to play, alternating turns… genius! (must do that with my children). The idea behind this was that you need that feedback loop to learn and progress. If you are not getting the feedback you need to close that loop. An interesting point Ola made was to solve problems BEFORE writing software. It may be that you don’t have to write anything at all, perhaps it’s a communication/training issue? Perhaps the problem can be solved another way. Writing software, although it’s the business we are in, is expensive, and this should be taken into account. He again mentions frequent releases, and how they should be made as soon as stuff is ready to be released, don’t leave stuff on the shelf cause it’s not earning you anything, money or data. I totally agree with this and it’s something that we will be aiming for moving forwards. “Exceptions, Assumptions and Ambiguity: Finding the truth behind the story” by David Evans started off very promising by making references to ‘Grim up North’ referring to the north of England. Not sure it was appreciated by most of the audience, but it made me laugh! David explained how there are always risks associated with exceptions, giving the example of a one-way road near where he lives, with an exception sign giving rights to coaches to go the wrong way. Therefore you could merrily swing around the corner of the one way road straight into a coach! David showed the danger in making assumptions with lyrical quotes from Lola by The Kinks “I’m glad I’m a man, and so is Lola” and with a picture of a toilet flush that needed instructions to operate the full and half flush. With this particular flush, you pulled the handle all the way down to half flush, and half way down to full flush! hmmm, a bit of a crappy user experience methinks! Then through a clever use of a passage from the Jabberwocky, David then went onto show how mis-translation/ambiguity is the can completely distort the original meaning of something, and this is a real enemy of software development. This was all helping to demonstrate that the term Story is often heavily overloaded in the Agile world, and should really be stripped back to what it is really for, stating a business problem, and offering a technical solution. Therefore a story could be worded as “In order to {make some improvement}, we will { do something}”. The first ‘in order to’ statement is stakeholder neutral, and states the problem through requesting an improvement to the software/process etc. The second part of the story is the verb, the doing bit. So to achieve the ‘improvement’ which is not currently true, we will do something to make this true in the future. My PM is very interested in this, and he’s observed some of the problems of overloading stories so I’m hoping between us we can use some of David’s suggestions to help clarify our stories better. The second keynote of the day (and our last) proved to be the most entertaining and exhausting of the conference for me. “The ongoing evolution of testing in agile development” by Scott Barber. I’ve never had the pleasure of seeing Scott before… OMG I would love to have even half of the energy he has! What struck me during this presentation was Scott’s explanation of how testing has become the role/job that it is (largely) today, and how this has led to the need for ‘methodologies’ to make dev and test work! The argument that we should be trying to converge the roles again is a very valid one, and one that a couple of the teams at work are actively doing with great results. Making developers as responsible for quality as testers is something that has been lost over the years, but something that we are now striving to achieve. The idea that we (testers) should be testing experts/specialists, not testing ‘union members’, supports this idea so the entire team works on all aspects of a feature/product, with the ‘specialists’ taking the lead and advising/coaching the others. This leads to better propagation of information around the team, a greater holistic understanding of the project and it allows the team to continue functioning if some of it’s members are off sick, for example. Feeling somewhat drained from Scott’s keynote (but at the same time excited that alot of the points he raised supported actions we are taking at work), I headed into my last presentation for Agile Testing Days 2012 before having to make my way to Tegel to catch the flight home. “Thinking and working agile in an unbending world” with Pete Walen was a talk I was not going to miss! Having spoken to Pete several times during the past few days, I was looking forward to hearing what he was going to say, and I was not disappointed. Pete started off by trying to separate the definitions of ‘Agile’ as in the methodology, and ‘agile’ as in the adjective by pronouncing them the ‘english’ and ‘american’ ways. So Agile pronounced (Ajyle) and agile pronounced (ajul). There was much confusion around what the hell he was talking about, although I thought it was quite clear. Agile – Software development methodology agile – Marked by ready ability to move with quick easy grace; Having a quick resourceful and adaptable character. Anyway, that aside (although it provided a few laughs during the presentation), the point was that many teams that claim to be ‘Agile’ but are not, in fact, ‘agile’ by nature. Implementing ‘Agile’ methodologies that are so prescriptive actually goes against the very nature of Agile development where a team should anticipate, adapt and explore. Pete made a valid point that very few companies intentionally put up roadblocks to impede work, so if work is being blocked/delayed, why? This is where being agile as a team pays off because the team can inspect what’s going on, explore options and adapt their processes. It is through experimentation (and that means trying and failing as well as trying and succeeding) that a team will improve and grow leading to focussing on what really needs to be done to achieve X. So, that was it, the last talk of our conference. I was gutted that we had to miss the closing keynote from Matt Heusser, as Matt was another person I had spoken too a few times during the conference, but the flight would not wait, and just as well we left when we did because the traffic was a nightmare! My Takeaway Triple from Day 3: Release often and release small – don’t leave stuff on the shelf Keep the meaning of the word ‘agile’ in mind when working in ‘Agile Look at testing as more of a skill than a role  

    Read the article

  • Architecture/pattern resources for small applications and tools

    - by s73v3r
    I was wondering if anyone had any resources or advice related to using architecture patterns like MVVM/MVC/MVP/etc on small applications and tools, as opposed to large, enterprisy ones. EDIT: Most of the information I see on application architecture is directed at large, enterprise applications. I'm just writing small programs and tools. As far as using these architecture patterns, is it generally worthwhile to go through the overhead of using an MVC/MVVM framework? Or would I be better off keeping it simple?

    Read the article

  • How to Avoid Duplicate Content in Wordpress Ecommerce Store

    - by Bhanuprakash Moturu
    hi i run a word press eCommerce store powered by woo commerce . i have a large inventory of products most of the product description is same for all products and its mandatory to include it. its creating a large duplicate content on site each category have 6 products i thought of a solution can you suggest which one is good 1 no index and follow product page and link it to categories page using canonical tag 2 index and nofollow product page and link it to categories page using canonical tag which is the best solution and is it a good practice to use canonical tag to link to categories page

    Read the article

  • Sending HTML Newsletters in a Batch Using SQL Server

    Sending a large volume of emails can put a strain on a mail relay server. However many people using SQL Server will need to do just that for things like newsletters, mailing lists, etc. Satnam Singh brings us a way to spread out the load by sending newsletters in multiple batches instead of one large process.

    Read the article

  • Is Innovation Dead?

    - by wulfers
    My question is has innovation died?  For large businesses that do not have a vibrant, and fearless leadership (see Apple under Steve jobs), I think is has.  If you look at the organizational charts for many of the large corporate megaliths you will see a plethora of middle managers who are so risk averse that innovation (any change involves risk) is choked off since there are no innovation champions in the middle layers.  And innovation driven top down can only happen when you have a visionary in the top ranks, and that is also very rare.So where is actual innovation happening, at the bottom layer, the people who live in the trenches…   The people who live for a challenge. So how can big business leverage this innovation layer?  Remove the middle management layer.   Provide an innovation champion who has an R&D budget and is tasked with working with the bottom layer of a company, the engineers, developers  and business analysts that live on the edge (Where the corporate tires meet the road). Here are two innovation failures I will tell you about, and both have been impacted by a company so risk averse it is starting to fail in its primary business ventures: This company initiated an innovation process several years ago.  The process was driven companywide with team managers being the central points of collection of innovative ideas.  These managers were given no budget to do anything with these ideas.  There was no process or incentive for these managers to drive it about their team.  This lasted close to a year and the innovation program slowly slipped into oblivion…. A second example:  This same company failed an attempt to market a consumer product in a line where there was already a major market leader.  This product was under development for several years and needed to provide some major device differentiation form the current market leader.  This same company had a large Lead Technologist community made up of real innovators in all areas of technology.  Did this same company leverage the skills and experience of this internal community,   NO!!! So to wrap this up, if large companies really want to survive, then they need to start acting like a small company.  Support those innovators and risk takers!  Reward them by implementing their innovative ideas.  Champion (from the top down) innovation (found at the bottom) in your companies.  Remember if you stand still you are really falling behind.Do it now!  Take a risk!

    Read the article

  • Gigantic 2d maps?

    - by Jesper Hallenberg
    I've been thinking about a game idea for a week or two and the last few hours I was thinking of some technical stuff and came up with that the map would need to be 360,000x360,000 pixels in size. To me it sounds insane, but I've never done much in depth game development so I'm not sure if its reasonable or not to have a map that large. Basicly my question is, would it work to have a map that large in a 2D game?

    Read the article

  • Reflecting on week long Training with SQLSkills

    - by NeilHambly
    Time for a quick reflection on my 5-day's training with SQLSkills, they have 4 weeks in their immersion training program, this was week 1: Internals & Performance held @ large Heathrow Hotel http://www.sqlskills.com/T_ImmersionInternalsDesign.asp So was the Course worth the Time and Money... undoubtedly, I believe we had a large number of the people there also self-funding along with the lucky corporate sponsored ones. It was akin to doing say the "London marathon" in that you know...(read more)

    Read the article

  • How to stretch the image from one screen to two screens

    - by wxiiir
    I want to be able to stretch the image from one monitor to a second monitor even if i have to use some software to do it. For example i want the top half of the image that is now shown on my first monitor to occupy the whole second monitor and i want the bottom half to occupy the whole first monitor, or the other way around i don't really care as long as it works. I would be cool to know how to do the same stuff but to the left half and right half. I know that image pixels would have twice the lenght or height this way but i don't really care about it as long as it works so basically i want the stuff to show on both monitors but with the same pixels as before. I have a hd4870 and windows7 and hd4000 family doesnt support having two monitors behaving like a large one, only hd5000 upwards, this would solve my problems without any of the drawbacks but it just can't be done (or maybe it can via software but i'm just too tired of searching). A solution to make almost any graphic card have two monitors behaving like a large one is matrox dualhead2go but that's just as expensive as a good hd5000 card so it's not worth it. thanks in advance EDIT I guess that nobody so far was able to fully comprehend my problem that was very explicitly written but i will elaborate some more. My hd4870 can have 2 monitors working with it but some stuff like games won't run on both monitors, which sucks. There are some ways to circumvent this problem and two of them are perfect or almost perfect but expensive and the third would be a software solution that would make it possible. The first one is to have and hd5000 family video card which will work just fine with both monitors. The second is to have a matrox dualhead2go that will make my hd4870 detect my two monitors as a large monitor. The third is to have a software that makes my two displays be detected as a large display and then captures the output of the video card, splits the images and renders them as 2d images to both monitors OR a simpler one but that would make outputted pixels double the width or height would be to capture the output of the graphics card to one screen, split it in two and enlarge it to fit both monitors and then output it to the monitors. p.s. By capturing the output of the video card i mean just make the video card process the stuff in a certain way. Making the video card detect two monitors as a large one via software may be a bit impossible or impracticable but stretching the output as a 2d image from one to both monitors for some coders should be a walk in the park so it would be likely that such program would exist or that some widespread softwares for dual monitor would have such function in them.

    Read the article

  • Importance of the SEO and Web Hosting Course

    The most important thing of having a website for your business is the ability of the site to generate a large number of traffic, hence it does not matter how well the site is designed or created. Even if the services or products that are offered by the business are of high quality, if the site is not able to generate a large number of traffic then surely you won't be able to sell anything.

    Read the article

  • Squid + Dans Guardian (simple configuration)

    - by The Digital Ninja
    I just built a new proxy server and compiled the latest versions of squid and dansguardian. We use basic authentication to select what users are allowed outside of our network. It seems squid is working just fine and accepts my username and password and lets me out. But if i connect to dans guardian, it prompts for username and password and then displays a message saying my username is not allowed to access the internet. Its pulling my username for the error message so i know it knows who i am. The part i get confused on is i thought that part was handled all by squid, and squid is working flawlessly. Can someone please double check my config files and tell me if i'm missing something or there is some new option i must set to get this to work. dansguardian.conf # Web Access Denied Reporting (does not affect logging) # # -1 = log, but do not block - Stealth mode # 0 = just say 'Access Denied' # 1 = report why but not what denied phrase # 2 = report fully # 3 = use HTML template file (accessdeniedaddress ignored) - recommended # reportinglevel = 3 # Language dir where languages are stored for internationalisation. # The HTML template within this dir is only used when reportinglevel # is set to 3. When used, DansGuardian will display the HTML file instead of # using the perl cgi script. This option is faster, cleaner # and easier to customise the access denied page. # The language file is used no matter what setting however. # languagedir = '/etc/dansguardian/languages' # language to use from languagedir. language = 'ukenglish' # Logging Settings # # 0 = none 1 = just denied 2 = all text based 3 = all requests loglevel = 3 # Log Exception Hits # Log if an exception (user, ip, URL, phrase) is matched and so # the page gets let through. Can be useful for diagnosing # why a site gets through the filter. on | off logexceptionhits = on # Log File Format # 1 = DansGuardian format 2 = CSV-style format # 3 = Squid Log File Format 4 = Tab delimited logfileformat = 1 # Log file location # # Defines the log directory and filename. #loglocation = '/var/log/dansguardian/access.log' # Network Settings # # the IP that DansGuardian listens on. If left blank DansGuardian will # listen on all IPs. That would include all NICs, loopback, modem, etc. # Normally you would have your firewall protecting this, but if you want # you can limit it to only 1 IP. Yes only one. filterip = # the port that DansGuardian listens to. filterport = 8080 # the ip of the proxy (default is the loopback - i.e. this server) proxyip = 127.0.0.1 # the port DansGuardian connects to proxy on proxyport = 3128 # accessdeniedaddress is the address of your web server to which the cgi # dansguardian reporting script was copied # Do NOT change from the default if you are not using the cgi. # accessdeniedaddress = 'http://YOURSERVER.YOURDOMAIN/cgi-bin/dansguardian.pl' # Non standard delimiter (only used with accessdeniedaddress) # Default is enabled but to go back to the original standard mode dissable it. nonstandarddelimiter = on # Banned image replacement # Images that are banned due to domain/url/etc reasons including those # in the adverts blacklists can be replaced by an image. This will, # for example, hide images from advert sites and remove broken image # icons from banned domains. # 0 = off # 1 = on (default) usecustombannedimage = 1 custombannedimagefile = '/etc/dansguardian/transparent1x1.gif' # Filter groups options # filtergroups sets the number of filter groups. A filter group is a set of content # filtering options you can apply to a group of users. The value must be 1 or more. # DansGuardian will automatically look for dansguardianfN.conf where N is the filter # group. To assign users to groups use the filtergroupslist option. All users default # to filter group 1. You must have some sort of authentication to be able to map users # to a group. The more filter groups the more copies of the lists will be in RAM so # use as few as possible. filtergroups = 1 filtergroupslist = '/etc/dansguardian/filtergroupslist' # Authentication files location bannediplist = '/etc/dansguardian/bannediplist' exceptioniplist = '/etc/dansguardian/exceptioniplist' banneduserlist = '/etc/dansguardian/banneduserlist' exceptionuserlist = '/etc/dansguardian/exceptionuserlist' # Show weighted phrases found # If enabled then the phrases found that made up the total which excedes # the naughtyness limit will be logged and, if the reporting level is # high enough, reported. on | off showweightedfound = on # Weighted phrase mode # There are 3 possible modes of operation: # 0 = off = do not use the weighted phrase feature. # 1 = on, normal = normal weighted phrase operation. # 2 = on, singular = each weighted phrase found only counts once on a page. # weightedphrasemode = 2 # Positive result caching for text URLs # Caches good pages so they don't need to be scanned again # 0 = off (recommended for ISPs with users with disimilar browsing) # 1000 = recommended for most users # 5000 = suggested max upper limit urlcachenumber = # # Age before they are stale and should be ignored in seconds # 0 = never # 900 = recommended = 15 mins urlcacheage = # Smart and Raw phrase content filtering options # Smart is where the multiple spaces and HTML are removed before phrase filtering # Raw is where the raw HTML including meta tags are phrase filtered # CPU usage can be effectively halved by using setting 0 or 1 # 0 = raw only # 1 = smart only # 2 = both (default) phrasefiltermode = 2 # Lower casing options # When a document is scanned the uppercase letters are converted to lower case # in order to compare them with the phrases. However this can break Big5 and # other 16-bit texts. If needed preserve the case. As of version 2.7.0 accented # characters are supported. # 0 = force lower case (default) # 1 = do not change case preservecase = 0 # Hex decoding options # When a document is scanned it can optionally convert %XX to chars. # If you find documents are getting past the phrase filtering due to encoding # then enable. However this can break Big5 and other 16-bit texts. # 0 = disabled (default) # 1 = enabled hexdecodecontent = 0 # Force Quick Search rather than DFA search algorithm # The current DFA implementation is not totally 16-bit character compatible # but is used by default as it handles large phrase lists much faster. # If you wish to use a large number of 16-bit character phrases then # enable this option. # 0 = off (default) # 1 = on (Big5 compatible) forcequicksearch = 0 # Reverse lookups for banned site and URLs. # If set to on, DansGuardian will look up the forward DNS for an IP URL # address and search for both in the banned site and URL lists. This would # prevent a user from simply entering the IP for a banned address. # It will reduce searching speed somewhat so unless you have a local caching # DNS server, leave it off and use the Blanket IP Block option in the # bannedsitelist file instead. reverseaddresslookups = off # Reverse lookups for banned and exception IP lists. # If set to on, DansGuardian will look up the forward DNS for the IP # of the connecting computer. This means you can put in hostnames in # the exceptioniplist and bannediplist. # It will reduce searching speed somewhat so unless you have a local DNS server, # leave it off. reverseclientiplookups = off # Build bannedsitelist and bannedurllist cache files. # This will compare the date stamp of the list file with the date stamp of # the cache file and will recreate as needed. # If a bsl or bul .processed file exists, then that will be used instead. # It will increase process start speed by 300%. On slow computers this will # be significant. Fast computers do not need this option. on | off createlistcachefiles = on # POST protection (web upload and forms) # does not block forms without any file upload, i.e. this is just for # blocking or limiting uploads # measured in kibibytes after MIME encoding and header bumph # use 0 for a complete block # use higher (e.g. 512 = 512Kbytes) for limiting # use -1 for no blocking #maxuploadsize = 512 #maxuploadsize = 0 maxuploadsize = -1 # Max content filter page size # Sometimes web servers label binary files as text which can be very # large which causes a huge drain on memory and cpu resources. # To counter this, you can limit the size of the document to be # filtered and get it to just pass it straight through. # This setting also applies to content regular expression modification. # The size is in Kibibytes - eg 2048 = 2Mb # use 0 for no limit maxcontentfiltersize = # Username identification methods (used in logging) # You can have as many methods as you want and not just one. The first one # will be used then if no username is found, the next will be used. # * proxyauth is for when basic proxy authentication is used (no good for # transparent proxying). # * ntlm is for when the proxy supports the MS NTLM authentication # protocol. (Only works with IE5.5 sp1 and later). **NOT IMPLEMENTED** # * ident is for when the others don't work. It will contact the computer # that the connection came from and try to connect to an identd server # and query it for the user owner of the connection. usernameidmethodproxyauth = on usernameidmethodntlm = off # **NOT IMPLEMENTED** usernameidmethodident = off # Preemptive banning - this means that if you have proxy auth enabled and a user accesses # a site banned by URL for example they will be denied straight away without a request # for their user and pass. This has the effect of requiring the user to visit a clean # site first before it knows who they are and thus maybe an admin user. # This is how DansGuardian has always worked but in some situations it is less than # ideal. So you can optionally disable it. Default is on. # As a side effect disabling this makes AD image replacement work better as the mime # type is know. preemptivebanning = on # Misc settings # if on it adds an X-Forwarded-For: <clientip> to the HTTP request # header. This may help solve some problem sites that need to know the # source ip. on | off forwardedfor = on # if on it uses the X-Forwarded-For: <clientip> to determine the client # IP. This is for when you have squid between the clients and DansGuardian. # Warning - headers are easily spoofed. on | off usexforwardedfor = off # if on it logs some debug info regarding fork()ing and accept()ing which # can usually be ignored. These are logged by syslog. It is safe to leave # it on or off logconnectionhandlingerrors = on # Fork pool options # sets the maximum number of processes to sporn to handle the incomming # connections. Max value usually 250 depending on OS. # On large sites you might want to try 180. maxchildren = 180 # sets the minimum number of processes to sporn to handle the incomming connections. # On large sites you might want to try 32. minchildren = 32 # sets the minimum number of processes to be kept ready to handle connections. # On large sites you might want to try 8. minsparechildren = 8 # sets the minimum number of processes to sporn when it runs out # On large sites you might want to try 10. preforkchildren = 10 # sets the maximum number of processes to have doing nothing. # When this many are spare it will cull some of them. # On large sites you might want to try 64. maxsparechildren = 64 # sets the maximum age of a child process before it croaks it. # This is the number of connections they handle before exiting. # On large sites you might want to try 10000. maxagechildren = 5000 # Process options # (Change these only if you really know what you are doing). # These options allow you to run multiple instances of DansGuardian on a single machine. # Remember to edit the log file path above also if that is your intention. # IPC filename # # Defines IPC server directory and filename used to communicate with the log process. ipcfilename = '/tmp/.dguardianipc' # URL list IPC filename # # Defines URL list IPC server directory and filename used to communicate with the URL # cache process. urlipcfilename = '/tmp/.dguardianurlipc' # PID filename # # Defines process id directory and filename. #pidfilename = '/var/run/dansguardian.pid' # Disable daemoning # If enabled the process will not fork into the background. # It is not usually advantageous to do this. # on|off ( defaults to off ) nodaemon = off # Disable logging process # on|off ( defaults to off ) nologger = off # Daemon runas user and group # This is the user that DansGuardian runs as. Normally the user/group nobody. # Uncomment to use. Defaults to the user set at compile time. # daemonuser = 'nobody' # daemongroup = 'nobody' # Soft restart # When on this disables the forced killing off all processes in the process group. # This is not to be confused with the -g run time option - they are not related. # on|off ( defaults to off ) softrestart = off maxcontentramcachescansize = 2000 maxcontentfilecachescansize = 20000 downloadmanager = '/etc/dansguardian/downloadmanagers/default.conf' authplugin = '/etc/dansguardian/authplugins/proxy-basic.conf' Squid.conf http_port 3128 hierarchy_stoplist cgi-bin ? acl QUERY urlpath_regex cgi-bin \? cache deny QUERY acl apache rep_header Server ^Apache #broken_vary_encoding allow apache access_log /squid/var/logs/access.log squid hosts_file /etc/hosts auth_param basic program /squid/libexec/ncsa_auth /squid/etc/userbasic.auth auth_param basic children 5 auth_param basic realm proxy auth_param basic credentialsttl 2 hours auth_param basic casesensitive off refresh_pattern ^ftp: 1440 20% 10080 refresh_pattern ^gopher: 1440 0% 1440 refresh_pattern . 0 20% 4320 acl NoAuthNec src <HIDDEN FOR SECURITY> acl BrkRm src <HIDDEN FOR SECURITY> acl Dials src <HIDDEN FOR SECURITY> acl Comps src <HIDDEN FOR SECURITY> acl whsws dstdom_regex -i .opensuse.org .novell.com .suse.com mirror.mcs.an1.gov mirrors.kernerl.org www.suse.de suse.mirrors.tds.net mirrros.usc.edu ftp.ale.org suse.cs.utah.edu mirrors.usc.edu mirror.usc.an1.gov linux.nssl.noaa.gov noaa.gov .kernel.org ftp.ale.org ftp.gwdg.de .medibuntu.org mirrors.xmission.com .canonical.com .ubuntu. acl opensites dstdom_regex -i .mbsbooks.com .bowker.com .usps.com .usps.gov .ups.com .fedex.com go.microsoft.com .microsoft.com .apple.com toolbar.msn.com .contacts.msn.com update.services.openoffice.org fms2.pointroll.speedera.net services.wmdrm.windowsmedia.com windowsupdate.com .adobe.com .symantec.com .vitalbook.com vxn1.datawire.net vxn.datawire.net download.lavasoft.de .download.lavasoft.com .lavasoft.com updates.ls-servers.com .canadapost. .myyellow.com minirick symantecliveupdate.com wm.overdrive.com www.overdrive.com productactivation.one.microsoft.com www.update.microsoft.com testdrive.whoson.com www.columbia.k12.mo.us banners.wunderground.com .kofax.com .gotomeeting.com tools.google.com .dl.google.com .cache.googlevideo.com .gpdl.google.com .clients.google.com cache.pack.google.com kh.google.com maps.google.com auth.keyhole.com .contacts.msn.com .hrblock.com .taxcut.com .merchantadvantage.com .jtv.com .malwarebytes.org www.google-analytics.com dcs.support.xerox.com .dhl.com .webtrendslive.com javadl-esd.sun.com javadl-alt.sun.com .excelsior.edu .dhlglobalmail.com .nessus.org .foxitsoftware.com foxit.vo.llnwd.net installshield.com .mindjet.com .mediascouter.com media.us.elsevierhealth.com .xplana.com .govtrack.us sa.tulsacc.edu .omniture.com fpdownload.macromedia.com webservices.amazon.com acl password proxy_auth REQUIRED acl all src all acl manager proto cache_object acl localhost src 127.0.0.1/255.255.255.255 acl to_localhost dst 127.0.0.0/8 acl SSL_ports port 443 563 631 2001 2005 8731 9001 9080 10000 acl Safe_ports port 80 # http acl Safe_ports port 21 # ftp acl Safe_ports port # https, snews 443 563 acl Safe_ports port 70 # gopher acl Safe_ports port 210 # wais acl Safe_ports port # unregistered ports 1936-65535 acl Safe_ports port 280 # http-mgmt acl Safe_ports port 488 # gss-http acl Safe_ports port 10000 acl Safe_ports port 631 acl Safe_ports port 901 # SWAT acl purge method PURGE acl CONNECT method CONNECT acl UTubeUsers proxy_auth "/squid/etc/utubeusers.list" acl RestrictUTube dstdom_regex -i youtube.com acl RestrictFacebook dstdom_regex -i facebook.com acl FacebookUsers proxy_auth "/squid/etc/facebookusers.list" acl BuemerKEC src 10.10.128.0/24 acl MBSsortnet src 10.10.128.0/26 acl MSNExplorer browser -i MSN acl Printers src <HIDDEN FOR SECURITY> acl SpecialFolks src <HIDDEN FOR SECURITY> # streaming download acl fails rep_mime_type ^.*mms.* acl fails rep_mime_type ^.*ms-hdr.* acl fails rep_mime_type ^.*x-fcs.* acl fails rep_mime_type ^.*x-ms-asf.* acl fails2 urlpath_regex dvrplayer mediastream mms:// acl fails2 urlpath_regex \.asf$ \.afx$ \.flv$ \.swf$ acl deny_rep_mime_flashvideo rep_mime_type -i video/flv acl deny_rep_mime_shockwave rep_mime_type -i ^application/x-shockwave-flash$ acl x-type req_mime_type -i ^application/octet-stream$ acl x-type req_mime_type -i application/octet-stream acl x-type req_mime_type -i ^application/x-mplayer2$ acl x-type req_mime_type -i application/x-mplayer2 acl x-type req_mime_type -i ^application/x-oleobject$ acl x-type req_mime_type -i application/x-oleobject acl x-type req_mime_type -i application/x-pncmd acl x-type req_mime_type -i ^video/x-ms-asf$ acl x-type2 rep_mime_type -i ^application/octet-stream$ acl x-type2 rep_mime_type -i application/octet-stream acl x-type2 rep_mime_type -i ^application/x-mplayer2$ acl x-type2 rep_mime_type -i application/x-mplayer2 acl x-type2 rep_mime_type -i ^application/x-oleobject$ acl x-type2 rep_mime_type -i application/x-oleobject acl x-type2 rep_mime_type -i application/x-pncmd acl x-type2 rep_mime_type -i ^video/x-ms-asf$ acl RestrictHulu dstdom_regex -i hulu.com acl broken dstdomain cms.montgomerycollege.edu events.columbiamochamber.com members.columbiamochamber.com public.genexusserver.com acl RestrictVimeo dstdom_regex -i vimeo.com acl http_port port 80 #http_reply_access deny deny_rep_mime_flashvideo #http_reply_access deny deny_rep_mime_shockwave #streaming files #http_access deny fails #http_reply_access deny fails #http_access deny fails2 #http_reply_access deny fails2 #http_access deny x-type #http_reply_access deny x-type #http_access deny x-type2 #http_reply_access deny x-type2 follow_x_forwarded_for allow localhost acl_uses_indirect_client on log_uses_indirect_client on http_access allow manager localhost http_access deny manager http_access allow purge localhost http_access deny purge http_access allow SpecialFolks http_access deny CONNECT !SSL_ports http_access allow whsws http_access allow opensites http_access deny BuemerKEC !MBSsortnet http_access deny BrkRm RestrictUTube RestrictFacebook RestrictVimeo http_access allow RestrictUTube UTubeUsers http_access deny RestrictUTube http_access allow RestrictFacebook FacebookUsers http_access deny RestrictFacebook http_access deny RestrictHulu http_access allow NoAuthNec http_access allow BrkRm http_access allow FacebookUsers RestrictVimeo http_access deny RestrictVimeo http_access allow Comps http_access allow Dials http_access allow Printers http_access allow password http_access deny !Safe_ports http_access deny SSL_ports !CONNECT http_access allow http_port http_access deny all http_reply_access allow all icp_access allow all access_log /squid/var/logs/access.log squid visible_hostname proxy.site.com forwarded_for off coredump_dir /squid/cache/ #header_access Accept-Encoding deny broken #acl snmppublic snmp_community mysecretcommunity #snmp_port 3401 #snmp_access allow snmppublic all cache_mem 3 GB #acl snmppublic snmp_community mbssquid #snmp_port 3401 #snmp_access allow snmppublic all

    Read the article

  • Agile Database Techniques: Effective Strategies for the Agile Software Developer – book review

    - by DigiMortal
       Agile development expects mind shift and developers are not the only ones who must be agile. Every chain is as strong as it’s weakest link and same goes also for development teams. Agile Database Techniques: Effective Strategies for the Agile Software Developer by Scott W. Ambler is book that calls also data professionals to be part of agile development. Often are DBA-s in situation where they are not part of application development and later they have to survive large set of applications that all use databases different way. Of course, only some of these applications are not problematic when looking what database server has to do to serve them. I have seen many applications that rape database servers because developers have no clue what is going on in database (~3K queries to database per web application request – have you seen something like this? I have…) Agile Database Techniques covers some object and database design technologies and gives suggestions to development teams about topics they need help or assistance by DBA-s. The book is also good reading for DBA-s who usually are not very strong in object technologies. You can take this book as bridge between these two worlds. I think teams that build object applications that use databases should buy this book and try at least one or two projects out with Ambler’s suggestions. Table of contents Foreword by Jon Kern. Foreword by Douglas K. Barry. Acknowledgments. Introduction. About the Author. Part One: Setting the Foundation. Chapter 1: The Agile Data Method. Chapter 2: From Use Cases to Databases — Real-World UML. Chapter 3: Data Modeling 101. Chapter 4: Data Normalization. Chapter 5: Class Normalization. Chapter 6: Relational Database Technology, Like It or Not. Chapter 7: The Object-Relational Impedance Mismatch. Chapter 8: Legacy Databases — Everything You Need to Know But Are Afraid to Deal With. Part Two: Evolutionary Database Development. Chapter 9: Vive L’ Évolution. Chapter 10: Agile Model-Driven Development (AMDD). Chapter 11: Test-Driven Development (TDD). Chapter 12: Database Refactoring. Chapter 13: Database Encapsulation Strategies. Chapter 14: Mapping Objects to Relational Databases. Chapter 15: Performance Tuning. Chapter 16: Tools for Evolutionary Database Development. Part Three: Practical Data-Oriented Development Techniques. Chapter 17: Implementing Concurrency Control. Chapter 18: Finding Objects in Relational Databases. Chapter 19: Implementing Referential Integrity and Shared Business Logic. Chapter 20: Implementing Security Access Control. Chapter 21: Implementing Reports. Chapter 22: Realistic XML. Part Four: Adopting Agile Database Techniques. Chapter 23: How You Can Become Agile. Chapter 24: Bringing Agility into Your Organization. Appendix: Database Refactoring Catalog. References and Suggested Reading. Index.

    Read the article

  • Value of SOA Specialization interview with Thomas Schaller IPT - part III

    - by Jürgen Kress
    Recognized by Oracle, Preferred by Customers. We had the great opportunity to interview Thomas Schaller – Partner from our SOA Specialized Partner IPT Innovation Process Technology from Switzerland Why did IPT decide to become SOA Specialized? " SOA Specialization is a great branding for IPT. We are the SOA Specialists in the Swiss market, as we focus all our services around SOA. With 65 Swiss consultants focused on SOA Security & SOA Testing & BPM – Business Process Management & BSM – Business Service Modeling the partnership with Oracle as the technology leader in SOA is key, therefore it was important to us to become the first SOA Specialized company in Switzerland. As a result IPT is mentioned by Gartner as one of eight European SOA Consulting Firms and included in „Guide to SOA Consulting and System Integration Service Providers“ Can you describe the marketing activities with Oracle? Once a year we organize the largest SOA Conference in Switzerland “SOA, BPM & Integration Forum 2011“ Oracle is much more than a sponsor for the conference. Jointly we invite our customer base to attend this key event. The sales teams address jointly their most important prospects and customers. Oracle supports us with key speakers who present future directions of the Oracle SOA portfolio like Clemens Utschig-Utschig who presented details about the Complex Event Processing (CEP) solution in 2009 and James Allerton-Austin who presented details about the social BPM solution (BPM) in 2010. Additional our key customers presented their Oracle SOA success stories. How did you team with Oracle around the sales activities? "Sales alignment is key for the successful partnership. When we achieved! SOA Specialization we celebrated jointly with the Oracle and IPT middleware sales team. At the Aperol may interesting discussions resulted in joint opportunities and business. A key section of our joint business planning are marketing and sales activities. Together we define campaign topics and target customers. Matthias Breitschmid our superb Oracle partner manager ensures that the defined sales teams align and start the joint business. Regular we review our joint business plan with the joint management teams and Jürgen Kress our EMEA Oracle Sponsor. It is great to see that both companies profit from each other and we receive leads from Oracle!” Did you get Oracle support to train your consultants in the Oracle SOA Suite? “Enablement is key for us to deliver successful SOA projects. Together with Ralph Bellinghausen from the Oracle Enablement team we defined an Oracle trainings plan for our consultants. The monthly SOA Partner Community newsletter is a great resource to get the latest product updates, webcasts and trainings. As a SOA Specialized partner we get also invited to the SOA Blackbelt trainings, this trainings are hosted by Oracle product management where we get not only first hand information we get also direct access to the developers who can support us in critical project phases. Driven by the customer success we have increased our Oracle SOA practice by more than 200% in the last years!” Why did the customer decide for the IPT SOA offering? “SOA Specialization becomes a brand for customers, it proofs that we have the certified SOA skills and that IPT has delivered successful Oracle SOA projects. Jointly with Oracle and all the support we get from marketing, sales, enablement, support and product management we can ensure our customers to deliver their SOA project successful!” What are the next steps for IPT? “SOA Specialization is a super beneficial for IPT. We are looking forward to our upcoming SOA, BPM & Integration Forum 2011 and prepare to become BPM Specialized. part I Torsten Winterberg, Opitz Consulting & part II Debra Lilley, Fujitsu For more information on SOA Specialization and the SOA Partner Community please feel free to register at www.oracle.com/goto/emea/soa (OPN account required) Blog Twitter LinkedIn Mix Forum Wiki Website

    Read the article

  • Windows Azure Use Case: Agility

    - by BuckWoody
    This is one in a series of posts on when and where to use a distributed architecture design in your organization's computing needs. You can find the main post here: http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx  Description: Agility in this context is defined as the ability to quickly develop and deploy an application. In theory, the speed at which your organization can develop and deploy an application on available hardware is identical to what you could deploy in a distributed environment. But in practice, this is not always the case. Having an option to use a distributed environment can be much faster for the deployment and even the development process. Implementation: When an organization designs code, they are essentially becoming a Software-as-a-Service (SaaS) provider to their own organization. To do that, the IT operations team becomes the Infrastructure-as-a-Service (IaaS) to the development teams. From there, the software is developed and deployed using an Application Lifecycle Management (ALM) process. A simplified view of an ALM process is as follows: Requirements Analysis Design and Development Implementation Testing Deployment to Production Maintenance In an on-premise environment, this often equates to the following process map: Requirements Business requirements formed by Business Analysts, Developers and Data Professionals. Analysis Feasibility studies, including physical plant, security, manpower and other resources. Request is placed on the work task list if approved. Design and Development Code written according to organization’s chosen methodology, either on-premise or to multiple development teams on and off premise. Implementation Code checked into main branch. Code forked as needed. Testing Code deployed to on-premise Testing servers. If no server capacity available, more resources procured through standard budgeting and ordering processes. Manual and automated functional, load, security, etc. performed. Deployment to Production Server team involved to select platform and environments with available capacity. If no server capacity available, standard budgeting and procurement process followed. If no server capacity available, systems built, configured and put under standard organizational IT control. Systems configured for proper operating systems, patches, security and virus scans. System maintenance, HA/DR, backups and recovery plans configured and put into place. Maintenance Code changes evaluated and altered according to need. In a distributed computing environment like Windows Azure, the process maps a bit differently: Requirements Business requirements formed by Business Analysts, Developers and Data Professionals. Analysis Feasibility studies, including budget, security, manpower and other resources. Request is placed on the work task list if approved. Design and Development Code written according to organization’s chosen methodology, either on-premise or to multiple development teams on and off premise. Implementation Code checked into main branch. Code forked as needed. Testing Code deployed to Azure. Manual and automated functional, load, security, etc. performed. Deployment to Production Code deployed to Azure. Point in time backup and recovery plans configured and put into place.(HA/DR and automated backups already present in Azure fabric) Maintenance Code changes evaluated and altered according to need. This means that several steps can be removed or expedited. It also means that the business function requesting the application can be held directly responsible for the funding of that request, speeding the process further since the IT budgeting process may not be involved in the Azure scenario. An additional benefit is the “Azure Marketplace”, In effect this becomes an app store for Enterprises to select pre-defined code and data applications to mesh or bolt-in to their current code, possibly saving development time. Resources: Whitepaper download- What is ALM?  http://go.microsoft.com/?linkid=9743693  Whitepaper download - ALM and Business Strategy: http://go.microsoft.com/?linkid=9743690  LiveMeeting Recording on ALM and Windows Azure (registration required, but free): http://www.microsoft.com/uk/msdn/visualstudio/contact-us.aspx?sbj=Developing with Windows Azure (ALM perspective) - 10:00-11:00 - 19th Jan 2011

    Read the article

  • Oracle Social Network Developer Challenge: Fishbowl Solutions

    - by Kellsey Ruppel
    Originally posted by Jake Kuramoto on The Apps Lab blog. Today, I give you the final entry in the Oracle Social Network Developer Challenge, held last week during OpenWorld. This one comes from Friend of the ‘Lab and Fishbowl Solutions (@fishbowle20) hacker, John Sim (@jrsim_uix), whom you might remember from his XBox Kinect demo at COLLABORATE 12 (presentation slides and abstract) hacks and other exploits with WebCenter. We put this challenge together specifically for developers like John, who like to experiment with new tools and push the envelope of what’s possible and build cool things, and as you can see from his entry John did just that, mashing together Google Maps and Oracle Social Network into a mobile app built with PhoneGap that uses the device’s camera and GPS to keep teams on the move in touch. He calls it a Mobile GeoTagging Solution, but I think Avengers Assemble! would have equally descriptive, given that was obviously his inspiration. Here’s his description of the mobile app: My proposed solution was to design and simplify GeoLocation mapping, and automate updates for users and teams on the move; who don’t have access to a laptop or want to take their ipads out – but allow them to make quick updates to OSN and upload photos taken from their mobile device – there and then. As part of this; the plan was to include a rules engine that could be configured by the user to allow the device to automatically update and post messages when they arrived at a set location(s). Inspiration for this came from on{x} – automate your life. Unfortunately, John didn’t make it to the conference to show off his hard work in person, but luckily, he had a colleague from Fishbowl and a video to showcase his work.    Here are some shots of John’s mobile app for your viewing pleasure: John’s thinking is sound. Geolocation is usually relegated to consumer use cases, thanks to services like foursquare, but distributed teams working on projects out in the world definitely need a way to stay in contact. Consider a construction job. Different contractors all converge on a single location, and time is money. Rather than calling or texting each other and risking a distracted driving accident, an app like John’s allows everyone on the job to see exactly where the other contractors are. Using his GPS rules, they could easily be notified about how close each is to the site, definitely useful when you have a flooring contractor sitting idle, waiting for an electrician to finish the wiring. The best part is that the project manager or general contractor could stay updated on all the action (or inaction) using Oracle Social Network, either sitting at a desk using the browser app or desktop client or on the go, using one of the native mobile apps built for Oracle Social Network. I can see this being used by insurance adjusters too, and really any team that, erm, assembles at a given spot. Of course, it’s also useful for meeting at the pub after the day’s work is done. Beyond people, this solution could also be implemented for physical objects that are in route to a destination. Say you’re a customer waiting on rail shipment or a package delivery. You could track your valuable’s whereabouts easily as they report their progress via checkins. If they deviated from the GPS rules, you’d be notified. You might even be able to get a picture into Oracle Social Network with some light hacking. Thanks to John and his colleagues at Fishbowl for participating in our challenge. We hope everyone had a good experience. Make sure to check out John’s blog post on his work and the experience using Oracle Social Network. Although this is the final, official entry we had, tomorrow, I’ll show you the work of someone who finished code, but wasn’t able to make the judging event. Stay tuned.

    Read the article

  • How to prepare for a programming competition? Graphs, Stacks, Trees, oh my! [closed]

    - by Simucal
    Last semester I attended ACM's (Association for Computing Machinery) bi-annual programming competition at a local University. My University sent 2 teams of 3 people and we competed amongst other schools in the mid-west. We got our butts kicked. You are given a packet with about 11 problems (1 problem per page) and you have 4 hours to solve as many as you can. They'll run your program you submit against a set of data and your output must match theirs exactly. In fact, the judging is automated for the most part. In any case.. I went there fairly confident in my programming skills and I left there feeling drained and weak. It was a terribly humbling experience. In 4 hours my team of 3 people completed only one of the problems. The top team completed 4 of them and took 1st place. The problems they asked were like no problems I have ever had to answer before. I later learned that in order to solve them some of them effectively you have to use graphs/graph algorithms, trees, stacks. Some of them were simply "greedy" algo's. My question is, how can I better prepare for this semesters programming competition so I don't leave there feeling like a complete moron? What tips do you have for me to be able to answer these problems that involve graphs, trees, various "well known" algorithms? How can I easily identify the algorithm we should implement for a given problem? I have yet to take Algorithm Design in school so I just feel a little out of my element. Here are some examples of the questions asked at the competitions: ACM Problem Sets Update: Just wanted to update this since the latest competition is over. My team placed 1st for our small region (about 6-7 universities with between 1-5 teams each school) and ~15th for the midwest! So, it is a marked improvement over last years performance for sure. We also had no graduate students on our team and after reviewing the rules we found out that many teams had several! So, that would be a pretty big advantage in my own opinion. Problems this semester ranged from about 1-2 "easy" problems (ie bit manipulation, string manipulation) to hard (graph problems involving fairly complex math and network flow problems). We were able to solve 4 problems in our 5 hours. Just wanted to thank everyone for the resources they provided here, we used them for our weekly team practices and it definitely helped! Some quick tips that I have that aren't suggested below: When you are seated at your computer before the competition starts, quickly type out various data structures that you might need that you won't have access to in your languages libraries. I typed out a Graph data-structure complete with floyd-warshall and dijkstra's algorithm before the competition began. We ended up using it in our 2nd problem that we solved and this is the main reason why we solved this problem before anyone else in the midwest. We had it ready to go from the beginning. Similarly, type out the code to read in a file since this will be required for every problem. Save this answer "template" someplace so you can quickly copy/paste it to your IDE at the beginning of each problem. There are no rules on programming anything before the competition starts so get any boilerplate code out the way. We found it useful to have one person who is on permanent whiteboard duty. This is usually the person who is best at math and at working out solutions to get a head start on future problems you will be doing. One person is on permanent programming duty. Your fastest/most skilled "programmer" (most familiar with the language). This will save debugging time also. The last person has several roles between assessing the packet of problems for the next "easiest" problem, helping the person on the whiteboard work out solutions and helping the person programming work out bugs/issues. This person needs to be flexible and be able to switch between roles easily.

    Read the article

  • KVM and JBoss Java Application Server

    - by Jason
    We have a large Xen deployment running on both RHEL and CentOS and have recently started looking at KVM since this is where it looks like the future of VM's are on Linux. We can load the server and get everything running without an issue. However when we load up a new one with JBoss (4.2 Community edition, Sun JDK 6) and load a large EAR the server goes a little crazy. The %sy will jump to 80-99% and just hang for large periods of time we see a similar jump in %us on the host machine. We though at first this might be I/O as it seems to happen at start of JBoss but then would "cool down" after everything got loaded. We did some tests by extracting some large tar.gz files and using jar -xvf on the ear but could not re-create. Then we starting thinking this might be some type of memory access issues. We loaded a c-program that would generate a lot of memory activity and sure enough we saw the spikes again. Not as high mind you but we did see it jump. We then wrote a small java program to do the same thing and sure enough we saw it jump again. Any thoughts on what might be causing this? Is this just the way KVM works? As a side note we do NOT see this behavior on any other setup. Xen, VMWare and/or native iron. The system does seem a bit slower than our Xen / VMware ones.

    Read the article

< Previous Page | 66 67 68 69 70 71 72 73 74 75 76 77  | Next Page >