what is faster - Page 182

Find the set of largest contiguous rectangles to cover multiple areas

- by joelpt

I'm working on a tool called Quickfort for the game Dwarf Fortress. Quickfort turns spreadsheets in csv/xls format into a series of commands for Dwarf Fortress to carry out in order to plot a "blueprint" within the game. I am currently trying to optimally solve an area-plotting problem for the 2.0 release of this tool. Consider the following "blueprint" which defines plotting commands for a 2-dimensional grid. Each cell in the grid should either be dug out ("d"), channeled ("c"), or left unplotted ("."). Any number of distinct plotting commands might be present in actual usage. . d . d c c d d d d c c . d d d . c d d d d d c . d . d d c To minimize the number of instructions that need to be sent to Dwarf Fortress, I would like to find the set of largest contiguous rectangles that can be formed to completely cover, or "plot", all of the plottable cells. To be valid, all of a given rectangle's cells must contain the same command. This is a faster approach than Quickfort 1.0 took: plotting every cell individually as a 1x1 rectangle. This video shows the performance difference between the two versions. For the above blueprint, the solution looks like this: . 9 . 0 3 2 8 1 1 1 3 2 . 1 1 1 . 2 7 1 1 1 4 2 . 6 . 5 4 2 Each same-numbered rectangle above denotes a contiguous rectangle. The largest rectangles take precedence over smaller rectangles that could also be formed in their areas. The order of the numbering/rectangles is unimportant. My current approach is iterative. In each iteration, I build a list of the largest rectangles that could be formed from each of the grid's plottable cells by extending in all 4 directions from the cell. After sorting the list largest first, I begin with the largest rectangle found, mark its underlying cells as "plotted", and record the rectangle in a list. Before plotting each rectangle, its underlying cells are checked to ensure they are not yet plotted (overlapping a previous plot). We then start again, finding the largest remaining rectangles that can be formed and plotting them until all cells have been plotted as part of some rectangle. I consider this approach slightly more optimized than a dumb brute-force search, but I am wasting a lot of cycles (re)calculating cells' largest rectangles and checking underlying cells' states. Currently, this rectangle-discovery routine takes the lion's share of the total runtime of the tool, especially for large blueprints. I have sacrificed some accuracy for the sake of speed by only considering rectangles from cells which appear to form a rectangle's corner (determined using some neighboring-cell heuristics which aren't always correct). As a result of this 'optimization', my current code doesn't actually generate the above solution correctly, but it's close enough. More broadly, I consider the goal of largest-rectangles-first to be a "good enough" approach for this application. However I observe that if the goal is instead to find the minimum set (fewest number) of rectangles to completely cover multiple areas, the solution would look like this instead: . 3 . 5 6 8 1 3 4 5 6 8 . 3 4 5 . 8 2 3 4 5 7 8 . 3 . 5 7 8 This second goal actually represents a more optimal solution to the problem, as fewer rectangles usually means fewer commands sent to Dwarf Fortress. However, this approach strikes me as closer to NP-Hard, based on my limited math knowledge. Watch the video if you'd like to better understand the overall strategy; I have not addressed other aspects of Quickfort's process, such as finding the shortest cursor-path that plots all rectangles. Possibly there is a solution to this problem that coherently combines these multiple strategies. Help of any form would be appreciated.

Read the article

Why is JSON outputting out of order?

- by dcp3450

I'm am trying to get a list of weather information for 8 locations. I'm using a weather API that accepts longitude and latitude and spits back json output with the weather info for that location. I feed the coords in order 0-7 but when json processes the data it comes back in a seemingly random order. I assume it's because some process faster than others and json is outputing what it gets back as it gets it. The output is correct, only the order is wrong. var loc = null; var body = ""; var campuses = new Array(8); campuses[0] = "34.47242,-84.42489,1"; campuses[1] = "33.81488,-84.62048,2"; campuses[2] = "34.27502,-84.46976,3"; campuses[3] = "33.92987,-84.55065,4"; campuses[4] = "34.03433,-84.46723,5"; campuses[5] = "34.08362,-84.67115,6"; campuses[6] = "33.91124,-84.82634,7"; campuses[7] = "34.10409,-84.51804,8"; function getWeather(campusArray) { body += '<p class="topTitle">Campus Weather</p>'; var cSplit = new Array(); cSplit = campusArray.split(','); var loc = "http://www.worldweatheronline.com/feed/weather.ashx?q="+cSplit[0]+","+cSplit[1]+"&format=json&num_of_days=2&key=0a05fff921162948110401&callback=?"; $('#content').html('asdf'); $.getJSON(loc,function(js) { var data = js.data; var humidity = data.current_condition[0].humidity; var tempF = data.current_condition[0].temp_F; var iconDESC = data.current_condition[0].weatherDesc[0].value; var iconURL = data.current_condition[0].weatherIconUrl[0].value; var windDir = data.current_condition[0].winddir16Point; var windSpeed = data.current_condition[0].windspeedMiles; var tempMaxF = data.weather[0].tempMaxF; var tempMinF = data.weather[0].tempMinF; body += '<p class="title">'+cSplit[2]+'</p>'+ '<span class="body">'+tempF+ ' '+windSpeed+ '<img src="'+iconURL+'" /></span>'; $('#content').html(body); }); } getWeather(campuses[0]); getWeather(campuses[1]); getWeather(campuses[2]); getWeather(campuses[3]); getWeather(campuses[4]); getWeather(campuses[5]); getWeather(campuses[6]); getWeather(campuses[7]); I have also tried it as $.ajax var loc = null; var body = ""; var campuses = new Array(8); campuses[0] = "34.47242,-84.42489,1"; campuses[1] = "33.81488,-84.62048,2"; campuses[2] = "34.27502,-84.46976,3"; campuses[3] = "33.92987,-84.55065,4"; campuses[4] = "34.03433,-84.46723,5"; campuses[5] = "34.08362,-84.67115,6"; campuses[6] = "33.91124,-84.82634,7"; campuses[7] = "34.10409,-84.51804,8"; function getWeather(campusArray) { body += '<p class="topTitle">Campus Weather</p>'; var cSplit = new Array(); cSplit = campusArray.split(','); var loc = "http://www.worldweatheronline.com/feed/weather.ashx?q="+cSplit[0]+","+cSplit[1]+"&format=json&num_of_days=2&key=0a05fff921162948110401&callback=?"; $.ajax({ url: loc, async: true, dataType: "json", success: function(js) { var data = js.data; var humidity = data.current_condition[0].humidity; var tempF = data.current_condition[0].temp_F; var iconDESC = data.current_condition[0].weatherDesc[0].value; var iconURL = data.current_condition[0].weatherIconUrl[0].value; var windDir = data.current_condition[0].winddir16Point; var windSpeed = data.current_condition[0].windspeedMiles; var tempMaxF = data.weather[0].tempMaxF; var tempMinF = data.weather[0].tempMinF; body += '<p class="title">'+cSplit[2]+'</p>'+ '<span class="body">'+tempF+ ' '+windSpeed+ '<img src="'+iconURL+'" /></span>'; $('#content').html(body); } }); } getWeather(campuses[0]); getWeather(campuses[1]); getWeather(campuses[2]); getWeather(campuses[3]); getWeather(campuses[4]); getWeather(campuses[5]); getWeather(campuses[6]); getWeather(campuses[7]); EDIT: example of json output: { "data": { "current_condition": [ {"cloudcover": "100", "humidity": "93", "observation_time": "04:04 PM", "precipMM": "0.0", "pressure": "1009", "temp_C": "2", "temp_F": "36", "visibility": "8", "weatherCode": "116", "weatherDesc": [ {"value": "Mist" } ], "weatherIconUrl": [ {"value": "http:\/\/www.worldweatheronline.com\/images\/wsymbols01_png_64\/wsymbol_0006_mist.png" } ], "winddir16Point": "WNW", "winddirDegree": "290", "windspeedKmph": "7", "windspeedMiles": "4" } ], "request": [ {"query": "Lat 34.47 and Lon -84.42", "type": "LatLon" } ], "weather": [ {"date": "2011-01-06", "precipMM": "9.3", "tempMaxC": "7", "tempMaxF": "45", "tempMinC": "2", "tempMinF": "35", "weatherCode": "113", "weatherDesc": [ {"value": "Sunny" } ], "weatherIconUrl": [ {"value": "http:\/\/www.worldweatheronline.com\/images\/wsymbols01_png_64\/wsymbol_0001_sunny.png" } ], "winddir16Point": "WNW", "winddirDegree": "293", "winddirection": "WNW", "windspeedKmph": "20", "windspeedMiles": "13" }, {"date": "2011-01-07", "precipMM": "0.0", "tempMaxC": "6", "tempMaxF": "44", "tempMinC": "0", "tempMinF": "31", "weatherCode": "116", "weatherDesc": [ {"value": "Partly Cloudy" } ], "weatherIconUrl": [ {"value": "http:\/\/www.worldweatheronline.com\/images\/wsymbols01_png_64\/wsymbol_0002_sunny_intervals.png" } ], "winddir16Point": "WNW", "winddirDegree": "286", "winddirection": "WNW", "windspeedKmph": "25", "windspeedMiles": "16" } ] }}

Read the article

Does my basic PHP Socket Server need optimization?

- by Tom

Like many people, I can do a lot of things with PHP. One problem I do face constantly is that other people can do it much cleaner, much more organized and much more structured. This also results in much faster execution times and much less bugs. I just finished writing a basic PHP Socket Server (the real core), and am asking you if you can tell me what I should do different before I start expanding the core. I'm not asking about improvements such as encrypted data, authentication or multi-threading. I'm more wondering about questions like "should I maybe do it in a more object oriented way (using PHP5)?", or "is the general structure of the way the script works good, or should some things be done different?". Basically, "is this how the core of a socket server should work?" In fact, I think that if I just show you the code here many of you will immediately see room for improvements. Please be so kind to tell me. Thanks! #!/usr/bin/php -q <? // config $timelimit = 180; // amount of seconds the server should run for, 0 = run indefintely $address = $_SERVER['SERVER_ADDR']; // the server's external IP $port = 9000; // the port to listen on $backlog = SOMAXCONN; // the maximum of backlog incoming connections that will be queued for processing // configure custom PHP settings error_reporting(1); // report all errors ini_set('display_errors', 1); // display all errors set_time_limit($timelimit); // timeout after x seconds ob_implicit_flush(); // results in a flush operation after every output call //create master IPv4 based TCP socket if (!($master = socket_create(AF_INET, SOCK_STREAM, SOL_TCP))) die("Could not create master socket, error: ".socket_strerror(socket_last_error())); // set socket options (local addresses can be reused) if (!socket_set_option($master, SOL_SOCKET, SO_REUSEADDR, 1)) die("Could not set socket options, error: ".socket_strerror(socket_last_error())); // bind to socket server if (!socket_bind($master, $address, $port)) die("Could not bind to socket server, error: ".socket_strerror(socket_last_error())); // start listening if (!socket_listen($master, $backlog)) die("Could not start listening to socket, error: ".socket_strerror(socket_last_error())); //display startup information echo "[".date('Y-m-d H:i:s')."] SERVER CREATED (MAXCONN: ".SOMAXCONN.").\n"; //max connections is a kernel variable and can be adjusted with sysctl echo "[".date('Y-m-d H:i:s')."] Listening on ".$address.":".$port.".\n"; $time = time(); //set startup timestamp // init read sockets array $read_sockets = array($master); // continuously handle incoming socket messages, or close if time limit has been reached while ((!$timelimit) or (time() - $time < $timelimit)) { $changed_sockets = $read_sockets; socket_select($changed_sockets, $write = null, $except = null, null); foreach($changed_sockets as $socket) { if ($socket == $master) { if (($client = socket_accept($master)) < 0) { echo "[".date('Y-m-d H:i:s')."] Socket_accept() failed, error: ".socket_strerror(socket_last_error())."\n"; continue; } else { array_push($read_sockets, $client); echo "[".date('Y-m-d H:i:s')."] Client #".count($read_sockets)." connected (connections: ".count($read_sockets)."/".SOMAXCONN.")\n"; } } else { $data = @socket_read($socket, 1024, PHP_NORMAL_READ); //read a maximum of 1024 bytes until a new line has been sent if ($data === false) { //the client disconnected $index = array_search($socket, $read_sockets); unset($read_sockets[$index]); socket_close($socket); echo "[".date('Y-m-d H:i:s')."] Client #".($index-1)." disconnected (connections: ".count($read_sockets)."/".SOMAXCONN.")\n"; } else { if ($data = trim($data)) { //remove whitespace and continue only if the message is not empty switch ($data) { case "exit": //close connection when exit command is given $index = array_search($socket, $read_sockets); unset($read_sockets[$index]); socket_close($socket); echo "[".date('Y-m-d H:i:s')."] Client #".($index-1)." disconnected (connections: ".count($read_sockets)."/".SOMAXCONN.")\n"; break; default: //for experimental purposes, write the given data back socket_write($socket, "\n you wrote: ".$data); } } } } } } socket_close($master); //close the socket echo "[".date('Y-m-d H:i:s')."] SERVER CLOSED.\n"; ?>

Read the article

SQL: Using a CASE Statement to update 1000 rows at once

- by SoLoGHoST

Ok, I would like to use a CASE STATEMENT for this, but I am lost with this. Basically, I need to update a ton of rows, but just on the "position" column. I need to update all "position" values from 0 - count(position) for each id_layout_position column per id_layout column. OK, here is a pic of what the table looks like: Now let's say I delete the circled row, this will remove position = 2 and give me: 0, 1, 3, 5, 6, 7, and 4. But I want to add something at the end now and make sure that it has the last possible position, but the positions are already messed up, so I need to reorder them like so before I insert the new row: 0, 1, 2, 3, 4, 5, 6. But it must be ordered by lowest first. So 0 stays at 0, 1 stays at 1, 3 gets changed to 2, the 4 at the end gets changed to a 3, 5 gets changed to 4, 6 gets changed to 5, and 7 gets changed to 6. Hopefully you guys get the picture now. I'm completely lost here. Also, note, this table is tiny compared to how fast it can grow in size, so it needs to be able to do this FAST, thus I was thinking on the CASE STATEMENT for an UPDATE QUERY. Here's what I got for a regular update, but I don't wanna throw this into a foreach loop, as it would take forever to do it. I'm using SMF (Simple Machines Forums), so it might look a little different, but the idea is the same, and CASE statements are supported... $smcFunc['db_query']('', ' UPDATE {db_prefix}dp_positions SET position = {int:position} WHERE id_layout_position = {int:id_layout_position} AND id_layout = {int:id_layout}', array( 'position' => $position++, 'id_layout_position' => (int) $id_layout_position, 'id_layout' => (int) $id_layout, ) ); Anyways, I need to apply some sort of CASE on this so that I can auto-increment by 1 all values that it finds and update to the next possible value. I know I'm doing this wrong, even in this QUERY. But I'm totally lost when it comes to CASES. Here's an example of a CASE being used within SMF, so you can see this and hopefully relate: $conditions = ''; foreach ($postgroups as $id => $min_posts) { $conditions .= ' WHEN posts >= ' . $min_posts . (!empty($lastMin) ? ' AND posts <= ' . $lastMin : '') . ' THEN ' . $id; $lastMin = $min_posts; } // A big fat CASE WHEN... END is faster than a zillion UPDATE's ;). $smcFunc['db_query']('', ' UPDATE {db_prefix}members SET id_post_group = CASE ' . $conditions . ' ELSE 0 END' . ($parameter1 != null ? ' WHERE ' . (is_array($parameter1) ? 'id_member IN ({array_int:members})' : 'id_member = {int:members}') : ''), array( 'members' => $parameter1, ) ); Before I do the update, I actually have a SELECT which throws everything I need into arrays like so: $disabled_sections = array(); $positions = array(); while ($row = $smcFunc['db_fetch_assoc']($request)) { if (!isset($disabled_sections[$row['id_group']][$row['id_layout']])) $disabled_sections[$row['id_group']][$row['id_layout']] = array( 'info' => $module_info[$name], 'id_layout_position' => $row['id_layout_position'] ); // Increment the positions... if (!is_null($row['position'])) { if (!isset($positions[$row['id_layout']][$row['id_layout_position']])) $positions[$row['id_layout']][$row['id_layout_position']] = 1; else $positions[$row['id_layout']][$row['id_layout_position']]++; } else $positions[$row['id_layout']][$row['id_layout_position']] = 0; } Thanks, I know if anyone can help me here it's definitely you guys and gals... Anyways, here is my question: How do I use a CASE statement in the first code example, so that I can update all of the rows in the position column from 0 - total # of rows found, that have that id_layout value and that id_layout_position value, and continue this for all different id_layout values in that table? Can I use the arrays above somehow? I'm sure I'll have to use the id_layout and id_layout_position values for this right? But how can I do this? Ok, guy, I get an error, saying "Hacking Attempt" with the following code: // Updating all positions in here. $smcFunc['db_query']('', ' SET @pos = 0; UPDATE {db_prefix}dp_positions SET position=@pos:=@pos+1 ORDER BY id_layout_position, position', array( ) ); Am I doing something wrong? Perhaps SMF has safeguards against this approach?? Perhaps I need to use a CASE STATEMENT instead?

Read the article

$.post is not working

- by BEBO

i am trying to post data to Mysql using jquery $.post and php page. my code is not running and nothing is added to the mysql table. I am not sure if the path i am creating is wrong but any help would be appreciated. Jquery location: f_js/tasks/TaskTest.js <script type="text/javascript"> $(document).ready(function(){ $("#AddTask").click(function(){ var acct = $('#acct').val(); var quicktask = $('#quicktask').val(); var user = $('#user').val(); $.post('addTask.php',{acct:acct,quicktask:quicktask,user:user}, function(data){ $('#result').fadeIn('slow').html(data); }); }); }); </script> addTask.php (runs the jqeury code) <?php include 'dbconnect.php'; include 'sessions.php'; $acct = $_POST['acct']; $task = $_POST['quicktask']; $taskstatus = 'Active'; //get task Creator $user = $_POST['user']; //query task creator from users table $allusers = mysql_query("SELECT * FROM users WHERE username = '$user'"); while ($rows = mysql_fetch_array($allusers)) { //get first and last name for task creator $taskOwner = $rows['user_firstname']; $taskOwnerLast = $rows['user_lastname']; $taskOwnerFull = $taskOwner." ".$taskOwnerLast; mysql_query("INSERT INTO tasks (taskresource, tasktitle, taskdetail, taskstatus, taskowner, taskOwnerFullName) VALUES ('$acct', '$task', '$task', '$taskstatus', '$user', '$taskOwnerFull' )"); echo "inserted"; } ?> Accountview.php finally the front page <html> <div class="input-cont "> <input type="text" class="form-control col-lg-12" placeholder="Add a quick Task..." name ="quicktask" id="quicktask"> </div> <div class="form-group"> <div class="pull-right chat-features"> <a href="javascript:;"> <i class="icon-camera"></i> </a> <a href="javascript:;"> <i class="icon-link"></i> </a> <input type="button" class="btn btn-danger" name="AddTask" id="AddTask" value="Add" /> <input type="hidden" name="acct" id="acct" value="<?php echo $_REQUEST['acctname']?>"/> <input type="hidden" name="user" id="user" value="<?php $username = $_SESSION['username']; echo $username?>"/> <div id="result">result</div> </div> </div>  <script src="js/jquery.js"></script> <script src="f_js/tasks/TaskTest.js"></script>  <script src="js/common-scripts.js"></script> <script type="text/javascript" src="assets/gritter/js/jquery.gritter.js"></script> <script src="js/gritter.js" type="text/javascript"></script> <script> </html> Firebug reponse: Response Headers Connection Keep-Alive Content-Length 0 Content-Type text/html Date Fri, 08 Nov 2013 21:48:50 GMT Keep-Alive timeout=5, max=100 Server Apache/2.4.4 (Win32) OpenSSL/0.9.8y PHP/5.4.16 X-Powered-By PHP/5.4.16 refresh 5; URL=index.php Request Headers Accept */* Accept-Encoding gzip, deflate Accept-Language en-US,en;q=0.5 Content-Length 13 Content-Type application/x-www-form-urlencoded; charset=UTF-8 Cookie PHPSESSID=6gufl3guiiddreg8cdlc0htnc6 Host localhost Referer http://localhost/betahtml/AccountView.php?acctname=client%201 User-Agent Mozilla/5.0 (Windows NT 6.3; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0 X-Requested-With XMLHttpRequest

Read the article

Accessing local variable doesn't improve performance

- by NicMagnier

The short version Why is this code: var index = (Math.floor(y / scale) * img.width + Math.floor(x / scale)) * 4; More performant than this one? var index = Math.floor(ref_index) * 4; The long version This week, the author of Impact js published an article about some rendering issue: http://www.phoboslab.org/log/2012/09/drawing-pixels-is-hard In the article there was the source of a function to scale an image by accessing pixels in the canvas. I wanted to suggest some traditional ways to optimize this kind of code so that the scaling would be shorter at loading time. But after testing it my result was most of the time worst that the original function. Guessing this was the JavaScript engine that was doing some smart optimization I tried to understand a bit more what was going on so I did a bunch of test. But my results are quite confusing and I would need some help to understand what's going on. I have a test page here: http://www.mx981.com/stuff/resize_bench/test.html jsPerf: http://jsperf.com/local-variable-due-to-the-scope-lookup To start the test, click the picture and the results will appear in the console. There are three different versions: The original code: for( var y = 0; y < heightScaled; y++ ) { for( var x = 0; x < widthScaled; x++ ) { var index = (Math.floor(y / scale) * img.width + Math.floor(x / scale)) * 4; var indexScaled = (y * widthScaled + x) * 4; scaledPixels.data[ indexScaled ] = origPixels.data[ index ]; scaledPixels.data[ indexScaled+1 ] = origPixels.data[ index+1 ]; scaledPixels.data[ indexScaled+2 ] = origPixels.data[ index+2 ]; scaledPixels.data[ indexScaled+3 ] = origPixels.data[ index+3 ]; } } jsPerf: http://jsperf.com/so-accessing-local-variable-doesn-t-improve-performance One of my attempt to optimize it: var ref_index = 0; var ref_indexScaled = 0 var ref_step = 1 / scale; for( var y = 0; y < heightScaled; y++ ) { for( var x = 0; x < widthScaled; x++ ) { var index = Math.floor(ref_index) * 4; scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index ]; scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+1 ]; scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+2 ]; scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+3 ]; ref_index+= ref_step; } } jsPerf: http://jsperf.com/so-accessing-local-variable-doesn-t-improve-performance The same optimized code but with recalculating the index variable each time (Hybrid) var ref_index = 0; var ref_indexScaled = 0 var ref_step = 1 / scale; for( var y = 0; y < heightScaled; y++ ) { for( var x = 0; x < widthScaled; x++ ) { var index = (Math.floor(y / scale) * img.width + Math.floor(x / scale)) * 4; scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index ]; scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+1 ]; scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+2 ]; scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+3 ]; ref_index+= ref_step; } } jsPerf: http://jsperf.com/so-accessing-local-variable-doesn-t-improve-performance The only difference in the two last one is the calculation of the 'index' variable. And to my surprise the optimized version is slower in most browsers (except opera). Results of personal testing (not the jsPerf tests): Opera Original: 8668ms Optimized: 932ms Hybrid: 8696ms Chrome Original: 139ms Optimized: 145ms Hybrid: 136ms Safari Original: 433ms Optimized: 853ms Hybrid: 451ms Firefox Original: 343ms Optimized: 422ms Hybrid: 350ms After digging around, it seems an usual good practice is to access mainly local variable due to the scope lookup. Because The optimized version only call one local variable it should be faster that the Hybrid code which call multiple variable and object in addition to the various operation involved. So why the "optimized" version is slower? I thought that it might be because some JavaScript engine don't optimize the Optimized version because it is not hot enough but after using --trace-opt in chrome, it seems all version are properly compiled by V8. At this point I am a bit clueless and wonder if somebody would know what is going on? I did also some more test cases in this page: http://www.mx981.com/stuff/resize_bench/index.html

Read the article

mysql: Bind on unix socket: Permission denied

- by Alex

Can't start mysql with: sudo /usr/bin/mysqld_safe --datadir=/srv/mysql/myDB --log-error=/srv/mysql/logs/mysqld-myDB.log --pid-file=/srv/mysql/pids/mysqld-myDB.pid --user=mysql --socket=/srv/mysql/sockets/mysql-myDB.sock --port=3700 120222 13:40:48 mysqld_safe Starting mysqld daemon with databases from /srv/mysql/myDB 120222 13:40:54 mysqld_safe mysqld from pid file /srv/mysql/pids/mysqld-myDB.pid ended /srv/mysql/logs/mysqld-myDB.log: 120222 13:43:53 mysqld_safe Starting mysqld daemon with databases from /srv/mysql/myDB 120222 13:43:53 [Note] Plugin 'FEDERATED' is disabled. /usr/sbin/mysqld: Table 'plugin' is read only 120222 13:43:53 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it. 120222 13:43:53 InnoDB: Completed initialization of buffer pool 120222 13:43:53 InnoDB: Started; log sequence number 32 4232720908 120222 13:43:53 [ERROR] Can't start server : Bind on unix socket: Permission denied 120222 13:43:53 [ERROR] Do you already have another mysqld server running on socket: /srv/mysql/sockets/mysql-myDB.sock ? 120222 13:43:53 [ERROR] Aborting 120222 13:43:53 InnoDB: Starting shutdown... One instance mysqld is running: $ ps aux | grep mysql mysql 1093 0.0 0.2 169972 18700 ? Ssl 11:50 0:02 /usr/sbin/mysqld $ Port 3700 is available: $ netstat -a | grep 3700 $ Directory with sockets is empty: $ ls /srv/mysql/sockets/ $ There are all permissions: $ ls -l /srv/mysql/ total 20 drwxrwxrwx 2 mysql mysql 4096 2012-02-22 13:28 logs drwxrwxrwx 13 mysql mysql 4096 2012-02-22 13:44 myDB drwxrwxrwx 2 mysql mysql 4096 2012-02-22 12:55 pids drwxrwxrwx 2 mysql mysql 4096 2012-02-22 12:55 sockets drwxrwxrwx 2 mysql mysql 4096 2012-02-22 13:25 version Apparmor config: $cat /etc/apparmor.d/usr.sbin.mysqld # vim:syntax=apparmor # Last Modified: Tue Jun 19 17:37:30 2007 #include <tunables/global> /usr/sbin/mysqld flags=(complain) { #include <abstractions/base> #include <abstractions/nameservice> #include <abstractions/user-tmp> #include <abstractions/mysql> #include <abstractions/winbind> capability dac_override, capability sys_resource, capability setgid, capability setuid, network tcp, /etc/hosts.allow r, /etc/hosts.deny r, /etc/mysql/*.pem r, /etc/mysql/conf.d/ r, /etc/mysql/conf.d/* r, /etc/mysql/*.cnf r, /usr/lib/mysql/plugin/ r, /usr/lib/mysql/plugin/*.so* mr, /usr/sbin/mysqld mr, /usr/share/mysql/** r, /var/log/mysql.log rw, /var/log/mysql.err rw, /var/lib/mysql/ r, /var/lib/mysql/** rwk, /var/log/mysql/ r, /var/log/mysql/* rw, /{,var/}run/mysqld/mysqld.pid w, /{,var/}run/mysqld/mysqld.sock w, /srv/mysql/ r, /srv/mysql/** rwk, /sys/devices/system/cpu/ r, # Site-specific additions and overrides. See local/README for details. #include <local/usr.sbin.mysqld> } Any suggestions? UPD1: $ touch /srv/mysql/sockets/mysql-myDB.sock $ sudo chown mysql:mysql /srv/mysql/sockets/mysql-myDB.sock $ ls -l /srv/mysql/sockets/mysql-myDB.sock -rw-rw-r-- 1 mysql mysql 0 2012-02-22 14:29 /srv/mysql/sockets/mysql-myDB.sock $ sudo /usr/bin/mysqld_safe --datadir=/srv/mysql/myDB --log-error=/srv/mysql/logs/mysqld-myDB.log --pid-file=/srv/mysql/pids/mysqld-myDB.pid --user=mysql --socket=/srv/mysql/sockets/mysql-myDB.sock --port=3700 120222 14:30:18 mysqld_safe Can't log to error log and syslog at the same time. Remove all --log-error configuration options for --syslog to take effect. 120222 14:30:18 mysqld_safe Logging to '/srv/mysql/logs/mysqld-myDB.log'. 120222 14:30:18 mysqld_safe Starting mysqld daemon with databases from /srv/mysqlmyDB 120222 14:30:24 mysqld_safe mysqld from pid file /srv/mysql/pids/mysqld-myDB.pid ended $ ls -l /srv/mysql/sockets/mysql-myDB.sock ls: cannot access /srv/mysql/sockets/mysql-myDB.sock: No such file or directory $ UPD2: $ sudo netstat -lnp | grep mysql tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 1093/mysqld unix 2 [ ACC ] STREAM LISTENING 5912 1093/mysqld /var/run/mysqld/mysqld.sock $ sudo lsof | grep /srv/mysql/sockets/mysql-myDB.sock lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system /home/sears/.gvfs Output information may be incomplete. UPD3: $ cat /etc/mysql/my.cnf # # The MySQL database server configuration file. # # You can copy this to one of: # - "/etc/mysql/my.cnf" to set global options, # - "~/.my.cnf" to set user-specific options. # # One can use all long options that the program supports. # Run program with --help to get a list of available options and with # --print-defaults to see which it would actually understand and use. # # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html # This will be passed to all mysql clients # It has been reported that passwords should be enclosed with ticks/quotes # escpecially if they contain "#" chars... # Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock # Here is entries for some specific programs # The following values assume you have at least 32M ram # This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] # # * Basic Settings # # # * IMPORTANT # If you make changes to these settings and your system uses apparmor, you may # also need to also adjust /etc/apparmor.d/usr.sbin.mysqld. # user = mysql socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. #bind-address = 127.0.0.1 # # * Fine Tuning # key_buffer = 16M max_allowed_packet = 16M thread_stack = 192K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP #max_connections = 100 #table_cache = 64 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 16M # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. # As of 5.1 you can enable the log at runtime! #general_log_file = /var/log/mysql/mysql.log #general_log = 1 log_error = /var/log/mysql/error.log # Here you can see queries with especially long duration #log_slow_queries = /var/log/mysql/mysql-slow.log #long_query_time = 2 #log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. #server-id = 1 #log_bin = /var/log/mysql/mysql-bin.log expire_logs_days = 10 max_binlog_size = 100M #binlog_do_db = include_database_name #binlog_ignore_db = include_database_name # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 16M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 16M # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/

Read the article

Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article

A Taxonomy of Numerical Methods v1

- by JoshReuben

Numerical Analysis – When, What, (but not how) Once you understand the Math & know C++, Numerical Methods are basically blocks of iterative & conditional math code. I found the real trick was seeing the forest for the trees – knowing which method to use for which situation. Its pretty easy to get lost in the details – so I’ve tried to organize these methods in a way that I can quickly look this up. I’ve included links to detailed explanations and to C++ code examples. I’ve tried to classify Numerical methods in the following broad categories: Solving Systems of Linear Equations Solving Non-Linear Equations Iteratively Interpolation Curve Fitting Optimization Numerical Differentiation & Integration Solving ODEs Boundary Problems Solving EigenValue problems Enjoy – I did ! Solving Systems of Linear Equations Overview Solve sets of algebraic equations with x unknowns The set is commonly in matrix form Gauss-Jordan Elimination http://en.wikipedia.org/wiki/Gauss%E2%80%93Jordan_elimination C++: http://www.codekeep.net/snippets/623f1923-e03c-4636-8c92-c9dc7aa0d3c0.aspx Produces solution of the equations & the coefficient matrix Efficient, stable 2 steps: · Forward Elimination – matrix decomposition: reduce set to triangular form (0s below the diagonal) or row echelon form. If degenerate, then there is no solution · Backward Elimination –write the original matrix as the product of ints inverse matrix & its reduced row-echelon matrix à reduce set to row canonical form & use back-substitution to find the solution to the set Elementary ops for matrix decomposition: · Row multiplication · Row switching · Add multiples of rows to other rows Use pivoting to ensure rows are ordered for achieving triangular form LU Decomposition http://en.wikipedia.org/wiki/LU_decomposition C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-lu-decomposition-for-solving.html Represent the matrix as a product of lower & upper triangular matrices A modified version of GJ Elimination Advantage – can easily apply forward & backward elimination to solve triangular matrices Techniques: · Doolittle Method – sets the L matrix diagonal to unity · Crout Method - sets the U matrix diagonal to unity Note: both the L & U matrices share the same unity diagonal & can be stored compactly in the same matrix Gauss-Seidel Iteration http://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method C++: http://www.nr.com/forum/showthread.php?t=722 Transform the linear set of equations into a single equation & then use numerical integration (as integration formulas have Sums, it is implemented iteratively). an optimization of Gauss-Jacobi: 1.5 times faster, requires 0.25 iterations to achieve the same tolerance Solving Non-Linear Equations Iteratively find roots of polynomials – there may be 0, 1 or n solutions for an n order polynomial use iterative techniques Iterative methods · used when there are no known analytical techniques · Requires set functions to be continuous & differentiable · Requires an initial seed value – choice is critical to convergence à conduct multiple runs with different starting points & then select best result · Systematic - iterate until diminishing returns, tolerance or max iteration conditions are met · bracketing techniques will always yield convergent solutions, non-bracketing methods may fail to converge Incremental method if a nonlinear function has opposite signs at 2 ends of a small interval x1 & x2, then there is likely to be a solution in their interval – solutions are detected by evaluating a function over interval steps, for a change in sign, adjusting the step size dynamically. Limitations – can miss closely spaced solutions in large intervals, cannot detect degenerate (coinciding) solutions, limited to functions that cross the x-axis, gives false positives for singularities Fixed point method http://en.wikipedia.org/wiki/Fixed-point_iteration C++: http://books.google.co.il/books?id=weYj75E_t6MC&pg=PA79&lpg=PA79&dq=fixed+point+method++c%2B%2B&source=bl&ots=LQ-5P_taoC&sig=lENUUIYBK53tZtTwNfHLy5PEWDk&hl=en&sa=X&ei=wezDUPW1J5DptQaMsIHQCw&redir_esc=y#v=onepage&q=fixed%20point%20method%20%20c%2B%2B&f=false Algebraically rearrange a solution to isolate a variable then apply incremental method Bisection method http://en.wikipedia.org/wiki/Bisection_method C++: http://numericalcomputing.wordpress.com/category/algorithms/ Bracketed - Select an initial interval, keep bisecting it ad midpoint into sub-intervals and then apply incremental method on smaller & smaller intervals – zoom in Adv: unaffected by function gradient à reliable Disadv: slow convergence False Position Method http://en.wikipedia.org/wiki/False_position_method C++: http://www.dreamincode.net/forums/topic/126100-bisection-and-false-position-methods/ Bracketed - Select an initial interval , & use the relative value of function at interval end points to select next sub-intervals (estimate how far between the end points the solution might be & subdivide based on this) Newton-Raphson method http://en.wikipedia.org/wiki/Newton's_method C++: http://www-users.cselabs.umn.edu/classes/Summer-2012/csci1113/index.php?page=./newt3 Also known as Newton's method Convenient, efficient Not bracketed – only a single initial guess is required to start iteration – requires an analytical expression for the first derivative of the function as input. Evaluates the function & its derivative at each step. Can be extended to the Newton MutiRoot method for solving multiple roots Can be easily applied to an of n-coupled set of non-linear equations – conduct a Taylor Series expansion of a function, dropping terms of order n, rewrite as a Jacobian matrix of PDs & convert to simultaneous linear equations !!! Secant Method http://en.wikipedia.org/wiki/Secant_method C++: http://forum.vcoderz.com/showthread.php?p=205230 Unlike N-R, can estimate first derivative from an initial interval (does not require root to be bracketed) instead of inputting it Since derivative is approximated, may converge slower. Is fast in practice as it does not have to evaluate the derivative at each step. Similar implementation to False Positive method Birge-Vieta Method http://mat.iitm.ac.in/home/sryedida/public_html/caimna/transcendental/polynomial%20methods/bv%20method.html C++: http://books.google.co.il/books?id=cL1boM2uyQwC&pg=SA3-PA51&lpg=SA3-PA51&dq=Birge-Vieta+Method+c%2B%2B&source=bl&ots=QZmnDTK3rC&sig=BPNcHHbpR_DKVoZXrLi4nVXD-gg&hl=en&sa=X&ei=R-_DUK2iNIjzsgbE5ID4Dg&redir_esc=y#v=onepage&q=Birge-Vieta%20Method%20c%2B%2B&f=false combines Horner's method of polynomial evaluation (transforming into lesser degree polynomials that are more computationally efficient to process) with Newton-Raphson to provide a computational speed-up Interpolation Overview Construct new data points for as close as possible fit within range of a discrete set of known points (that were obtained via sampling, experimentation) Use Taylor Series Expansion of a function f(x) around a specific value for x Linear Interpolation http://en.wikipedia.org/wiki/Linear_interpolation C++: http://www.hamaluik.com/?p=289 Straight line between 2 points à concatenate interpolants between each pair of data points Bilinear Interpolation http://en.wikipedia.org/wiki/Bilinear_interpolation C++: http://supercomputingblog.com/graphics/coding-bilinear-interpolation/2/ Extension of the linear function for interpolating functions of 2 variables – perform linear interpolation first in 1 direction, then in another. Used in image processing – e.g. texture mapping filter. Uses 4 vertices to interpolate a value within a unit cell. Lagrange Interpolation http://en.wikipedia.org/wiki/Lagrange_polynomial C++: http://www.codecogs.com/code/maths/approximation/interpolation/lagrange.php For polynomials Requires recomputation for all terms for each distinct x value – can only be applied for small number of nodes Numerically unstable Barycentric Interpolation http://epubs.siam.org/doi/pdf/10.1137/S0036144502417715 C++: http://www.gamedev.net/topic/621445-barycentric-coordinates-c-code-check/ Rearrange the terms in the equation of the Legrange interpolation by defining weight functions that are independent of the interpolated value of x Newton Divided Difference Interpolation http://en.wikipedia.org/wiki/Newton_polynomial C++: http://jee-appy.blogspot.co.il/2011/12/newton-divided-difference-interpolation.html Hermite Divided Differences: Interpolation polynomial approximation for a given set of data points in the NR form - divided differences are used to approximately calculate the various differences. For a given set of 3 data points , fit a quadratic interpolant through the data Bracketed functions allow Newton divided differences to be calculated recursively Difference table Cubic Spline Interpolation http://en.wikipedia.org/wiki/Spline_interpolation C++: https://www.marcusbannerman.co.uk/index.php/home/latestarticles/42-articles/96-cubic-spline-class.html Spline is a piecewise polynomial Provides smoothness – for interpolations with significantly varying data Use weighted coefficients to bend the function to be smooth & its 1st & 2nd derivatives are continuous through the edge points in the interval Curve Fitting A generalization of interpolating whereby given data points may contain noise à the curve does not necessarily pass through all the points Least Squares Fit http://en.wikipedia.org/wiki/Least_squares C++: http://www.ccas.ru/mmes/educat/lab04k/02/least-squares.c Residual – difference between observed value & expected value Model function is often chosen as a linear combination of the specified functions Determines: A) The model instance in which the sum of squared residuals has the least value B) param values for which model best fits data Straight Line Fit Linear correlation between independent variable and dependent variable Linear Regression http://en.wikipedia.org/wiki/Linear_regression C++: http://www.oocities.org/david_swaim/cpp/linregc.htm Special case of statistically exact extrapolation Leverage least squares Given a basis function, the sum of the residuals is determined and the corresponding gradient equation is expressed as a set of normal linear equations in matrix form that can be solved (e.g. using LU Decomposition) Can be weighted - Drop the assumption that all errors have the same significance –-> confidence of accuracy is different for each data point. Fit the function closer to points with higher weights Polynomial Fit - use a polynomial basis function Moving Average http://en.wikipedia.org/wiki/Moving_average C++: http://www.codeproject.com/Articles/17860/A-Simple-Moving-Average-Algorithm Used for smoothing (cancel fluctuations to highlight longer-term trends & cycles), time series data analysis, signal processing filters Replace each data point with average of neighbors. Can be simple (SMA), weighted (WMA), exponential (EMA). Lags behind latest data points – extra weight can be given to more recent data points. Weights can decrease arithmetically or exponentially according to distance from point. Parameters: smoothing factor, period, weight basis Optimization Overview Given function with multiple variables, find Min (or max by minimizing –f(x)) Iterative approach Efficient, but not necessarily reliable Conditions: noisy data, constraints, non-linear models Detection via sign of first derivative - Derivative of saddle points will be 0 Local minima Bisection method Similar method for finding a root for a non-linear equation Start with an interval that contains a minimum Golden Search method http://en.wikipedia.org/wiki/Golden_section_search C++: http://www.codecogs.com/code/maths/optimization/golden.php Bisect intervals according to golden ratio 0.618.. Achieves reduction by evaluating a single function instead of 2 Newton-Raphson Method Brent method http://en.wikipedia.org/wiki/Brent's_method C++: http://people.sc.fsu.edu/~jburkardt/cpp_src/brent/brent.cpp Based on quadratic or parabolic interpolation – if the function is smooth & parabolic near to the minimum, then a parabola fitted through any 3 points should approximate the minima – fails when the 3 points are collinear , in which case the denominator is 0 Simplex Method http://en.wikipedia.org/wiki/Simplex_algorithm C++: http://www.codeguru.com/cpp/article.php/c17505/Simplex-Optimization-Algorithm-and-Implemetation-in-C-Programming.htm Find the global minima of any multi-variable function Direct search – no derivatives required At each step it maintains a non-degenerative simplex – a convex hull of n+1 vertices. Obtains the minimum for a function with n variables by evaluating the function at n-1 points, iteratively replacing the point of worst result with the point of best result, shrinking the multidimensional simplex around the best point. Point replacement involves expanding & contracting the simplex near the worst value point to determine a better replacement point Oscillation can be avoided by choosing the 2nd worst result Restart if it gets stuck Parameters: contraction & expansion factors Simulated Annealing http://en.wikipedia.org/wiki/Simulated_annealing C++: http://code.google.com/p/cppsimulatedannealing/ Analogy to heating & cooling metal to strengthen its structure Stochastic method – apply random permutation search for global minima - Avoid entrapment in local minima via hill climbing Heating schedule - Annealing schedule params: temperature, iterations at each temp, temperature delta Cooling schedule – can be linear, step-wise or exponential Differential Evolution http://en.wikipedia.org/wiki/Differential_evolution C++: http://www.amichel.com/de/doc/html/ More advanced stochastic methods analogous to biological processes: Genetic algorithms, evolution strategies Parallel direct search method against multiple discrete or continuous variables Initial population of variable vectors chosen randomly – if weighted difference vector of 2 vectors yields a lower objective function value then it replaces the comparison vector Many params: #parents, #variables, step size, crossover constant etc Convergence is slow – many more function evaluations than simulated annealing Numerical Differentiation Overview 2 approaches to finite difference methods: · A) approximate function via polynomial interpolation then differentiate · B) Taylor series approximation – additionally provides error estimate Finite Difference methods http://en.wikipedia.org/wiki/Finite_difference_method C++: http://www.wpi.edu/Pubs/ETD/Available/etd-051807-164436/unrestricted/EAMPADU.pdf Find differences between high order derivative values - Approximate differential equations by finite differences at evenly spaced data points Based on forward & backward Taylor series expansion of f(x) about x plus or minus multiples of delta h. Forward / backward difference - the sums of the series contains even derivatives and the difference of the series contains odd derivatives – coupled equations that can be solved. Provide an approximation of the derivative within a O(h^2) accuracy There is also central difference & extended central difference which has a O(h^4) accuracy Richardson Extrapolation http://en.wikipedia.org/wiki/Richardson_extrapolation C++: http://mathscoding.blogspot.co.il/2012/02/introduction-richardson-extrapolation.html A sequence acceleration method applied to finite differences Fast convergence, high accuracy O(h^4) Derivatives via Interpolation Cannot apply Finite Difference method to discrete data points at uneven intervals – so need to approximate the derivative of f(x) using the derivative of the interpolant via 3 point Lagrange Interpolation Note: the higher the order of the derivative, the lower the approximation precision Numerical Integration Estimate finite & infinite integrals of functions More accurate procedure than numerical differentiation Use when it is not possible to obtain an integral of a function analytically or when the function is not given, only the data points are Newton Cotes Methods http://en.wikipedia.org/wiki/Newton%E2%80%93Cotes_formulas C++: http://www.siafoo.net/snippet/324 For equally spaced data points Computationally easy – based on local interpolation of n rectangular strip areas that is piecewise fitted to a polynomial to get the sum total area Evaluate the integrand at n+1 evenly spaced points – approximate definite integral by Sum Weights are derived from Lagrange Basis polynomials Leverage Trapezoidal Rule for default 2nd formulas, Simpson 1/3 Rule for substituting 3 point formulas, Simpson 3/8 Rule for 4 point formulas. For 4 point formulas use Bodes Rule. Higher orders obtain more accurate results Trapezoidal Rule uses simple area, Simpsons Rule replaces the integrand f(x) with a quadratic polynomial p(x) that uses the same values as f(x) for its end points, but adds a midpoint Romberg Integration http://en.wikipedia.org/wiki/Romberg's_method C++: http://code.google.com/p/romberg-integration/downloads/detail?name=romberg.cpp&can=2&q= Combines trapezoidal rule with Richardson Extrapolation Evaluates the integrand at equally spaced points The integrand must have continuous derivatives Each R(n,m) extrapolation uses a higher order integrand polynomial replacement rule (zeroth starts with trapezoidal) à a lower triangular matrix set of equation coefficients where the bottom right term has the most accurate approximation. The process continues until the difference between 2 successive diagonal terms becomes sufficiently small. Gaussian Quadrature http://en.wikipedia.org/wiki/Gaussian_quadrature C++: http://www.alglib.net/integration/gaussianquadratures.php Data points are chosen to yield best possible accuracy – requires fewer evaluations Ability to handle singularities, functions that are difficult to evaluate The integrand can include a weighting function determined by a set of orthogonal polynomials. Points & weights are selected so that the integrand yields the exact integral if f(x) is a polynomial of degree <= 2n+1 Techniques (basically different weighting functions): · Gauss-Legendre Integration w(x)=1 · Gauss-Laguerre Integration w(x)=e^-x · Gauss-Hermite Integration w(x)=e^-x^2 · Gauss-Chebyshev Integration w(x)= 1 / Sqrt(1-x^2) Solving ODEs Use when high order differential equations cannot be solved analytically Evaluated under boundary conditions RK for systems – a high order differential equation can always be transformed into a coupled first order system of equations Euler method http://en.wikipedia.org/wiki/Euler_method C++: http://rosettacode.org/wiki/Euler_method First order Runge–Kutta method. Simple recursive method – given an initial value, calculate derivative deltas. Unstable & not very accurate (O(h) error) – not used in practice A first-order method - the local error (truncation error per step) is proportional to the square of the step size, and the global error (error at a given time) is proportional to the step size In evolving solution between data points xn & xn+1, only evaluates derivatives at beginning of interval xn à asymmetric at boundaries Higher order Runge Kutta http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods C++: http://www.dreamincode.net/code/snippet1441.htm 2nd & 4th order RK - Introduces parameterized midpoints for more symmetric solutions à accuracy at higher computational cost Adaptive RK – RK-Fehlberg – estimate the truncation at each integration step & automatically adjust the step size to keep error within prescribed limits. At each step 2 approximations are compared – if in disagreement to a specific accuracy, the step size is reduced Boundary Value Problems Where solution of differential equations are located at 2 different values of the independent variable x à more difficult, because cannot just start at point of initial value – there may not be enough starting conditions available at the end points to produce a unique solution An n-order equation will require n boundary conditions – need to determine the missing n-1 conditions which cause the given conditions at the other boundary to be satisfied Shooting Method http://en.wikipedia.org/wiki/Shooting_method C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-shooting-method-for-solving.html Iteratively guess the missing values for one end & integrate, then inspect the discrepancy with the boundary values of the other end to adjust the estimate Given the starting boundary values u1 & u2 which contain the root u, solve u given the false position method (solving the differential equation as an initial value problem via 4th order RK), then use u to solve the differential equations. Finite Difference Method For linear & non-linear systems Higher order derivatives require more computational steps – some combinations for boundary conditions may not work though Improve the accuracy by increasing the number of mesh points Solving EigenValue Problems An eigenvalue can substitute a matrix when doing matrix multiplication à convert matrix multiplication into a polynomial EigenValue For a given set of equations in matrix form, determine what are the solution eigenvalue & eigenvectors Similar Matrices - have same eigenvalues. Use orthogonal similarity transforms to reduce a matrix to diagonal form from which eigenvalue(s) & eigenvectors can be computed iteratively Jacobi method http://en.wikipedia.org/wiki/Jacobi_method C++: http://people.sc.fsu.edu/~jburkardt/classes/acs2_2008/openmp/jacobi/jacobi.html Robust but Computationally intense – use for small matrices < 10x10 Power Iteration http://en.wikipedia.org/wiki/Power_iteration For any given real symmetric matrix, generate the largest single eigenvalue & its eigenvectors Simplest method – does not compute matrix decomposition à suitable for large, sparse matrices Inverse Iteration Variation of power iteration method – generates the smallest eigenvalue from the inverse matrix Rayleigh Method http://en.wikipedia.org/wiki/Rayleigh's_method_of_dimensional_analysis Variation of power iteration method Rayleigh Quotient Method Variation of inverse iteration method Matrix Tri-diagonalization Method Use householder algorithm to reduce an NxN symmetric matrix to a tridiagonal real symmetric matrix vua N-2 orthogonal transforms Whats Next Outside of Numerical Methods there are lots of different types of algorithms that I’ve learned over the decades: Data Mining – (I covered this briefly in a previous post: http://geekswithblogs.net/JoshReuben/archive/2007/12/31/ssas-dm-algorithms.aspx ) Search & Sort Routing Problem Solving Logical Theorem Proving Planning Probabilistic Reasoning Machine Learning Solvers (eg MIP) Bioinformatics (Sequence Alignment, Protein Folding) Quant Finance (I read Wilmott’s books – interesting) Sooner or later, I’ll cover the above topics as well.

Read the article

Toorcon14

- by danx

Toorcon 2012 Information Security Conference San Diego, CA, http://www.toorcon.org/ Dan Anderson, October 2012 It's almost Halloween, and we all know what that means—yes, of course, it's time for another Toorcon Conference! Toorcon is an annual conference for people interested in computer security. This includes the whole range of hackers, computer hobbyists, professionals, security consultants, press, law enforcement, prosecutors, FBI, etc. We're at Toorcon 14—see earlier blogs for some of the previous Toorcon's I've attended (back to 2003). This year's "con" was held at the Westin on Broadway in downtown San Diego, California. The following are not necessarily my views—I'm just the messenger—although I could have misquoted or misparaphrased the speakers. Also, I only reviewed some of the talks, below, which I attended and interested me. MalAndroid—the Crux of Android Infections, Aditya K. Sood Programming Weird Machines with ELF Metadata, Rebecca "bx" Shapiro Privacy at the Handset: New FCC Rules?, Valkyrie Hacking Measured Boot and UEFI, Dan Griffin You Can't Buy Security: Building the Open Source InfoSec Program, Boris Sverdlik What Journalists Want: The Investigative Reporters' Perspective on Hacking, Dave Maas & Jason Leopold Accessibility and Security, Anna Shubina Stop Patching, for Stronger PCI Compliance, Adam Brand McAfee Secure & Trustmarks — a Hacker's Best Friend, Jay James & Shane MacDougall MalAndroid—the Crux of Android Infections Aditya K. Sood, IOActive, Michigan State PhD candidate Aditya talked about Android smartphone malware. There's a lot of old Android software out there—over 50% Gingerbread (2.3.x)—and most have unpatched vulnerabilities. Of 9 Android vulnerabilities, 8 have known exploits (such as the old Gingerbread Global Object Table exploit). Android protection includes sandboxing, security scanner, app permissions, and screened Android app market. The Android permission checker has fine-grain resource control, policy enforcement. Android static analysis also includes a static analysis app checker (bouncer), and a vulnerablity checker. What security problems does Android have? User-centric security, which depends on the user to grant permission and make smart decisions. But users don't care or think about malware (the're not aware, not paranoid). All they want is functionality, extensibility, mobility Android had no "proper" encryption before Android 3.0 No built-in protection against social engineering and web tricks Alternative Android app markets are unsafe. Simply visiting some markets can infect Android Aditya classified Android Malware types as: Type A—Apps. These interact with the Android app framework. For example, a fake Netflix app. Or Android Gold Dream (game), which uploads user files stealthy manner to a remote location. Type K—Kernel. Exploits underlying Linux libraries or kernel Type H—Hybrid. These use multiple layers (app framework, libraries, kernel). These are most commonly used by Android botnets, which are popular with Chinese botnet authors What are the threats from Android malware? These incude leak info (contacts), banking fraud, corporate network attacks, malware advertising, malware "Hackivism" (the promotion of social causes. For example, promiting specific leaders of the Tunisian or Iranian revolutions. Android malware is frequently "masquerated". That is, repackaged inside a legit app with malware. To avoid detection, the hidden malware is not unwrapped until runtime. The malware payload can be hidden in, for example, PNG files. Less common are Android bootkits—there's not many around. What they do is hijack the Android init framework—alteering system programs and daemons, then deletes itself. For example, the DKF Bootkit (China). Android App Problems: no code signing! all self-signed native code execution permission sandbox — all or none alternate market places no robust Android malware detection at network level delayed patch process Programming Weird Machines with ELF Metadata Rebecca "bx" Shapiro, Dartmouth College, NH https://github.com/bx/elf-bf-tools @bxsays on twitter Definitions. "ELF" is an executable file format used in linking and loading executables (on UNIX/Linux-class machines). "Weird machine" uses undocumented computation sources (I think of them as unintended virtual machines). Some examples of "weird machines" are those that: return to weird location, does SQL injection, corrupts the heap. Bx then talked about using ELF metadata as (an uintended) "weird machine". Some ELF background: A compiler takes source code and generates a ELF object file (hello.o). A static linker makes an ELF executable from the object file. A runtime linker and loader takes ELF executable and loads and relocates it in memory. The ELF file has symbols to relocate functions and variables. ELF has two relocation tables—one at link time and another one at loading time: .rela.dyn (link time) and .dynsym (dynamic table). GOT: Global Offset Table of addresses for dynamically-linked functions. PLT: Procedure Linkage Tables—works with GOT. The memory layout of a process (not the ELF file) is, in order: program (+ heap), dynamic libraries, libc, ld.so, stack (which includes the dynamic table loaded into memory) For ELF, the "weird machine" is found and exploited in the loader. ELF can be crafted for executing viruses, by tricking runtime into executing interpreted "code" in the ELF symbol table. One can inject parasitic "code" without modifying the actual ELF code portions. Think of the ELF symbol table as an "assembly language" interpreter. It has these elements: instructions: Add, move, jump if not 0 (jnz) Think of symbol table entries as "registers" symbol table value is "contents" immediate values are constants direct values are addresses (e.g., 0xdeadbeef) move instruction: is a relocation table entry add instruction: relocation table "addend" entry jnz instruction: takes multiple relocation table entries The ELF weird machine exploits the loader by relocating relocation table entries. The loader will go on forever until told to stop. It stores state on stack at "end" and uses IFUNC table entries (containing function pointer address). The ELF weird machine, called "Brainfu*k" (BF) has: 8 instructions: pointer inc, dec, inc indirect, dec indirect, jump forward, jump backward, print. Three registers - 3 registers Bx showed example BF source code that implemented a Turing machine printing "hello, world". More interesting was the next demo, where bx modified ping. Ping runs suid as root, but quickly drops privilege. BF modified the loader to disable the library function call dropping privilege, so it remained as root. Then BF modified the ping -t argument to execute the -t filename as root. It's best to show what this modified ping does with an example: $ whoami bx $ ping localhost -t backdoor.sh # executes backdoor $ whoami root $ The modified code increased from 285948 bytes to 290209 bytes. A BF tool compiles "executable" by modifying the symbol table in an existing ELF executable. The tool modifies .dynsym and .rela.dyn table, but not code or data. Privacy at the Handset: New FCC Rules? "Valkyrie" (Christie Dudley, Santa Clara Law JD candidate) Valkyrie talked about mobile handset privacy. Some background: Senator Franken (also a comedian) became alarmed about CarrierIQ, where the carriers track their customers. Franken asked the FCC to find out what obligations carriers think they have to protect privacy. The carriers' response was that they are doing just fine with self-regulation—no worries! Carriers need to collect data, such as missed calls, to maintain network quality. But carriers also sell data for marketing. Verizon sells customer data and enables this with a narrow privacy policy (only 1 month to opt out, with difficulties). The data sold is not individually identifiable and is aggregated. But Verizon recommends, as an aggregation workaround to "recollate" data to other databases to identify customers indirectly. The FCC has regulated telephone privacy since 1934 and mobile network privacy since 2007. Also, the carriers say mobile phone privacy is a FTC responsibility (not FCC). FTC is trying to improve mobile app privacy, but FTC has no authority over carrier / customer relationships. As a side note, Apple iPhones are unique as carriers have extra control over iPhones they don't have with other smartphones. As a result iPhones may be more regulated. Who are the consumer advocates? Everyone knows EFF, but EPIC (Electrnic Privacy Info Center), although more obsecure, is more relevant. What to do? Carriers must be accountable. Opt-in and opt-out at any time. Carriers need incentive to grant users control for those who want it, by holding them liable and responsible for breeches on their clock. Location information should be added current CPNI privacy protection, and require "Pen/trap" judicial order to obtain (and would still be a lower standard than 4th Amendment). Politics are on a pro-privacy swing now, with many senators and the Whitehouse. There will probably be new regulation soon, and enforcement will be a problem, but consumers will still have some benefit. Hacking Measured Boot and UEFI Dan Griffin, JWSecure, Inc., Seattle, @JWSdan Dan talked about hacking measured UEFI boot. First some terms: UEFI is a boot technology that is replacing BIOS (has whitelisting and blacklisting). UEFI protects devices against rootkits. TPM - hardware security device to store hashs and hardware-protected keys "secure boot" can control at firmware level what boot images can boot "measured boot" OS feature that tracks hashes (from BIOS, boot loader, krnel, early drivers). "remote attestation" allows remote validation and control based on policy on a remote attestation server. Microsoft pushing TPM (Windows 8 required), but Google is not. Intel TianoCore is the only open source for UEFI. Dan has Measured Boot Tool at http://mbt.codeplex.com/ with a demo where you can also view TPM data. TPM support already on enterprise-class machines. UEFI Weaknesses. UEFI toolkits are evolving rapidly, but UEFI has weaknesses: assume user is an ally trust TPM implicitly, and attached to computer hibernate file is unprotected (disk encryption protects against this) protection migrating from hardware to firmware delays in patching and whitelist updates will UEFI really be adopted by the mainstream (smartphone hardware support, bank support, apathetic consumer support) You Can't Buy Security: Building the Open Source InfoSec Program Boris Sverdlik, ISDPodcast.com co-host Boris talked about problems typical with current security audits. "IT Security" is an oxymoron—IT exists to enable buiness, uptime, utilization, reporting, but don't care about security—IT has conflict of interest. There's no Magic Bullet ("blinky box"), no one-size-fits-all solution (e.g., Intrusion Detection Systems (IDSs)). Regulations don't make you secure. The cloud is not secure (because of shared data and admin access). Defense and pen testing is not sexy. Auditors are not solution (security not a checklist)—what's needed is experience and adaptability—need soft skills. Step 1: First thing is to Google and learn the company end-to-end before you start. Get to know the management team (not IT team), meet as many people as you can. Don't use arbitrary values such as CISSP scores. Quantitive risk assessment is a myth (e.g. AV*EF-SLE). Learn different Business Units, legal/regulatory obligations, learn the business and where the money is made, verify company is protected from script kiddies (easy), learn sensitive information (IP, internal use only), and start with low-hanging fruit (customer service reps and social engineering). Step 2: Policies. Keep policies short and relevant. Generic SANS "security" boilerplate policies don't make sense and are not followed. Focus on acceptable use, data usage, communications, physical security. Step 3: Implementation: keep it simple stupid. Open source, although useful, is not free (implementation cost). Access controls with authentication & authorization for local and remote access. MS Windows has it, otherwise use OpenLDAP, OpenIAM, etc. Application security Everyone tries to reinvent the wheel—use existing static analysis tools. Review high-risk apps and major revisions. Don't run different risk level apps on same system. Assume host/client compromised and use app-level security control. Network security VLAN != segregated because there's too many workarounds. Use explicit firwall rules, active and passive network monitoring (snort is free), disallow end user access to production environment, have a proxy instead of direct Internet access. Also, SSL certificates are not good two-factor auth and SSL does not mean "safe." Operational Controls Have change, patch, asset, & vulnerability management (OSSI is free). For change management, always review code before pushing to production For logging, have centralized security logging for business-critical systems, separate security logging from administrative/IT logging, and lock down log (as it has everything). Monitor with OSSIM (open source). Use intrusion detection, but not just to fulfill a checkbox: build rules from a whitelist perspective (snort). OSSEC has 95% of what you need. Vulnerability management is a QA function when done right: OpenVas and Seccubus are free. Security awareness The reality is users will always click everything. Build real awareness, not compliance driven checkbox, and have it integrated into the culture. Pen test by crowd sourcing—test with logging COSSP http://www.cossp.org/ - Comprehensive Open Source Security Project What Journalists Want: The Investigative Reporters' Perspective on Hacking Dave Maas, San Diego CityBeat Jason Leopold, Truthout.org The difference between hackers and investigative journalists: For hackers, the motivation varies, but method is same, technological specialties. For investigative journalists, it's about one thing—The Story, and they need broad info-gathering skills. J-School in 60 Seconds: Generic formula: Person or issue of pubic interest, new info, or angle. Generic criteria: proximity, prominence, timeliness, human interest, oddity, or consequence. Media awareness of hackers and trends: journalists becoming extremely aware of hackers with congressional debates (privacy, data breaches), demand for data-mining Journalists, use of coding and web development for Journalists, and Journalists busted for hacking (Murdock). Info gathering by investigative journalists include Public records laws. Federal Freedom of Information Act (FOIA) is good, but slow. California Public Records Act is a lot stronger. FOIA takes forever because of foot-dragging—it helps to be specific. Often need to sue (especially FBI). CPRA is faster, and requests can be vague. Dumps and leaks (a la Wikileaks) Journalists want: leads, protecting ourselves, our sources, and adapting tools for news gathering (Google hacking). Anonomity is important to whistleblowers. They want no digital footprint left behind (e.g., email, web log). They don't trust encryption, want to feel safe and secure. Whistleblower laws are very weak—there's no upside for whistleblowers—they have to be very passionate to do it. Accessibility and Security or: How I Learned to Stop Worrying and Love the Halting Problem Anna Shubina, Dartmouth College Anna talked about how accessibility and security are related. Accessibility of digital content (not real world accessibility). mostly refers to blind users and screenreaders, for our purpose. Accessibility is about parsing documents, as are many security issues. "Rich" executable content causes accessibility to fail, and often causes security to fail. For example MS Word has executable format—it's not a document exchange format—more dangerous than PDF or HTML. Accessibility is often the first and maybe only sanity check with parsing. They have no choice because someone may want to read what you write. Google, for example, is very particular about web browser you use and are bad at supporting other browsers. Uses JavaScript instead of links, often requiring mouseover to display content. PDF is a security nightmare. Executible format, embedded flash, JavaScript, etc. 15 million lines of code. Google Chrome doesn't handle PDF correctly, causing several security bugs. PDF has an accessibility checker and PDF tagging, to help with accessibility. But no PDF checker checks for incorrect tags, untagged content, or validates lists or tables. None check executable content at all. The "Halting Problem" is: can one decide whether a program will ever stop? The answer, in general, is no (Rice's theorem). The same holds true for accessibility checkers. Language-theoretic Security says complicated data formats are hard to parse and cannot be solved due to the Halting Problem. W3C Web Accessibility Guidelines: "Perceivable, Operable, Understandable, Robust" Not much help though, except for "Robust", but here's some gems: * all information should be parsable (paraphrasing) * if not parsable, cannot be converted to alternate formats * maximize compatibility in new document formats Executible webpages are bad for security and accessibility. They say it's for a better web experience. But is it necessary to stuff web pages with JavaScript for a better experience? A good example is The Drudge Report—it has hand-written HTML with no JavaScript, yet drives a lot of web traffic due to good content. A bad example is Google News—hidden scrollbars, guessing user input. Solutions: Accessibility and security problems come from same source Expose "better user experience" myth Keep your corner of Internet parsable Remember "Halting Problem"—recognize false solutions (checking and verifying tools) Stop Patching, for Stronger PCI Compliance Adam Brand, protiviti @adamrbrand, http://www.picfun.com/ Adam talked about PCI compliance for retail sales. Take an example: for PCI compliance, 50% of Brian's time (a IT guy), 960 hours/year was spent patching POSs in 850 restaurants. Often applying some patches make no sense (like fixing a browser vulnerability on a server). "Scanner worship" is overuse of vulnerability scanners—it gives a warm and fuzzy and it's simple (red or green results—fix reds). Scanners give a false sense of security. In reality, breeches from missing patches are uncommon—more common problems are: default passwords, cleartext authentication, misconfiguration (firewall ports open). Patching Myths: Myth 1: install within 30 days of patch release (but PCI Â§6.1 allows a "risk-based approach" instead). Myth 2: vendor decides what's critical (also PCI Â§6.1). But Â§6.2 requires user ranking of vulnerabilities instead. Myth 3: scan and rescan until it passes. But PCI Â§11.2.1b says this applies only to high-risk vulnerabilities. Adam says good recommendations come from NIST 800-40. Instead use sane patching and focus on what's really important. From NIST 800-40: Proactive: Use a proactive vulnerability management process: use change control, configuration management, monitor file integrity. Monitor: start with NVD and other vulnerability alerts, not scanner results. Evaluate: public-facing system? workstation? internal server? (risk rank) Decide:on action and timeline Test: pre-test patches (stability, functionality, rollback) for change control Install: notify, change control, tickets McAfee Secure & Trustmarks — a Hacker's Best Friend Jay James, Shane MacDougall, Tactical Intelligence Inc., Canada "McAfee Secure Trustmark" is a website seal marketed by McAfee. A website gets this badge if they pass their remote scanning. The problem is a removal of trustmarks act as flags that you're vulnerable. Easy to view status change by viewing McAfee list on website or on Google. "Secure TrustGuard" is similar to McAfee. Jay and Shane wrote Perl scripts to gather sites from McAfee and search engines. If their certification image changes to a 1x1 pixel image, then they are longer certified. Their scripts take deltas of scans to see what changed daily. The bottom line is change in TrustGuard status is a flag for hackers to attack your site. Entire idea of seals is silly—you're raising a flag saying if you're vulnerable.

Read the article

Toorcon 15 (2013)

- by danx

The Toorcon gang (senior staff): h1kari (founder), nfiltr8, and Geo Introduction to Toorcon 15 (2013) A Tale of One Software Bypass of MS Windows 8 Secure Boot Breaching SSL, One Byte at a Time Running at 99%: Surviving an Application DoS Security Response in the Age of Mass Customized Attacks x86 Rewriting: Defeating RoP and other Shinanighans Clowntown Express: interesting bugs and running a bug bounty program Active Fingerprinting of Encrypted VPNs Making Attacks Go Backwards Mask Your Checksums—The Gorry Details Adventures with weird machines thirty years after "Reflections on Trusting Trust" Introduction to Toorcon 15 (2013) Toorcon 15 is the 15th annual security conference held in San Diego. I've attended about a third of them and blogged about previous conferences I attended here starting in 2003. As always, I've only summarized the talks I attended and interested me enough to write about them. Be aware that I may have misrepresented the speaker's remarks and that they are not my remarks or opinion, or those of my employer, so don't quote me or them. Those seeking further details may contact the speakers directly or use The Google. For some talks, I have a URL for further information. A Tale of One Software Bypass of MS Windows 8 Secure Boot Andrew Furtak and Oleksandr Bazhaniuk Yuri Bulygin, Oleksandr ("Alex") Bazhaniuk, and (not present) Andrew Furtak Yuri and Alex talked about UEFI and Bootkits and bypassing MS Windows 8 Secure Boot, with vendor recommendations. They previously gave this talk at the BlackHat 2013 conference. MS Windows 8 Secure Boot Overview UEFI (Unified Extensible Firmware Interface) is interface between hardware and OS. UEFI is processor and architecture independent. Malware can replace bootloader (bootx64.efi, bootmgfw.efi). Once replaced can modify kernel. Trivial to replace bootloader. Today many legacy bootkits—UEFI replaces them most of them. MS Windows 8 Secure Boot verifies everything you load, either through signatures or hashes. UEFI firmware relies on secure update (with signed update). You would think Secure Boot would rely on ROM (such as used for phones0, but you can't do that for PCs—PCs use writable memory with signatures DXE core verifies the UEFI boat loader(s) OS Loader (winload.efi, winresume.efi) verifies the OS kernel A chain of trust is established with a root key (Platform Key, PK), which is a cert belonging to the platform vendor. Key Exchange Keys (KEKs) verify an "authorized" database (db), and "forbidden" database (dbx). X.509 certs with SHA-1/SHA-256 hashes. Keys are stored in non-volatile (NV) flash-based NVRAM. Boot Services (BS) allow adding/deleting keys (can't be accessed once OS starts—which uses Run-Time (RT)). Root cert uses RSA-2048 public keys and PKCS#7 format signatures. SecureBoot — enable disable image signature checks SetupMode — update keys, self-signed keys, and secure boot variables CustomMode — allows updating keys Secure Boot policy settings are: always execute, never execute, allow execute on security violation, defer execute on security violation, deny execute on security violation, query user on security violation Attacking MS Windows 8 Secure Boot Secure Boot does NOT protect from physical access. Can disable from console. Each BIOS vendor implements Secure Boot differently. There are several platform and BIOS vendors. It becomes a "zoo" of implementations—which can be taken advantage of. Secure Boot is secure only when all vendors implement it correctly. Allow only UEFI firmware signed updates protect UEFI firmware from direct modification in flash memory protect FW update components program SPI controller securely protect secure boot policy settings in nvram protect runtime api disable compatibility support module which allows unsigned legacy Can corrupt the Platform Key (PK) EFI root certificate variable in SPI flash. If PK is not found, FW enters setup mode wich secure boot turned off. Can also exploit TPM in a similar manner. One is not supposed to be able to directly modify the PK in SPI flash from the OS though. But they found a bug that they can exploit from User Mode (undisclosed) and demoed the exploit. It loaded and ran their own bootkit. The exploit requires a reboot. Multiple vendors are vulnerable. They will disclose this exploit to vendors in the future. Recommendations: allow only signed updates protect UEFI fw in ROM protect EFI variable store in ROM Breaching SSL, One Byte at a Time Yoel Gluck and Angelo Prado Angelo Prado and Yoel Gluck, Salesforce.com CRIME is software that performs a "compression oracle attack." This is possible because the SSL protocol doesn't hide length, and because SSL compresses the header. CRIME requests with every possible character and measures the ciphertext length. Look for the plaintext which compresses the most and looks for the cookie one byte-at-a-time. SSL Compression uses LZ77 to reduce redundancy. Huffman coding replaces common byte sequences with shorter codes. US CERT thinks the SSL compression problem is fixed, but it isn't. They convinced CERT that it wasn't fixed and they issued a CVE. BREACH, breachattrack.com BREACH exploits the SSL response body (Accept-Encoding response, Content-Encoding). It takes advantage of the fact that the response is not compressed. BREACH uses gzip and needs fairly "stable" pages that are static for ~30 seconds. It needs attacker-supplied content (say from a web form or added to a URL parameter). BREACH listens to a session's requests and responses, then inserts extra requests and responses. Eventually, BREACH guesses a session's secret key. Can use compression to guess contents one byte at-a-time. For example, "Supersecret SupersecreX" (a wrong guess) compresses 10 bytes, and "Supersecret Supersecret" (a correct guess) compresses 11 bytes, so it can find each character by guessing every character. To start the guess, BREACH needs at least three known initial characters in the response sequence. Compression length then "leaks" information. Some roadblocks include no winners (all guesses wrong) or too many winners (multiple possibilities that compress the same). The solutions include: lookahead (guess 2 or 3 characters at-a-time instead of 1 character). Expensive rollback to last known conflict check compression ratio can brute-force first 3 "bootstrap" characters, if needed (expensive) block ciphers hide exact plain text length. Solution is to align response in advance to block size Mitigations length: use variable padding secrets: dynamic CSRF tokens per request secret: change over time separate secret to input-less servlets Future work eiter understand DEFLATE/GZIP HTTPS extensions Running at 99%: Surviving an Application DoS Ryan Huber Ryan Huber, Risk I/O Ryan first discussed various ways to do a denial of service (DoS) attack against web services. One usual method is to find a slow web page and do several wgets. Or download large files. Apache is not well suited at handling a large number of connections, but one can put something in front of it Can use Apache alternatives, such as nginx How to identify malicious hosts short, sudden web requests user-agent is obvious (curl, python) same url requested repeatedly no web page referer (not normal) hidden links. hide a link and see if a bot gets it restricted access if not your geo IP (unless the website is global) missing common headers in request regular timing first seen IP at beginning of attack count requests per hosts (usually a very large number) Use of captcha can mitigate attacks, but you'll lose a lot of genuine users. Bouncer, goo.gl/c2vyEc and www.github.com/rawdigits/Bouncer Bouncer is software written by Ryan in netflow. Bouncer has a small, unobtrusive footprint and detects DoS attempts. It closes blacklisted sockets immediately (not nice about it, no proper close connection). Aggregator collects requests and controls your web proxies. Need NTP on the front end web servers for clean data for use by bouncer. Bouncer is also useful for a popularity storm ("Slashdotting") and scraper storms. Future features: gzip collection data, documentation, consumer library, multitask, logging destroyed connections. Takeaways: DoS mitigation is easier with a complete picture Bouncer designed to make it easier to detect and defend DoS—not a complete cure Security Response in the Age of Mass Customized Attacks Peleus Uhley and Karthik Raman Peleus Uhley and Karthik Raman, Adobe ASSET, blogs.adobe.com/asset/ Peleus and Karthik talked about response to mass-customized exploits. Attackers behave much like a business. "Mass customization" refers to concept discussed in the book Future Perfect by Stan Davis of Harvard Business School. Mass customization is differentiating a product for an individual customer, but at a mass production price. For example, the same individual with a debit card receives basically the same customized ATM experience around the world. Or designing your own PC from commodity parts. Exploit kits are another example of mass customization. The kits support multiple browsers and plugins, allows new modules. Exploit kits are cheap and customizable. Organized gangs use exploit kits. A group at Berkeley looked at 77,000 malicious websites (Grier et al., "Manufacturing Compromise: The Emergence of Exploit-as-a-Service", 2012). They found 10,000 distinct binaries among them, but derived from only a dozen or so exploit kits. Characteristics of Mass Malware: potent, resilient, relatively low cost Technical characteristics: multiple OS, multipe payloads, multiple scenarios, multiple languages, obfuscation Response time for 0-day exploits has gone down from ~40 days 5 years ago to about ~10 days now. So the drive with malware is towards mass customized exploits, to avoid detection There's plenty of evicence that exploit development has Project Manager bureaucracy. They infer from the malware edicts to: support all versions of reader support all versions of windows support all versions of flash support all browsers write large complex, difficult to main code (8750 lines of JavaScript for example Exploits have "loose coupling" of multipe versions of software (adobe), OS, and browser. This allows specific attacks against specific versions of multiple pieces of software. Also allows exploits of more obscure software/OS/browsers and obscure versions. Gave examples of exploits that exploited 2, 3, 6, or 14 separate bugs. However, these complete exploits are more likely to be buggy or fragile in themselves and easier to defeat. Future research includes normalizing malware and Javascript. Conclusion: The coming trend is that mass-malware with mass zero-day attacks will result in mass customization of attacks. x86 Rewriting: Defeating RoP and other Shinanighans Richard Wartell Richard Wartell The attack vector we are addressing here is: First some malware causes a buffer overflow. The malware has no program access, but input access and buffer overflow code onto stack Later the stack became non-executable. The workaround malware used was to write a bogus return address to the stack jumping to malware Later came ASLR (Address Space Layout Randomization) to randomize memory layout and make addresses non-deterministic. The workaround malware used was to jump t existing code segments in the program that can be used in bad ways "RoP" is Return-oriented Programming attacks. RoP attacks use your own code and write return address on stack to (existing) expoitable code found in program ("gadgets"). Pinkie Pie was paid $60K last year for a RoP attack. One solution is using anti-RoP compilers that compile source code with NO return instructions. ASLR does not randomize address space, just "gadgets". IPR/ILR ("Instruction Location Randomization") randomizes each instruction with a virtual machine. Richard's goal was to randomize a binary with no source code access. He created "STIR" (Self-Transofrming Instruction Relocation). STIR disassembles binary and operates on "basic blocks" of code. The STIR disassembler is conservative in what to disassemble. Each basic block is moved to a random location in memory. Next, STIR writes new code sections with copies of "basic blocks" of code in randomized locations. The old code is copied and rewritten with jumps to new code. the original code sections in the file is marked non-executible. STIR has better entropy than ASLR in location of code. Makes brute force attacks much harder. STIR runs on MS Windows (PEM) and Linux (ELF). It eliminated 99.96% or more "gadgets" (i.e., moved the address). Overhead usually 5-10% on MS Windows, about 1.5-4% on Linux (but some code actually runs faster!). The unique thing about STIR is it requires no source access and the modified binary fully works! Current work is to rewrite code to enforce security policies. For example, don't create a *.{exe,msi,bat} file. Or don't connect to the network after reading from the disk. Clowntown Express: interesting bugs and running a bug bounty program Collin Greene Collin Greene, Facebook Collin talked about Facebook's bug bounty program. Background at FB: FB has good security frameworks, such as security teams, external audits, and cc'ing on diffs. But there's lots of "deep, dark, forgotten" parts of legacy FB code. Collin gave several examples of bountied bugs. Some bounty submissions were on software purchased from a third-party (but bounty claimers don't know and don't care). We use security questions, as does everyone else, but they are basically insecure (often easily discoverable). Collin didn't expect many bugs from the bounty program, but they ended getting 20+ good bugs in first 24 hours and good submissions continue to come in. Bug bounties bring people in with different perspectives, and are paid only for success. Bug bounty is a better use of a fixed amount of time and money versus just code review or static code analysis. The Bounty program started July 2011 and paid out $1.5 million to date. 14% of the submissions have been high priority problems that needed to be fixed immediately. The best bugs come from a small % of submitters (as with everything else)—the top paid submitters are paid 6 figures a year. Spammers like to backstab competitors. The youngest sumitter was 13. Some submitters have been hired. Bug bounties also allows to see bugs that were missed by tools or reviews, allowing improvement in the process. Bug bounties might not work for traditional software companies where the product has release cycle or is not on Internet. Active Fingerprinting of Encrypted VPNs Anna Shubina Anna Shubina, Dartmouth Institute for Security, Technology, and Society (I missed the start of her talk because another track went overtime. But I have the DVD of the talk, so I'll expand later) IPsec leaves fingerprints. Using netcat, one can easily visually distinguish various crypto chaining modes just from packet timing on a chart (example, DES-CBC versus AES-CBC) One can tell a lot about VPNs just from ping roundtrips (such as what router is used) Delayed packets are not informative about a network, especially if far away from the network More needed to explore about how TCP works in real life with respect to timing Making Attacks Go Backwards Fuzzynop FuzzyNop, Mandiant This talk is not about threat attribution (finding who), product solutions, politics, or sales pitches. But who are making these malware threats? It's not a single person or group—they have diverse skill levels. There's a lot of fat-fingered fumblers out there. Always look for low-hanging fruit first: "hiding" malware in the temp, recycle, or root directories creation of unnamed scheduled tasks obvious names of files and syscalls ("ClearEventLog") uncleared event logs. Clearing event log in itself, and time of clearing, is a red flag and good first clue to look for on a suspect system Reverse engineering is hard. Disassembler use takes practice and skill. A popular tool is IDA Pro, but it takes multiple interactive iterations to get a clean disassembly. Key loggers are used a lot in targeted attacks. They are typically custom code or built in a backdoor. A big tip-off is that non-printable characters need to be printed out (such as "[Ctrl]" "[RightShift]") or time stamp printf strings. Look for these in files. Presence is not proof they are used. Absence is not proof they are not used. Java exploits. Can parse jar file with idxparser.py and decomile Java file. Java typially used to target tech companies. Backdoors are the main persistence mechanism (provided externally) for malware. Also malware typically needs command and control. Application of Artificial Intelligence in Ad-Hoc Static Code Analysis John Ashaman John Ashaman, Security Innovation Initially John tried to analyze open source files with open source static analysis tools, but these showed thousands of false positives. Also tried using grep, but tis fails to find anything even mildly complex. So next John decided to write his own tool. His approach was to first generate a call graph then analyze the graph. However, the problem is that making a call graph is really hard. For example, one problem is "evil" coding techniques, such as passing function pointer. First the tool generated an Abstract Syntax Tree (AST) with the nodes created from method declarations and edges created from method use. Then the tool generated a control flow graph with the goal to find a path through the AST (a maze) from source to sink. The algorithm is to look at adjacent nodes to see if any are "scary" (a vulnerability), using heuristics for search order. The tool, called "Scat" (Static Code Analysis Tool), currently looks for C# vulnerabilities and some simple PHP. Later, he plans to add more PHP, then JSP and Java. For more information see his posts in Security Innovation blog and NRefactory on GitHub. Mask Your Checksums—The Gorry Details Eric (XlogicX) Davisson Eric (XlogicX) Davisson Sometimes in emailing or posting TCP/IP packets to analyze problems, you may want to mask the IP address. But to do this correctly, you need to mask the checksum too, or you'll leak information about the IP. Problem reports found in stackoverflow.com, sans.org, and pastebin.org are usually not masked, but a few companies do care. If only the IP is masked, the IP may be guessed from checksum (that is, it leaks data). Other parts of packet may leak more data about the IP. TCP and IP checksums both refer to the same data, so can get more bits of information out of using both checksums than just using one checksum. Also, one can usually determine the OS from the TTL field and ports in a packet header. If we get hundreds of possible results (16x each masked nibble that is unknown), one can do other things to narrow the results, such as look at packet contents for domain or geo information. With hundreds of results, can import as CSV format into a spreadsheet. Can corelate with geo data and see where each possibility is located. Eric then demoed a real email report with a masked IP packet attached. Was able to find the exact IP address, given the geo and university of the sender. Point is if you're going to mask a packet, do it right. Eric wouldn't usually bother, but do it correctly if at all, to not create a false impression of security. Adventures with weird machines thirty years after "Reflections on Trusting Trust" Sergey Bratus Sergey Bratus, Dartmouth College (and Julian Bangert and Rebecca Shapiro, not present) "Reflections on Trusting Trust" refers to Ken Thompson's classic 1984 paper. "You can't trust code that you did not totally create yourself." There's invisible links in the chain-of-trust, such as "well-installed microcode bugs" or in the compiler, and other planted bugs. Thompson showed how a compiler can introduce and propagate bugs in unmodified source. But suppose if there's no bugs and you trust the author, can you trust the code? Hell No! There's too many factors—it's Babylonian in nature. Why not? Well, Input is not well-defined/recognized (code's assumptions about "checked" input will be violated (bug/vunerabiliy). For example, HTML is recursive, but Regex checking is not recursive. Input well-formed but so complex there's no telling what it does For example, ELF file parsing is complex and has multiple ways of parsing. Input is seen differently by different pieces of program or toolchain Any Input is a program input executes on input handlers (drives state changes & transitions) only a well-defined execution model can be trusted (regex/DFA, PDA, CFG) Input handler either is a "recognizer" for the inputs as a well-defined language (see langsec.org) or it's a "virtual machine" for inputs to drive into pwn-age ELF ABI (UNIX/Linux executible file format) case study. Problems can arise from these steps (without planting bugs): compiler linker loader ld.so/rtld relocator DWARF (debugger info) exceptions The problem is you can't really automatically analyze code (it's the "halting problem" and undecidable). Only solution is to freeze code and sign it. But you can't freeze everything! Can't freeze ASLR or loading—must have tables and metadata. Any sufficiently complex input data is the same as VM byte code Example, ELF relocation entries + dynamic symbols == a Turing Complete Machine (TM). @bxsays created a Turing machine in Linux from relocation data (not code) in an ELF file. For more information, see Rebecca "bx" Shapiro's presentation from last year's Toorcon, "Programming Weird Machines with ELF Metadata" @bxsays did same thing with Mach-O bytecode Or a DWARF exception handling data .eh_frame + glibc == Turning Machine X86 MMU (IDT, GDT, TSS): used address translation to create a Turning Machine. Page handler reads and writes (on page fault) memory. Uses a page table, which can be used as Turning Machine byte code. Example on Github using this TM that will fly a glider across the screen Next Sergey talked about "Parser Differentials". That having one input format, but two parsers, will create confusion and opportunity for exploitation. For example, CSRs are parsed during creation by cert requestor and again by another parser at the CA. Another example is ELF—several parsers in OS tool chain, which are all different. Can have two different Program Headers (PHDRs) because ld.so parses multiple PHDRs. The second PHDR can completely transform the executable. This is described in paper in the first issue of International Journal of PoC. Conclusions trusting computers not only about bugs! Bugs are part of a problem, but no by far all of it complex data formats means bugs no "chain of trust" in Babylon! (that is, with parser differentials) we need to squeeze complexity out of data until data stops being "code equivalent" Further information See and langsec.org. USENIX WOOT 2013 (Workshop on Offensive Technologies) for "weird machines" papers and videos.

Read the article

Scrum in 5 Minutes

- by Stephen.Walther

The goal of this blog entry is to explain the basic concepts of Scrum in less than five minutes. You learn how Scrum can help a team of developers to successfully complete a complex software project. Product Backlog and the Product Owner Imagine that you are part of a team which needs to create a new website – for example, an e-commerce website. You have an overwhelming amount of work to do. You need to build (or possibly buy) a shopping cart, install an SSL certificate, create a product catalog, create a Facebook page, and at least a hundred other things that you have not thought of yet. According to Scrum, the first thing you should do is create a list. Place the highest priority items at the top of the list and the lower priority items lower in the list. For example, creating the shopping cart and buying the domain name might be high priority items and creating a Facebook page might be a lower priority item. In Scrum, this list is called the Product Backlog. How do you prioritize the items in the Product Backlog? Different stakeholders in the project might have different priorities. Gary, your division VP, thinks that it is crucial that the e-commerce site has a mobile app. Sally, your direct manager, thinks taking advantage of new HTML5 features is much more important. Multiple people are pulling you in different directions. According to Scrum, it is important that you always designate one person, and only one person, as the Product Owner. The Product Owner is the person who decides what items should be added to the Product Backlog and the priority of the items in the Product Backlog. The Product Owner could be the customer who is paying the bills, the project manager who is responsible for delivering the project, or a customer representative. The critical point is that the Product Owner must always be a single person and that single person has absolute authority over the Product Backlog. Sprints and the Sprint Backlog So now the developer team has a prioritized list of items and they can start work. The team starts implementing the first item in the Backlog — the shopping cart — and the team is making good progress. Unfortunately, however, half-way through the work of implementing the shopping cart, the Product Owner changes his mind. The Product Owner decides that it is much more important to create the product catalog before the shopping cart. With some frustration, the team switches their developmental efforts to focus on implementing the product catalog. However, part way through completing this work, once again the Product Owner changes his mind about the highest priority item. Getting work done when priorities are constantly shifting is frustrating for the developer team and it results in lower productivity. At the same time, however, the Product Owner needs to have absolute authority over the priority of the items which need to get done. Scrum solves this conflict with the concept of Sprints. In Scrum, a developer team works in Sprints. At the beginning of a Sprint the developers and the Product Owner agree on the items from the backlog which they will complete during the Sprint. This subset of items from the Product Backlog becomes the Sprint Backlog. During the Sprint, the Product Owner is not allowed to change the items in the Sprint Backlog. In other words, the Product Owner cannot shift priorities on the developer team during the Sprint. Different teams use Sprints of different lengths such as one month Sprints, two-week Sprints, and one week Sprints. For high-stress, time critical projects, teams typically choose shorter sprints such as one week sprints. For more mature projects, longer one month sprints might be more appropriate. A team can pick whatever Sprint length makes sense for them just as long as the team is consistent. You should pick a Sprint length and stick with it. Daily Scrum During a Sprint, the developer team needs to have meetings to coordinate their work on completing the items in the Sprint Backlog. For example, the team needs to discuss who is working on what and whether any blocking issues have been discovered. Developers hate meetings (well, sane developers hate meetings). Meetings take developers away from their work of actually implementing stuff as opposed to talking about implementing stuff. However, a developer team which never has meetings and never coordinates their work also has problems. For example, Fred might get stuck on a programming problem for days and never reach out for help even though Tom (who sits in the cubicle next to him) has already solved the very same problem. Or, both Ted and Fred might have started working on the same item from the Sprint Backlog at the same time. In Scrum, these conflicting needs – limiting meetings but enabling team coordination – are resolved with the idea of the Daily Scrum. The Daily Scrum is a meeting for coordinating the work of the developer team which happens once a day. To keep the meeting short, each developer answers only the following three questions: 1. What have you done since yesterday? 2. What do you plan to do today? 3. Any impediments in your way? During the Daily Scrum, developers are not allowed to talk about issues with their cat, do demos of their latest work, or tell heroic stories of programming problems overcome. The meeting must be kept short — typically about 15 minutes. Issues which come up during the Daily Scrum should be discussed in separate meetings which do not involve the whole developer team. Stories and Tasks Items in the Product or Sprint Backlog – such as building a shopping cart or creating a Facebook page – are often referred to as User Stories or Stories. The Stories are created by the Product Owner and should represent some business need. Unlike the Product Owner, the developer team needs to think about how a Story should be implemented. At the beginning of a Sprint, the developer team takes the Stories from the Sprint Backlog and breaks the stories into tasks. For example, the developer team might take the Create a Shopping Cart story and break it into the following tasks: · Enable users to add and remote items from shopping cart · Persist the shopping cart to database between visits · Redirect user to checkout page when Checkout button is clicked During the Daily Scrum, members of the developer team volunteer to complete the tasks required to implement the next Story in the Sprint Backlog. When a developer talks about what he did yesterday or plans to do tomorrow then the developer should be referring to a task. Stories are owned by the Product Owner and a story is all about business value. In contrast, the tasks are owned by the developer team and a task is all about implementation details. A story might take several days or weeks to complete. A task is something which a developer can complete in less than a day. Some teams get lazy about breaking stories into tasks. Neglecting to break stories into tasks can lead to “Never Ending Stories” If you don’t break a story into tasks, then you can’t know how much of a story has actually been completed because you don’t have a clear idea about the implementation steps required to complete the story. Scrumboard During the Daily Scrum, the developer team uses a Scrumboard to coordinate their work. A Scrumboard contains a list of the stories for the current Sprint, the tasks associated with each Story, and the state of each task. The developer team uses the Scrumboard so everyone on the team can see, at a glance, what everyone is working on. As a developer works on a task, the task moves from state to state and the state of the task is updated on the Scrumboard. Common task states are ToDo, In Progress, and Done. Some teams include additional task states such as Needs Review or Needs Testing. Some teams use a physical Scrumboard. In that case, you use index cards to represent the stories and the tasks and you tack the index cards onto a physical board. Using a physical Scrumboard has several disadvantages. A physical Scrumboard does not work well with a distributed team – for example, it is hard to share the same physical Scrumboard between Boston and Seattle. Also, generating reports from a physical Scrumboard is more difficult than generating reports from an online Scrumboard. Estimating Stories and Tasks Stakeholders in a project, the people investing in a project, need to have an idea of how a project is progressing and when the project will be completed. For example, if you are investing in creating an e-commerce site, you need to know when the site can be launched. It is not enough to just say that “the project will be done when it is done” because the stakeholders almost certainly have a limited budget to devote to the project. The people investing in the project cannot determine the business value of the project unless they can have an estimate of how long it will take to complete the project. Developers hate to give estimates. The reason that developers hate to give estimates is that the estimates are almost always completely made up. For example, you really don’t know how long it takes to build a shopping cart until you finish building a shopping cart, and at that point, the estimate is no longer useful. The problem is that writing code is much more like Finding a Cure for Cancer than Building a Brick Wall. Building a brick wall is very straightforward. After you learn how to add one brick to a wall, you understand everything that is involved in adding a brick to a wall. There is no additional research required and no surprises. If, on the other hand, I assembled a team of scientists and asked them to find a cure for cancer, and estimate exactly how long it will take, they would have no idea. The problem is that there are too many unknowns. I don’t know how to cure cancer, I need to do a lot of research here, so I cannot even begin to estimate how long it will take. So developers hate to provide estimates, but the Product Owner and other product stakeholders, have a legitimate need for estimates. Scrum resolves this conflict by using the idea of Story Points. Different teams use different units to represent Story Points. For example, some teams use shirt sizes such as Small, Medium, Large, and X-Large. Some teams prefer to use Coffee Cup sizes such as Tall, Short, and Grande. Finally, some teams like to use numbers from the Fibonacci series. These alternative units are converted into a Story Point value. Regardless of the type of unit which you use to represent Story Points, the goal is the same. Instead of attempting to estimate a Story in hours (which is doomed to failure), you use a much less fine-grained measure of work. A developer team is much more likely to be able to estimate that a Story is Small or X-Large than the exact number of hours required to complete the story. So you can think of Story Points as a compromise between the needs of the Product Owner and the developer team. When a Sprint starts, the developer team devotes more time to thinking about the Stories in a Sprint and the developer team breaks the Stories into Tasks. In Scrum, you estimate the work required to complete a Story by using Story Points and you estimate the work required to complete a task by using hours. The difference between Stories and Tasks is that you don’t create a task until you are just about ready to start working on a task. A task is something that you should be able to create within a day, so you have a much better chance of providing an accurate estimate of the work required to complete a task than a story. Burndown Charts In Scrum, you use Burndown charts to represent the remaining work on a project. You use Release Burndown charts to represent the overall remaining work for a project and you use Sprint Burndown charts to represent the overall remaining work for a particular Sprint. You create a Release Burndown chart by calculating the remaining number of uncompleted Story Points for the entire Product Backlog every day. The vertical axis represents Story Points and the horizontal axis represents time. A Sprint Burndown chart is similar to a Release Burndown chart, but it focuses on the remaining work for a particular Sprint. There are two different types of Sprint Burndown charts. You can either represent the remaining work in a Sprint with Story Points or with task hours (the following image, taken from Wikipedia, uses hours). When each Product Backlog Story is completed, the Release Burndown chart slopes down. When each Story or task is completed, the Sprint Burndown chart slopes down. Burndown charts typically do not always slope down over time. As new work is added to the Product Backlog, the Release Burndown chart slopes up. If new tasks are discovered during a Sprint, the Sprint Burndown chart will also slope up. The purpose of a Burndown chart is to give you a way to track team progress over time. If, halfway through a Sprint, the Sprint Burndown chart is still climbing a hill then you know that you are in trouble. Team Velocity Stakeholders in a project always want more work done faster. For example, the Product Owner for the e-commerce site wants the website to launch before tomorrow. Developers tend to be overly optimistic. Rarely do developers acknowledge the physical limitations of reality. So Project stakeholders and the developer team often collude to delude themselves about how much work can be done and how quickly. Too many software projects begin in a state of optimism and end in frustration as deadlines zoom by. In Scrum, this problem is overcome by calculating a number called the Team Velocity. The Team Velocity is a measure of the average number of Story Points which a team has completed in previous Sprints. Knowing the Team Velocity is important during the Sprint Planning meeting when the Product Owner and the developer team work together to determine the number of stories which can be completed in the next Sprint. If you know the Team Velocity then you can avoid committing to do more work than the team has been able to accomplish in the past, and your team is much more likely to complete all of the work required for the next Sprint. Scrum Master There are three roles in Scrum: the Product Owner, the developer team, and the Scrum Master. I’v e already discussed the Product Owner. The Product Owner is the one and only person who maintains the Product Backlog and prioritizes the stories. I’ve also described the role of the developer team. The members of the developer team do the work of implementing the stories by breaking the stories into tasks. The final role, which I have not discussed, is the role of the Scrum Master. The Scrum Master is responsible for ensuring that the team is following the Scrum process. For example, the Scrum Master is responsible for making sure that there is a Daily Scrum meeting and that everyone answers the standard three questions. The Scrum Master is also responsible for removing (non-technical) impediments which the team might encounter. For example, if the team cannot start work until everyone installs the latest version of Microsoft Visual Studio then the Scrum Master has the responsibility of working with management to get the latest version of Visual Studio as quickly as possible. The Scrum Master can be a member of the developer team. Furthermore, different people can take on the role of the Scrum Master over time. The Scrum Master, however, cannot be the same person as the Product Owner. Using SonicAgile SonicAgile (SonicAgile.com) is an online tool which you can use to manage your projects using Scrum. You can use the SonicAgile Product Backlog to create a prioritized list of stories. You can estimate the size of the Stories using different Story Point units such as Shirt Sizes and Coffee Cup sizes. You can use SonicAgile during the Sprint Planning meeting to select the Stories that you want to complete during a particular Sprint. You can configure Sprints to be any length of time. SonicAgile calculates Team Velocity automatically and displays a warning when you add too many stories to a Sprint. In other words, it warns you when it thinks you are overcommitting in a Sprint. SonicAgile also includes a Scrumboard which displays the list of Stories selected for a Sprint and the tasks associated with each story. You can drag tasks from one task state to another. Finally, SonicAgile enables you to generate Release Burndown and Sprint Burndown charts. You can use these charts to view the progress of your team. To learn more about SonicAgile, visit SonicAgile.com. Summary In this post, I described many of the basic concepts of Scrum. You learned how a Product Owner uses a Product Backlog to create a prioritized list of tasks. I explained why work is completed in Sprints so the developer team can be more productive. I also explained how a developer team uses the daily scrum to coordinate their work. You learned how the developer team uses a Scrumboard to see, at a glance, who is working on what and the state of each task. I also discussed Burndown charts. You learned how you can use both Release and Sprint Burndown charts to track team progress in completing a project. Finally, I described the crucial role of the Scrum Master – the person who is responsible for ensuring that the rules of Scrum are being followed. My goal was not to describe all of the concepts of Scrum. This post was intended to be an introductory overview. For a comprehensive explanation of Scrum, I recommend reading Ken Schwaber’s book Agile Project Management with Scrum: http://www.amazon.com/Agile-Project-Management-Microsoft-Professional/dp/073561993X/ref=la_B001H6ODMC_1_1?ie=UTF8&qid=1345224000&sr=1-1

Read the article

What Every Developer Should Know About MSI Components

- by Alois Kraus

Hopefully nothing. But if you have to do more than simple XCopy deployment and you need to support updates, upgrades and perhaps side by side scenarios there is no way around MSI. You can create Msi files with a Visual Studio Setup project which is severely limited or you can use the Windows Installer Toolset. I cannot talk about WIX with my German colleagues because WIX has a very special meaning. It is funny to always use the long name when I talk about deployment possibilities. Alternatively you can buy commercial tools which help you to author Msi files but I am not sure how good they are. Given enough pain with existing solutions you can also learn the MSI Apis and create your own packaging solution. If I were you I would use either a commercial visual tool when you do easy deployments or use the free Windows Installer Toolset. Once you know the WIX schema you can create well formed wix xml files easily with any editor. Then you can “compile” from the wxs files your Msi package. Recently I had the “pleasure” to get my hands dirty with C++ (again) and the MSI technology. Installation is a complex topic but after several month of digging into arcane MSI issues I can safely say that there should exist an easier way to install and update files as today. I am not alone with this statement as John Robbins (creator of the cool tool Paraffin) states: “.. It's a brittle and scary API in Windows …”. To help other people struggling with installation issues I present you the advice I (and others) found useful and what will happen if you ignore this advice. What is a MSI file? A MSI file is basically a database with tables which reference each other to control how your un/installation should work. The basic idea is that you declare via these tables what you want to install and MSI controls the how to get your stuff onto or off your machine. Your “stuff” consists usually of files, registry keys, shortcuts and environment variables. Therefore the most important tables are File, Registry, Environment and Shortcut table which define what will be un/installed. The key to master MSI is that every resource (file, registry key ,…) is associated with a MSI component. The actual payload consists of compressed files in the CAB format which can either be embedded into the MSI file or reside beside the MSI file or in a subdirectory below it. To examine MSI files you need Orca a free MSI editor provided by MS. There is also another free editor called Super Orca which does support diffs between MSI and it does not lock the MSI files. But since Orca comes with a shell extension I tend to use only Orca because it is so easy to right click on a MSI file and open it with this tool. How Do I Install It? Double click it. This does work for fresh installations as well as major upgrades. Updates need to be installed via the command line via msiexec /i <msi> REINSTALL=ALL REINSTALLMODE=vomus This tells the installer to reinstall all already installed features (new features will NOT be installed). The reinstallmode letters do force an overwrite of the old cached package in the %WINDIR%\Installer folder. All files, shortcuts and registry keys are redeployed if they are missing or need to be replaced with a newer version. When things did go really wrong and you want to overwrite everything unconditionally use REINSTALLMODE=vamus. How To Enable MSI Logs? You can download a MSI from Microsoft which installs some registry keys to enable full MSI logging. The log files can be found in your %TEMP% folder and are called MSIxxxx.log. Alternatively you can add to your msiexec command line the option msiexec …. /l*vx <LogFileName> Personally I find it rather strange that * does not mean full logging. To really get all logs I need to add v and x which is documented in the msiexec help but I still find this behavior unintuitive. What are MSI components? The whole MSI logic is bound to the concept of MSI components. Nearly every msi table has a Component column which binds an installable resource to a component. Below are the screenshots of the FeatureComponents and Component table of an example MSI. The Feature table defines basically the feature hierarchy. To find out what belongs to a feature you need to look at the FeatureComponents table where for each feature the components are listed which will be installed when a feature is installed. The MSI components are defined in the Component table. This table has as first column the component name and as second column the component id which is a GUID. All resources you want to install belong to a MSI component. Therefore nearly all MSI tables have a Component_ column which contains the component name. If you look e.g. a the File table you see that every file belongs to a component which is true for all other tables which install resources. The component table is the glue between all other tables which contain the resources you want to install. So far so easy. Why is MSI then so complex? Most MSI problems arise from the fact that you did violate a MSI component rule in one or the other way. When you install a feature the reference count for all components belonging to this feature will increase by one. If your component is installed by more than one feature it will get a higher refcount. When you uninstall a feature its refcount will drop by one. Interesting things happen if the component reference count reaches zero: Then all associated resources will be deleted. That looks like a reasonable thing and it is. What it makes complex are the strange component rules you have to follow. Below are some important component rules from the Tao of the Windows Installer … Rule 16: Follow Component Rules Components are a very important part of the Installer technology. They are the means whereby the Installer manages the resources that make up your application. The SDK provides the following guidelines for creating components in your package: Never create two components that install a resource under the same name and target location. If a resource must be duplicated in multiple components, change its name or target location in each component. This rule should be applied across applications, products, product versions, and companies. Two components must not have the same key path file. This is a consequence of the previous rule. The key path value points to a particular file or folder belonging to the component that the installer uses to detect the component. If two components had the same key path file, the installer would be unable to distinguish which component is installed. Two components however may share a key path folder. Do not create a version of a component that is incompatible with all previous versions of the component. This rule should be applied across applications, products, product versions, and companies. Do not create components containing resources that will need to be installed into more than one directory on the user’s system. The installer installs all of the resources in a component into the same directory. It is not possible to install some resources into subdirectories. Do not include more than one COM server per component. If a component contains a COM server, this must be the key path for the component. Do not specify more than one file per component as a target for the Start menu or a Desktop shortcut. … And these rules do not even talk about component ids, update packages and upgrades which you need to understand as well. Lets suppose you install two MSIs (MSI1 and MSI2) which have the same ComponentId but different component names. Both do install the same file. What will happen when you uninstall MSI2? Hm the file should stay there. But the component names are different. Yes and yes. But MSI uses not use the component name as key for the refcount. Instead the ComponentId column of the Component table which contains a GUID is used as identifier under which the refcount is stored. The components Comp1 and Comp2 are identical from the MSI perspective. After the installation of both MSIs the Component with the Id {100000….} has a refcount of two. After uninstallation of one MSI there is still a refcount of one which drops to zero just as expected when we uninstall the last msi. Then the file which was the same for both MSIs is deleted. You should remember that MSI keeps a refcount across MSIs for components with the same component id. MSI does manage components not the resources you did install. The resources associated with a component are then and only then deleted when the refcount of the component reaches zero. The dependencies between features, components and resources can be described as relations. m,k are numbers >= 1, n can be 0. Inside a MSI the following relations are valid Feature 1 –> n Components Component 1 –> m Features Component 1 –> k Resources These relations express that one feature can install several components and features can share components between them. Every (meaningful) component will install at least one resource which means that its name (primary key to stay in database speak) does occur in some other table in the Component column as value which installs some resource. Lets make it clear with an example. We want to install with the feature MainFeature some files a registry key and a shortcut. We can then create components Comp1..3 which are referenced by the resources defined in the corresponding tables. Feature Component Registry File Shortcuts MainFeature Comp1 RegistryKey1 MainFeature Comp2 File.txt MainFeature Comp3 File2.txt Shortcut to File2.txt It is illegal that the same resource is part of more than one component since this would break the refcount mechanism. Lets illustrate this: Feature ComponentId Resource Reference Count Feature1 {1000-…} File1.txt 1 Feature2 {2000-….} File1.txt 1 The installation part works well but what happens when you uninstall Feature2? Component {20000…} gets a refcount of zero where MSI deletes all resources belonging to this component. In this case File1.txt will be deleted. But Feature1 still has another component {10000…} with a refcount of one which means that the file was deleted too early. You just have ruined your installation. To fix it you then need to click on the Repair button under Add/Remove Programs to let MSI reinstall any missing registry keys, files or shortcuts. The vigilant reader might has noticed that there is more in the Component table. Beside its name and GUID it has also an installation directory, attributes and a KeyPath. The KeyPath is a reference to a file or registry key which is used to detect if the component is already installed. This becomes important when you repair or uninstall a component. To find out if the component is already installed MSI checks if the registry key or file referenced by the KeyPath property does exist. When it does not exist it assumes that it was either already uninstalled (can lead to problems during uninstall) or that it is already installed and all is fine. Why is this detail so important? Lets put all files into one component. The KeyPath should be then one of the files of your component to check if it was installed or not. When your installation becomes corrupt because a file was deleted you cannot repair it with the Repair button under Add/Remove Programs because MSI checks the component integrity via the Resource referenced by its KeyPath. As long as you did not delete the KeyPath file MSI thinks all resources with your component are installed and never executes any repair action. You get even more trouble when you try to remove files during an upgrade (you cannot remove files during an update) from your super component which contains all files. The only way out and therefore best practice is to assign for every resource you want to install an extra component. This ensures painless updatability and repairs and you have much less effort to remove specific files during an upgrade. In effect you get this best practice relation Feature 1 –> n Components Component 1 –> 1 Resources MSI Component Rules Rule 1 – One component per resource Every resource you want to install (file, registry key, value, environment value, shortcut, directory, …) must get its own component which does never change between versions as long as the install location is the same. Penalty If you add more than one resources to a component you will break the repair capability of MSI because the KeyPath is used to check if the component needs repair. MSI ComponentId Files MSI 1.0 {1000} File1-5 MSI 2.0 {2000} File2-5 You want to remove File1 in version 2.0 of your MSI. Since you want to keep the other files you create a new component and add them there. MSI will delete all files if the component refcount of {1000} drops to zero. The files you want to keep are added to the new component {2000}. Ok that does work if your upgrade does uninstall the old MSI first. This will cause the refcount of all previously installed components to reach zero which means that all files present in version 1.0 are deleted. But there is a faster way to perform your upgrade by first installing your new MSI and then remove the old one. If you choose this upgrade path then you will loose File1-5 after your upgrade and not only File1 as intended by your new component design. Rule 2 – Only add, never remove resources from a component If you did follow rule 1 you will not need Rule 2. You can add in a patch more resources to one component. That is ok. But you can never remove anything from it. There are tricky ways around that but I do not want to encourage bad component design. Penalty Lets assume you have 2 MSI files which install under the same component one file MSI1 MSI2 {1000} - ComponentId {1000} – ComponentId File1.txt File2.txt When you install and uninstall both MSIs you will end up with an installation where either File1 or File2 will be left. Why? It seems that MSI does not store the resources associated with each component in its internal database. Instead Windows will simply query the MSI that is currently uninstalled for all resources belonging to this component. Since it will find only one file and not two it will only uninstall one file. That is the main reason why you never can remove resources from a component! Rule 3 Never Remove A Component From an Update MSI. This is the same as if you change the GUID of a component by accident for your new update package. The resulting update package will not contain all components from the previously installed package. Penalty When you remove a component from a feature MSI will set the feature state during update to Advertised and log a warning message into its log file when you did enable MSI logging. SELMGR: ComponentId '{2DCEA1BA-3E27-E222-484C-D0D66AEA4F62}' is registered to feature 'xxxxxxx, but is not present in the Component table. Removal of components from a feature is not supported! MSI (c) (24:44) [07:53:13:436]: SELMGR: Removal of a component from a feature is not supported Advertised means that MSI treats all components of this feature as not installed. As a consequence during uninstall nothing will be removed since it is not installed! This is not only bad because uninstall does no longer work but this feature will also not get the required patches. All other features which have followed component versioning rules for update packages will be updated but the one faulty feature will not. This results in very hard to find bugs why an update was only partially successful. Things got better with Windows Installer 4.5 but you cannot rely on that nobody will use an older installer. It is a good idea to add to your update msiexec call MSIENFORCEUPGRADECOMPONENTRULES=1 which will abort the installation if you did violate this rule.

Read the article

Extending NerdDinner: Adding Geolocated Flair

- by Jon Galloway

NerdDinner is a website with the audacious goal of “Organizing the world’s nerds and helping them eat in packs.” Because nerds aren’t likely to socialize with others unless a website tells them to do it. Scott Hanselman showed off a lot of the cool features we’ve added to NerdDinner lately during his popular talk at MIX10, Beyond File | New Company: From Cheesy Sample to Social Platform. Did you miss it? Go ahead and watch it, I’ll wait. One of the features we wanted to add was flair. You know about flair, right? It’s a way to let folks who like your site show it off in their own site. For example, here’s my StackOverflow flair: Great! So how could we add some of this flair stuff to NerdDinner? What do we want to show? If we’re going to encourage our users to give up a bit of their beautiful website to show off a bit of ours, we need to think about what they’ll want to show. For instance, my StackOverflow flair is all about me, not StackOverflow. So how will this apply to NerdDinner? Since NerdDinner is all about organizing local dinners, in order for the flair to be useful it needs to make sense for the person viewing the web page. If someone visits from Egypt visits my blog, they should see information about NerdDinners in Egypt. That’s geolocation – localizing site content based on where the browser’s sitting, and it makes sense for flair as well as entire websites. So we’ll set up a simple little callout that prompts them to host a dinner in their area: Hopefully our flair works and there is a dinner near your viewers, so they’ll see another view which lists upcoming dinners near them: The Geolocation Part Generally website geolocation is done by mapping the requestor’s IP address to a geographic area. It’s not an exact science, but I’ve always found it to be pretty accurate. There are (at least) three ways to handle it: You pay somebody like MaxMind for a database (with regular updates) that sits on your server, and you use their API to do lookups. I used this on a pretty big project a few years ago and it worked well. You use HTML 5 Geolocation API or Google Gears or some other browser based solution. I think those are cool (I use Google Gears a lot), but they’re both in flux right now and I don’t think either has a wide enough of an install base yet to rely on them. You might want to, but I’ve heard you do all kinds of crazy stuff, and sometimes it gets you in trouble. I don’t mean talk out of line, but we all laugh behind your back a bit. But, hey, it’s up to you. It’s your flair or whatever. There are some free webservices out there that will take an IP address and give you location information. Easy, and works for everyone. That’s what we’re doing. I looked at a few different services and settled on IPInfoDB. It’s free, has a great API, and even returns JSON, which is handy for Javascript use. The IP query is pretty simple. We hit a URL like this: http://ipinfodb.com/ip_query.php?ip=74.125.45.100&timezone=false … and we get an XML response back like this… <?xml version="1.0" encoding="UTF-8"?> <Response> <Ip>74.125.45.100</Ip> <Status>OK</Status> <CountryCode>US</CountryCode> <CountryName>United States</CountryName> <RegionCode>06</RegionCode> <RegionName>California</RegionName> <City>Mountain View</City> <ZipPostalCode>94043</ZipPostalCode> <Latitude>37.4192</Latitude> <Longitude>-122.057</Longitude> </Response> So we’ll build some data transfer classes to hold the location information, like this: public class LocationInfo { public string Country { get; set; } public string RegionName { get; set; } public string City { get; set; } public string ZipPostalCode { get; set; } public LatLong Position { get; set; } } public class LatLong { public float Lat { get; set; } public float Long { get; set; } } And now hitting the service is pretty simple: public static LocationInfo HostIpToPlaceName(string ip) { string url = "http://ipinfodb.com/ip_query.php?ip={0}&timezone=false"; url = String.Format(url, ip); var result = XDocument.Load(url); var location = (from x in result.Descendants("Response") select new LocationInfo { City = (string)x.Element("City"), RegionName = (string)x.Element("RegionName"), Country = (string)x.Element("CountryName"), ZipPostalCode = (string)x.Element("CountryName"), Position = new LatLong { Lat = (float)x.Element("Latitude"), Long = (float)x.Element("Longitude") } }).First(); return location; } Getting The User’s IP Okay, but first we need the end user’s IP, and you’d think it would be as simple as reading the value from HttpContext: HttpContext.Current.Request.UserHostAddress But you’d be wrong. Sorry. UserHostAddress just wraps HttpContext.Current.Request.ServerVariables["REMOTE_ADDR"], but that doesn’t get you the IP for users behind a proxy. That’s in another header, “HTTP_X_FORWARDED_FOR". So you can either hit a wrapper and then check a header, or just check two headers. I went for uniformity: string SourceIP = string.IsNullOrEmpty(Request.ServerVariables["HTTP_X_FORWARDED_FOR"]) ? Request.ServerVariables["REMOTE_ADDR"] : Request.ServerVariables["HTTP_X_FORWARDED_FOR"]; We’re almost set to wrap this up, but first let’s talk about our views. Yes, views, because we’ll have two. Selecting the View We wanted to make it easy for people to include the flair in their sites, so we looked around at how other people were doing this. The StackOverflow folks have a pretty good flair system, which allows you to include the flair in your site as either an IFRAME reference or a Javascript include. We’ll do both. We have a ServicesController to handle use of the site information outside of NerdDinner.com, so this fits in pretty well there. We’ll be displaying the same information for both HTML and Javascript flair, so we can use one Flair controller action which will return a different view depending on the requested format. Here’s our general flow for our controller action: Get the user’s IP Translate it to a location Grab the top three upcoming dinners that are near that location Select the view based on the format (defaulted to “html”) Return a FlairViewModel which contains the list of dinners and the location information public ActionResult Flair(string format = "html") { string SourceIP = string.IsNullOrEmpty( Request.ServerVariables["HTTP_X_FORWARDED_FOR"]) ? Request.ServerVariables["REMOTE_ADDR"] : Request.ServerVariables["HTTP_X_FORWARDED_FOR"]; var location = GeolocationService.HostIpToPlaceName(SourceIP); var dinners = dinnerRepository. FindByLocation(location.Position.Lat, location.Position.Long). OrderByDescending(p => p.EventDate).Take(3); // Select the view we'll return. // Using a switch because we'll add in JSON and other formats later. string view; switch (format.ToLower()) { case "javascript": view = "JavascriptFlair"; break; default: view = "Flair"; break; } return View( view, new FlairViewModel { Dinners = dinners.ToList(), LocationName = string.IsNullOrEmpty(location.City) ? "you" : String.Format("{0}, {1}", location.City, location.RegionName) } ); } Note: I’m not in love with the logic here, but it seems like overkill to extract the switch statement away when we’ll probably just have two or three views. What do you think? The HTML View The HTML version of the view is pretty simple – the only thing of any real interest here is the use of an extension method to truncate strings that are would cause the titles to wrap. public static string Truncate(this string s, int maxLength) { if (string.IsNullOrEmpty(s) || maxLength <= 0) return string.Empty; else if (s.Length > maxLength) return s.Substring(0, maxLength) + "..."; else return s; } So here’s how the HTML view ends up looking: <%@ Page Title="" Language="C#" Inherits="System.Web.Mvc.ViewPage<FlairViewModel>" %> <%@ Import Namespace="NerdDinner.Helpers" %> <%@ Import Namespace="NerdDinner.Models" %> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Nerd Dinner</title> <link href="/Content/Flair.css" rel="stylesheet" type="text/css" /> </head> <body> <div id="nd-wrapper"> <h2 id="nd-header">NerdDinner.com</h2> <div id="nd-outer"> <% if (Model.Dinners.Count == 0) { %> <div id="nd-bummer"> Looks like there's no Nerd Dinners near <%:Model.LocationName %> in the near future. Why not <a target="_blank" href="http://www.nerddinner.com/Dinners/Create">host one</a>?</div> <% } else { %> <h3> Dinners Near You</h3> <ul> <% foreach (var item in Model.Dinners) { %> <li> <%: Html.ActionLink(String.Format("{0} with {1} on {2}", item.Title.Truncate(20), item.HostedBy, item.EventDate.ToShortDateString()), "Details", "Dinners", new { id = item.DinnerID }, new { target = "_blank" })%></li> <% } %> </ul> <% } %> <div id="nd-footer"> More dinners and fun at <a target="_blank" href="http://nrddnr.com">http://nrddnr.com</a></div> </div> </div> </body> </html> You’d include this in a page using an IFRAME, like this: <IFRAME height=230 marginHeight=0 src="http://nerddinner.com/services/flair" frameBorder=0 width=160 marginWidth=0 scrolling=no></IFRAME> The Javascript view The Javascript flair is written so you can include it in a webpage with a simple script include, like this: <script type="text/javascript" src="http://nerddinner.com/services/flair?format=javascript"></script> The goal of this view is very similar to the HTML embed view, with a few exceptions: We’re creating a script element and adding it to the head of the document, which will then document.write out the content. Note that you have to consider if your users will actually have a <head> element in their documents, but for website flair use cases I think that’s a safe bet. Since the content is being added to the existing page rather than shown in an IFRAME, all links need to be absolute. That means we can’t use Html.ActionLink, since it generates relative routes. We need to escape everything since it’s being written out as strings. We need to set the content type to application/x-javascript. The easiest way to do that is to use the <%@ Page ContentType%> directive. <%@ Page Language="C#" Inherits="System.Web.Mvc.ViewPage<NerdDinner.Models.FlairViewModel>" ContentType="application/x-javascript" %> <%@ Import Namespace="NerdDinner.Helpers" %> <%@ Import Namespace="NerdDinner.Models" %> document.write('<script>var link = document.createElement(\"link\");link.href = \"http://nerddinner.com/content/Flair.css\";link.rel = \"stylesheet\";link.type = \"text/css\";var head = document.getElementsByTagName(\"head\")[0];head.appendChild(link);</script>'); document.write('<div id=\"nd-wrapper\"><h2 id=\"nd-header\">NerdDinner.com</h2><div id=\"nd-outer\">'); <% if (Model.Dinners.Count == 0) { %> document.write('<div id=\"nd-bummer\">Looks like there\'s no Nerd Dinners near <%:Model.LocationName %> in the near future. Why not <a target=\"_blank\" href=\"http://www.nerddinner.com/Dinners/Create\">host one</a>?</div>'); <% } else { %> document.write('<h3> Dinners Near You</h3><ul>'); <% foreach (var item in Model.Dinners) { %> document.write('<li><a target=\"_blank\" href=\"http://nrddnr.com/<%: item.DinnerID %>\"><%: item.Title.Truncate(20) %> with <%: item.HostedBy %> on <%: item.EventDate.ToShortDateString() %></a></li>'); <% } %> document.write('</ul>'); <% } %> document.write('<div id=\"nd-footer\"> More dinners and fun at <a target=\"_blank\" href=\"http://nrddnr.com\">http://nrddnr.com</a></div></div></div>'); Getting IP’s for Testing There are a variety of online services that will translate a location to an IP, which were handy for testing these out. I found http://www.itouchmap.com/latlong.html to be most useful, but I’m open to suggestions if you know of something better. Next steps I think the next step here is to minimize load – you know, in case people start actually using this flair. There are two places to think about – the NerdDinner.com servers, and the services we’re using for Geolocation. I usually think about caching as a first attack on server load, but that’s less helpful here since every user will have a different IP. Instead, I’d look at taking advantage of Asynchronous Controller Actions, a cool new feature in ASP.NET MVC 2. Async Actions let you call a potentially long-running webservice without tying up a thread on the server while waiting for the response. There’s some good info on that in the MSDN documentation, and Dino Esposito wrote a great article on Asynchronous ASP.NET Pages in the April 2010 issue of MSDN Magazine. But let’s think of the children, shall we? What about ipinfodb.com? Well, they don’t have specific daily limits, but they do throttle you if you put a lot of traffic on them. From their FAQ: We do not have a specific daily limit but queries that are at a rate faster than 2 per second will be put in "queue". If you stay below 2 queries/second everything will be normal. If you go over the limit, you will still get an answer for all queries but they will be slowed down to about 1 per second. This should not affect most users but for high volume websites, you can either use our IP database on your server or we can whitelist your IP for 5$/month (simply use the donate form and leave a comment with your server IP). Good programming practices such as not querying our API for all page views (you can store the data in a cookie or a database) will also help not reaching the limit. So the first step there is to save the geolocalization information in a time-limited cookie, which will allow us to look up the local dinners immediately without having to hit the geolocation service.

Read the article

ASP.NET Frameworks and Raw Throughput Performance

- by Rick Strahl

A few days ago I had a curious thought: With all these different technologies that the ASP.NET stack has to offer, what's the most efficient technology overall to return data for a server request? When I started this it was mere curiosity rather than a real practical need or result. Different tools are used for different problems and so performance differences are to be expected. But still I was curious to see how the various technologies performed relative to each just for raw throughput of the request getting to the endpoint and back out to the client with as little processing in the actual endpoint logic as possible (aka Hello World!). I want to clarify that this is merely an informal test for my own curiosity and I'm sharing the results and process here because I thought it was interesting. It's been a long while since I've done any sort of perf testing on ASP.NET, mainly because I've not had extremely heavy load requirements and because overall ASP.NET performs very well even for fairly high loads so that often it's not that critical to test load performance. This post is not meant to make a point or even come to a conclusion which tech is better, but just to act as a reference to help understand some of the differences in perf and give a starting point to play around with this yourself. I've included the code for this simple project, so you can play with it and maybe add a few additional tests for different things if you like. Source Code on GitHub I looked at this data for these technologies: ASP.NET Web API ASP.NET MVC WebForms ASP.NET WebPages ASMX AJAX Services (couldn't get AJAX/JSON to run on IIS8 ) WCF Rest Raw ASP.NET HttpHandlers It's quite a mixed bag, of course and the technologies target different types of development. What started out as mere curiosity turned into a bit of a head scratcher as the results were sometimes surprising. What I describe here is more to satisfy my curiosity more than anything and I thought it interesting enough to discuss on the blog :-) First test: Raw Throughput The first thing I did is test raw throughput for the various technologies. This is the least practical test of course since you're unlikely to ever create the equivalent of a 'Hello World' request in a real life application. The idea here is to measure how much time a 'NOP' request takes to return data to the client. So for this request I create the simplest Hello World request that I could come up for each tech. Http Handler The first is the lowest level approach which is an HTTP handler. public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public bool IsReusable { get { return true; } } } WebForms Next I added a couple of ASPX pages - one using CodeBehind and one using only a markup page. The CodeBehind page simple does this in CodeBehind without any markup in the ASPX page: public partial class HelloWorld_CodeBehind : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { Response.Write("Hello World. Time is: " + DateTime.Now.ToString() ); Response.End(); } } while the Markup page only contains some static output via an expression:<%@ Page Language="C#" AutoEventWireup="false" CodeBehind="HelloWorld_Markup.aspx.cs" Inherits="AspNetFrameworksPerformance.HelloWorld_Markup" %> Hello World. Time is <%= DateTime.Now %> ASP.NET WebPages WebPages is the freestanding Razor implementation of ASP.NET. Here's the simple HelloWorld.cshtml page:Hello World @DateTime.Now WCF REST WCF REST was the token REST implementation for ASP.NET before WebAPI and the inbetween step from ASP.NET AJAX. I'd like to forget that this technology was ever considered for production use, but I'll include it here. Here's an OperationContract class: [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World" + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } } WCF REST can return arbitrary results by returning a Stream object and a content type. The code above turns the string result into a stream and returns that back to the client. ASP.NET AJAX (ASMX Services) I also wanted to test ASP.NET AJAX services because prior to WebAPI this is probably still the most widely used AJAX technology for the ASP.NET stack today. Unfortunately I was completely unable to get this running on my Windows 8 machine. Visual Studio 2012 removed adding of ASP.NET AJAX services, and when I tried to manually add the service and configure the script handler references it simply did not work - I always got a SOAP response for GET and POST operations. No matter what I tried I always ended up getting XML results even when explicitly adding the ScriptHandler. So, I didn't test this (but the code is there - you might be able to test this on a Windows 7 box). ASP.NET MVC Next up is probably the most popular ASP.NET technology at the moment: MVC. Here's the small controller: public class MvcPerformanceController : Controller { public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } } ASP.NET WebAPI Next up is WebAPI which looks kind of similar to MVC. Except here I have to use a StringContent result to return the response: public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } } Testing Take a minute to think about each of the technologies… and take a guess which you think is most efficient in raw throughput. The fastest should be pretty obvious, but the others - maybe not so much. The testing I did is pretty informal since it was mainly to satisfy my curiosity - here's how I did this: I used Apache Bench (ab.exe) from a full Apache HTTP installation to run and log the test results of hitting the server. ab.exe is a small executable that lets you hit a URL repeatedly and provides counter information about the number of requests, requests per second etc. ab.exe and the batch file are located in the \LoadTests folder of the project. An ab.exe command line looks like this: ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld which hits the specified URL 100,000 times with a load factor of 20 concurrent requests. This results in output like this: It's a great way to get a quick and dirty performance summary. Run it a few times to make sure there's not a large amount of varience. You might also want to do an IISRESET to clear the Web Server. Just make sure you do a short test run to warm up the server first - otherwise your first run is likely to be skewed downwards. ab.exe also allows you to specify headers and provide POST data and many other things if you want to get a little more fancy. Here all tests are GET requests to keep it simple. I ran each test: 100,000 iterations Load factor of 20 concurrent connections IISReset before starting A short warm up run for API and MVC to make sure startup cost is mitigated Here is the batch file I used for the test: IISRESET REM make sure you add REM C:\Program Files (x86)\Apache Software Foundation\Apache2.2\bin REM to your path so ab.exe can be found REM Warm up ab.exe -n100 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJsonab.exe -n100 -c20 http://localhost/aspnetperf/api/HelloWorldJson ab.exe -n100 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld ab.exe -n100000 -c20 http://localhost/aspnetperf/handler.ashx > handler.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_CodeBehind.aspx > AspxCodeBehind.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_Markup.aspx > AspxMarkup.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld > Wcf.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldCode > Mvc.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld > WebApi.txt I ran each of these tests 3 times and took the average score for Requests/second, with the machine otherwise idle. I did see a bit of variance when running many tests but the values used here are the medians. Part of this has to do with the fact I ran the tests on my local machine - result would probably more consistent running the load test on a separate machine hitting across the network. I ran these tests locally on my laptop which is a Dell XPS with quad core Sandibridge I7-2720QM @ 2.20ghz and a fast SSD drive on Windows 8. CPU load during tests ran to about 70% max across all 4 cores (IOW, it wasn't overloading the machine). Ideally you can try running these tests on a separate machine hitting the local machine. If I remember correctly IIS 7 and 8 on client OSs don't throttle so the performance here should be Results Ok, let's cut straight to the chase. Below are the results from the tests… It's not surprising that the handler was fastest. But it was a bit surprising to me that the next fastest was WebForms and especially Web Forms with markup over a CodeBehind page. WebPages also fared fairly well. MVC and WebAPI are a little slower and the slowest by far is WCF REST (which again I find surprising). As mentioned at the start the raw throughput tests are not overly practical as they don't test scripting performance for the HTML generation engines or serialization performances of the data engines. All it really does is give you an idea of the raw throughput for the technology from time of request to reaching the endpoint and returning minimal text data back to the client which indicates full round trip performance. But it's still interesting to see that Web Forms performs better in throughput than either MVC, WebAPI or WebPages. It'd be interesting to try this with a few pages that actually have some parsing logic on it, but that's beyond the scope of this throughput test. But what's also amazing about this test is the sheer amount of traffic that a laptop computer is handling. Even the slowest tech managed 5700 requests a second, which is one hell of a lot of requests if you extrapolate that out over a 24 hour period. Remember these are not static pages, but dynamic requests that are being served. Another test - JSON Data Service Results The second test I used a JSON result from several of the technologies. I didn't bother running WebForms and WebPages through this test since that doesn't make a ton of sense to return data from the them (OTOH, returning text from the APIs didn't make a ton of sense either :-) In these tests I have a small Person class that gets serialized and then returned to the client. The Person class looks like this: public class Person { public Person() { Id = 10; Name = "Rick"; Entered = DateTime.Now; } public int Id { get; set; } public string Name { get; set; } public DateTime Entered { get; set; } } Here are the updated handler classes that use Person: Handler public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { var action = context.Request.QueryString["action"]; if (action == "json") JsonRequest(context); else TextRequest(context); } public void TextRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public void JsonRequest(HttpContext context) { var json = JsonConvert.SerializeObject(new Person(), Formatting.None); context.Response.ContentType = "application/json"; context.Response.Write(json); } public bool IsReusable { get { return true; } } } This code adds a little logic to check for a action query string and route the request to an optional JSON result method. To generate JSON, I'm using the same JSON.NET serializer (JsonConvert.SerializeObject) used in Web API to create the JSON response. WCF REST [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World " + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } [OperationContract] [WebGet(ResponseFormat=WebMessageFormat.Json,BodyStyle=WebMessageBodyStyle.WrappedRequest)] public Person HelloWorldJson() { // Add your operation implementation here return new Person(); } } For WCF REST all I have to do is add a method with the Person result type. ASP.NET MVC public class MvcPerformanceController : Controller { // // GET: /MvcPerformance/ public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } public JsonResult HelloWorldJson() { return Json(new Person(), JsonRequestBehavior.AllowGet); } } For MVC all I have to do for a JSON response is return a JSON result. ASP.NET internally uses JavaScriptSerializer. ASP.NET WebAPI public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } [HttpGet] public Person HelloWorldJson() { return new Person(); } [HttpGet] public HttpResponseMessage HelloWorldJson2() { var response = new HttpResponseMessage(HttpStatusCode.OK); response.Content = new ObjectContent<Person>(new Person(), GlobalConfiguration.Configuration.Formatters.JsonFormatter); return response; } } Testing and Results To run these data requests I used the following ab.exe commands:REM JSON RESPONSES ab.exe -n100000 -c20 http://localhost/aspnetperf/Handler.ashx?action=json > HandlerJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJson > MvcJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorldJson > WebApiJson.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorldJson > WcfJson.txt The results from this test run are a bit interesting in that the WebAPI test improved performance significantly over returning plain string content. Here are the results: The performance for each technology drops a little bit except for WebAPI which is up quite a bit! From this test it appears that WebAPI is actually significantly better performing returning a JSON response, rather than a plain string response. Snag with Apache Benchmark and 'Length Failures' I ran into a little snag with Apache Benchmark, which was reporting failures for my Web API requests when serializing. As the graph shows performance improved significantly from with JSON results from 5580 to 6530 or so which is a 15% improvement (while all others slowed down by 3-8%). However, I was skeptical at first because the WebAPI test reports showed a bunch of errors on about 10% of the requests. Check out this report: Notice the Failed Request count. What the hey? Is WebAPI failing on roughly 10% of requests when sending JSON? Turns out: No it's not! But it took some sleuthing to figure out why it reports these failures. At first I thought that Web API was failing, and so to make sure I re-ran the test with Fiddler attached and runiisning the ab.exe test by using the -X switch: ab.exe -n100 -c10 -X localhost:8888 http://localhost/aspnetperf/api/HelloWorldJson which showed that indeed all requests where returning proper HTTP 200 results with full content. However ab.exe was reporting the errors. After some closer inspection it turned out that the dates varying in size altered the response length in dynamic output. For example: these two results: {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.841926-10:00"} {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.8519262-10:00"} are different in length for the number which results in 68 and 69 bytes respectively. The same URL produces different result lengths which is what ab.exe reports. I didn't notice at first bit the same is happening when running the ASHX handler with JSON.NET result since it uses the same serializer that varies the milliseconds. Moral: You can typically ignore Length failures in Apache Benchmark and when in doubt check the actual output with Fiddler. Note that the other failure values are accurate though. Another interesting Side Note: Perf drops over Time As I was running these tests repeatedly I was finding that performance steadily dropped from a startup peak to a 10-15% lower stable level. IOW, with Web API I'd start out with around 6500 req/sec and in subsequent runs it keeps dropping until it would stabalize somewhere around 5900 req/sec occasionally jumping lower. For these tests this is why I did the IIS RESET and warm up for individual tests. This is a little puzzling. Looking at Process Monitor while the test are running memory very quickly levels out as do handles and threads, on the first test run. Subsequent runs everything stays stable, but the performance starts going downwards. This applies to all the technologies - Handlers, Web Forms, MVC, Web API - curious to see if others test this and see similar results. Doing an IISRESET then resets everything and performance starts off at peak again… Summary As I stated at the outset, these were informal to satiate my curiosity not to prove that any technology is better or even faster than another. While there clearly are differences in performance the differences (other than WCF REST which was by far the slowest and the raw handler which was by far the highest) are relatively minor, so there is no need to feel that any one technology is a runaway standout in raw performance. Choosing a technology is about more than pure performance but also about the adequateness for the job and the easy of implementation. The strengths of each technology will make for any minor performance difference we see in these tests. However, to me it's important to get an occasional reality check and compare where new technologies are heading. Often times old stuff that's been optimized and designed for a time of less horse power can utterly blow the doors off newer tech and simple checks like this let you compare. Luckily we're seeing that much of the new stuff performs well even in V1.0 which is great. To me it was very interesting to see Web API perform relatively badly with plain string content, which originally led me to think that Web API might not be properly optimized just yet. For those that caught my Tweets late last week regarding WebAPI's slow responses was with String content which is in fact considerably slower. Luckily where it counts with serialized JSON and XML WebAPI actually performs better. But I do wonder what would make generic string content slower than serialized code? This stresses another point: Don't take a single test as the final gospel and don't extrapolate out from a single set of tests. Certainly Twitter can make you feel like a fool when you post something immediate that hasn't been fleshed out a little more <blush>. Egg on my face. As a result I ended up screwing around with this for a few hours today to compare different scenarios. Well worth the time… I hope you found this useful, if not for the results, maybe for the process of quickly testing a few requests for performance and charting out a comparison. Now onwards with more serious stuff… Resources Source Code on GitHub Apache HTTP Server Project (ab.exe is part of the binary distribution)© Rick Strahl, West Wind Technologies, 2005-2012Posted in ASP.NET Web Api Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article

Advanced TSQL Tuning: Why Internals Knowledge Matters

- by Paul White

There is much more to query tuning than reducing logical reads and adding covering nonclustered indexes. Query tuning is not complete as soon as the query returns results quickly in the development or test environments. In production, your query will compete for memory, CPU, locks, I/O and other resources on the server. Today’s entry looks at some tuning considerations that are often overlooked, and shows how deep internals knowledge can help you write better TSQL. As always, we’ll need some example data. In fact, we are going to use three tables today, each of which is structured like this: Each table has 50,000 rows made up of an INTEGER id column and a padding column containing 3,999 characters in every row. The only difference between the three tables is in the type of the padding column: the first table uses CHAR(3999), the second uses VARCHAR(MAX), and the third uses the deprecated TEXT type. A script to create a database with the three tables and load the sample data follows: USE master; GO IF DB_ID('SortTest') IS NOT NULL DROP DATABASE SortTest; GO CREATE DATABASE SortTest COLLATE LATIN1_GENERAL_BIN; GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest', SIZE = 3GB, MAXSIZE = 3GB ); GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest_log', SIZE = 256MB, MAXSIZE = 1GB, FILEGROWTH = 128MB ); GO ALTER DATABASE SortTest SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE SortTest SET AUTO_CLOSE OFF ; ALTER DATABASE SortTest SET AUTO_CREATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_SHRINK OFF ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS_ASYNC ON ; ALTER DATABASE SortTest SET PARAMETERIZATION SIMPLE ; ALTER DATABASE SortTest SET READ_COMMITTED_SNAPSHOT OFF ; ALTER DATABASE SortTest SET MULTI_USER ; ALTER DATABASE SortTest SET RECOVERY SIMPLE ; USE SortTest; GO CREATE TABLE dbo.TestCHAR ( id INTEGER IDENTITY (1,1) NOT NULL, padding CHAR(3999) NOT NULL, CONSTRAINT [PK dbo.TestCHAR (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestMAX ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL, CONSTRAINT [PK dbo.TestMAX (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestTEXT ( id INTEGER IDENTITY (1,1) NOT NULL, padding TEXT NOT NULL, CONSTRAINT [PK dbo.TestTEXT (id)] PRIMARY KEY CLUSTERED (id), ) ; -- ============= -- Load TestCHAR (about 3s) -- ============= INSERT INTO dbo.TestCHAR WITH (TABLOCKX) ( padding ) SELECT padding = REPLICATE(CHAR(65 + (Data.n % 26)), 3999) FROM ( SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1 FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) AS Data ORDER BY Data.n ASC ; -- ============ -- Load TestMAX (about 3s) -- ============ INSERT INTO dbo.TestMAX WITH (TABLOCKX) ( padding ) SELECT CONVERT(VARCHAR(MAX), padding) FROM dbo.TestCHAR ORDER BY id ; -- ============= -- Load TestTEXT (about 5s) -- ============= INSERT INTO dbo.TestTEXT WITH (TABLOCKX) ( padding ) SELECT CONVERT(TEXT, padding) FROM dbo.TestCHAR ORDER BY id ; -- ========== -- Space used -- ========== -- EXECUTE sys.sp_spaceused @objname = 'dbo.TestCHAR'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAX'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestTEXT'; ; CHECKPOINT ; That takes around 15 seconds to run, and shows the space allocated to each table in its output: To illustrate the points I want to make today, the example task we are going to set ourselves is to return a random set of 150 rows from each table. The basic shape of the test query is the same for each of the three test tables: SELECT TOP (150) T.id, T.padding FROM dbo.Test AS T ORDER BY NEWID() OPTION (MAXDOP 1) ; Test 1 – CHAR(3999) Running the template query shown above using the TestCHAR table as the target, we find that the query takes around 5 seconds to return its results. This seems slow, considering that the table only has 50,000 rows. Working on the assumption that generating a GUID for each row is a CPU-intensive operation, we might try enabling parallelism to see if that speeds up the response time. Running the query again (but without the MAXDOP 1 hint) on a machine with eight logical processors, the query now takes 10 seconds to execute – twice as long as when run serially. Rather than attempting further guesses at the cause of the slowness, let’s go back to serial execution and add some monitoring. The script below monitors STATISTICS IO output and the amount of tempdb used by the test query. We will also run a Profiler trace to capture any warnings generated during query execution. DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TC.id, TC.padding FROM dbo.TestCHAR AS TC ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; Let’s take a closer look at the statistics and query plan generated from this: Following the flow of the data from right to left, we see the expected 50,000 rows emerging from the Clustered Index Scan, with a total estimated size of around 191MB. The Compute Scalar adds a column containing a random GUID (generated from the NEWID() function call) for each row. With this extra column in place, the size of the data arriving at the Sort operator is estimated to be 192MB. Sort is a blocking operator – it has to examine all of the rows on its input before it can produce its first row of output (the last row received might sort first). This characteristic means that Sort requires a memory grant – memory allocated for the query’s use by SQL Server just before execution starts. In this case, the Sort is the only memory-consuming operator in the plan, so it has access to the full 243MB (248,696KB) of memory reserved by SQL Server for this query execution. Notice that the memory grant is significantly larger than the expected size of the data to be sorted. SQL Server uses a number of techniques to speed up sorting, some of which sacrifice size for comparison speed. Sorts typically require a very large number of comparisons, so this is usually a very effective optimization. One of the drawbacks is that it is not possible to exactly predict the sort space needed, as it depends on the data itself. SQL Server takes an educated guess based on data types, sizes, and the number of rows expected, but the algorithm is not perfect. In spite of the large memory grant, the Profiler trace shows a Sort Warning event (indicating that the sort ran out of memory), and the tempdb usage monitor shows that 195MB of tempdb space was used – all of that for system use. The 195MB represents physical write activity on tempdb, because SQL Server strictly enforces memory grants – a query cannot ‘cheat’ and effectively gain extra memory by spilling to tempdb pages that reside in memory. Anyway, the key point here is that it takes a while to write 195MB to disk, and this is the main reason that the query takes 5 seconds overall. If you are wondering why using parallelism made the problem worse, consider that eight threads of execution result in eight concurrent partial sorts, each receiving one eighth of the memory grant. The eight sorts all spilled to tempdb, resulting in inefficiencies as the spilled sorts competed for disk resources. More importantly, there are specific problems at the point where the eight partial results are combined, but I’ll cover that in a future post. CHAR(3999) Performance Summary: 5 seconds elapsed time 243MB memory grant 195MB tempdb usage 192MB estimated sort set 25,043 logical reads Sort Warning Test 2 – VARCHAR(MAX) We’ll now run exactly the same test (with the additional monitoring) on the table using a VARCHAR(MAX) padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TM.id, TM.padding FROM dbo.TestMAX AS TM ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query takes around 8 seconds to complete (3 seconds longer than Test 1). Notice that the estimated row and data sizes are very slightly larger, and the overall memory grant has also increased very slightly to 245MB. The most marked difference is in the amount of tempdb space used – this query wrote almost 391MB of sort run data to the physical tempdb file. Don’t draw any general conclusions about VARCHAR(MAX) versus CHAR from this – I chose the length of the data specifically to expose this edge case. In most cases, VARCHAR(MAX) performs very similarly to CHAR – I just wanted to make test 2 a bit more exciting. MAX Performance Summary: 8 seconds elapsed time 245MB memory grant 391MB tempdb usage 193MB estimated sort set 25,043 logical reads Sort warning Test 3 – TEXT The same test again, but using the deprecated TEXT data type for the padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TT.id, TT.padding FROM dbo.TestTEXT AS TT ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query runs in 500ms. If you look at the metrics we have been checking so far, it’s not hard to understand why: TEXT Performance Summary: 0.5 seconds elapsed time 9MB memory grant 5MB tempdb usage 5MB estimated sort set 207 logical reads 596 LOB logical reads Sort warning SQL Server’s memory grant algorithm still underestimates the memory needed to perform the sorting operation, but the size of the data to sort is so much smaller (5MB versus 193MB previously) that the spilled sort doesn’t matter very much. Why is the data size so much smaller? The query still produces the correct results – including the large amount of data held in the padding column – so what magic is being performed here? TEXT versus MAX Storage The answer lies in how columns of the TEXT data type are stored. By default, TEXT data is stored off-row in separate LOB pages – which explains why this is the first query we have seen that records LOB logical reads in its STATISTICS IO output. You may recall from my last post that LOB data leaves an in-row pointer to the separate storage structure holding the LOB data. SQL Server can see that the full LOB value is not required by the query plan until results are returned, so instead of passing the full LOB value down the plan from the Clustered Index Scan, it passes the small in-row structure instead. SQL Server estimates that each row coming from the scan will be 79 bytes long – 11 bytes for row overhead, 4 bytes for the integer id column, and 64 bytes for the LOB pointer (in fact the pointer is rather smaller – usually 16 bytes – but the details of that don’t really matter right now). OK, so this query is much more efficient because it is sorting a very much smaller data set – SQL Server delays retrieving the LOB data itself until after the Sort starts producing its 150 rows. The question that normally arises at this point is: Why doesn’t SQL Server use the same trick when the padding column is defined as VARCHAR(MAX)? The answer is connected with the fact that if the actual size of the VARCHAR(MAX) data is 8000 bytes or less, it is usually stored in-row in exactly the same way as for a VARCHAR(8000) column – MAX data only moves off-row into LOB storage when it exceeds 8000 bytes. The default behaviour of the TEXT type is to be stored off-row by default, unless the ‘text in row’ table option is set suitably and there is room on the page. There is an analogous (but opposite) setting to control the storage of MAX data – the ‘large value types out of row’ table option. By enabling this option for a table, MAX data will be stored off-row (in a LOB structure) instead of in-row. SQL Server Books Online has good coverage of both options in the topic In Row Data. The MAXOOR Table The essential difference, then, is that MAX defaults to in-row storage, and TEXT defaults to off-row (LOB) storage. You might be thinking that we could get the same benefits seen for the TEXT data type by storing the VARCHAR(MAX) values off row – so let’s look at that option now. This script creates a fourth table, with the VARCHAR(MAX) data stored off-row in LOB pages: CREATE TABLE dbo.TestMAXOOR ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL, CONSTRAINT [PK dbo.TestMAXOOR (id)] PRIMARY KEY CLUSTERED (id), ) ; EXECUTE sys.sp_tableoption @TableNamePattern = N'dbo.TestMAXOOR', @OptionName = 'large value types out of row', @OptionValue = 'true' ; SELECT large_value_types_out_of_row FROM sys.tables WHERE [schema_id] = SCHEMA_ID(N'dbo') AND name = N'TestMAXOOR' ; INSERT INTO dbo.TestMAXOOR WITH (TABLOCKX) ( padding ) SELECT SPACE(0) FROM dbo.TestCHAR ORDER BY id ; UPDATE TM WITH (TABLOCK) SET padding.WRITE (TC.padding, NULL, NULL) FROM dbo.TestMAXOOR AS TM JOIN dbo.TestCHAR AS TC ON TC.id = TM.id ; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAXOOR' ; CHECKPOINT ; Test 4 – MAXOOR We can now re-run our test on the MAXOOR (MAX out of row) table: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) MO.id, MO.padding FROM dbo.TestMAXOOR AS MO ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; TEXT Performance Summary: 0.3 seconds elapsed time 245MB memory grant 0MB tempdb usage 193MB estimated sort set 207 logical reads 446 LOB logical reads No sort warning The query runs very quickly – slightly faster than Test 3, and without spilling the sort to tempdb (there is no sort warning in the trace, and the monitoring query shows zero tempdb usage by this query). SQL Server is passing the in-row pointer structure down the plan and only looking up the LOB value on the output side of the sort. The Hidden Problem There is still a huge problem with this query though – it requires a 245MB memory grant. No wonder the sort doesn’t spill to tempdb now – 245MB is about 20 times more memory than this query actually requires to sort 50,000 records containing LOB data pointers. Notice that the estimated row and data sizes in the plan are the same as in test 2 (where the MAX data was stored in-row). The optimizer assumes that MAX data is stored in-row, regardless of the sp_tableoption setting ‘large value types out of row’. Why? Because this option is dynamic – changing it does not immediately force all MAX data in the table in-row or off-row, only when data is added or actually changed. SQL Server does not keep statistics to show how much MAX or TEXT data is currently in-row, and how much is stored in LOB pages. This is an annoying limitation, and one which I hope will be addressed in a future version of the product. So why should we worry about this? Excessive memory grants reduce concurrency and may result in queries waiting on the RESOURCE_SEMAPHORE wait type while they wait for memory they do not need. 245MB is an awful lot of memory, especially on 32-bit versions where memory grants cannot use AWE-mapped memory. Even on a 64-bit server with plenty of memory, do you really want a single query to consume 0.25GB of memory unnecessarily? That’s 32,000 8KB pages that might be put to much better use. The Solution The answer is not to use the TEXT data type for the padding column. That solution happens to have better performance characteristics for this specific query, but it still results in a spilled sort, and it is hard to recommend the use of a data type which is scheduled for removal. I hope it is clear to you that the fundamental problem here is that SQL Server sorts the whole set arriving at a Sort operator. Clearly, it is not efficient to sort the whole table in memory just to return 150 rows in a random order. The TEXT example was more efficient because it dramatically reduced the size of the set that needed to be sorted. We can do the same thing by selecting 150 unique keys from the table at random (sorting by NEWID() for example) and only then retrieving the large padding column values for just the 150 rows we need. The following script implements that idea for all four tables: SET STATISTICS IO ON ; WITH TestTable AS ( SELECT * FROM dbo.TestCHAR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id = ANY (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAX ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestTEXT ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAXOOR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; All four queries now return results in much less than a second, with memory grants between 6 and 12MB, and without spilling to tempdb. The small remaining inefficiency is in reading the id column values from the clustered primary key index. As a clustered index, it contains all the in-row data at its leaf. The CHAR and VARCHAR(MAX) tables store the padding column in-row, so id values are separated by a 3999-character column, plus row overhead. The TEXT and MAXOOR tables store the padding values off-row, so id values in the clustered index leaf are separated by the much-smaller off-row pointer structure. This difference is reflected in the number of logical page reads performed by the four queries: Table 'TestCHAR' logical reads 25511 lob logical reads 000 Table 'TestMAX'. logical reads 25511 lob logical reads 000 Table 'TestTEXT' logical reads 00412 lob logical reads 597 Table 'TestMAXOOR' logical reads 00413 lob logical reads 446 We can increase the density of the id values by creating a separate nonclustered index on the id column only. This is the same key as the clustered index, of course, but the nonclustered index will not include the rest of the in-row column data. CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestCHAR (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAX (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestTEXT (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAXOOR (id); The four queries can now use the very dense nonclustered index to quickly scan the id values, sort them by NEWID(), select the 150 ids we want, and then look up the padding data. The logical reads with the new indexes in place are: Table 'TestCHAR' logical reads 835 lob logical reads 0 Table 'TestMAX' logical reads 835 lob logical reads 0 Table 'TestTEXT' logical reads 686 lob logical reads 597 Table 'TestMAXOOR' logical reads 686 lob logical reads 448 With the new index, all four queries use the same query plan (click to enlarge): Performance Summary: 0.3 seconds elapsed time 6MB memory grant 0MB tempdb usage 1MB sort set 835 logical reads (CHAR, MAX) 686 logical reads (TEXT, MAXOOR) 597 LOB logical reads (TEXT) 448 LOB logical reads (MAXOOR) No sort warning I’ll leave it as an exercise for the reader to work out why trying to eliminate the Key Lookup by adding the padding column to the new nonclustered indexes would be a daft idea Conclusion This post is not about tuning queries that access columns containing big strings. It isn’t about the internal differences between TEXT and MAX data types either. It isn’t even about the cool use of UPDATE .WRITE used in the MAXOOR table load. No, this post is about something else: Many developers might not have tuned our starting example query at all – 5 seconds isn’t that bad, and the original query plan looks reasonable at first glance. Perhaps the NEWID() function would have been blamed for ‘just being slow’ – who knows. 5 seconds isn’t awful – unless your users expect sub-second responses – but using 250MB of memory and writing 200MB to tempdb certainly is! If ten sessions ran that query at the same time in production that’s 2.5GB of memory usage and 2GB hitting tempdb. Of course, not all queries can be rewritten to avoid large memory grants and sort spills using the key-lookup technique in this post, but that’s not the point either. The point of this post is that a basic understanding of execution plans is not enough. Tuning for logical reads and adding covering indexes is not enough. If you want to produce high-quality, scalable TSQL that won’t get you paged as soon as it hits production, you need a deep understanding of execution plans, and as much accurate, deep knowledge about SQL Server as you can lay your hands on. The advanced database developer has a wide range of tools to use in writing queries that perform well in a range of circumstances. By the way, the examples in this post were written for SQL Server 2008. They will run on 2005 and demonstrate the same principles, but you won’t get the same figures I did because 2005 had a rather nasty bug in the Top N Sort operator. Fair warning: if you do decide to run the scripts on a 2005 instance (particularly the parallel query) do it before you head out for lunch… This post is dedicated to the people of Christchurch, New Zealand. © 2011 Paul White email: @[email protected] twitter: @SQL_Kiwi

Read the article

Using the ASP.NET Cache to cache data in a Model or Business Object layer, without a dependency on System.Web in the layer - Part One.

- by Rhames

ASP.NET applications can make use of the System.Web.Caching.Cache object to cache data and prevent repeated expensive calls to a database or other store. However, ideally an application should make use of caching at the point where data is retrieved from the database, which typically is inside a Business Objects or Model layer. One of the key features of using a UI pattern such as Model-View-Presenter (MVP) or Model-View-Controller (MVC) is that the Model and Presenter (or Controller) layers are developed without any knowledge of the UI layer. Introducing a dependency on System.Web into the Model layer would break this independence of the Model from the View. This article gives a solution to this problem, using dependency injection to inject the caching implementation into the Model layer at runtime. This allows caching to be used within the Model layer, without any knowledge of the actual caching mechanism that will be used. Create a sample application to use the caching solution Create a test SQL Server database This solution uses a SQL Server database with the same Sales data used in my previous post on calculating running totals. The advantage of using this data is that it gives nice slow queries that will exaggerate the effect of using caching! To create the data, first create a new SQL database called CacheSample. Next run the following script to create the Sale table and populate it: USE CacheSample GO CREATE TABLE Sale(DayCount smallint, Sales money) CREATE CLUSTERED INDEX ndx_DayCount ON Sale(DayCount) go INSERT Sale VALUES (1,120) INSERT Sale VALUES (2,60) INSERT Sale VALUES (3,125) INSERT Sale VALUES (4,40) DECLARE @DayCount smallint, @Sales money SET @DayCount = 5 SET @Sales = 10 WHILE @DayCount < 5000 BEGIN INSERT Sale VALUES (@DayCount,@Sales) SET @DayCount = @DayCount + 1 SET @Sales = @Sales + 15 END Next create a stored procedure to calculate the running total, and return a specified number of rows from the Sale table, using the following script: USE [CacheSample] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO -- ============================================= -- Author: Robin -- Create date: -- Description: -- ============================================= CREATE PROCEDURE [dbo].[spGetRunningTotals] -- Add the parameters for the stored procedure here @HighestDayCount smallint = null AS BEGIN -- SET NOCOUNT ON added to prevent extra result sets from -- interfering with SELECT statements. SET NOCOUNT ON; IF @HighestDayCount IS NULL SELECT @HighestDayCount = MAX(DayCount) FROM dbo.Sale DECLARE @SaleTbl TABLE (DayCount smallint, Sales money, RunningTotal money) DECLARE @DayCount smallint, @Sales money, @RunningTotal money SET @RunningTotal = 0 SET @DayCount = 0 DECLARE rt_cursor CURSOR FOR SELECT DayCount, Sales FROM Sale ORDER BY DayCount OPEN rt_cursor FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales WHILE @@FETCH_STATUS = 0 AND @DayCount <= @HighestDayCount BEGIN SET @RunningTotal = @RunningTotal + @Sales INSERT @SaleTbl VALUES (@DayCount,@Sales,@RunningTotal) FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales END CLOSE rt_cursor DEALLOCATE rt_cursor SELECT DayCount, Sales, RunningTotal FROM @SaleTbl END GO Create the Sample ASP.NET application In Visual Studio create a new solution and add a class library project called CacheSample.BusinessObjects and an ASP.NET web application called CacheSample.UI. The CacheSample.BusinessObjects project will contain a single class to represent a Sale data item, with all the code to retrieve the sales from the database included in it for simplicity (normally I would at least have a separate Repository or other object that is responsible for retrieving data, and probably a data access layer as well, but for this sample I want to keep it simple). The C# code for the Sale class is shown below: using System; using System.Collections.Generic; using System.Data; using System.Data.SqlClient; namespace CacheSample.BusinessObjects { public class Sale { public Int16 DayCount { get; set; } public decimal Sales { get; set; } public decimal RunningTotal { get; set; } public static IEnumerable<Sale> GetSales(int? highestDayCount) { List<Sale> sales = new List<Sale>(); SqlParameter highestDayCountParameter = new SqlParameter("@HighestDayCount", SqlDbType.SmallInt); if (highestDayCount.HasValue) highestDayCountParameter.Value = highestDayCount; else highestDayCountParameter.Value = DBNull.Value; string connectionStr = System.Configuration.ConfigurationManager .ConnectionStrings["CacheSample"].ConnectionString; using(SqlConnection sqlConn = new SqlConnection(connectionStr)) using (SqlCommand sqlCmd = sqlConn.CreateCommand()) { sqlCmd.CommandText = "spGetRunningTotals"; sqlCmd.CommandType = CommandType.StoredProcedure; sqlCmd.Parameters.Add(highestDayCountParameter); sqlConn.Open(); using (SqlDataReader dr = sqlCmd.ExecuteReader()) { while (dr.Read()) { Sale newSale = new Sale(); newSale.DayCount = dr.GetInt16(0); newSale.Sales = dr.GetDecimal(1); newSale.RunningTotal = dr.GetDecimal(2); sales.Add(newSale); } } } return sales; } } } The static GetSale() method makes a call to the spGetRunningTotals stored procedure and then reads each row from the returned SqlDataReader into an instance of the Sale class, it then returns a List of the Sale objects, as IEnnumerable<Sale>. A reference to System.Configuration needs to be added to the CacheSample.BusinessObjects project so that the connection string can be read from the web.config file. In the CacheSample.UI ASP.NET project, create a single web page called ShowSales.aspx, and make this the default start up page. This page will contain a single button to call the GetSales() method and a label to display the results. The html mark up and the C# code behind are shown below: ShowSales.aspx <%@ Page Language="C#" AutoEventWireup="true" CodeBehind="ShowSales.aspx.cs" Inherits="CacheSample.UI.ShowSales" %> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head runat="server"> <title>Cache Sample - Show All Sales</title> </head> <body> <form id="form1" runat="server"> <div> <asp:Button ID="btnTest1" runat="server" onclick="btnTest1_Click" Text="Get All Sales" />     <asp:Label ID="lblResults" runat="server"></asp:Label> </div> </form> </body> </html> ShowSales.aspx.cs using System; using System.Collections.Generic; using System.Linq; using System.Web; using System.Web.UI; using System.Web.UI.WebControls; using CacheSample.BusinessObjects; namespace CacheSample.UI { public partial class ShowSales : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { } protected void btnTest1_Click(object sender, EventArgs e) { System.Diagnostics.Stopwatch stopWatch = new System.Diagnostics.Stopwatch(); stopWatch.Start(); var sales = Sale.GetSales(null); var lastSales = sales.Last(); stopWatch.Stop(); lblResults.Text = string.Format( "Count of Sales: {0}, Last DayCount: {1}, Total Sales: {2}. Query took {3} ms", sales.Count(), lastSales.DayCount, lastSales.RunningTotal, stopWatch.ElapsedMilliseconds); } } } Finally we need to add a connection string to the CacheSample SQL Server database, called CacheSample, to the web.config file: <?xmlversion="1.0"?> <configuration> <connectionStrings> <addname="CacheSample" connectionString="data source=.\SQLEXPRESS;Integrated Security=SSPI;Initial Catalog=CacheSample" providerName="System.Data.SqlClient" /> </connectionStrings> <system.web> <compilationdebug="true"targetFramework="4.0" /> </system.web> </configuration> Run the application and click the button a few times to see how long each call to the database takes. On my system, each query takes about 450ms. Next I shall look at a solution to use the ASP.NET caching to cache the data returned by the query, so that subsequent requests to the GetSales() method are much faster. Adding Data Caching Support I am going to create my caching support in a separate project called CacheSample.Caching, so the next step is to add a class library to the solution. We shall be using the application configuration to define the implementation of our caching system, so we need a reference to System.Configuration adding to the project. ICacheProvider<T> Interface The first step in adding caching to our application is to define an interface, called ICacheProvider, in the CacheSample.Caching project, with methods to retrieve any data from the cache or to retrieve the data from the data source if it is not present in the cache. Dependency Injection will then be used to inject an implementation of this interface at runtime, allowing the users of the interface (i.e. the CacheSample.BusinessObjects project) to be completely unaware of how the caching is actually implemented. As data of any type maybe retrieved from the data source, it makes sense to use generics in the interface, with a generic type parameter defining the data type associated with a particular instance of the cache interface implementation. The C# code for the ICacheProvider interface is shown below: using System; using System.Collections.Generic; namespace CacheSample.Caching { public interface ICacheProvider { } public interface ICacheProvider<T> : ICacheProvider { T Fetch(string key, Func<T> retrieveData, DateTime? absoluteExpiry, TimeSpan? relativeExpiry); IEnumerable<T> Fetch(string key, Func<IEnumerable<T>> retrieveData, DateTime? absoluteExpiry, TimeSpan? relativeExpiry); } } The empty non-generic interface will be used as a type in a Dictionary generic collection later to store instances of the ICacheProvider<T> implementation for reuse, I prefer to use a base interface when doing this, as I think the alternative of using object makes for less clear code. The ICacheProvider<T> interface defines two overloaded Fetch methods, the difference between these is that one will return a single instance of the type T and the other will return an IEnumerable<T>, providing support for easy caching of collections of data items. Both methods will take a key parameter, which will uniquely identify the cached data, a delegate of type Func<T> or Func<IEnumerable<T>> which will provide the code to retrieve the data from the store if it is not present in the cache, and absolute or relative expiry policies to define when a cached item should expire. Note that at present there is no support for cache dependencies, but I shall be showing a method of adding this in part two of this article. CacheProviderFactory Class We need a mechanism of creating instances of our ICacheProvider<T> interface, using Dependency Injection to get the implementation of the interface. To do this we shall create a CacheProviderFactory static class in the CacheSample.Caching project. This factory will provide a generic static method called GetCacheProvider<T>(), which shall return instances of ICacheProvider<T>. We can then call this factory method with the relevant data type (for example the Sale class in the CacheSample.BusinessObject project) to get a instance of ICacheProvider for that type (e.g. call CacheProviderFactory.GetCacheProvider<Sale>() to get the ICacheProvider<Sale> implementation). The C# code for the CacheProviderFactory is shown below: using System; using System.Collections.Generic; using CacheSample.Caching.Configuration; namespace CacheSample.Caching { public static class CacheProviderFactory { private static Dictionary<Type, ICacheProvider> cacheProviders = new Dictionary<Type, ICacheProvider>(); private static object syncRoot = new object(); ///<summary> /// Factory method to create or retrieve an implementation of the /// ICacheProvider interface for type <typeparamref name="T"/>. ///</summary> ///<typeparam name="T"> /// The type that this cache provider instance will work with ///</typeparam> ///<returns>An instance of the implementation of ICacheProvider for type ///<typeparamref name="T"/>, as specified by the application /// configuration</returns> public static ICacheProvider<T> GetCacheProvider<T>() { ICacheProvider<T> cacheProvider = null; // Get the Type reference for the type parameter T Type typeOfT = typeof(T); // Lock the access to the cacheProviders dictionary // so multiple threads can work with it lock (syncRoot) { // First check if an instance of the ICacheProvider implementation // already exists in the cacheProviders dictionary for the type T if (cacheProviders.ContainsKey(typeOfT)) cacheProvider = (ICacheProvider<T>)cacheProviders[typeOfT]; else { // There is not already an instance of the ICacheProvider in // cacheProviders for the type T // so we need to create one // Get the Type reference for the application's implementation of // ICacheProvider from the configuration Type cacheProviderType = Type.GetType(CacheProviderConfigurationSection.Current. CacheProviderType); if (cacheProviderType != null) { // Now get a Type reference for the Cache Provider with the // type T generic parameter Type typeOfCacheProviderTypeForT = cacheProviderType.MakeGenericType(new Type[] { typeOfT }); if (typeOfCacheProviderTypeForT != null) { // Create the instance of the Cache Provider and add it to // the cacheProviders dictionary for future use cacheProvider = (ICacheProvider<T>)Activator. CreateInstance(typeOfCacheProviderTypeForT); cacheProviders.Add(typeOfT, cacheProvider); } } } } return cacheProvider; } } } As this code uses Activator.CreateInstance() to create instances of the ICacheProvider<T> implementation, which is a slow process, the factory class maintains a Dictionary of the previously created instances so that a cache provider needs to be created only once for each type. The type of the implementation of ICacheProvider<T> is read from a custom configuration section in the application configuration file, via the CacheProviderConfigurationSection class, which is described below. CacheProviderConfigurationSection Class The implementation of ICacheProvider<T> will be specified in a custom configuration section in the application’s configuration. To handle this create a folder in the CacheSample.Caching project called Configuration, and add a class called CacheProviderConfigurationSection to this folder. This class will extend the System.Configuration.ConfigurationSection class, and will contain a single string property called CacheProviderType. The C# code for this class is shown below: using System; using System.Configuration; namespace CacheSample.Caching.Configuration { internal class CacheProviderConfigurationSection : ConfigurationSection { public static CacheProviderConfigurationSection Current { get { return (CacheProviderConfigurationSection) ConfigurationManager.GetSection("cacheProvider"); } } [ConfigurationProperty("type", IsRequired=true)] public string CacheProviderType { get { return (string)this["type"]; } } } } Adding Data Caching to the Sales Class We now have enough code in place to add caching to the GetSales() method in the CacheSample.BusinessObjects.Sale class, even though we do not yet have an implementation of the ICacheProvider<T> interface. We need to add a reference to the CacheSample.Caching project to CacheSample.BusinessObjects so that we can use the ICacheProvider<T> interface within the GetSales() method. Once the reference is added, we can first create a unique string key based on the method name and the parameter value, so that the same cache key is used for repeated calls to the method with the same parameter values. Then we get an instance of the cache provider for the Sales type, using the CacheProviderFactory, and pass the existing code to retrieve the data from the database as the retrievalMethod delegate in a call to the Cache Provider Fetch() method. The C# code for the modified GetSales() method is shown below: public static IEnumerable<Sale> GetSales(int? highestDayCount) { string cacheKey = string.Format("CacheSample.BusinessObjects.GetSalesWithCache({0})", highestDayCount); return CacheSample.Caching.CacheProviderFactory. GetCacheProvider<Sale>().Fetch(cacheKey, delegate() { List<Sale> sales = new List<Sale>(); SqlParameter highestDayCountParameter = new SqlParameter("@HighestDayCount", SqlDbType.SmallInt); if (highestDayCount.HasValue) highestDayCountParameter.Value = highestDayCount; else highestDayCountParameter.Value = DBNull.Value; string connectionStr = System.Configuration.ConfigurationManager. ConnectionStrings["CacheSample"].ConnectionString; using (SqlConnection sqlConn = new SqlConnection(connectionStr)) using (SqlCommand sqlCmd = sqlConn.CreateCommand()) { sqlCmd.CommandText = "spGetRunningTotals"; sqlCmd.CommandType = CommandType.StoredProcedure; sqlCmd.Parameters.Add(highestDayCountParameter); sqlConn.Open(); using (SqlDataReader dr = sqlCmd.ExecuteReader()) { while (dr.Read()) { Sale newSale = new Sale(); newSale.DayCount = dr.GetInt16(0); newSale.Sales = dr.GetDecimal(1); newSale.RunningTotal = dr.GetDecimal(2); sales.Add(newSale); } } } return sales; }, null, new TimeSpan(0, 10, 0)); } This example passes the code to retrieve the Sales data from the database to the Cache Provider as an anonymous method, however it could also be written as a lambda. The main advantage of using an anonymous function (method or lambda) is that the code inside the anonymous function can access the parameters passed to the GetSales() method. Finally the absolute expiry is set to null, and the relative expiry set to 10 minutes, to indicate that the cache entry should be removed 10 minutes after the last request for the data. As the ICacheProvider<T> has a Fetch() method that returns IEnumerable<T>, we can simply return the results of the Fetch() method to the caller of the GetSales() method. This should be all that is needed for the GetSales() method to now retrieve data from a cache after the first time the data has be retrieved from the database. Implementing a ASP.NET Cache Provider The final step is to actually implement the ICacheProvider<T> interface, and add the implementation details to the web.config file for the dependency injection. The cache provider implementation needs to have access to System.Web. Therefore it could be placed in the CacheSample.UI project, or in its own project that has a reference to System.Web. Implementing the Cache Provider in a separate project is my favoured approach. Create a new project inside the solution called CacheSample.CacheProvider, and add references to System.Web and CacheSample.Caching to this project. Add a class to the project called AspNetCacheProvider. Make the class a generic class by adding the generic parameter <T> and indicate that the class implements ICacheProvider<T>. The C# code for the AspNetCacheProvider class is shown below: using System; using System.Collections.Generic; using System.Linq; using System.Web; using System.Web.Caching; using CacheSample.Caching; namespace CacheSample.CacheProvider { public class AspNetCacheProvider<T> : ICacheProvider<T> { #region ICacheProvider<T> Members public T Fetch(string key, Func<T> retrieveData, DateTime? absoluteExpiry, TimeSpan? relativeExpiry) { return FetchAndCache<T>(key, retrieveData, absoluteExpiry, relativeExpiry); } public IEnumerable<T> Fetch(string key, Func<IEnumerable<T>> retrieveData, DateTime? absoluteExpiry, TimeSpan? relativeExpiry) { return FetchAndCache<IEnumerable<T>>(key, retrieveData, absoluteExpiry, relativeExpiry); } #endregion #region Helper Methods private U FetchAndCache<U>(string key, Func<U> retrieveData, DateTime? absoluteExpiry, TimeSpan? relativeExpiry) { U value; if (!TryGetValue<U>(key, out value)) { value = retrieveData(); if (!absoluteExpiry.HasValue) absoluteExpiry = Cache.NoAbsoluteExpiration; if (!relativeExpiry.HasValue) relativeExpiry = Cache.NoSlidingExpiration; HttpContext.Current.Cache.Insert(key, value, null, absoluteExpiry.Value, relativeExpiry.Value); } return value; } private bool TryGetValue<U>(string key, out U value) { object cachedValue = HttpContext.Current.Cache.Get(key); if (cachedValue == null) { value = default(U); return false; } else { try { value = (U)cachedValue; return true; } catch { value = default(U); return false; } } } #endregion } } The two interface Fetch() methods call a private method called FetchAndCache(). This method first checks for a element in the HttpContext.Current.Cache with the specified cache key, and if so tries to cast this to the specified type (either T or IEnumerable<T>). If the cached element is found, the FetchAndCache() method simply returns it. If it is not found in the cache, the method calls the retrievalMethod delegate to get the data from the data source, and then adds this to the HttpContext.Current.Cache. The final step is to add the AspNetCacheProvider class to the relevant custom configuration section in the CacheSample.UI.Web.Config file. To do this there needs to be a <configSections> element added as the first element in <configuration>. This will match a custom section called <cacheProvider> with the CacheProviderConfigurationSection. Then we add a <cacheProvider> element, with a type property set to the fully qualified assembly name of the AspNetCacheProvider class, as shown below: <?xmlversion="1.0"?> <configuration> <configSections> <sectionname="cacheProvider" type="CacheSample.Base.Configuration.CacheProviderConfigurationSection, CacheSample.Base" /> </configSections> <connectionStrings> <addname="CacheSample" connectionString="data source=.\SQLEXPRESS;Integrated Security=SSPI;Initial Catalog=CacheSample" providerName="System.Data.SqlClient" /> </connectionStrings> <cacheProvidertype="CacheSample.CacheProvider.AspNetCacheProvider`1, CacheSample.CacheProvider, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null"> </cacheProvider> <system.web> <compilationdebug="true"targetFramework="4.0" /> </system.web> </configuration> One point to note is that the fully qualified assembly name of the AspNetCacheProvider class includes the notation `1 after the class name, which indicates that it is a generic class with a single generic type parameter. The CacheSample.UI project needs to have references added to CacheSample.Caching and CacheSample.CacheProvider so that the actual application is aware of the relevant cache provider implementation. Conclusion After implementing this solution, you should have a working cache provider mechanism, that will allow the middle and data access layers to implement caching support when retrieving data, without any knowledge of the actually caching implementation. If the UI is not ASP.NET based, if for example it is Winforms or WPF, the implementation of ICacheProvider<T> would be written around whatever technology is available. It could even be a standalone caching system that takes full responsibility for adding and removing items from a global store. The next part of this article will show how this caching mechanism may be extended to provide support for cache dependencies, such as the System.Web.Caching.SqlCacheDependency. Another possible extension would be to cache the cache provider implementations instead of storing them in a static Dictionary in the CacheProviderFactory. This would prevent a build up of seldom used cache providers in the application memory, as they could be removed from the cache if not used often enough, although in reality there are probably unlikely to be vast numbers of cache provider implementation instances, as most applications do not have a massive number of business object or model types.

Read the article

iPhone SDK vs. Windows Phone 7 Series SDK Challenge, Part 2: MoveMe

In this series, I will be taking sample applications from the iPhone SDK and implementing them on Windows Phone 7 Series. My goal is to do as much of an apples-to-apples comparison as I can. This series will be written to not only compare and contrast how easy or difficult it is to complete tasks on either platform, how many lines of code, etc., but Id also like it to be a way for iPhone developers to either get started on Windows Phone 7 Series development, or for developers in general to learn the platform. Heres my methodology: Run the iPhone SDK app in the iPhone Simulator to get a feel for what it does and how it works, without looking at the implementation Implement the equivalent functionality on Windows Phone 7 Series using Silverlight. Compare the two implementations based on complexity, functionality, lines of code, number of files, etc. Add some functionality to the Windows Phone 7 Series app that shows off a way to make the scenario more interesting or leverages an aspect of the platform, or uses a better design pattern to implement the functionality. You can download Microsoft Visual Studio 2010 Express for Windows Phone CTP here, and the Expression Blend 4 Beta here. If youre seeing this series for the first time, check out Part 1: Hello World. A note on methodologyin the prior post there was some feedback about lines of code not being a very good metric for this exercise. I dont really disagree, theres a lot more to this than lines of code but I believe that is a relevant metric, even if its not the ultimate one. And theres no perfect answer here. So I am going to continue to report the number of lines of code that I, as a developer would need to write in these apps as a data point, and Ill leave it up to the reader to determine how that fits in with overall complexity, etc. The first example was so basic that I think it was difficult to talk about in real terms. I think that as these apps get more complex, the subjective differences in concept count and will be more important. MoveMe The MoveMe app is the main end-to-end app writing example in the iPhone SDK, called Creating an iPhone Application. This application demonstrates a few concepts, including handling touch input, how to do animations, and how to do some basic transforms. The behavior of the application is pretty simple. User touches the button: The button does a throb type animation where it scales up and then back down briefly. User drags the button: After a touch begins, moving the touch point will drag the button around with the touch. User lets go of the button: The button animates back to its original position, but does a few small bounces as it reaches its original point, which makes the app fun and gives it an extra bit of interactivity. Now, how would I write an app that meets this spec for Windows Phone 7 Series, and how hard would it be? Lets find out! Implementing the UI Okay, lets build the UI for this application. In the HelloWorld example, we did all the UI design in Visual Studio and/or by hand in XAML. In this example, were going to use the Expression Blend 4 Beta. You might be wondering when to use Visual Studio, when to use Blend, and when to do XAML by hand. Different people will have different takes on this, but heres mine: XAML by hand simple UI that doesnt contain animations, gradients, etc., and or UI that I want to really optimize and craft when I know exactly what I want to do. Visual Studio Basic UI layout, property setting, data binding, etc. Blend Any serious design work needs to be done in Blend, including animations, handling states and transitions, styling and templating, editing resources. As in Part 1, go ahead and fire up Visual Studio 2010 Express for Windows Phone (yes, soon it will take longer to say the name of our products than to start them up!), and create a new Windows Phone Application. As in Part 1, clear out the XAML from the designer. An easy way to do this is to just: Click on the design surface Hit Control+A Hit Delete Theres a little bit left over (the Grid.RowDefinitions element), just go ahead and delete that element so were starting with a clean state of only one outer Grid element. To use Blend, we need to save this project. See, when you create a project with Visual Studio Express, it doesnt commit it to the disk (well, in a place where you can find it, at least) until you actually save the project. This is handy if youre doing some fooling around, because it doesnt clutter your disk with WindowsPhoneApplication23-like directories. But its also kind of dangerous, since when you close VS, if you dont save the projectits all gone. Yes, this has bitten me since I was saving files and didnt remember that, so be careful to save the project/solution via Save All, at least once. So, save and note the location on disk. Start Expression Blend 4 Beta, and chose File > Open Project/Solution, and load your project. You should see just about the same thing you saw over in VS: a blank, black designer surface. Now, thinking about this application, we dont really need a button, even though it looks like one. We never click it. So were just going to create a visual and use that. This is also true in the iPhone example above, where the visual is actually not a button either but a jpg image with a nice gradient and round edges. Well do something simple here that looks pretty good. In Blend, look in the tool pane on the left for the icon that looks like the below (the highlighted one on the left), and hold it down to get the popout menu, and choose Border: Okay, now draw out a box in the middle of the design surface of about 300x100. The Properties Pane to the left should show the properties for this item. First, lets make it more visible by giving it a border brush. Set the BorderBrush to white by clicking BorderBrush and dragging the color selector all the way to the upper right in the palette. Then, down a bit farther, make the BorderThickness 4 all the way around, and the CornerRadius set to 6. In the Layout section, do the following to Width, Height, Horizontal and Vertical Alignment, and Margin (all 4 margin values): Youll see the outline now is in the middle of the design surface. Now lets give it a background color. Above BorderBrush select Background, and click the third tab over: Gradient Brush. Youll see a gradient slider at the bottom, and if you click the markers, you can edit the gradient stops individually (or add more). In this case, you can select something you like, but wheres what I chose: Left stop: #BFACCFE2 (I just picked a spot on the palette and set opacity to 75%, no magic here, feel free to fiddle these or just enter these numbers into the hex area and be done with it) Right stop: #FF3E738F Okay, looks pretty good. Finally set the name of the element in the Name field at the top of the Properties pane to welcome. Now lets add some text. Just hit T and itll select the TextBlock tool automatically: Now draw out some are inside our welcome visual and type Welcome!, then click on the design surface (to exit text entry mode) and hit V to go back into selection mode (or the top item in the tool pane that looks like a mouse pointer). Click on the text again to select it in the tool pane. Just like the border, we want to center this. So set HorizontalAlignment and VerticalAlignment to Center, and clear the Margins: Thats it for the UI. Heres how it looks, on the design surface: Not bad! Okay, now the fun part Adding Animations Using Blend to build animations is a lot of fun, and its easy. In XAML, I can not only declare elements and visuals, but also I can declare animations that will affect those visuals. These are called Storyboards. To recap, well be doing two animations: The throb animation when the element is touched The center animation when the element is released after being dragged. The throb animation is just a scale transform, so well do that first. In the Objects and Timeline Pane (left side, bottom half), click the little + icon to add a new Storyboard called touchStoryboard: The timeline view will appear. In there, click a bit to the right of 0 to create a keyframe at .2 seconds: Now, click on our welcome element (the Border, not the TextBlock in it), and scroll to the bottom of the Properties Pane. Open up Transform, click the third tab ("Scale), and set X and Y to 1.2: This all of this says that, at .2 seconds, I want the X and Y size of this element to scale to 1.2. In fact you can see this happen. Push the Play arrow in the timeline view, and youll see the animation run! Lets make two tweaks. First, we want the animation to automatically reverse so it scales up then back down nicely. Click in the dropdown that says touchStoryboard in Objects and Timeline, then in the Properties pane check Auto Reverse: Now run it again, and youll see it go both ways. Lets even make it nicer by adding an easing function. First, click on the Render Transform item in the Objects tree, then, in the Property Pane, youll see a bunch of easing functions to choose from. Feel free to play with this, then seeing how each runs. I chose Circle In, but some other ones are fun. Try them out! Elastic In is kind of fun, but well stick with Circle In. Thats it for that animation. Now, we also want an animation to move the Border back to its original position when the user ends the touch gesture. This is exactly the same process as above, but just targeting a different transform property. Create a new animation called releaseStoryboard Select a timeline point at 1.2 seconds. Click on the welcome Border element again Scroll to the Transforms panel at the bottom of the Properties Pane Choose the first tab (Translate), which may already be selected Set both X and Y values to 0.0 (we do this just to make the values stick, because the value is already 0 and we need Blend to know we want to save that value) Click on RenderTransform in the Objects tree In the properties pane, choose Bounce Out Set Bounces to 6, and Bounciness to 4 (feel free to play with these as well) Okay, were done. Note, if you want to test this Storyboard, you have to do something a little tricky because the final value is the same as the initial value, so playing it does nothing. If you want to play with it, do the following: Next to the selection dropdown, hit the little "x (Close Storyboard) Go to the Translate Transform value for welcome Set X,Y to 50, 200, respectively (or whatever) Select releaseStoryboard again from the dropdown Hit play, see it run Go into the object tree and select RenderTransform to change the easing function. When youre done, hit the Close Storyboard x again and set the values in Transform/Translate back to 0 Wiring Up the Animations Okay, now go back to Visual Studio. Youll get a prompt due to the modification of MainPage.xaml. Hit Yes. In the designer, click on the welcome Border element. In the Property Browser, hit the Events button, then double click each of ManipulationStarted, ManipulationDelta, ManipulationCompleted. Youll need to flip back to the designer from code, after each double click. Its code time. Here we go. Here, three event handlers have been created for us: welcome_ManipulationStarted: This will execute when a manipulation begins. Think of it as MouseDown. welcome_ManipulationDelta: This executes each time a manipulation changes. Think MouseMove. welcome_ManipulationCompleted: This will execute when the manipulation ends. Think MouseUp. Now, in ManipuliationStarted, we want to kick off the throb animation that we called touchAnimation. Thats easy: 1: private void welcome_ManipulationStarted(object sender, ManipulationStartedEventArgs e) 2: { 3: touchStoryboard.Begin(); 4: } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Likewise, when the manipulation completes, we want to re-center the welcome visual with our bounce animation: 1: private void welcome_ManipulationCompleted(object sender, ManipulationCompletedEventArgs e) 2: { 3: releaseStoryboard.Begin(); 4: } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Note there is actually a way to kick off these animations from Blend directly via something called Triggers, but I think its clearer to show whats going on like this. A Trigger basically allows you to say When this event fires, trigger this Storyboard, so its the exact same logical process as above, but without the code. But how do we get the object to move? Well, for that we really dont want an animation because we want it to respond immediately to user input. We do this by directly modifying the transform to match the offset for the manipulation, and then well let the animation bring it back to zero when the manipulation completes. The manipulation events do a great job of keeping track of all the stuff that you usually had to do yourself when doing drags: where you started from, how far youve moved, etc. So we can easily modify the position as below: 1: private void welcome_ManipulationDelta(object sender, ManipulationDeltaEventArgs e) 2: { 3: CompositeTransform transform = (CompositeTransform)welcome.RenderTransform; 4: 5: transform.TranslateX = e.CumulativeManipulation.Translation.X; 6: transform.TranslateY = e.CumulativeManipulation.Translation.Y; 7: } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Thats it! Go ahead and run the app in the emulator. I suggest running without the debugger, its a little faster (CTRL+F5). If youve got a machine that supports DirectX 10, youll see nice smooth GPU accelerated graphics, which also what it looks like on the phone, running at about 60 frames per second. If your machine does not support DX10 (like the laptop Im writing this on!), it wont be quite a smooth so youll have to take my word for it! Comparing Against the iPhone This is an example where the flexibility and power of XAML meets the tooling of Visual Studio and Blend, and the whole experience really shines. So, for several things that are declarative and 100% toolable with the Windows Phone 7 Series, this example does them with code on the iPhone. In parens is the lines of code that I count to do these operations. PlacardView.m: 19 total LOC Creating the view that hosts the button-like image and the text Drawing the image that is the background of the button Drawing the Welcome text over the image (I think you could technically do this step and/or the prior one using Interface Builder) MoveMeView.m: 63 total LOC Constructing and running the scale (throb) animation (25) Constructing the path describing the animation back to center plus bounce effect (38) Beyond the code count, yy experience with doing this kind of thing in code is that its VERY time intensive. When I was a developer back on Windows Forms, doing GDI+ drawing, we did this stuff a lot, and it took forever! You write some code and even once you get it basically working, you see its not quite right, you go back, tweak the interval, or the math a bit, run it again, etc. You can take a look at the iPhone code here to judge for yourself. Scroll down to animatePlacardViewToCenter toward the bottom. I dont think this code is terribly complicated, but its not what Id call simple and its not at all simple to get right. And then theres a few other lines of code running around for setting up the ViewController and the Views, about 15 lines between MoveMeAppDelegate, PlacardView, and MoveMeView, plus the assorted decls in the h files. Adding those up, I conservatively get something like 100 lines of code (19+63+15+decls) on iPhone that I have to write, by hand, to make this project work. The lines of code that I wrote in the examples above is 5 lines of code on Windows Phone 7 Series. In terms of incremental concept counts beyond the HelloWorld app, heres a shot at that: iPhone: Drawing Images Drawing Text Handling touch events Creating animations Scaling animations Building a path and animating along that Windows Phone 7 Series: Laying out UI in Blend Creating & testing basic animations in Blend Handling touch events Invoking animations from code This was actually the first example I tried converting, even before I did the HelloWorld, and I was pretty surprised. Some of this is luck that this app happens to match up with the Windows Phone 7 Series platform just perfectly. In terms of time, I wrote the above application, from scratch, in about 10 minutes. I dont know how long it would take a very skilled iPhone developer to write MoveMe on that iPhone from scratch, but if I was to write it on Silverlight in the same way (e.g. all via code), I think it would likely take me at least an hour or two to get it all working right, maybe more if I ended up picking the wrong strategy or couldnt get the math right, etc. Making Some Tweaks Silverlight contains a feature called Projections to do a variety of 3D-like effects with a 2D surface. So lets play with that a bit. Go back to Blend and select the welcome Border in the object tree. In its properties, scroll down to the bottom, open Transform, and see Projection at the bottom. Set X,Y,Z to 90. Youll see the element kind of disappear, replaced by a thin blue line. Now Create a new animation called startupStoryboard. Set its key time to .5 seconds in the timeline view Set the projection values above to 0 for X, Y, and Z. Save Go back to Visual Studio, and in the constructor, add the following bold code (lines 7-9 to the constructor: 1: public MainPage() 2: { 3: InitializeComponent(); 4: 5: SupportedOrientations = SupportedPageOrientation.Portrait; 6: 7: this.Loaded += (s, e) => 8: { 9: startupStoryboard.Begin(); 10: }; 11: } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } If the code above looks funny, its using something called a lambda in C#, which is an inline anonymous method. Its just a handy shorthand for creating a handler like the manipulation ones above. So with this youll get a nice 3D looking fly in effect when the app starts up. Here it is, in flight: Pretty cool!Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article

Partner Blog Series: PwC Perspectives - The Gotchas, The Do's and Don'ts for IDM Implementations

- by Tanu Sood

Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:12.0pt; mso-para-margin-left:0in; line-height:12.0pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Arial","sans-serif"; mso-ascii-font-family:Arial; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Arial; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} table.MsoTableMediumList1Accent6 {mso-style-name:"Medium List 1 - Accent 6"; mso-tstyle-rowband-size:1; mso-tstyle-colband-size:1; mso-style-priority:65; mso-style-unhide:no; border-top:solid #E0301E 1.0pt; mso-border-top-themecolor:accent6; border-left:none; border-bottom:solid #E0301E 1.0pt; mso-border-bottom-themecolor:accent6; border-right:none; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Georgia","serif"; color:black; mso-themecolor:text1; mso-ansi-language:EN-GB;} table.MsoTableMediumList1Accent6FirstRow {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:first-row; mso-style-priority:65; mso-style-unhide:no; mso-tstyle-border-top:cell-none; mso-tstyle-border-bottom:1.0pt solid #E0301E; mso-tstyle-border-bottom-themecolor:accent6; font-family:"Verdana","sans-serif"; mso-ascii-font-family:Georgia; mso-ascii-theme-font:major-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Georgia; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:major-bidi;} table.MsoTableMediumList1Accent6LastRow {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:last-row; mso-style-priority:65; mso-style-unhide:no; mso-tstyle-border-top:1.0pt solid #E0301E; mso-tstyle-border-top-themecolor:accent6; mso-tstyle-border-bottom:1.0pt solid #E0301E; mso-tstyle-border-bottom-themecolor:accent6; color:#968C6D; mso-themecolor:text2; mso-ansi-font-weight:bold; mso-bidi-font-weight:bold;} table.MsoTableMediumList1Accent6FirstCol {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:first-column; mso-style-priority:65; mso-style-unhide:no; mso-ansi-font-weight:bold; mso-bidi-font-weight:bold;} table.MsoTableMediumList1Accent6LastCol {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:last-column; mso-style-priority:65; mso-style-unhide:no; mso-tstyle-border-top:1.0pt solid #E0301E; mso-tstyle-border-top-themecolor:accent6; mso-tstyle-border-bottom:1.0pt solid #E0301E; mso-tstyle-border-bottom-themecolor:accent6; mso-ansi-font-weight:bold; mso-bidi-font-weight:bold;} table.MsoTableMediumList1Accent6OddColumn {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:odd-column; mso-style-priority:65; mso-style-unhide:no; mso-tstyle-shading:#F7CBC7; mso-tstyle-shading-themecolor:accent6; mso-tstyle-shading-themetint:63;} table.MsoTableMediumList1Accent6OddRow {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:odd-row; mso-style-priority:65; mso-style-unhide:no; mso-tstyle-shading:#F7CBC7; mso-tstyle-shading-themecolor:accent6; mso-tstyle-shading-themetint:63;} Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:12.0pt; mso-para-margin-left:0in; line-height:12.0pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Arial","sans-serif"; mso-ascii-font-family:Arial; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Arial; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} table.MsoTableMediumList1Accent6 {mso-style-name:"Medium List 1 - Accent 6"; mso-tstyle-rowband-size:1; mso-tstyle-colband-size:1; mso-style-priority:65; mso-style-unhide:no; border-top:solid #E0301E 1.0pt; mso-border-top-themecolor:accent6; border-left:none; border-bottom:solid #E0301E 1.0pt; mso-border-bottom-themecolor:accent6; border-right:none; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Georgia","serif"; color:black; mso-themecolor:text1; mso-ansi-language:EN-GB;} table.MsoTableMediumList1Accent6FirstRow {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:first-row; mso-style-priority:65; mso-style-unhide:no; mso-tstyle-border-top:cell-none; mso-tstyle-border-bottom:1.0pt solid #E0301E; mso-tstyle-border-bottom-themecolor:accent6; font-family:"Arial Narrow","sans-serif"; mso-ascii-font-family:Georgia; mso-ascii-theme-font:major-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Georgia; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:major-bidi;} table.MsoTableMediumList1Accent6LastRow {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:last-row; mso-style-priority:65; mso-style-unhide:no; mso-tstyle-border-top:1.0pt solid #E0301E; mso-tstyle-border-top-themecolor:accent6; mso-tstyle-border-bottom:1.0pt solid #E0301E; mso-tstyle-border-bottom-themecolor:accent6; color:#968C6D; mso-themecolor:text2; mso-ansi-font-weight:bold; mso-bidi-font-weight:bold;} table.MsoTableMediumList1Accent6FirstCol {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:first-column; mso-style-priority:65; mso-style-unhide:no; mso-ansi-font-weight:bold; mso-bidi-font-weight:bold;} table.MsoTableMediumList1Accent6LastCol {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:last-column; mso-style-priority:65; mso-style-unhide:no; mso-tstyle-border-top:1.0pt solid #E0301E; mso-tstyle-border-top-themecolor:accent6; mso-tstyle-border-bottom:1.0pt solid #E0301E; mso-tstyle-border-bottom-themecolor:accent6; mso-ansi-font-weight:bold; mso-bidi-font-weight:bold;} table.MsoTableMediumList1Accent6OddColumn {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:odd-column; mso-style-priority:65; mso-style-unhide:no; mso-tstyle-shading:#F7CBC7; mso-tstyle-shading-themecolor:accent6; mso-tstyle-shading-themetint:63;} table.MsoTableMediumList1Accent6OddRow {mso-style-name:"Medium List 1 - Accent 6"; mso-table-condition:odd-row; mso-style-priority:65; mso-style-unhide:no; mso-tstyle-shading:#F7CBC7; mso-tstyle-shading-themecolor:accent6; mso-tstyle-shading-themetint:63;} It is generally accepted among business communities that technology by itself is not a silver bullet to all problems, but when it is combined with leading practices, strategy, careful planning and execution, it can create a recipe for success. This post attempts to highlight some of the best practices along with dos & don’ts that our practice has accumulated over the years in the identity & access management space in general, and also in the context of R2, in particular. Best Practices The following section illustrates the leading practices in “How” to plan, implement and sustain a successful OIM deployment, based on our collective experience. Planning is critical, but often overlooked A common approach to planning an IAM program that we identify with our clients is the three step process involving a current state assessment, a future state roadmap and an executable strategy to get there. It is extremely beneficial for clients to assess their current IAM state, perform gap analysis, document the recommended controls to address the gaps, align future state roadmap to business initiatives and get buy in from all stakeholders involved to improve the chances of success. When designing an enterprise-wide solution, the scalability of the technology must accommodate the future growth of the enterprise and the projected identity transactions over several years. Aligning the implementation schedule of OIM to related information technology projects increases the chances of success. As a baseline, it is recommended to match hardware specifications to the sizing guide for R2 published by Oracle. Adherence to this will help ensure that the hardware used to support OIM will not become a bottleneck as the adoption of new services increases. If your Organization has numerous connected applications that rely on reconciliation to synchronize the access data into OIM, consider hosting dedicated instances to handle reconciliation. Finally, ensure the use of clustered environment for development and have at least three total environments to help facilitate a controlled migration to production. If your Organization is planning to implement role based access control, we recommend performing a role mining exercise and consolidate your enterprise roles to keep them manageable. In addition, many Organizations have multiple approval flows to control access to critical roles, applications and entitlements. If your Organization falls into this category, we highly recommend that you limit the number of approval workflows to a small set. Most Organizations have operations managed across data centers with backend database synchronization, if your Organization falls into this category, ensure that the overall latency between the datacenters when replicating the databases is less than ten milliseconds to ensure that there are no front office performance impacts. Ingredients for a successful implementation During the development phase of your project, there are a number of guidelines that can be followed to help increase the chances for success. Most implementations cannot be completed without the use of customizations. If your implementation requires this, it’s a good practice to perform code reviews to help ensure quality and reduce code bottlenecks related to performance. We have observed at our clients that the development process works best when team members adhere to coding leading practices. Plan for time to correct coding defects and ensure developers are empowered to report their own bugs for maximum transparency. Many organizations struggle with defining a consistent approach to managing logs. This is particularly important due to the amount of information that can be logged by OIM. We recommend Oracle Diagnostics Logging (ODL) as an alternative to be used for logging. ODL allows log files to be formatted in XML for easy parsing and does not require a server restart when the log levels are changed during troubleshooting. Testing is a vital part of any large project, and an OIM R2 implementation is no exception. We suggest that at least one lower environment should use production-like data and connectors. Configurations should match as closely as possible. For example, use secure channels between OIM and target platforms in pre-production environments to test the configurations, the migration processes of certificates, and the additional overhead that encryption could impose. Finally, we ask our clients to perform database backups regularly and before any major change event, such as a patch or migration between environments. In the lowest environments, we recommend to have at least a weekly backup in order to prevent significant loss of time and effort. Similarly, if your organization is using virtual machines for one or more of the environments, it is recommended to take frequent snapshots so that rollbacks can occur in the event of improper configuration. Operate & sustain the solution to derive maximum benefits When migrating OIM R2 to production, it is important to perform certain activities that will help achieve a smoother transition. At our clients, we have seen that splitting the OIM tables into their own tablespaces by categories (physical tables, indexes, etc.) can help manage database growth effectively. If we notice that a client hasn’t enabled the Oracle-recommended indexing in the applicable database, we strongly suggest doing so to improve performance. Additionally, we work with our clients to make sure that the audit level is set to fit the organization’s auditing needs and sometimes even allocate UPA tables and indexes into their own table-space for better maintenance. Finally, many of our clients have set up schedules for reconciliation tables to be archived at regular intervals in order to keep the size of the database(s) reasonable and result in optimal database performance. For our clients that anticipate availability issues with target applications, we strongly encourage the use of the offline provisioning capabilities of OIM R2. This reduces the provisioning process for a given target application dependency on target availability and help avoid broken workflows. To account for this and other abnormalities, we also advocate that OIM’s monitoring controls be configured to alert administrators on any abnormal situations. Within OIM R2, we have begun advising our clients to utilize the ‘profile’ feature to encapsulate multiple commonly requested accounts, roles, and/or entitlements into a single item. By setting up a number of profiles that can be searched for and used, users will spend less time performing the same exact steps for common tasks. We advise our clients to follow the Oracle recommended guides for database and application server tuning which provides a good baseline configuration. It offers guidance on database connection pools, connection timeouts, user interface threads and proper handling of adapters/plug-ins. All of these can be important configurations that will allow faster provisioning and web page response times. Many of our clients have begun to recognize the value of data mining and a remediation process during the initial phases of an implementation (to help ensure high quality data gets loaded) and beyond (to support ongoing maintenance and business-as-usual processes). A successful program always begins with identifying the data elements and assigning a classification level based on criticality, risk, and availability. It should finish by following through with a remediation process. Dos & Don’ts Here are the most common dos and don'ts that we socialize with our clients, derived from our experience implementing the solution. Dos Don’ts Scope the project into phases with realistic goals. Look for quick wins to show success and value to the stake holders. Avoid “boiling the ocean” and trying to integrate all enterprise applications in the first phase. Establish an enterprise ID (universal unique ID across the enterprise) earlier in the program. Avoid major UI customizations that require code changes. Have a plan in place to patch during the project, which helps alleviate any major issues or roadblocks (product and database). Avoid publishing all the target entitlements if you don't anticipate their usage during access request. Assess your current state and prepare a roadmap to address your operations, tactical and strategic goals, align it with your business priorities. Avoid integrating non-production environments with your production target systems. Defer complex integrations to the later phases and take advantage of lessons learned from previous phases Avoid creating multiple accounts for the same user on the same system, if there is an opportunity to do so. Have an identity and access data quality initiative built into your plan to identify and remediate data related issues early on. Avoid creating complex approval workflows that would negative impact productivity and SLAs. Identify the owner of the identity systems with fair IdM knowledge and empower them with authority to make product related decisions. This will help ensure overcome any design hurdles. Avoid creating complex designs that are not sustainable long term and would need major overhaul during upgrades. Shadow your internal or external consulting resources during the implementation to build the necessary product skills needed to operate and sustain the solution. Avoid treating IAM as a point solution and have appropriate level of communication and training plan for the IT and business users alike. Conclusion In our experience, Identity programs will struggle with scope, proper resourcing, and more. We suggest that companies consider the suggestions discussed in this post and leverage them to help enable their identity and access program. This concludes PwC blog series on R2 for the month and we sincerely hope that the information we have shared thus far has been beneficial. For more information or if you have questions, you can reach out to Rex Thexton, Senior Managing Director, PwC and or Dharma Padala, Director, PwC. We look forward to hearing from you. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:12.0pt; mso-para-margin-left:0in; line-height:12.0pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Arial","sans-serif"; mso-ascii-font-family:Arial; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Arial; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Meet the Writers: Dharma Padala is a Director in the Advisory Security practice within PwC. He has been implementing medium to large scale Identity Management solutions across multiple industries including utility, health care, entertainment, retail and financial sectors. Dharma has 14 years of experience in delivering IT solutions out of which he has been implementing Identity Management solutions for the past 8 years. Praveen Krishna is a Manager in the Advisory Security practice within PwC. Over the last decade Praveen has helped clients plan, architect and implement Oracle identity solutions across diverse industries. His experience includes delivering security across diverse topics like network, infrastructure, application and data where he brings a holistic point of view to problem solving. Scott MacDonald is a Director in the Advisory Security practice within PwC. He has consulted for several clients across multiple industries including financial services, health care, automotive and retail. Scott has 10 years of experience in delivering Identity Management solutions. John Misczak is a member of the Advisory Security practice within PwC. He has experience implementing multiple Identity and Access Management solutions, specializing in Oracle Identity Manager and Business Process Engineering Language (BPEL).

Read the article

High Load mysql on Debian server stops every day. Why?

- by Oleg Abrazhaev

I have Debian server with 32 gb memory. And there is apache2, memcached and nginx on this server. Memory load always on maximum. Only 500m free. Most memory leak do MySql. Apache only 70 clients configured, other services small memory usage. When mysql use all memory it stops. And nothing works, need mysql reboot. Mysql configured use maximum 24 gb memory. I have hight weight InnoDB bases. (400000 rows, 30 gb). And on server multithread daemon, that makes many inserts in this tables, thats why InnoDB. There is my mysql config. [mysqld] # # * Basic Settings # default-time-zone = "+04:00" user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp language = /usr/share/mysql/english skip-external-locking default-time-zone='Europe/Moscow' # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. # # * Fine Tuning # #low_priority_updates = 1 concurrent_insert = ALWAYS wait_timeout = 600 interactive_timeout = 600 #normal key_buffer_size = 2024M #key_buffer_size = 1512M #70% hot cache key_cache_division_limit= 70 #16-32 max_allowed_packet = 32M #1-16M thread_stack = 8M #40-50 thread_cache_size = 50 #orderby groupby sort sort_buffer_size = 64M #same myisam_sort_buffer_size = 400M #temp table creates when group_by tmp_table_size = 3000M #tables in memory max_heap_table_size = 3000M #on disk open_files_limit = 10000 table_cache = 10000 join_buffer_size = 5M # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP #myisam_use_mmap = 1 max_connections = 200 thread_concurrency = 8 # # * Query Cache Configuration # #more ignored query_cache_limit = 50M query_cache_size = 210M #on query cache query_cache_type = 1 # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. #log = /var/log/mysql/mysql.log # # Error logging goes to syslog. This is a Debian improvement :) # # Here you can see queries with especially long duration log_slow_queries = /var/log/mysql/mysql-slow.log long_query_time = 1 log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. #server-id = 1 #log_bin = /var/log/mysql/mysql-bin.log server-id = 1 log-bin = /var/lib/mysql/mysql-bin #replicate-do-db = gate log-bin-index = /var/lib/mysql/mysql-bin.index log-error = /var/lib/mysql/mysql-bin.err relay-log = /var/lib/mysql/relay-bin relay-log-info-file = /var/lib/mysql/relay-bin.info relay-log-index = /var/lib/mysql/relay-bin.index binlog_do_db = 24avia expire_logs_days = 10 max_binlog_size = 100M read_buffer_size = 4024288 innodb_buffer_pool_size = 5000M innodb_flush_log_at_trx_commit = 2 innodb_thread_concurrency = 8 table_definition_cache = 2000 group_concat_max_len = 16M #binlog_do_db = gate #binlog_ignore_db = include_database_name # # * BerkeleyDB # # Using BerkeleyDB is now discouraged as its support will cease in 5.1.12. #skip-bdb # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # You might want to disable InnoDB to shrink the mysqld process by circa 100MB. #skip-innodb # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 500M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 32M key_buffer_size = 512M # # * NDB Cluster # # See /usr/share/doc/mysql-server-*/README.Debian for more information. # # The following configuration is read by the NDB Data Nodes (ndbd processes) # not from the NDB Management Nodes (ndb_mgmd processes). # # [MYSQL_CLUSTER] # ndb-connectstring=127.0.0.1 # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/ Please, help me make it stable. Memory used /etc/mysql # free total used free shared buffers cached Mem: 32930800 32766424 164376 0 139208 23829196 -/+ buffers/cache: 8798020 24132780 Swap: 33553328 44660 33508668 Maybe my problem not in memory, but MySQL stops every day. As you can see, cache memory free 24 gb. Thank to Michael Hampton? for correction. Load overage on server 3.5. Maybe hdd or another problem? Maybe my config not optimal for 30gb InnoDB ? I'm already try mysqltuner and tunung-primer.sh , but they marked all green. Mysqltuner output mysqltuner >> MySQLTuner 1.0.1 - Major Hayden <[email protected]> >> Bug reports, feature requests, and downloads at http://mysqltuner.com/ >> Run with '--help' for additional options and output filtering -------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.5.24-9-log [OK] Operating on 64-bit architecture -------- Storage Engine Statistics ------------------------------------------- [--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 112G (Tables: 1528) [--] Data in InnoDB tables: 39G (Tables: 340) [--] Data in PERFORMANCE_SCHEMA tables: 0B (Tables: 17) [!!] Total fragmented tables: 344 -------- Performance Metrics ------------------------------------------------- [--] Up for: 8h 18m 33s (14M q [478.333 qps], 259K conn, TX: 9B, RX: 5B) [--] Reads / Writes: 84% / 16% [--] Total buffers: 10.5G global + 81.1M per thread (200 max threads) [OK] Maximum possible memory usage: 26.3G (83% of installed RAM) [OK] Slow queries: 1% (259K/14M) [!!] Highest connection usage: 100% (201/200) [OK] Key buffer size / total MyISAM indexes: 1.5G/5.6G [OK] Key buffer hit rate: 100.0% (6B cached / 1M reads) [OK] Query cache efficiency: 74.3% (8M cached / 11M selects) [OK] Query cache prunes per day: 0 [OK] Sorts requiring temporary tables: 0% (0 temp sorts / 247K sorts) [!!] Joins performed without indexes: 106025 [!!] Temporary tables created on disk: 49% (351K on disk / 715K total) [OK] Thread cache hit rate: 99% (249 created / 259K connections) [!!] Table cache hit rate: 15% (2K open / 13K opened) [OK] Open file limit used: 15% (3K/20K) [OK] Table locks acquired immediately: 99% (4M immediate / 4M locks) [!!] InnoDB data size / buffer pool: 39.4G/5.9G -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance MySQL started within last 24 hours - recommendations may be inaccurate Reduce or eliminate persistent connections to reduce connection usage Adjust your join queries to always utilize indexes Temporary table size is already large - reduce result set size Reduce your SELECT DISTINCT queries without LIMIT clauses Increase table_cache gradually to avoid file descriptor limits Variables to adjust: max_connections (> 200) wait_timeout (< 600) interactive_timeout (< 600) join_buffer_size (> 5.0M, or always use indexes with joins) table_cache (> 10000) innodb_buffer_pool_size (>= 39G) Mysql primer output -- MYSQL PERFORMANCE TUNING PRIMER -- - By: Matthew Montgomery - MySQL Version 5.5.24-9-log x86_64 Uptime = 0 days 8 hrs 20 min 50 sec Avg. qps = 478 Total Questions = 14369568 Threads Connected = 16 Warning: Server has not been running for at least 48hrs. It may not be safe to use these recommendations To find out more information on how each of these runtime variables effects performance visit: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html Visit http://www.mysql.com/products/enterprise/advisors.html for info about MySQL's Enterprise Monitoring and Advisory Service SLOW QUERIES The slow query log is enabled. Current long_query_time = 1.000000 sec. You have 260626 out of 14369701 that take longer than 1.000000 sec. to complete Your long_query_time seems to be fine BINARY UPDATE LOG The binary update log is enabled Binlog sync is not enabled, you could loose binlog records during a server crash WORKER THREADS Current thread_cache_size = 50 Current threads_cached = 45 Current threads_per_sec = 0 Historic threads_per_sec = 0 Your thread_cache_size is fine MAX CONNECTIONS Current max_connections = 200 Current threads_connected = 11 Historic max_used_connections = 201 The number of used connections is 100% of the configured maximum. You should raise max_connections INNODB STATUS Current InnoDB index space = 214 M Current InnoDB data space = 39.40 G Current InnoDB buffer pool free = 0 % Current innodb_buffer_pool_size = 5.85 G Depending on how much space your innodb indexes take up it may be safe to increase this value to up to 2 / 3 of total system memory MEMORY USAGE Max Memory Ever Allocated : 23.46 G Configured Max Per-thread Buffers : 15.84 G Configured Max Global Buffers : 7.54 G Configured Max Memory Limit : 23.39 G Physical Memory : 31.40 G Max memory limit seem to be within acceptable norms KEY BUFFER Current MyISAM index space = 5.61 G Current key_buffer_size = 1.47 G Key cache miss rate is 1 : 5578 Key buffer free ratio = 77 % Your key_buffer_size seems to be fine QUERY CACHE Query cache is enabled Current query_cache_size = 200 M Current query_cache_used = 101 M Current query_cache_limit = 50 M Current Query cache Memory fill ratio = 50.59 % Current query_cache_min_res_unit = 4 K MySQL won't cache query results that are larger than query_cache_limit in size SORT OPERATIONS Current sort_buffer_size = 64 M Current read_rnd_buffer_size = 256 K Sort buffer seems to be fine JOINS Current join_buffer_size = 5.00 M You have had 106606 queries where a join could not use an index properly You have had 8 joins without keys that check for key usage after each row join_buffer_size >= 4 M This is not advised You should enable "log-queries-not-using-indexes" Then look for non indexed joins in the slow query log. OPEN FILES LIMIT Current open_files_limit = 20210 files The open_files_limit should typically be set to at least 2x-3x that of table_cache if you have heavy MyISAM usage. Your open_files_limit value seems to be fine TABLE CACHE Current table_open_cache = 10000 tables Current table_definition_cache = 2000 tables You have a total of 1910 tables You have 2151 open tables. The table_cache value seems to be fine TEMP TABLES Current max_heap_table_size = 2.92 G Current tmp_table_size = 2.92 G Of 366426 temp tables, 49% were created on disk Perhaps you should increase your tmp_table_size and/or max_heap_table_size to reduce the number of disk-based temporary tables Note! BLOB and TEXT columns are not allow in memory tables. If you are using these columns raising these values might not impact your ratio of on disk temp tables. TABLE SCANS Current read_buffer_size = 3 M Current table scan ratio = 2846 : 1 read_buffer_size seems to be fine TABLE LOCKING Current Lock Wait ratio = 1 : 185 You may benefit from selective use of InnoDB. If you have long running SELECT's against MyISAM tables and perform frequent updates consider setting 'low_priority_updates=1'

Read the article

What approach to take for SIMD optimizations

- by goldenmean

Hi, I am trying to optimize below code for SIMD operations (8way/4way/2way SIMD whiechever possible and if it gives gains in performance) I am tryin to analyze it first on paper to understand the algorithm used. How can i optimize it for SIMD:- void idct(uint8_t *dst, int stride, int16_t *input, int type) { int16_t *ip = input; uint8_t *cm = ff_cropTbl + MAX_NEG_CROP; int A, B, C, D, Ad, Bd, Cd, Dd, E, F, G, H; int Ed, Gd, Add, Bdd, Fd, Hd; int i; /* Inverse DCT on the rows now */ for (i = 0; i < 8; i++) { /* Check for non-zero values */ if ( ip[0] | ip[1] | ip[2] | ip[3] | ip[4] | ip[5] | ip[6] | ip[7] ) { A = M(xC1S7, ip[1]) + M(xC7S1, ip[7]); B = M(xC7S1, ip[1]) - M(xC1S7, ip[7]); C = M(xC3S5, ip[3]) + M(xC5S3, ip[5]); D = M(xC3S5, ip[5]) - M(xC5S3, ip[3]); Ad = M(xC4S4, (A - C)); Bd = M(xC4S4, (B - D)); Cd = A + C; Dd = B + D; E = M(xC4S4, (ip[0] + ip[4])); F = M(xC4S4, (ip[0] - ip[4])); G = M(xC2S6, ip[2]) + M(xC6S2, ip[6]); H = M(xC6S2, ip[2]) - M(xC2S6, ip[6]); Ed = E - G; Gd = E + G; Add = F + Ad; Bdd = Bd - H; Fd = F - Ad; Hd = Bd + H; /* Final sequence of operations over-write original inputs. */ ip[0] = (int16_t)(Gd + Cd) ; ip[7] = (int16_t)(Gd - Cd ); ip[1] = (int16_t)(Add + Hd); ip[2] = (int16_t)(Add - Hd); ip[3] = (int16_t)(Ed + Dd) ; ip[4] = (int16_t)(Ed - Dd ); ip[5] = (int16_t)(Fd + Bdd); ip[6] = (int16_t)(Fd - Bdd); } ip += 8; /* next row */ } ip = input; for ( i = 0; i < 8; i++) { /* Check for non-zero values (bitwise or faster than ||) */ if ( ip[1 * 8] | ip[2 * 8] | ip[3 * 8] | ip[4 * 8] | ip[5 * 8] | ip[6 * 8] | ip[7 * 8] ) { A = M(xC1S7, ip[1*8]) + M(xC7S1, ip[7*8]); B = M(xC7S1, ip[1*8]) - M(xC1S7, ip[7*8]); C = M(xC3S5, ip[3*8]) + M(xC5S3, ip[5*8]); D = M(xC3S5, ip[5*8]) - M(xC5S3, ip[3*8]); Ad = M(xC4S4, (A - C)); Bd = M(xC4S4, (B - D)); Cd = A + C; Dd = B + D; E = M(xC4S4, (ip[0*8] + ip[4*8])) + 8; F = M(xC4S4, (ip[0*8] - ip[4*8])) + 8; if(type==1){ //HACK E += 16*128; F += 16*128; } G = M(xC2S6, ip[2*8]) + M(xC6S2, ip[6*8]); H = M(xC6S2, ip[2*8]) - M(xC2S6, ip[6*8]); Ed = E - G; Gd = E + G; Add = F + Ad; Bdd = Bd - H; Fd = F - Ad; Hd = Bd + H; /* Final sequence of operations over-write original inputs. */ if(type==0){ ip[0*8] = (int16_t)((Gd + Cd ) >> 4); ip[7*8] = (int16_t)((Gd - Cd ) >> 4); ip[1*8] = (int16_t)((Add + Hd ) >> 4); ip[2*8] = (int16_t)((Add - Hd ) >> 4); ip[3*8] = (int16_t)((Ed + Dd ) >> 4); ip[4*8] = (int16_t)((Ed - Dd ) >> 4); ip[5*8] = (int16_t)((Fd + Bdd ) >> 4); ip[6*8] = (int16_t)((Fd - Bdd ) >> 4); }else if(type==1){ dst[0*stride] = cm[(Gd + Cd ) >> 4]; dst[7*stride] = cm[(Gd - Cd ) >> 4]; dst[1*stride] = cm[(Add + Hd ) >> 4]; dst[2*stride] = cm[(Add - Hd ) >> 4]; dst[3*stride] = cm[(Ed + Dd ) >> 4]; dst[4*stride] = cm[(Ed - Dd ) >> 4]; dst[5*stride] = cm[(Fd + Bdd ) >> 4]; dst[6*stride] = cm[(Fd - Bdd ) >> 4]; }else{ dst[0*stride] = cm[dst[0*stride] + ((Gd + Cd ) >> 4)]; dst[7*stride] = cm[dst[7*stride] + ((Gd - Cd ) >> 4)]; dst[1*stride] = cm[dst[1*stride] + ((Add + Hd ) >> 4)]; dst[2*stride] = cm[dst[2*stride] + ((Add - Hd ) >> 4)]; dst[3*stride] = cm[dst[3*stride] + ((Ed + Dd ) >> 4)]; dst[4*stride] = cm[dst[4*stride] + ((Ed - Dd ) >> 4)]; dst[5*stride] = cm[dst[5*stride] + ((Fd + Bdd ) >> 4)]; dst[6*stride] = cm[dst[6*stride] + ((Fd - Bdd ) >> 4)]; } } else { if(type==0){ ip[0*8] = ip[1*8] = ip[2*8] = ip[3*8] = ip[4*8] = ip[5*8] = ip[6*8] = ip[7*8] = ((xC4S4 * ip[0*8] + (IdctAdjustBeforeShift<<16))>>20); }else if(type==1){ dst[0*stride]= dst[1*stride]= dst[2*stride]= dst[3*stride]= dst[4*stride]= dst[5*stride]= dst[6*stride]= dst[7*stride]= cm[128 + ((xC4S4 * ip[0*8] + (IdctAdjustBeforeShift<<16))>>20)]; }else{ if(ip[0*8]){ int v= ((xC4S4 * ip[0*8] + (IdctAdjustBeforeShift<<16))>>20); dst[0*stride] = cm[dst[0*stride] + v]; dst[1*stride] = cm[dst[1*stride] + v]; dst[2*stride] = cm[dst[2*stride] + v]; dst[3*stride] = cm[dst[3*stride] + v]; dst[4*stride] = cm[dst[4*stride] + v]; dst[5*stride] = cm[dst[5*stride] + v]; dst[6*stride] = cm[dst[6*stride] + v]; dst[7*stride] = cm[dst[7*stride] + v]; } } } ip++; /* next column */ dst++; } }

Read the article

Records not being saved to core data sqlite file

- by esd100

I'm a complete newbie when it comes to iOS programming and much less Core Data. It's rather non-intuitive for me, as I really came into my own with programming with MATLAB, which I guess is more of a 'scripting' language. At any rate, my problem is that I had no idea what I had to do to create a database for my application. So I read a little bit and thought I had to create a SQL database of my stuff and then import it. Long story short, I created a SQLite db and I want to use the work I have already done to import stuff into my CoreData database. I tried exporting to comma-delimited files and xml files and reading those in, but I didn't like it and it seemed like an extra step that I shouldn't need to do. So, I imported the SQLite database into my resources and added the sqlite framework. I have my core data model setup and it is setting up the SQLite database for the model correctly in the background. When I run through my program to add objects to my entities, it seems to work and I can even fetch results afterward. However, when I inspect the Core Data Database SQLite file, no records have been saved. How is it possible for it to fetch results but not save them to the database? - (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions{ //load in the path for resources NSString *paths = [[NSBundle mainBundle] resourcePath]; NSString *databaseName = @"histology.sqlite"; NSString *databasePath = [paths stringByAppendingPathComponent:databaseName]; [self createDatabase:databasePath ]; NSError *error; if ([[self managedObjectContext] save:&error]) { NSLog(@"Whoops, couldn't save: %@", [error localizedDescription]); } // Test listing all CELLS from the store NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init]; NSEntityDescription *entityMO = [NSEntityDescription entityForName:@"CELL" inManagedObjectContext:[self managedObjectContext]]; [fetchRequest setEntity:entityMO]; NSArray *fetchedObjects = [[self managedObjectContext] executeFetchRequest:fetchRequest error:&error]; for (CELL *cellName in fetchedObjects) { //NSLog(@"cellName: %@", cellName); } -(void) createDatabase:databasePath { NSLog(@"The createDatabase function was entered."); NSLog(@"The databasePath is %@ ",[databasePath description]); // Setup the database object sqlite3 *histoDatabase; // Open the database from filessytem if(sqlite3_open([databasePath UTF8String], &histoDatabase) == SQLITE_OK) { NSLog(@"The database was opened"); // Setup the SQL Statement and compile it for faster access const char *sqlStatement = "SELECT * FROM CELL"; sqlite3_stmt *compiledStatement; if(sqlite3_prepare_v2(histoDatabase, sqlStatement, -1, &compiledStatement, NULL) != SQLITE_OK) { NSAssert1(0, @"Error while creating add statement. '%s'", sqlite3_errmsg(histoDatabase)); } if(sqlite3_prepare_v2(histoDatabase, sqlStatement, -1, &compiledStatement, NULL) == SQLITE_OK) { // Loop through the results and add them to cell MO array while(sqlite3_step(compiledStatement) == SQLITE_ROW) { CELL *cellMO = [NSEntityDescription insertNewObjectForEntityForName:@"CELL" inManagedObjectContext:[self managedObjectContext]]; if (sqlite3_column_type(compiledStatement, 0) != SQLITE_NULL) { cellMO.cellName = [NSString stringWithUTF8String:(char *)sqlite3_column_text(compiledStatement, 0)]; } else { cellMO.cellName = @"undefined"; } if (sqlite3_column_type(compiledStatement, 1) != SQLITE_NULL) { cellMO.cellDescription = [NSString stringWithUTF8String:(char *)sqlite3_column_text(compiledStatement, 1)]; } else { cellMO.cellDescription = @"undefined"; } NSLog(@"The contents of NSString *cellName = %@",[cellMO.cellName description]); } } // Release the compiled statement from memory sqlite3_finalize(compiledStatement); } sqlite3_close(histoDatabase); } I have a feeling that it has something to do with the timing of opening/closing both of the databases? Attached I have some SQL debugging output to the terminal 2012-05-28 16:03:39.556 MedPix[34751:fb03] The createDatabase function was entered. 2012-05-28 16:03:39.557 MedPix[34751:fb03] The databasePath is /Users/jack/Library/Application Support/iPhone Simulator/5.1/Applications/A6B2A79D-BA93-4E24-9291-5B7948A3CDF4/MedPix.app/histology.sqlite 2012-05-28 16:03:39.559 MedPix[34751:fb03] The database was opened 2012-05-28 16:03:39.560 MedPix[34751:fb03] The database was prepared 2012-05-28 16:03:39.575 MedPix[34751:fb03] CoreData: annotation: Connecting to sqlite database file at "/Users/jack/Library/Application Support/iPhone Simulator/5.1/Applications/A6B2A79D-BA93-4E24-9291-5B7948A3CDF4/Documents/MedPix.sqlite" 2012-05-28 16:03:39.576 MedPix[34751:fb03] CoreData: annotation: creating schema. 2012-05-28 16:03:39.577 MedPix[34751:fb03] CoreData: sql: pragma page_size=4096 2012-05-28 16:03:39.578 MedPix[34751:fb03] CoreData: sql: pragma auto_vacuum=2 2012-05-28 16:03:39.630 MedPix[34751:fb03] CoreData: sql: BEGIN EXCLUSIVE 2012-05-28 16:03:39.631 MedPix[34751:fb03] CoreData: sql: SELECT TBL_NAME FROM SQLITE_MASTER WHERE TBL_NAME = 'Z_METADATA' 2012-05-28 16:03:39.632 MedPix[34751:fb03] CoreData: sql: CREATE TABLE ZCELL ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZCELLDESCRIPTION VARCHAR, ZCELLNAME VARCHAR ) ... 2012-05-28 16:03:39.669 MedPix[34751:fb03] CoreData: annotation: Creating primary key table. 2012-05-28 16:03:39.671 MedPix[34751:fb03] CoreData: sql: CREATE TABLE Z_PRIMARYKEY (Z_ENT INTEGER PRIMARY KEY, Z_NAME VARCHAR, Z_SUPER INTEGER, Z_MAX INTEGER) 2012-05-28 16:03:39.672 MedPix[34751:fb03] CoreData: sql: INSERT INTO Z_PRIMARYKEY(Z_ENT, Z_NAME, Z_SUPER, Z_MAX) VALUES(1, 'CELL', 0, 0) ... 2012-05-28 16:03:39.701 MedPix[34751:fb03] CoreData: sql: CREATE TABLE Z_METADATA (Z_VERSION INTEGER PRIMARY KEY, Z_UUID VARCHAR(255), Z_PLIST BLOB) 2012-05-28 16:03:39.702 MedPix[34751:fb03] CoreData: sql: SELECT TBL_NAME FROM SQLITE_MASTER WHERE TBL_NAME = 'Z_METADATA' 2012-05-28 16:03:39.703 MedPix[34751:fb03] CoreData: sql: DELETE FROM Z_METADATA WHERE Z_VERSION = ? 2012-05-28 16:03:39.704 MedPix[34751:fb03] CoreData: sql: INSERT INTO Z_METADATA (Z_VERSION, Z_UUID, Z_PLIST) VALUES (?, ?, ?) 2012-05-28 16:03:39.705 MedPix[34751:fb03] CoreData: sql: COMMIT 2012-05-28 16:03:39.710 MedPix[34751:fb03] CoreData: sql: pragma cache_size=200 2012-05-28 16:03:39.711 MedPix[34751:fb03] CoreData: sql: SELECT Z_VERSION, Z_UUID, Z_PLIST FROM Z_METADATA 2012-05-28 16:03:39.712 MedPix[34751:fb03] The contents of NSString *cellName = Beta Cell 2012-05-28 16:03:39.712 MedPix[34751:fb03] The contents of NSString *cellName = Gastric Chief Cell ... 2012-05-28 16:03:39.714 MedPix[34751:fb03] The database was prepared 2012-05-28 16:03:39.764 MedPix[34751:fb03] The createDatabase function has finished. Now fetching. 2012-05-28 16:03:39.765 MedPix[34751:fb03] CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZCELLDESCRIPTION, t0.ZCELLNAME FROM ZCELL t0 2012-05-28 16:03:39.766 MedPix[34751:fb03] CoreData: annotation: sql connection fetch time: 0.0008s 2012-05-28 16:03:39.767 MedPix[34751:fb03] CoreData: annotation: total fetch execution time: 0.0016s for 0 rows. 2012-05-28 16:03:39.768 MedPix[34751:fb03] cellName: <CELL: 0x6bbc120> (entity: CELL; id: 0x6bbc160 <x-coredata:///CELL/t57D10DDD-74E2-474F-97EE-E3BD0FF684DA34> ; data: { cellDescription = "S cells are cells which release secretin, found in the jejunum and duodenum. They are stimulated by a drop in pH to 4 or below in the small intestine's lumen. The released secretin will increase the s"; cellName = "S Cell"; organs = ( ); specimens = ( ); systems = ( ); tissues = ( ); }) ... Sections were cut short to abbreviate. But note that the fetch results contain information, but it says that total fetch execution was for "0" rows? How can that be? Any help will be greatly appreciated, especially detailed explanations. :) Thanks.

Read the article

What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

- by Alex Dunlop

Which Java synchronization construct is likely to provide the best performance for a concurrent, iterative processing scenario with a fixed number of threads like the one outlined below? After experimenting on my own for a while (using ExecutorService and CyclicBarrier) and being somewhat surprised by the results, I would be grateful for some expert advice and maybe some new ideas. Existing questions here do not seem to focus primarily on performance, hence this new one. Thanks in advance! The core of the app is a simple iterative data processing algorithm, parallelized to the spread the computational load across 8 cores on a Mac Pro, running OS X 10.6 and Java 1.6.0_07. The data to be processed is split into 8 blocks and each block is fed to a Runnable to be executed by one of a fixed number of threads. Parallelizing the algorithm was fairly straightforward, and it functionally works as desired, but its performance is not yet what I think it could be. The app seems to spend a lot of time in system calls synchronizing, so after some profiling I wonder whether I selected the most appropriate synchronization mechanism(s). A key requirement of the algorithm is that it needs to proceed in stages, so the threads need to sync up at the end of each stage. The main thread prepares the work (very low overhead), passes it to the threads, lets them work on it, then proceeds when all threads are done, rearranges the work (again very low overhead) and repeats the cycle. The machine is dedicated to this task, Garbage Collection is minimized by using per-thread pools of pre-allocated items, and the number of threads can be fixed (no incoming requests or the like, just one thread per CPU core). V1 - ExecutorService My first implementation used an ExecutorService with 8 worker threads. The program creates 8 tasks holding the work and then lets them work on it, roughly like this: // create one thread per CPU executorService = Executors.newFixedThreadPool( 8 ); ... // now process data in cycles while( ...) { // package data into 8 work items ... // create one Callable task per work item ... // submit the Callables to the worker threads executorService.invokeAll( taskList ); } This works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as much as the processing algorithm would be expected to allow (some work items will finish faster than others, then idle). However, as the work items become smaller (and this is not really under the program's control), the user CPU load shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.8% 85% 1.30 64k 2.5% 77% 5.6 16k 4% 64% 22.5 4096 8% 56% 86 1024 13% 38% 227 256 17% 19% 420 64 19% 17% 948 16 19% 13% 1626 Legend: - block size = size of the work item (= computational steps) - system = system load, as shown in OS X Activity Monitor (red bar) - user = user load, as shown in OS X Activity Monitor (green bar) - cycles/sec = iterations through the main while loop, more is better The primary area of concern here is the high percentage of time spent in the system, which appears to be driven by thread synchronization calls. As expected, for smaller work items, ExecutorService.invokeAll() will require relatively more effort to sync up the threads versus the amount of work being performed in each thread. But since ExecutorService is more generic than it would need to be for this use case (it can queue tasks for threads if there are more tasks than cores), I though maybe there would be a leaner synchronization construct. V2 - CyclicBarrier The next implementation used a CyclicBarrier to sync up the threads before receiving work and after completing it, roughly as follows: main() { // create the barrier barrier = new CyclicBarrier( 8 + 1 ); // create Runable for thread, tell it about the barrier Runnable task = new WorkerThreadRunnable( barrier ); // start the threads for( int i = 0; i < 8; i++ ) { // create one thread per core new Thread( task ).start(); } while( ... ) { // tell threads about the work ... // N threads + this will call await(), then system proceeds barrier.await(); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; } public void run() { while( true ) { // wait for work barrier.await(); // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as before. However, as the work items become smaller, the load still shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.7% 78% 6.1 16k 5.5% 52% 25 4096 9% 29% 64 1024 11% 15% 117 256 12% 8% 169 64 12% 6.5% 285 16 12% 6% 377 For large work items, synchronization is negligible and the performance is identical to V1. But unexpectedly, the results of the (highly specialized) CyclicBarrier seem MUCH WORSE than those for the (generic) ExecutorService: throughput (cycles/sec) is only about 1/4th of V1. A preliminary conclusion would be that even though this seems to be the advertised ideal use case for CyclicBarrier, it performs much worse than the generic ExecutorService. V3 - Wait/Notify + CyclicBarrier It seemed worth a try to replace the first cyclic barrier await() with a simple wait/notify mechanism: main() { // create the barrier // create Runable for thread, tell it about the barrier // start the threads while( ... ) { // tell threads about the work // for each: workerThreadRunnable.setWorkItem( ... ); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; @NotNull volatile private Callable<Integer> workItem; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; this.workItem = NO_WORK; } final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { synchronized( this ) { workItem = callable; notify(); } } public void run() { while( true ) { // wait for work while( true ) { synchronized( this ) { if( workItem != NO_WORK ) break; try { wait(); } catch( InterruptedException e ) { e.printStackTrace(); } } } // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.4% 80% 6.3 16k 4.6% 60% 30.1 4096 8.6% 41% 98.5 1024 12% 23% 202 256 14% 11.6% 299 64 14% 10.0% 518 16 14.8% 8.7% 679 The throughput for small work items is still much worse than that of the ExecutorService, but about 2x that of the CyclicBarrier. Eliminating one CyclicBarrier eliminates half of the gap. V4 - Busy wait instead of wait/notify Since this app is the primary one running on the system and the cores idle anyway if they're not busy with a work item, why not try a busy wait for work items in each thread, even if that spins the CPU needlessly. The worker thread code changes as follows: class WorkerThreadRunnable implements Runnable { // as before final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { workItem = callable; } public void run() { while( true ) { // busy-wait for work while( true ) { if( workItem != NO_WORK ) break; } // do the work ... // wait for everyone else to finish barrier.await(); } } } Also works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.2% 81% 6.3 16k 4.2% 62% 33 4096 7.5% 40% 107 1024 10.4% 23% 210 256 12.0% 12.0% 310 64 11.9% 10.2% 550 16 12.2% 8.6% 741 For small work items, this increases throughput by a further 10% over the CyclicBarrier + wait/notify variant, which is not insignificant. But it is still much lower-throughput than V1 with the ExecutorService. V5 - ? So what is the best synchronization mechanism for such a (presumably not uncommon) problem? I am weary of writing my own sync mechanism to completely replace ExecutorService (assuming that it is too generic and there has to be something that can still be taken out to make it more efficient). It is not my area of expertise and I'm concerned that I'd spend a lot of time debugging it (since I'm not even sure my wait/notify and busy wait variants are correct) for uncertain gain. Any advice would be greatly appreciated.

Read the article

Optimizing a lot of Scanner.findWithinHorizon(pattern, 0) calls

- by darvids0n

I'm building a process which extracts data from 6 csv-style files and two poorly laid out .txt reports and builds output CSVs, and I'm fully aware that there's going to be some overhead searching through all that whitespace thousands of times, but I never anticipated converting about about 50,000 records would take 12 hours. Excerpt of my manual matching code (I know it's horrible that I use lists of tokens like that, but it was the best thing I could think of): public static String lookup(List<String> tokensBefore, List<String> tokensAfter) { String result = null; while(_match(tokensBefore)) { // block until all input is read if(id.hasNext()) { result = id.next(); // capture the next token that matches if(_matchImmediate(tokensAfter)) // try to match tokensAfter to this result return result; } else return null; // end of file; no match } return null; // no matches } private static boolean _match(List<String> tokens) { return _match(tokens, true); } private static boolean _match(List<String> tokens, boolean block) { if(tokens != null && !tokens.isEmpty()) { if(id.findWithinHorizon(tokens.get(0), 0) == null) return false; for(int i = 1; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(id.hasNext() && !id.next().matches(tokens.get(i))) { break; // break to blocking behaviour } } } else { return true; // empty list always matches } if(block) return _match(tokens); // loop until we find something or nothing else return false; // return after just one attempted match } private static boolean _matchImmediate(List<String> tokens) { if(tokens != null) { for(int i = 0; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(!id.hasNext() || !id.next().matches(tokens.get(i))) { return false; // doesn't match, or end of file } } return false; // we have some serious problems if this ever gets called } else { return true; // empty list always matches } } Basically wondering how I would work in an efficient string search (Boyer-Moore or similar). My Scanner id is scanning a java.util.String, figured buffering it to memory would reduce I/O since the search here is being performed thousands of times on a relatively small file. The performance increase compared to scanning a BufferedReader(FileReader(File)) was probably less than 1%, the process still looks to be taking a LONG time. I've also traced execution and the slowness of my overall conversion process is definitely between the first and last like of the lookup method. In fact, so much so that I ran a shortcut process to count the number of occurrences of various identifiers in the .csv-style files (I use 2 lookup methods, this is just one of them) and the process completed indexing approx 4 different identifiers for 50,000 records in less than a minute. Compared to 12 hours, that's instant. Some notes (updated): I don't necessarily need the pattern-matching behaviour, I only get the first field of a line of text so I need to match line breaks or use Scanner.nextLine(). All ID numbers I need start at position 0 of a line and run through til the first block of whitespace, after which is the name of the corresponding object. I would ideally want to return a String, not an int locating the line number or start position of the result, but if it's faster then it will still work just fine. If an int is being returned, however, then I would now have to seek to that line again just to get the ID; storing the ID of every line that is searched sounds like a way around that. Anything to help me out, even if it saves 1ms per search, will help, so all input is appreciated. Thankyou! Usage scenario 1: I have a list of objects in file A, who in the old-style system have an id number which is not in file A. It is, however, POSSIBLY in another csv-style file (file B) or possibly still in a .txt report (file C) which each also contain a bunch of other information which is not useful here, and so file B needs to be searched through for the object's full name (1 token since it would reside within the second column of any given line), and then the first column should be the ID number. If that doesn't work, we then have to split the search token by whitespace into separate tokens before doing a search of file C for those tokens as well. Generalised code: String field; for (/* each record in file A */) { /* construct the rest of this object from file A info */ // now to find the ID, if we can List<String> objectName = new ArrayList<String>(1); objectName.add(Pattern.quote(thisObject.fullName)); field = lookup(objectSearchToken, objectName); // search file B if(field == null) // not found in file B { lookupReset(false); // initialise scanner to check file C objectName.clear(); // not using the full name String[] tokens = thisObject.fullName.split(id.delimiter().pattern()); for(String s : tokens) objectName.add(Pattern.quote(s)); field = lookup(objectSearchToken, objectName); // search file C lookupReset(true); // back to file B } else { /* found it, file B specific processing here */ } if(field != null) // found it in B or C thisObject.ID = field; } The objectName tokens are all uppercase words with possible hyphens or apostrophes in them, separated by spaces. Much like a person's name. As per a comment, I will pre-compile the regex for my objectSearchToken, which is just [\r\n]+. What's ending up happening in file C is, every single line is being checked, even the 95% of lines which don't contain an ID number and object name at the start. Would it be quicker to use ^[\r\n]+.*(objectname) instead of two separate regexes? It may reduce the number of _match executions. The more general case of that would be, concatenate all tokensBefore with all tokensAfter, and put a .* in the middle. It would need to be matching backwards through the file though, otherwise it would match the correct line but with a huge .* block in the middle with lots of lines. The above situation could be resolved if I could get java.util.Scanner to return the token previous to the current one after a call to findWithinHorizon. I have another usage scenario. Will put it up asap.

Read the article

rotating bitmaps. In code.

- by Marco van de Voort

Is there a faster way to rotate a large bitmap by 90 or 270 degrees than simply doing a nested loop with inverted coordinates? The bitmaps are 8bpp and typically 2048*2400*8bpp Currently I do this by simply copying with argument inversion, roughly (pseudo code: for x = 0 to 2048-1 for y = 0 to 2048-1 dest[x][y]=src[y][x]; (In reality I do it with pointers, for a bit more speed, but that is roughly the same magnitude) GDI is quite slow with large images, and GPU load/store times for textures (GF7 cards) are in the same magnitude as the current CPU time. Any tips, pointers? An in-place algorithm would even be better, but speed is more important than being in-place. Target is Delphi, but it is more an algorithmic question. SSE(2) vectorization no problem, it is a big enough problem for me to code it in assembler Duplicates How do you rotate a two dimensional array?. Follow up to Nils' answer Image 2048x2700 - 2700x2048 Compiler Turbo Explorer 2006 with optimization on. Windows: Power scheme set to "Always on". (important!!!!) Machine: Core2 6600 (2.4 GHz) time with old routine: 32ms (step 1) time with stepsize 8 : 12ms time with stepsize 16 : 10ms time with stepsize 32+ : 9ms Meanwhile I also tested on a Athlon 64 X2 (5200+ iirc), and the speed up there was slightly more than a factor four (80 to 19 ms). The speed up is well worth it, thanks. Maybe that during the summer months I'll torture myself with a SSE(2) version. However I already thought about how to tackle that, and I think I'll run out of SSE2 registers for an straight implementation: for n:=0 to 7 do begin load r0, <source+n*rowsize> shift byte from r0 into r1 shift byte from r0 into r2 .. shift byte from r0 into r8 end; store r1, <target> store r2, <target+1*<rowsize> .. store r8, <target+7*<rowsize> So 8x8 needs 9 registers, but 32-bits SSE only has 8. Anyway that is something for the summer months :-) Note that the pointer thing is something that I do out of instinct, but it could be there is actually something to it, if your dimensions are not hardcoded, the compiler can't turn the mul into a shift. While muls an sich are cheap nowadays, they also generate more register pressure afaik. The code (validated by subtracting result from the "naieve" rotate1 implementation): const stepsize = 32; procedure rotatealign(Source: tbw8image; Target:tbw8image); var stepsx,stepsy,restx,resty : Integer; RowPitchSource, RowPitchTarget : Integer; pSource, pTarget,ps1,ps2 : pchar; x,y,i,j: integer; rpstep : integer; begin RowPitchSource := source.RowPitch; // bytes to jump to next line. Can be negative (includes alignment) RowPitchTarget := target.RowPitch; rpstep:=RowPitchTarget*stepsize; stepsx:=source.ImageWidth div stepsize; stepsy:=source.ImageHeight div stepsize; // check if mod 16=0 here for both dimensions, if so -> SSE2. for y := 0 to stepsy - 1 do begin psource:=source.GetImagePointer(0,y*stepsize); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(target.imagewidth-(y+1)*stepsize,0); for x := 0 to stepsx - 1 do begin for i := 0 to stepsize - 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[stepsize-1-i]; // (maxx-i,0); for j := 0 to stepsize - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; inc(psource,stepsize); inc(ptarget,rpstep); end; end; // 3 more areas to do, with dimensions // - stepsy*stepsize * restx // right most column of restx width // - stepsx*stepsize * resty // bottom row with resty height // - restx*resty // bottom-right rectangle. restx:=source.ImageWidth mod stepsize; // typically zero because width is // typically 1024 or 2048 resty:=source.Imageheight mod stepsize; if restx>0 then begin // one loop less, since we know this fits in one line of "blocks" psource:=source.GetImagePointer(source.ImageWidth-restx,0); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(Target.imagewidth-stepsize,Target.imageheight-restx); for y := 0 to stepsy - 1 do begin for i := 0 to stepsize - 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[stepsize-1-i]; // (maxx-i,0); for j := 0 to restx - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; inc(psource,stepsize*RowPitchSource); dec(ptarget,stepsize); end; end; if resty>0 then begin // one loop less, since we know this fits in one line of "blocks" psource:=source.GetImagePointer(0,source.ImageHeight-resty); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(0,0); for x := 0 to stepsx - 1 do begin for i := 0 to resty- 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[resty-1-i]; // (maxx-i,0); for j := 0 to stepsize - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; inc(psource,stepsize); inc(ptarget,rpstep); end; end; if (resty>0) and (restx>0) then begin // another loop less, since only one block psource:=source.GetImagePointer(source.ImageWidth-restx,source.ImageHeight-resty); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(0,target.ImageHeight-restx); for i := 0 to resty- 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[resty-1-i]; // (maxx-i,0); for j := 0 to restx - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; end; end;

Search Results

Search found 4580 results on 184 pages for 'what is faster'.

Page 182/184 | < Previous Page | 178 179 180 181 182 183 184 | Next Page >

- by joelpt

- by dcp3450

- by Tom

- by SoLoGHoST

- by BEBO

- by NicMagnier

- by Alex

- by Paul White

- by JoshReuben

- by danx

- by danx

- by Stephen.Walther

- by Alois Kraus

- by Jon Galloway

- by Rick Strahl

- by Paul White

- by Rhames

- by Tanu Sood

- by Oleg Abrazhaev

- by goldenmean

- by esd100

- by Alex Dunlop

- by darvids0n

- by Marco van de Voort

< Previous Page | 178 179 180 181 182 183 184 | Next Page >