Search Results

Search found 108599 results on 4344 pages for 'one click publish'.

Page 448/4344 | < Previous Page | 444 445 446 447 448 449 450 451 452 453 454 455 | Next Page >

Creating 3rd party library in Android. Need suggestions!!

- by Lalith

Hello all, I am planning to create a 3rd party library for android application developers. Also the main concern is I do not want the users to make many changes to their code base. The following are the main things I want to clarify before diving into the project. 1) I want to attach my own event handlers dynamically to the application. As per above limitations I mentioned, I cannot make users put the relevant code in their handlers as I need to support existing applications as well. I found out that we cannot attach more than one event handler per one listener. For example Button.setOnclickListener(somelistener) .. will set one listener. In traditional desktop applications we can add more than one handlers but here if I do one more Button.setOnclickListener(someOtherListener) this one overwrites the previous listener. Is there any way I can attach my own listener along with application's existing listeners dynamically? Also is it really not supported on android yet? 2) Is there any 3rd party alternative that exists already that monitors user's interactions in androidi? 3) If this is not supported now are there any future plans to support this feature? Please let me know about this as I did not get much feedback on Stackoverflow. Any suggestions and workarounds would really help me continue with this project. Regards, Lalith

Read the article
preg_match to find the current directory in a URL

- by Ian

I'm trying to detect the current section of a site that a user is viewing by checking for the final directory in the URL. I'm using a PHP and regex to do it and I think I'm close but unfortunately not quite there yet. Here's what I currently have: <?php $url = $_SERVER['REQUEST_URI_PATH'] = preg_replace('/\\?.*/', '', $_SERVER['REQUEST_URI']); $one = '/one/'; $two = '/three/'; $three = '/three/'; $four = '/four/'; $five = '/five/'; echo $url; if (substr($_SERVER['REQUEST_URI_PATH'], 0, strlen($one)) == $one) { // URI path starts with "/one/" echo "The section is one."; } elseif (substr($_SERVER['REQUEST_URI_PATH'], 0, strlen($two)) == $two) { // URI path starts with "/two/" echo "The section is two."; } elseif (substr($_SERVER['REQUEST_URI_PATH'], 0, strlen($three)) == $three) { // URI path starts with "/three/" echo "The section is three."; } elseif (substr($_SERVER['REQUEST_URI_PATH'], 0, strlen($four)) == $four) { // URI path starts with "/four/" echo "The section is four."; } elseif (substr($_SERVER['REQUEST_URI_PATH'], 0, strlen($five)) == $five) { // URI path starts with "/five/" echo "The section is five."; } ?> I've placed in the echo before the if statements just to get confirmation of the value of $url. This outputs /currentdirectory/file.php However the conditions themselves don't match anything and my individual echo for each section never displays. Also if there's a simpler way of doing it then I'm open to suggestions. Thanks

Read the article
jQuery for dynamic Add/Remove row function, it's clone() objcet cannot modify element name

- by wcy0942

I'm try jQuery for dynamic Add/Remove row function, but I meet some question in IE8 , it's clone() objcet cannot modify element name and cannot use javascript form (prhIndexed[i].prhSrc).functionKey, but in FF it works very well, source code as attachment, please give me a favor to solve the problem. <html> $(document).ready(function() { //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //Define some variables - edit to suit your needs //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // table id var _table = jQuery("#prh"); // modify here // tbody tbody var _tableBody = jQuery("tbody",_table); // buttons var _addRowBtn = jQuery("#controls #addRow"); var _insertRowBtn= jQuery("#controls #insertRow"); var _removeRowBtn= jQuery("#controls #removeRow"); //check box all var _cbAll= jQuery(".checkBoxAll", _table ); // add how many rows var _addRowsNumber= jQuery("#controls #add_rows_number"); var _hiddenControls = jQuery("#controls .hiddenControls"); var blankRowID = "blankRow"; //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //click the add row button //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ _addRowBtn.click(function(){ // when input not isNaN do add row if (! isNaN(_addRowsNumber.attr('value')) ){ for (var i = 0 ; i < _addRowsNumber.attr('value') ;i++){ var newRow = jQuery("#"+blankRowID).clone(true).appendTo(_tableBody) .attr("style", "display: ''") .addClass("rowData") .removeAttr("id"); } refreshTable(_table); } return false; //kill the browser default action }); //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //checkbox select all //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ _cbAll.click(function(){ var checked_status = this.checked; var prefixName = _cbAll.attr('name'); // find name prefix match check box (group of table) jQuery("input[name^='"+prefixName+"']").each(function() { this.checked = checked_status; }); }); //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //Click the remove all button //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ _removeRowBtn.click(function(){ var prefixName = _cbAll.attr('name'); // find name prefix match check box (group of table) jQuery("input[name^='"+prefixName+"']").not(_cbAll).each(function() { if (this.checked){ // remove tr row , ckbox name the same with rowid jQuery("#"+this.name).remove(); } }); refreshTable(_table); return false; }); //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //Click the insert row button //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ _insertRowBtn.click(function(){ var prefixName = _cbAll.attr('name'); jQuery("input[name^='"+prefixName+"']").each(function(){ var currentRow = this.name;// ckbox name the same with rowid if (this.checked == true){ newRow = jQuery("#"+blankRowID).clone(true).insertAfter(jQuery("#"+currentRow)) .attr("style", "display: ''") .addClass("rowData") .removeAttr("id"); } }); refreshTable(_table); return false; }); //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //Function to refresh new row //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ function refreshTable(_table){ var tableId = _table.attr('id'); var count =1; // ignore hidden column // update tr rowid jQuery ( "#"+tableId ).find(".rowData").each(function(){ jQuery(this).attr('id', tableId + "_" + count ); count ++; }); count =0; jQuery ( "#"+tableId ).find("input[type='checkbox'][name^='"+tableId+"']").not(".checkBoxAll").each(function(){ // update check box id and name (not check all) jQuery(this).attr('id', tableId + "_ckbox" + count ); jQuery(this).attr('name', tableId + "_" + count ); count ++; }); // write customize code here customerRow(_table); }; //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ //Function to customer new row : modify here //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ function customerRow(_table){ var form = document.myform; var pageColumns = ["prhSeq", "prhChannelproperty", "prhSrc"]; //modify here var tableId = _table.attr('id'); var count =1; // ignore hidden column // update tr rowid jQuery ( "#"+tableId ).find(".rowData").each(function(){ for(var i = 0; i < pageColumns.length; i++){ jQuery ( this ).find("input[name$='"+pageColumns[i]+"']").each(function(){ jQuery(this).attr('name', 'prhIndexed['+count+'].'+pageColumns[i] ); // update prhSeq Value if (pageColumns[i] == "prhSeq") { jQuery(this).attr('value', count ); } if (pageColumns[i] == "prhSrc") { // clear default onfocus //jQuery(this).attr("onfocus", ""); jQuery(this).focus(function() { // doSomething }); } }); jQuery ( this ).find("select[name$='"+pageColumns[i]+"']").each(function(){ jQuery(this).attr('name', 'prhIndexed['+count+'].'+pageColumns[i] ); }); }// end of for count ++; }); jQuery ( "#"+tableId ).find(".rowData").each(function(){ // only for debug alert ( jQuery(this).html() ) }); }; //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ }); <div id="controls"> <table width="350px" border="0"> <tr><td> <input id="addRow" type="button" name="addRows" value="Add Row" /> <input id="add_rows_number" type="text" name="add_rows_number" value="1" style="width:20px;" maxlength="2" /> <input id="insertRow" type="button" name="insert" value="Insert Row" /> <input id="removeRow" type="button" name="deleteRows" value="Delete Row" /> </td></tr> </table></div> <table id="prh" width="350px" border="1"> <thead> <tr class="listheader"> <td nowrap width="21"><input type="checkbox" name="prh_" class="checkBoxAll"/></td> <td nowrap width="32">Sequence</td> <td nowrap width="153" align="center">Channel</td> <td nowrap width="200">Source</td> </tr> </thead> <tbody>  <tr id='blankRow' style="display:none" > <td><input type="checkbox" id='prh_ckbox0' name='prh_0' value=""/></td> <td align="right"><input type="text" name="prhIndexed[0].prhSeq" maxlength="10" value="" onkeydown="" onblur="" onfocus="" readonly="readonly" style="width:30px;background-color:transparent;border:0;line-height:13pt;color: #993300;background-color:transparent;border:0;line-height:13pt;color: #993300;"></td> <td><select name="prhIndexed[0].prhChannelproperty"><option value=""></option> <option value="A01">A01</option> <option value="A02">A02</option> <option value="A03">A03</option> <option value="A04">A04</option> </select></td> <td><input type="text" name="prhIndexed[0].prhSrc" maxlength="6" value="new" style="width:80px;background-color:#FFFFD7;"> <div id='displayPrhSrcName0'></div> </td> </tr>  <tr id='prh_1' class="rowData"> <td><input type="checkbox" id='prh_ckbox1' name='prh_1' value=""/></td> <td align="right"><input type="text" name="prhIndexed[1].prhSeq" maxlength="10" value="1" onkeydown="" onblur="" onfocus="" readonly="readonly" style="width:30px;background-color:transparent;border:0;line-height:13pt;color: #993300;background-color:transparent;border:0;line-height:13pt;color: #993300;"></td> <td><select name="prhIndexed[1].prhChannelproperty"><option value=""></option> <option value="A01">A01</option> <option value="A02">A02</option> <option value="A03">A03</option> <option value="A04">A04</option> </select></td> <td><input type="text" name="prhIndexed[1].prhSrc" maxlength="6" value="new" style="width:80px;background-color:#FFFFD7;"> <div id='displayPrhSrcName0'></div> </td> </tr> <tr id='prh_2' class="rowData"> <td><input type="checkbox" id='prh_ckbox2' name='prh_2' value=""/></td> <td align="right"><input type="text" name="prhIndexed[2].prhSeq" maxlength="10" value="2" onkeydown="" onblur="" onfocus="" readonly="readonly" style="width:30px;background-color:transparent;border:0;line-height:13pt;color: #993300;background-color:transparent;border:0;line-height:13pt;color: #993300;"></td> <td><select name="prhIndexed[2].prhChannelproperty"><option value=""></option> <option value="A01">A01</option> <option value="A02">A02</option> <option value="A03">A03</option> <option value="A04">A04</option> </select></td> <td><input type="text" name="prhIndexed[2].prhSrc" maxlength="6" value="new" style="width:80px;background-color:#FFFFD7;"> <div id='displayPrhSrcName0'></div> </td> </tr> </tbody> </table>

Read the article
Question About Example In Robert C Martin's _Clean Code_

- by Jonah

This is a question about the concept of a function doing only one thing. It won't make sense without some relevant passages for context, so I'll quote them here. They appear on pgs 37-38: To say this differently, we want to be able to read the program as though it were a set of TO paragraphs, each of which is describing the current level of abstraction and referencing subsequent TO paragraphs at the next level down. To include the setups and teardowns, we include setups, then we include the test page content, and then we include the teardowns. To include the setups, we include the suite setup if this is a suite, then we include the regular setup. It turns out to be very dif?cult for programmers to learn to follow this rule and write functions that stay at a single level of abstraction. But learning this trick is also very important. It is the key to keeping functions short and making sure they do “one thing.” Making the code read like a top-down set of TO paragraphs is an effective technique for keeping the abstraction level consistent. He then gives the following example of poor code: public Money calculatePay(Employee e) throws InvalidEmployeeType { switch (e.type) { case COMMISSIONED: return calculateCommissionedPay(e); case HOURLY: return calculateHourlyPay(e); case SALARIED: return calculateSalariedPay(e); default: throw new InvalidEmployeeType(e.type); } } and explains the problems with it as follows: There are several problems with this function. First, it’s large, and when new employee types are added, it will grow. Second, it very clearly does more than one thing. Third, it violates the Single Responsibility Principle7 (SRP) because there is more than one reason for it to change. Fourth, it violates the Open Closed Principle8 (OCP) because it must change whenever new types are added. Now my questions. To begin, it's clear to me how it violates the OCP, and it's clear to me that this alone makes it poor design. However, I am trying to understand each principle, and it's not clear to me how SRP applies. Specifically, the only reason I can imagine for this method to change is the addition of new employee types. There is only one "axis of change." If details of the calculation needed to change, this would only affect the submethods like "calculateHourlyPay()" Also, while in one sense it is obviously doing 3 things, those three things are all at the same level of abstraction, and can all be put into a TO paragraph no different from the example one: TO calculate pay for an employee, we calculate commissioned pay if the employee is commissioned, hourly pay if he is hourly, etc. So aside from its violation of the OCP, this code seems to conform to Martin's other requirements of clean code, even though he's arguing it does not. Can someone please explain what I am missing? Thanks.

Read the article
Windows forms application blocks after station lock

- by Silviu

We're having a serious issue at work. We've discovered that after the station where the client was running is locked/unlocked the client is blocked. No repaint. So the UI thread is blocked with something. Looking at the callstack of the UI thread (thread 0) using windbg we see that a UserPreferenceChanged event gets raised. It is marshalled through a WindowsFormsSynchronizationContext using it's controlToSend field to the UI. It gets blocked by a call to the marshalling control. The method called is MarshaledInvoke it builds a ThreadMethodEntry entry = new ThreadMethodEntry(caller, method, args, synchronous, executionContext); This entry is supposed to do the magic. The call is a synchronous call and because of that (still in the MarshaledInvoke of the Control class) the wait call is reached: if (!entry.IsCompleted) { this.WaitForWaitHandle(entry.AsyncWaitHandle); } The last thing that i can see on the stack is the WaitOne called on the previously mentioned AsyncWaitHandle. This is very annoying because having just the callstack of the runtime and not one of our methods being invoked we cannot really point to a bug in our code. I might be wrong, but I'm guessing that the marshaling control is not "marshaling" to the ui thread. But another one...i don't really know which one because the other threads are being used by us and are blocked...maybe this is the issue. But none of the other threads are running a message loop. This is very annoying. We had some issues in the past with marshaling controls to the right ui thread. That is because the first form that is constructed is a splash form. Which is not the main form. We used to use the main form to marshal call to the ui thread. But from time to time some calls would go to a non ui thread and some grids would broke with a big red X on them. I fixed this by creating a specific class: public class WindowsFormsSynchronizer { private static readonly WindowsFormsSynchronizationContext = new WindowsFormsSynchronizationContext(); //Methods are following that would build the same interface of the synchronization context. } This class gets build as one of the first objects in the first form being constructed. We've noticed some other strange thing. Looking at the heap there are 7 WindowsFormsSynchronizationContext objects. 6 of these have the same instance of controlToSend, and the other one has some different instance of controlToSend. This last one is the one that should marshal the calls to the UI. I don't have any other idea...maybe some of you guys had this same issue?

Read the article
Ray-Box Intersection during Scene traversal with matrix transforms

- by Myx

Hello: There are a few ways that I'm testing my ray-box intersections: Using the ComputeIntersectionBox(...) method, that takes a ray and a box as arguments and computes the closest intersection of the ray and the box. This method works by forming a plane with each of the faces of the box and finding an intersection with each of the planes. Once an intersection is found, a check is made whether or not the point is on the surface of the box by checking that the intersection point is between the corner points. When I look at rays after running this algorithm on two different boxes, I obtain the correct intersections. Using ComputeIntersectionScene(...) method without using the matrix transformations on a scene that has two spheres, a dodecahedron (a triangular mesh), and two boxes. ComputeIntersectionScene(...) recursively traverses all of the nodes of the scene graph and computes the closest intersection with the given ray. This test in particular does not apply any transformations that parent nodes may have that also need to be applied to their children. With this test, I also obtain the correct intersections. Using ComputeIntersectionScene(...) method WITH the matrix transformations. This test works like the one above except that before finding an intersection between the ray and a node in the scene, the ray is transformed into the node's coordinate frame using the inverse of the node's transformation matrix and after the intersection has been computed, this intersection is transformed back into the world coordinates by applying the transformation matrix to the intersection point. When testing with the third method on the same scene file as described in 2, testing with 4 rays (thus one ray intersects the one sphere, one ray the the other sphere, one ray one box, and one ray the other box), only the two spheres get intersected and the two boxes do not get intersections. When I debug looking into my ComputeIntersectionBox(...) method, it actually tells me that the ray intersects every plane on the box but each intersection point does not lie on the box. This seems to be strange behavior, since when using test 2 without transformations, I obtain the correct box intersections (thus, I believe my ray-box intersection to be correct) and when using test 3 WITH transformations, I obtain the correct sphere intersections (thus, I believe my transformed ray should be OK). Any suggestions where I could be going wrong? Thank you in advance.

Read the article
ArithmeticException thrown during BigDecimal.divide

- by polygenelubricants

I thought java.math.BigDecimal is supposed to be The Answer™ to the need of performing infinite precision arithmetic with decimal numbers. Consider the following snippet: import java.math.BigDecimal; //... final BigDecimal one = BigDecimal.ONE; final BigDecimal three = BigDecimal.valueOf(3); final BigDecimal third = one.divide(three); assert third.multiply(three).equals(one); // this should pass, right? I expect the assert to pass, but in fact the execution doesn't even get there: one.divide(three) causes ArithmeticException to be thrown! Exception in thread "main" java.lang.ArithmeticException: Non-terminating decimal expansion; no exact representable decimal result. at java.math.BigDecimal.divide It turns out that this behavior is explicitly documented in the API: In the case of divide, the exact quotient could have an infinitely long decimal expansion; for example, 1 divided by 3. If the quotient has a non-terminating decimal expansion and the operation is specified to return an exact result, an ArithmeticException is thrown. Otherwise, the exact result of the division is returned, as done for other operations. Browsing around the API further, one finds that in fact there are various overloads of divide that performs inexact division, i.e.: final BigDecimal third = one.divide(three, 33, RoundingMode.DOWN); System.out.println(three.multiply(third)); // prints "0.999999999999999999999999999999999" Of course, the obvious question now is "What's the point???". I thought BigDecimal is the solution when we need exact arithmetic, e.g. for financial calculations. If we can't even divide exactly, then how useful can this be? Does it actually serve a general purpose, or is it only useful in a very niche application where you fortunately just don't need to divide at all? If this is not the right answer, what CAN we use for exact division in financial calculation? (I mean, I don't have a finance major, but they still use division, right???).

Read the article
JSON or YAML encoding in GWT/Java on both client and server

- by KennethJ

I'm looking for a super simple JSON or YAML library (not particularly bothered which one) written in Java, and can be used in both GWT on the client, and in its original Java form on the server. What I'm trying to do is this: I have my models, which are shared between the client and the server, and these are the primary source of data interchange. I want to design the web service in between to be as simple as possible, and decided to take the RESTful approach. My problem is that I know our application will grow substantially in the future, and writing all the getters, setters, serialization, factories, etc. by hand fills me with absolute dread. So in order to avoid it, I decided to implement annotations to keep track of attributes on the models. The reason I can't just serialize everything directly, using GWT's own one, or one which works through reflection, is because we need a certain amount of logic going on in the serialization process. I.e. whether references to other models get serialized during the serialization of the original model, or whether an ID is just passed, and general simple things like that. I've then written an annotation processor to preprocess my shared models and generate an implementing class with all the getters, setters, serialization, lazy-loading, etc. To make a long story short, I need some type of simple YAML or JSON library, which allows me to encode and decode manually, so I can generate this code through my annotation processor. I have had a look around the interwebs, but every single one I ran into supported some reflection which, while all fine and dandy, make it pretty much useless for GWT. And in the case of GWT's own JSON library, it uses JSNI for speed purposes, making it useless server side. One solution I did think about involved writing writing two sets of serialization methods on the models, one for the client and one for the server, but I'd rather not do that. Also, I'm pretty new to GWT, and even though I have done a lot of Java, it was back in the 1.2 days, so it's a bit rusty. So if you think I'm going about this problem completely the wrong way, I'm open to suggestions.

Read the article
Move options between multiple lists

- by Martha

We currently have a form with the standard multi-select functionality of "here are the available options, here are the selected options, here are some buttons to move stuff back and forth." However, the client now wants the ability to not just select certain items, but to also categorize them. For example, given a list of books, they want to not just select the ones they own, but also the ones they've read, the ones they would like to read, and the ones they've heard about. (All examples fictional.) Thankfully, a selected item can only be in one category at a time. I can find many examples of moving items between listboxes, but not a single one for moving items between multiple listboxes. To add to the complication, the form needs to have two sets of list+categories, e.g. a list of movies that need to be categorized in addition to the aforementioned books. EDIT: Having now actually sat down to try to code the non-javascripty bits, I need to revise my question, because I realized that multiple select lists won't really work from the "how do I inform the server about all this lovely new information" standpoint. So the html code is now a pseudo-listbox, i.e. an unordered list (ul) displayed in a box with a scrollbar, and each list item (<li>) has a set of five radio buttons (unselected/own/read/like/heard). My task is still roughly the same: how to take this one list and make it easy to categorize the items, in such a way that the user can tell at a glance what is in what category. (The pseudo-listbox has some of the same disadvantages as a multi-select listbox, namely it's hard to tell what's selected if the list is long enough to scroll.) The dream solution would be a drag-and-drop type thing, but at this point even buttons would be OK. Another modification (a good one) is that the client has revised the lists, so the longest is now "only" 62 items long (instead of the many hundreds they had before). The categories will still mostly contain zero, one, or two selected items, possibly a couple more if the user was overzealous. As far as OS and stuff, the site is in classic asp (quit snickering!), the server-side code is VBScript, and so far we've avoided the various Javascript libraries by the simple expedient of almost never using client-side scripting. This one form for this one client is currently the big exception. Give 'em an inch and they want a mile... Oh, and I have to add: I suck at Javascript, or really at any C-descendant language. Curly braces give me hives. I'd really, really like something I can just copy & paste into my page, maybe tweak some variable names, and never look at it again. A girl can dream, can't she? :) [existing code deleted because it's largely irrelevant.]

Read the article
Separate specific #ifdef branches

- by detly

In short: I want to generate two different source trees from the current one, based only on one preprocessor macro being defined and another being undefined, with no other changes to the source. If you are interested, here is my story... In the beginning, my code was clean. Then we made a new product, and yea, it was better. But the code saw only the same peripheral devices, so we could keep the same code. Well, almost. There was one little condition that needed to be changed, so I added: #if defined(PRODUCT_A) condition = checkCat(); #elif defined(PRODUCT_B) condition = checkCat() && checkHat(); #endif ...to one and only one source file. In the general all-source-files-include-this header file, I had: #if !(defined(PRODUCT_A)||defined(PRODUCT_B)) #error "Don't make me replace you with a small shell script. RTFM." #endif ...so that people couldn't compile it unless they explicitly defined a product type. All was well. Oh... except that modifications were made, components changed, and since the new hardware worked better we could significantly re-write the control systems. Now when I look upon the face of the code, there are more than 60 separate areas delineated by either: #ifdef PRODUCT_A ... #else ... #endif ...or the same, but for PRODUCT_B. Or even: #if defined(PRODUCT_A) ... #elif defined(PRODUCT_B) ... #endif And of course, sometimes sanity took a longer holiday and: #ifdef PRODUCT_A ... #endif #ifdef PRODUCT_B ... #endif These conditions wrap anywhere from one to two hundred lines (you'd think that the last one could be done by switching header files, but the function names need to be the same). This is insane. I would be better off maintaining two separate product-based branches in the source repo and porting any common changes. I realise this now. Is there something that can generate the two different source trees I need, based only on PRODUCT_A being defined and PRODUCT_B being undefined (and vice-versa), without touching anything else (ie. no header inclusion, no macro expansion, etc)?

Read the article
LoaderContext and ApplicationDomain changes with Adobe AIR ?

- by Tyn

Hello, I'm currently experimenting with loading external SWF files from both an standard AS3 application, and an AIR application. It seems that the AIR application doesn't act the same way a standard SWF run by the Flash Player does. According to the documentation, the applicationDomain property of LoaderContext is usable in an AIR application too, but it just seems to be not working. I have the following code : package { import flash.display.Loader; import flash.display.LoaderInfo; import flash.display.Sprite; import flash.events.Event; import flash.net.URLRequest; import flash.system.ApplicationDomain; import flash.system.LoaderContext; public class Invoker extends Sprite { private var _ldr : Loader; public function Invoker() { _ldr = new Loader(); _ldr.contentLoaderInfo.addEventListener(Event.COMPLETE, onChildOneComplete); var ldrC : LoaderContext = new LoaderContext(false, new ApplicationDomain(ApplicationDomain.currentDomain) ); _ldr.load(new URLRequest("otherSwf.swf"), ldrC); } private function onChildOneComplete(e : Event) : void { var c1ad : ApplicationDomain = (e.target as LoaderInfo).applicationDomain; var inad : ApplicationDomain = ApplicationDomain.currentDomain; trace("Child One parentDomain : " + c1ad.parentDomain); trace("Invoker parentDomain : " + inad.parentDomain); trace("Child One has Invoker : " + c1ad.hasDefinition("Invoker")); trace("Invoker has Invoker : " + inad.hasDefinition("Invoker")); } } } Compiling this code as an SWF file and launching it with the Flash Player does this output, which seems right : Child One parentDomain : [object ApplicationDomain] Invoker parentDomain : null Child One has Invoker : true Invoker has Invoker : true But the same code as an AIR application does a different output : Child One parentDomain : null Invoker parentDomain : null Child One has Invoker : false Invoker has Invoker : true According to the documentation, the first output (using a SWF with Flash Player, and not an AIR application) is the right one. Also, playing around with this snippet and changing the application domain to others possible configurations (like new ApplicationDomain(null), or ApplicationDomain.currentDomain) does exaclty what the documentation says with the SWF, but does not change the output of the AIR application. Any clue why AIR is simply ignoring the application domain passed to the loader context ? Any documentation about this particular issue ? Thank you very much.

Read the article
Parallel programming in C#

- by Alxandr

I'm interested in learning about parallel programming in C#.NET (not like everything there is to know, but the basics and maybe some good-practices), therefore I've decided to reprogram an old program of mine which is called ImageSyncer. ImageSyncer is a really simple program, all it does is to scan trough a folder and find all files ending with .jpg, then it calculates the new position of the files based on the date they were taken (parsing of xif-data, or whatever it's called). After a location has been generated the program checks for any existing files at that location, and if one exist it looks at the last write-time of both the file to copy, and the file "in its way". If those are equal the file is skipped. If not a md5 checksum of both files is created and matched. If there is no match the file to be copied is given a new location to be copied to (for instance, if it was to be copied to "C:\test.jpg" it's copied to "C:\test(1).jpg" instead). The result of this operation is populated into a queue of a struct-type that contains two strings, the original file and the position to copy it to. Then that queue is iterated over untill it is empty and the files are copied. In other words there are 4 operations: 1. Scan directory for jpegs 2. Parse files for xif and generate copy-location 3. Check for file existence and if needed generate new path 4. Copy files And so I want to rewrite this program to make it paralell and be able to perform several of the operations at the same time, and I was wondering what the best way to achieve that would be. I've came up with two different models I can think of, but neither one of them might be any good at all. The first one is to parallelize the 4 steps of the old program, so that when step one is to be executed it's done on several threads, and when the entire of step 1 is finished step 2 is began. The other one (which I find more interesting because I have no idea of how to do that) is to create a sort of worker and consumer model, so when a thread is finished with step 1 another one takes over and performs step 2 at that object (or something like that). But as said, I don't know if any of these are any good solutions. Also, I don't know much about parallel programming at all. I know how to make a thread, and how to make it perform a function taking in an object as its only parameter, and I've also used the BackgroundWorker-class on one occasion, but I'm not that familiar with any of them. Any input would be appreciated.

Read the article
How do I stack images to simulate depth using Core Animation?

- by Jeffrey Berthiaume

I have a series of UIImages with which I need to simulate depth. I can't use scaling because I need to be able to rotate the parent view, and the images should look like they're stacked visibly in front of each other, not on the same plane. I made a new ViewController-based project and put this in the viewDidLoad (as well as attached three 120x120 pixel images named 1.png, 2.png, and 3.png): - (void)viewDidLoad { // display image 3 UIImageView *three = [[UIImageView alloc] initWithImage:[UIImage imageNamed:@"3.png"]]; three.center = CGPointMake(160 + 60, 240 - 60); [self.view addSubview:three]; // rotate image 3 around the z axis // THIS IS INCORRECT CATransform3D theTransform = three.layer.transform; theTransform.m34 = 1.0 / -1000; three.layer.transform = theTransform; // display image 2 UIImageView *two = [[UIImageView alloc] initWithImage:[UIImage imageNamed:@"2.png"]]; two.center = CGPointMake(160, 240); [self.view addSubview:two]; // display image 1 UIImageView *one = [[UIImageView alloc] initWithImage:[UIImage imageNamed:@"1.png"]]; one.center = CGPointMake(160 - 60, 240 + 60); [self.view addSubview:one]; // rotate image 3 around the z axis // THIS IS INCORRECT theTransform = one.layer.transform; theTransform.m34 = 1.0 / 1000; one.layer.transform = theTransform; // release the images [one release]; [two release]; [three release]; // rotate the parent view around the y axis theTransform = self.view.layer.transform; theTransform.m14 = 1.0 / -500; self.view.layer.transform = theTransform; [super viewDidLoad]; } I have very specific reasons why I'm not using an EAGLView and why I'm not loading the images as CALayers (i.e. why I'm using UIImageViews for each one). This is just a quick demo that I can use to work out exactly what I need in my parent application. Is there some matrix way to translate these 2d images along the z-axis so they will look like what I'm trying to represent? I've gone through the other StackOverflow articles as well as the Wikipedia references, and have not found what I'm looking for -- although I might not necessarily be using the right terms for what I'm trying to do.

Read the article
More efficient SQL than using "A UNION (B in A)"?

- by machinatus

Edit 1 (clarification): Thank you for the answers so far! The response is gratifying. I want to clarify the question a little because based on the answers I think I did not describe one aspect of the problem correctly (and I'm sure that's my fault as I was having a difficult time defining it even for myself). Here's the rub: The result set should contain ONLY the records with tstamp BETWEEN '2010-01-03' AND '2010-01-09', AND the one record where the tstamp IS NULL for each order_num in the first set (there will always be one with null tstamp for each order_num). The answers given so far appear to include all records for a certain order_num if there are any with tstamp BETWEEN '2010-01-03' AND '2010-01-09'. For example, if there were another record with order_num = 2 and tstamp = 2010-01-12 00:00:00 it should not be included in the result. Original question: Consider an orders table containing id (unique), order_num, tstamp (a timestamp), and item_id (the single item included in an order). tstamp is null, unless the order has been modified, in which case there is another record with identical order_num and tstamp then contains the timestamp of when the change occurred. Example... id order_num tstamp item_id __ _________ ___________________ _______ 0 1 100 1 2 101 2 2 2010-01-05 12:34:56 102 3 3 113 4 4 124 5 5 135 6 5 2010-01-07 01:23:45 136 7 5 2010-01-07 02:46:00 137 8 6 100 9 6 2010-01-13 08:33:55 105 What is the most efficient SQL statement to retrieve all of the orders (based on order_num) which have been modified one or more times during a certain date range? In other words, for each order we need all of the records with the same order_num (including the one with NULL tstamp), for each order_num WHERE at least one of the order_num's has tstamp NOT NULL AND tstamp BETWEEN '2010-01-03' AND '2010-01-09'. It's the "WHERE at least one of the order_num's has tstamp NOT NULL" that I'm having difficulty with. The result set should look like this: id order_num tstamp item_id __ _________ ___________________ _______ 1 2 101 2 2 2010-01-05 12:34:56 102 5 5 135 6 5 2010-01-07 01:23:45 136 7 5 2010-01-07 02:46:00 137 The SQL that I came up with is this, which is essentially "A UNION (B in A)", but it executes slowly and I hope there is a more efficient solution: SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id FROM (SELECT orders.order_id, orders.tstamp, orders.item_id FROM orders WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09') AS history_orders UNION SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id FROM (SELECT orders.order_id, orders.tstamp, orders.item_id FROM orders WHERE orders.tstamp IS NULL) AS current_orders WHERE current_orders.order_id IN (SELECT orders.order_id FROM orders WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

Read the article
Beware Sneaky Reads with Unique Indexes

- by Paul White NZ

A few days ago, Sandra Mueller (twitter | blog) asked a question using twitter’s #sqlhelp hash tag: “Might SQL Server retrieve (out-of-row) LOB data from a table, even if the column isn’t referenced in the query?” Leaving aside trivial cases (like selecting a computed column that does reference the LOB data), one might be tempted to say that no, SQL Server does not read data you haven’t asked for. In general, that’s quite correct; however there are cases where SQL Server might sneakily retrieve a LOB column… Example Table Here’s a T-SQL script to create that table and populate it with 1,000 rows: CREATE TABLE dbo.LOBtest ( pk INTEGER IDENTITY NOT NULL, some_value INTEGER NULL, lob_data VARCHAR(MAX) NULL, another_column CHAR(5) NULL, CONSTRAINT [PK dbo.LOBtest pk] PRIMARY KEY CLUSTERED (pk ASC) ); GO DECLARE @Data VARCHAR(MAX); SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540); WITH Numbers (n) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM master.sys.columns C1, master.sys.columns C2 ) INSERT LOBtest WITH (TABLOCKX) ( some_value, lob_data ) SELECT TOP (1000) N.n, @Data FROM Numbers N WHERE N.n <= 1000; Test 1: A Simple Update Let’s run a query to subtract one from every value in the some_value column: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; As you might expect, modifying this integer column in 1,000 rows doesn’t take very long, or use many resources. The STATITICS IO and TIME output shows a total of 9 logical reads, and 25ms elapsed time. The query plan is also very simple: Looking at the Clustered Index Scan, we can see that SQL Server only retrieves the pk and some_value columns during the scan: The pk column is needed by the Clustered Index Update operator to uniquely identify the row that is being changed. The some_value column is used by the Compute Scalar to calculate the new value. (In case you are wondering what the Top operator is for, it is used to enforce SET ROWCOUNT). Test 2: Simple Update with an Index Now let’s create a nonclustered index keyed on the some_value column, with lob_data as an included column: CREATE NONCLUSTERED INDEX [IX dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest (some_value) INCLUDE ( lob_data ) WITH ( FILLFACTOR = 100, MAXDOP = 1, SORT_IN_TEMPDB = ON ); This is not a useful index for our simple update query; imagine that someone else created it for a different purpose. Let’s run our update query again: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; We find that it now requires 4,014 logical reads and the elapsed query time has increased to around 100ms. The extra logical reads (4 per row) are an expected consequence of maintaining the nonclustered index. The query plan is very similar to before (click to enlarge): The Clustered Index Update operator picks up the extra work of maintaining the nonclustered index. The new Compute Scalar operators detect whether the value in the some_value column has actually been changed by the update. SQL Server may be able to skip maintaining the nonclustered index if the value hasn’t changed (see my previous post on non-updating updates for details). Our simple query does change the value of some_data in every row, so this optimization doesn’t add any value in this specific case. The output list of columns from the Clustered Index Scan hasn’t changed from the one shown previously: SQL Server still just reads the pk and some_data columns. Cool. Overall then, adding the nonclustered index hasn’t had any startling effects, and the LOB column data still isn’t being read from the table. Let’s see what happens if we make the nonclustered index unique. Test 3: Simple Update with a Unique Index Here’s the script to create a new unique index, and drop the old one: CREATE UNIQUE NONCLUSTERED INDEX [UQ dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest (some_value) INCLUDE ( lob_data ) WITH ( FILLFACTOR = 100, MAXDOP = 1, SORT_IN_TEMPDB = ON ); GO DROP INDEX [IX dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest; Remember that SQL Server only enforces uniqueness on index keys (the some_data column). The lob_data column is simply stored at the leaf-level of the non-clustered index. With that in mind, we might expect this change to make very little difference. Let’s see: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; Whoa! Now look at the elapsed time and logical reads: Scan count 1, logical reads 2016, physical reads 0, read-ahead reads 0, lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992. CPU time = 172 ms, elapsed time = 16172 ms. Even with all the data and index pages in memory, the query took over 16 seconds to update just 1,000 rows, performing over 52,000 LOB logical reads (nearly 16,000 of those using read-ahead). Why on earth is SQL Server reading LOB data in a query that only updates a single integer column? The Query Plan The query plan for test 3 looks a bit more complex than before: In fact, the bottom level is exactly the same as we saw with the non-unique index. The top level has heaps of new stuff though, which I’ll come to in a moment. You might be expecting to find that the Clustered Index Scan is now reading the lob_data column (for some reason). After all, we need to explain where all the LOB logical reads are coming from. Sadly, when we look at the properties of the Clustered Index Scan, we see exactly the same as before: SQL Server is still only reading the pk and some_value columns – so what’s doing the LOB reads? Updates that Sneakily Read Data We have to go as far as the Clustered Index Update operator before we see LOB data in the output list: [Expr1020] is a bit flag added by an earlier Compute Scalar. It is set true if the some_value column has not been changed (part of the non-updating updates optimization I mentioned earlier). The Clustered Index Update operator adds two new columns: the lob_data column, and some_value_OLD. The some_value_OLD column, as the name suggests, is the pre-update value of the some_value column. At this point, the clustered index has already been updated with the new value, but we haven’t touched the nonclustered index yet. An interesting observation here is that the Clustered Index Update operator can read a column into the data flow as part of its update operation. SQL Server could have read the LOB data as part of the initial Clustered Index Scan, but that would mean carrying the data through all the operations that occur prior to the Clustered Index Update. The server knows it will have to go back to the clustered index row to update it, so it delays reading the LOB data until then. Sneaky! Why the LOB Data Is Needed This is all very interesting (I hope), but why is SQL Server reading the LOB data? For that matter, why does it need to pass the pre-update value of the some_value column out of the Clustered Index Update? The answer relates to the top row of the query plan for test 3. I’ll reproduce it here for convenience: Notice that this is a wide (per-index) update plan. SQL Server used a narrow (per-row) update plan in test 2, where the Clustered Index Update took care of maintaining the nonclustered index too. I’ll talk more about this difference shortly. The Split/Sort/Collapse combination is an optimization, which aims to make per-index update plans more efficient. It does this by breaking each update into a delete/insert pair, reordering the operations, removing any redundant operations, and finally applying the net effect of all the changes to the nonclustered index. Imagine we had a unique index which currently holds three rows with the values 1, 2, and 3. If we run a query that adds 1 to each row value, we would end up with values 2, 3, and 4. The net effect of all the changes is the same as if we simply deleted the value 1, and added a new value 4. By applying net changes, SQL Server can also avoid false unique-key violations. If we tried to immediately update the value 1 to a 2, it would conflict with the existing value 2 (which would soon be updated to 3 of course) and the query would fail. You might argue that SQL Server could avoid the uniqueness violation by starting with the highest value (3) and working down. That’s fine, but it’s not possible to generalize this logic to work with every possible update query. SQL Server has to use a wide update plan if it sees any risk of false uniqueness violations. It’s worth noting that the logic SQL Server uses to detect whether these violations are possible has definite limits. As a result, you will often receive a wide update plan, even when you can see that no violations are possible. Another benefit of this optimization is that it includes a sort on the index key as part of its work. Processing the index changes in index key order promotes sequential I/O against the nonclustered index. A side-effect of all this is that the net changes might include one or more inserts. In order to insert a new row in the index, SQL Server obviously needs all the columns – the key column and the included LOB column. This is the reason SQL Server reads the LOB data as part of the Clustered Index Update. In addition, the some_value_OLD column is required by the Split operator (it turns updates into delete/insert pairs). In order to generate the correct index key delete operation, it needs the old key value. The irony is that in this case the Split/Sort/Collapse optimization is anything but. Reading all that LOB data is extremely expensive, so it is sad that the current version of SQL Server has no way to avoid it. Finally, for completeness, I should mention that the Filter operator is there to filter out the non-updating updates. Beating the Set-Based Update with a Cursor One situation where SQL Server can see that false unique-key violations aren’t possible is where it can guarantee that only one row is being updated. Armed with this knowledge, we can write a cursor (or the WHILE-loop equivalent) that updates one row at a time, and so avoids reading the LOB data: SET NOCOUNT ON; SET STATISTICS XML, IO, TIME OFF; DECLARE @PK INTEGER, @StartTime DATETIME; SET @StartTime = GETUTCDATE(); DECLARE curUpdate CURSOR LOCAL FORWARD_ONLY KEYSET SCROLL_LOCKS FOR SELECT L.pk FROM LOBtest L ORDER BY L.pk ASC; OPEN curUpdate; WHILE (1 = 1) BEGIN FETCH NEXT FROM curUpdate INTO @PK; IF @@FETCH_STATUS = -1 BREAK; IF @@FETCH_STATUS = -2 CONTINUE; UPDATE dbo.LOBtest SET some_value = some_value - 1 WHERE CURRENT OF curUpdate; END; CLOSE curUpdate; DEALLOCATE curUpdate; SELECT DATEDIFF(MILLISECOND, @StartTime, GETUTCDATE()); That completes the update in 1280 milliseconds (remember test 3 took over 16 seconds!) I used the WHERE CURRENT OF syntax there and a KEYSET cursor, just for the fun of it. One could just as well use a WHERE clause that specified the primary key value instead. Clustered Indexes A clustered index is the ultimate index with included columns: all non-key columns are included columns in a clustered index. Let’s re-create the test table and data with an updatable primary key, and without any non-clustered indexes: IF OBJECT_ID(N'dbo.LOBtest', N'U') IS NOT NULL DROP TABLE dbo.LOBtest; GO CREATE TABLE dbo.LOBtest ( pk INTEGER NOT NULL, some_value INTEGER NULL, lob_data VARCHAR(MAX) NULL, another_column CHAR(5) NULL, CONSTRAINT [PK dbo.LOBtest pk] PRIMARY KEY CLUSTERED (pk ASC) ); GO DECLARE @Data VARCHAR(MAX); SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540); WITH Numbers (n) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM master.sys.columns C1, master.sys.columns C2 ) INSERT LOBtest WITH (TABLOCKX) ( pk, some_value, lob_data ) SELECT TOP (1000) N.n, N.n, @Data FROM Numbers N WHERE N.n <= 1000; Now here’s a query to modify the cluster keys: UPDATE dbo.LOBtest SET pk = pk + 1; The query plan is: As you can see, the Split/Sort/Collapse optimization is present, and we also gain an Eager Table Spool, for Halloween protection. In addition, SQL Server now has no choice but to read the LOB data in the Clustered Index Scan: The performance is not great, as you might expect (even though there is no non-clustered index to maintain): Table 'LOBtest'. Scan count 1, logical reads 2011, physical reads 0, read-ahead reads 0, lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992. Table 'Worktable'. Scan count 1, logical reads 2040, physical reads 0, read-ahead reads 0, lob logical reads 34000, lob physical reads 0, lob read-ahead reads 8000. SQL Server Execution Times: CPU time = 483 ms, elapsed time = 17884 ms. Notice how the LOB data is read twice: once from the Clustered Index Scan, and again from the work table in tempdb used by the Eager Spool. If you try the same test with a non-unique clustered index (rather than a primary key), you’ll get a much more efficient plan that just passes the cluster key (including uniqueifier) around (no LOB data or other non-key columns): A unique non-clustered index (on a heap) works well too: Both those queries complete in a few tens of milliseconds, with no LOB reads, and just a few thousand logical reads. (In fact the heap is rather more efficient). There are lots more fun combinations to try that I don’t have space for here. Final Thoughts The behaviour shown in this post is not limited to LOB data by any means. If the conditions are met, any unique index that has included columns can produce similar behaviour – something to bear in mind when adding large INCLUDE columns to achieve covering queries, perhaps. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

Read the article
The Product Owner

- by Robert May

In a previous post, I outlined the rules of Scrum. This post details one of those rules. Picking a most important part of Scrum is difficult. All of the rules are required, but if there were one rule that is “more” required that every other rule, its having a good Product Owner. Simply put, the Product Owner can make or break the project. Duties of the Product Owner A Product Owner has many duties and responsibilities. I’ll talk about each of these duties in detail below. A Product Owner: Discovers and records stories for the backlog. Prioritizes stories in the Product Backlog, Release Backlog and Iteration Backlog. Determines Release dates and Iteration Dates. Develops story details and helps the team understand those details. Helps QA to develop acceptance tests. Interact with the Customer to make sure that the product is meeting the customer’s needs. Discovers and Records Stories for the Backlog When I do Scrum, I always use User Stories as the means for capturing functionality that’s required in the system. Some people will use Use Cases, but the same rule applies. The Product Owner has the ultimate responsibility for figuring out what functionality will be in the system. Many different mechanisms for capturing this input can be used. User interviews are great, but all sources should be considered, including talking with Customer Support types. Often, they hear what users are struggling with the most and are a great source for stories that can make the application easier to use. Care should be taken when soliciting user stories from technical types such as programmers and the people that manage them. They will almost always give stories that are very technical in nature and may not have a direct benefit for the end user. Stories are about adding value to the company. If the stories don’t have direct benefit to the end user, the Product Owner should question whether or not the story should be implemented. In general, technical stories should be included as tasks in User Stories. Technical stories are often needed, but the ultimate value to the user is in user based functionality, so technical stories should be considered nothing more than overhead in providing that user functionality. Until the iteration prior to development, stories should be nothing more than short, one line placeholders. An exercise called Story Planning can be used to brainstorm and come up with stories. I’ll save the description of this activity for another blog post. For more information on User Stories, please read the book User Stories Applied by Mike Cohn. Prioritizes Stories in the Product Backlog, Release Backlog and Iteration Backlog Prioritization of stories is one of the most difficult tasks that a Product Owner must do. A key concept of Scrum done right is the need to have the team working from a single set of prioritized stories. If the team does not have a single set of prioritized stories, Scrum will likely fail at your organization. The Product Owner is the ONLY person who has the responsibility to prioritize that list. The Product Owner must be very diplomatic and sincerely listen to the people around him so that he can get the priorities correct. Just listening will still not yield the proper priorities. Care must also be taken to ensure that Return on Investment is also considered. Ultimately, determining which stories give the most value to the company for the least cost is the most important factor in determining priorities. Product Owners should be willing to look at cold, hard numbers to determine the order for stories. Even when many people want a feature, if that features is costly to develop, it may not have as high of a return on investment as features that are cheaper, but not as popular. The act of prioritization often causes conflict in an environment. Customer Service thinks that feature X is the most important, because it will stop people from calling. Operations thinks that feature Y is the most important, because it will stop servers from crashing. Developers think that feature Z is most important because it will make writing software much easier for them. All of these are useful goals, but the team can have only one list of items, and each item must have a priority that is different from all other stories. The Product Owner will determine which feature gives the best return on investment and the other features will have to wait their turn, which means that someone will not have their top priority feature implemented first. A weak Product Owner will refuse to do prioritization. I’ve heard from multiple Product Owners the following phrase, “Well, it’s all got to be done, so what does it matter what order we do it in?” If your product owner is using this phrase, you need a new Product Owner. Order is VERY important. In Scrum, every release is potentially shippable. If the wrong priority items are developed, then the value added in each release isn’t what it should be. Additionally, the Product Owner with this mindset doesn’t understand Agile. A product is NEVER finished, until the company has decided that it is no longer a going concern and they are no longer going to sell the product. Therefore, prioritization isn’t an event, its something that continues every day. The logical extension of the phrase “It’s all got to be done” is that you will never ship your product, since a product is never “done.” Once stories have been prioritized, assigning them to the Release Backlog and the Iteration Backlog becomes relatively simple. The top priority items are copied into the respective backlogs in order and the task is complete. The team does have the right to shuffle things around a little in the iteration backlog. For example, they may determine that working on story C with story A is appropriate because they’re related, even though story B is technically a higher priority than story C. Or they may decide that story B is too big to complete in the time available after Story A has tasks created, so they’ll work on Story C since it’s smaller. They can’t, however, go deep into the backlog to pick stories to implement. The team and the Product Owner should work together to determine what’s best for the company. Prioritization is time consuming, but its one of the most important things a Product Owner does. Determines Release Dates and Iteration Dates Product owners are responsible for determining release dates for a product. A common misconception that Product Owners have is that every “release” needs to correspond with an actual release to customers. This is not the case. In general, releases should be no more than 3 months long. You may decide to release the product to the customers, and many companies do release the product to customers, but it may also be an internal release. If a release date is too far away, developers will fall into the trap of not feeling a sense of urgency. The date is far enough away that they don’t need to give the release their full attention. Additionally, important tasks, such as performance tuning, regression testing, user documentation, and release preparation, will not happen regularly, making them much more difficult and time consuming to do. The more frequently you do these tasks, the easier they are to accomplish. The Product Owner will be a key participant in determining whether or not a release should be sent out to the customers. The determination should be made on whether or not the features contained in the release are valuable enough and complete enough that the customers will see real value in the release. Often, some features will take more than three months to get them to a state where they qualify for a release or need additional supporting features to be released. The product owner has the right to make this determination. In addition to release dates, the Product Owner also will help determine iteration dates. In general, an iteration length should be chosen and the team should follow that iteration length for an extended period of time. If the iteration length is changed every iteration, you’re not doing Scrum. Iteration lengths help the team and company get into a rhythm of developing quality software. Iterations should be somewhere between 2 and 4 weeks in length. Any shorter, and significant software will likely not be developed. Any longer, and the team won’t feel urgency and planning will become very difficult. Iterations may not be extended during the iteration. Companies where Scrum isn’t really followed will often use this as a strategy to complete all stories. They don’t want to face the harsh reality of what their true performance is, and looking good is more important than seeking visibility and improving the process and team. Companies like this typically don’t allow failure. This is unhealthy. Failure is part of life and unless we learn from it, we can’t improve. I would much rather see a team push out stories to the next iteration and then have healthy discussions about why they failed rather than extend the iteration and not deal with the core problems. If iteration length varies, retrospectives become more difficult. For example, evaluating the performance of the team’s estimation efforts becomes much more difficult if the iteration length varies. Also, the team must have a velocity measurement. If the iteration length varies, measuring velocity becomes impossible and upper management no longer will have the ability to evaluate the teams performance. People external to the team will no longer have the ability to determine when key features are likely to be developed. Variable iterations cause the entire company to fail and likely cause Scrum to fail at an organization. Develops Story Details and Helps the Team Understand Those Details A key concept in Scrum is that the stories are nothing more than a placeholder for a conversation. Stories should be nothing more than short, one line statements about the functionality. The team will then converse with the Product Owner about the details about that story. The product owner needs to have a very good idea about what the details of the story are and needs to be able to help the team understand those details. Too often, we see this requirement as being translated into the need for comprehensive documentation about the story, including old fashioned requirements documentation. The team should only develop the documentation that is required and should not develop documentation that is only created because their is a process to do so. In general, what we see that works best is the iteration before a team starts development work on a story, the Product Owner, with other appropriate business analysts, will develop the details of that story. They’ll figure out what business rules are required, potentially make paper prototypes or other light weight mock-ups, and they seek to understand the story and what is implied. Note that the time allowed for this task is deliberately short. The Product Owner only has a single iteration to develop all of the stories for the next iteration. If more than one iteration is used, I’ve found that teams will end up with Big Design Up Front and traditional requirements documents. This is a waste of time, since the team will need to then have discussions with the Product Owner to figure out what the requirements document says. Instead of this, skip making the pretty pictures and detailing the nuances of the requirements and build only what is minimally needed by the team to do development. If something comes up during development, you can address it at that time and figure out what you want to do. The goal is to keep things as light weight as possible so that everyone can move as quickly as possible. Helps QA to Develop Acceptance Tests In Scrum, no story can be counted until it is accepted by QA. Because of this, acceptance tests are very important to the team. In general, acceptance tests need to be developed prior to the iteration or at the very beginning of the iteration so that the team can make sure that the tasks that they develop will fulfill the acceptance criteria. The Product Owner will help the team, including QA, understand what will make the story acceptable. Note that the Product Owner needs to be careful about specifying that the feature will work “Perfectly” at the end of the iteration. In general, features are developed a little bit at a time, so only the bit that is being developed should be considered as necessary for acceptance. A weak Product Owner will make statements like “Do it right the first time.” Not only are these statements damaging to the team (like they would try to do it WRONG the first time . . .), they’re also ignoring the iterative nature of Scrum. Additionally, a weak product owner will seek to add scope in the acceptance testing. For example, they will refuse to determine acceptance at the beginning of the iteration, and then, after the team has planned and committed to the iteration, they will expand scope by defining acceptance. This often causes the team to miss the iteration because scope that wasn’t planned on is included. There are ways that the team can mitigate this problem. For example, include extra “Product Owner” time to deal with the uncertainty that you know will be introduced by the Product Owner. This will slow the perceived velocity of the team and is not ideal, since they’ll be doing more work than they get credit for. Interact with the Customer to Make Sure that the Product is Meeting the Customer’s Needs Once development is complete, what the team has worked on should be put in front of real live people to see if it meets the needs of the customer. One of the great things about Agile is that if something doesn’t work, we can revisit it in a future iteration! This frees up the team to make the best decision now and know that if that decision proves to be incorrect, the team can revisit it and change that decision. Features are about adding value to the customer, so if the customer doesn’t find them useful, then having the team make tweaks is valuable. In general, most software will be 80 to 90 percent “right” after the initial round and only minor tweaks are required. If proper coding standards are followed, these tweaks are usually minor and easy to accomplish. Product Owners that are doing a good job will encourage real users to see and use the software, since they know that they are trying to add value to the customer. Poor product owners will think that they know the answers already, that their customers are silly and do stupid things and that they don’t need customer input. If you have a product owner that is afraid to show the team’s work to real customers, you probably need a different product owner. Up Next, “Who Makes a Good Product Owner.” Followed by, “Messing with the Team.” Technorati Tags: Scrum,Product Owner

Read the article
You should NOT be writing jQuery in SharePoint if…

- by Mark Rackley

Yes… another one of these posts. What can I say? I’m a pot stirrer.. a rabble rouser *rabble rabble* jQuery in SharePoint seems to be a fairly polarizing issue with one side thinking it is the most awesome thing since Princess Leia as the slave girl in Return of the Jedi and the other half thinking it is the worst idea since Mannequin 2: On the Move. The correct answer is OF COURSE “it depends”. But what are those deciding factors that make jQuery an awesome fit or leave a bad taste in your mouth? Let’s see if I can drive the discussion here with some polarizing comments of my own… I know some of you are getting ready to leave your comments even now before reading the rest of the blog, which is great! Iron sharpens iron… These discussions hopefully open us up to understanding the entire process better and think about things in a different way. You should not be writing jQuery in SharePoint if you are not a developer… Let’s start off with my most polarizing and rant filled portion of the blog post. If you don’t know what you are doing or you don’t have a background that helps you understand the implications of what you are writing then you should not be writing jQuery in SharePoint! I truly believe that one of the biggest reasons for the jQuery haters is because of all the bad jQuery out there. If you don’t know what you are doing you can do some NASTY things! One of the best stories I’ve heard about this is from my good friend John Ferringer (@ferringer). John tells this story during our Mythbusters session we do together. One of his clients was undergoing a Denial of Service attack and they couldn’t figure out what was going on! After much searching they found that some genius jQuery developer wrote some code for an image rotator, but did not take into account what happens when there are no images to load! The code just kept hitting the servers over and over and over again which prevented anything else from getting done! Now, I’m NOT saying that I have not done the same sort of thing in the past or am immune from such mistakes. My point is that if you don’t know what you are doing, there are very REAL consequences that can have a major impact on your organization AND they will be hard to track down. Think how happy your boss will be after you copy and pasted some jQuery from a blog without understanding what it does, it brings down the farm, AND it takes them 3 days to track it back to you. :/ Good times will not be had. Like it or not JavaScript/jQuery is a programming language. While you .NET people sit on your high horses because your code is compiled and “runs faster” (also debatable), the rest of us will be actually getting work done and delivering solutions while you are trying to figure out why your widget won’t deploy. I can pick at that scab because I write .NET code too and speak from experience. I can do both, and do both well. So, I am not speaking from ignorance here. In JavaScript/jQuery you have variables, loops, conditionals, functions, arrays, events, and built in methods. If you are not a developer you just aren’t going to take advantage of all of that and use it correctly. Ahhh.. but there is hope! There is a lot of jQuery resources out there to help you learn and learn well! There are many experts on the subject that will gladly tell you when you are smoking crack. I just this minute saw a tweet from @cquick with a link to: “jQuery Fundamentals”. I just glanced through it and this may be a great primer for you aspiring jQuery devs. Take advantage of all the resources and become a developer! Hey, it will look awesome on your resume right? You should not be writing jQuery in SharePoint if it depends too much on client resources for a good user experience I’ve said it once and I’ll say it over and over until you understand. jQuery is executed on the client’s computer. Got it? If you are looping through hundreds of rows of data, searching through an enormous DOM, or performing many calculations it is going to take some time! AND if your user happens to be sitting on some old PC somewhere that they picked up at a garage sale their experience will be that much worse! If you can’t give the user a good experience they will not use the site. So, if jQuery is causing the user to have a bad experience, don’t use it. I sometimes go as far to say that you should NOT go to jQuery as a first option for external facing web sites because you have ZERO control over what the end user’s computer will be. You just can’t guarantee an awesome user experience all of the time. Ahhh… but you have no choice? (where have I heard that before?). Well… if you really have no choice, here are some tips to help improve the experience: Avoid screen scraping This is not 1999 and SharePoint is not an old green screen from a mainframe… so why are you treating it like it is? Screen scraping is time consuming and client intensive. Take advantage of tools like SPServices to do your data retrieval when possible. Fine tune your DOM searches A lot of time can be eaten up just searching the DOM and ignoring table rows that you don’t need. Write better jQuery to only loop through tables rows that you need, or only access specific elements you need. Take advantage of Element ID’s to return the one element you are looking for instead of looping through all the DOM over and over again. Write better jQuery Remember this is development. Think about how you can write cleaner, faster jQuery. This directly relates to the previous point of improving your DOM searches, but also when using arrays, variables and loops. Do you REALLY need to loop through that array 3 times? How can you knock it down to 2 times or even 1? When you have lots of calculations and data that you are manipulating every operation adds up. Think about how you can streamline it. Back in the old days before RAM was abundant, Cores were plentiful and dinosaurs roamed the earth, us developers had to take performance into account in everything we did. It’s a lost art that really needs to be used here. You should not be writing jQuery in SharePoint if you are sending a lot of data over the wire… Developer: “Awesome… you can easily call SharePoint’s web services to retrieve and write data using SPServices!” Administrator: “Crap! you can easily call SharePoint’s web services to retrieve and write data using SPServices!” SPServices may indeed be the best thing that happened to SharePoint since the invention of SharePoint Saturdays by Godfather Lotter… BUT you HAVE to use it wisely! (I REFUSE to make the Spiderman reference). If you do not know what you are doing your code will bring back EVERY field and EVERY row from a list and push that over the internet with all that lovely XML wrapped around it. That can be a HUGE amount of data and will GREATLY impact performance! Calling several web service methods at the same time can cause the same problem and can negatively impact your SharePoint servers. These problems, thankfully, are not difficult to rectify if you are careful: Limit list data retrieved Use CAML to reduce the number of rows returned and limit the fields returned using ViewFields. You should definitely be doing this regardless. If you aren’t I hope your admin thumps you upside the head. Batch large list updates You may or may not have noticed that if you try to do large updates (hundreds of rows) that the performance is either completely abysmal or it fails over half the time. You can greatly improve performance and avoid timeouts by breaking up your updates into several smaller updates. I don’t know if there is a magic number for best performance, it really depends on how much data you are sending back more than the number of rows. However, I have found that 200 rows generally works well. Play around and find the right number for your situation. Delay Web Service calls when possible One of the cool things about jQuery and SPServices is that you can delay queries to the server until they are actually needed instead of doing them all at once. This can lead to performance improvements over DataViewWebParts and even .NET code in the right situations. So, don’t load the data until it’s needed. In some instances you may not need to retrieve the data at all, so why retrieve it ALL the time? You should not be writing jQuery in SharePoint if there is a better solution… jQuery is NOT the silver bullet in SharePoint, it is not the answer to every question, it is just another tool in the developers toolkit. I urge all developers to know what options exist out there and choose the right one! Sometimes it will be jQuery, sometimes it will be .NET, sometimes it will be XSL, and sometimes it will be some other choice… So, when is there a better solution to jQuery? When you can’t get away from performance problems Sometimes jQuery will just give you horrible performance regardless of what you do because of unavoidable obstacles. In these situations you are going to have to figure out an alternative. Can I do it with a DVWP or do I have to crack open Visual Studio? When you need to do something that jQuery can’t do There are lots of things you can’t do in jQuery like elevate privileges, event handlers, workflows, or interact with back end systems that have no web service interface. It just can’t do everything. When it can be done faster and more efficiently another way Why are you spending time to write jQuery to do a DataViewWebPart that would take 5 minutes? Or why are you trying to implement complicated logic that would be simple to do in .NET? If your answer is that you don’t have the option, okay. BUT if you do have the option don’t reinvent the wheel! Take advantage of the other tools. The answer is not always jQuery… sorry… the kool-aid tastes good, but sweet tea is pretty awesome too. You should not be using jQuery in SharePoint if you are a moron… Let’s finish up the blog on a high note… Yes.. it’s true, I sometimes type things just to get a reaction… guess this section title might be a good example, but it feels good sometimes just to type the words that a lot of us think… So.. don’t be that guy! Another good buddy of mine that works for Microsoft told me. “I loved jQuery in SharePoint…. until I had to support it.”. He went on to explain that some user was making several web service calls on a page using jQuery and then was calling Microsoft and COMPLAINING because the page took so long to load… DUH! What do you expect to happen when you are pushing that much data over the wire and are making that many web service calls at once!! It’s one thing to write that kind of code and accept it’s just going to take a while, it’s COMPLETELY another issue to do that and then complain when it’s not lightning fast! Someone’s gene pool needs some chlorine. So, I think this is a nice summary of the blog… DON’T be that guy… don’t be a moron. How can you stop yourself from being a moron? Ah.. glad you asked, here are some tips: Think Is jQuery the right solution to my problem? Is there a better approach? What are the implications and pitfalls of using jQuery in this situation? Search What are others doing? Does someone have a better solution? Is there a third party library that does the same thing I need? Plan Write good jQuery. Limit calculations and data sent over the wire and don’t reinvent the wheel when possible. Test Okay, it works well on your machine. Try it on others ESPECIALLY if this is for an external site. Test with empty data. Test with hundreds of rows of data. Test as many scenarios as possible. Monitor those server resources to see the impact there as well. Ask the experts As smart as you are, there are people smarter than you. Even the experts talk to each other to make sure they aren't doing something stupid. And for the MOST part they are pretty nice guys. Marc Anderson and Christophe Humbert are two guys who regularly keep me in line. Make sure you aren’t doing something stupid. Repeat So, when you think you have the best solution possible, repeat the steps above just to be safe. Conclusion jQuery is an awesome tool and has come in handy on many occasions. I’m even teaching a 1/2 day SharePoint & jQuery workshop at the upcoming SPTechCon in Boston if you want to berate me in person. However, it’s only as awesome as the developer behind the keyboard. It IS development and has its pitfalls. Knowledge and experience are invaluable to giving the user the best experience possible. Let’s face it, in the end, no matter our opinions, prejudices, or ego providing our clients, customers, and users with the best solution possible is what counts. Period… end of sentence…

Read the article
VS 2012 Code Review – Before Check In OR After Check In?

- by Tarun Arora

“Is Code Review Important and Effective?” There is a consensus across the industry that code review is an effective and practical way to collar code inconsistency and possible defects early in the software development life cycle. Among others some of the advantages of code reviews are, Bugs are found faster Forces developers to write readable code (code that can be read without explanation or introduction!) Optimization methods/tricks/productive programs spread faster Programmers as specialists "evolve" faster It's fun “Code review is systematic examination (often known as peer review) of computer source code. It is intended to find and fix mistakes overlooked in the initial development phase, improving both the overall quality of software and the developers' skills. Reviews are done in various forms such as pair programming, informal walkthroughs, and formal inspections.” Wikipedia No where does the definition mention whether its better to review code before the code has been committed to version control or after the commit has been performed. No matter which side you favour, Visual Studio 2012 allows you to request for a code review both before check in and also request for a review after check in. Let’s weigh the pros and cons of the approaches independently. Code Review Before Check In or Code Review After Check In? Approach 1 – Code Review before Check in Developer completes the code and feels the code quality is appropriate for check in to TFS. The developer raises a code review request to have a second pair of eyes validate if the code abides to the recommended best practices, will not result in any defects due to common coding mistakes and whether any optimizations can be made to improve the code quality. Image 1 – code review before check in Pros Everything that gets committed to source control is reviewed. Minimizes the chances of smelly code making its way into the code base. Decreases the cost of fixing bugs, remember, the earlier you find them, the lesser the pain in fixing them. Cons Development Code Freeze – Since the changes aren’t in the source control yet. Further development can only be done off-line. The changes have not been through a CI build, hard to say whether the code abides to all build quality standards. Inconsistent! Cumbersome to track the actual code review process. Not every change to the code base is worth reviewing, a lot of effort is invested for very little gain. Approach 2 – Code Review after Check in Developer checks in, random code reviews are performed on the checked in code. Image 2 – Code review after check in Pros The code has already passed the CI build and run through any code analysis plug ins you may have running on the build server. Instruct the developer to ensure ZERO fx cop, style cop and static code analysis before check in. Code is cleaner and smell free even before the code review. No Offline development, developers can continue to develop against the source control. Cons Bad code can easily make its way into the code base. Since the review take place much later in the cycle, the cost of fixing issues can prove to be much higher. Approach 3 – Hybrid Approach The community advocates a more hybrid approach, a blend of tooling and human accountability quotient. Image 3 – Hybrid Approach 1. Code review high impact check ins. It is not possible to review everything, by setting up code review check in policies you can end up slowing your team. More over, the code that you are reviewing before check in hasn't even been through a green CI build either. 2. Tooling. Let the tooling work for you. By running static analysis, fx cop, style cop and other plug ins on the build agent, you can identify the real issues that in my opinion can't possibly be identified using human reviews. Configure the tooling to report back top 10 issues every day. Mandate the manual code review of individuals who keep making it to this list of shame more often. 3. During Merge. I would prefer eliminating some of the other code issues during merge from Main branch to the release branch. In a scrum project this is still easier because cheery picking the merges is a possibility and the size of code being reviewed is still limited. Let the tooling work for you, if some one breaks the CI build often, put them on a gated check in build course until you see improvement. If some one appears on the top 10 list of shame generated via the build then ensure that all their code is reviewed till you see improvement. At the end of the day, the goal is to ensure that the code being delivered is top quality. By enforcing a code review before any check in, you force the developer to work offline or stay put till the review is complete. What do the experts say? So I asked a few expects what they thought of “Code Review quality gate before Checking in code?" Terje Sandstrom | Microsoft ALM MVP You mean a review quality gate BEFORE checking in code????? That would mean a lot of code staying either local or in shelvesets, and not even been through a CI build, and a green CI build being the main criteria for going further, f.e. to the review state. I would not like code laying around with no checkin’s. Having a requirement that code is checked in small pieces, 4-8 hours work max, and AT LEAST daily checkins, a manual code review comes second down the lane. I would expect review quality gates to happen before merging back to main, or before merging to release. But that would all be on checked-in code. Branching is absolutely one way to ease the pain. Another way we are using is automatic quality builds, running metrics, coverage, static code analysis. Unfortunately it takes some time, would be great to be on CI’s – but…., so it’s done scheduled every night. Based on this we get, among other stuff, top 10 lists of suspicious code, which is then subjected to reviews. If a person seems to be very popular on these top 10 lists, we subject every check in from that person to a review for a period. That normally helps. None of the clients I have can afford to have every checkin reviewed, so we need to find ways around it. I don’t disagree with the nicety of having all the code reviewed, but I find it hard to find those resources in today’s enterprises. David V. Corbin | Visual Studio ALM Ranger I tend to agree with both sides. I hate having code that is not checked in, but at the same time hate having “bad” code in the repository. I have found that branching is one approach to solving this dilemma. Code is checked into the private/feature branch before the review, but is not merged over to the “official” branch until after the review. I advocate both, depending on circumstance (especially team dynamics) - The “pre-checkin” is usually for elements that may impact the project as a whole. Think of it as another “gate” along with passing unit tests. - The “post-checkin” may very well not be at the changeset level, but correlates to a review at the “user story” level. Again, this depends on team dynamics in play…. Robert MacLean | Microsoft ALM MVP I do not think there is no right answer for the industry as a whole. In short the question is why do you do reviews? Your question implies risk mitigation, so in low risk areas you can get away with it after check in while in high risk you need to do it before check in. An example is those new to a team or juniors need it much earlier (maybe that is before checkin, maybe that is soon after) than seniors who have shipped twenty sprints on the team. Abhimanyu Singhal | Visual Studio ALM Ranger Depends on per scenario basis. We recommend post check-in reviews when: 1. We don't want to block other checks and processes on manual code reviews. Manual reviews take time, and some pieces may not require manual reviews at all. 2. We need to trace all changes and track history. 3. We have a code promotion strategy/process in place. For risk mitigation, post checkin code can be promoted to Accepted branches. Or can be rejected. Pre Checkin Reviews are used when 1. There is a high risk factor associated 2. Reviewers are generally (most of times) have immediate availability. 3. Team does not have strict tracking needs. Simply speaking, no single process fits all scenarios. You need to select what works best for your team/project. Thomas Schissler | Visual Studio ALM Ranger This is an interesting discussion, I’m right now discussing details about executing code reviews with my teams. I see and understand the aspects you brought in, but there is another side as well, I’d like to point out. 1.) If you do reviews per check in this is not very practical as a hard rule because this will disturb the flow of the team very often or it will lead to reduce the checkin frequency of the devs which I would not accept. 2.) If you do later reviews, for example if you review PBIs, it is not easy to find out which code you should review. Either you review all changesets associate with the PBI, but then you might review code which has been changed with a later checkin and the dev maybe has already fixed the issue. Or you review the diff of the latest changeset of the PBI with the first but then you might also review changes of other PBIs. Jakob Leander | Sr. Director, Avanade In my experience, manual code review: 1. Does not get done and at the very least does not get redone after changes (regardless of intentions at start of project) 2. When a project actually do it, they often do not do it right away = errors pile up 3. Requires a lot of time discussing/defining the standard and for the team to learn it However code review is very important since e.g. even small memory leaks in a high volume web solution have big consequences In the last years I have advocated following approach for code review - Architects up front do “at least one best practice example” of each type of component and tell the team. Copy from this one. This should include error handling, logging, security etc. - Dev lead on project continuously browse code to validate that the best practices are used. Especially that patterns etc. are not broken. You can do this formally after each sprint/iteration if you want. Once this is validated it is unlikely to “go bad” even during later code changes Agree with customer to rely on static code analysis from Visual Studio as the one and only coding standard. This has HUUGE benefits - You can easily tweak to reach the level you desire together with customer - It is easy to measure for both developers/management - It is 100% consistent across code base - It gets validated all the time so you never end up getting hammered by a customer review in the end - It is easy to tell the developer that you do not want code back unless it has zero errors = minimize communication You need to track this at least during nightly builds and make sure team sees total # issues. Do not allow #issues it to grow uncontrolled. On the project I run I require code analysis to have run on code before checkin (checkin rule). This means - You have to have clean compile (or CA wont run) so this is extra benefit = very few broken builds - You can change a few of the rules to compile as errors instead of warnings. I often do this for “missing dispose” issues which you REALLY do not want in your app Tip: Place your custom CA rules files as part of solution. That way it works when you do branching etc. (path to CA file is relative in VS) Some may argue that CA is not as good as manual inspection. But since manual inspection in reality suffers from the 3 issues in start it is IMO a MUCH better (and much cheaper) approach from helicopter perspective Tirthankar Dutta | Director, Avanade I think code review should be run both before and after check ins. There are some code metrics that are meant to be run on the entire codebase … Also, especially on multi-site projects, one should strive to architect in a way that lets men manage the framework while boys write the repetitive code… scales very well with the need to review less by containment and imposing architectural restrictions to emphasise the design. Bruno Capuano | Microsoft ALM MVP For code reviews (means peer reviews) in distributed team I use http://www.vsanywhere.com/default.aspx David Jobling | Global Sr. Director, Avanade Peer review is the only way to scale and its a great practice for all in the team to learn to perform and accept. In my experience you soon learn who's code to watch more than others and tune the attention. Mikkel Toudal Kristiansen | Manager, Avanade If you have several branches in your code base, you will need to merge often. This requires manual merging, when a file has been changed in both branches. It offers a good opportunity to actually review to changed code. So my advice is: Merging between branches should be done as often as possible, it should be done by a senior developer, and he/she should perform a full code review of the code being merged. As for detecting architectural smells and code smells creeping into the code base, one really good third party tools exist: Ndepend (http://www.ndepend.com/, for static code analysis of the current state of the code base). You could also consider adding StyleCop to the solution. Jesse Houwing | Visual Studio ALM Ranger I gave a presentation on this subject on the TechDays conference in NL last year. See my presentation and slides here (talk in Dutch, but English presentation): http://blog.jessehouwing.nl/2012/03/did-you-miss-my-techdaysnl-talk-on-code.html I’d like to add a few more points: - Before/After checking is mostly a trust issue. If you have a team that does diligent peer reviews and regularly talk/sit together or peer review, there’s no need to enforce a before-checkin policy. The peer peer-programming and regular feedback during development can take care of most of the review requirements as long as the team isn’t under stress. - Under stress, enforce pre-checkin reviews, it might sound strange, if you’re already under time or budgetary constraints, but it is under such conditions most real issues start to be created or pile up. - Use tools to catch most common errors, Code Analysis/FxCop was already mentioned. HP Fortify, Resharper, Coderush etc can help you there. There are also a lot of 3rd party rules you can add to Code Analysis. I’ve written a few myself (http://fccopcontrib.codeplex.com) and various teams from Microsoft have added their own rules (MSOCAF for SharePoint, WSSF for WCF). For common errors that keep cropping up, see if you can define a rule. It’s much easier. But more importantly make sure you have a good help page explaining *WHY* it's wrong. If you have small feature or developer branches/shelvesets, you might want to review pre-merge. It’s still better to do peer reviews and peer programming, but the most important thing is that bad quality code doesn’t make it into the important branch. So my philosophy: - Use tooling as much as possible. - Make sure the team understands the tooling and the importance of the things it flags. It’s too easy to just click suppress all to ignore the warnings. - Under stress, tighten process, it’s under stress that the problems of late reviews will really surface - Most importantly if you do reviews do them as early as possible, but never later than needed. In other words, pre-checkin/post checking doesn’t really matter, as long as the review is done before the code is released. It’ll just be much more expensive to fix any review outcomes the later you find them. --- I would love to hear what you think!

Read the article
A Communication System for XAML Applications

- by psheriff

In any application, you want to keep the coupling between any two or more objects as loose as possible. Coupling happens when one class contains a property that is used in another class, or uses another class in one of its methods. If you have this situation, then this is called strong or tight coupling. One popular design pattern to help with keeping objects loosely coupled is called the Mediator design pattern. The basics of this pattern are very simple; avoid one object directly talking to another object, and instead use another class to mediate between the two. As with most of my blog posts, the purpose is to introduce you to a simple approach to using a message broker, not all of the fine details. IPDSAMessageBroker Interface As with most implementations of a design pattern, you typically start with an interface or an abstract base class. In this particular instance, an Interface will work just fine. The interface for our Message Broker class just contains a single method “SendMessage” and one event “MessageReceived”. public delegate void MessageReceivedEventHandler( object sender, PDSAMessageBrokerEventArgs e); public interface IPDSAMessageBroker{ void SendMessage(PDSAMessageBrokerMessage msg); event MessageReceivedEventHandler MessageReceived;} PDSAMessageBrokerMessage Class As you can see in the interface, the SendMessage method requires a type of PDSAMessageBrokerMessage to be passed to it. This class simply has a MessageName which is a ‘string’ type and a MessageBody property which is of the type ‘object’ so you can pass whatever you want in the body. You might pass a string in the body, or a complete Customer object. The MessageName property will help the receiver of the message know what is in the MessageBody property. public class PDSAMessageBrokerMessage{ public PDSAMessageBrokerMessage() { } public PDSAMessageBrokerMessage(string name, object body) { MessageName = name; MessageBody = body; } public string MessageName { get; set; } public object MessageBody { get; set; }} PDSAMessageBrokerEventArgs Class As our message broker class will be raising an event that others can respond to, it is a good idea to create your own event argument class. This class will inherit from the System.EventArgs class and add a couple of additional properties. The properties are the MessageName and Message. The MessageName property is simply a string value. The Message property is a type of a PDSAMessageBrokerMessage class. public class PDSAMessageBrokerEventArgs : EventArgs{ public PDSAMessageBrokerEventArgs() { } public PDSAMessageBrokerEventArgs(string name, PDSAMessageBrokerMessage msg) { MessageName = name; Message = msg; } public string MessageName { get; set; } public PDSAMessageBrokerMessage Message { get; set; }} PDSAMessageBroker Class Now that you have an interface class and a class to pass a message through an event, it is time to create your actual PDSAMessageBroker class. This class implements the SendMessage method and will also create the event handler for the delegate created in your Interface. public class PDSAMessageBroker : IPDSAMessageBroker{ public void SendMessage(PDSAMessageBrokerMessage msg) { PDSAMessageBrokerEventArgs args; args = new PDSAMessageBrokerEventArgs( msg.MessageName, msg); RaiseMessageReceived(args); } public event MessageReceivedEventHandler MessageReceived; protected void RaiseMessageReceived( PDSAMessageBrokerEventArgs e) { if (null != MessageReceived) MessageReceived(this, e); }} The SendMessage method will take a PDSAMessageBrokerMessage object as an argument. It then creates an instance of a PDSAMessageBrokerEventArgs class, passing to the constructor two items: the MessageName from the PDSAMessageBrokerMessage object and also the object itself. It may seem a little redundant to pass in the message name when that same message name is part of the message, but it does make consuming the event and checking for the message name a little cleaner – as you will see in the next section. Create a Global Message Broker In your WPF application, create an instance of this message broker class in the App class located in the App.xaml file. Create a public property in the App class and create a new instance of that class in the OnStartUp event procedure as shown in the following code: public partial class App : Application{ public PDSAMessageBroker MessageBroker { get; set; } protected override void OnStartup(StartupEventArgs e) { base.OnStartup(e); MessageBroker = new PDSAMessageBroker(); }} Sending and Receiving Messages Let’s assume you have a user control that you load into a control on your main window and you want to send a message from that user control to the main window. You might have the main window display a message box, or put a string into a status bar as shown in Figure 1. Figure 1: The main window can receive and send messages The first thing you do in the main window is to hook up an event procedure to the MessageReceived event of the global message broker. This is done in the constructor of the main window: public MainWindow(){ InitializeComponent(); (Application.Current as App).MessageBroker. MessageReceived += new MessageReceivedEventHandler( MessageBroker_MessageReceived);} One piece of code you might not be familiar with is accessing a property defined in the App class of your XAML application. Within the App.Xaml file is a class named App that inherits from the Application object. You access the global instance of this App class by using Application.Current. You cast Application.Current to ‘App’ prior to accessing any of the public properties or methods you defined in the App class. Thus, the code (Application.Current as App).MessageBroker, allows you to get at the MessageBroker property defined in the App class. In the MessageReceived event procedure in the main window (shown below) you can now check to see if the MessageName property of the PDSAMessageBrokerEventArgs is equal to “StatusBar” and if it is, then display the message body into the status bar text block control. void MessageBroker_MessageReceived(object sender, PDSAMessageBrokerEventArgs e){ switch (e.MessageName) { case "StatusBar": tbStatus.Text = e.Message.MessageBody.ToString(); break; }} In the Page 1 user control’s Loaded event procedure you will send the message “StatusBar” through the global message broker to any listener using the following code: private void UserControl_Loaded(object sender, RoutedEventArgs e){ // Send Status Message (Application.Current as App).MessageBroker. SendMessage(new PDSAMessageBrokerMessage("StatusBar", "This is Page 1"));} Since the main window is listening for the message ‘StatusBar’, it will display the value “This is Page 1” in the status bar at the bottom of the main window. Sending a Message to a User Control The previous example sent a message from the user control to the main window. You can also send messages from the main window to any listener as well. Remember that the global message broker is really just a broadcaster to anyone who has hooked into the MessageReceived event. In the constructor of the user control named ucPage1 you can hook into the global message broker’s MessageReceived event. You can then listen for any messages that are sent to this control by using a similar switch-case structure like that in the main window. public ucPage1(){ InitializeComponent(); // Hook to the Global Message Broker (Application.Current as App).MessageBroker. MessageReceived += new MessageReceivedEventHandler( MessageBroker_MessageReceived);} void MessageBroker_MessageReceived(object sender, PDSAMessageBrokerEventArgs e){ // Look for messages intended for Page 1 switch (e.MessageName) { case "ForPage1": MessageBox.Show(e.Message.MessageBody.ToString()); break; }} Once the ucPage1 user control has been loaded into the main window you can then send a message using the following code: private void btnSendToPage1_Click(object sender, RoutedEventArgs e){ PDSAMessageBrokerMessage arg = new PDSAMessageBrokerMessage(); arg.MessageName = "ForPage1"; arg.MessageBody = "Message For Page 1"; // Send a message to Page 1 (Application.Current as App).MessageBroker.SendMessage(arg);} Since the MessageName matches what is in the ucPage1 MessageReceived event procedure, ucPage1 can do anything in response to that event. It is important to note that when the message gets sent it is sent to all MessageReceived event procedures, not just the one that is looking for a message called “ForPage1”. If the user control ucPage1 is not loaded and this message is broadcast, but no other code is listening for it, then it is simply ignored. Remove Event Handler In each class where you add an event handler to the MessageReceived event you need to make sure to remove those event handlers when you are done. Failure to do so can cause a strong reference to the class and thus not allow that object to be garbage collected. In each of your user control’s make sure in the Unloaded event to remove the event handler. private void UserControl_Unloaded(object sender, RoutedEventArgs e){ if (_MessageBroker != null) _MessageBroker.MessageReceived -= _MessageBroker_MessageReceived;} Problems with Message Brokering As with most “global” classes or classes that hook up events to other classes, garbage collection is something you need to consider. Just the simple act of hooking up an event procedure to a global event handler creates a reference between your user control and the message broker in the App class. This means that even when your user control is removed from your UI, the class will still be in memory because of the reference to the message broker. This can cause messages to still being handled even though the UI is not being displayed. It is up to you to make sure you remove those event handlers as discussed in the previous section. If you don’t, then the garbage collector cannot release those objects. Instead of using events to send messages from one object to another you might consider registering your objects with a central message broker. This message broker now becomes a collection class into which you pass an object and what messages that object wishes to receive. You do end up with the same problem however. You have to un-register your objects; otherwise they still stay in memory. To alleviate this problem you can look into using the WeakReference class as a method to store your objects so they can be garbage collected if need be. Discussing Weak References is beyond the scope of this post, but you can look this up on the web. Summary In this blog post you learned how to create a simple message broker system that will allow you to send messages from one object to another without having to reference objects directly. This does reduce the coupling between objects in your application. You do need to remember to get rid of any event handlers prior to your objects going out of scope or you run the risk of having memory leaks and events being called even though you can no longer access the object that is responding to that event. NOTE: You can download the sample code for this article by visiting my website at http://www.pdsa.com/downloads. Select “Tips & Tricks”, then select “A Communication System for XAML Applications” from the drop down list.

Read the article
Seeking on a Heap, and Two Useful DMVs

- by Paul White

So far in this mini-series on seeks and scans, we have seen that a simple ‘seek’ operation can be much more complex than it first appears. A seek can contain one or more seek predicates – each of which can either identify at most one row in a unique index (a singleton lookup) or a range of values (a range scan). When looking at a query plan, we will often need to look at the details of the seek operator in the Properties window to see how many operations it is performing, and what type of operation each one is. As you saw in the first post in this series, the number of hidden seeking operations can have an appreciable impact on performance. Measuring Seeks and Scans I mentioned in my last post that there is no way to tell from a graphical query plan whether you are seeing a singleton lookup or a range scan. You can work it out – if you happen to know that the index is defined as unique and the seek predicate is an equality comparison, but there’s no separate property that says ‘singleton lookup’ or ‘range scan’. This is a shame, and if I had my way, the query plan would show different icons for range scans and singleton lookups – perhaps also indicating whether the operation was one or more of those operations underneath the covers. In light of all that, you might be wondering if there is another way to measure how many seeks of either type are occurring in your system, or for a particular query. As is often the case, the answer is yes – we can use a couple of dynamic management views (DMVs): sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats. Index Usage Stats The index usage stats DMV contains counts of index operations from the perspective of the Query Executor (QE) – the SQL Server component that is responsible for executing the query plan. It has three columns that are of particular interest to us: user_seeks – the number of times an Index Seek operator appears in an executed plan user_scans – the number of times a Table Scan or Index Scan operator appears in an executed plan user_lookups – the number of times an RID or Key Lookup operator appears in an executed plan An operator is counted once per execution (generating an estimated plan does not affect the totals), so an Index Seek that executes 10,000 times in a single plan execution adds 1 to the count of user seeks. Even less intuitively, an operator is also counted once per execution even if it is not executed at all. I will show you a demonstration of each of these things later in this post. Index Operational Stats The index operational stats DMV contains counts of index and table operations from the perspective of the Storage Engine (SE). It contains a wealth of interesting information, but the two columns of interest to us right now are: range_scan_count – the number of range scans (including unrestricted full scans) on a heap or index structure singleton_lookup_count – the number of singleton lookups in a heap or index structure This DMV counts each SE operation, so 10,000 singleton lookups will add 10,000 to the singleton lookup count column, and a table scan that is executed 5 times will add 5 to the range scan count. The Test Rig To explore the behaviour of seeks and scans in detail, we will need to create a test environment. The scripts presented here are best run on SQL Server 2008 Developer Edition, but the majority of the tests will work just fine on SQL Server 2005. A couple of tests use partitioning, but these will be skipped if you are not running an Enterprise-equivalent SKU. Ok, first up we need a database: USE master; GO IF DB_ID('ScansAndSeeks') IS NOT NULL DROP DATABASE ScansAndSeeks; GO CREATE DATABASE ScansAndSeeks; GO USE ScansAndSeeks; GO ALTER DATABASE ScansAndSeeks SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE ScansAndSeeks SET AUTO_CLOSE OFF, AUTO_SHRINK OFF, AUTO_CREATE_STATISTICS OFF, AUTO_UPDATE_STATISTICS OFF, PARAMETERIZATION SIMPLE, READ_COMMITTED_SNAPSHOT OFF, RESTRICTED_USER ; Notice that several database options are set in particular ways to ensure we get meaningful and reproducible results from the DMVs. In particular, the options to auto-create and update statistics are disabled. There are also three stored procedures, the first of which creates a test table (which may or may not be partitioned). The table is pretty much the same one we used yesterday: The table has 100 rows, and both the key_col and data columns contain the same values – the integers from 1 to 100 inclusive. The table is a heap, with a non-clustered primary key on key_col, and a non-clustered non-unique index on the data column. The only reason I have used a heap here, rather than a clustered table, is so I can demonstrate a seek on a heap later on. The table has an extra column (not shown because I am too lazy to update the diagram from yesterday) called padding – a CHAR(100) column that just contains 100 spaces in every row. It’s just there to discourage SQL Server from choosing table scan over an index + RID lookup in one of the tests. The first stored procedure is called ResetTest: CREATE PROCEDURE dbo.ResetTest @Partitioned BIT = 'false' AS BEGIN SET NOCOUNT ON ; IF OBJECT_ID(N'dbo.Example', N'U') IS NOT NULL BEGIN DROP TABLE dbo.Example; END ; -- Test table is a heap -- Non-clustered primary key on 'key_col' CREATE TABLE dbo.Example ( key_col INTEGER NOT NULL, data INTEGER NOT NULL, padding CHAR(100) NOT NULL DEFAULT SPACE(100), CONSTRAINT [PK dbo.Example key_col] PRIMARY KEY NONCLUSTERED (key_col) ) ; IF @Partitioned = 'true' BEGIN -- Enterprise, Trial, or Developer -- required for partitioning tests IF SERVERPROPERTY('EngineEdition') = 3 BEGIN EXECUTE (' DROP TABLE dbo.Example ; IF EXISTS ( SELECT 1 FROM sys.partition_schemes WHERE name = N''PS'' ) DROP PARTITION SCHEME PS ; IF EXISTS ( SELECT 1 FROM sys.partition_functions WHERE name = N''PF'' ) DROP PARTITION FUNCTION PF ; CREATE PARTITION FUNCTION PF (INTEGER) AS RANGE RIGHT FOR VALUES (20, 40, 60, 80, 100) ; CREATE PARTITION SCHEME PS AS PARTITION PF ALL TO ([PRIMARY]) ; CREATE TABLE dbo.Example ( key_col INTEGER NOT NULL, data INTEGER NOT NULL, padding CHAR(100) NOT NULL DEFAULT SPACE(100), CONSTRAINT [PK dbo.Example key_col] PRIMARY KEY NONCLUSTERED (key_col) ) ON PS (key_col); '); END ELSE BEGIN RAISERROR('Invalid SKU for partition test', 16, 1); RETURN; END; END ; -- Non-unique non-clustered index on the 'data' column CREATE NONCLUSTERED INDEX [IX dbo.Example data] ON dbo.Example (data) ; -- Add 100 rows INSERT dbo.Example WITH (TABLOCKX) ( key_col, data ) SELECT key_col = V.number, data = V.number FROM master.dbo.spt_values AS V WHERE V.[type] = N'P' AND V.number BETWEEN 1 AND 100 ; END; GO The second stored procedure, ShowStats, displays information from the Index Usage Stats and Index Operational Stats DMVs: CREATE PROCEDURE dbo.ShowStats @Partitioned BIT = 'false' AS BEGIN -- Index Usage Stats DMV (QE) SELECT index_name = ISNULL(I.name, I.type_desc), scans = IUS.user_scans, seeks = IUS.user_seeks, lookups = IUS.user_lookups FROM sys.dm_db_index_usage_stats AS IUS JOIN sys.indexes AS I ON I.object_id = IUS.object_id AND I.index_id = IUS.index_id WHERE IUS.database_id = DB_ID(N'ScansAndSeeks') AND IUS.object_id = OBJECT_ID(N'dbo.Example', N'U') ORDER BY I.index_id ; -- Index Operational Stats DMV (SE) IF @Partitioned = 'true' SELECT index_name = ISNULL(I.name, I.type_desc), partitions = COUNT(IOS.partition_number), range_scans = SUM(IOS.range_scan_count), single_lookups = SUM(IOS.singleton_lookup_count) FROM sys.dm_db_index_operational_stats ( DB_ID(N'ScansAndSeeks'), OBJECT_ID(N'dbo.Example', N'U'), NULL, NULL ) AS IOS JOIN sys.indexes AS I ON I.object_id = IOS.object_id AND I.index_id = IOS.index_id GROUP BY I.index_id, -- Key I.name, I.type_desc ORDER BY I.index_id; ELSE SELECT index_name = ISNULL(I.name, I.type_desc), range_scans = SUM(IOS.range_scan_count), single_lookups = SUM(IOS.singleton_lookup_count) FROM sys.dm_db_index_operational_stats ( DB_ID(N'ScansAndSeeks'), OBJECT_ID(N'dbo.Example', N'U'), NULL, NULL ) AS IOS JOIN sys.indexes AS I ON I.object_id = IOS.object_id AND I.index_id = IOS.index_id GROUP BY I.index_id, -- Key I.name, I.type_desc ORDER BY I.index_id; END; The final stored procedure, RunTest, executes a query written against the example table: CREATE PROCEDURE dbo.RunTest @SQL VARCHAR(8000), @Partitioned BIT = 'false' AS BEGIN -- No execution plan yet SET STATISTICS XML OFF ; -- Reset the test environment EXECUTE dbo.ResetTest @Partitioned ; -- Previous call will throw an error if a partitioned -- test was requested, but SKU does not support it IF @@ERROR = 0 BEGIN -- IO statistics and plan on SET STATISTICS XML, IO ON ; -- Test statement EXECUTE (@SQL) ; -- Plan and IO statistics off SET STATISTICS XML, IO OFF ; EXECUTE dbo.ShowStats @Partitioned; END; END; The Tests The first test is a simple scan of the heap table: EXECUTE dbo.RunTest @SQL = 'SELECT * FROM Example'; The top result set comes from the Index Usage Stats DMV, so it is the Query Executor’s (QE) view. The lower result is from Index Operational Stats, which shows statistics derived from the actions taken by the Storage Engine (SE). We see that QE performed 1 scan operation on the heap, and SE performed a single range scan. Let’s try a single-value equality seek on a unique index next: EXECUTE dbo.RunTest @SQL = 'SELECT key_col FROM Example WHERE key_col = 32'; This time we see a single seek on the non-clustered primary key from QE, and one singleton lookup on the same index by the SE. Now for a single-value seek on the non-unique non-clustered index: EXECUTE dbo.RunTest @SQL = 'SELECT data FROM Example WHERE data = 32'; QE shows a single seek on the non-clustered non-unique index, but SE shows a single range scan on that index – not the singleton lookup we saw in the previous test. That makes sense because we know that only a single-value seek into a unique index is a singleton seek. A single-value seek into a non-unique index might retrieve any number of rows, if you think about it. The next query is equivalent to the IN list example seen in the first post in this series, but it is written using OR (just for variety, you understand): EXECUTE dbo.RunTest @SQL = 'SELECT data FROM Example WHERE data = 32 OR data = 33'; The plan looks the same, and there’s no difference in the stats recorded by QE, but the SE shows two range scans. Again, these are range scans because we are looking for two values in the data column, which is covered by a non-unique index. I’ve added a snippet from the Properties window to show that the query plan does show two seek predicates, not just one. Now let’s rewrite the query using BETWEEN: EXECUTE dbo.RunTest @SQL = 'SELECT data FROM Example WHERE data BETWEEN 32 AND 33'; Notice the seek operator only has one predicate now – it’s just a single range scan from 32 to 33 in the index – as the SE output shows. For the next test, we will look up four values in the key_col column: EXECUTE dbo.RunTest @SQL = 'SELECT key_col FROM Example WHERE key_col IN (2,4,6,8)'; Just a single seek on the PK from the Query Executor, but four singleton lookups reported by the Storage Engine – and four seek predicates in the Properties window. On to a more complex example: EXECUTE dbo.RunTest @SQL = 'SELECT * FROM Example WITH (INDEX([PK dbo.Example key_col])) WHERE key_col BETWEEN 1 AND 8'; This time we are forcing use of the non-clustered primary key to return eight rows. The index is not covering for this query, so the query plan includes an RID lookup into the heap to fetch the data and padding columns. The QE reports a seek on the PK and a lookup on the heap. The SE reports a single range scan on the PK (to find key_col values between 1 and 8), and eight singleton lookups on the heap. Remember that a bookmark lookup (RID or Key) is a seek to a single value in a ‘unique index’ – it finds a row in the heap or cluster from a unique RID or clustering key – so that’s why lookups are always singleton lookups, not range scans. Our next example shows what happens when a query plan operator is not executed at all: EXECUTE dbo.RunTest @SQL = 'SELECT key_col FROM Example WHERE key_col = 8 AND @@TRANCOUNT < 0'; The Filter has a start-up predicate which is always false (if your @@TRANCOUNT is less than zero, call CSS immediately). The index seek is never executed, but QE still records a single seek against the PK because the operator appears once in an executed plan. The SE output shows no activity at all. This next example is 2008 and above only, I’m afraid: EXECUTE dbo.RunTest @SQL = 'SELECT * FROM Example WHERE key_col BETWEEN 1 AND 30', @Partitioned = 'true'; This is the first example to use a partitioned table. QE reports a single seek on the heap (yes – a seek on a heap), and the SE reports two range scans on the heap. SQL Server knows (from the partitioning definition) that it only needs to look at partitions 1 and 2 to find all the rows where key_col is between 1 and 30 – the engine seeks to find the two partitions, and performs a range scan seek on each partition. The final example for today is another seek on a heap – try to work out the output of the query before running it! EXECUTE dbo.RunTest @SQL = 'SELECT TOP (2) WITH TIES * FROM Example WHERE key_col BETWEEN 1 AND 50 ORDER BY $PARTITION.PF(key_col) DESC', @Partitioned = 'true'; Notice the lack of an explicit Sort operator in the query plan to enforce the ORDER BY clause, and the backward range scan. © 2011 Paul White email: [email protected] twitter: @SQL_Kiwi

Read the article
PASS: Bylaw Changes

- by Bill Graziano

While you’re reading this, a post should be going up on the PASS blog on the plans to change our bylaws. You should be able to find our old bylaws, our proposed bylaws and a red-lined version of the changes. We plan to listen to feedback until March 31st. At that point we’ll decide whether to vote on these changes or take other action. The executive summary is that we’re adding a restriction to prevent more than two people from the same company on the Board and eliminating the Board’s Officer Appointment Committee to have Officers directly elected by the Board. This second change better matches how officer elections have been conducted in the past. The Gritty Details Our scope was to change bylaws to match how PASS actually works and tackle a limited set of issues. Changing the bylaws is hard. We’ve been working on these changes since the March board meeting last year. At that meeting we met and talked through the issues we wanted to address. In years past the Board has tried to come up with language and then we’ve discussed and negotiated to get to the result. In March, we gave HQ guidance on what we wanted and asked them to come up with a starting point. Hannes worked on building us an initial set of changes that we could work our way through. Discussing changes like this over email is difficult wasn’t very productive. We do a much better job on this at the in-person Board meetings. Unfortunately there are only 2 or 3 of those a year. In August we met in Nashville and spent time discussing the changes. That was also the day after we released the slate for the 2010 election. The discussion around that colored what we talked about in terms of these changes. We talked very briefly at the Summit and again reviewed and revised the changes at the Board meeting in January. This is the result of those changes and discussions. We made numerous small changes to clean up language and make wording more clear. We also made two big changes. Director Employment Restrictions The first is that only two people from the same company can serve on the Board at the same time. The actual language in section VI.3 reads: A maximum of two (2) Directors who are employed by, or who are joint owners or partners in, the same for-profit venture, company, organization, or other legal entity, may concurrently serve on the PASS Board of Directors at any time. The definition of “employed” is at the sole discretion of the Board. And what a mess this turns out to be in practice. Our membership is a hodgepodge of interlocking relationships. Let’s say three Board members get together and start a blog service for SQL Server bloggers. It’s technically for-profit. Let’s assume it makes $8 in the first year. Does that trigger this clause? (Technically yes.) We had a horrible time trying to write language that covered everything. All the sample bylaws that we found were just as vague as this. That led to the third clause in this section. The first sentence reads: The Board of Directors reserves the right, strictly on a case-by-case basis, to overrule the requirements of Section VI.3 by majority decision for any single Director’s conflict of employment. We needed some way to handle the trivial issues and exercise some judgment. It seems like a public vote is the best way. This discloses the relationship and gets each Board member on record on the issue. In practice I think this clause will rarely be used. I think this entire section will only be invoked for actual employment issues and not for small side projects. In either case we have the mechanisms in place to handle it in a public, transparent way. That’s the first and third clauses. The second clause says that if your situation changes and you fall afoul of this restriction you need to notify the Board. The clause further states that if this new job means a Board members violates the “two-per-company” rule the Board may request their resignation. The Board can also allow the person to continue serving with a majority vote. I think this will also take some judgment. Consider a person switching jobs that leads to three people from the same company. I’m very likely to ask for someone to resign if all three are two weeks into a two year term. I’m unlikely to ask anyone to resign if one is two weeks away from ending their term. In either case, the decision will be a public vote that we can be held accountable for. One concern that was raised was whether this would affect someone choosing to accept a job. I think that’s a choice for them to make. PASS is clearly stating its intent that only two directors from any one organization should serve at any time. Once these bylaws are approved, this policy should not come as a surprise to any potential or current Board members considering a job change. This clause isn’t perfect. The biggest hole is business relationships that aren’t defined above. Let’s say that two employees from company “X” serve on the Board. What happens if I accept a full-time consulting contract with that company? Let’s assume I’m working directly for one of the two existing Board members. That doesn’t violate section VI.3. But I think it’s clearly the kind of relationship we’d like to prevent. Unfortunately that was even harder to write than what we have now. I fully expect that in the next revision of the bylaws we’ll address this. It just didn’t make it into this one. Officer Elections The officer election process received a slightly different rewrite. Our goal was to codify in the bylaws the actual process we used to elect the officers. The officers are the President, Executive Vice-President (EVP) and Vice-President of Marketing. The Immediate Past President (IPP) is also an officer but isn’t elected. The IPP serves in that role for two years after completing their term as President. We do that for continuity’s sake. Some organizations have a President-elect that serves for one or two years. The group that founded PASS chose to have an IPP. When I started on the Board, the Nominating Committee (NomCom) selected the slate for the at-large directors and the slate for the officers. There was always one candidate for each officer position. It wasn’t really an election so much as the NomCom decided who the next person would be for each officer position. Behind the scenes the Board worked to select the best people for the role. In June 2009 that process was changed to bring it line with what actually happens. An Officer Appointment Committee was created that was a subset of the Board. That committee would take time to interview the candidates and present a slate to the Board for approval. The majority vote of the Board would determine the officers for the next two years. In practice the Board itself interviewed the candidates and conducted the elections. That means it was time to change the bylaws again. Section VII.2 and VII.3 spell out the process used to select the officers. We use the phrase “Officer Appointment” to separate it from the Director election but the end result is that the Board elects the officers. Section VII.3 starts: Officers shall be appointed bi-annually by a majority of all the voting members of the Board of Directors. Everything else revolves around that sentence. We use the word appoint but they truly are elected. There are details in the bylaws for term limits, minimum requirements for President (1 prior term as an officer), tie breakers and filling vacancies. In practice we will have an election for President, then an election for EVP and then an election for VP Marketing. That means that losing candidates will be able to fall down the ladder and run for the next open position. Another point to note is that officers aren’t at-large directors. That means if a current sitting officer loses all three elections they are off the Board. Having Board member votes public will help with the transparency of this approach. This process has a number of positive and negatives. The biggest concern I expect to hear is that our members don’t directly choose the officers. I’m going to try and list all the positives and negatives of this approach. Many non-profits value continuity and are slower to change than a business. On the plus side this promotes that. On the negative side this promotes that. If we change too slowly the members complain that we aren’t responsive. If we change too quickly we make mistakes and fail at various things. We’ve been criticized for both of those lately so I’m not entirely sure where to draw the line. My rough assumption to this point is that we’re going too slow on governance and too quickly on becoming “more than a Summit.” This approach creates competition in the officer elections. If you are an at-large director there is no consequence to losing an election. If you are an officer the only way to stay on the Board is to win an officer election or an at-large election. If you are an officer and lose an election you can always run for the next office down. This makes it very easy for multiple people to contest an election. There is value in a person moving through the officer positions up to the Presidency. Having the Board select the officers promotes this. The down side is that it takes a LOT of time to get to the Presidency. We’ve had good people struggle with burnout. We’ve had lots of discussion around this. The process as we’ve described it here makes it possible for someone to move quickly through the ranks but doesn’t prevent people from working their way up through each role. We talked long and hard about having the officers elected by the members. We had a self-imposed deadline to complete these changes prior to elections this summer. The other challenge was that our original goal was to make the bylaws reflect our actual process rather than create a new one. I believe we accomplished this goal. We ran out of time to consider this option in the detail it needs. Having member elections for officers needs a number of problems solved. We would need a way for candidates to fall through the election. This is what promotes competition. Without this few people would risk an election and we’ll be back to one candidate per slot. We need to do this without having multiple elections. We may be able to copy what other organizations are doing but I was surprised at how little I could find on other organizations. We also need a way for people that lose an officer election to win an at-large election. Otherwise we’ll have very little competition for officers. This brings me to an area that I think we as a Board haven’t done a good job. We haven’t built a strong process to tell you who is doing a good job and who isn’t. This is a double-edged sword. I don’t want to highlight Board members that are failing. That’s not a good way to get people to volunteer and run for the Board. But I also need a way let the members make an informed choice about who is doing a good job and would make a good officer. Encouraging Board members to blog, publishing minutes and making votes public helps in that regard but isn’t the final answer. I don’t know what the final answer is yet. I do know that the Board members themselves are uniquely positioned to know which other Board members are doing good work. They know who speaks up in meetings, who works to build consensus, who has good ideas and who works with the members. What I Could Do Better I’ve learned a lot writing this about how we communicated with our members. The next time we revise the bylaws I’d do a few things differently. The biggest change would be to provide better documentation. The March 2009 minutes provide a very detailed look into what changes we wanted to make to the bylaws. Looking back, I’m a little surprised at how closely they matched our final changes and covered the various arguments. If you just read those you’d get 90% of what we eventually changed. Nearly everything else was just details around implementation. I’d also consider publishing a scope document defining exactly what we were doing any why. I think it really helped that we had a limited, defined goal in mind. I don’t think we did a good job communicating that goal outside the meeting minutes though. That said, I wish I’d blogged more after the August and January meeting. I think it would have helped more people to know that this change was coming and to be ready for it. Conclusion These changes address two big concerns that the Board had. First, it prevents a single organization from dominating the Board. Second, it codifies and clearly spells out how officers are elected. This is the process that was previously followed but it was somewhat murky. These changes bring clarity to this and clearly explain the process the Board will follow. We’re going to listen to feedback until March 31st. At that time we’ll decide whether to approve these changes. I’m also assuming that we’ll start another round of changes in the next year or two. Are there other issues in the bylaws that we should tackle in the future?

Read the article
PASS: Bylaw Change 2013

- by Bill Graziano

PASS launched a Global Growth Initiative in the Summer of 2011 with the appointment of three international Board advisors. Since then we’ve thought and talked extensively about how we make PASS more relevant to our members outside the US and Canada. We’ve collected much of that discussion in our Global Growth site. You can find vision documents, plans, governance proposals, feedback sites, and transcripts of Twitter chats and town hall meetings. We also address these plans at the Board Q&A during the 2012 Summit. One of the biggest changes coming out of this process is around how we elect Board members. And that requires a change to the bylaws. We published the proposed bylaw changes as a red-lined document so you can clearly see the changes. Our goal in these bylaw changes was to address the changes required by the global growth initiatives, conduct a legal review of the document and address other minor issues in the document. There are numerous small wording changes throughout the document. For example, we replaced every reference of “The Corporation” with the word “PASS” so it now reads “PASS is organized…”. Board Composition The biggest change in these bylaw changes is how the Board is composed and elected. This discussion starts in section VI.2. This section now says that some elected directors will come from geographic regions. I think this is the best way to make sure we give all of our members a voice in the leadership of the organization. The key parts of this section are: The remaining Directors (i.e. the non-Officer Directors and non-Vendor Appointed Directors) shall be elected by the voting membership (“Elected Directors”). Elected Directors shall include representatives of defined PASS regions (“Regions”) as set forth below (“Regional Directors”) and at minimum one (1) additional Director-at-Large whose selection is not limited by region. Regional Directors shall include, but are not limited to, two (2) seats for the Region covering Canada and the United States of America. Additional Regions for the purpose of electing additional Regional Directors and additional Director-at-Large seats for the purpose of expanding the Board shall be defined by a majority vote of the current Board of Directors and must be established prior to the public call for nominations in the general election. Previously defined Regions and seats approved by the Board of Directors shall remain in effect and can only be modified by a 2/3 majority vote by the then current Board of Directors. Currently PASS has six At-Large Directors elected by the members. These changes allow for a Regional Director position that is elected by the members but must come from a particular region. It also stipulates that there must always be at least one Director-at-Large who can come from any region. We also understand that PASS is currently a very US-centric organization. Our Summit is held in America, roughly half our chapters are in the US and Canada and most of the Board members over the last ten years have come from America. We wanted to reflect that by making sure that our US and Canadian volunteers would continue to play a significant role by ensuring that two Regional seats are reserved specifically for Canada and the US. Other than that, the bylaws don’t create any specific regional seats. These rules allow us to create Regional Director seats but don’t require it. We haven’t fully discussed what the criteria will be in order for a region to have a seat designated for it or how many regions there will be. In our discussions we’ve broadly discussed regions for United States and Canada Europe, Middle East, and Africa (EMEA) Australia, New Zealand and Asia (also known as Asia Pacific or APAC) Mexico, South America, and Central America (LATAM) As you can see, our thinking is that there will be a few large regions. I’ve also considered a non-North America region that we can gradually split into the regions above as our membership grows in those areas. The regions will be defined by a policy document that will be published prior to the elections. I’m hoping that over the next year we can begin to publish more of what we do as Board-approved policy documents. While the bylaws only require a single non-region specific At-large Director, I would expect we would always have two. That way we can have one in each election. I think it’s important that we always have one seat open that anyone who is eligible to run for the Board can contest. The Board is required to have any regions defined prior to the start of the election process. Board Elections – Regional Seats We spent a lot of time discussing how the elections would work for these Regional Director seats. Ultimately we decided that the simplest solution is that every PASS member should vote for every open seat. Section VIII.3 reads: Candidates who are eligible (i.e. eligible to serve in such capacity subject to the criteria set forth herein or adopted by the Board of Directors) shall be designated to fill open Board seats in the following order of priority on the basis of total votes received: (i) full term Regional Director seats, (ii) full term Director-at-Large seats, (iii) not full term (vacated) Regional Director seats, (iv) not full term (vacated) Director-at-Large seats. For the purposes of clarity, because of eligibility requirements, it is contemplated that the candidates designated to the open Board seats may not receive more votes than certain other candidates who are not selected to the Board. We debated whether to have multiple ballots or one single ballot. Multiple ballot elections get complicated quickly. Let’s say we have a ballot for US/Canada and one for Region 2. After that we’d need a mechanism to merge those two together and come up with the winner of the at-large seat or have another election for the at-large position. We think the best way to do this is a single ballot and putting the highest vote getters into the most restrictive seats. Let’s look at an example: There are seats open for Region 1, Region 2 and at-large. The election results are as follows: Candidate A (eligible for Region 1) – 550 votes Candidate B (eligible for Region 1) – 525 votes Candidate C (eligible for Region 1) – 475 votes Candidate D (eligible for Region 2) – 125 votes Candidate E (eligible for Region 2) – 75 votes In this case, Candidate A is the winner for Region 1 and is assigned that seat. Candidate D is the winner for Region 2 and is assigned that seat. The at-large seat is filled by the high remaining vote getter which is Candidate B. The key point to understand is that we may have a situation where a person with a lower vote total is elected to a regional seat and a person with a higher vote total is excluded. This will be true whether we had multiple ballots or a single ballot. Board Elections – Vacant Seats The other change to the election process is for vacant Board seats. The actual changes are sprinkled throughout the document. Previously we didn’t have a mechanism that allowed for an election of a Board seat that we knew would be vacant in the future. The most common case is when a Board members moves to an Officer role in the middle of their term. One of the key changes is to allow the number of votes members have to match the number of open seats. This allows each voter to express their preference on all open seats. This only applies when we know about the opening prior to the call for nominations. This all means that if there’s a seat will be open at the start of the next Board term, and we know about it prior to the call for nominations, we can include that seat in the elections. Ultimately, the aim is to have PASS members decide who sits on the Board in as many situations as possible. We discussed the option of changing the bylaws to just take next highest vote-getter in all other cases. I think that’s wrong for the following reasons: All voters aren’t able to express an opinion on all candidates. If there are five people running for three seats, you can only vote for three. You have no way to express your preference between #4 and #5. Different candidates may have different information about the number of seats available. A person may learn that a Board member plans to resign at the end of the year prior to that information being made public. They may understand that the top four vote getters will end up on the Board while the rest of the members believe there are only three openings. This may affect someone’s decision to run. I don’t think this creates a transparent, fair election. Board members may use their knowledge of the election results to decide whether to remain on the Board or not. Admittedly this one is unlikely but I don’t want to create a situation where this accusation can be leveled. I think the majority of vacancies in the future will be handled through elections. The bylaw section quoted above also indicates that partial term vacancies will be filled after the full term seats are filled. Removing Directors Section VI.7 on removing directors has always had a clause that allowed members to remove an elected director. We also had a clause that allowed appointed directors to be removed. We added a clause that allows the Board to remove for cause any director with a 2/3 majority vote. The updated text reads: Any Director may be removed for cause by a 2/3 majority vote of the Board of Directors whenever in its judgment the best interests of PASS would be served thereby. Notwithstanding the foregoing, the authority of any Director to act as in an official capacity as a Director or Officer of PASS may be suspended by the Board of Directors for cause. Cause for suspension or removal of a Director shall include but not be limited to failure to meet any Board-approved performance expectations or the presence of a reason for suspension or dismissal as listed in Addendum B of these Bylaws. The first paragraph is updated and the second and third are unchanged (except cleaning up language). If you scroll down and look at Addendum B of these bylaws you find the following: Cause for suspension or dismissal of a member of the Board of Directors may include: Inability to attend Board meetings on a regular basis. Inability or unwillingness to act in a capacity designated by the Board of Directors. Failure to fulfill the responsibilities of the office. Inability to represent the Region elected to represent Failure to act in a manner consistent with PASS's Bylaws and/or policies. Misrepresentation of responsibility and/or authority. Misrepresentation of PASS. Unresolved conflict of interests with Board responsibilities. Breach of confidentiality. The bold line about your inability to represent your region is what we added to the bylaws in this revision. We also added a clause to section VII.3 allowing the Board to remove an officer. That clause is much less restrictive. It doesn’t require cause and only requires a simple majority. The Board of Directors may remove any Officer whenever in their judgment the best interests of PASS shall be served by such removal. Other There are numerous other small changes throughout the document. Proxy voting. The laws around how members and Board members proxy votes are specific in Illinois law. PASS is an Illinois corporation and is subject to Illinois laws. We changed section IV.5 to come into compliance with those laws. Specifically this says you can only vote through a proxy if you have a written proxy through your authorized attorney. English language proficiency. As we increase our global footprint we come across more members that aren’t native English speakers. The business of PASS is conducted in English and it’s important that our Board members speak English. If we get big enough to afford translators, we may be able to relax this but right now we need English language skills for effective Board members. Committees. The language around committees in section IX is old and dated. Our lawyers advised us to clean it up. This section specifically applies to any committees that the Board may form outside of portfolios. We removed the term limits, quorum and vacancies clause. We don’t currently have any committees that this would apply to. The Nominating Committee is covered elsewhere in the bylaws. Electronic Votes. The change allows the Board to vote via email but the results must be unanimous. This is to conform with Illinois state law. Immediate Past President. There was no mechanism to fill the IPP role if an outgoing President chose not to participate. We changed section VII.8 to allow the Board to invite any previous President to fill the role by majority vote. Nominations Committee. We’ve opened the language to allow for the transparent election of the Nominations Committee as outlined by the 2011 Election Review Committee. Revocation of Charters. The language surrounding the revocation of charters for local groups was flagged by the lawyers. We have allowed for the local user group to make all necessary payment before considering returning of items to PASS if required. Bylaw notification. We’ve spent countless meetings working on these bylaws with the intent to not open them again any time in the near future. Should the bylaws be opened again, we have included a clause ensuring that the PASS membership is involved. I’m proud that the Board has remained committed to transparency and accountability to members. This clause will require that same level of commitment in the future even when all the current Board members have rolled off. I think that covers everything. I’d encourage you to look through the red-line document and see the changes. It’s helpful to look at the language that’s being removed and the language that’s being added. I’m happy to answer any questions here or you can email them to [email protected].

Read the article
.NET Code Evolution

- by Alois Kraus

Originally posted on: http://geekswithblogs.net/akraus1/archive/2013/07/24/153504.aspxAt my day job I do look at a lot of code written by other people. Most of the code is quite good and some is even a masterpiece. And there is also code which makes you think WTF… oh it was written by me. Hm not so bad after all. There are many excuses reasons for bad code. Most often it is time pressure followed by not enough ambition (who cares) or insufficient training. Normally I do care about code quality quite a lot which makes me a (perceived) slow worker who does write many tests and refines the code quite a lot because of the design deficiencies. Most of the deficiencies I do find by putting my design under stress while checking for invariants. It does also help a lot to step into the code with a debugger (sometimes also Windbg). I do this much more often when my tests are red. That way I do get a much better understanding what my code really does and not what I think it should be doing. This time I do want to show you how code can evolve over the years with different .NET Framework versions. Once there was time where .NET 1.1 was new and many C++ programmers did switch over to get rid of not initialized pointers and memory leaks. There were also nice new data structures available such as the Hashtable which is fast lookup table with O(1) time complexity. All was good and much code was written since then. At 2005 a new version of the .NET Framework did arrive which did bring many new things like generics and new data structures. The “old” fashioned way of Hashtable were coming to an end and everyone used the new Dictionary<xx,xx> type instead which was type safe and faster because the object to type conversion (aka boxing) was no longer necessary. I think 95% of all Hashtables and dictionaries use string as key. Often it is convenient to ignore casing to make it easy to look up values which the user did enter. An often followed route is to convert the string to upper case before putting it into the Hashtable. Hashtable Table = new Hashtable(); void Add(string key, string value) { Table.Add(key.ToUpper(), value); } This is valid and working code but it has problems. First we can pass to the Hashtable a custom IEqualityComparer to do the string matching case insensitive. Second we can switch over to the now also old Dictionary type to become a little faster and we can keep the the original keys (not upper cased) in the dictionary. Dictionary<string, string> DictTable = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase); void AddDict(string key, string value) { DictTable.Add(key, value); } Many people do not user the other ctors of Dictionary because they do shy away from the overhead of writing their own comparer. They do not know that .NET has for strings already predefined comparers at hand which you can directly use. Today in the many core area we do use threads all over the place. Sometimes things break in subtle ways but most of the time it is sufficient to place a lock around the offender. Threading has become so mainstream that it may sound weird that in the year 2000 some guy got a huge incentive for the idea to reduce the time to process calibration data from 12 hours to 6 hours by using two threads on a dual core machine. Threading does make it easy to become faster at the expense of correctness. Correct and scalable multithreading can be arbitrarily hard to achieve depending on the problem you are trying to solve. Lets suppose we want to process millions of items with two threads and count the processed items processed by all threads. A typical beginners code might look like this: int Counter; void IJustLearnedToUseThreads() { var t1 = new Thread(ThreadWorkMethod); t1.Start(); var t2 = new Thread(ThreadWorkMethod); t2.Start(); t1.Join(); t2.Join(); if (Counter != 2 * Increments) throw new Exception("Hmm " + Counter + " != " + 2 * Increments); } const int Increments = 10 * 1000 * 1000; void ThreadWorkMethod() { for (int i = 0; i < Increments; i++) { Counter++; } } It does throw an exception with the message e.g. “Hmm 10.222.287 != 20.000.000” and does never finish. The code does fail because the assumption that Counter++ is an atomic operation is wrong. The ++ operator is just a shortcut for Counter = Counter + 1 This does involve reading the counter from a memory location into the CPU, incrementing value on the CPU and writing the new value back to the memory location. When we do look at the generated assembly code we will see only inc dword ptr [ecx+10h] which is only one instruction. Yes it is one instruction but it is not atomic. All modern CPUs have several layers of caches (L1,L2,L3) which try to hide the fact how slow actual main memory accesses are. Since cache is just another word for redundant copy it can happen that one CPU does read a value from main memory into the cache, modifies it and write it back to the main memory. The problem is that at least the L1 cache is not shared between CPUs so it can happen that one CPU does make changes to values which did change in meantime in the main memory. From the exception you can see we did increment the value 20 million times but half of the changes were lost because we did overwrite the already changed value from the other thread. This is a very common case and people do learn to protect their data with proper locking. void Intermediate() { var time = Stopwatch.StartNew(); Action acc = ThreadWorkMethod_Intermediate; var ar1 = acc.BeginInvoke(null, null); var ar2 = acc.BeginInvoke(null, null); ar1.AsyncWaitHandle.WaitOne(); ar2.AsyncWaitHandle.WaitOne(); if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Intermediate did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Intermediate() { for (int i = 0; i < Increments; i++) { lock (this) { Counter++; } } } This is better and does use the .NET Threadpool to get rid of manual thread management. It does give the expected result but it can result in deadlocks because you do lock on this. This is in general a bad idea since it can lead to deadlocks when other threads use your class instance as lock object. It is therefore recommended to create a private object as lock object to ensure that nobody else can lock your lock object. When you read more about threading you will read about lock free algorithms. They are nice and can improve performance quite a lot but you need to pay close attention to the CLR memory model. It does make quite weak guarantees in general but it can still work because your CPU architecture does give you more invariants than the CLR memory model. For a simple counter there is an easy lock free alternative present with the Interlocked class in .NET. As a general rule you should not try to write lock free algos since most likely you will fail to get it right on all CPU architectures. void Experienced() { var time = Stopwatch.StartNew(); Task t1 = Task.Factory.StartNew(ThreadWorkMethod_Experienced); Task t2 = Task.Factory.StartNew(ThreadWorkMethod_Experienced); t1.Wait(); t2.Wait(); if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Experienced did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Experienced() { for (int i = 0; i < Increments; i++) { Interlocked.Increment(ref Counter); } } Since time does move forward we do not use threads explicitly anymore but the much nicer Task abstraction which was introduced with .NET 4 at 2010. It is educational to look at the generated assembly code. The Interlocked.Increment method must be called which does wondrous things right? Lets see: lock inc dword ptr [eax] The first thing to note that there is no method call at all. Why? Because the JIT compiler does know very well about CPU intrinsic functions. Atomic operations which do lock the memory bus to prevent other processors to read stale values are such things. Second: This is the same increment call prefixed with a lock instruction. The only reason for the existence of the Interlocked class is that the JIT compiler can compile it to the matching CPU intrinsic functions which can not only increment by one but can also do an add, exchange and a combined compare and exchange operation. But be warned that the correct usage of its methods can be tricky. If you try to be clever and look a the generated IL code and try to reason about its efficiency you will fail. Only the generated machine code counts. Is this the best code we can write? Perhaps. It is nice and clean. But can we make it any faster? Lets see how good we are doing currently. Level Time in s IJustLearnedToUseThreads Flawed Code Intermediate 1,5 (lock) Experienced 0,3 (Interlocked.Increment) Master 0,1 (1,0 for int[2]) That lock free thing is really a nice thing. But if you read more about CPU cache, cache coherency, false sharing you can do even better. int[] Counters = new int[12]; // Cache line size is 64 bytes on my machine with an 8 way associative cache try for yourself e.g. 64 on more modern CPUs void Master() { var time = Stopwatch.StartNew(); Task t1 = Task.Factory.StartNew(ThreadWorkMethod_Master, 0); Task t2 = Task.Factory.StartNew(ThreadWorkMethod_Master, Counters.Length - 1); t1.Wait(); t2.Wait(); Counter = Counters[0] + Counters[Counters.Length - 1]; if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Master did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Master(object number) { int index = (int) number; for (int i = 0; i < Increments; i++) { Counters[index]++; } } The key insight here is to use for each core its own value. But if you simply use simply an integer array of two items, one for each core and add the items at the end you will be much slower than the lock free version (factor 3). Each CPU core has its own cache line size which is something in the range of 16-256 bytes. When you do access a value from one location the CPU does not only fetch one value from main memory but a complete cache line (e.g. 16 bytes). This means that you do not pay for the next 15 bytes when you access them. This can lead to dramatic performance improvements and non obvious code which is faster although it does have many more memory reads than another algorithm. So what have we done here? We have started with correct code but it was lacking knowledge how to use the .NET Base Class Libraries optimally. Then we did try to get fancy and used threads for the first time and failed. Our next try was better but it still had non obvious issues (lock object exposed to the outside). Knowledge has increased further and we have found a lock free version of our counter which is a nice and clean way which is a perfectly valid solution. The last example is only here to show you how you can get most out of threading by paying close attention to your used data structures and CPU cache coherency. Although we are working in a virtual execution environment in a high level language with automatic memory management it does pay off to know the details down to the assembly level. Only if you continue to learn and to dig deeper you can come up with solutions no one else was even considering. I have studied particle physics which does help at the digging deeper part. Have you ever tried to solve Quantum Chromodynamics equations? Compared to that the rest must be easy ;-). Although I am no longer working in the Science field I take pride in discovering non obvious things. This can be a very hard to find bug or a new way to restructure data to make something 10 times faster. Now I need to get some sleep ….

Read the article
How to Load Oracle Tables From Hadoop Tutorial (Part 5 - Leveraging Parallelism in OSCH)

- by Bob Hanckel

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Using OSCH: Beyond Hello World In the previous post we discussed a “Hello World” example for OSCH focusing on the mechanics of getting a toy end-to-end example working. In this post we are going to talk about how to make it work for big data loads. We will explain how to optimize an OSCH external table for load, paying particular attention to Oracle’s DOP (degree of parallelism), the number of external table location files we use, and the number of HDFS files that make up the payload. We will provide some rules that serve as best practices when using OSCH. The assumption is that you have read the previous post and have some end to end OSCH external tables working and now you want to ramp up the size of the loads. Using OSCH External Tables for Access and Loading OSCH external tables are no different from any other Oracle external tables. They can be used to access HDFS content using Oracle SQL: SELECT * FROM my_hdfs_external_table; or use the same SQL access to load a table in Oracle. INSERT INTO my_oracle_table SELECT * FROM my_hdfs_external_table; To speed up the load time, you will want to control the degree of parallelism (i.e. DOP) and add two SQL hints. ALTER SESSION FORCE PARALLEL DML PARALLEL 8; ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8; INSERT /*+ append pq_distribute(my_oracle_table, none) */ INTO my_oracle_table SELECT * FROM my_hdfs_external_table; There are various ways of either hinting at what level of DOP you want to use. The ALTER SESSION statements above force the issue assuming you (the user of the session) are allowed to assert the DOP (more on that in the next section). Alternatively you could embed additional parallel hints directly into the INSERT and SELECT clause respectively. /*+ parallel(my_oracle_table,8) *//*+ parallel(my_hdfs_external_table,8) */ Note that the "append" hint lets you load a target table by reserving space above a given "high watermark" in storage and uses Direct Path load. In other doesn't try to fill blocks that are already allocated and partially filled. It uses unallocated blocks. It is an optimized way of loading a table without incurring the typical resource overhead associated with run-of-the-mill inserts. The "pq_distribute" hint in this context unifies the INSERT and SELECT operators to make data flow during a load more efficient. Finally your target Oracle table should be defined with "NOLOGGING" and "PARALLEL" attributes. The combination of the "NOLOGGING" and use of the "append" hint disables REDO logging, and its overhead. The "PARALLEL" clause tells Oracle to try to use parallel execution when operating on the target table. Determine Your DOP It might feel natural to build your datasets in Hadoop, then afterwards figure out how to tune the OSCH external table definition, but you should start backwards. You should focus on Oracle database, specifically the DOP you want to use when loading (or accessing) HDFS content using external tables. The DOP in Oracle controls how many PQ slaves are launched in parallel when executing an external table. Typically the DOP is something you want to Oracle to control transparently, but for loading content from Hadoop with OSCH, it's something that you will want to control. Oracle computes the maximum DOP that can be used by an Oracle user. The maximum value that can be assigned is an integer value typically equal to the number of CPUs on your Oracle instances, times the number of cores per CPU, times the number of Oracle instances. For example, suppose you have a RAC environment with 2 Oracle instances. And suppose that each system has 2 CPUs with 32 cores. The maximum DOP would be 128 (i.e. 2*2*32). In point of fact if you are running on a production system, the maximum DOP you are allowed to use will be restricted by the Oracle DBA. This is because using a system maximum DOP can subsume all system resources on Oracle and starve anything else that is executing. Obviously on a production system where resources need to be shared 24x7, this can’t be allowed to happen. The use cases for being able to run OSCH with a maximum DOP are when you have exclusive access to all the resources on an Oracle system. This can be in situations when your are first seeding tables in a new Oracle database, or there is a time where normal activity in the production database can be safely taken off-line for a few hours to free up resources for a big incremental load. Using OSCH on high end machines (specifically Oracle Exadata and Oracle BDA cabled with Infiniband), this mode of operation can load up to 15TB per hour. The bottom line is that you should first figure out what DOP you will be allowed to run with by talking to the DBAs who manage the production system. You then use that number to derive the number of location files, and (optionally) the number of HDFS data files that you want to generate, assuming that is flexible. Rule 1: Find out the maximum DOP you will be allowed to use with OSCH on the target Oracle system Determining the Number of Location Files Let’s assume that the DBA told you that your maximum DOP was 8. You want the number of location files in your external table to be big enough to utilize all 8 PQ slaves, and you want them to represent equally balanced workloads. Remember location files in OSCH are metadata lists of HDFS files and are created using OSCH’s External Table tool. They also represent the workload size given to an individual Oracle PQ slave (i.e. a PQ slave is given one location file to process at a time, and only it will process the contents of the location file.) Rule 2: The size of the workload of a single location file (and the PQ slave that processes it) is the sum of the content size of the HDFS files it lists For example, if a location file lists 5 HDFS files which are each 100GB in size, the workload size for that location file is 500GB. The number of location files that you generate is something you control by providing a number as input to OSCH’s External Table tool. Rule 3: The number of location files chosen should be a small multiple of the DOP Each location file represents one workload for one PQ slave. So the goal is to keep all slaves busy and try to give them equivalent workloads. Obviously if you run with a DOP of 8 but have 5 location files, only five PQ slaves will have something to do and the other three will have nothing to do and will quietly exit. If you run with 9 location files, then the PQ slaves will pick up the first 8 location files, and assuming they have equal work loads, will finish up about the same time. But the first PQ slave to finish its job will then be rescheduled to process the ninth location file, potentially doubling the end to end processing time. So for this DOP using 8, 16, or 32 location files would be a good idea. Determining the Number of HDFS Files Let’s start with the next rule and then explain it: Rule 4: The number of HDFS files should try to be a multiple of the number of location files and try to be relatively the same size In our running example, the DOP is 8. This means that the number of location files should be a small multiple of 8. Remember that each location file represents a list of unique HDFS files to load, and that the sum of the files listed in each location file is a workload for one Oracle PQ slave. The OSCH External Table tool will look in an HDFS directory for a set of HDFS files to load. It will generate N number of location files (where N is the value you gave to the tool). It will then try to divvy up the HDFS files and do its best to make sure the workload across location files is as balanced as possible. (The tool uses a greedy algorithm that grabs the biggest HDFS file and delegates it to a particular location file. It then looks for the next biggest file and puts in some other location file, and so on). The tools ability to balance is reduced if HDFS file sizes are grossly out of balance or are too few. For example suppose my DOP is 8 and the number of location files is 8. Suppose I have only 8 HDFS files, where one file is 900GB and the others are 100GB. When the tool tries to balance the load it will be forced to put the singleton 900GB into one location file, and put each of the 100GB files in the 7 remaining location files. The load balance skew is 9 to 1. One PQ slave will be working overtime, while the slacker PQ slaves are off enjoying happy hour. If however the total payload (1600 GB) were broken up into smaller HDFS files, the OSCH External Table tool would have an easier time generating a list where each workload for each location file is relatively the same. Applying Rule 4 above to our DOP of 8, we could divide the workload into160 files that were approximately 10 GB in size. For this scenario the OSCH External Table tool would populate each location file with 20 HDFS file references, and all location files would have similar workloads (approximately 200GB per location file.) As a rule, when the OSCH External Table tool has to deal with more and smaller files it will be able to create more balanced loads. How small should HDFS files get? Not so small that the HDFS open and close file overhead starts having a substantial impact. For our performance test system (Exadata/BDA with Infiniband), I compared three OSCH loads of 1 TiB. One load had 128 HDFS files living in 64 location files where each HDFS file was about 8GB. I then did the same load with 12800 files where each HDFS file was about 80MB size. The end to end load time was virtually the same. However when I got ridiculously small (i.e. 128000 files at about 8MB per file), it started to make an impact and slow down the load time. What happens if you break rules 3 or 4 above? Nothing draconian, everything will still function. You just won’t be taking full advantage of the generous DOP that was allocated to you by your friendly DBA. The key point of the rules articulated above is this: if you know that HDFS content is ultimately going to be loaded into Oracle using OSCH, it makes sense to chop them up into the right number of files roughly the same size, derived from the DOP that you expect to use for loading. Next Steps So far we have talked about OLH and OSCH as alternative models for loading. That’s not quite the whole story. They can be used together in a way that provides for more efficient OSCH loads and allows one to be more flexible about scheduling on a Hadoop cluster and an Oracle Database to perform load operations. The next lesson will talk about Oracle Data Pump files generated by OLH, and loaded using OSCH. It will also outline the pros and cons of using various load methods. This will be followed up with a final tutorial lesson focusing on how to optimize OLH and OSCH for use on Oracle's engineered systems: specifically Exadata and the BDA. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;}

Read the article
SVN / Tortoise SVN - Cannot checkout, Access is denied

- by Shevek

We are having a strange problem with our SVN repository One particular project is failing to check out on some dev's workstations. The error is: svn: E720005: Can't open file 'C:\path\.svn\pristine\4e\4e576fad0f625706379863e6051aac33097dbee0.svn-base': Access is denied. All branches, tags and trunk fail at the same point of checkout Other projects in the repository can be checked out fine One of our devs can check out the problem project fine. We have tried 3 different versions of TortoiseSVN (both UI and command line) and another build of win32svn. The one who can check out is using the same client version as one who cannot. Any ideas?

Read the article

< Previous Page | 444 445 446 447 448 449 450 451 452 453 454 455 | Next Page >