Search Results

Search found 3106 results on 125 pages for 'talking shoes'.

Page 124/125 | < Previous Page | 120 121 122 123 124 125  | Next Page >

  • Pain Comes Instantly

    - by user701213
    When I look back at recent blog entries – many of which are not all that current (more on where my available writing time is going later) – I am struck by how many of them focus on public policy or legislative issues instead of, say, the latest nefarious cyberattack or exploit (or everyone’s favorite new pastime: coining terms for the Coming Cyberpocalypse: “digital Pearl Harbor” is so 1941). Speaking of which, I personally hope evil hackers from Malefactoria will someday hack into my bathroom scale – which in a future time will be connected to the Internet because, gosh, wouldn’t it be great to have absolutely everything in your life Internet-enabled? – and recalibrate it so I’m 10 pounds thinner. The horror. In part, my focus on public policy is due to an admitted limitation of my skill set. I enjoy reading technical articles about exploits and cybersecurity trends, but writing a blog entry on those topics would take more research than I have time for and, quite honestly, doesn’t play to my strengths. The first rule of writing is “write what you know.” The bigger contributing factor to my recent paucity of blog entries is that more and more of my waking hours are spent engaging in “thrust and parry” activity involving emerging regulations of some sort or other. I’ve opined in earlier blogs about what constitutes good and reasonable public policy so nobody can accuse me of being reflexively anti-regulation. That said, you have so many cycles in the day, and most of us would rather spend it slaying actual dragons than participating in focus groups on whether dragons are really a problem, whether lassoing them (with organic, sustainable and recyclable lassos) is preferable to slaying them – after all, dragons are people, too - and whether we need lasso compliance auditors to make sure lassos are being used correctly and humanely. (A point that seems to evade many rule makers: slaying dragons actually accomplishes something, whereas talking about “approved dragon slaying procedures and requirements” wastes the time of those who are competent to dispatch actual dragons and who were doing so very well without the input of “dragon-slaying theorists.”) Unfortunately for so many of us who would just get on with doing our day jobs, cybersecurity is rapidly devolving into the “focus groups on dragon dispatching” realm, which actual dragons slayers have little choice but to participate in. The general trend in cybersecurity is that powers-that-be – which encompasses groups other than just legislators – are often increasingly concerned and therefore feel they need to Do Something About Cybersecurity. Many seem to believe that if only we had the right amount of regulation and oversight, there would be no data breaches: a breach simply must mean Someone Is At Fault and Needs Supervision. (Leaving aside the fact that we have lots of home invasions despite a) guard dogs b) liberal carry permits c) alarm systems d) etc.) Also note that many well-managed and security-aware organizations, like the US Department of Defense, still get hacked. More specifically, many powers-that-be feel they must direct industry in a multiplicity of ways, up to and including how we actually build and deploy information technology systems. The more prescriptive the requirement, the more regulators or overseers a) can be seen to be doing something b) feel as if they are doing something regardless of whether they are actually doing something useful or cost effective. Note: an unfortunate concomitant of Doing Something is that often the cure is worse than the ailment. That is, doing what overseers want creates unfortunate byproducts that they either didn’t foresee or worse, don’t care about. After all, the logic goes, we Did Something. Prescriptive practice in the IT industry is problematic for a number of reasons. For a start, prescriptive guidance is really only appropriate if: • It is cost effective• It is “current” (meaning, the guidance doesn’t require the use of the technical equivalent of buggy whips long after horse-drawn transportation has become passé)*• It is practical (that is, pragmatic, proven and effective in the real world, not theoretical and unproven)• It solves the right problem With the above in mind, heading up the list of “you must be joking” regulations are recent disturbing developments in the Payment Card Industry (PCI) world. I’d like to give PCI kahunas the benefit of the doubt about their intentions, except that efforts by Oracle among others to make them aware of “unfortunate side effects of your requirements” – which is as tactful I can be for reasons that I believe will become obvious below - have gone, to-date, unanswered and more importantly, unchanged. A little background on PCI before I get too wound up. In 2008, the Payment Card Industry (PCI) Security Standards Council (SSC) introduced the Payment Application Data Security Standard (PA-DSS). That standard requires vendors of payment applications to ensure that their products implement specific requirements and undergo security assessment procedures. In order to have an application listed as a Validated Payment Application (VPA) and available for use by merchants, software vendors are required to execute the PCI Payment Application Vendor Release Agreement (VRA). (Are you still with me through all the acronyms?) Beginning in August 2010, the VRA imposed new obligations on vendors that are extraordinary and extraordinarily bad, short-sighted and unworkable. Specifically, PCI requires vendors to disclose (dare we say “tell all?”) to PCI any known security vulnerabilities and associated security breaches involving VPAs. ASAP. Think about the impact of that. PCI is asking a vendor to disclose to them: • Specific details of security vulnerabilities • Including exploit information or technical details of the vulnerability • Whether or not there is any mitigation available (as in a patch) PCI, in turn, has the right to blab about any and all of the above – specifically, to distribute all the gory details of what is disclosed - to the PCI SSC, qualified security assessors (QSAs), and any affiliate or agent or adviser of those entities, who are in turn permitted to share it with their respective affiliates, agents, employees, contractors, merchants, processors, service providers and other business partners. This assorted crew can’t be more than, oh, hundreds of thousands of entities. Does anybody believe that several hundred thousand people can keep a secret? Or that several hundred thousand people are all equally trustworthy? Or that not one of the people getting all that information would blab vulnerability details to a bad guy, even by accident? Or be a bad guy who uses the information to break into systems? (Wait, was that the Easter Bunny that just hopped by? Bringing world peace, no doubt.) Sarcasm aside, common sense tells us that telling lots of people a secret is guaranteed to “unsecret” the secret. Notably, being provided details of a vulnerability (without a patch) is of little or no use to companies running the affected application. Few users have the technological sophistication to create a workaround, and even if they do, most workarounds break some other functionality in the application or surrounding environment. Also, given the differences among corporate implementations of any application, it is highly unlikely that a single workaround is going to work for all corporate users. So until a patch is developed by the vendor, users remain at risk of exploit: even more so if the details of vulnerability have been widely shared. Sharing that information widely before a patch is available therefore does not help users, and instead helps only those wanting to exploit known security bugs. There’s a shocker for you. Furthermore, we already know that insider information about security vulnerabilities inevitably leaks, which is why most vendors closely hold such information and limit dissemination until a patch is available (and frequently limit dissemination of technical details even with the release of a patch). That’s the industry norm, not that PCI seems to realize or acknowledge that. Why would anybody release a bunch of highly technical exploit information to a cast of thousands, whose only “vetting” is that they are members of a PCI consortium? Oracle has had personal experience with this problem, which is one reason why information on security vulnerabilities at Oracle is “need to know” (we use our own row level access control to limit access to security bugs in our bug database, and thus less than 1% of development has access to this information), and we don’t provide some customers with more information than others or with vulnerability information and/or patches earlier than others. Failure to remember “insider information always leaks” creates problems in the general case, and has created problems for us specifically. A number of years ago, one of the UK intelligence agencies had information about a non-public security vulnerability in an Oracle product that they circulated among other UK and Commonwealth defense and intelligence entities. Nobody, it should be pointed out, bothered to report the problem to Oracle, even though only Oracle could produce a patch. The vulnerability was finally reported to Oracle by (drum roll) a US-based commercial company, to whom the information had leaked. (Note: every time I tell this story, the MI-whatever agency that created the problem gets a bit shirty with us. I know they meant well and have improved their vulnerability handling/sharing processes but, dudes, next time you find an Oracle vulnerability, try reporting it to us first before blabbing to lots of people who can’t actually fix the problem. Thank you!) Getting back to PCI: clearly, these new disclosure obligations increase the risk of exploitation of a vulnerability in a VPA and thus, of misappropriation of payment card data and customer information that a VPA processes, stores or transmits. It stands to reason that VRA’s current requirement for the widespread distribution of security vulnerability exploit details -- at any time, but particularly before a vendor can issue a patch or a workaround -- is very poor public policy. It effectively publicizes information of great value to potential attackers while not providing compensating benefits - actually, any benefits - to payment card merchants or consumers. In fact, it magnifies the risk to payment card merchants and consumers. The risk is most prominent in the time before a patch has been released, since customers often have little option but to continue using an application or system despite the risks. However, the risk is not limited to the time before a patch is issued: customers often need days, or weeks, to apply patches to systems, based upon the complexity of the issue and dependence on surrounding programs. Rather than decreasing the available window of exploit, this requirement increases the available window of exploit, both as to time available to exploit a vulnerability and the ease with which it can be exploited. Also, why would hackers focus on finding new vulnerabilities to exploit if they can get “EZHack” handed to them in such a manner: a) a vulnerability b) in a payment application c) with exploit code: the “Hacking Trifecta!“ It’s fair to say that this is probably the exact opposite of what PCI – or any of us – would want. Established industry practice concerning vulnerability handling avoids the risks created by the VRA’s vulnerability disclosure requirements. Specifically, the norm is not to release information about a security bug until the associated patch (or a pretty darn good workaround) has been issued. Once a patch is available, the notice to the user community is a high-level communication discussing the product at issue, the level of risk associated with the vulnerability, and how to apply the patch. The notices do not include either the specific customers affected by the vulnerability or forensic reports with maps of the exploit (both of which are required by the current VRA). In this way, customers have the tools they need to prioritize patching and to help prevent an attack, and the information released does not increase the risk of exploit. Furthermore, many vendors already use industry standards for vulnerability description: Common Vulnerability Enumeration (CVE) and Common Vulnerability Scoring System (CVSS). CVE helps ensure that customers know which particular issues a patch addresses and CVSS helps customers determine how severe a vulnerability is on a relative scale. Industry already provides the tools customers need to know what the patch contains and how bad the problem is that the patch remediates. So, what’s a poor vendor to do? Oracle is reaching out to other vendors subject to PCI and attempting to enlist then in a broad effort to engage PCI in rethinking (that is, eradicating) these requirements. I would therefore urge all who care about this issue, but especially those in the vendor community whose applications are subject to PCI and who may not have know they were being asked to tell-all to PCI and put their customers at risk, to do one of the following: • Contact PCI with your concerns• Contact Oracle (we are looking for vendors to sign our statement of concern)• And make sure you tell your customers that you have to rat them out to PCI if there is a breach involving the payment application I like to be charitable and say “PCI meant well” but in as important a public policy issue as what you disclose about vulnerabilities, to whom and when, meaning well isn’t enough. We need to do well. PCI, as regards this particular issue, has not done well, and has compounded the error by thus far being nonresponsive to those of us who have labored mightily to try to explain why they might want to rethink telling the entire planet about security problems with no solutions. By Way of Explanation… Non-related to PCI whatsoever, and the explanation for why I have not been blogging a lot recently, I have been working on Other Writing Venues with my sister Diane (who has also worked in the tech sector, inflicting upgrades on unsuspecting and largely ungrateful end users). I am pleased to note that we have recently (self-)published the first in the Miss Information Technology Murder Mystery series, Outsourcing Murder. The genre might best be described as “chick lit meets geek scene.” Our sisterly nom de plume is Maddi Davidson and (shameless plug follows): you can order the paper version of the book on Amazon, or the Kindle or Nook versions on www.amazon.com or www.bn.com, respectively. From our book jacket: Emma Jones, a 20-something IT consultant, is working on an outsourcing project at Tahiti Tacos, a restaurant chain offering Polynexican cuisine: refried poi, anyone? Emma despises her boss Padmanabh, a brilliant but arrogant partner in GD Consulting. When Emma discovers His-Royal-Padness’s body (verdict: death by cricket bat), she becomes a suspect.With her overprotective family and her best friend Stacey providing endless support and advice, Emma stumbles her way through an investigation of Padmanabh’s murder, bolstered by fusion food feeding frenzies, endless cups of frou-frou coffee and serious surfing sessions. While Stacey knows a PI who owes her a favor, landlady Magda urges Emma to tart up her underwear drawer before the next cute cop with a search warrant arrives. Emma’s mother offers to fix her up with a PhD student at Berkeley and showers her with self-defense gizmos while her old lover Keoni beckons from Hawai’i. And everyone, even Shaun the barista, knows a good lawyer. Book 2, Denial of Service, is coming out this summer. * Given the rate of change in technology, today’s “thou shalts” are easily next year’s “buggy whip guidance.”

    Read the article

  • Understanding C# async / await (1) Compilation

    - by Dixin
    Now the async / await keywords are in C#. Just like the async and ! in F#, this new C# feature provides great convenience. There are many nice documents talking about how to use async / await in specific scenarios, like using async methods in ASP.NET 4.5 and in ASP.NET MVC 4, etc. In this article we will look at the real code working behind the syntax sugar. According to MSDN: The async modifier indicates that the method, lambda expression, or anonymous method that it modifies is asynchronous. Since lambda expression / anonymous method will be compiled to normal method, we will focus on normal async method. Preparation First of all, Some helper methods need to make up. internal class HelperMethods { internal static int Method(int arg0, int arg1) { // Do some IO. WebClient client = new WebClient(); Enumerable.Repeat("http://weblogs.asp.net/dixin", 10) .Select(client.DownloadString).ToArray(); int result = arg0 + arg1; return result; } internal static Task<int> MethodTask(int arg0, int arg1) { Task<int> task = new Task<int>(() => Method(arg0, arg1)); task.Start(); // Hot task (started task) should always be returned. return task; } internal static void Before() { } internal static void Continuation1(int arg) { } internal static void Continuation2(int arg) { } } Here Method() is a long running method doing some IO. Then MethodTask() wraps it into a Task and return that Task. Nothing special here. Await something in async method Since MethodTask() returns Task, let’s try to await it: internal class AsyncMethods { internal static async Task<int> MethodAsync(int arg0, int arg1) { int result = await HelperMethods.MethodTask(arg0, arg1); return result; } } Because we used await in the method, async must be put on the method. Now we get the first async method. According to the naming convenience, it is called MethodAsync. Of course a async method can be awaited. So we have a CallMethodAsync() to call MethodAsync(): internal class AsyncMethods { internal static async Task<int> CallMethodAsync(int arg0, int arg1) { int result = await MethodAsync(arg0, arg1); return result; } } After compilation, MethodAsync() and CallMethodAsync() becomes the same logic. This is the code of MethodAsyc(): internal class CompiledAsyncMethods { [DebuggerStepThrough] [AsyncStateMachine(typeof(MethodAsyncStateMachine))] // async internal static /*async*/ Task<int> MethodAsync(int arg0, int arg1) { MethodAsyncStateMachine methodAsyncStateMachine = new MethodAsyncStateMachine() { Arg0 = arg0, Arg1 = arg1, Builder = AsyncTaskMethodBuilder<int>.Create(), State = -1 }; methodAsyncStateMachine.Builder.Start(ref methodAsyncStateMachine); return methodAsyncStateMachine.Builder.Task; } } It just creates and starts a state machine MethodAsyncStateMachine: [CompilerGenerated] [StructLayout(LayoutKind.Auto)] internal struct MethodAsyncStateMachine : IAsyncStateMachine { public int State; public AsyncTaskMethodBuilder<int> Builder; public int Arg0; public int Arg1; public int Result; private TaskAwaiter<int> awaitor; void IAsyncStateMachine.MoveNext() { try { if (this.State != 0) { this.awaitor = HelperMethods.MethodTask(this.Arg0, this.Arg1).GetAwaiter(); if (!this.awaitor.IsCompleted) { this.State = 0; this.Builder.AwaitUnsafeOnCompleted(ref this.awaitor, ref this); return; } } else { this.State = -1; } this.Result = this.awaitor.GetResult(); } catch (Exception exception) { this.State = -2; this.Builder.SetException(exception); return; } this.State = -2; this.Builder.SetResult(this.Result); } [DebuggerHidden] void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine param0) { this.Builder.SetStateMachine(param0); } } The generated code has been cleaned up so it is readable and can be compiled. Several things can be observed here: The async modifier is gone, which shows, unlike other modifiers (e.g. static), there is no such IL/CLR level “async” stuff. It becomes a AsyncStateMachineAttribute. This is similar to the compilation of extension method. The generated state machine is very similar to the state machine of C# yield syntax sugar. The local variables (arg0, arg1, result) are compiled to fields of the state machine. The real code (await HelperMethods.MethodTask(arg0, arg1)) is compiled into MoveNext(): HelperMethods.MethodTask(this.Arg0, this.Arg1).GetAwaiter(). CallMethodAsync() will create and start its own state machine CallMethodAsyncStateMachine: internal class CompiledAsyncMethods { [DebuggerStepThrough] [AsyncStateMachine(typeof(CallMethodAsyncStateMachine))] // async internal static /*async*/ Task<int> CallMethodAsync(int arg0, int arg1) { CallMethodAsyncStateMachine callMethodAsyncStateMachine = new CallMethodAsyncStateMachine() { Arg0 = arg0, Arg1 = arg1, Builder = AsyncTaskMethodBuilder<int>.Create(), State = -1 }; callMethodAsyncStateMachine.Builder.Start(ref callMethodAsyncStateMachine); return callMethodAsyncStateMachine.Builder.Task; } } CallMethodAsyncStateMachine has the same logic as MethodAsyncStateMachine above. The detail of the state machine will be discussed soon. Now it is clear that: async /await is a C# level syntax sugar. There is no difference to await a async method or a normal method. A method returning Task will be awaitable. State machine and continuation To demonstrate more details in the state machine, a more complex method is created: internal class AsyncMethods { internal static async Task<int> MultiCallMethodAsync(int arg0, int arg1, int arg2, int arg3) { HelperMethods.Before(); int resultOfAwait1 = await MethodAsync(arg0, arg1); HelperMethods.Continuation1(resultOfAwait1); int resultOfAwait2 = await MethodAsync(arg2, arg3); HelperMethods.Continuation2(resultOfAwait2); int resultToReturn = resultOfAwait1 + resultOfAwait2; return resultToReturn; } } In this method: There are multiple awaits. There are code before the awaits, and continuation code after each await After compilation, this multi-await method becomes the same as above single-await methods: internal class CompiledAsyncMethods { [DebuggerStepThrough] [AsyncStateMachine(typeof(MultiCallMethodAsyncStateMachine))] // async internal static /*async*/ Task<int> MultiCallMethodAsync(int arg0, int arg1, int arg2, int arg3) { MultiCallMethodAsyncStateMachine multiCallMethodAsyncStateMachine = new MultiCallMethodAsyncStateMachine() { Arg0 = arg0, Arg1 = arg1, Arg2 = arg2, Arg3 = arg3, Builder = AsyncTaskMethodBuilder<int>.Create(), State = -1 }; multiCallMethodAsyncStateMachine.Builder.Start(ref multiCallMethodAsyncStateMachine); return multiCallMethodAsyncStateMachine.Builder.Task; } } It creates and starts one single state machine, MultiCallMethodAsyncStateMachine: [CompilerGenerated] [StructLayout(LayoutKind.Auto)] internal struct MultiCallMethodAsyncStateMachine : IAsyncStateMachine { public int State; public AsyncTaskMethodBuilder<int> Builder; public int Arg0; public int Arg1; public int Arg2; public int Arg3; public int ResultOfAwait1; public int ResultOfAwait2; public int ResultToReturn; private TaskAwaiter<int> awaiter; void IAsyncStateMachine.MoveNext() { try { switch (this.State) { case -1: HelperMethods.Before(); this.awaiter = AsyncMethods.MethodAsync(this.Arg0, this.Arg1).GetAwaiter(); if (!this.awaiter.IsCompleted) { this.State = 0; this.Builder.AwaitUnsafeOnCompleted(ref this.awaiter, ref this); } break; case 0: this.ResultOfAwait1 = this.awaiter.GetResult(); HelperMethods.Continuation1(this.ResultOfAwait1); this.awaiter = AsyncMethods.MethodAsync(this.Arg2, this.Arg3).GetAwaiter(); if (!this.awaiter.IsCompleted) { this.State = 1; this.Builder.AwaitUnsafeOnCompleted(ref this.awaiter, ref this); } break; case 1: this.ResultOfAwait2 = this.awaiter.GetResult(); HelperMethods.Continuation2(this.ResultOfAwait2); this.ResultToReturn = this.ResultOfAwait1 + this.ResultOfAwait2; this.State = -2; this.Builder.SetResult(this.ResultToReturn); break; } } catch (Exception exception) { this.State = -2; this.Builder.SetException(exception); } } [DebuggerHidden] void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine stateMachine) { this.Builder.SetStateMachine(stateMachine); } } The above code is already cleaned up, but there are still a lot of things. More clean up can be done, and the state machine can be very simple: [CompilerGenerated] [StructLayout(LayoutKind.Auto)] internal struct MultiCallMethodAsyncStateMachine : IAsyncStateMachine { // State: // -1: Begin // 0: 1st await is done // 1: 2nd await is done // ... // -2: End public int State; public TaskCompletionSource<int> ResultToReturn; // int resultToReturn ... public int Arg0; // int Arg0 public int Arg1; // int arg1 public int Arg2; // int arg2 public int Arg3; // int arg3 public int ResultOfAwait1; // int resultOfAwait1 ... public int ResultOfAwait2; // int resultOfAwait2 ... private Task<int> currentTaskToAwait; /// <summary> /// Moves the state machine to its next state. /// </summary> void IAsyncStateMachine.MoveNext() { try { switch (this.State) { // Orginal code is splitted by "case"s: // case -1: // HelperMethods.Before(); // MethodAsync(Arg0, arg1); // case 0: // int resultOfAwait1 = await ... // HelperMethods.Continuation1(resultOfAwait1); // MethodAsync(arg2, arg3); // case 1: // int resultOfAwait2 = await ... // HelperMethods.Continuation2(resultOfAwait2); // int resultToReturn = resultOfAwait1 + resultOfAwait2; // return resultToReturn; case -1: // -1 is begin. HelperMethods.Before(); // Code before 1st await. this.currentTaskToAwait = AsyncMethods.MethodAsync(this.Arg0, this.Arg1); // 1st task to await // When this.currentTaskToAwait is done, run this.MoveNext() and go to case 0. this.State = 0; IAsyncStateMachine this1 = this; // Cannot use "this" in lambda so create a local variable. this.currentTaskToAwait.ContinueWith(_ => this1.MoveNext()); // Callback break; case 0: // Now 1st await is done. this.ResultOfAwait1 = this.currentTaskToAwait.Result; // Get 1st await's result. HelperMethods.Continuation1(this.ResultOfAwait1); // Code after 1st await and before 2nd await. this.currentTaskToAwait = AsyncMethods.MethodAsync(this.Arg2, this.Arg3); // 2nd task to await // When this.currentTaskToAwait is done, run this.MoveNext() and go to case 1. this.State = 1; IAsyncStateMachine this2 = this; // Cannot use "this" in lambda so create a local variable. this.currentTaskToAwait.ContinueWith(_ => this2.MoveNext()); // Callback break; case 1: // Now 2nd await is done. this.ResultOfAwait2 = this.currentTaskToAwait.Result; // Get 2nd await's result. HelperMethods.Continuation2(this.ResultOfAwait2); // Code after 2nd await. int resultToReturn = this.ResultOfAwait1 + this.ResultOfAwait2; // Code after 2nd await. // End with resultToReturn. this.State = -2; // -2 is end. this.ResultToReturn.SetResult(resultToReturn); break; } } catch (Exception exception) { // End with exception. this.State = -2; // -2 is end. this.ResultToReturn.SetException(exception); } } /// <summary> /// Configures the state machine with a heap-allocated replica. /// </summary> /// <param name="stateMachine">The heap-allocated replica.</param> [DebuggerHidden] void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine stateMachine) { // No core logic. } } Only Task and TaskCompletionSource are involved in this version. And MultiCallMethodAsync() can be simplified to: [DebuggerStepThrough] [AsyncStateMachine(typeof(MultiCallMethodAsyncStateMachine))] // async internal static /*async*/ Task<int> MultiCallMethodAsync_(int arg0, int arg1, int arg2, int arg3) { MultiCallMethodAsyncStateMachine multiCallMethodAsyncStateMachine = new MultiCallMethodAsyncStateMachine() { Arg0 = arg0, Arg1 = arg1, Arg2 = arg2, Arg3 = arg3, ResultToReturn = new TaskCompletionSource<int>(), // -1: Begin // 0: 1st await is done // 1: 2nd await is done // ... // -2: End State = -1 }; (multiCallMethodAsyncStateMachine as IAsyncStateMachine).MoveNext(); // Original code are in this method. return multiCallMethodAsyncStateMachine.ResultToReturn.Task; } Now the whole state machine becomes very clear - it is about callback: Original code are split into pieces by “await”s, and each piece is put into each “case” in the state machine. Here the 2 awaits split the code into 3 pieces, so there are 3 “case”s. The “piece”s are chained by callback, that is done by Builder.AwaitUnsafeOnCompleted(callback), or currentTaskToAwait.ContinueWith(callback) in the simplified code. A previous “piece” will end with a Task (which is to be awaited), when the task is done, it will callback the next “piece”. The state machine’s state works with the “case”s to ensure the code “piece”s executes one after another. Callback Since it is about callback, the simplification  can go even further – the entire state machine can be completely purged. Now MultiCallMethodAsync() becomes: internal static Task<int> MultiCallMethodAsync(int arg0, int arg1, int arg2, int arg3) { TaskCompletionSource<int> taskCompletionSource = new TaskCompletionSource<int>(); try { // Oringinal code begins. HelperMethods.Before(); MethodAsync(arg0, arg1).ContinueWith(await1 => { int resultOfAwait1 = await1.Result; HelperMethods.Continuation1(resultOfAwait1); MethodAsync(arg2, arg3).ContinueWith(await2 => { int resultOfAwait2 = await2.Result; HelperMethods.Continuation2(resultOfAwait2); int resultToReturn = resultOfAwait1 + resultOfAwait2; // Oringinal code ends. taskCompletionSource.SetResult(resultToReturn); }); }); } catch (Exception exception) { taskCompletionSource.SetException(exception); } return taskCompletionSource.Task; } Please compare with the original async / await code: HelperMethods.Before(); int resultOfAwait1 = await MethodAsync(arg0, arg1); HelperMethods.Continuation1(resultOfAwait1); int resultOfAwait2 = await MethodAsync(arg2, arg3); HelperMethods.Continuation2(resultOfAwait2); int resultToReturn = resultOfAwait1 + resultOfAwait2; return resultToReturn; Yeah that is the magic of C# async / await: Await is literally pretending to wait. In a await expression, a Task object will be return immediately so that caller is not blocked. The continuation code is compiled as that Task’s callback code. When that task is done, continuation code will execute. Please notice that many details inside the state machine are omitted for simplicity, like context caring, etc. If you want to have a detailed picture, please do check out the source code of AsyncTaskMethodBuilder and TaskAwaiter.

    Read the article

  • C#/.NET Little Wonders: The Useful But Overlooked Sets

    - by James Michael Hare
    Once again we consider some of the lesser known classes and keywords of C#.  Today we will be looking at two set implementations in the System.Collections.Generic namespace: HashSet<T> and SortedSet<T>.  Even though most people think of sets as mathematical constructs, they are actually very useful classes that can be used to help make your application more performant if used appropriately. A Background From Math In mathematical terms, a set is an unordered collection of unique items.  In other words, the set {2,3,5} is identical to the set {3,5,2}.  In addition, the set {2, 2, 4, 1} would be invalid because it would have a duplicate item (2).  In addition, you can perform set arithmetic on sets such as: Intersections: The intersection of two sets is the collection of elements common to both.  Example: The intersection of {1,2,5} and {2,4,9} is the set {2}. Unions: The union of two sets is the collection of unique items present in either or both set.  Example: The union of {1,2,5} and {2,4,9} is {1,2,4,5,9}. Differences: The difference of two sets is the removal of all items from the first set that are common between the sets.  Example: The difference of {1,2,5} and {2,4,9} is {1,5}. Supersets: One set is a superset of a second set if it contains all elements that are in the second set. Example: The set {1,2,5} is a superset of {1,5}. Subsets: One set is a subset of a second set if all the elements of that set are contained in the first set. Example: The set {1,5} is a subset of {1,2,5}. If We’re Not Doing Math, Why Do We Care? Now, you may be thinking: why bother with the set classes in C# if you have no need for mathematical set manipulation?  The answer is simple: they are extremely efficient ways to determine ownership in a collection. For example, let’s say you are designing an order system that tracks the price of a particular equity, and once it reaches a certain point will trigger an order.  Now, since there’s tens of thousands of equities on the markets, you don’t want to track market data for every ticker as that would be a waste of time and processing power for symbols you don’t have orders for.  Thus, we just want to subscribe to the stock symbol for an equity order only if it is a symbol we are not already subscribed to. Every time a new order comes in, we will check the list of subscriptions to see if the new order’s stock symbol is in that list.  If it is, great, we already have that market data feed!  If not, then and only then should we subscribe to the feed for that symbol. So far so good, we have a collection of symbols and we want to see if a symbol is present in that collection and if not, add it.  This really is the essence of set processing, but for the sake of comparison, let’s say you do a list instead: 1: // class that handles are order processing service 2: public sealed class OrderProcessor 3: { 4: // contains list of all symbols we are currently subscribed to 5: private readonly List<string> _subscriptions = new List<string>(); 6:  7: ... 8: } Now whenever you are adding a new order, it would look something like: 1: public PlaceOrderResponse PlaceOrder(Order newOrder) 2: { 3: // do some validation, of course... 4:  5: // check to see if already subscribed, if not add a subscription 6: if (!_subscriptions.Contains(newOrder.Symbol)) 7: { 8: // add the symbol to the list 9: _subscriptions.Add(newOrder.Symbol); 10: 11: // do whatever magic is needed to start a subscription for the symbol 12: } 13:  14: // place the order logic! 15: } What’s wrong with this?  In short: performance!  Finding an item inside a List<T> is a linear - O(n) – operation, which is not a very performant way to find if an item exists in a collection. (I used to teach algorithms and data structures in my spare time at a local university, and when you began talking about big-O notation you could immediately begin to see eyes glossing over as if it was pure, useless theory that would not apply in the real world, but I did and still do believe it is something worth understanding well to make the best choices in computer science). Let’s think about this: a linear operation means that as the number of items increases, the time that it takes to perform the operation tends to increase in a linear fashion.  Put crudely, this means if you double the collection size, you might expect the operation to take something like the order of twice as long.  Linear operations tend to be bad for performance because they mean that to perform some operation on a collection, you must potentially “visit” every item in the collection.  Consider finding an item in a List<T>: if you want to see if the list has an item, you must potentially check every item in the list before you find it or determine it’s not found. Now, we could of course sort our list and then perform a binary search on it, but sorting is typically a linear-logarithmic complexity – O(n * log n) - and could involve temporary storage.  So performing a sort after each add would probably add more time.  As an alternative, we could use a SortedList<TKey, TValue> which sorts the list on every Add(), but this has a similar level of complexity to move the items and also requires a key and value, and in our case the key is the value. This is why sets tend to be the best choice for this type of processing: they don’t rely on separate keys and values for ordering – so they save space – and they typically don’t care about ordering – so they tend to be extremely performant.  The .NET BCL (Base Class Library) has had the HashSet<T> since .NET 3.5, but at that time it did not implement the ISet<T> interface.  As of .NET 4.0, HashSet<T> implements ISet<T> and a new set, the SortedSet<T> was added that gives you a set with ordering. HashSet<T> – For Unordered Storage of Sets When used right, HashSet<T> is a beautiful collection, you can think of it as a simplified Dictionary<T,T>.  That is, a Dictionary where the TKey and TValue refer to the same object.  This is really an oversimplification, but logically it makes sense.  I’ve actually seen people code a Dictionary<T,T> where they store the same thing in the key and the value, and that’s just inefficient because of the extra storage to hold both the key and the value. As it’s name implies, the HashSet<T> uses a hashing algorithm to find the items in the set, which means it does take up some additional space, but it has lightning fast lookups!  Compare the times below between HashSet<T> and List<T>: Operation HashSet<T> List<T> Add() O(1) O(1) at end O(n) in middle Remove() O(1) O(n) Contains() O(1) O(n)   Now, these times are amortized and represent the typical case.  In the very worst case, the operations could be linear if they involve a resizing of the collection – but this is true for both the List and HashSet so that’s a less of an issue when comparing the two. The key thing to note is that in the general case, HashSet is constant time for adds, removes, and contains!  This means that no matter how large the collection is, it takes roughly the exact same amount of time to find an item or determine if it’s not in the collection.  Compare this to the List where almost any add or remove must rearrange potentially all the elements!  And to find an item in the list (if unsorted) you must search every item in the List. So as you can see, if you want to create an unordered collection and have very fast lookup and manipulation, the HashSet is a great collection. And since HashSet<T> implements ICollection<T> and IEnumerable<T>, it supports nearly all the same basic operations as the List<T> and can use the System.Linq extension methods as well. All we have to do to switch from a List<T> to a HashSet<T>  is change our declaration.  Since List and HashSet support many of the same members, chances are we won’t need to change much else. 1: public sealed class OrderProcessor 2: { 3: private readonly HashSet<string> _subscriptions = new HashSet<string>(); 4:  5: // ... 6:  7: public PlaceOrderResponse PlaceOrder(Order newOrder) 8: { 9: // do some validation, of course... 10: 11: // check to see if already subscribed, if not add a subscription 12: if (!_subscriptions.Contains(newOrder.Symbol)) 13: { 14: // add the symbol to the list 15: _subscriptions.Add(newOrder.Symbol); 16: 17: // do whatever magic is needed to start a subscription for the symbol 18: } 19: 20: // place the order logic! 21: } 22:  23: // ... 24: } 25: Notice, we didn’t change any code other than the declaration for _subscriptions to be a HashSet<T>.  Thus, we can pick up the performance improvements in this case with minimal code changes. SortedSet<T> – Ordered Storage of Sets Just like HashSet<T> is logically similar to Dictionary<T,T>, the SortedSet<T> is logically similar to the SortedDictionary<T,T>. The SortedSet can be used when you want to do set operations on a collection, but you want to maintain that collection in sorted order.  Now, this is not necessarily mathematically relevant, but if your collection needs do include order, this is the set to use. So the SortedSet seems to be implemented as a binary tree (possibly a red-black tree) internally.  Since binary trees are dynamic structures and non-contiguous (unlike List and SortedList) this means that inserts and deletes do not involve rearranging elements, or changing the linking of the nodes.  There is some overhead in keeping the nodes in order, but it is much smaller than a contiguous storage collection like a List<T>.  Let’s compare the three: Operation HashSet<T> SortedSet<T> List<T> Add() O(1) O(log n) O(1) at end O(n) in middle Remove() O(1) O(log n) O(n) Contains() O(1) O(log n) O(n)   The MSDN documentation seems to indicate that operations on SortedSet are O(1), but this seems to be inconsistent with its implementation and seems to be a documentation error.  There’s actually a separate MSDN document (here) on SortedSet that indicates that it is, in fact, logarithmic in complexity.  Let’s put it in layman’s terms: logarithmic means you can double the collection size and typically you only add a single extra “visit” to an item in the collection.  Take that in contrast to List<T>’s linear operation where if you double the size of the collection you double the “visits” to items in the collection.  This is very good performance!  It’s still not as performant as HashSet<T> where it always just visits one item (amortized), but for the addition of sorting this is a good thing. Consider the following table, now this is just illustrative data of the relative complexities, but it’s enough to get the point: Collection Size O(1) Visits O(log n) Visits O(n) Visits 1 1 1 1 10 1 4 10 100 1 7 100 1000 1 10 1000   Notice that the logarithmic – O(log n) – visit count goes up very slowly compare to the linear – O(n) – visit count.  This is because since the list is sorted, it can do one check in the middle of the list, determine which half of the collection the data is in, and discard the other half (binary search).  So, if you need your set to be sorted, you can use the SortedSet<T> just like the HashSet<T> and gain sorting for a small performance hit, but it’s still faster than a List<T>. Unique Set Operations Now, if you do want to perform more set-like operations, both implementations of ISet<T> support the following, which play back towards the mathematical set operations described before: IntersectWith() – Performs the set intersection of two sets.  Modifies the current set so that it only contains elements also in the second set. UnionWith() – Performs a set union of two sets.  Modifies the current set so it contains all elements present both in the current set and the second set. ExceptWith() – Performs a set difference of two sets.  Modifies the current set so that it removes all elements present in the second set. IsSupersetOf() – Checks if the current set is a superset of the second set. IsSubsetOf() – Checks if the current set is a subset of the second set. For more information on the set operations themselves, see the MSDN description of ISet<T> (here). What Sets Don’t Do Don’t get me wrong, sets are not silver bullets.  You don’t really want to use a set when you want separate key to value lookups, that’s what the IDictionary implementations are best for. Also sets don’t store temporal add-order.  That is, if you are adding items to the end of a list all the time, your list is ordered in terms of when items were added to it.  This is something the sets don’t do naturally (though you could use a SortedSet with an IComparer with a DateTime but that’s overkill) but List<T> can. Also, List<T> allows indexing which is a blazingly fast way to iterate through items in the collection.  Iterating over all the items in a List<T> is generally much, much faster than iterating over a set. Summary Sets are an excellent tool for maintaining a lookup table where the item is both the key and the value.  In addition, if you have need for the mathematical set operations, the C# sets support those as well.  The HashSet<T> is the set of choice if you want the fastest possible lookups but don’t care about order.  In contrast the SortedSet<T> will give you a sorted collection at a slight reduction in performance.   Technorati Tags: C#,.Net,Little Wonders,BlackRabbitCoder,ISet,HashSet,SortedSet

    Read the article

  • Understanding C# async / await (1) Compilation

    - by Dixin
    Now the async / await keywords are in C#. Just like the async and ! in F#, this new C# feature provides great convenience. There are many nice documents talking about how to use async / await in specific scenarios, like using async methods in ASP.NET 4.5 and in ASP.NET MVC 4, etc. In this article we will look at the real code working behind the syntax sugar. According to MSDN: The async modifier indicates that the method, lambda expression, or anonymous method that it modifies is asynchronous. Since lambda expression / anonymous method will be compiled to normal method, we will focus on normal async method. Preparation First of all, Some helper methods need to make up. internal class HelperMethods { internal static int Method(int arg0, int arg1) { // Do some IO. WebClient client = new WebClient(); Enumerable.Repeat("http://weblogs.asp.net/dixin", 10) .Select(client.DownloadString).ToArray(); int result = arg0 + arg1; return result; } internal static Task<int> MethodTask(int arg0, int arg1) { Task<int> task = new Task<int>(() => Method(arg0, arg1)); task.Start(); // Hot task (started task) should always be returned. return task; } internal static void Before() { } internal static void Continuation1(int arg) { } internal static void Continuation2(int arg) { } } Here Method() is a long running method doing some IO. Then MethodTask() wraps it into a Task and return that Task. Nothing special here. Await something in async method Since MethodTask() returns Task, let’s try to await it: internal class AsyncMethods { internal static async Task<int> MethodAsync(int arg0, int arg1) { int result = await HelperMethods.MethodTask(arg0, arg1); return result; } } Because we used await in the method, async must be put on the method. Now we get the first async method. According to the naming convenience, it is named MethodAsync. Of course a async method can be awaited. So we have a CallMethodAsync() to call MethodAsync(): internal class AsyncMethods { internal static async Task<int> CallMethodAsync(int arg0, int arg1) { int result = await MethodAsync(arg0, arg1); return result; } } After compilation, MethodAsync() and CallMethodAsync() becomes the same logic. This is the code of MethodAsyc(): internal class CompiledAsyncMethods { [DebuggerStepThrough] [AsyncStateMachine(typeof(MethodAsyncStateMachine))] // async internal static /*async*/ Task<int> MethodAsync(int arg0, int arg1) { MethodAsyncStateMachine methodAsyncStateMachine = new MethodAsyncStateMachine() { Arg0 = arg0, Arg1 = arg1, Builder = AsyncTaskMethodBuilder<int>.Create(), State = -1 }; methodAsyncStateMachine.Builder.Start(ref methodAsyncStateMachine); return methodAsyncStateMachine.Builder.Task; } } It just creates and starts a state machine, MethodAsyncStateMachine: [CompilerGenerated] [StructLayout(LayoutKind.Auto)] internal struct MethodAsyncStateMachine : IAsyncStateMachine { public int State; public AsyncTaskMethodBuilder<int> Builder; public int Arg0; public int Arg1; public int Result; private TaskAwaiter<int> awaitor; void IAsyncStateMachine.MoveNext() { try { if (this.State != 0) { this.awaitor = HelperMethods.MethodTask(this.Arg0, this.Arg1).GetAwaiter(); if (!this.awaitor.IsCompleted) { this.State = 0; this.Builder.AwaitUnsafeOnCompleted(ref this.awaitor, ref this); return; } } else { this.State = -1; } this.Result = this.awaitor.GetResult(); } catch (Exception exception) { this.State = -2; this.Builder.SetException(exception); return; } this.State = -2; this.Builder.SetResult(this.Result); } [DebuggerHidden] void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine param0) { this.Builder.SetStateMachine(param0); } } The generated code has been refactored, so it is readable and can be compiled. Several things can be observed here: The async modifier is gone, which shows, unlike other modifiers (e.g. static), there is no such IL/CLR level “async” stuff. It becomes a AsyncStateMachineAttribute. This is similar to the compilation of extension method. The generated state machine is very similar to the state machine of C# yield syntax sugar. The local variables (arg0, arg1, result) are compiled to fields of the state machine. The real code (await HelperMethods.MethodTask(arg0, arg1)) is compiled into MoveNext(): HelperMethods.MethodTask(this.Arg0, this.Arg1).GetAwaiter(). CallMethodAsync() will create and start its own state machine CallMethodAsyncStateMachine: internal class CompiledAsyncMethods { [DebuggerStepThrough] [AsyncStateMachine(typeof(CallMethodAsyncStateMachine))] // async internal static /*async*/ Task<int> CallMethodAsync(int arg0, int arg1) { CallMethodAsyncStateMachine callMethodAsyncStateMachine = new CallMethodAsyncStateMachine() { Arg0 = arg0, Arg1 = arg1, Builder = AsyncTaskMethodBuilder<int>.Create(), State = -1 }; callMethodAsyncStateMachine.Builder.Start(ref callMethodAsyncStateMachine); return callMethodAsyncStateMachine.Builder.Task; } } CallMethodAsyncStateMachine has the same logic as MethodAsyncStateMachine above. The detail of the state machine will be discussed soon. Now it is clear that: async /await is a C# language level syntax sugar. There is no difference to await a async method or a normal method. As long as a method returns Task, it is awaitable. State machine and continuation To demonstrate more details in the state machine, a more complex method is created: internal class AsyncMethods { internal static async Task<int> MultiCallMethodAsync(int arg0, int arg1, int arg2, int arg3) { HelperMethods.Before(); int resultOfAwait1 = await MethodAsync(arg0, arg1); HelperMethods.Continuation1(resultOfAwait1); int resultOfAwait2 = await MethodAsync(arg2, arg3); HelperMethods.Continuation2(resultOfAwait2); int resultToReturn = resultOfAwait1 + resultOfAwait2; return resultToReturn; } } In this method: There are multiple awaits. There are code before the awaits, and continuation code after each await After compilation, this multi-await method becomes the same as above single-await methods: internal class CompiledAsyncMethods { [DebuggerStepThrough] [AsyncStateMachine(typeof(MultiCallMethodAsyncStateMachine))] // async internal static /*async*/ Task<int> MultiCallMethodAsync(int arg0, int arg1, int arg2, int arg3) { MultiCallMethodAsyncStateMachine multiCallMethodAsyncStateMachine = new MultiCallMethodAsyncStateMachine() { Arg0 = arg0, Arg1 = arg1, Arg2 = arg2, Arg3 = arg3, Builder = AsyncTaskMethodBuilder<int>.Create(), State = -1 }; multiCallMethodAsyncStateMachine.Builder.Start(ref multiCallMethodAsyncStateMachine); return multiCallMethodAsyncStateMachine.Builder.Task; } } It creates and starts one single state machine, MultiCallMethodAsyncStateMachine: [CompilerGenerated] [StructLayout(LayoutKind.Auto)] internal struct MultiCallMethodAsyncStateMachine : IAsyncStateMachine { public int State; public AsyncTaskMethodBuilder<int> Builder; public int Arg0; public int Arg1; public int Arg2; public int Arg3; public int ResultOfAwait1; public int ResultOfAwait2; public int ResultToReturn; private TaskAwaiter<int> awaiter; void IAsyncStateMachine.MoveNext() { try { switch (this.State) { case -1: HelperMethods.Before(); this.awaiter = AsyncMethods.MethodAsync(this.Arg0, this.Arg1).GetAwaiter(); if (!this.awaiter.IsCompleted) { this.State = 0; this.Builder.AwaitUnsafeOnCompleted(ref this.awaiter, ref this); } break; case 0: this.ResultOfAwait1 = this.awaiter.GetResult(); HelperMethods.Continuation1(this.ResultOfAwait1); this.awaiter = AsyncMethods.MethodAsync(this.Arg2, this.Arg3).GetAwaiter(); if (!this.awaiter.IsCompleted) { this.State = 1; this.Builder.AwaitUnsafeOnCompleted(ref this.awaiter, ref this); } break; case 1: this.ResultOfAwait2 = this.awaiter.GetResult(); HelperMethods.Continuation2(this.ResultOfAwait2); this.ResultToReturn = this.ResultOfAwait1 + this.ResultOfAwait2; this.State = -2; this.Builder.SetResult(this.ResultToReturn); break; } } catch (Exception exception) { this.State = -2; this.Builder.SetException(exception); } } [DebuggerHidden] void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine stateMachine) { this.Builder.SetStateMachine(stateMachine); } } Once again, the above state machine code is already refactored, but it still has a lot of things. More clean up can be done if we only keep the core logic, and the state machine can become very simple: [CompilerGenerated] [StructLayout(LayoutKind.Auto)] internal struct MultiCallMethodAsyncStateMachine : IAsyncStateMachine { // State: // -1: Begin // 0: 1st await is done // 1: 2nd await is done // ... // -2: End public int State; public TaskCompletionSource<int> ResultToReturn; // int resultToReturn ... public int Arg0; // int Arg0 public int Arg1; // int arg1 public int Arg2; // int arg2 public int Arg3; // int arg3 public int ResultOfAwait1; // int resultOfAwait1 ... public int ResultOfAwait2; // int resultOfAwait2 ... private Task<int> currentTaskToAwait; /// <summary> /// Moves the state machine to its next state. /// </summary> public void MoveNext() // IAsyncStateMachine member. { try { switch (this.State) { // Original code is split by "await"s into "case"s: // case -1: // HelperMethods.Before(); // MethodAsync(Arg0, arg1); // case 0: // int resultOfAwait1 = await ... // HelperMethods.Continuation1(resultOfAwait1); // MethodAsync(arg2, arg3); // case 1: // int resultOfAwait2 = await ... // HelperMethods.Continuation2(resultOfAwait2); // int resultToReturn = resultOfAwait1 + resultOfAwait2; // return resultToReturn; case -1: // -1 is begin. HelperMethods.Before(); // Code before 1st await. this.currentTaskToAwait = AsyncMethods.MethodAsync(this.Arg0, this.Arg1); // 1st task to await // When this.currentTaskToAwait is done, run this.MoveNext() and go to case 0. this.State = 0; MultiCallMethodAsyncStateMachine that1 = this; // Cannot use "this" in lambda so create a local variable. this.currentTaskToAwait.ContinueWith(_ => that1.MoveNext()); break; case 0: // Now 1st await is done. this.ResultOfAwait1 = this.currentTaskToAwait.Result; // Get 1st await's result. HelperMethods.Continuation1(this.ResultOfAwait1); // Code after 1st await and before 2nd await. this.currentTaskToAwait = AsyncMethods.MethodAsync(this.Arg2, this.Arg3); // 2nd task to await // When this.currentTaskToAwait is done, run this.MoveNext() and go to case 1. this.State = 1; MultiCallMethodAsyncStateMachine that2 = this; this.currentTaskToAwait.ContinueWith(_ => that2.MoveNext()); break; case 1: // Now 2nd await is done. this.ResultOfAwait2 = this.currentTaskToAwait.Result; // Get 2nd await's result. HelperMethods.Continuation2(this.ResultOfAwait2); // Code after 2nd await. int resultToReturn = this.ResultOfAwait1 + this.ResultOfAwait2; // Code after 2nd await. // End with resultToReturn. this.State = -2; // -2 is end. this.ResultToReturn.SetResult(resultToReturn); break; } } catch (Exception exception) { // End with exception. this.State = -2; // -2 is end. this.ResultToReturn.SetException(exception); } } /// <summary> /// Configures the state machine with a heap-allocated replica. /// </summary> /// <param name="stateMachine">The heap-allocated replica.</param> [DebuggerHidden] public void SetStateMachine(IAsyncStateMachine stateMachine) // IAsyncStateMachine member. { // No core logic. } } Only Task and TaskCompletionSource are involved in this version. And MultiCallMethodAsync() can be simplified to: [DebuggerStepThrough] [AsyncStateMachine(typeof(MultiCallMethodAsyncStateMachine))] // async internal static /*async*/ Task<int> MultiCallMethodAsync(int arg0, int arg1, int arg2, int arg3) { MultiCallMethodAsyncStateMachine multiCallMethodAsyncStateMachine = new MultiCallMethodAsyncStateMachine() { Arg0 = arg0, Arg1 = arg1, Arg2 = arg2, Arg3 = arg3, ResultToReturn = new TaskCompletionSource<int>(), // -1: Begin // 0: 1st await is done // 1: 2nd await is done // ... // -2: End State = -1 }; multiCallMethodAsyncStateMachine.MoveNext(); // Original code are moved into this method. return multiCallMethodAsyncStateMachine.ResultToReturn.Task; } Now the whole state machine becomes very clean - it is about callback: Original code are split into pieces by “await”s, and each piece is put into each “case” in the state machine. Here the 2 awaits split the code into 3 pieces, so there are 3 “case”s. The “piece”s are chained by callback, that is done by Builder.AwaitUnsafeOnCompleted(callback), or currentTaskToAwait.ContinueWith(callback) in the simplified code. A previous “piece” will end with a Task (which is to be awaited), when the task is done, it will callback the next “piece”. The state machine’s state works with the “case”s to ensure the code “piece”s executes one after another. Callback If we focus on the point of callback, the simplification  can go even further – the entire state machine can be completely purged, and we can just keep the code inside MoveNext(). Now MultiCallMethodAsync() becomes: internal static Task<int> MultiCallMethodAsync(int arg0, int arg1, int arg2, int arg3) { TaskCompletionSource<int> taskCompletionSource = new TaskCompletionSource<int>(); try { // Oringinal code begins. HelperMethods.Before(); MethodAsync(arg0, arg1).ContinueWith(await1 => { int resultOfAwait1 = await1.Result; HelperMethods.Continuation1(resultOfAwait1); MethodAsync(arg2, arg3).ContinueWith(await2 => { int resultOfAwait2 = await2.Result; HelperMethods.Continuation2(resultOfAwait2); int resultToReturn = resultOfAwait1 + resultOfAwait2; // Oringinal code ends. taskCompletionSource.SetResult(resultToReturn); }); }); } catch (Exception exception) { taskCompletionSource.SetException(exception); } return taskCompletionSource.Task; } Please compare with the original async / await code: HelperMethods.Before(); int resultOfAwait1 = await MethodAsync(arg0, arg1); HelperMethods.Continuation1(resultOfAwait1); int resultOfAwait2 = await MethodAsync(arg2, arg3); HelperMethods.Continuation2(resultOfAwait2); int resultToReturn = resultOfAwait1 + resultOfAwait2; return resultToReturn; Yeah that is the magic of C# async / await: Await is not to wait. In a await expression, a Task object will be return immediately so that execution is not blocked. The continuation code is compiled as that Task’s callback code. When that task is done, continuation code will execute. Please notice that many details inside the state machine are omitted for simplicity, like context caring, etc. If you want to have a detailed picture, please do check out the source code of AsyncTaskMethodBuilder and TaskAwaiter.

    Read the article

  • AIX Checklist for stable obiee deployment

    - by user554629
    Common AIX configuration issues     ( last updated 27 Aug 2012 ) OBIEE is a complicated system with many moving parts and connection points.The purpose of this article is to provide a checklist to discuss OBIEE deployment with your systems administrators. The information in this article is time sensitive, and updated as I discover new  issues or details. What makes OBIEE different? When Tech Support suggests AIX component upgrades to a stable, locked-down production AIX environment, it is common to get "push back".  "Why is this necessary?  We aren't we seeing issues with other software?"It's a fair question that I have often struggled to answer; here are the talking points: OBIEE is memory intensive.  It is the entire purpose of the software to trade memory for repetitive, more expensive database requests across a network. OBIEE is implemented in C++ and is very dependent on the C++ runtime to behave correctly. OBIEE is aggressively thread efficient;  if atomic operations on a particular architecture do not work correctly, the software crashes. OBIEE dynamically loads third-party database client libraries directly into the nqsserver process.  If the library is not thread-safe, or corrupts process memory the OBIEE crash happens in an unrelated part of the code.  These are extremely difficult bugs to find. OBIEE software uses 99% common source across multiple platforms:  Windows, Linux, AIX, Solaris and HPUX.  If a crash happens on only one platform, we begin to suspect other factors.  load intensity, system differences, configuration choices, hardware failures.  It is rare to have a single product require so many diverse technical skills.   My role in support is to understand system configurations, performance issues, and crashes.   An analyst trained in Business Analytics can't be expected to know AIX internals in the depth required to make configuration choices.  Here are some guidelines. AIX C++ Runtime must be at  version 11.1.0.4$ lslpp -L | grep xlC.aixobiee software will crash if xlC.aix.rte is downlevel;  this is not a "try it" suggestion.Nov 2011 11.1.0.4 version  is appropriate for all AIX versions ( 5, 6, 7 )Download from here:https://www-304.ibm.com/support/docview.wss?uid=swg24031426 No reboot is necessary to install, it can even be installed while applications are using the current version.Restart the apps, and they will pick up the latest version. AIX 5.3 Technology Level 12 is required when running on Power5,6,7 processorsAIX 6.1 was introduced with the newer Power chips, and we have seen no issues with 6.1 or 7.1 versions.Customers with an unstable deployment, dozens of unexplained crashes, became stable after the upgrade.If your AIX system is 5.3, the minimum TL level should be at or higher than this:$ oslevel -s  5300-12-03-1107IBM typically supports only the two latest versions of AIX ( 6.1 and 7.1, for example).  AIX 5.3 is still supported and popular running in an LPAR. obiee userid limits$ ulimit -Ha  ( hard limits )$ ulimit -a   ( default limits )core file size (blocks)     unlimiteddata seg size (kbytes)      unlimitedfile size (blocks)          unlimitedmax memory size (kbytes)    unlimitedopen files                  10240 cpu time (seconds)          unlimitedvirtual memory (kbytes)     unlimitedIt is best to establish the values in /etc/security/limitsroot user is needed to observe and modify this file.If you modify a limit, you will need to relog in to change it again.  For example,$ ulimit -c 0$ ulimit -c 2097151cannot modify limit: Operation not permitted$ ulimit -c unlimited$ ulimit -c0There are only two meaningful values for ulimit -c ; zero or unlimited.Anything else is likely to produce a truncated core file that cannot be analyzed. Deploy 32-bit or 64-bit ?Early versions of OBIEE offered 32-bit or 64-bit choice to AIX customers.The 32-bit choice was needed if a database vendor did not supply a 64-bit client library.That's no longer an issue and beginning with OBIEE 11, 32-bit code is no longer shipped.A common error that leads to "out of memory" conditions to to accept the 32-bit memory configuration choices on 64-bit deployments.  The significant configuration choices are: Maximum process data (heap) size is in an AIX environment variableLDR_CNTRL=IGNOREUNLOAD@LOADPUBLIC@PREREAD_SHLIB@MAXDATA=0x... Two thread stack sizes are made in obiee NQSConfig.INI[ SERVER ]SERVER_THREAD_STACK_SIZE = 0;DB_GATEWAY_THREAD_STACK_SIZE = 0; Sort memory in NQSConfig.INI[ GENERAL ]SORT_MEMORY_SIZE = 4 MB ;SORT_BUFFER_INCREMENT_SIZE = 256 KB ; Choosing a value for MAXDATA:0x080000000  2GB Default maximum 32-bit heap size ( 8 with 7 zeros )0x100000000  4GB 64-bit breaking even with 32-bit ( 1 with 8 zeros )0x200000000  8GB 64-bit double 32-bit max0x400000000 16GB 64-bit safetyUsing 2GB heap size for a 64-bit process will almost certainly lead to an out-of-memory situation.Registers are twice as big ... consume twice as much memory in the heap.Upgrading to a 4GB heap for a 64-bit process is just "breaking even" with 32-bit.A 32-bit process is constrained by the 32-bit virtual addressing limits.  Heap memory is used for dynamic requirements of obiee software, thread stacks for each of the configured threads, and sometimes for shared libraries. 64-bit processes are not constrained in this way;  extra heap space can be configured for safety against a query that might create a sudden requirement for excessive storage.  If the storage is not available, this query might crash the whole server and disrupt existing users.There is no performance penalty on AIX for configuring more memory than required;  extra memory can be configured for safety.  If there are no other considerations, start with 8GB.Choosing a value for Thread Stack size:zero is the value documented to select an appropriate default for thread stack size.  My preference is to change this to an absolute value, even if you intend to use the documented default;  it provides better documentation and removes the "surprise" factor.There are two thread types that can be configured. GATEWAY is used by a thread pool to call a database client library to establish a DB connection.The default size is 256KB;  many customers raise this to 512KB ( no performance penalty for over-configuring ). This value must be set to 1 MB if Teradata connections are used. SERVER threads are used to run queries.  OBIEE uses recursive algorithms during the analysis of query structures which can consume significant thread stack storage.  It's difficult to provide guidance on a value that depends on data and complexity.  The general notion is to provide more space than you think you need,  "double down" and increase the value if you run out, otherwise inspect the query to understand why it is too complex for the thread stack.  There are protections built into the software to abort a single user query that is too complex, but the algorithms don't cover all situations.256 KB  The default 32-bit stack size.  Many customers increased this to 512KB on 32-bit.  A 64-bit server is very likely to crash with this value;  the stack contains mostly register values, which are twice as big.512 KB  The documented 64-bit default.  Some early releases of obiee didn't set this correctly, resulting in 256KB stacks.1 MB  The recommended 64-bit setting.  If your system only ever uses 512KB of stack space, there is no performance penalty for using 1MB stack size.2 MB  Many large customers use this value for safety.  No performance penalty.nqscheduler does not use the NQSConfig.INI file to set thread stack size.If this process crashes because the thread stack is too small, use this to set 2MB:export OBI_BACKGROUND_STACK_SIZE=2048 Shared libraries are not (shared) When application libraries are loaded at run-time, AIX makes a decision on whether to load the libraries in a "public" memory segment.  If the filesystem library permissions do not have the "Read-Other" permission bit, AIX loads the library into private process memory with two significant side-effects:* The libraries reduce the heap storage available.      Might be significant in 32-bit processes;  irrelevant in 64-bit processes.* Library code is loaded into multiple real pages for execution;  one copy for each process.Multiple execution images is a significant issue for both 32- and 64-bit processes.The "real memory pages" saved by using public memory segments is a minor concern.  Today's machines typically have plenty of real memory.The real problem with private copies of libraries is that they consume processor cache blocks, which are limited.   The same library instructions executing in different real pages will cause memory delays as the i-cache ( instruction cache 128KB blocks) are refreshed from real memory.   Performance loss because instructions are delayed is something that is difficult to measure without access to low-level cache fault data.   The machine just appears to be running slowly for no observable reason.This is an easy problem to detect, and an easy problem to correct.Detection:  "genld -l" AIX command produces a list of the libraries used by each process and the AIX memory address where they are loaded.32-bit public segment is 13 ( "dxxxxxxx" ).   private segments are 2-a.64-bit public segment is 9 ( "9xxxxxxxxxxxxxxx") ; private segment is 8.genld -l | grep -v ' d| 9' | sort +2provides a list of privately loaded libraries. Repair: chmod o+r <libname>AIX shared libraries will have a suffix of ".so" or ".a".Another technique is to change all libraries in a selected directory to repair those that might not be currently loaded.   The usual directories that need repair are obiee code, httpd code and plugins, database client libraries and java.chmod o+r /shr/dir/*.a /shr/dir/*.so Configure your system for diagnosticsProduction systems shouldn't crash, and yet bad things happen to good software.If obiee software crashes and produces a core, you should configure your system for reliable transfer of the failing conditions to Oracle Tech Support.  Here's what we need to be able to diagnose a core file from your system.* fullcore enabled. chdev -lsys0 -a fullcore=true* core naming enabled. chcore -n on -d* ulimit must not truncate core. see item 3.* pstack.sh is used to capture core documentation.* obidoc is used to capture current AIX configuration.* snapcore  AIX utility captures core and libraries. Use the proper syntax. $ snapcore -r corename executable-fullpath   /tmp/snapcore will contain the .pax.Z output file.  It is compressed.* If cores are directed to a common directory, ensure obiee userid can write to the directory.  ( chcore -p /cores -d ; chmod 777 /cores )The filesystem must have sufficient space to hold a crashing obiee application.Use:  df -k  Check the "Free" column ( not "% Used" )  8388608 is 8GB. Disable Oracle Client Library signal handlingThe Oracle DB Client Library is frequently distributed with the sqlplus development kit.By default, the library enables a signal handler, which will document a call stack if the application crashes.   The signal handler is not needed, and definitely disruptive to obiee diagnostics.   It needs to be disabled.   sqlnet.ora is typically located at:   $ORACLE_HOME/network/admin/sqlnet.oraAdd this line at the top of the file:   DIAG_SIGHANDLER_ENABLED=FALSE Disable async query in the RPD connection pool.This might be an obiee 10.1.3.4 issue only ( still checking  )."async query" must be disabled in the connection pools.It was designed to enable query cancellation to a database, and turned out to have too many edge conditions in normal communication that produced random corruption of data and crashes.  Please ensure it is turned off in the RPD. Check AIX error report (errpt).Errors external to obiee applications can trigger crashes.  $ /bin/errpt -aHardware errors ( firmware, adapters, disks ) should be reported to IBM support.All application core files are recorded by AIX;  the most recent ones are listed first. Reserved for something important to say.

    Read the article

  • The Changing Face of PASS

    - by Bill Graziano
    I’m starting my sixth year on the PASS Board.  I served two years as the Program Director, two years as the Vice-President of Marketing and I’m starting my second year as the Executive Vice-President of Finance.  There’s a pretty good chance that if PASS has done something you don’t like or is doing something you don’t like, that I’m involved in one way or another. Andy Leonard asked in a comment on his blog if the Board had ever reversed itself based on community input.  He asserted that it hadn’t.  I disagree.  I’m not going to try and list all the changes we make inside portfolios based on feedback from and meetings with the community.  I’m going to focus on major governance issues since I was elected to the Board. Management Company The first big change was our management company.  Our old management company had a standard approach to running a non-profit.  It worked well when PASS was launched.  Having a ready-made structure and process to run the organization enabled the organization to grow quickly.  As time went on we were limited in some of the things we wanted to do.  The more involved you were with PASS, the more you saw these limitations.  Key volunteers were regularly providing feedback that they wanted certain changes that were difficult for us to accomplish.  The Board at that time wanted changes that were difficult or impossible to accomplish under that structure. This was not a simple change.  Imagine a $2.5 million dollar company letting all its employees go on a Friday and starting with a new staff on Monday.  We also had a very narrow window to accomplish that so that we wouldn’t affect the Summit – our only source of revenue.  We spent the year after the change rebuilding processes and putting on the Summit in Denver.  That’s a concrete example of a huge change that PASS made to better serve its members.  And it was a change that many in the community were telling us we needed to make. Financials We heard regularly from our members that they wanted our financials posted.  Today on our web site you can find audited financials going back to 2004.  We publish our budget at the start of each year.  If you ask a question about the financials on the PASS site I do my best to answer it.  I’m also trying to do a better job answering financial questions posted in other locations.  (And yes, I know I owe a few of you some blog posts.) That’s another concrete example of a change that our members asked for that the Board agreed was a good decision. Minutes When I started on the Board the meeting minutes were very limited.  The minutes from a two day Board meeting might fit on one page.  I think we did the bare minimum we were legally required to do.  Today Board meeting minutes run from 5 to 12 pages and go into incredible detail on what we talk about.  There are certain topics that are NDA but where possible we try to list the topic we discussed but that the actual discussion was under NDA.  We also publish the agenda of Board meetings ahead of time. This is another specific example where input from the community influenced the decision.  It was certainly easier to have limited minutes but I think the extra effort helps our members understand what’s going on. Board Q&A At the 2009 Summit the Board held its first public Q&A with our members.  We’d always been available individually to answer questions.  There’s a benefit to getting us all in one room and asking the really hard questions to watch us squirm.  We learn what questions we don’t have good answers for.  We get to see how many people in the crowd look interested in the various questions and answers. I don’t recall the genesis of how this came about.  I’m fairly certain there was some community pressure though. Board Votes Until last November, the Board only reported the vote totals and not how individual Board members voted.  That was one of the topics at a great lunch I had with Tim Mitchell and Kendal van Dyke at the Summit.  That was also the topic of the first question asked at the Board Q&A by Kendal.  Kendal expressed his opposition to to anonymous votes clearly and passionately and without trying to paint anyone into a corner.  Less than 24 hours later the PASS Board voted to make individual votes public unless the topic was under NDA.  That’s another area where the Board decided to change based on feedback from our members. Summit Location While this isn’t actually a governance issue it is one of the more public decisions we make that has taken some public criticism.  There is a significant portion of our members that want the Summit near them.  There is a significant portion of our members that like the Summit in Seattle.  There is a significant portion of our members that think it should move around the country.  I was one that felt strongly that there were significant, tangible benefits to our attendees to being in Seattle every year.  I’m also one that has been swayed by some very compelling arguments that we need to have at least one outside Seattle and then revisit the decision.  I can’t tell you how the Board will vote but I know the opinion of our members weighs heavily on the decision. Elections And that brings us to the grand-daddy of all governance issues.  My thesis for this blog post is that the PASS Board has implemented policy changes in response to member feedback.  It isn’t to defend or criticize our election process.  It’s just to say that is has been under going continuous change since I’ve been on the Board.  I ran for the Board in the fall of 2005.  I don’t know much about what happened before then.  I was actively volunteering for PASS for four years prior to that as a chapter leader and on the program committee.  I don’t recall any complaints about elections but that doesn’t mean they didn’t occur.  The questions from the Nominating Committee (NomCom) were trivial and the selection process rudimentary (For example, “Tell us about your accomplishments”).  I don’t even remember who I ran against or how many other people ran.  I ran for the VP of Marketing in the fall of 2007.  I don’t recall any significant changes the Board made in the election process for that election.  I think a lot of the changes in 2007 came from us asking the management company to work on the election process.  I was expecting a similar set of puff ball questions from my previous election.  Boy, was I in for a shock.  The NomCom had found a much better set of questions and really made the interview portion difficult.  The questions were much more behavioral in nature.  I’d already written about my vision for PASS and my goals.  They wanted to know how I handled adversity, how I handled criticism, how I handled conflict, how I handled troublesome volunteers, how I motivated people and how I responded to motivation. And many, many other things. They grilled me for over an hour.  I’ve done a fair bit of technical sales in my time.  I feel I speak well under pressure addressing pointed questions.  This interview intentionally put me under pressure.  In addition to wanting to know about my interpersonal skills, my work experience, my volunteer experience and my supervisory experience they wanted to see how I’d do under pressure.  They wanted to see who would respond under pressure and who wouldn’t.  It was a bit of a shock. That was the first big change I remember in the election process.  I know there were other improvements around the process but none of them stick in my mind quite like the unexpected hour-long grilling. The next big change I remember was after the 2009 elections.  Andy Warren was unhappy with the election process and wanted to make some changes.  He worked with Hannes at HQ and they came up with a better set of processes.  I think Andy moved PASS in the right direction.  Nonetheless, after the 2010 election even more people were very publicly clamoring for changes to our election process.  In August of 2010 we had a choice to make.  There were numerous bloggers criticizing the Board and our upcoming election.  The easy change would be to announce that we were changing the process in a way that would satisfy our critics.  I believe that a knee-jerk response to criticism is seldom correct. Instead the Board spent August and September and October and November listening to the community.  I visited two SQLSaturdays and asked questions of everyone I could.  I attended chapter meetings and asked questions of as many people as they’d let me.  At Summit I made it a point to introduce myself to strangers and ask them about the election.  At every breakfast I’d sit down at a table full of strangers and ask about the election.  I’m happy to say that I left most tables arguing about the election.  Most days I managed to get 2 or 3 breakfasts in. I spent less time talking to people that had already written about the election.  They were already expressing their opinion.  I wanted to talk to people that hadn’t spoken up.  I wanted to know what the silent majority thought.  The Board all attended the Q&A session where our members expressed their concerns about a variety of issues including the election. The PASS Board also chose to create the Election Review Committee.  We wanted people from the community that had been involved with PASS to look at our election process with fresh eyes while listening to what the community had to say and give us some advice on how we could improve the process.  I’m a part of this as is Andy Warren.  None of the other members are on the Board.  I’ve sat in numerous calls and interviews with this group and attended an open meeting at the Summit.  We asked anyone that wanted to discuss the election to come speak with us.  The ERC held an open meeting at the Summit and invited anyone to attend.  There are forums on the ERC web site where we’ve invited people to participate.  The ERC has reached to key people involved in recent elections.  The years that I haven’t mentioned also saw minor improvements in the election process.  Off the top of my head I don’t recall what exact changes were made each year.  Specifically since the 2010 election we’ve gone out of our way to seek input from the community about the process.  I’m not sure what more we could have done to invite feedback from the community. I think to say that we haven’t “fixed” the election process isn’t a fair criticism at this time.  We haven’t rushed any changes through the process.  If you don’t see any changes in our election process in July or August then I think it’s fair to criticize us for ignoring the community or ask for an explanation for what we’ve done. In Summary Andy’s main point was that the PASS Board hasn’t changed in response to our members wishes.  I think I’ve shown that time and time again the PASS Board has changed in response to what our members want.  There are only two outstanding issues: Summit location and elections.  The 2013 Summit location hasn’t been decided yet.  Our work on the elections is also in progress.  And at every step in the election review we’ve gone out of our way to listen to the community and incorporate their feedback on the process. I also hope I’m not encouraging everyone that wants some change in the organization to organize a “blog rush” against the Board.  We take public suggestions very seriously but we also take the time to evaluate those suggestions and learn what the rest of our members think and make a measured decision.

    Read the article

  • Down Tools Week Cometh: Kissing Goodbye to CVs/Resumes and Cover Letters

    - by Bart Read
    I haven't blogged about what I'm doing in my (not so new) temporary role as Red Gate's technical recruiter, mostly because it's been routine, business as usual stuff, and because I've been trying to understand the role by doing it. I think now though the time has come to get a little more radical, so I'm going to tell you why I want to largely eliminate CVs/resumes and cover letters from the application process for some of our technical roles, and why I think that might be a good thing for candidates (and for us). I have a terrible confession to make, or at least it's a terrible confession for a recruiter: I don't really like CV sifting, or reading cover letters, and, unless I've misread the mood around here, neither does anybody else. It's dull, it's time-consuming, and it's somewhat soul destroying because, when all is said and done, you're being paid to be incredibly judgemental about people based on relatively little information. I feel like I've dirtied myself by saying that - I mean, after all, it's a core part of my job - but it sucks, it really does. (And, of course, the truth is I'm still a software engineer at heart, and I'm always looking for ways to do things better.) On the flip side, I've never met anyone who likes writing their CV. It takes hours and hours of faffing around and massaging it into shape, and the whole process is beset by a gnawing anxiety, frustration, and insecurity. All you really want is a chance to demonstrate your skills - not just talk about them - and how do you do that in a CV or cover letter? Often the best candidates will include samples of their work (a portfolio, screenshots, links to websites, product downloads, etc.), but sometimes this isn't possible, or may not be appropriate, or you just don't think you're allowed because of what your school/university careers service has told you (more commonly an issue with grads, obviously). And what are we actually trying to find out about people with all of this? I think the common criteria are actually pretty basic: Smart Gets things done (thanks for these two Joel) Not an a55hole* (sorry, have to get around Simple Talk's swear filter - and thanks to Professor Robert I. Sutton for this one) *Of course, everyone has off days, and I don't honestly think we're too worried about somebody being a bit grumpy every now and again. We can do a bit better than this in the context of the roles I'm talking about: we can be more specific about what "gets things done" means, at least in part. For software engineers and interns, the non-exhaustive meaning of "gets things done" is: Excellent coder For test engineers, the non-exhaustive meaning of "gets things done" is: Good at finding problems in software Competent coder Team player, etc., to me, are covered by "not an a55hole". I don't expect people to be the life and soul of the party, or a wild extrovert - that's not what team player means, and it's not what "not an a55hole" means. Some of our best technical staff are quiet, introverted types, but they're still pleasant to work with. My problem is that I don't think the initial sift really helps us find out whether people are smart and get things done with any great efficacy. It's better than nothing, for sure, but it's not as good as it could be. It's also contentious, and potentially unfair/inequitable - if you want to get an idea of what I mean by this, check out the background information section at the bottom. Before I go any further, let's look at the Red Gate recruitment process for technical staff* as it stands now: (LOTS of) People apply for jobs. All these applications go through a brutal process of manual sifting, which eliminates between 75 and 90% of them, depending upon the role, and the time of year**. Depending upon the role, those who pass the sift will be sent an assessment or telescreened. For the purposes of this blog post I'm only interested in those that are sent some sort of programming assessment, or bug hunt. This means software engineers, test engineers, and software interns, which are the roles for which I receive the most applications. The telescreen tends to be reserved for project or product managers. Those that pass the assessment are invited in for first interview. This interview is mostly about assessing their technical skills***, although we're obviously on the look out for cultural fit red flags as well. If the first interview goes well we'll invite candidates back for a second interview. This is where team/cultural fit is really scoped out. We also use this interview to dive more deeply into certain areas of their skillset, and explore any concerns that may have come out of the first interview (these obviously won't have been serious or obvious enough to cause a rejection at that point, but are things we do need to look into before we'd consider making an offer). We might subsequently invite them in for lunch before we make them an offer. This tends to happen when we're recruiting somebody for a specific team and we'd like them to meet all the people they'll be working with directly. It's not an interview per se, but can prove pivotal if they don't gel with the team. Anyone who's made it this far will receive an offer from us. *We have a slightly quirky definition of "technical staff" as it relates to the technical recruiter role here. It includes software engineers, test engineers, software interns, user experience specialists, technical authors, project managers, product managers, and development managers, but does not include product support or information systems roles. **For example, the quality of graduate applicants overall noticeably drops as the academic year wears on, which is not to say that by now there aren't still stars in there, just that they're fewer and further between. ***Some organisations prefer to assess for team fit first, but I think assessing technical skills is a more effective initial filter - if they're the nicest person in the world, but can't cut a line of code they're not going to work out. Now, as I suggested in the title, Red Gate's Down Tools Week is upon us once again - next week in fact - and I had proposed as a project that we refactor and automate the first stage of marking our programming assessments. Marking assessments, and in fact organising the marking of them, is a somewhat time-consuming process, and we receive many assessment solutions that just don't make the cut, for whatever reason. Whilst I don't think it's possible to fully automate marking, I do think it ought to be possible to run a suite of automated tests over each candidate's solution to see whether or not it behaves correctly and, if it does, move on to a manual stage where we examine the code for structure, decomposition, style, readability, maintainability, etc. Obviously it's possible to use tools to generate potentially helpful metrics for some of these indices as well. This would obviously reduce the marking workload, and would provide candidates with quicker feedback about whether they've been successful - though I do wonder if waiting a tactful interval before sending a (nicely written) rejection might be wise. I duly scrawled out a picture of my ideal process, which looked like this: The problem is, as soon as I'd roughed it out, I realised that fundamentally it wasn't an ideal process at all, which explained the gnawing feeling of cognitive dissonance I'd been wrestling with all week, whilst I'd been trying to find time to do this. Here's what I mean. Automated assessment marking, and the associated infrastructure around that, makes it much easier for us to deal with large numbers of assessments. This means we can be much more permissive about who we send assessments out to or, in other words, we can give more candidates the opportunity to really demonstrate their skills to us. And this leads to a question: why not give everyone the opportunity to demonstrate their skills, to show that they're smart and can get things done? (Two or three of us even discussed this in the down tools week hustings earlier this week.) And isn't this a lot simpler than the alternative we'd been considering? (FYI, this was automated CV/cover letter sifting by some form of textual analysis to ideally eliminate the worst 50% or so of applications based on an analysis of the 20,000 or so historical applications we've received since 2007 - definitely not the basic keyword analysis beloved of recruitment agencies, since this would eliminate hardly anyone who was awful, but definitely would eliminate stellar Oxbridge candidates - #fail - or some nightmarishly complex Google-like system where we profile all our currently employees, only to realise that we're never going to get representative results because we don't have a statistically significant sample size in any given role - also #fail.) No, I think the new way is better. We let people self-select. We make them the masters (or mistresses) of their own destiny. We give applicants the power - we put their fate in their hands - by giving them the chance to demonstrate their skills, which is what they really want anyway, instead of requiring that they spend hours and hours creating a CV and cover letter that I'm going to evaluate for suitability, and make a value judgement about, in approximately 1 minute (give or take). It doesn't matter what university you attended, it doesn't matter if you had a bad year when you took your A-levels - here's your chance to shine, so take it and run with it. (As a side benefit, we cut the number of applications we have to sift by something like two thirds.) WIN! OK, yeah, sounds good, but will it actually work? That's an excellent question. My gut feeling is yes, and I'll justify why below (and hopefully have gone some way towards doing that above as well), but what I'm proposing here is really that we run an experiment for a period of time - probably a couple of months or so - and measure the outcomes we see: How many people apply? (Wouldn't be surprised or alarmed to see this cut by a factor of ten.) How many of them submit a good assessment? (More/less than at present?) How much overhead is there for us in dealing with these assessments compared to now? What are the success and failure rates at each interview stage compared to now? How many people are we hiring at the end of it compared to now? I think it'll work because I hypothesize that, amongst other things: It self-selects for people who really want to work at Red Gate which, at the moment, is something I have to try and assess based on their CV and cover letter - but if you're not that bothered about working here, why would you complete the assessment? Candidates who would submit a shoddy application probably won't feel motivated to do the assessment. Candidates who would demonstrate good attention to detail in their CV/cover letter will demonstrate good attention to detail in the assessment. In general, only the better candidates will complete and submit the assessment. Marking assessments is much less work so we'll be able to deal with any increase that we see (hopefully we will see). There are obviously other questions as well: Is plagiarism going to be a problem? Is there any way we can detect/discourage potential plagiarism? How do we assess candidates' education and experience? What about their ability to communicate in writing? Do we still want them to submit a CV afterwards if they pass assessment? Do we want to offer them the opportunity to tell us a bit about why they'd like the job when they submit their assessment? How does this affect our relationship with recruitment agencies we might use to hire for these roles? So, what's the objective for next week's Down Tools Week? Pretty simple really - we want to implement this process for the Graduate Software Engineer and Software Engineer positions that you can find on our website. I will be joined by a crack team of our best developers (Kevin Boyle, and new Red-Gater, Sam Blackburn), and recruiting hostess with the mostest Laura McQuillen, and hopefully a couple of others as well - if I can successfully twist more arms before Monday.* Hopefully by next Friday our experiment will be up and running, and we may have changed the way Red Gate recruits software engineers for good! Stay tuned and we'll let you know how it goes! *I'm going to play dirty by offering them beer and chocolate during meetings. Some background information: how agonising over the initial CV/cover letter sift helped lead us to bin it off entirely The other day I was agonising about the new university/good degree grade versus poor A-level results issue, and decided to canvas for other opinions to see if there was something I could do that was fairer than my current approach, which is almost always to reject. This generated quite an involved discussion on our Yammer site: I'm sure you can glean a pretty good impression of my own educational prejudices from that discussion as well, although I'm very open to changing my opinion - hopefully you've already figured that out from reading the rest of this post. Hopefully you can also trace a logical path from agonising about sifting to, "Uh, hang on, why on earth are we doing this anyway?!?" Technorati Tags: recruitment,hr,developers,testers,red gate,cv,resume,cover letter,assessment,sea change

    Read the article

  • C#/.NET Little Wonders: Interlocked CompareExchange()

    - by James Michael Hare
    Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. Two posts ago, I discussed the Interlocked Add(), Increment(), and Decrement() methods (here) for adding and subtracting values in a thread-safe, lightweight manner.  Then, last post I talked about the Interlocked Read() and Exchange() methods (here) for safely and efficiently reading and setting 32 or 64 bit values (or references).  This week, we’ll round out the discussion by talking about the Interlocked CompareExchange() method and how it can be put to use to exchange a value if the current value is what you expected it to be. Dirty reads can lead to bad results Many of the uses of Interlocked that we’ve explored so far have centered around either reading, setting, or adding values.  But what happens if you want to do something more complex such as setting a value based on the previous value in some manner? Perhaps you were creating an application that reads a current balance, applies a deposit, and then saves the new modified balance, where of course you’d want that to happen atomically.  If you read the balance, then go to save the new balance and between that time the previous balance has already changed, you’ll have an issue!  Think about it, if we read the current balance as $400, and we are applying a new deposit of $50.75, but meanwhile someone else deposits $200 and sets the total to $600, but then we write a total of $450.75 we’ve lost $200! Now, certainly for int and long values we can use Interlocked.Add() to handles these cases, and it works well for that.  But what if we want to work with doubles, for example?  Let’s say we wanted to add the numbers from 0 to 99,999 in parallel.  We could do this by spawning several parallel tasks to continuously add to a total: 1: double total = 0; 2:  3: Parallel.For(0, 10000, next => 4: { 5: total += next; 6: }); Were this run on one thread using a standard for loop, we’d expect an answer of 4,999,950,000 (the sum of all numbers from 0 to 99,999).  But when we run this in parallel as written above, we’ll likely get something far off.  The result of one of my runs, for example, was 1,281,880,740.  That is way off!  If this were banking software we’d be in big trouble with our clients.  So what happened?  The += operator is not atomic, it will read in the current value, add the result, then store it back into the total.  At any point in all of this another thread could read a “dirty” current total and accidentally “skip” our add.   So, to clean this up, we could use a lock to guarantee concurrency: 1: double total = 0.0; 2: object locker = new object(); 3:  4: Parallel.For(0, count, next => 5: { 6: lock (locker) 7: { 8: total += next; 9: } 10: }); Which will give us the correct result of 4,999,950,000.  One thing to note is that locking can be heavy, especially if the operation being locked over is trivial, or the life of the lock is a high percentage of the work being performed concurrently.  In the case above, the lock consumes pretty much all of the time of each parallel task – and the task being locked on is relatively trivial. Now, let me put in a disclaimer here before we go further: For most uses, lock is more than sufficient for your needs, and is often the simplest solution!    So, if lock is sufficient for most needs, why would we ever consider another solution?  The problem with locking is that it can suspend execution of your thread while it waits for the signal that the lock is free.  Moreover, if the operation being locked over is trivial, the lock can add a very high level of overhead.  This is why things like Interlocked.Increment() perform so well, instead of locking just to perform an increment, we perform the increment with an atomic, lockless method. As with all things performance related, it’s important to profile before jumping to the conclusion that you should optimize everything in your path.  If your profiling shows that locking is causing a high level of waiting in your application, then it’s time to consider lighter alternatives such as Interlocked. CompareExchange() – Exchange existing value if equal some value So let’s look at how we could use CompareExchange() to solve our problem above.  The general syntax of CompareExchange() is: T CompareExchange<T>(ref T location, T newValue, T expectedValue) If the value in location == expectedValue, then newValue is exchanged.  Either way, the value in location (before exchange) is returned. Actually, CompareExchange() is not one method, but a family of overloaded methods that can take int, long, float, double, pointers, or references.  It cannot take other value types (that is, can’t CompareExchange() two DateTime instances directly).  Also keep in mind that the version that takes any reference type (the generic overload) only checks for reference equality, it does not call any overridden Equals(). So how does this help us?  Well, we can grab the current total, and exchange the new value if total hasn’t changed.  This would look like this: 1: // grab the snapshot 2: double current = total; 3:  4: // if the total hasn’t changed since I grabbed the snapshot, then 5: // set it to the new total 6: Interlocked.CompareExchange(ref total, current + next, current); So what the code above says is: if the amount in total (1st arg) is the same as the amount in current (3rd arg), then set total to current + next (2nd arg).  This check and exchange pair is atomic (and thus thread-safe). This works if total is the same as our snapshot in current, but the problem, is what happens if they aren’t the same?  Well, we know that in either case we will get the previous value of total (before the exchange), back as a result.  Thus, we can test this against our snapshot to see if it was the value we expected: 1: // if the value returned is != current, then our snapshot must be out of date 2: // which means we didn't (and shouldn't) apply current + next 3: if (Interlocked.CompareExchange(ref total, current + next, current) != current) 4: { 5: // ooops, total was not equal to our snapshot in current, what should we do??? 6: } So what do we do if we fail?  That’s up to you and the problem you are trying to solve.  It’s possible you would decide to abort the whole transaction, or perhaps do a lightweight spin and try again.  Let’s try that: 1: double current = total; 2:  3: // make first attempt... 4: if (Interlocked.CompareExchange(ref total, current + i, current) != current) 5: { 6: // if we fail, go into a spin wait, spin, and try again until succeed 7: var spinner = new SpinWait(); 8:  9: do 10: { 11: spinner.SpinOnce(); 12: current = total; 13: } 14: while (Interlocked.CompareExchange(ref total, current + i, current) != current); 15: } 16:  This is not trivial code, but it illustrates a possible use of CompareExchange().  What we are doing is first checking to see if we succeed on the first try, and if so great!  If not, we create a SpinWait and then repeat the process of SpinOnce(), grab a fresh snapshot, and repeat until CompareExchnage() succeeds.  You may wonder why not a simple do-while here, and the reason it’s more efficient to only create the SpinWait until we absolutely know we need one, for optimal efficiency. Though not as simple (or maintainable) as a simple lock, this will perform better in many situations.  Comparing an unlocked (and wrong) version, a version using lock, and the Interlocked of the code, we get the following average times for multiple iterations of adding the sum of 100,000 numbers: 1: Unlocked money average time: 2.1 ms 2: Locked money average time: 5.1 ms 3: Interlocked money average time: 3 ms So the Interlocked.CompareExchange(), while heavier to code, came in lighter than the lock, offering a good compromise of safety and performance when we need to reduce contention. CompareExchange() - it’s not just for adding stuff… So that was one simple use of CompareExchange() in the context of adding double values -- which meant we couldn’t have used the simpler Interlocked.Add() -- but it has other uses as well. If you think about it, this really works anytime you want to create something new based on a current value without using a full lock.  For example, you could use it to create a simple lazy instantiation implementation.  In this case, we want to set the lazy instance only if the previous value was null: 1: public static class Lazy<T> where T : class, new() 2: { 3: private static T _instance; 4:  5: public static T Instance 6: { 7: get 8: { 9: // if current is null, we need to create new instance 10: if (_instance == null) 11: { 12: // attempt create, it will only set if previous was null 13: Interlocked.CompareExchange(ref _instance, new T(), (T)null); 14: } 15:  16: return _instance; 17: } 18: } 19: } So, if _instance == null, this will create a new T() and attempt to exchange it with _instance.  If _instance is not null, then it does nothing and we discard the new T() we created. This is a way to create lazy instances of a type where we are more concerned about locking overhead than creating an accidental duplicate which is not used.  In fact, the BCL implementation of Lazy<T> offers a similar thread-safety choice for Publication thread safety, where it will not guarantee only one instance was created, but it will guarantee that all readers get the same instance.  Another possible use would be in concurrent collections.  Let’s say, for example, that you are creating your own brand new super stack that uses a linked list paradigm and is “lock free”.  We could use Interlocked.CompareExchange() to be able to do a lockless Push() which could be more efficient in multi-threaded applications where several threads are pushing and popping on the stack concurrently. Yes, there are already concurrent collections in the BCL (in .NET 4.0 as part of the TPL), but it’s a fun exercise!  So let’s assume we have a node like this: 1: public sealed class Node<T> 2: { 3: // the data for this node 4: public T Data { get; set; } 5:  6: // the link to the next instance 7: internal Node<T> Next { get; set; } 8: } Then, perhaps, our stack’s Push() operation might look something like: 1: public sealed class SuperStack<T> 2: { 3: private volatile T _head; 4:  5: public void Push(T value) 6: { 7: var newNode = new Node<int> { Data = value, Next = _head }; 8:  9: if (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next) 10: { 11: var spinner = new SpinWait(); 12:  13: do 14: { 15: spinner.SpinOnce(); 16: newNode.Next = _head; 17: } 18: while (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next); 19: } 20: } 21:  22: // ... 23: } Notice a similar paradigm here as with adding our doubles before.  What we are doing is creating the new Node with the data to push, and with a Next value being the original node referenced by _head.  This will create our stack behavior (LIFO – Last In, First Out).  Now, we have to set _head to now refer to the newNode, but we must first make sure it hasn’t changed! So we check to see if _head has the same value we saved in our snapshot as newNode.Next, and if so, we set _head to newNode.  This is all done atomically, and the result is _head’s original value, as long as the original value was what we assumed it was with newNode.Next, then we are good and we set it without a lock!  If not, we SpinWait and try again. Once again, this is much lighter than locking in highly parallelized code with lots of contention.  If I compare the method above with a similar class using lock, I get the following results for pushing 100,000 items: 1: Locked SuperStack average time: 6 ms 2: Interlocked SuperStack average time: 4.5 ms So, once again, we can get more efficient than a lock, though there is the cost of added code complexity.  Fortunately for you, most of the concurrent collection you’d ever need are already created for you in the System.Collections.Concurrent (here) namespace – for more information, see my Little Wonders – The Concurent Collections Part 1 (here), Part 2 (here), and Part 3 (here). Summary We’ve seen before how the Interlocked class can be used to safely and efficiently add, increment, decrement, read, and exchange values in a multi-threaded environment.  In addition to these, Interlocked CompareExchange() can be used to perform more complex logic without the need of a lock when lock contention is a concern. The added efficiency, though, comes at the cost of more complex code.  As such, the standard lock is often sufficient for most thread-safety needs.  But if profiling indicates you spend a lot of time waiting for locks, or if you just need a lock for something simple such as an increment, decrement, read, exchange, etc., then consider using the Interlocked class’s methods to reduce wait. Technorati Tags: C#,CSharp,.NET,Little Wonders,Interlocked,CompareExchange,threading,concurrency

    Read the article

  • How to Load Oracle Tables From Hadoop Tutorial (Part 5 - Leveraging Parallelism in OSCH)

    - by Bob Hanckel
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Using OSCH: Beyond Hello World In the previous post we discussed a “Hello World” example for OSCH focusing on the mechanics of getting a toy end-to-end example working. In this post we are going to talk about how to make it work for big data loads. We will explain how to optimize an OSCH external table for load, paying particular attention to Oracle’s DOP (degree of parallelism), the number of external table location files we use, and the number of HDFS files that make up the payload. We will provide some rules that serve as best practices when using OSCH. The assumption is that you have read the previous post and have some end to end OSCH external tables working and now you want to ramp up the size of the loads. Using OSCH External Tables for Access and Loading OSCH external tables are no different from any other Oracle external tables.  They can be used to access HDFS content using Oracle SQL: SELECT * FROM my_hdfs_external_table; or use the same SQL access to load a table in Oracle. INSERT INTO my_oracle_table SELECT * FROM my_hdfs_external_table; To speed up the load time, you will want to control the degree of parallelism (i.e. DOP) and add two SQL hints. ALTER SESSION FORCE PARALLEL DML PARALLEL  8; ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8; INSERT /*+ append pq_distribute(my_oracle_table, none) */ INTO my_oracle_table SELECT * FROM my_hdfs_external_table; There are various ways of either hinting at what level of DOP you want to use.  The ALTER SESSION statements above force the issue assuming you (the user of the session) are allowed to assert the DOP (more on that in the next section).  Alternatively you could embed additional parallel hints directly into the INSERT and SELECT clause respectively. /*+ parallel(my_oracle_table,8) *//*+ parallel(my_hdfs_external_table,8) */ Note that the "append" hint lets you load a target table by reserving space above a given "high watermark" in storage and uses Direct Path load.  In other doesn't try to fill blocks that are already allocated and partially filled. It uses unallocated blocks.  It is an optimized way of loading a table without incurring the typical resource overhead associated with run-of-the-mill inserts.  The "pq_distribute" hint in this context unifies the INSERT and SELECT operators to make data flow during a load more efficient. Finally your target Oracle table should be defined with "NOLOGGING" and "PARALLEL" attributes.   The combination of the "NOLOGGING" and use of the "append" hint disables REDO logging, and its overhead.  The "PARALLEL" clause tells Oracle to try to use parallel execution when operating on the target table. Determine Your DOP It might feel natural to build your datasets in Hadoop, then afterwards figure out how to tune the OSCH external table definition, but you should start backwards. You should focus on Oracle database, specifically the DOP you want to use when loading (or accessing) HDFS content using external tables. The DOP in Oracle controls how many PQ slaves are launched in parallel when executing an external table. Typically the DOP is something you want to Oracle to control transparently, but for loading content from Hadoop with OSCH, it's something that you will want to control. Oracle computes the maximum DOP that can be used by an Oracle user. The maximum value that can be assigned is an integer value typically equal to the number of CPUs on your Oracle instances, times the number of cores per CPU, times the number of Oracle instances. For example, suppose you have a RAC environment with 2 Oracle instances. And suppose that each system has 2 CPUs with 32 cores. The maximum DOP would be 128 (i.e. 2*2*32). In point of fact if you are running on a production system, the maximum DOP you are allowed to use will be restricted by the Oracle DBA. This is because using a system maximum DOP can subsume all system resources on Oracle and starve anything else that is executing. Obviously on a production system where resources need to be shared 24x7, this can’t be allowed to happen. The use cases for being able to run OSCH with a maximum DOP are when you have exclusive access to all the resources on an Oracle system. This can be in situations when your are first seeding tables in a new Oracle database, or there is a time where normal activity in the production database can be safely taken off-line for a few hours to free up resources for a big incremental load. Using OSCH on high end machines (specifically Oracle Exadata and Oracle BDA cabled with Infiniband), this mode of operation can load up to 15TB per hour. The bottom line is that you should first figure out what DOP you will be allowed to run with by talking to the DBAs who manage the production system. You then use that number to derive the number of location files, and (optionally) the number of HDFS data files that you want to generate, assuming that is flexible. Rule 1: Find out the maximum DOP you will be allowed to use with OSCH on the target Oracle system Determining the Number of Location Files Let’s assume that the DBA told you that your maximum DOP was 8. You want the number of location files in your external table to be big enough to utilize all 8 PQ slaves, and you want them to represent equally balanced workloads. Remember location files in OSCH are metadata lists of HDFS files and are created using OSCH’s External Table tool. They also represent the workload size given to an individual Oracle PQ slave (i.e. a PQ slave is given one location file to process at a time, and only it will process the contents of the location file.) Rule 2: The size of the workload of a single location file (and the PQ slave that processes it) is the sum of the content size of the HDFS files it lists For example, if a location file lists 5 HDFS files which are each 100GB in size, the workload size for that location file is 500GB. The number of location files that you generate is something you control by providing a number as input to OSCH’s External Table tool. Rule 3: The number of location files chosen should be a small multiple of the DOP Each location file represents one workload for one PQ slave. So the goal is to keep all slaves busy and try to give them equivalent workloads. Obviously if you run with a DOP of 8 but have 5 location files, only five PQ slaves will have something to do and the other three will have nothing to do and will quietly exit. If you run with 9 location files, then the PQ slaves will pick up the first 8 location files, and assuming they have equal work loads, will finish up about the same time. But the first PQ slave to finish its job will then be rescheduled to process the ninth location file, potentially doubling the end to end processing time. So for this DOP using 8, 16, or 32 location files would be a good idea. Determining the Number of HDFS Files Let’s start with the next rule and then explain it: Rule 4: The number of HDFS files should try to be a multiple of the number of location files and try to be relatively the same size In our running example, the DOP is 8. This means that the number of location files should be a small multiple of 8. Remember that each location file represents a list of unique HDFS files to load, and that the sum of the files listed in each location file is a workload for one Oracle PQ slave. The OSCH External Table tool will look in an HDFS directory for a set of HDFS files to load.  It will generate N number of location files (where N is the value you gave to the tool). It will then try to divvy up the HDFS files and do its best to make sure the workload across location files is as balanced as possible. (The tool uses a greedy algorithm that grabs the biggest HDFS file and delegates it to a particular location file. It then looks for the next biggest file and puts in some other location file, and so on). The tools ability to balance is reduced if HDFS file sizes are grossly out of balance or are too few. For example suppose my DOP is 8 and the number of location files is 8. Suppose I have only 8 HDFS files, where one file is 900GB and the others are 100GB. When the tool tries to balance the load it will be forced to put the singleton 900GB into one location file, and put each of the 100GB files in the 7 remaining location files. The load balance skew is 9 to 1. One PQ slave will be working overtime, while the slacker PQ slaves are off enjoying happy hour. If however the total payload (1600 GB) were broken up into smaller HDFS files, the OSCH External Table tool would have an easier time generating a list where each workload for each location file is relatively the same.  Applying Rule 4 above to our DOP of 8, we could divide the workload into160 files that were approximately 10 GB in size.  For this scenario the OSCH External Table tool would populate each location file with 20 HDFS file references, and all location files would have similar workloads (approximately 200GB per location file.) As a rule, when the OSCH External Table tool has to deal with more and smaller files it will be able to create more balanced loads. How small should HDFS files get? Not so small that the HDFS open and close file overhead starts having a substantial impact. For our performance test system (Exadata/BDA with Infiniband), I compared three OSCH loads of 1 TiB. One load had 128 HDFS files living in 64 location files where each HDFS file was about 8GB. I then did the same load with 12800 files where each HDFS file was about 80MB size. The end to end load time was virtually the same. However when I got ridiculously small (i.e. 128000 files at about 8MB per file), it started to make an impact and slow down the load time. What happens if you break rules 3 or 4 above? Nothing draconian, everything will still function. You just won’t be taking full advantage of the generous DOP that was allocated to you by your friendly DBA. The key point of the rules articulated above is this: if you know that HDFS content is ultimately going to be loaded into Oracle using OSCH, it makes sense to chop them up into the right number of files roughly the same size, derived from the DOP that you expect to use for loading. Next Steps So far we have talked about OLH and OSCH as alternative models for loading. That’s not quite the whole story. They can be used together in a way that provides for more efficient OSCH loads and allows one to be more flexible about scheduling on a Hadoop cluster and an Oracle Database to perform load operations. The next lesson will talk about Oracle Data Pump files generated by OLH, and loaded using OSCH. It will also outline the pros and cons of using various load methods.  This will be followed up with a final tutorial lesson focusing on how to optimize OLH and OSCH for use on Oracle's engineered systems: specifically Exadata and the BDA. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;}

    Read the article

  • Parsing Concerns

    - by Jesse
    If you’ve ever written an application that accepts date and/or time inputs from an external source (a person, an uploaded file, posted XML, etc.) then you’ve no doubt had to deal with parsing some text representing a date into a data structure that a computer can understand. Similarly, you’ve probably also had to take values from those same data structure and turn them back into their original formats. Most (all?) suitably modern development platforms expose some kind of parsing and formatting functionality for turning text into dates and vice versa. In .NET, the DateTime data structure exposes ‘Parse’ and ‘ToString’ methods for this purpose. This post will focus mostly on parsing, though most of the examples and suggestions below can also be applied to the ToString method. The DateTime.Parse method is pretty permissive in the values that it will accept (though apparently not as permissive as some other languages) which makes it pretty easy to take some text provided by a user and turn it into a proper DateTime instance. Here are some examples (note that the resulting DateTime values are shown using the RFC1123 format): DateTime.Parse("3/12/2010"); //Fri, 12 Mar 2010 00:00:00 GMT DateTime.Parse("2:00 AM"); //Sat, 01 Jan 2011 02:00:00 GMT (took today's date as date portion) DateTime.Parse("5-15/2010"); //Sat, 15 May 2010 00:00:00 GMT DateTime.Parse("7/8"); //Fri, 08 Jul 2011 00:00:00 GMT DateTime.Parse("Thursday, July 1, 2010"); //Thu, 01 Jul 2010 00:00:00 GMT Dealing With Inaccuracy While the DateTime struct has the ability to store a date and time value accurate down to the millisecond, most date strings provided by a user are not going to specify values with that much precision. In each of the above examples, the Parse method was provided a partial value from which to construct a proper DateTime. This means it had to go ahead and assume what you meant and fill in the missing parts of the date and time for you. This is a good thing, especially when we’re talking about taking input from a user. We can’t expect that every person using our software to provide a year, day, month, hour, minute, second, and millisecond every time they need to express a date. That said, it’s important for developers to understand what assumptions the software might be making and plan accordingly. I think the assumptions that were made in each of the above examples were pretty reasonable, though if we dig into this method a little bit deeper we’ll find that there are a lot more assumptions being made under the covers than you might have previously known. One of the biggest assumptions that the DateTime.Parse method has to make relates to the format of the date represented by the provided string. Let’s consider this example input string: ‘10-02-15’. To some people. that might look like ‘15-Feb-2010’. To others, it might be ‘02-Oct-2015’. Like many things, it depends on where you’re from. This Is America! Most cultures around the world have adopted a “little-endian” or “big-endian” formats. (Source: Date And Time Notation By Country) In this context,  a “little-endian” date format would list the date parts with the least significant first while the “big-endian” date format would list them with the most significant first. For example, a “little-endian” date would be “day-month-year” and “big-endian” would be “year-month-day”. It’s worth nothing here that ISO 8601 defines a “big-endian” format as the international standard. While I personally prefer “big-endian” style date formats, I think both styles make sense in that they follow some logical standard with respect to ordering the date parts by their significance. Here in the United States, however, we buck that trend by using what is, in comparison, a completely nonsensical format of “month/day/year”. Almost no other country in the world uses this format. I’ve been fortunate in my life to have done some international travel, so I’ve been aware of this difference for many years, but never really thought much about it. Until recently, I had been developing software for exclusively US-based audiences and remained blissfully ignorant of the different date formats employed by other countries around the world. The web application I work on is being rolled out to users in different countries, so I was recently tasked with updating it to support different date formats. As it turns out, .NET has a great mechanism for dealing with different date formats right out of the box. Supporting date formats for different cultures is actually pretty easy once you understand this mechanism. Pulling the Curtain Back On the Parse Method Have you ever taken a look at the different flavors (read: overloads) that the DateTime.Parse method comes in? In it’s simplest form, it takes a single string parameter and returns the corresponding DateTime value (if it can divine what the date value should be). You can optionally provide two additional parameters to this method: an ‘System.IFormatProvider’ and a ‘System.Globalization.DateTimeStyles’. Both of these optional parameters have some bearing on the assumptions that get made while parsing a date, but for the purposes of this article I’m going to focus on the ‘System.IFormatProvider’ parameter. The IFormatProvider exposes a single method called ‘GetFormat’ that returns an object to be used for determining the proper format for displaying and parsing things like numbers and dates. This interface plays a big role in the globalization capabilities that are built into the .NET Framework. The cornerstone of these globalization capabilities can be found in the ‘System.Globalization.CultureInfo’ class. To put it simply, the CultureInfo class is used to encapsulate information related to things like language, writing system, and date formats for a certain culture. Support for many cultures are “baked in” to the .NET Framework and there is capacity for defining custom cultures if needed (thought I’ve never delved into that). While the details of the CultureInfo class are beyond the scope of this post, so for now let me just point out that the CultureInfo class implements the IFormatInfo interface. This means that a CultureInfo instance created for a given culture can be provided to the DateTime.Parse method in order to tell it what date formats it should expect. So what happens when you don’t provide this value? Let’s crack this method open in Reflector: When no IFormatInfo parameter is provided (i.e. we use the simple DateTime.Parse(string) overload), the ‘DateTimeFormatInfo.CurrentInfo’ is used instead. Drilling down a bit further we can see the implementation of the DateTimeFormatInfo.CurrentInfo property: From this property we can determine that, in the absence of an IFormatProvider being specified, the DateTime.Parse method will assume that the provided date should be treated as if it were in the format defined by the CultureInfo object that is attached to the current thread. The culture specified by the CultureInfo instance on the current thread can vary depending on several factors, but if you’re writing an application where a single instance might be used by people from different cultures (i.e. a web application with an international user base), it’s important to know what this value is. Having a solid strategy for setting the current thread’s culture for each incoming request in an internationally used ASP .NET application is obviously important, and might make a good topic for a future post. For now, let’s think about what the implications of not having the correct culture set on the current thread. Let’s say you’re running an ASP .NET application on a server in the United States. The server was setup by English speakers in the United States, so it’s configured for US English. It exposes a web page where users can enter order data, one piece of which is an anticipated order delivery date. Most users are in the US, and therefore enter dates in a ‘month/day/year’ format. The application is using the DateTime.Parse(string) method to turn the values provided by the user into actual DateTime instances that can be stored in the database. This all works fine, because your users and your server both think of dates in the same way. Now you need to support some users in South America, where a ‘day/month/year’ format is used. The best case scenario at this point is a user will enter March 13, 2011 as ‘25/03/2011’. This would cause the call to DateTime.Parse to blow up since that value doesn’t look like a valid date in the US English culture (Note: In all likelihood you might be using the DateTime.TryParse(string) method here instead, but that method behaves the same way with regard to date formats). “But wait a minute”, you might be saying to yourself, “I thought you said that this was the best case scenario?” This scenario would prevent users from entering orders in the system, which is bad, but it could be worse! What if the order needs to be delivered a day earlier than that, on March 12, 2011? Now the user enters ‘12/03/2011’. Now the call to DateTime.Parse sees what it thinks is a valid date, but there’s just one problem: it’s not the right date. Now this order won’t get delivered until December 3, 2011. In my opinion, that kind of data corruption is a much bigger problem than having the Parse call fail. What To Do? My order entry example is a bit contrived, but I think it serves to illustrate the potential issues with accepting date input from users. There are some approaches you can take to make this easier on you and your users: Eliminate ambiguity by using a graphical date input control. I’m personally a fan of a jQuery UI Datepicker widget. It’s pretty easy to setup, can be themed to match the look and feel of your site, and has support for multiple languages and cultures. Be sure you have a way to track the culture preference of each user in your system. For a web application this could be done using something like a cookie or session state variable. Ensure that the current user’s culture is being applied correctly to DateTime formatting and parsing code. This can be accomplished by ensuring that each request has the handling thread’s CultureInfo set properly, or by using the Format and Parse method overloads that accept an IFormatProvider instance where the provided value is a CultureInfo object constructed using the current user’s culture preference. When in doubt, favor formats that are internationally recognizable. Using the string ‘2010-03-05’ is likely to be recognized as March, 5 2011 by users from most (if not all) cultures. Favor standard date format strings over custom ones. So far we’ve only talked about turning a string into a DateTime, but most of the same “gotchas” apply when doing the opposite. Consider this code: someDateValue.ToString("MM/dd/yyyy"); This will output the same string regardless of what the current thread’s culture is set to (with the exception of some cultures that don’t use the Gregorian calendar system, but that’s another issue all together). For displaying dates to users, it would be better to do this: someDateValue.ToString("d"); This standard format string of “d” will use the “short date format” as defined by the culture attached to the current thread (or provided in the IFormatProvider instance in the proper method overload). This means that it will honor the proper month/day/year, year/month/day, or day/month/year format for the culture. Knowing Your Audience The examples and suggestions shown above can go a long way toward getting an application in shape for dealing with date inputs from users in multiple cultures. There are some instances, however, where taking approaches like these would not be appropriate. In some cases, the provider or consumer of date values that pass through your application are not people, but other applications (or other portions of your own application). For example, if your site has a page that accepts a date as a query string parameter, you’ll probably want to format that date using invariant date format. Otherwise, the same URL could end up evaluating to a different page depending on the user that is viewing it. In addition, if your application exports data for consumption by other systems, it’s best to have an agreed upon format that all systems can use and that will not vary depending upon whether or not the users of the systems on either side prefer a month/day/year or day/month/year format. I’ll look more at some approaches for dealing with these situations in a future post. If you take away one thing from this post, make it an understanding of the importance of knowing where the dates that pass through your system come from and are going to. You will likely want to vary your parsing and formatting approach depending on your audience.

    Read the article

  • Microsoft and jQuery

    - by Rick Strahl
    The jQuery JavaScript library has been steadily getting more popular and with recent developments from Microsoft, jQuery is also getting ever more exposure on the ASP.NET platform including now directly from Microsoft. jQuery is a light weight, open source DOM manipulation library for JavaScript that has changed how many developers think about JavaScript. You can download it and find more information on jQuery on www.jquery.com. For me jQuery has had a huge impact on how I develop Web applications and was probably the main reason I went from dreading to do JavaScript development to actually looking forward to implementing client side JavaScript functionality. It has also had a profound impact on my JavaScript skill level for me by seeing how the library accomplishes things (and often reviewing the terse but excellent source code). jQuery made an uncomfortable development platform (JavaScript + DOM) a joy to work on. Although jQuery is by no means the only JavaScript library out there, its ease of use, small size, huge community of plug-ins and pure usefulness has made it easily the most popular JavaScript library available today. As a long time jQuery user, I’ve been excited to see the developments from Microsoft that are bringing jQuery to more ASP.NET developers and providing more integration with jQuery for ASP.NET’s core features rather than relying on the ASP.NET AJAX library. Microsoft and jQuery – making Friends jQuery is an open source project but in the last couple of years Microsoft has really thrown its weight behind supporting this open source library as a supported component on the Microsoft platform. When I say supported I literally mean supported: Microsoft now offers actual tech support for jQuery as part of their Product Support Services (PSS) as jQuery integration has become part of several of the ASP.NET toolkits and ships in several of the default Web project templates in Visual Studio 2010. The ASP.NET MVC 3 framework (still in Beta) also uses jQuery for a variety of client side support features including client side validation and we can look forward toward more integration of client side functionality via jQuery in both MVC and WebForms in the future. In other words jQuery is becoming an optional but included component of the ASP.NET platform. PSS support means that support staff will answer jQuery related support questions as part of any support incidents related to ASP.NET which provides some piece of mind to some corporate development shops that require end to end support from Microsoft. In addition to including jQuery and supporting it, Microsoft has also been getting involved in providing development resources for extending jQuery’s functionality via plug-ins. Microsoft’s last version of the Microsoft Ajax Library – which is the successor to the native ASP.NET AJAX Library – included some really cool functionality for client templates, databinding and localization. As it turns out Microsoft has rebuilt most of that functionality using jQuery as the base API and provided jQuery plug-ins of these components. Very recently these three plug-ins were submitted and have been approved for inclusion in the official jQuery plug-in repository and been taken over by the jQuery team for further improvements and maintenance. Even more surprising: The jQuery-templates component has actually been approved for inclusion in the next major update of the jQuery core in jQuery V1.5, which means it will become a native feature that doesn’t require additional script files to be loaded. Imagine this – an open source contribution from Microsoft that has been accepted into a major open source project for a core feature improvement. Microsoft has come a long way indeed! What the Microsoft Involvement with jQuery means to you For Microsoft jQuery support is a strategic decision that affects their direction in client side development, but nothing stopped you from using jQuery in your applications prior to Microsoft’s official backing and in fact a large chunk of developers did so readily prior to Microsoft’s announcement. Official support from Microsoft brings a few benefits to developers however. jQuery support in Visual Studio 2010 means built-in support for jQuery IntelliSense, automatically added jQuery scripts in many projects types and a common base for client side functionality that actually uses what most developers are already using. If you have already been using jQuery and were worried about straying from the Microsoft line and their internal Microsoft Ajax Library – worry no more. With official support and the change in direction towards jQuery Microsoft is now following along what most in the ASP.NET community had already been doing by using jQuery, which is likely the reason for Microsoft’s shift in direction in the first place. ASP.NET AJAX and the Microsoft AJAX Library weren’t bad technology – there was tons of useful functionality buried in these libraries. However, these libraries never got off the ground, mainly because early incarnations were squarely aimed at control/component developers rather than application developers. For all the functionality that these controls provided for control developers they lacked in useful and easily usable application developer functionality that was easily accessible in day to day client side development. The result was that even though Microsoft shipped support for these tools in the box (in .NET 3.5 and 4.0), other than for the internal support in ASP.NET for things like the UpdatePanel and the ASP.NET AJAX Control Toolkit as well as some third party vendors, the Microsoft client libraries were largely ignored by the developer community opening the door for other client side solutions. Microsoft seems to be acknowledging developer choice in this case: Many more developers were going down the jQuery path rather than using the Microsoft built libraries and there seems to be little sense in continuing development of a technology that largely goes unused by the majority of developers. Kudos for Microsoft for recognizing this and gracefully changing directions. Note that even though there will be no further development in the Microsoft client libraries they will continue to be supported so if you’re using them in your applications there’s no reason to start running for the exit in a panic and start re-writing everything with jQuery. Although that might be a reasonable choice in some cases, jQuery and the Microsoft libraries work well side by side so that you can leave existing solutions untouched even as you enhance them with jQuery. The Microsoft jQuery Plug-ins – Solid Core Features One of the most interesting developments in Microsoft’s embracing of jQuery is that Microsoft has started contributing to jQuery via standard mechanism set for jQuery developers: By submitting plug-ins. Microsoft took some of the nicest new features of the unpublished Microsoft Ajax Client Library and re-wrote these components for jQuery and then submitted them as plug-ins to the jQuery plug-in repository. Accepted plug-ins get taken over by the jQuery team and that’s exactly what happened with the three plug-ins submitted by Microsoft with the templating plug-in even getting slated to be published as part of the jQuery core in the next major release (1.5). The following plug-ins are provided by Microsoft: jQuery Templates – a client side template rendering engine jQuery Data Link – a client side databinder that can synchronize changes without code jQuery Globalization – provides formatting and conversion features for dates and numbers The first two are ports of functionality that was slated for the Microsoft Ajax Library while functionality for the globalization library provides functionality that was already found in the original ASP.NET AJAX library. To me all three plug-ins address a pressing need in client side applications and provide functionality I’ve previously used in other incarnations, but with more complete implementations. Let’s take a close look at these plug-ins. jQuery Templates http://api.jquery.com/category/plugins/templates/ Client side templating is a key component for building rich JavaScript applications in the browser. Templating on the client lets you avoid from manually creating markup by creating DOM nodes and injecting them individually into the document via code. Rather you can create markup templates – similar to the way you create classic ASP server markup – and merge data into these templates to render HTML which you can then inject into the document or replace existing content with. Output from templates are rendered as a jQuery matched set and can then be easily inserted into the document as needed. Templating is key to minimize client side code and reduce repeated code for rendering logic. Instead a single template can be used in many places for updating and adding content to existing pages. Further if you build pure AJAX interfaces that rely entirely on client rendering of the initial page content, templates allow you to a use a single markup template to handle all rendering of each specific HTML section/element. I’ve used a number of different client rendering template engines with jQuery in the past including jTemplates (a PHP style templating engine) and a modified version of John Resig’s MicroTemplating engine which I built into my own set of libraries because it’s such a commonly used feature in my client side applications. jQuery templates adds a much richer templating model that allows for sub-templates and access to the data items. Like John Resig’s original Micro Template engine, the core basics of the templating engine create JavaScript code which means that templates can include JavaScript code. To give you a basic idea of how templates work imagine I have an application that downloads a set of stock quotes based on a symbol list then displays them in the document. To do this you can create an ‘item’ template that describes how each of the quotes is renderd as a template inside of the document: <script id="stockTemplate" type="text/x-jquery-tmpl"> <div id="divStockQuote" class="errordisplay" style="width: 500px;"> <div class="label">Company:</div><div><b>${Company}(${Symbol})</b></div> <div class="label">Last Price:</div><div>${LastPrice}</div> <div class="label">Net Change:</div><div> {{if NetChange > 0}} <b style="color:green" >${NetChange}</b> {{else}} <b style="color:red" >${NetChange}</b> {{/if}} </div> <div class="label">Last Update:</div><div>${LastQuoteTimeString}</div> </div> </script> The ‘template’ is little more than HTML with some markup expressions inside of it that define the template language. Notice the embedded ${} expressions which reference data from the quote objects returned from an AJAX call on the server. You can embed any JavaScript or value expression in these template expressions. There are also a number of structural commands like {{if}} and {{each}} that provide for rudimentary logic inside of your templates as well as commands ({{tmpl}} and {{wrap}}) for nesting templates. You can find more about the full set of markup expressions available in the documentation. To load up this data you can use code like the following: <script type="text/javascript"> //var Proxy = new ServiceProxy("../PageMethods/PageMethodsService.asmx/"); $(document).ready(function () { $("#btnGetQuotes").click(GetQuotes); }); function GetQuotes() { var symbols = $("#txtSymbols").val().split(","); $.ajax({ url: "../PageMethods/PageMethodsService.asmx/GetStockQuotes", data: JSON.stringify({ symbols: symbols }), // parameter map type: "POST", // data has to be POSTed contentType: "application/json", timeout: 10000, dataType: "json", success: function (result) { var quotes = result.d; var jEl = $("#stockTemplate").tmpl(quotes); $("#quoteDisplay").empty().append(jEl); }, error: function (xhr, status) { alert(status + "\r\n" + xhr.responseText); } }); }; </script> In this case an ASMX AJAX service is called to retrieve the stock quotes. The service returns an array of quote objects. The result is returned as an object with the .d property (in Microsoft service style) that returns the actual array of quotes. The template is applied with: var jEl = $("#stockTemplate").tmpl(quotes); which selects the template script tag and uses the .tmpl() function to apply the data to it. The result is a jQuery matched set of elements that can then be appended to the quote display element in the page. The template is merged against an array in this example. When the result is an array the template is automatically applied to each each array item. If you pass a single data item – like say a stock quote – the template works exactly the same way but is applied only once. Templates also have access to a $data item which provides the current data item and information about the tempalte that is currently executing. This makes it possible to keep context within the context of the template itself and also to pass context from a parent template to a child template which is very powerful. Templates can be evaluated by using the template selector and calling the .tmpl() function on the jQuery matched set as shown above or you can use the static $.tmpl() function to provide a template as a string. This allows you to dynamically create templates in code or – more likely – to load templates from the server via AJAX calls. In short there are options The above shows off some of the basics, but there’s much for functionality available in the template engine. Check the documentation link for more information and links to additional examples. The plug-in download also comes with a number of examples that demonstrate functionality. jQuery templates will become a native component in jQuery Core 1.5, so it’s definitely worthwhile checking out the engine today and get familiar with this interface. As much as I’m stoked about templating becoming part of the jQuery core because it’s such an integral part of many applications, there are also a couple shortcomings in the current incarnation: Lack of Error Handling Currently if you embed an expression that is invalid it’s simply not rendered. There’s no error rendered into the template nor do the various  template functions throw errors which leaves finding of bugs as a runtime exercise. I would like some mechanism – optional if possible – to be able to get error info of what is failing in a template when it’s rendered. No String Output Templates are always rendered into a jQuery matched set and there’s no way that I can see to directly render to a string. String output can be useful for debugging as well as opening up templating for creating non-HTML string output. Limited JavaScript Access Unlike John Resig’s original MicroTemplating Engine which was entirely based on JavaScript code generation these templates are limited to a few structured commands that can ‘execute’. There’s no code execution inside of script code which means you’re limited to calling expressions available in global objects or the data item passed in. This may or may not be a big deal depending on the complexity of your template logic. Error handling has been discussed quite a bit and it’s likely there will be some solution to that particualar issue by the time jQuery templates ship. The others are relatively minor issues but something to think about anyway. jQuery Data Link http://api.jquery.com/category/plugins/data-link/ jQuery Data Link provides the ability to do two-way data binding between input controls and an underlying object’s properties. The typical scenario is linking a textbox to a property of an object and have the object updated when the text in the textbox is changed and have the textbox change when the value in the object or the entire object changes. The plug-in also supports converter functions that can be applied to provide the conversion logic from string to some other value typically necessary for mapping things like textbox string input to say a number property and potentially applying additional formatting and calculations. In theory this sounds great, however in reality this plug-in has some serious usability issues. Using the plug-in you can do things like the following to bind data: person = { firstName: "rick", lastName: "strahl"}; $(document).ready( function() { // provide for two-way linking of inputs $("form").link(person); // bind to non-input elements explicitly $("#objFirst").link(person, { firstName: { name: "objFirst", convertBack: function (value, source, target) { $(target).text(value); } } }); $("#objLast").link(person, { lastName: { name: "objLast", convertBack: function (value, source, target) { $(target).text(value); } } }); }); This code hooks up two-way linking between a couple of textboxes on the page and the person object. The first line in the .ready() handler provides mapping of object to form field with the same field names as properties on the object. Note that .link() does NOT bind items into the textboxes when you call .link() – changes are mapped only when values change and you move out of the field. Strike one. The two following commands allow manual binding of values to specific DOM elements which is effectively a one-way bind. You specify the object and a then an explicit mapping where name is an ID in the document. The converter is required to explicitly assign the value to the element. Strike two. You can also detect changes to the underlying object and cause updates to the input elements bound. Unfortunately the syntax to do this is not very natural as you have to rely on the jQuery data object. To update an object’s properties and get change notification looks like this: function updateFirstName() { $(person).data("firstName", person.firstName + " (code updated)"); } This works fine in causing any linked fields to be updated. In the bindings above both the firstName input field and objFirst DOM element gets updated. But the syntax requires you to use a jQuery .data() call for each property change to ensure that the changes are tracked properly. Really? Sure you’re binding through multiple layers of abstraction now but how is that better than just manually assigning values? The code savings (if any) are going to be minimal. As much as I would like to have a WPF/Silverlight/Observable-like binding mechanism in client script, this plug-in doesn’t help much towards that goal in its current incarnation. While you can bind values, the ‘binder’ is too limited to be really useful. If initial values can’t be assigned from the mappings you’re going to end up duplicating work loading the data using some other mechanism. There’s no easy way to re-bind data with a different object altogether since updates trigger only through the .data members. Finally, any non-input elements have to be bound via code that’s fairly verbose and frankly may be more voluminous than what you might write by hand for manual binding and unbinding. Two way binding can be very useful but it has to be easy and most importantly natural. If it’s more work to hook up a binding than writing a couple of lines to do binding/unbinding this sort of thing helps very little in most scenarios. In talking to some of the developers the feature set for Data Link is not complete and they are still soliciting input for features and functionality. If you have ideas on how you want this feature to be more useful get involved and post your recommendations. As it stands, it looks to me like this component needs a lot of love to become useful. For this component to really provide value, bindings need to be able to be refreshed easily and work at the object level, not just the property level. It seems to me we would be much better served by a model binder object that can perform these binding/unbinding tasks in bulk rather than a tool where each link has to be mapped first. I also find the choice of creating a jQuery plug-in questionable – it seems a standalone object – albeit one that relies on the jQuery library – would provide a more intuitive interface than the current forcing of options onto a plug-in style interface. Out of the three Microsoft created components this is by far the least useful and least polished implementation at this point. jQuery Globalization http://github.com/jquery/jquery-global Globalization in JavaScript applications often gets short shrift and part of the reason for this is that natively in JavaScript there’s little support for formatting and parsing of numbers and dates. There are a number of JavaScript libraries out there that provide some support for globalization, but most are limited to a particular portion of globalization. As .NET developers we’re fairly spoiled by the richness of APIs provided in the framework and when dealing with client development one really notices the lack of these features. While you may not necessarily need to localize your application the globalization plug-in also helps with some basic tasks for non-localized applications: Dealing with formatting and parsing of dates and time values. Dates in particular are problematic in JavaScript as there are no formatters whatsoever except the .toString() method which outputs a verbose and next to useless long string. With the globalization plug-in you get a good chunk of the formatting and parsing functionality that the .NET framework provides on the server. You can write code like the following for example to format numbers and dates: var date = new Date(); var output = $.format(date, "MMM. dd, yy") + "\r\n" + $.format(date, "d") + "\r\n" + // 10/25/2010 $.format(1222.32213, "N2") + "\r\n" + $.format(1222.33, "c") + "\r\n"; alert(output); This becomes even more useful if you combine it with templates which can also include any JavaScript expressions. Assuming the globalization plug-in is loaded you can create template expressions that use the $.format function. Here’s the template I used earlier for the stock quote again with a couple of formats applied: <script id="stockTemplate" type="text/x-jquery-tmpl"> <div id="divStockQuote" class="errordisplay" style="width: 500px;"> <div class="label">Company:</div><div><b>${Company}(${Symbol})</b></div> <div class="label">Last Price:</div> <div>${$.format(LastPrice,"N2")}</div> <div class="label">Net Change:</div><div> {{if NetChange > 0}} <b style="color:green" >${NetChange}</b> {{else}} <b style="color:red" >${NetChange}</b> {{/if}} </div> <div class="label">Last Update:</div> <div>${$.format(LastQuoteTime,"MMM dd, yyyy")}</div> </div> </script> There are also parsing methods that can parse dates and numbers from strings into numbers easily: alert($.parseDate("25.10.2010")); alert($.parseInt("12.222")); // de-DE uses . for thousands separators As you can see culture specific options are taken into account when parsing. The globalization plugin provides rich support for a variety of locales: Get a list of all available cultures Query cultures for culture items (like currency symbol, separators etc.) Localized string names for all calendar related items (days of week, months) Generated off of .NET’s supported locales In short you get much of the same functionality that you already might be using in .NET on the server side. The plugin includes a huge number of locales and an Globalization.all.min.js file that contains the text defaults for each of these locales as well as small locale specific script files that define each of the locale specific settings. It’s highly recommended that you NOT use the huge globalization file that includes all locales, but rather add script references to only those languages you explicitly care about. Overall this plug-in is a welcome helper. Even if you use it with a single locale (like en-US) and do no other localization, you’ll gain solid support for number and date formatting which is a vital feature of many applications. Changes for Microsoft It’s good to see Microsoft coming out of its shell and away from the ‘not-built-here’ mentality that has been so pervasive in the past. It’s especially good to see it applied to jQuery – a technology that has stood in drastic contrast to Microsoft’s own internal efforts in terms of design, usage model and… popularity. It’s great to see that Microsoft is paying attention to what customers prefer to use and supporting the customer sentiment – even if it meant drastically changing course of policy and moving into a more open and sharing environment in the process. The additional jQuery support that has been introduced in the last two years certainly has made lives easier for many developers on the ASP.NET platform. It’s also nice to see Microsoft submitting proposals through the standard jQuery process of plug-ins and getting accepted for various very useful projects. Certainly the jQuery Templates plug-in is going to be very useful to many especially since it will be baked into the jQuery core in jQuery 1.5. I hope we see more of this type of involvement from Microsoft in the future. Kudos!© Rick Strahl, West Wind Technologies, 2005-2010Posted in jQuery  ASP.NET  

    Read the article

  • Creating STA COM compatible ASP.NET Applications

    - by Rick Strahl
    When building ASP.NET applications that interface with old school COM objects like those created with VB6 or Visual FoxPro (MTDLL), it's extremely important that the threads that are serving requests use Single Threaded Apartment Threading. STA is a COM built-in technology that allows essentially single threaded components to operate reliably in a multi-threaded environment. STA's guarantee that COM objects instantiated on a specific thread stay on that specific thread and any access to a COM object from another thread automatically marshals that thread to the STA thread. The end effect is that you can have multiple threads, but a COM object instance lives on a fixed never changing thread. ASP.NET by default uses MTA (multi-threaded apartment) threads which are truly free spinning threads that pay no heed to COM object marshaling. This is vastly more efficient than STA threading which has a bit of overhead in determining whether it's OK to run code on a given thread or whether some sort of thread/COM marshaling needs to occur. MTA COM components can be very efficient, but STA COM components in a multi-threaded environment always tend to have a fair amount of overhead. It's amazing how much COM Interop I still see today so while it seems really old school to be talking about this topic, it's actually quite apropos for me as I have many customers using legacy COM systems that need to interface with other .NET applications. In this post I'm consolidating some of the hacks I've used to integrate with various ASP.NET technologies when using STA COM Components. STA in ASP.NET Support for STA threading in the ASP.NET framework is fairly limited. Specifically only the original ASP.NET WebForms technology supports STA threading directly via its STA Page Handler implementation or what you might know as ASPCOMPAT mode. For WebForms running STA components is as easy as specifying the ASPCOMPAT attribute in the @Page tag:<%@ Page Language="C#" AspCompat="true" %> which runs the page in STA mode. Removing it runs in MTA mode. Simple. Unfortunately all other ASP.NET technologies built on top of the core ASP.NET engine do not support STA natively. So if you want to use STA COM components in MVC or with class ASMX Web Services, there's no automatic way like the ASPCOMPAT keyword available. So what happens when you run an STA COM component in an MTA application? In low volume environments - nothing much will happen. The COM objects will appear to work just fine as there are no simultaneous thread interactions and the COM component will happily run on a single thread or multiple single threads one at a time. So for testing running components in MTA environments may appear to work just fine. However as load increases and threads get re-used by ASP.NET COM objects will end up getting created on multiple different threads. This can result in crashes or hangs, or data corruption in the STA components which store their state in thread local storage on the STA thread. If threads overlap this global store can easily get corrupted which in turn causes problems. STA ensures that any COM object instance loaded always stays on the same thread it was instantiated on. What about COM+? COM+ is supposed to address the problem of STA in MTA applications by providing an abstraction with it's own thread pool manager for COM objects. It steps in to the COM instantiation pipeline and hands out COM instances from its own internally maintained STA Thread pool. This guarantees that the COM instantiation threads are STA threads if using STA components. COM+ works, but in my experience the technology is very, very slow for STA components. It adds a ton of overhead and reduces COM performance noticably in load tests in IIS. COM+ can make sense in some situations but for Web apps with STA components it falls short. In addition there's also the need to ensure that COM+ is set up and configured on the target machine and the fact that components have to be registered in COM+. COM+ also keeps components up at all times, so if a component needs to be replaced the COM+ package needs to be unloaded (same is true for IIS hosted components but it's more common to manage that). COM+ is an option for well established components, but native STA support tends to provide better performance and more consistent usability, IMHO. STA for non supporting ASP.NET Technologies As mentioned above only WebForms supports STA natively. However, by utilizing the WebForms ASP.NET Page handler internally it's actually possible to trick various other ASP.NET technologies and let them work with STA components. This is ugly but I've used each of these in various applications and I've had minimal problems making them work with FoxPro STA COM components which is about as dififcult as it gets for COM Interop in .NET. In this post I summarize several STA workarounds that enable you to use STA threading with these ASP.NET Technologies: ASMX Web Services ASP.NET MVC WCF Web Services ASP.NET Web API ASMX Web Services I start with classic ASP.NET ASMX Web Services because it's the easiest mechanism that allows for STA modification. It also clearly demonstrates how the WebForms STA Page Handler is the key technology to enable the various other solutions to create STA components. Essentially the way this works is to override the WebForms Page class and hijack it's init functionality for processing requests. Here's what this looks like for Web Services:namespace FoxProAspNet { public class WebServiceStaHandler : System.Web.UI.Page, IHttpAsyncHandler { protected override void OnInit(EventArgs e) { IHttpHandler handler = new WebServiceHandlerFactory().GetHandler( this.Context, this.Context.Request.HttpMethod, this.Context.Request.FilePath, this.Context.Request.PhysicalPath); handler.ProcessRequest(this.Context); this.Context.ApplicationInstance.CompleteRequest(); } public IAsyncResult BeginProcessRequest( HttpContext context, AsyncCallback cb, object extraData) { return this.AspCompatBeginProcessRequest(context, cb, extraData); } public void EndProcessRequest(IAsyncResult result) { this.AspCompatEndProcessRequest(result); } } public class AspCompatWebServiceStaHandlerWithSessionState : WebServiceStaHandler, IRequiresSessionState { } } This class overrides the ASP.NET WebForms Page class which has a little known AspCompatBeginProcessRequest() and AspCompatEndProcessRequest() method that is responsible for providing the WebForms ASPCOMPAT functionality. These methods handle routing requests to STA threads. Note there are two classes - one that includes session state and one that does not. If you plan on using ASP.NET Session state use the latter class, otherwise stick to the former. This maps to the EnableSessionState page setting in WebForms. This class simply hooks into this functionality by overriding the BeginProcessRequest and EndProcessRequest methods and always forcing it into the AspCompat methods. The way this works is that BeginProcessRequest() fires first to set up the threads and starts intializing the handler. As part of that process the OnInit() method is fired which is now already running on an STA thread. The code then creates an instance of the actual WebService handler factory and calls its ProcessRequest method to start executing which generates the Web Service result. Immediately after ProcessRequest the request is stopped with Application.CompletRequest() which ensures that the rest of the Page handler logic doesn't fire. This means that even though the fairly heavy Page class is overridden here, it doesn't end up executing any of its internal processing which makes this code fairly efficient. In a nutshell, we're highjacking the Page HttpHandler and forcing it to process the WebService process handler in the context of the AspCompat handler behavior. Hooking up the Handler Because the above is an HttpHandler implementation you need to hook up the custom handler and replace the standard ASMX handler. To do this you need to modify the web.config file (here for IIS 7 and IIS Express): <configuration> <system.webServer> <handlers> <remove name="WebServiceHandlerFactory-Integrated-4.0" /> <add name="Asmx STA Web Service Handler" path="*.asmx" verb="*" type="FoxProAspNet.WebServiceStaHandler" precondition="integrated"/> </handlers> </system.webServer> </configuration> (Note: The name for the WebServiceHandlerFactory-Integrated-4.0 might be slightly different depending on your server version. Check the IIS Handler configuration in the IIS Management Console for the exact name or simply remove the handler from the list there which will propagate to your web.config). For IIS 5 & 6 (Windows XP/2003) or the Visual Studio Web Server use:<configuration> <system.web> <httpHandlers> <remove path="*.asmx" verb="*" /> <add path="*.asmx" verb="*" type="FoxProAspNet.WebServiceStaHandler" /> </httpHandlers> </system.web></configuration> To test, create a new ASMX Web Service and create a method like this: [WebService(Namespace = "http://foxaspnet.org/")] [WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)] public class FoxWebService : System.Web.Services.WebService { [WebMethod] public string HelloWorld() { return "Hello World. Threading mode is: " + System.Threading.Thread.CurrentThread.GetApartmentState(); } } Run this before you put in the web.config configuration changes and you should get: Hello World. Threading mode is: MTA Then put the handler mapping into Web.config and you should see: Hello World. Threading mode is: STA And you're on your way to using STA COM components. It's a hack but it works well! I've used this with several high volume Web Service installations with various customers and it's been fast and reliable. ASP.NET MVC ASP.NET MVC has quickly become the most popular ASP.NET technology, replacing WebForms for creating HTML output. MVC is more complex to get started with, but once you understand the basic structure of how requests flow through the MVC pipeline it's easy to use and amazingly flexible in manipulating HTML requests. In addition, MVC has great support for non-HTML output sources like JSON and XML, making it an excellent choice for AJAX requests without any additional tools. Unlike WebForms ASP.NET MVC doesn't support STA threads natively and so some trickery is needed to make it work with STA threads as well. MVC gets its handler implementation through custom route handlers using ASP.NET's built in routing semantics. To work in an STA handler requires working in the Page Handler as part of the Route Handler implementation. As with the Web Service handler the first step is to create a custom HttpHandler that can instantiate an MVC request pipeline properly:public class MvcStaThreadHttpAsyncHandler : Page, IHttpAsyncHandler, IRequiresSessionState { private RequestContext _requestContext; public MvcStaThreadHttpAsyncHandler(RequestContext requestContext) { if (requestContext == null) throw new ArgumentNullException("requestContext"); _requestContext = requestContext; } public IAsyncResult BeginProcessRequest(HttpContext context, AsyncCallback cb, object extraData) { return this.AspCompatBeginProcessRequest(context, cb, extraData); } protected override void OnInit(EventArgs e) { var controllerName = _requestContext.RouteData.GetRequiredString("controller"); var controllerFactory = ControllerBuilder.Current.GetControllerFactory(); var controller = controllerFactory.CreateController(_requestContext, controllerName); if (controller == null) throw new InvalidOperationException("Could not find controller: " + controllerName); try { controller.Execute(_requestContext); } finally { controllerFactory.ReleaseController(controller); } this.Context.ApplicationInstance.CompleteRequest(); } public void EndProcessRequest(IAsyncResult result) { this.AspCompatEndProcessRequest(result); } public override void ProcessRequest(HttpContext httpContext) { throw new NotSupportedException("STAThreadRouteHandler does not support ProcessRequest called (only BeginProcessRequest)"); } } This handler code figures out which controller to load and then executes the controller. MVC internally provides the information needed to route to the appropriate method and pass the right parameters. Like the Web Service handler the logic occurs in the OnInit() and performs all the processing in that part of the request. Next, we need a RouteHandler that can actually pick up this handler. Unlike the Web Service handler where we simply registered the handler, MVC requires a RouteHandler to pick up the handler. RouteHandlers look at the URL's path and based on that decide on what handler to invoke. The route handler is pretty simple - all it does is load our custom handler: public class MvcStaThreadRouteHandler : IRouteHandler { public IHttpHandler GetHttpHandler(RequestContext requestContext) { if (requestContext == null) throw new ArgumentNullException("requestContext"); return new MvcStaThreadHttpAsyncHandler(requestContext); } } At this point you can instantiate this route handler and force STA requests to MVC by specifying a route. The following sets up the ASP.NET Default Route:Route mvcRoute = new Route("{controller}/{action}/{id}", new RouteValueDictionary( new { controller = "Home", action = "Index", id = UrlParameter.Optional }), new MvcStaThreadRouteHandler()); RouteTable.Routes.Add(mvcRoute);   To make this code a little easier to work with and mimic the behavior of the routes.MapRoute() functionality extension method that MVC provides, here is an extension method for MapMvcStaRoute(): public static class RouteCollectionExtensions { public static void MapMvcStaRoute(this RouteCollection routeTable, string name, string url, object defaults = null) { Route mvcRoute = new Route(url, new RouteValueDictionary(defaults), new MvcStaThreadRouteHandler()); RouteTable.Routes.Add(mvcRoute); } } With this the syntax to add  route becomes a little easier and matches the MapRoute() method:RouteTable.Routes.MapMvcStaRoute( name: "Default", url: "{controller}/{action}/{id}", defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional } ); The nice thing about this route handler, STA Handler and extension method is that it's fully self contained. You can put all three into a single class file and stick it into your Web app, and then simply call MapMvcStaRoute() and it just works. Easy! To see whether this works create an MVC controller like this: public class ThreadTestController : Controller { public string ThreadingMode() { return Thread.CurrentThread.GetApartmentState().ToString(); } } Try this test both with only the MapRoute() hookup in the RouteConfiguration in which case you should get MTA as the value. Then change the MapRoute() call to MapMvcStaRoute() leaving all the parameters the same and re-run the request. You now should see STA as the result. You're on your way using STA COM components reliably in ASP.NET MVC. WCF Web Services running through IIS WCF Web Services provide a more robust and wider range of services for Web Services. You can use WCF over HTTP, TCP, and Pipes, and WCF services support WS* secure services. There are many features in WCF that go way beyond what ASMX can do. But it's also a bit more complex than ASMX. As a basic rule if you need to serve straight SOAP Services over HTTP I 'd recommend sticking with the simpler ASMX services especially if COM is involved. If you need WS* support or want to serve data over non-HTTP protocols then WCF makes more sense. WCF is not my forte but I found a solution from Scott Seely on his blog that describes the progress and that seems to work well. I'm copying his code below so this STA information is all in one place and quickly explain. Scott's code basically works by creating a custom OperationBehavior which can be specified via an [STAOperation] attribute on every method. Using his attribute you end up with a class (or Interface if you separate the contract and class) that looks like this: [ServiceContract] public class WcfService { [OperationContract] public string HelloWorldMta() { return Thread.CurrentThread.GetApartmentState().ToString(); } // Make sure you use this custom STAOperationBehavior // attribute to force STA operation of service methods [STAOperationBehavior] [OperationContract] public string HelloWorldSta() { return Thread.CurrentThread.GetApartmentState().ToString(); } } Pretty straight forward. The latter method returns STA while the former returns MTA. To make STA work every method needs to be marked up. The implementation consists of the attribute and OperationInvoker implementation. Here are the two classes required to make this work from Scott's post:public class STAOperationBehaviorAttribute : Attribute, IOperationBehavior { public void AddBindingParameters(OperationDescription operationDescription, System.ServiceModel.Channels.BindingParameterCollection bindingParameters) { } public void ApplyClientBehavior(OperationDescription operationDescription, System.ServiceModel.Dispatcher.ClientOperation clientOperation) { // If this is applied on the client, well, it just doesn’t make sense. // Don’t throw in case this attribute was applied on the contract // instead of the implementation. } public void ApplyDispatchBehavior(OperationDescription operationDescription, System.ServiceModel.Dispatcher.DispatchOperation dispatchOperation) { // Change the IOperationInvoker for this operation. dispatchOperation.Invoker = new STAOperationInvoker(dispatchOperation.Invoker); } public void Validate(OperationDescription operationDescription) { if (operationDescription.SyncMethod == null) { throw new InvalidOperationException("The STAOperationBehaviorAttribute " + "only works for synchronous method invocations."); } } } public class STAOperationInvoker : IOperationInvoker { IOperationInvoker _innerInvoker; public STAOperationInvoker(IOperationInvoker invoker) { _innerInvoker = invoker; } public object[] AllocateInputs() { return _innerInvoker.AllocateInputs(); } public object Invoke(object instance, object[] inputs, out object[] outputs) { // Create a new, STA thread object[] staOutputs = null; object retval = null; Thread thread = new Thread( delegate() { retval = _innerInvoker.Invoke(instance, inputs, out staOutputs); }); thread.SetApartmentState(ApartmentState.STA); thread.Start(); thread.Join(); outputs = staOutputs; return retval; } public IAsyncResult InvokeBegin(object instance, object[] inputs, AsyncCallback callback, object state) { // We don’t handle async… throw new NotImplementedException(); } public object InvokeEnd(object instance, out object[] outputs, IAsyncResult result) { // We don’t handle async… throw new NotImplementedException(); } public bool IsSynchronous { get { return true; } } } The key in this setup is the Invoker and the Invoke method which creates a new thread and then fires the request on this new thread. Because this approach creates a new thread for every request it's not super efficient. There's a bunch of overhead involved in creating the thread and throwing it away after each thread, but it'll work for low volume requests and insure each thread runs in STA mode. If better performance is required it would be useful to create a custom thread manager that can pool a number of STA threads and hand off threads as needed rather than creating new threads on every request. If your Web Service needs are simple and you need only to serve standard SOAP 1.x requests, I would recommend sticking with ASMX services. It's easier to set up and work with and for STA component use it'll be significantly better performing since ASP.NET manages the STA thread pool for you rather than firing new threads for each request. One nice thing about Scotts code is though that it works in any WCF environment including self hosting. It has no dependency on ASP.NET or WebForms for that matter. STA - If you must STA components are a  pain in the ass and thankfully there isn't too much stuff out there anymore that requires it. But when you need it and you need to access STA functionality from .NET at least there are a few options available to make it happen. Each of these solutions is a bit hacky, but they work - I've used all of them in production with good results with FoxPro components. I hope compiling all of these in one place here makes it STA consumption a little bit easier. I feel your pain :-) Resources Download STA Handler Code Examples Scott Seely's original STA WCF OperationBehavior Article© Rick Strahl, West Wind Technologies, 2005-2012Posted in FoxPro   ASP.NET  .NET  COM   Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

    Read the article

  • Scrum in 5 Minutes

    - by Stephen.Walther
    The goal of this blog entry is to explain the basic concepts of Scrum in less than five minutes. You learn how Scrum can help a team of developers to successfully complete a complex software project. Product Backlog and the Product Owner Imagine that you are part of a team which needs to create a new website – for example, an e-commerce website. You have an overwhelming amount of work to do. You need to build (or possibly buy) a shopping cart, install an SSL certificate, create a product catalog, create a Facebook page, and at least a hundred other things that you have not thought of yet. According to Scrum, the first thing you should do is create a list. Place the highest priority items at the top of the list and the lower priority items lower in the list. For example, creating the shopping cart and buying the domain name might be high priority items and creating a Facebook page might be a lower priority item. In Scrum, this list is called the Product Backlog. How do you prioritize the items in the Product Backlog? Different stakeholders in the project might have different priorities. Gary, your division VP, thinks that it is crucial that the e-commerce site has a mobile app. Sally, your direct manager, thinks taking advantage of new HTML5 features is much more important. Multiple people are pulling you in different directions. According to Scrum, it is important that you always designate one person, and only one person, as the Product Owner. The Product Owner is the person who decides what items should be added to the Product Backlog and the priority of the items in the Product Backlog. The Product Owner could be the customer who is paying the bills, the project manager who is responsible for delivering the project, or a customer representative. The critical point is that the Product Owner must always be a single person and that single person has absolute authority over the Product Backlog. Sprints and the Sprint Backlog So now the developer team has a prioritized list of items and they can start work. The team starts implementing the first item in the Backlog — the shopping cart — and the team is making good progress. Unfortunately, however, half-way through the work of implementing the shopping cart, the Product Owner changes his mind. The Product Owner decides that it is much more important to create the product catalog before the shopping cart. With some frustration, the team switches their developmental efforts to focus on implementing the product catalog. However, part way through completing this work, once again the Product Owner changes his mind about the highest priority item. Getting work done when priorities are constantly shifting is frustrating for the developer team and it results in lower productivity. At the same time, however, the Product Owner needs to have absolute authority over the priority of the items which need to get done. Scrum solves this conflict with the concept of Sprints. In Scrum, a developer team works in Sprints. At the beginning of a Sprint the developers and the Product Owner agree on the items from the backlog which they will complete during the Sprint. This subset of items from the Product Backlog becomes the Sprint Backlog. During the Sprint, the Product Owner is not allowed to change the items in the Sprint Backlog. In other words, the Product Owner cannot shift priorities on the developer team during the Sprint. Different teams use Sprints of different lengths such as one month Sprints, two-week Sprints, and one week Sprints. For high-stress, time critical projects, teams typically choose shorter sprints such as one week sprints. For more mature projects, longer one month sprints might be more appropriate. A team can pick whatever Sprint length makes sense for them just as long as the team is consistent. You should pick a Sprint length and stick with it. Daily Scrum During a Sprint, the developer team needs to have meetings to coordinate their work on completing the items in the Sprint Backlog. For example, the team needs to discuss who is working on what and whether any blocking issues have been discovered. Developers hate meetings (well, sane developers hate meetings). Meetings take developers away from their work of actually implementing stuff as opposed to talking about implementing stuff. However, a developer team which never has meetings and never coordinates their work also has problems. For example, Fred might get stuck on a programming problem for days and never reach out for help even though Tom (who sits in the cubicle next to him) has already solved the very same problem. Or, both Ted and Fred might have started working on the same item from the Sprint Backlog at the same time. In Scrum, these conflicting needs – limiting meetings but enabling team coordination – are resolved with the idea of the Daily Scrum. The Daily Scrum is a meeting for coordinating the work of the developer team which happens once a day. To keep the meeting short, each developer answers only the following three questions: 1. What have you done since yesterday? 2. What do you plan to do today? 3. Any impediments in your way? During the Daily Scrum, developers are not allowed to talk about issues with their cat, do demos of their latest work, or tell heroic stories of programming problems overcome. The meeting must be kept short — typically about 15 minutes. Issues which come up during the Daily Scrum should be discussed in separate meetings which do not involve the whole developer team. Stories and Tasks Items in the Product or Sprint Backlog – such as building a shopping cart or creating a Facebook page – are often referred to as User Stories or Stories. The Stories are created by the Product Owner and should represent some business need. Unlike the Product Owner, the developer team needs to think about how a Story should be implemented. At the beginning of a Sprint, the developer team takes the Stories from the Sprint Backlog and breaks the stories into tasks. For example, the developer team might take the Create a Shopping Cart story and break it into the following tasks: · Enable users to add and remote items from shopping cart · Persist the shopping cart to database between visits · Redirect user to checkout page when Checkout button is clicked During the Daily Scrum, members of the developer team volunteer to complete the tasks required to implement the next Story in the Sprint Backlog. When a developer talks about what he did yesterday or plans to do tomorrow then the developer should be referring to a task. Stories are owned by the Product Owner and a story is all about business value. In contrast, the tasks are owned by the developer team and a task is all about implementation details. A story might take several days or weeks to complete. A task is something which a developer can complete in less than a day. Some teams get lazy about breaking stories into tasks. Neglecting to break stories into tasks can lead to “Never Ending Stories” If you don’t break a story into tasks, then you can’t know how much of a story has actually been completed because you don’t have a clear idea about the implementation steps required to complete the story. Scrumboard During the Daily Scrum, the developer team uses a Scrumboard to coordinate their work. A Scrumboard contains a list of the stories for the current Sprint, the tasks associated with each Story, and the state of each task. The developer team uses the Scrumboard so everyone on the team can see, at a glance, what everyone is working on. As a developer works on a task, the task moves from state to state and the state of the task is updated on the Scrumboard. Common task states are ToDo, In Progress, and Done. Some teams include additional task states such as Needs Review or Needs Testing. Some teams use a physical Scrumboard. In that case, you use index cards to represent the stories and the tasks and you tack the index cards onto a physical board. Using a physical Scrumboard has several disadvantages. A physical Scrumboard does not work well with a distributed team – for example, it is hard to share the same physical Scrumboard between Boston and Seattle. Also, generating reports from a physical Scrumboard is more difficult than generating reports from an online Scrumboard. Estimating Stories and Tasks Stakeholders in a project, the people investing in a project, need to have an idea of how a project is progressing and when the project will be completed. For example, if you are investing in creating an e-commerce site, you need to know when the site can be launched. It is not enough to just say that “the project will be done when it is done” because the stakeholders almost certainly have a limited budget to devote to the project. The people investing in the project cannot determine the business value of the project unless they can have an estimate of how long it will take to complete the project. Developers hate to give estimates. The reason that developers hate to give estimates is that the estimates are almost always completely made up. For example, you really don’t know how long it takes to build a shopping cart until you finish building a shopping cart, and at that point, the estimate is no longer useful. The problem is that writing code is much more like Finding a Cure for Cancer than Building a Brick Wall. Building a brick wall is very straightforward. After you learn how to add one brick to a wall, you understand everything that is involved in adding a brick to a wall. There is no additional research required and no surprises. If, on the other hand, I assembled a team of scientists and asked them to find a cure for cancer, and estimate exactly how long it will take, they would have no idea. The problem is that there are too many unknowns. I don’t know how to cure cancer, I need to do a lot of research here, so I cannot even begin to estimate how long it will take. So developers hate to provide estimates, but the Product Owner and other product stakeholders, have a legitimate need for estimates. Scrum resolves this conflict by using the idea of Story Points. Different teams use different units to represent Story Points. For example, some teams use shirt sizes such as Small, Medium, Large, and X-Large. Some teams prefer to use Coffee Cup sizes such as Tall, Short, and Grande. Finally, some teams like to use numbers from the Fibonacci series. These alternative units are converted into a Story Point value. Regardless of the type of unit which you use to represent Story Points, the goal is the same. Instead of attempting to estimate a Story in hours (which is doomed to failure), you use a much less fine-grained measure of work. A developer team is much more likely to be able to estimate that a Story is Small or X-Large than the exact number of hours required to complete the story. So you can think of Story Points as a compromise between the needs of the Product Owner and the developer team. When a Sprint starts, the developer team devotes more time to thinking about the Stories in a Sprint and the developer team breaks the Stories into Tasks. In Scrum, you estimate the work required to complete a Story by using Story Points and you estimate the work required to complete a task by using hours. The difference between Stories and Tasks is that you don’t create a task until you are just about ready to start working on a task. A task is something that you should be able to create within a day, so you have a much better chance of providing an accurate estimate of the work required to complete a task than a story. Burndown Charts In Scrum, you use Burndown charts to represent the remaining work on a project. You use Release Burndown charts to represent the overall remaining work for a project and you use Sprint Burndown charts to represent the overall remaining work for a particular Sprint. You create a Release Burndown chart by calculating the remaining number of uncompleted Story Points for the entire Product Backlog every day. The vertical axis represents Story Points and the horizontal axis represents time. A Sprint Burndown chart is similar to a Release Burndown chart, but it focuses on the remaining work for a particular Sprint. There are two different types of Sprint Burndown charts. You can either represent the remaining work in a Sprint with Story Points or with task hours (the following image, taken from Wikipedia, uses hours). When each Product Backlog Story is completed, the Release Burndown chart slopes down. When each Story or task is completed, the Sprint Burndown chart slopes down. Burndown charts typically do not always slope down over time. As new work is added to the Product Backlog, the Release Burndown chart slopes up. If new tasks are discovered during a Sprint, the Sprint Burndown chart will also slope up. The purpose of a Burndown chart is to give you a way to track team progress over time. If, halfway through a Sprint, the Sprint Burndown chart is still climbing a hill then you know that you are in trouble. Team Velocity Stakeholders in a project always want more work done faster. For example, the Product Owner for the e-commerce site wants the website to launch before tomorrow. Developers tend to be overly optimistic. Rarely do developers acknowledge the physical limitations of reality. So Project stakeholders and the developer team often collude to delude themselves about how much work can be done and how quickly. Too many software projects begin in a state of optimism and end in frustration as deadlines zoom by. In Scrum, this problem is overcome by calculating a number called the Team Velocity. The Team Velocity is a measure of the average number of Story Points which a team has completed in previous Sprints. Knowing the Team Velocity is important during the Sprint Planning meeting when the Product Owner and the developer team work together to determine the number of stories which can be completed in the next Sprint. If you know the Team Velocity then you can avoid committing to do more work than the team has been able to accomplish in the past, and your team is much more likely to complete all of the work required for the next Sprint. Scrum Master There are three roles in Scrum: the Product Owner, the developer team, and the Scrum Master. I’v e already discussed the Product Owner. The Product Owner is the one and only person who maintains the Product Backlog and prioritizes the stories. I’ve also described the role of the developer team. The members of the developer team do the work of implementing the stories by breaking the stories into tasks. The final role, which I have not discussed, is the role of the Scrum Master. The Scrum Master is responsible for ensuring that the team is following the Scrum process. For example, the Scrum Master is responsible for making sure that there is a Daily Scrum meeting and that everyone answers the standard three questions. The Scrum Master is also responsible for removing (non-technical) impediments which the team might encounter. For example, if the team cannot start work until everyone installs the latest version of Microsoft Visual Studio then the Scrum Master has the responsibility of working with management to get the latest version of Visual Studio as quickly as possible. The Scrum Master can be a member of the developer team. Furthermore, different people can take on the role of the Scrum Master over time. The Scrum Master, however, cannot be the same person as the Product Owner. Using SonicAgile SonicAgile (SonicAgile.com) is an online tool which you can use to manage your projects using Scrum. You can use the SonicAgile Product Backlog to create a prioritized list of stories. You can estimate the size of the Stories using different Story Point units such as Shirt Sizes and Coffee Cup sizes. You can use SonicAgile during the Sprint Planning meeting to select the Stories that you want to complete during a particular Sprint. You can configure Sprints to be any length of time. SonicAgile calculates Team Velocity automatically and displays a warning when you add too many stories to a Sprint. In other words, it warns you when it thinks you are overcommitting in a Sprint. SonicAgile also includes a Scrumboard which displays the list of Stories selected for a Sprint and the tasks associated with each story. You can drag tasks from one task state to another. Finally, SonicAgile enables you to generate Release Burndown and Sprint Burndown charts. You can use these charts to view the progress of your team. To learn more about SonicAgile, visit SonicAgile.com. Summary In this post, I described many of the basic concepts of Scrum. You learned how a Product Owner uses a Product Backlog to create a prioritized list of tasks. I explained why work is completed in Sprints so the developer team can be more productive. I also explained how a developer team uses the daily scrum to coordinate their work. You learned how the developer team uses a Scrumboard to see, at a glance, who is working on what and the state of each task. I also discussed Burndown charts. You learned how you can use both Release and Sprint Burndown charts to track team progress in completing a project. Finally, I described the crucial role of the Scrum Master – the person who is responsible for ensuring that the rules of Scrum are being followed. My goal was not to describe all of the concepts of Scrum. This post was intended to be an introductory overview. For a comprehensive explanation of Scrum, I recommend reading Ken Schwaber’s book Agile Project Management with Scrum: http://www.amazon.com/Agile-Project-Management-Microsoft-Professional/dp/073561993X/ref=la_B001H6ODMC_1_1?ie=UTF8&qid=1345224000&sr=1-1

    Read the article

  • Red Gate Coder interviews: Robin Hellen

    - by Michael Williamson
    Robin Hellen is a test engineer here at Red Gate, and is also the latest coder I’ve interviewed. We chatted about debugging code, the roles of software engineers and testers, and why Vala is currently his favourite programming language. How did you get started with programming?It started when I was about six. My dad’s a professional programmer, and he gave me and my sister one of his old computers and taught us a bit about programming. It was an old Amiga 500 with a variant of BASIC. I don’t think I ever successfully completed anything! It was just faffing around. I didn’t really get anywhere with it.But then presumably you did get somewhere with it at some point.At some point. The PC emerged as the dominant platform, and I learnt a bit of Visual Basic. I didn’t really do much, just a couple of quick hacky things. A bit of demo animation. Took me a long time to get anywhere with programming, really.When did you feel like you did start to get somewhere?I think it was when I started doing things for someone else, which was my sister’s final year of university project. She called up my dad two days before she was due to submit, saying “We need something to display a graph!”. Dad says, “I’m too busy, go talk to your brother”. So I hacked up this ugly piece of code, sent it off and they won a prize for that project. Apparently, the graph, the bit that I wrote, was the reason they won a prize! That was when I first felt that I’d actually done something that was worthwhile. That was my first real bit of code, and the ugliest code I’ve ever written. It’s basically an array of pre-drawn line elements that I shifted round the screen to draw a very spikey graph.When did you decide that programming might actually be something that you wanted to do as a career?It’s not really a decision I took, I always wanted to do something with computers. And I had to take a gap year for uni, so I was looking for twelve month internships. I applied to Red Gate, and they gave me a job as a tester. And that’s where I really started having to write code well. To a better standard that I had been up to that point.How did you find coming to Red Gate and working with other coders?I thought it was really nice. I learnt so much just from other people around. I think one of the things that’s really great is that people are just willing to help you learn. Instead of “Don’t you know that, you’re so stupid”, it’s “You can just do it this way”.If you could go back to the very start of that internship, is there something that you would tell yourself?Write shorter code. I have a tendency to write massive, many-thousand line files that I break out of right at the end. And then half-way through a project I’m doing something, I think “Where did I write that bit that does that thing?”, and it’s almost impossible to find. I wrote some horrendous code when I started. Just that principle, just keep things short. Even if looks a bit crazy to be jumping around all over the place all of the time, it’s actually a lot more understandable.And how do you hold yourself to that?Generally, if a function’s going off my screen, it’s probably too long. That’s what I tell myself, and within the team here we have code reviews, so the guys I’m with at the moment are pretty good at pulling me up on, “Doesn’t that look like it’s getting a bit long?”. It’s more just the subjective standard of readability than anything.So you’re an advocate of code review?Yes, definitely. Both to spot errors that you might have made, and to improve your knowledge. The person you’re reviewing will say “Oh, you could have done it that way”. That’s how we learn, by talking to others, and also just sharing knowledge of how your project works around the team, or even outside the team. Definitely a very firm advocate of code reviews.Do you think there’s more we could do with them?I don’t know. We’re struggling with how to add them as part of the process without it becoming too cumbersome. We’ve experimented with a few different ways, and we’ve not found anything that just works.To get more into the nitty gritty: how do you like to debug code?The first thing is to do it in my head. I’ll actually think what piece of code is likely to have caused that error, and take a quick look at it, just to see if there’s anything glaringly obvious there. The next thing I’ll probably do is throw in print statements, or throw some exceptions from various points, just to check: is it going through the code path I expect it to? A last resort is to actually debug code using a debugger.Why is the debugger the last resort?Probably because of the environments I learnt programming in. VB and early BASIC didn’t have much of a debugger, the only way to find out what your program was doing was to add print statements. Also, because a lot of the stuff I tend to work with is non-interactive, if it’s something that takes a long time to run, I can throw in the print statements, set a run off, go and do something else, and look at it again later, rather than trying to remember what happened at that point when I was debugging through it. So it also gives me the record of what happens. I hate just sitting there pressing F5, F5, continually. If you’re having to find out what your code is doing at each line, you’ve probably got a very wrong mental model of what your code’s doing, and you can find that out just as easily by inspecting a couple of values through the print statements.If I were on some codebase that you were also working on, what should I do to make it as easy as possible to understand?I’d say short and well-named methods. The one thing I like to do when I’m looking at code is to find out where a value comes from, and the more layers of indirection there are, particularly DI [dependency injection] frameworks, the harder it is to find out where something’s come from. I really hate that. I want to know if the value come from the user here or is a constant here, and if I can’t find that out, that makes code very hard to understand for me.As a tester, where do you think the split should lie between software engineers and testers?I think the split is less on areas of the code you write and more what you’re designing and creating. The developers put a structure on the code, while my major role is to say which tests we should have, whether we should test that, or it’s not worth testing that because it’s a tiny function in code that nobody’s ever actually going to see. So it’s not a split in the code, it’s a split in what you’re thinking about. Saying what code we should write, but alternatively what code we should take out.In your experience, do the software engineers tend to do much testing themselves?They tend to control the lowest layer of tests. And, depending on how the balance of people is in the team, they might write some of the higher levels of test. Or that might go to the testers. I’m the only tester on my team with three other developers, so they’ll be writing quite a lot of the actual test code, with input from me as to whether we should test that functionality, whereas on other teams, where it’s been more equal numbers, the testers have written pretty much all of the high level tests, just because that’s the best use of resource.If you could shuffle resources around however you liked, do you think that the developers should be writing those high-level tests?I think they should be writing them occasionally. It helps when they have an understanding of how testing code works and possibly what assumptions we’ve made in tests, and they can say “actually, it doesn’t work like that under the hood so you’ve missed this whole area”. It’s one of those agile things that everyone on the team should be at least comfortable doing the various jobs. So if the developers can write test code then I think that’s a very good thing.So you think testers should be able to write production code?Yes, although given most testers skills at coding, I wouldn’t advise it too much! I have written a few things, and I did make a few changes that have actually gone into our production code base. They’re not necessarily running every time but they are there. I think having that mix of skill sets is really useful. In some ways we’re using our own product to test itself, so being able to make those changes where it’s not working saves me a round-trip through the developers. It can be really annoying if the developers have no time to make a change, and I can’t touch the code.If the software engineers are consistently writing tests at all levels, what role do you think the role of a tester is?I think on a team like that, those distinctions aren’t quite so useful. There’ll be two cases. There’s either the case where the developers think they’ve written good tests, but you still need someone with a test engineer mind-set to go through the tests and validate that it’s a useful set, or the correct set for that code. Or they won’t actually be pure developers, they’ll have that mix of test ability in there.I think having slightly more distinct roles is useful. When it starts to blur, then you lose that view of the tests as a whole. The tester job is not to create tests, it’s to validate the quality of the product, and you don’t do that just by writing tests. There’s more things you’ve got to keep in your mind. And I think when you blur the roles, you start to lose that end of the tester.So because you’re working on those features, you lose that holistic view of the whole system?Yeah, and anyone who’s worked on the feature shouldn’t be testing it. You always need to have it tested it by someone who didn’t write it. Otherwise you’re a bit too close and you assume “yes, people will only use it that way”, but the tester will come along and go “how do people use this? How would our most idiotic user use this?”. I might not test that because it might be completely irrelevant. But it’s coming in and trying to have a different set of assumptions.Are you a believer that it should all be automated if possible?Not entirely. So an automated test is always better than a manual test for the long-term, but there’s still nothing that beats a human sitting in front of the application and thinking “What could I do at this point?”. The automated test is very good but they follow that strict path, and they never check anything off the path. The human tester will look at things that they weren’t expecting, whereas the automated test can only ever go “Is that value correct?” in many respects, and it won’t notice that on the other side of the screen you’re showing something completely wrong. And that value might have been checked independently, but you always find a few odd interactions when you’re going through something manually, and you always need to go through something manually to start with anyway, otherwise you won’t know where the important bits to write your automation are.When you’re doing that manual testing, do you think it’s important to do that across the entire product, or just the bits that you’ve touched recently?I think it’s important to do it mostly on the bits you’ve touched, but you can’t ignore the rest of the product. Unless you’re dealing with a very, very self-contained bit, you’re almost always encounter other bits of the product along the way. Most testers I know, even if they are looking at just one path, they’ll keep open and move around a bit anyway, just because they want to find something that’s broken. If we find that your path is right, we’ll go out and hunt something else.How do you think this fits into the idea of continuously deploying, so long as the tests pass?With deploying a website it’s a bit different because you can always pull it back. If you’re deploying an application to customers, when you’ve released it, it’s out there, you can’t pull it back. Someone’s going to keep it, no matter how hard you try there will be a few installations that stay around. So I’d always have at least a human element on that path. With websites, you could probably automate straight out, or at least straight out to an internal environment or a single server in a cloud of fifty that will serve some people. But I don’t think you should release to everyone just on automated tests passing.You’ve already mentioned using BASIC and C# — are there any other languages that you’ve used?I’ve used a few. That’s something that has changed more recently, I’ve become familiar with more languages. Before I started at Red Gate I learnt a bit of C. Then last year, I taught myself Python which I actually really enjoyed using. I’ve also come across another language called Vala, which is sort of a C#-like language. It’s basically a pre-processor for C, but it has very nice syntax. I think that’s currently my favourite language.Any particular reason for trying Vala?I have a completely Linux environment at home, and I’ve been looking for a nice language, and C# just doesn’t cut it because I won’t touch Mono. So, I was looking for something like C# but that was useable in an open source environment, and Vala’s what I found. C#’s got a few features that Vala doesn’t, and Vala’s got a few features where I think “It would be awesome if C# had that”.What are some of the features that it’s missing?Extension methods. And I think that’s the only one that really bugs me. I like to use them when I’m writing C# because it makes some things really easy, especially with libraries that you can’t touch the internals of. It doesn’t have method overloading, which is sometimes annoying.Where it does win over C#?Everything is non-nullable by default, you never have to check that something’s unexpectedly null.Also, Vala has code contracts. This is starting to come in C# 4, but the way it works in Vala is that you specify requirements in short phrases as part of your function signature and they stick to the signature, so that when you inherit it, it has exactly the same code contract as the base one, or when you inherit from an interface, you have to match the signature exactly. Just using those makes you think a bit more about how you’re writing your method, it’s not an afterthought when you’ve got contracts from base classes given to you, you can’t change it. Which I think is a lot nicer than the way C# handles it. When are those actually checked?They’re checked both at compile and run-time. The compile-time checking isn’t very strong yet, it’s quite a new feature in the compiler, and because it compiles down to C, you can write C code and interface with your methods, so you can bypass that compile-time check anyway. So there’s an extra runtime check, and if you violate one of the contracts at runtime, it’s game over for your program, there’s no exception to catch, it’s just goodbye!One thing I dislike about C# is the exceptions. You write a bit of code and fifty exceptions could come from any point in your ten lines, and you can’t mentally model how those exceptions are going to come out, and you can’t even predict them based on the functions you’re calling, because if you’ve accidentally got a derived class there instead of a base class, that can throw a completely different set of exceptions. So I’ve got no way of mentally modelling those, whereas in Vala they’re checked like Java, so you know only these exceptions can come out. You know in advance the error conditions.I think Raymond Chen on Old New Thing says “the only thing you know when you throw an exception is that you’re in an invalid state somewhere in your program, so just kill it and be done with it!”You said you’ve also learnt bits of Python. How did you find that compared to Vala and C#?Very different because of the dynamic typing. I’ve been writing a website for my own use. I’m quite into photography, so I take photos off my camera, post-process them, dump them in a file, and I get a webpage with all my thumbnails. So sort of like Picassa, but written by myself because I wanted something to learn Python with. There are some things that are really nice, I just found it really difficult to cope with the fact that I’m not quite sure what this object type that I’m passed is, I might not ever be sure, so it can randomly blow up on me. But once I train myself to ignore that and just say “well, I’m fairly sure it’s going to be something that looks like this, so I’ll use it like this”, then it’s quite nice.Any particular features that you’ve appreciated?I don’t like any particular feature, it’s just very straightforward to work with. It’s very quick to write something in, particularly as you don’t have to worry that you’ve changed something that affects a different part of the program. If you have, then that part blows up, but I can get this part working right now.If you were doing a big project, would you be willing to do it in Python rather than C# or Vala?I think I might be willing to try something bigger or long term with Python. We’re currently doing an ASP.NET MVC project on C#, and I don’t like the amount of reflection. There’s a lot of magic that pulls values out, and it’s all done under the scenes. It’s almost managed to put a dynamic type system on top of C#, which in many ways destroys the language to me, whereas if you’re already in a dynamic language, having things done dynamically is much more natural. In many ways, you get the worst of both worlds. I think for web projects, I would go with Python again, whereas for anything desktop, command-line or GUI-based, I’d probably go for C# or Vala, depending on what environment I’m in.It’s the fact that you can gain from the strong typing in ways that you can’t so much on the web app. Or, in a web app, you have to use dynamic typing at some point, or you have to write a hell of a lot of boilerplate, and I’d rather use the dynamic typing than write the boilerplate.What do you think separates great programmers from everyone else?Probably design choices. Choosing to write it a piece of code one way or another. For any given program you ask me to write, I could probably do it five thousand ways. A programmer who is capable will see four or five of them, and choose one of the better ones. The excellent programmer will see the largest proportion and manage to pick the best one very quickly without having to think too much about it. I think that’s probably what separates, is the speed at which they can see what’s the best path to write the program in. More Red Gater Coder interviews

    Read the article

  • Use Extension method to write cleaner code

    - by Fredrik N
    This blog post will show you step by step to refactoring some code to be more readable (at least what I think). Patrik Löwnedahl gave me some of the ideas when we where talking about making code much cleaner. The following is an simple application that will have a list of movies (Normal and Transfer). The task of the application is to calculate the total sum of each movie and also display the price of each movie. class Program { enum MovieType { Normal, Transfer } static void Main(string[] args) { var movies = GetMovies(); int totalPriceOfNormalMovie = 0; int totalPriceOfTransferMovie = 0; foreach (var movie in movies) { if (movie == MovieType.Normal) { totalPriceOfNormalMovie += 2; Console.WriteLine("$2"); } else if (movie == MovieType.Transfer) { totalPriceOfTransferMovie += 3; Console.WriteLine("$3"); } } } private static IEnumerable<MovieType> GetMovies() { return new List<MovieType>() { MovieType.Normal, MovieType.Transfer, MovieType.Normal }; } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } In the code above I’m using an enum, a good way to add types (isn’t it ;)). I also use one foreach loop to calculate the price, the loop has a condition statement to check what kind of movie is added to the list of movies. I want to reuse the foreach only to increase performance and let it do two things (isn’t that smart of me?! ;)). First of all I can admit, I’m not a big fan of enum. Enum often results in ugly condition statements and can be hard to maintain (if a new type is added we need to check all the code in our app to see if we use the enum somewhere else). I don’t often care about pre-optimizations when it comes to write code (of course I have performance in mind). I rather prefer to use two foreach to let them do one things instead of two. So based on what I don’t like and Martin Fowler’s Refactoring catalog, I’m going to refactoring this code to what I will call a more elegant and cleaner code. First of all I’m going to use Split Loop to make sure the foreach will do one thing not two, it will results in two foreach (Don’t care about performance here, if the results will results in bad performance, you can refactoring later, but computers are so fast to day, so iterating through a list is not often so time consuming.) Note: The foreach actually do four things, will come to is later. var movies = GetMovies(); int totalPriceOfNormalMovie = 0; int totalPriceOfTransferMovie = 0; foreach (var movie in movies) { if (movie == MovieType.Normal) { totalPriceOfNormalMovie += 2; Console.WriteLine("$2"); } } foreach (var movie in movies) { if (movie == MovieType.Transfer) { totalPriceOfTransferMovie += 3; Console.WriteLine("$3"); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } To remove the condition statement we can use the Where extension method added to the IEnumerable<T> and is located in the System.Linq namespace: foreach (var movie in movies.Where( m => m == MovieType.Normal)) { totalPriceOfNormalMovie += 2; Console.WriteLine("$2"); } foreach (var movie in movies.Where( m => m == MovieType.Transfer)) { totalPriceOfTransferMovie += 3; Console.WriteLine("$3"); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } The above code will still do two things, calculate the total price, and display the price of the movie. I will not take care of it at the moment, instead I will focus on the enum and try to remove them. One way to remove enum is by using the Replace Conditional with Polymorphism. So I will create two classes, one base class called Movie, and one called MovieTransfer. The Movie class will have a property called Price, the Movie will now hold the price:   public class Movie { public virtual int Price { get { return 2; } } } public class MovieTransfer : Movie { public override int Price { get { return 3; } } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } The following code has no enum and will use the new Movie classes instead: class Program { static void Main(string[] args) { var movies = GetMovies(); int totalPriceOfNormalMovie = 0; int totalPriceOfTransferMovie = 0; foreach (var movie in movies.Where( m => m is Movie)) { totalPriceOfNormalMovie += movie.Price; Console.WriteLine(movie.Price); } foreach (var movie in movies.Where( m => m is MovieTransfer)) { totalPriceOfTransferMovie += movie.Price; Console.WriteLine(movie.Price); } } private static IEnumerable<Movie> GetMovies() { return new List<Movie>() { new Movie(), new MovieTransfer(), new Movie() }; } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; }   If you take a look at the foreach now, you can see it still actually do two things, calculate the price and display the price. We can do some more refactoring here by using the Sum extension method to calculate the total price of the movies:   static void Main(string[] args) { var movies = GetMovies(); int totalPriceOfNormalMovie = movies.Where(m => m is Movie) .Sum(m => m.Price); int totalPriceOfTransferMovie = movies.Where(m => m is MovieTransfer) .Sum(m => m.Price); foreach (var movie in movies.Where( m => m is Movie)) Console.WriteLine(movie.Price); foreach (var movie in movies.Where( m => m is MovieTransfer)) Console.WriteLine(movie.Price); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Now when the Movie object will hold the price, there is no need to use two separate foreach to display the price of the movies in the list, so we can use only one instead: foreach (var movie in movies) Console.WriteLine(movie.Price); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } If we want to increase the Maintainability index we can use the Extract Method to move the Sum of the prices into two separate methods. The name of the method will explain what we are doing: static void Main(string[] args) { var movies = GetMovies(); int totalPriceOfMovie = TotalPriceOfMovie(movies); int totalPriceOfTransferMovie = TotalPriceOfMovieTransfer(movies); foreach (var movie in movies) Console.WriteLine(movie.Price); } private static int TotalPriceOfMovieTransfer(IEnumerable<Movie> movies) { return movies.Where(m => m is MovieTransfer) .Sum(m => m.Price); } private static int TotalPriceOfMovie(IEnumerable<Movie> movies) { return movies.Where(m => m is Movie) .Sum(m => m.Price); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Now to the last thing, I love the ForEach method of the List<T>, but the IEnumerable<T> doesn’t have it, so I created my own ForEach extension, here is the code of the ForEach extension method: public static class LoopExtensions { public static void ForEach<T>(this IEnumerable<T> values, Action<T> action) { Contract.Requires(values != null); Contract.Requires(action != null); foreach (var v in values) action(v); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } I will now replace the foreach by using this ForEach method: static void Main(string[] args) { var movies = GetMovies(); int totalPriceOfMovie = TotalPriceOfMovie(movies); int totalPriceOfTransferMovie = TotalPriceOfMovieTransfer(movies); movies.ForEach(m => Console.WriteLine(m.Price)); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } The ForEach on the movies will now display the price of the movie, but maybe we want to display the name of the movie etc, so we can use Extract Method by moving the lamdba expression into a method instead, and let the method explains what we are displaying: movies.ForEach(DisplayMovieInfo); private static void DisplayMovieInfo(Movie movie) { Console.WriteLine(movie.Price); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Now the refactoring is done! Here is the complete code:   class Program { static void Main(string[] args) { var movies = GetMovies(); int totalPriceOfMovie = TotalPriceOfMovie(movies); int totalPriceOfTransferMovie = TotalPriceOfMovieTransfer(movies); movies.ForEach(DisplayMovieInfo); } private static void DisplayMovieInfo(Movie movie) { Console.WriteLine(movie.Price); } private static int TotalPriceOfMovieTransfer(IEnumerable<Movie> movies) { return movies.Where(m => m is MovieTransfer) .Sum(m => m.Price); } private static int TotalPriceOfMovie(IEnumerable<Movie> movies) { return movies.Where(m => m is Movie) .Sum(m => m.Price); } private static IEnumerable<Movie> GetMovies() { return new List<Movie>() { new Movie(), new MovieTransfer(), new Movie() }; } } public class Movie { public virtual int Price { get { return 2; } } } public class MovieTransfer : Movie { public override int Price { get { return 3; } } } pulbic static class LoopExtensions { public static void ForEach<T>(this IEnumerable<T> values, Action<T> action) { Contract.Requires(values != null); Contract.Requires(action != null); foreach (var v in values) action(v); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } I think the new code is much cleaner than the first one, and I love the ForEach extension on the IEnumerable<T>, I can use it for different kind of things, for example: movies.Where(m => m is Movie) .ForEach(DoSomething); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } By using the Where and ForEach extension method, some if statements can be removed and will make the code much cleaner. But the beauty is in the eye of the beholder. What would you have done different, what do you think will make the first example in the blog post look much cleaner than my results, comments are welcome! If you want to know when I will publish a new blog post, you can follow me on twitter: http://www.twitter.com/fredrikn

    Read the article

  • Skinny controller in ASP.NET MVC 4

    - by thangchung
    Rails community are always inspire a lot of best ideas. I really love this community by the time. One of these is "Fat models and skinny controllers". I have spent a lot of time on ASP.NET MVC, and really I did some miss-takes, because I made the controller so fat. That make controller is really dirty and very hard to maintain in the future. It is violate seriously SRP principle and KISS as well. But how can we achieve that in ASP.NET MVC? That question is really clear after I read "Manning ASP.NET MVC 4 in Action". It is simple that we can separate it into ActionResult, and try to implementing logic and persistence data inside this. In last 2 years, I have read this from Jimmy Bogard blog, but in that time I never had a consideration about it. That's enough for talking now. I just published a sample on ASP.NET MVC 4, implemented on Visual Studio 2012 RC at here. I used EF framework at here for implementing persistence layer, and also use 2 free templates from internet to make the UI for this sample. In this sample, I try to implementing the simple magazine website that managing all articles, categories and news. It is not finished at all in this time, but no problems, because I just show you about how can we make the controller skinny. And I wanna hear more about your ideas. The first thing, I am abstract the base ActionResult class like this:    public abstract class MyActionResult : ActionResult, IEnsureNotNull     {         public abstract void EnsureAllInjectInstanceNotNull();     }     public abstract class ActionResultBase<TController> : MyActionResult where TController : Controller     {         protected readonly Expression<Func<TController, ActionResult>> ViewNameExpression;         protected readonly IExConfigurationManager ConfigurationManager;         protected ActionResultBase (Expression<Func<TController, ActionResult>> expr)             : this(DependencyResolver.Current.GetService<IExConfigurationManager>(), expr)         {         }         protected ActionResultBase(             IExConfigurationManager configurationManager,             Expression<Func<TController, ActionResult>> expr)         {             Guard.ArgumentNotNull(expr, "ViewNameExpression");             Guard.ArgumentNotNull(configurationManager, "ConfigurationManager");             ViewNameExpression = expr;             ConfigurationManager = configurationManager;         }         protected ViewResult GetViewResult<TViewModel>(TViewModel viewModel)         {             var m = (MethodCallExpression)ViewNameExpression.Body;             if (m.Method.ReturnType != typeof(ActionResult))             {                 throw new ArgumentException("ControllerAction method '" + m.Method.Name + "' does not return type ActionResult");             }             var result = new ViewResult             {                 ViewName = m.Method.Name             };             result.ViewData.Model = viewModel;             return result;         }         public override void ExecuteResult(ControllerContext context)         {             EnsureAllInjectInstanceNotNull();         }     } I also have an interface for validation all inject objects. This interface make sure all inject objects that I inject using Autofac container are not null. The implementation of this as below public interface IEnsureNotNull     {         void EnsureAllInjectInstanceNotNull();     } Afterwards, I am just simple implementing the HomePageViewModelActionResult class like this public class HomePageViewModelActionResult<TController> : ActionResultBase<TController> where TController : Controller     {         #region variables & ctors         private readonly ICategoryRepository _categoryRepository;         private readonly IItemRepository _itemRepository;         private readonly int _numOfPage;         public HomePageViewModelActionResult(Expression<Func<TController, ActionResult>> viewNameExpression)             : this(viewNameExpression,                    DependencyResolver.Current.GetService<ICategoryRepository>(),                    DependencyResolver.Current.GetService<IItemRepository>())         {         }         public HomePageViewModelActionResult(             Expression<Func<TController, ActionResult>> viewNameExpression,             ICategoryRepository categoryRepository,             IItemRepository itemRepository)             : base(viewNameExpression)         {             _categoryRepository = categoryRepository;             _itemRepository = itemRepository;             _numOfPage = ConfigurationManager.GetAppConfigBy("NumOfPage").ToInteger();         }         #endregion         #region implementation         public override void ExecuteResult(ControllerContext context)         {             base.ExecuteResult(context);             var cats = _categoryRepository.GetCategories();             var mainViewModel = new HomePageViewModel();             var headerViewModel = new HeaderViewModel();             var footerViewModel = new FooterViewModel();             var mainPageViewModel = new MainPageViewModel();             headerViewModel.SiteTitle = "Magazine Website";             if (cats != null && cats.Any())             {                 headerViewModel.Categories = cats.ToList();                 footerViewModel.Categories = cats.ToList();             }             mainPageViewModel.LeftColumn = BindingDataForMainPageLeftColumnViewModel();             mainPageViewModel.RightColumn = BindingDataForMainPageRightColumnViewModel();             mainViewModel.Header = headerViewModel;             mainViewModel.DashBoard = new DashboardViewModel();             mainViewModel.Footer = footerViewModel;             mainViewModel.MainPage = mainPageViewModel;             GetViewResult(mainViewModel).ExecuteResult(context);         }         public override void EnsureAllInjectInstanceNotNull()         {             Guard.ArgumentNotNull(_categoryRepository, "CategoryRepository");             Guard.ArgumentNotNull(_itemRepository, "ItemRepository");             Guard.ArgumentMustMoreThanZero(_numOfPage, "NumOfPage");         }         #endregion         #region private functions         private MainPageRightColumnViewModel BindingDataForMainPageRightColumnViewModel()         {             var mainPageRightCol = new MainPageRightColumnViewModel();             mainPageRightCol.LatestNews = _itemRepository.GetNewestItem(_numOfPage).ToList();             mainPageRightCol.MostViews = _itemRepository.GetMostViews(_numOfPage).ToList();             return mainPageRightCol;         }         private MainPageLeftColumnViewModel BindingDataForMainPageLeftColumnViewModel()         {             var mainPageLeftCol = new MainPageLeftColumnViewModel();             var items = _itemRepository.GetNewestItem(_numOfPage);             if (items != null && items.Any())             {                 var firstItem = items.First();                 if (firstItem == null)                     throw new NoNullAllowedException("First Item".ToNotNullErrorMessage());                 if (firstItem.ItemContent == null)                     throw new NoNullAllowedException("First ItemContent".ToNotNullErrorMessage());                 mainPageLeftCol.FirstItem = firstItem;                 if (items.Count() > 1)                 {                     mainPageLeftCol.RemainItems = items.Where(x => x.ItemContent != null && x.Id != mainPageLeftCol.FirstItem.Id).ToList();                 }             }             return mainPageLeftCol;         }         #endregion     }  Final step, I get into HomeController and add some line of codes like this [Authorize]     public class HomeController : BaseController     {         [AllowAnonymous]         public ActionResult Index()         {             return new HomePageViewModelActionResult<HomeController>(x=>x.Index());         }         [AllowAnonymous]         public ActionResult Details(int id)         {             return new DetailsViewModelActionResult<HomeController>(x => x.Details(id), id);         }         [AllowAnonymous]         public ActionResult Category(int id)         {             return new CategoryViewModelActionResult<HomeController>(x => x.Category(id), id);         }     } As you see, the code in controller is really skinny, and all the logic I move to the custom ActionResult class. Some people said, it just move the code out of controller and put it to another class, so it is still hard to maintain. Look like it just move the complicate codes from one place to another place. But if you have a look and think it in details, you have to find out if you have code for processing all logic that related to HttpContext or something like this. You can do it on Controller, and try to delegating another logic  (such as processing business requirement, persistence data,...) to custom ActionResult class. Tell me more your thinking, I am really willing to hear all of its from you guys. All source codes can be find out at here. Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="http://weblogs.asp.net//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");

    Read the article

  • Need guidance on a Google Map application that has to show 250 000 polylines.

    - by lucian.jp
    I am looking for advice for an application I am developing that uses Google Map. Summary: A user has a list of criteria for searching a street segment that fulfills the criteria. The street segments will be colored with 3 colors for showing those below average, average and over average. Then the user clicks on the street segment to see an information window showing the properties of that specific segment hiding those not selected until he/she closes the window and other polyline becomes visible again. This looks quite like the Monopoly City Streets game Hasbro made some month ago the difference being I do not use Flash, I can’t use Open Street Map because it doesn’t list street segment (if it does the IDs won’t be the same anyway) and I do not have to show Google sketch building over. Information: I have a database of street segments with IDs, polyline points and centroid. The database has 6,000,000 street segment records in it. To narrow the generated data a bit we focus on city. The largest city we must show has 250,000 street segments. This means 250,000 line segment polyline to show. Our longest polyline uses 9600 characters which is stored in two 8000 varchar columns in SQL Server 2008. We need to use the API v3 because it is faster than the API v2 and the application will be ported to iPhone. For now it's an ASP.NET 3.5 with SQl Server 2008 application. Performance is a priority. Problems: Most of the demo projects that do this are made with API v2. So besides tutorial on the Google API v3 reference page I have nothing to compare performance or technology use to achieve my goal. There is no available .NET wrapper for the API v3 yet. Generating a 250,000 line segment polyline creates a heavy file which takes time to transfer and parse. (I have found a demo of one polyline of 390,000 points. I think the encoder would be far less efficient with more polylines with less points since there will be less rounding.) Since streets segments are shown based on criteria, polylines must be dynamically created and cache can't be used. Some thoughts: KML/KMZ: Pros: Since it is a standard we can easily load Bing maps, Yahoo! maps, Google maps, Google Earth, with the same KML file. The data generation would be the same. Cons: LineString in KML cannot be encoded polyline like the Google map API can handle. So it would probably be bigger and slower to display. Zipping the file at the size it will take more processing time and require the client side to uncompress the data and I am not quite sure with 250,000 data how an iPhone would handle this and how a server would handle 40 users browsing at the same time. JavaScript file: Pros: JavaScript file can have encoded polyline and would significantly reduce the file to transfer. Cons: Have to create my own stripped version of API v3 to add overlays, create polyline, etc. It is more complex than just create a KML file and point to the source. GeoRSS: This option isn't adapted for my needs I think, but I could be wrong. MapServer: I saw some post suggesting using MapServer to generate overlays. Not quite sure for the connection with our database and the performance it would give. Plus it requires a plugin for generating KML. It seems to me that it wouldn't allow me to do better than creating my own KML or JavaScript file. Maintenance would be simpler without. Monopoly City Streets: The game is now over, but for those who know what I am talking about Monopoly City Streets was showing at max zoom level only the streets that the centroid was inside the Bounds of the window. Moving the map was sending request to the server for the new streets to show. While I think this was ingenious, I have no idea how to implement something similar. The only thing I thought about was to compare if the long was inside the bound of map area X and same with Y. While this could improve performance significantly at high zoom level, this would give nothing when showing a whole city. Clustering: While cluster is awesome for marker, it seems we cannot cluster polylines. I would have liked something like MarkerClusterer for polylines and be able to cluster by my 3 polyline colors. This will probably stay as a “would have been freaking awesome but forget it”. Arrow: I will have in a future version to show a direction for the polyline and will have to show an arrow at the centroid. Loading an image or marker will only double my data so creating a custom overlay will probably be my only option. I have found that demo for something similar I would like to achieve. Unfortunately, the demo is very slow, but I only wish to show 1 arrow per polyline and not multiple like the demo. This functionality will depend on the format of data since I don't think KML support custom overlays. Criteria: While the application is done with ASP.NET 3.5, the port to the iPhone won't use the web to show the application and be limited in screen size for selecting the criteria. This is why I was more orienting on a service or page generating the file based on criteria passed in parameters. The service would than generate the file I need to display the polylines on the map. I could also create an aspx page that does this. The aspx page is more documented than the service way. There should be a reason. Questions: Should I create a web service to returns the street segments file or create an aspx page that return the file? Should I create a JavaScript file with encoded polyline or a KML with longitude/latitude based on the fact that maximum longitude/latitude polyline have 9600 characters and I have to render maximum 250,000 line segment polyline. Or should I go with a MapServer that generate the overlay? Will I be able to display simple arrow on the polyline on the next version. In case of KML generation is it faster to create the file with XDocument, XmlDocument, XmlWriter and this manually or just serialize the street segment in the stream? This is more a brainstorming Stack Overflow question than an actual code problem. Any answer helping narrow the possibilities is as good as someone having all the knowledge to point me out a better choice.

    Read the article

  • Reflect.Emit Dynamic Type Memory Blowup

    - by Firestrand
    Using C# 3.5 I am trying to generate dynamic types at runtime using reflection emit. I used the Dynamic Query Library sample from Microsoft to create a class generator. Everything works, my problem is that 100 generated types inflate the memory usage by approximately 25MB. This is a completely unacceptable memory profile as eventually I want to support having several hundred thousand types generated in memory. Memory profiling shows that the memory is apparently being held by various System.Reflection.Emit types and methods though I can't figure out why. I haven't found others talking about this problem so I am hoping someone in this community either knows what I am doing wrong or if this is expected behavior. Contrived Example below: using System; using System.Collections.Generic; using System.Text; using System.Reflection; using System.Reflection.Emit; namespace SmallRelfectExample { class Program { static void Main(string[] args) { int typeCount = 100; int propCount = 100; Random rand = new Random(); Type dynType = null; for (int i = 0; i < typeCount; i++) { List<DynamicProperty> dpl = new List<DynamicProperty>(propCount); for (int j = 0; j < propCount; j++) { dpl.Add(new DynamicProperty("Key" + rand.Next().ToString(), typeof(String))); } SlimClassFactory scf = new SlimClassFactory(); dynType = scf.CreateDynamicClass(dpl.ToArray(), i); //Optionally do something with the type here } Console.WriteLine("SmallRelfectExample: {0} Types generated.", typeCount); Console.ReadLine(); } } public class SlimClassFactory { private readonly ModuleBuilder module; public SlimClassFactory() { AssemblyName name = new AssemblyName("DynamicClasses"); AssemblyBuilder assembly = AppDomain.CurrentDomain.DefineDynamicAssembly(name, AssemblyBuilderAccess.Run); module = assembly.DefineDynamicModule("Module"); } public Type CreateDynamicClass(DynamicProperty[] properties, int Id) { string typeName = "DynamicClass" + Id.ToString(); TypeBuilder tb = module.DefineType(typeName, TypeAttributes.Class | TypeAttributes.Public, typeof(DynamicClass)); FieldInfo[] fields = GenerateProperties(tb, properties); GenerateEquals(tb, fields); GenerateGetHashCode(tb, fields); Type result = tb.CreateType(); return result; } static FieldInfo[] GenerateProperties(TypeBuilder tb, DynamicProperty[] properties) { FieldInfo[] fields = new FieldBuilder[properties.Length]; for (int i = 0; i < properties.Length; i++) { DynamicProperty dp = properties[i]; FieldBuilder fb = tb.DefineField("_" + dp.Name, dp.Type, FieldAttributes.Private); PropertyBuilder pb = tb.DefineProperty(dp.Name, PropertyAttributes.HasDefault, dp.Type, null); MethodBuilder mbGet = tb.DefineMethod("get_" + dp.Name, MethodAttributes.Public | MethodAttributes.SpecialName | MethodAttributes.HideBySig, dp.Type, Type.EmptyTypes); ILGenerator genGet = mbGet.GetILGenerator(); genGet.Emit(OpCodes.Ldarg_0); genGet.Emit(OpCodes.Ldfld, fb); genGet.Emit(OpCodes.Ret); MethodBuilder mbSet = tb.DefineMethod("set_" + dp.Name, MethodAttributes.Public | MethodAttributes.SpecialName | MethodAttributes.HideBySig, null, new Type[] { dp.Type }); ILGenerator genSet = mbSet.GetILGenerator(); genSet.Emit(OpCodes.Ldarg_0); genSet.Emit(OpCodes.Ldarg_1); genSet.Emit(OpCodes.Stfld, fb); genSet.Emit(OpCodes.Ret); pb.SetGetMethod(mbGet); pb.SetSetMethod(mbSet); fields[i] = fb; } return fields; } static void GenerateEquals(TypeBuilder tb, FieldInfo[] fields) { MethodBuilder mb = tb.DefineMethod("Equals", MethodAttributes.Public | MethodAttributes.ReuseSlot | MethodAttributes.Virtual | MethodAttributes.HideBySig, typeof(bool), new Type[] { typeof(object) }); ILGenerator gen = mb.GetILGenerator(); LocalBuilder other = gen.DeclareLocal(tb); Label next = gen.DefineLabel(); gen.Emit(OpCodes.Ldarg_1); gen.Emit(OpCodes.Isinst, tb); gen.Emit(OpCodes.Stloc, other); gen.Emit(OpCodes.Ldloc, other); gen.Emit(OpCodes.Brtrue_S, next); gen.Emit(OpCodes.Ldc_I4_0); gen.Emit(OpCodes.Ret); gen.MarkLabel(next); foreach (FieldInfo field in fields) { Type ft = field.FieldType; Type ct = typeof(EqualityComparer<>).MakeGenericType(ft); next = gen.DefineLabel(); gen.EmitCall(OpCodes.Call, ct.GetMethod("get_Default"), null); gen.Emit(OpCodes.Ldarg_0); gen.Emit(OpCodes.Ldfld, field); gen.Emit(OpCodes.Ldloc, other); gen.Emit(OpCodes.Ldfld, field); gen.EmitCall(OpCodes.Callvirt, ct.GetMethod("Equals", new Type[] { ft, ft }), null); gen.Emit(OpCodes.Brtrue_S, next); gen.Emit(OpCodes.Ldc_I4_0); gen.Emit(OpCodes.Ret); gen.MarkLabel(next); } gen.Emit(OpCodes.Ldc_I4_1); gen.Emit(OpCodes.Ret); } static void GenerateGetHashCode(TypeBuilder tb, FieldInfo[] fields) { MethodBuilder mb = tb.DefineMethod("GetHashCode", MethodAttributes.Public | MethodAttributes.ReuseSlot | MethodAttributes.Virtual | MethodAttributes.HideBySig, typeof(int), Type.EmptyTypes); ILGenerator gen = mb.GetILGenerator(); gen.Emit(OpCodes.Ldc_I4_0); foreach (FieldInfo field in fields) { Type ft = field.FieldType; Type ct = typeof(EqualityComparer<>).MakeGenericType(ft); gen.EmitCall(OpCodes.Call, ct.GetMethod("get_Default"), null); gen.Emit(OpCodes.Ldarg_0); gen.Emit(OpCodes.Ldfld, field); gen.EmitCall(OpCodes.Callvirt, ct.GetMethod("GetHashCode", new Type[] { ft }), null); gen.Emit(OpCodes.Xor); } gen.Emit(OpCodes.Ret); } } public abstract class DynamicClass { public override string ToString() { PropertyInfo[] props = GetType().GetProperties(BindingFlags.Instance | BindingFlags.Public); StringBuilder sb = new StringBuilder(); sb.Append("{"); for (int i = 0; i < props.Length; i++) { if (i > 0) sb.Append(", "); sb.Append(props[i].Name); sb.Append("="); sb.Append(props[i].GetValue(this, null)); } sb.Append("}"); return sb.ToString(); } } public class DynamicProperty { private readonly string name; private readonly Type type; public DynamicProperty(string name, Type type) { if (name == null) throw new ArgumentNullException("name"); if (type == null) throw new ArgumentNullException("type"); this.name = name; this.type = type; } public string Name { get { return name; } } public Type Type { get { return type; } } } }

    Read the article

  • Returning Arrays from .net web service to Java ME web service results in compile error of stub?

    - by sphereinabox
    So, I'm getting some compile errors on netbeans 6.5 generated web service code for a java ME client to a c# (vs2005) web service. I've trimmed my example significantly, and it still shows the problem, and not being able to return a collection of things is pretty much a deal-breaker. c# web service (SimpleWebService.asmx) <%@ WebService Language="C#" Class="SimpleWebService" %> using System; using System.Web; using System.Web.Services; using System.Web.Services.Protocols; [WebService(Namespace = "http://sphereinabox.com/")] [WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)] public class SimpleWebService : System.Web.Services.WebService { [WebMethod] public CustomType[] GetSomething() { return new CustomType[] {new CustomType("hi"), new CustomType("bye")}; } public class CustomType { public string Name; public CustomType(string _name) { Name = _name; } public CustomType() { } } } WSDL (automatically generated by vs2005): <?xml version="1.0" encoding="utf-8"?> <wsdl:definitions xmlns:s="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://schemas.xmlsoap.org/wsdl/soap12/" xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/" xmlns:tns="http://sphereinabox.com/" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:tm="http://microsoft.com/wsdl/mime/textMatching/" xmlns:http="http://schemas.xmlsoap.org/wsdl/http/" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" targetNamespace="http://sphereinabox.com/" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"> <wsdl:types> <s:schema elementFormDefault="qualified" targetNamespace="http://sphereinabox.com/"> <s:element name="GetSomething"> <s:complexType /> </s:element> <s:element name="GetSomethingResponse"> <s:complexType> <s:sequence> <s:element minOccurs="0" maxOccurs="1" name="GetSomethingResult" type="tns:ArrayOfCustomType" /> </s:sequence> </s:complexType> </s:element> <s:complexType name="ArrayOfCustomType"> <s:sequence> <s:element minOccurs="0" maxOccurs="unbounded" name="CustomType" nillable="true" type="tns:CustomType" /> </s:sequence> </s:complexType> <s:complexType name="CustomType"> <s:sequence> <s:element minOccurs="0" maxOccurs="1" name="Name" type="s:string" /> </s:sequence> </s:complexType> </s:schema> </wsdl:types> <wsdl:message name="GetSomethingSoapIn"> <wsdl:part name="parameters" element="tns:GetSomething" /> </wsdl:message> <wsdl:message name="GetSomethingSoapOut"> <wsdl:part name="parameters" element="tns:GetSomethingResponse" /> </wsdl:message> <wsdl:portType name="SimpleWebServiceSoap"> <wsdl:operation name="GetSomething"> <wsdl:input message="tns:GetSomethingSoapIn" /> <wsdl:output message="tns:GetSomethingSoapOut" /> </wsdl:operation> </wsdl:portType> <wsdl:binding name="SimpleWebServiceSoap" type="tns:SimpleWebServiceSoap"> <soap:binding transport="http://schemas.xmlsoap.org/soap/http" /> <wsdl:operation name="GetSomething"> <soap:operation soapAction="http://sphereinabox.com/GetSomething" style="document" /> <wsdl:input> <soap:body use="literal" /> </wsdl:input> <wsdl:output> <soap:body use="literal" /> </wsdl:output> </wsdl:operation> </wsdl:binding> <wsdl:binding name="SimpleWebServiceSoap12" type="tns:SimpleWebServiceSoap"> <soap12:binding transport="http://schemas.xmlsoap.org/soap/http" /> <wsdl:operation name="GetSomething"> <soap12:operation soapAction="http://sphereinabox.com/GetSomething" style="document" /> <wsdl:input> <soap12:body use="literal" /> </wsdl:input> <wsdl:output> <soap12:body use="literal" /> </wsdl:output> </wsdl:operation> </wsdl:binding> <wsdl:service name="SimpleWebService"> <wsdl:port name="SimpleWebServiceSoap" binding="tns:SimpleWebServiceSoap"> <soap:address location="http://localhost/SimpleWebService/SimpleWebService.asmx" /> </wsdl:port> <wsdl:port name="SimpleWebServiceSoap12" binding="tns:SimpleWebServiceSoap12"> <soap12:address location="http://localhost/SimpleWebService/SimpleWebService.asmx" /> </wsdl:port> </wsdl:service> </wsdl:definitions> Generated (netbeans) code that fails to compile, this was created going through the "Add - New JavaME to Web Services Client" wizard. (SimpleWebService_Stub.java) public ArrayOfCustomType GetSomething() throws java.rmi.RemoteException { Object inputObject[] = new Object[] { }; Operation op = Operation.newInstance( _qname_operation_GetSomething, _type_GetSomething, _type_GetSomethingResponse ); _prepOperation( op ); op.setProperty( Operation.SOAPACTION_URI_PROPERTY, "http://sphereinabox.com/GetSomething" ); Object resultObj; try { resultObj = op.invoke( inputObject ); } catch( JAXRPCException e ) { Throwable cause = e.getLinkedCause(); if( cause instanceof java.rmi.RemoteException ) { throw (java.rmi.RemoteException) cause; } throw e; } //////// Error on next line, symbol ArrayOfCustomType_fromObject not defined return ArrayOfCustomType_fromObject((Object[])((Object[]) resultObj)[0]); } it turns out with this contrived example (the "CustomType" in my production problem has more than one field) I also get errors from this fun code in the same generated (SimpleWebService_Stub.java) generated code. The errors are that string isn't defined (it's String in java, and besides I think this should be talking about CustomType anyway). private static string string_fromObject( Object obj[] ) { if(obj == null) return null; string result = new string(); return result; }

    Read the article

  • Would anyone tell me how to fetch the media:thumb element's attribute from a json feed?

    - by ash
    I made a yahoo pipe that pulls up the atoms as json format; however, I can fetch and display all the elements in my html page except for the element's attribute. Would anyone tell me how to fetch the media:thumb element's attribute from a json feed? I am pasting the html page's code with javascript. If you save the html page and then view it in browser, you will see that all the necessary elements get output at html page except for the media:thumb as I cannot display the attribute of media:thumb when the feed is formatted as json. I am also pasting the some portion of the json feed so that you can have an idea what i am talking about. Please tell me how to retrieve attribute from media:thumb element of a json feed by using plain javascript but no server side code or javascript library. Thank you. function getFeed(feed){ var newScript = document.createElement('script'); newScript.type = 'text/javascript'; newScript.src = 'http://pipes.yahoo.com/pipes/pipe.run?_id=40616620df99780bceb3fe923cecd216&_render=json&_callback=piper'; document.getElementsByTagName("head")[0].appendChild(newScript); } function piper(feed){ var tmp=''; for (var i=0; i'; tmp+=feed.value.items[i].title+''; tmp+=feed.value.items[i].author.name+''; tmp+=feed.value.items[i].published+''; if (feed.value.items[i].description) { tmp+=feed.value.items[i].description+''; } tmp+='<hr>'; } document.getElementById('rssLayer').innerHTML=tmp; } </script> bchnbc .............................................................. Some portion of the json feed that gets generated by yahoo pipe .............................................................. piper({"count":2,"value":{"title":"myPipe","description":"Pipes Output","link":"http:\/\/pipes.yahoo.com\/pipes\/pipe.info?_id=f7f4175d493cf1171aecbd3268fea5ee","pubDate":"Fri, 02 Apr 2010 17:59:22 -0700","generator":"http:\/\/pipes.yahoo.com\/pipes\/","callback":"piper", "items": [{ "rights":"Attribution - Noncommercial - No Derivative Works", "link":"http:\/\/vodo.net\/mixtape1", "y:id":{"value":null,"permalink":"true"}, "content":{"content":"We're proud to be releasing this first VODO MIXTAPE. Actual tape might be a thing of the past, but before P2P, mixtapes were the most popular way of sharing popular culture the world had known -- and once called the 'most widely practiced American art form'. We want to resuscitate the spirit of the mixtape for this VODO MIXTAPE series: compilations of our favourite shorts, the weird, the wild and the wonky, all brought together in a temporary and uncomfortable company.","type":"text"}, "author": {"name":"Various"}, "description":"We're proud to be releasing this first VODO MIXTAPE. Actual tape might be a thing of the past, but before P2P, mixtapes were the most popular way of sharing popular culture the world had known -- and once called the 'most widely practiced American art form'. We want to resuscitate the spirit of the mixtape for this VODO MIXTAPE series: compilations of our favourite shorts, the weird, the wild and the wonky, all brought together in a temporary and uncomfortable company.", "media:thumbnail": { "url":"http:\/\/vodo.net\/\/thumbnails\/Mixtape1.jpg" }, "published":"2010-03-08-09:20:20 PM", "format": { "audio_bitrate":null, "width":"608", "xmlns":"http:\/\/xmlns.transmission.cc\/FileFormat", "channels":"2", "samplerate":"44100.0", "duration":"3092.36", "height":"352", "size":"733925376.0", "framerate":"25.0", "audio_codec":"mp3", "video_bitrate":"1898.0", "video_codec":"XVID", "pixel_aspect_ratio":"16:9" }, "y:title":"Mixtape #1: VODO's favourite short films", "title":"Mixtape #1: VODO's favourite short films", "id":null, "pubDate":"2010-03-08-09:20:20 PM", "y:published":{"hour":"3","timezone":"UTC","second":"0","month":"4","minute":"10","utime":"1270264200","day":"3","day_of_week":"6","year":"2010" }}, {"rights":"Attribution - Noncommercial - No Derivative Works","link":"http:\/\/vodo.net\/gilbert","y:id":{"value":"cd6584e06ea4ce7fcd34172f4bbd919e295f8680","permalink":"true"},"content":{"content":"A documentary short about Gilbert, the Beacon Hill \"town crier.\" For the last 9 years, since losing his job and becoming homeless, Gilbert has delivered the weather, sports, and breaking headlines from his spot on the Boston Common. Music (used with permission) in this piece is called \"Blue Bicycle\" by Dusseldorf-based pianist \/ composer Volker Bertelmann also known as Hauschka. Artistic Statement: This is the first in a series of profiles of people who I think are interesting, and who I see on almost a daily basis. I don't want to limit the series to people who live \"on the fringe,\" but it would be appropriate to say that most of the people I interview are eclectic, eccentric, and just a little bit unique. The art is in the viewing - but I hope to turn my lens on individuals that don't always color in the lines, whether they can help it or not.","type":"text"},"author":{"name":"Nathaniel Hansen"},"description":"A documentary short about Gilbert, the Beacon Hill \"town crier.\" For the last 9 years, since losing his job and becoming homeless, Gilbert has delivered the weather, sports, and breaking headlines from his spot on the Boston Common. Music (used with permission) in this piece is called \"Blue Bicycle\" by Dusseldorf-based pianist \/ composer Volker Bertelmann also known as Hauschka. Artistic Statement: This is the first in a series of profiles of people who I think are interesting, and who I see on almost a daily basis. I don't want to limit the series to people who live \"on the fringe,\" but it would be appropriate to say that most of the people I interview are eclectic, eccentric, and just a little bit unique. The art is in the viewing - but I hope to turn my lens on individuals that don't always color in the lines, whether they can help it or not.","media:thumbnail":{"url":"http:\/\/vodo.net\/\/thumbnails\/gilbert.jpeg"},"published":"2010-03-03-10:37:05 AM","format":{"audio_bitrate":null,"width":"624","xmlns":"http:\/\/xmlns.transmission.cc\/FileFormat","channels":"2","samplerate":null,"duration":"373.673","height":"352","size":"123321266.0","framerate":null,"audio_codec":"mp3","video_bitrate":null,"video_codec":"XVID","pixel_aspect_ratio":"16:9"},"y:title":"Gilbert","title":"Gilbert","id":"cd6584e06ea4ce7fcd34172f4bbd919e295f8680","pubDate":"2010-03-03-10:37:05 AM","y:published":{"hour":"3","timezone":"UTC","second":"0","month":"4","minute":"10","utime":"1270264200","day":"3","day_of_week":"6","year":"2010" }} ] }})

    Read the article

  • Null-free "maps": Is a callback solution slower than tryGet()?

    - by David Moles
    In comments to "How to implement List, Set, and Map in null free design?", Steven Sudit and I got into a discussion about using a callback, with handlers for "found" and "not found" situations, vs. a tryGet() method, taking an out parameter and returning a boolean indicating whether the out parameter had been populated. Steven maintained that the callback approach was more complex and almost certain to be slower; I maintained that the complexity was no greater and the performance at worst the same. But code speaks louder than words, so I thought I'd implement both and see what I got. The original question was fairly theoretical with regard to language ("And for argument sake, let's say this language don't even have null") -- I've used Java here because that's what I've got handy. Java doesn't have out parameters, but it doesn't have first-class functions either, so style-wise, it should suck equally for both approaches. (Digression: As far as complexity goes: I like the callback design because it inherently forces the user of the API to handle both cases, whereas the tryGet() design requires callers to perform their own boilerplate conditional check, which they could forget or get wrong. But having now implemented both, I can see why the tryGet() design looks simpler, at least in the short term.) First, the callback example: class CallbackMap<K, V> { private final Map<K, V> backingMap; public CallbackMap(Map<K, V> backingMap) { this.backingMap = backingMap; } void lookup(K key, Callback<K, V> handler) { V val = backingMap.get(key); if (val == null) { handler.handleMissing(key); } else { handler.handleFound(key, val); } } } interface Callback<K, V> { void handleFound(K key, V value); void handleMissing(K key); } class CallbackExample { private final Map<String, String> map; private final List<String> found; private final List<String> missing; private Callback<String, String> handler; public CallbackExample(Map<String, String> map) { this.map = map; found = new ArrayList<String>(map.size()); missing = new ArrayList<String>(map.size()); handler = new Callback<String, String>() { public void handleFound(String key, String value) { found.add(key + ": " + value); } public void handleMissing(String key) { missing.add(key); } }; } void test() { CallbackMap<String, String> cbMap = new CallbackMap<String, String>(map); for (int i = 0, count = map.size(); i < count; i++) { String key = "key" + i; cbMap.lookup(key, handler); } System.out.println(found.size() + " found"); System.out.println(missing.size() + " missing"); } } Now, the tryGet() example -- as best I understand the pattern (and I might well be wrong): class TryGetMap<K, V> { private final Map<K, V> backingMap; public TryGetMap(Map<K, V> backingMap) { this.backingMap = backingMap; } boolean tryGet(K key, OutParameter<V> valueParam) { V val = backingMap.get(key); if (val == null) { return false; } valueParam.value = val; return true; } } class OutParameter<V> { V value; } class TryGetExample { private final Map<String, String> map; private final List<String> found; private final List<String> missing; public TryGetExample(Map<String, String> map) { this.map = map; found = new ArrayList<String>(map.size()); missing = new ArrayList<String>(map.size()); } void test() { TryGetMap<String, String> tgMap = new TryGetMap<String, String>(map); for (int i = 0, count = map.size(); i < count; i++) { String key = "key" + i; OutParameter<String> out = new OutParameter<String>(); if (tgMap.tryGet(key, out)) { found.add(key + ": " + out.value); } else { missing.add(key); } } System.out.println(found.size() + " found"); System.out.println(missing.size() + " missing"); } } And finally, the performance test code: public static void main(String[] args) { int size = 200000; Map<String, String> map = new HashMap<String, String>(); for (int i = 0; i < size; i++) { String val = (i % 5 == 0) ? null : "value" + i; map.put("key" + i, val); } long totalCallback = 0; long totalTryGet = 0; int iterations = 20; for (int i = 0; i < iterations; i++) { { TryGetExample tryGet = new TryGetExample(map); long tryGetStart = System.currentTimeMillis(); tryGet.test(); totalTryGet += (System.currentTimeMillis() - tryGetStart); } System.gc(); { CallbackExample callback = new CallbackExample(map); long callbackStart = System.currentTimeMillis(); callback.test(); totalCallback += (System.currentTimeMillis() - callbackStart); } System.gc(); } System.out.println("Avg. callback: " + (totalCallback / iterations)); System.out.println("Avg. tryGet(): " + (totalTryGet / iterations)); } On my first attempt, I got 50% worse performance for callback than for tryGet(), which really surprised me. But, on a hunch, I added some garbage collection, and the performance penalty vanished. This fits with my instinct, which is that we're basically talking about taking the same number of method calls, conditional checks, etc. and rearranging them. But then, I wrote the code, so I might well have written a suboptimal or subconsicously penalized tryGet() implementation. Thoughts?

    Read the article

  • Numpy zero rank array indexing/broadcasting

    - by Lemming
    I'm trying to write a function that supports broadcasting and is fast at the same time. However, numpy's zero-rank arrays are causing trouble as usual. I couldn't find anything useful on google, or by searching here. So, I'm asking you. How should I implement broadcasting efficiently and handle zero-rank arrays at the same time? This whole post became larger than anticipated, sorry. Details: To clarify what I'm talking about I'll give a simple example: Say I want to implement a Heaviside step-function. I.e. a function that acts on the real axis, which is 0 on the negative side, 1 on the positive side, and from case to case either 0, 0.5, or 1 at the point 0. Implementation Masking The most efficient way I found so far is the following. It uses boolean arrays as masks to assign the correct values to the corresponding slots in the output vector. from numpy import * def step_mask(x, limit=+1): """Heaviside step-function. y = 0 if x < 0 y = 1 if x > 0 See below for x == 0. Arguments: x Evaluate the function at these points. limit Which limit at x == 0? limit > 0: y = 1 limit == 0: y = 0.5 limit < 0: y = 0 Return: The values corresponding to x. """ b = broadcast(x, limit) out = zeros(b.shape) out[x>0] = 1 mask = (limit > 0) & (x == 0) out[mask] = 1 mask = (limit == 0) & (x == 0) out[mask] = 0.5 mask = (limit < 0) & (x == 0) out[mask] = 0 return out List Comprehension The following-the-numpy-docs way is to use a list comprehension on the flat iterator of the broadcast object. However, list comprehensions become absolutely unreadable for such complicated functions. def step_comprehension(x, limit=+1): b = broadcast(x, limit) out = empty(b.shape) out.flat = [ ( 1 if x_ > 0 else ( 0 if x_ < 0 else ( 1 if l_ > 0 else ( 0.5 if l_ ==0 else ( 0 ))))) for x_, l_ in b ] return out For Loop And finally, the most naive way is a for loop. It's probably the most readable option. However, Python for-loops are anything but fast. And hence, a really bad idea in numerics. def step_for(x, limit=+1): b = broadcast(x, limit) out = empty(b.shape) for i, (x_, l_) in enumerate(b): if x_ > 0: out[i] = 1 elif x_ < 0: out[i] = 0 elif l_ > 0: out[i] = 1 elif l_ < 0: out[i] = 0 else: out[i] = 0.5 return out Test First of all a brief test to see if the output is correct. >>> x = array([-1, -0.1, 0, 0.1, 1]) >>> step_mask(x, +1) array([ 0., 0., 1., 1., 1.]) >>> step_mask(x, 0) array([ 0. , 0. , 0.5, 1. , 1. ]) >>> step_mask(x, -1) array([ 0., 0., 0., 1., 1.]) It is correct, and the other two functions give the same output. Performance How about efficiency? These are the timings: In [45]: xl = linspace(-2, 2, 500001) In [46]: %timeit step_mask(xl) 10 loops, best of 3: 19.5 ms per loop In [47]: %timeit step_comprehension(xl) 1 loops, best of 3: 1.17 s per loop In [48]: %timeit step_for(xl) 1 loops, best of 3: 1.15 s per loop The masked version performs best as expected. However, I'm surprised that the comprehension is on the same level as the for loop. Zero Rank Arrays But, 0-rank arrays pose a problem. Sometimes you want to use a function scalar input. And preferably not have to worry about wrapping all scalars in at least 1-D arrays. >>> step_mask(1) Traceback (most recent call last): File "<ipython-input-50-91c06aa4487b>", line 1, in <module> step_mask(1) File "script.py", line 22, in step_mask out[x>0] = 1 IndexError: 0-d arrays can't be indexed. >>> step_for(1) Traceback (most recent call last): File "<ipython-input-51-4e0de4fcb197>", line 1, in <module> step_for(1) File "script.py", line 55, in step_for out[i] = 1 IndexError: 0-d arrays can't be indexed. >>> step_comprehension(1) array(1.0) Only the list comprehension can handle 0-rank arrays. The other two versions would need special case handling for 0-rank arrays. Numpy gets a bit messy when you want to use the same code for arrays and scalars. However, I really like to have functions that work on as arbitrary input as possible. Who knows which parameters I'll want to iterate over at some point. Question: What is the best way to implement a function as the one above? Is there a way to avoid if scalar then like special cases? I'm not looking for a built-in Heaviside. It's just a simplified example. In my code the above pattern appears in many places to make parameter iteration as simple as possible without littering the client code with for loops or comprehensions. Furthermore, I'm aware of Cython, or weave & Co., or implementation directly in C. However, the performance of the masked version above is sufficient for the moment. And for the moment I would like to keep things as simple as possible.

    Read the article

  • Red Gate Coder interviews: Alex Davies

    - by Michael Williamson
    Alex Davies has been a software engineer at Red Gate since graduating from university, and is currently busy working on .NET Demon. We talked about tackling parallel programming with his actors framework, a scientific approach to debugging, and how JavaScript is going to affect the programming languages we use in years to come. So, if we start at the start, how did you get started in programming? When I was seven or eight, I was given a BBC Micro for Christmas. I had asked for a Game Boy, but my dad thought it would be better to give me a proper computer. For a year or so, I only played games on it, but then I found the user guide for writing programs in it. I gradually started doing more stuff on it and found it fun. I liked creating. As I went into senior school I continued to write stuff on there, trying to write games that weren’t very good. I got a real computer when I was fourteen and found ways to write BASIC on it. Visual Basic to start with, and then something more interesting than that. How did you learn to program? Was there someone helping you out? Absolutely not! I learnt out of a book, or by experimenting. I remember the first time I found a loop, I was like “Oh my God! I don’t have to write out the same line over and over and over again any more. It’s amazing!” When did you think this might be something that you actually wanted to do as a career? For a long time, I thought it wasn’t something that you would do as a career, because it was too much fun to be a career. I thought I’d do chemistry at university and some kind of career based on chemical engineering. And then I went to a careers fair at school when I was seventeen or eighteen, and it just didn’t interest me whatsoever. I thought “I could be a programmer, and there’s loads of money there, and I’m good at it, and it’s fun”, but also that I shouldn’t spoil my hobby. Now I don’t really program in my spare time any more, which is a bit of a shame, but I program all the rest of the time, so I can live with it. Do you think you learnt much about programming at university? Yes, definitely! I went into university knowing how to make computers do anything I wanted them to do. However, I didn’t have the language to talk about algorithms, so the algorithms course in my first year was massively important. Learning other language paradigms like functional programming was really good for breadth of understanding. Functional programming influences normal programming through design rather than actually using it all the time. I draw inspiration from it to write imperative programs which I think is actually becoming really fashionable now, but I’ve been doing it for ages. I did it first! There were also some courses on really odd programming languages, a bit of Prolog, a little bit of C. Having a little bit of each of those is something that I would have never done on my own, so it was important. And then there are knowledge-based courses which are about not programming itself but things that have been programmed like TCP. Those are really important for examples for how to approach things. Did you do any internships while you were at university? Yeah, I spent both of my summers at the same company. I thought I could code well before I went there. Looking back at the crap that I produced, it was only surpassed in its crappiness by all of the other code already in that company. I’m so much better at writing nice code now than I used to be back then. Was there just not a culture of looking after your code? There was, they just didn’t hire people for their abilities in that area. They hired people for raw IQ. The first indicator of it going wrong was that they didn’t have any computer scientists, which is a bit odd in a programming company. But even beyond that they didn’t have people who learnt architecture from anyone else. Most of them had started straight out of university, so never really had experience or mentors to learn from. There wasn’t the experience to draw from to teach each other. In the second half of my second internship, I was being given tasks like looking at new technologies and teaching people stuff. Interns shouldn’t be teaching people how to do their jobs! All interns are going to have little nuggets of things that you don’t know about, but they shouldn’t consistently be the ones who know the most. It’s not a good environment to learn. I was going to ask how you found working with people who were more experienced than you… When I reached Red Gate, I found some people who were more experienced programmers than me, and that was difficult. I’ve been coding since I was tiny. At university there were people who were cleverer than me, but there weren’t very many who were more experienced programmers than me. During my internship, I didn’t find anyone who I classed as being a noticeably more experienced programmer than me. So, it was a shock to the system to have valid criticisms rather than just formatting criticisms. However, Red Gate’s not so big on the actual code review, at least it wasn’t when I started. We did an entire product release and then somebody looked over all of the UI of that product which I’d written and say what they didn’t like. By that point, it was way too late and I’d disagree with them. Do you think the lack of code reviews was a bad thing? I think if there’s going to be any oversight of new people, then it should be continuous rather than chunky. For me I don’t mind too much, I could go out and get oversight if I wanted it, and in those situations I felt comfortable without it. If I was managing the new person, then maybe I’d be keener on oversight and then the right way to do it is continuously and in very, very small chunks. Have you had any significant projects you’ve worked on outside of a job? When I was a teenager I wrote all sorts of stuff. I used to write games, I derived how to do isomorphic projections myself once. I didn’t know what the word was so I couldn’t Google for it, so I worked it out myself. It was horrifically complicated. But it sort of tailed off when I started at university, and is now basically zero. If I do side-projects now, they tend to be work-related side projects like my actors framework, NAct, which I started in a down tools week. Could you explain a little more about NAct? It is a little C# framework for writing parallel code more easily. Parallel programming is difficult when you need to write to shared data. Sometimes parallel programming is easy because you don’t need to write to shared data. When you do need to access shared data, you could just have your threads pile in and do their work, but then you would screw up the data because the threads would trample on each other’s toes. You could lock, but locks are really dangerous if you’re using more than one of them. You get interactions like deadlocks, and that’s just nasty. Actors instead allows you to say this piece of data belongs to this thread of execution, and nobody else can read it. If you want to read it, then ask that thread of execution for a piece of it by sending a message, and it will send the data back by a message. And that avoids deadlocks as long as you follow some obvious rules about not making your actors sit around waiting for other actors to do something. There are lots of ways to write actors, NAct allows you to do it as if it was method calls on other objects, which means you get all the strong type-safety that C# programmers like. Do you think that this is suitable for the majority of parallel programming, or do you think it’s only suitable for specific cases? It’s suitable for most difficult parallel programming. If you’ve just got a hundred web requests which are all independent of each other, then I wouldn’t bother because it’s easier to just spin them up in separate threads and they can proceed independently of each other. But where you’ve got difficult parallel programming, where you’ve got multiple threads accessing multiple bits of data in multiple ways at different times, then actors is at least as good as all other ways, and is, I reckon, easier to think about. When you’re using actors, you presumably still have to write your code in a different way from you would otherwise using single-threaded code. You can’t use actors with any methods that have return types, because you’re not allowed to call into another actor and wait for it. If you want to get a piece of data out of another actor, then you’ve got to use tasks so that you can use “async” and “await” to await asynchronously for it. But other than that, you can still stick things in classes so it’s not too different really. Rather than having thousands of objects with mutable state, you can use component-orientated design, where there are only a few mutable classes which each have a small number of instances. Then there can be thousands of immutable objects. If you tend to do that anyway, then actors isn’t much of a jump. If I’ve already built my system without any parallelism, how hard is it to add actors to exploit all eight cores on my desktop? Usually pretty easy. If you can identify even one boundary where things look like messages and you have components where some objects live on one side and these other objects live on the other side, then you can have a granddaddy object on one side be an actor and it will parallelise as it goes across that boundary. Not too difficult. If we do get 1000-core desktop PCs, do you think actors will scale up? It’s hard. There are always in the order of twenty to fifty actors in my whole program because I tend to write each component as actors, and I tend to have one instance of each component. So this won’t scale to a thousand cores. What you can do is write data structures out of actors. I use dictionaries all over the place, and if you need a dictionary that is going to be accessed concurrently, then you could build one of those out of actors in no time. You can use queuing to marshal requests between different slices of the dictionary which are living on different threads. So it’s like a distributed hash table but all of the chunks of it are on the same machine. That means that each of these thousand processors has cached one small piece of the dictionary. I reckon it wouldn’t be too big a leap to start doing proper parallelism. Do you think it helps if actors get baked into the language, similarly to Erlang? Erlang is excellent in that it has thread-local garbage collection. C# doesn’t, so there’s a limit to how well C# actors can possibly scale because there’s a single garbage collected heap shared between all of them. When you do a global garbage collection, you’ve got to stop all of the actors, which is seriously expensive, whereas in Erlang garbage collections happen per-actor, so they’re insanely cheap. However, Erlang deviated from all the sensible language design that people have used recently and has just come up with crazy stuff. You can definitely retrofit thread-local garbage collection to .NET, and then it’s quite well-suited to support actors, even if it’s not baked into the language. Speaking of language design, do you have a favourite programming language? I’ll choose a language which I’ve never written before. I like the idea of Scala. It sounds like C#, only with some of the niggles gone. I enjoy writing static types. It means you don’t have to writing tests so much. When you say it doesn’t have some of the niggles? C# doesn’t allow the use of a property as a method group. It doesn’t have Scala case classes, or sum types, where you can do a switch statement and the compiler checks that you’ve checked all the cases, which is really useful in functional-style programming. Pattern-matching, in other words. That’s actually the major niggle. C# is pretty good, and I’m quite happy with C#. And what about going even further with the type system to remove the need for tests to something like Haskell? Or is that a step too far? I’m quite a pragmatist, I don’t think I could deal with trying to write big systems in languages with too few other users, especially when learning how to structure things. I just don’t know anyone who can teach me, and the Internet won’t teach me. That’s the main reason I wouldn’t use it. If I turned up at a company that writes big systems in Haskell, I would have no objection to that, but I wouldn’t instigate it. What about things in C#? For instance, there’s contracts in C#, so you can try to statically verify a bit more about your code. Do you think that’s useful, or just not worthwhile? I’ve not really tried it. My hunch is that it needs to be built into the language and be quite mathematical for it to work in real life, and that doesn’t seem to have ended up true for C# contracts. I don’t think anyone who’s tried them thinks they’re any good. I might be wrong. On a slightly different note, how do you like to debug code? I think I’m quite an odd debugger. I use guesswork extremely rarely, especially if something seems quite difficult to debug. I’ve been bitten spending hours and hours on guesswork and not being scientific about debugging in the past, so now I’m scientific to a fault. What I want is to see the bug happening in the debugger, to step through the bug happening. To watch the program going from a valid state to an invalid state. When there’s a bug and I can’t work out why it’s happening, I try to find some piece of evidence which places the bug in one section of the code. From that experiment, I binary chop on the possible causes of the bug. I suppose that means binary chopping on places in the code, or binary chopping on a stage through a processing cycle. Basically, I’m very stupid about how I debug. I won’t make any guesses, I won’t use any intuition, I will only identify the experiment that’s going to binary chop most effectively and repeat rather than trying to guess anything. I suppose it’s quite top-down. Is most of the time then spent in the debugger? Absolutely, if at all possible I will never debug using print statements or logs. I don’t really hold much stock in outputting logs. If there’s any bug which can be reproduced locally, I’d rather do it in the debugger than outputting logs. And with SmartAssembly error reporting, there’s not a lot that can’t be either observed in an error report and just fixed, or reproduced locally. And in those other situations, maybe I’ll use logs. But I hate using logs. You stare at the log, trying to guess what’s going on, and that’s exactly what I don’t like doing. You have to just look at it and see does this look right or wrong. We’ve covered how you get to grip with bugs. How do you get to grips with an entire codebase? I watch it in the debugger. I find little bugs and then try to fix them, and mostly do it by watching them in the debugger and gradually getting an understanding of how the code works using my process of binary chopping. I have to do a lot of reading and watching code to choose where my slicing-in-half experiment is going to be. The last time I did it was SmartAssembly. The old code was a complete mess, but at least it did things top to bottom. There wasn’t too much of some of the big abstractions where flow of control goes all over the place, into a base class and back again. Code’s really hard to understand when that happens. So I like to choose a little bug and try to fix it, and choose a bigger bug and try to fix it. Definitely learn by doing. I want to always have an aim so that I get a little achievement after every few hours of debugging. Once I’ve learnt the codebase I might be able to fix all the bugs in an hour, but I’d rather be using them as an aim while I’m learning the codebase. If I was a maintainer of a codebase, what should I do to make it as easy as possible for you to understand? Keep distinct concepts in different places. And name your stuff so that it’s obvious which concepts live there. You shouldn’t have some variable that gets set miles up the top of somewhere, and then is read miles down to choose some later behaviour. I’m talking from a very much SmartAssembly point of view because the old SmartAssembly codebase had tons and tons of these things, where it would read some property of the code and then deal with it later. Just thousands of variables in scope. Loads of things to think about. If you can keep concepts separate, then it aids me in my process of fixing bugs one at a time, because each bug is going to more or less be understandable in the one place where it is. And what about tests? Do you think they help at all? I’ve never had the opportunity to learn a codebase which has had tests, I don’t know what it’s like! What about when you’re actually developing? How useful do you find tests in finding bugs or regressions? Finding regressions, absolutely. Running bits of code that would be quite hard to run otherwise, definitely. It doesn’t happen very often that a test finds a bug in the first place. I don’t really buy nebulous promises like tests being a good way to think about the spec of the code. My thinking goes something like “This code works at the moment, great, ship it! Ah, there’s a way that this code doesn’t work. Okay, write a test, demonstrate that it doesn’t work, fix it, use the test to demonstrate that it’s now fixed, and keep the test for future regressions.” The most valuable tests are for bugs that have actually happened at some point, because bugs that have actually happened at some point, despite the fact that you think you’ve fixed them, are way more likely to appear again than new bugs are. Does that mean that when you write your code the first time, there are no tests? Often. The chance of there being a bug in a new feature is relatively unaffected by whether I’ve written a test for that new feature because I’m not good enough at writing tests to think of bugs that I would have written into the code. So not writing regression tests for all of your code hasn’t affected you too badly? There are different kinds of features. Some of them just always work, and are just not flaky, they just continue working whatever you throw at them. Maybe because the type-checker is particularly effective around them. Writing tests for those features which just tend to always work is a waste of time. And because it’s a waste of time I’ll tend to wait until a feature has demonstrated its flakiness by having bugs in it before I start trying to test it. You can get a feel for whether it’s going to be flaky code as you’re writing it. I try to write it to make it not flaky, but there are some things that are just inherently flaky. And very occasionally, I’ll think “this is going to be flaky” as I’m writing, and then maybe do a test, but not most of the time. How do you think your programming style has changed over time? I’ve got clearer about what the right way of doing things is. I used to flip-flop a lot between different ideas. Five years ago I came up with some really good ideas and some really terrible ideas. All of them seemed great when I thought of them, but they were quite diverse ideas, whereas now I have a smaller set of reliable ideas that are actually good for structuring code. So my code is probably more similar to itself than it used to be back in the day, when I was trying stuff out. I’ve got more disciplined about encapsulation, I think. There are operational things like I use actors more now than I used to, and that forces me to use immutability more than I used to. The first code that I wrote in Red Gate was the memory profiler UI, and that was an actor, I just didn’t know the name of it at the time. I don’t really use object-orientation. By object-orientation, I mean having n objects of the same type which are mutable. I want a constant number of objects that are mutable, and they should be different types. I stick stuff in dictionaries and then have one thing that owns the dictionary and puts stuff in and out of it. That’s definitely a pattern that I’ve seen recently. I think maybe I’m doing functional programming. Possibly. It’s plausible. If you had to summarise the essence of programming in a pithy sentence, how would you do it? Programming is the form of art that, without losing any of the beauty of architecture or fine art, allows you to produce things that people love and you make money from. So you think it’s an art rather than a science? It’s a little bit of engineering, a smidgeon of maths, but it’s not science. Like architecture, programming is on that boundary between art and engineering. If you want to do it really nicely, it’s mostly art. You can get away with doing architecture and programming entirely by having a good engineering mind, but you’re not going to produce anything nice. You’re not going to have joy doing it if you’re an engineering mind. Architects who are just engineering minds are not going to enjoy their job. I suppose engineering is the foundation on which you build the art. Exactly. How do you think programming is going to change over the next ten years? There will be an unfortunate shift towards dynamically-typed languages, because of JavaScript. JavaScript has an unfair advantage. JavaScript’s unfair advantage will cause more people to be exposed to dynamically-typed languages, which means other dynamically-typed languages crop up and the best features go into dynamically-typed languages. Then people conflate the good features with the fact that it’s dynamically-typed, and more investment goes into dynamically-typed languages. They end up better, so people use them. What about the idea of compiling other languages, possibly statically-typed, to JavaScript? It’s a reasonable idea. I would like to do it, but I don’t think enough people in the world are going to do it to make it pick up. The hordes of beginners are the lifeblood of a language community. They are what makes there be good tools and what makes there be vibrant community websites. And any particular thing which is the same as JavaScript only with extra stuff added to it, although it might be technically great, is not going to have the hordes of beginners. JavaScript is always to be quickest and easiest way for a beginner to start programming in the browser. And dynamically-typed languages are great for beginners. Compilers are pretty scary and beginners don’t write big code. And having your errors come up in the same place, whether they’re statically checkable errors or not, is quite nice for a beginner. If someone asked me to teach them some programming, I’d teach them JavaScript. If dynamically-typed languages are great for beginners, when do you think the benefits of static typing start to kick in? The value of having a statically typed program is in the tools that rely on the static types to produce a smooth IDE experience rather than actually telling me my compile errors. And only once you’re experienced enough a programmer that having a really smooth IDE experience makes a blind bit of difference, does static typing make a blind bit of difference. So it’s not really about size of codebase. If I go and write up a tiny program, I’m still going to get value out of writing it in C# using ReSharper because I’m experienced with C# and ReSharper enough to be able to write code five times faster if I have that help. Any other visions of the future? Nobody’s going to use actors. Because everyone’s going to be running on single-core VMs connected over network-ready protocols like JSON over HTTP. So, parallelism within one operating system is going to die. But until then, you should use actors. More Red Gater Coder interviews

    Read the article

  • A way of doing real-world test-driven development (and some thoughts about it)

    - by Thomas Weller
    Lately, I exchanged some arguments with Derick Bailey about some details of the red-green-refactor cycle of the Test-driven development process. In short, the issue revolved around the fact that it’s not enough to have a test red or green, but it’s also important to have it red or green for the right reasons. While for me, it’s sufficient to initially have a NotImplementedException in place, Derick argues that this is not totally correct (see these two posts: Red/Green/Refactor, For The Right Reasons and Red For The Right Reason: Fail By Assertion, Not By Anything Else). And he’s right. But on the other hand, I had no idea how his insights could have any practical consequence for my own individual interpretation of the red-green-refactor cycle (which is not really red-green-refactor, at least not in its pure sense, see the rest of this article). This made me think deeply for some days now. In the end I found out that the ‘right reason’ changes in my understanding depending on what development phase I’m in. To make this clear (at least I hope it becomes clear…) I started to describe my way of working in some detail, and then something strange happened: The scope of the article slightly shifted from focusing ‘only’ on the ‘right reason’ issue to something more general, which you might describe as something like  'Doing real-world TDD in .NET , with massive use of third-party add-ins’. This is because I feel that there is a more general statement about Test-driven development to make:  It’s high time to speak about the ‘How’ of TDD, not always only the ‘Why’. Much has been said about this, and me myself also contributed to that (see here: TDD is not about testing, it's about how we develop software). But always justifying what you do is very unsatisfying in the long run, it is inherently defensive, and it costs time and effort that could be used for better and more important things. And frankly: I’m somewhat sick and tired of repeating time and again that the test-driven way of software development is highly preferable for many reasons - I don’t want to spent my time exclusively on stating the obvious… So, again, let’s say it clearly: TDD is programming, and programming is TDD. Other ways of programming (code-first, sometimes called cowboy-coding) are exceptional and need justification. – I know that there are many people out there who will disagree with this radical statement, and I also know that it’s not a description of the real world but more of a mission statement or something. But nevertheless I’m absolutely sure that in some years this statement will be nothing but a platitude. Side note: Some parts of this post read as if I were paid by Jetbrains (the manufacturer of the ReSharper add-in – R#), but I swear I’m not. Rather I think that Visual Studio is just not production-complete without it, and I wouldn’t even consider to do professional work without having this add-in installed... The three parts of a software component Before I go into some details, I first should describe my understanding of what belongs to a software component (assembly, type, or method) during the production process (i.e. the coding phase). Roughly, I come up with the three parts shown below:   First, we need to have some initial sort of requirement. This can be a multi-page formal document, a vague idea in some programmer’s brain of what might be needed, or anything in between. In either way, there has to be some sort of requirement, be it explicit or not. – At the C# micro-level, the best way that I found to formulate that is to define interfaces for just about everything, even for internal classes, and to provide them with exhaustive xml comments. The next step then is to re-formulate these requirements in an executable form. This is specific to the respective programming language. - For C#/.NET, the Gallio framework (which includes MbUnit) in conjunction with the ReSharper add-in for Visual Studio is my toolset of choice. The third part then finally is the production code itself. It’s development is entirely driven by the requirements and their executable formulation. This is the delivery, the two other parts are ‘only’ there to make its production possible, to give it a decent quality and reliability, and to significantly reduce related costs down the maintenance timeline. So while the first two parts are not really relevant for the customer, they are very important for the developer. The customer (or in Scrum terms: the Product Owner) is not interested at all in how  the product is developed, he is only interested in the fact that it is developed as cost-effective as possible, and that it meets his functional and non-functional requirements. The rest is solely a matter of the developer’s craftsmanship, and this is what I want to talk about during the remainder of this article… An example To demonstrate my way of doing real-world TDD, I decided to show the development of a (very) simple Calculator component. The example is deliberately trivial and silly, as examples always are. I am totally aware of the fact that real life is never that simple, but I only want to show some development principles here… The requirement As already said above, I start with writing down some words on the initial requirement, and I normally use interfaces for that, even for internal classes - the typical question “intf or not” doesn’t even come to mind. I need them for my usual workflow and using them automatically produces high componentized and testable code anyway. To think about their usage in every single situation would slow down the production process unnecessarily. So this is what I begin with: namespace Calculator {     /// <summary>     /// Defines a very simple calculator component for demo purposes.     /// </summary>     public interface ICalculator     {         /// <summary>         /// Gets the result of the last successful operation.         /// </summary>         /// <value>The last result.</value>         /// <remarks>         /// Will be <see langword="null" /> before the first successful operation.         /// </remarks>         double? LastResult { get; }       } // interface ICalculator   } // namespace Calculator So, I’m not beginning with a test, but with a sort of code declaration - and still I insist on being 100% test-driven. There are three important things here: Starting this way gives me a method signature, which allows to use IntelliSense and AutoCompletion and thus eliminates the danger of typos - one of the most regular, annoying, time-consuming, and therefore expensive sources of error in the development process. In my understanding, the interface definition as a whole is more of a readable requirement document and technical documentation than anything else. So this is at least as much about documentation than about coding. The documentation must completely describe the behavior of the documented element. I normally use an IoC container or some sort of self-written provider-like model in my architecture. In either case, I need my components defined via service interfaces anyway. - I will use the LinFu IoC framework here, for no other reason as that is is very simple to use. The ‘Red’ (pt. 1)   First I create a folder for the project’s third-party libraries and put the LinFu.Core dll there. Then I set up a test project (via a Gallio project template), and add references to the Calculator project and the LinFu dll. Finally I’m ready to write the first test, which will look like the following: namespace Calculator.Test {     [TestFixture]     public class CalculatorTest     {         private readonly ServiceContainer container = new ServiceContainer();           [Test]         public void CalculatorLastResultIsInitiallyNull()         {             ICalculator calculator = container.GetService<ICalculator>();               Assert.IsNull(calculator.LastResult);         }       } // class CalculatorTest   } // namespace Calculator.Test       This is basically the executable formulation of what the interface definition states (part of). Side note: There’s one principle of TDD that is just plain wrong in my eyes: I’m talking about the Red is 'does not compile' thing. How could a compiler error ever be interpreted as a valid test outcome? I never understood that, it just makes no sense to me. (Or, in Derick’s terms: this reason is as wrong as a reason ever could be…) A compiler error tells me: Your code is incorrect, but nothing more.  Instead, the ‘Red’ part of the red-green-refactor cycle has a clearly defined meaning to me: It means that the test works as intended and fails only if its assumptions are not met for some reason. Back to our Calculator. When I execute the above test with R#, the Gallio plugin will give me this output: So this tells me that the test is red for the wrong reason: There’s no implementation that the IoC-container could load, of course. So let’s fix that. With R#, this is very easy: First, create an ICalculator - derived type:        Next, implement the interface members: And finally, move the new class to its own file: So far my ‘work’ was six mouse clicks long, the only thing that’s left to do manually here, is to add the Ioc-specific wiring-declaration and also to make the respective class non-public, which I regularly do to force my components to communicate exclusively via interfaces: This is what my Calculator class looks like as of now: using System; using LinFu.IoC.Configuration;   namespace Calculator {     [Implements(typeof(ICalculator))]     internal class Calculator : ICalculator     {         public double? LastResult         {             get             {                 throw new NotImplementedException();             }         }     } } Back to the test fixture, we have to put our IoC container to work: [TestFixture] public class CalculatorTest {     #region Fields       private readonly ServiceContainer container = new ServiceContainer();       #endregion // Fields       #region Setup/TearDown       [FixtureSetUp]     public void FixtureSetUp()     {        container.LoadFrom(AppDomain.CurrentDomain.BaseDirectory, "Calculator.dll");     }       ... Because I have a R# live template defined for the setup/teardown method skeleton as well, the only manual coding here again is the IoC-specific stuff: two lines, not more… The ‘Red’ (pt. 2) Now, the execution of the above test gives the following result: This time, the test outcome tells me that the method under test is called. And this is the point, where Derick and I seem to have somewhat different views on the subject: Of course, the test still is worthless regarding the red/green outcome (or: it’s still red for the wrong reasons, in that it gives a false negative). But as far as I am concerned, I’m not really interested in the test outcome at this point of the red-green-refactor cycle. Rather, I only want to assert that my test actually calls the right method. If that’s the case, I will happily go on to the ‘Green’ part… The ‘Green’ Making the test green is quite trivial. Just make LastResult an automatic property:     [Implements(typeof(ICalculator))]     internal class Calculator : ICalculator     {         public double? LastResult { get; private set; }     }         One more round… Now on to something slightly more demanding (cough…). Let’s state that our Calculator exposes an Add() method:         ...   /// <summary>         /// Adds the specified operands.         /// </summary>         /// <param name="operand1">The operand1.</param>         /// <param name="operand2">The operand2.</param>         /// <returns>The result of the additon.</returns>         /// <exception cref="ArgumentException">         /// Argument <paramref name="operand1"/> is &lt; 0.<br/>         /// -- or --<br/>         /// Argument <paramref name="operand2"/> is &lt; 0.         /// </exception>         double Add(double operand1, double operand2);       } // interface ICalculator A remark: I sometimes hear the complaint that xml comment stuff like the above is hard to read. That’s certainly true, but irrelevant to me, because I read xml code comments with the CR_Documentor tool window. And using that, it looks like this:   Apart from that, I’m heavily using xml code comments (see e.g. here for a detailed guide) because there is the possibility of automating help generation with nightly CI builds (using MS Sandcastle and the Sandcastle Help File Builder), and then publishing the results to some intranet location.  This way, a team always has first class, up-to-date technical documentation at hand about the current codebase. (And, also very important for speeding up things and avoiding typos: You have IntelliSense/AutoCompletion and R# support, and the comments are subject to compiler checking…).     Back to our Calculator again: Two more R# – clicks implement the Add() skeleton:         ...           public double Add(double operand1, double operand2)         {             throw new NotImplementedException();         }       } // class Calculator As we have stated in the interface definition (which actually serves as our requirement document!), the operands are not allowed to be negative. So let’s start implementing that. Here’s the test: [Test] [Row(-0.5, 2)] public void AddThrowsOnNegativeOperands(double operand1, double operand2) {     ICalculator calculator = container.GetService<ICalculator>();       Assert.Throws<ArgumentException>(() => calculator.Add(operand1, operand2)); } As you can see, I’m using a data-driven unit test method here, mainly for these two reasons: Because I know that I will have to do the same test for the second operand in a few seconds, I save myself from implementing another test method for this purpose. Rather, I only will have to add another Row attribute to the existing one. From the test report below, you can see that the argument values are explicitly printed out. This can be a valuable documentation feature even when everything is green: One can quickly review what values were tested exactly - the complete Gallio HTML-report (as it will be produced by the Continuous Integration runs) shows these values in a quite clear format (see below for an example). Back to our Calculator development again, this is what the test result tells us at the moment: So we’re red again, because there is not yet an implementation… Next we go on and implement the necessary parameter verification to become green again, and then we do the same thing for the second operand. To make a long story short, here’s the test and the method implementation at the end of the second cycle: // in CalculatorTest:   [Test] [Row(-0.5, 2)] [Row(295, -123)] public void AddThrowsOnNegativeOperands(double operand1, double operand2) {     ICalculator calculator = container.GetService<ICalculator>();       Assert.Throws<ArgumentException>(() => calculator.Add(operand1, operand2)); }   // in Calculator: public double Add(double operand1, double operand2) {     if (operand1 < 0.0)     {         throw new ArgumentException("Value must not be negative.", "operand1");     }     if (operand2 < 0.0)     {         throw new ArgumentException("Value must not be negative.", "operand2");     }     throw new NotImplementedException(); } So far, we have sheltered our method from unwanted input, and now we can safely operate on the parameters without further caring about their validity (this is my interpretation of the Fail Fast principle, which is regarded here in more detail). Now we can think about the method’s successful outcomes. First let’s write another test for that: [Test] [Row(1, 1, 2)] public void TestAdd(double operand1, double operand2, double expectedResult) {     ICalculator calculator = container.GetService<ICalculator>();       double result = calculator.Add(operand1, operand2);       Assert.AreEqual(expectedResult, result); } Again, I’m regularly using row based test methods for these kinds of unit tests. The above shown pattern proved to be extremely helpful for my development work, I call it the Defined-Input/Expected-Output test idiom: You define your input arguments together with the expected method result. There are two major benefits from that way of testing: In the course of refining a method, it’s very likely to come up with additional test cases. In our case, we might add tests for some edge cases like ‘one of the operands is zero’ or ‘the sum of the two operands causes an overflow’, or maybe there’s an external test protocol that has to be fulfilled (e.g. an ISO norm for medical software), and this results in the need of testing against additional values. In all these scenarios we only have to add another Row attribute to the test. Remember that the argument values are written to the test report, so as a side-effect this produces valuable documentation. (This can become especially important if the fulfillment of some sort of external requirements has to be proven). So your test method might look something like that in the end: [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 2)] [Row(0, 999999999, 999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, double.MaxValue)] [Row(4, double.MaxValue - 2.5, double.MaxValue)] public void TestAdd(double operand1, double operand2, double expectedResult) {     ICalculator calculator = container.GetService<ICalculator>();       double result = calculator.Add(operand1, operand2);       Assert.AreEqual(expectedResult, result); } And this will produce the following HTML report (with Gallio):   Not bad for the amount of work we invested in it, huh? - There might be scenarios where reports like that can be useful for demonstration purposes during a Scrum sprint review… The last requirement to fulfill is that the LastResult property is expected to store the result of the last operation. I don’t show this here, it’s trivial enough and brings nothing new… And finally: Refactor (for the right reasons) To demonstrate my way of going through the refactoring portion of the red-green-refactor cycle, I added another method to our Calculator component, namely Subtract(). Here’s the code (tests and production): // CalculatorTest.cs:   [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 0)] [Row(0, 999999999, -999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, -double.MaxValue)] [Row(4, double.MaxValue - 2.5, -double.MaxValue)] public void TestSubtract(double operand1, double operand2, double expectedResult) {     ICalculator calculator = container.GetService<ICalculator>();       double result = calculator.Subtract(operand1, operand2);       Assert.AreEqual(expectedResult, result); }   [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 0)] [Row(0, 999999999, -999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, -double.MaxValue)] [Row(4, double.MaxValue - 2.5, -double.MaxValue)] public void TestSubtractGivesExpectedLastResult(double operand1, double operand2, double expectedResult) {     ICalculator calculator = container.GetService<ICalculator>();       calculator.Subtract(operand1, operand2);       Assert.AreEqual(expectedResult, calculator.LastResult); }   ...   // ICalculator.cs: /// <summary> /// Subtracts the specified operands. /// </summary> /// <param name="operand1">The operand1.</param> /// <param name="operand2">The operand2.</param> /// <returns>The result of the subtraction.</returns> /// <exception cref="ArgumentException"> /// Argument <paramref name="operand1"/> is &lt; 0.<br/> /// -- or --<br/> /// Argument <paramref name="operand2"/> is &lt; 0. /// </exception> double Subtract(double operand1, double operand2);   ...   // Calculator.cs:   public double Subtract(double operand1, double operand2) {     if (operand1 < 0.0)     {         throw new ArgumentException("Value must not be negative.", "operand1");     }       if (operand2 < 0.0)     {         throw new ArgumentException("Value must not be negative.", "operand2");     }       return (this.LastResult = operand1 - operand2).Value; }   Obviously, the argument validation stuff that was produced during the red-green part of our cycle duplicates the code from the previous Add() method. So, to avoid code duplication and minimize the number of code lines of the production code, we do an Extract Method refactoring. One more time, this is only a matter of a few mouse clicks (and giving the new method a name) with R#: Having done that, our production code finally looks like that: using System; using LinFu.IoC.Configuration;   namespace Calculator {     [Implements(typeof(ICalculator))]     internal class Calculator : ICalculator     {         #region ICalculator           public double? LastResult { get; private set; }           public double Add(double operand1, double operand2)         {             ThrowIfOneOperandIsInvalid(operand1, operand2);               return (this.LastResult = operand1 + operand2).Value;         }           public double Subtract(double operand1, double operand2)         {             ThrowIfOneOperandIsInvalid(operand1, operand2);               return (this.LastResult = operand1 - operand2).Value;         }           #endregion // ICalculator           #region Implementation (Helper)           private static void ThrowIfOneOperandIsInvalid(double operand1, double operand2)         {             if (operand1 < 0.0)             {                 throw new ArgumentException("Value must not be negative.", "operand1");             }               if (operand2 < 0.0)             {                 throw new ArgumentException("Value must not be negative.", "operand2");             }         }           #endregion // Implementation (Helper)       } // class Calculator   } // namespace Calculator But is the above worth the effort at all? It’s obviously trivial and not very impressive. All our tests were green (for the right reasons), and refactoring the code did not change anything. It’s not immediately clear how this refactoring work adds value to the project. Derick puts it like this: STOP! Hold on a second… before you go any further and before you even think about refactoring what you just wrote to make your test pass, you need to understand something: if your done with your requirements after making the test green, you are not required to refactor the code. I know… I’m speaking heresy, here. Toss me to the wolves, I’ve gone over to the dark side! Seriously, though… if your test is passing for the right reasons, and you do not need to write any test or any more code for you class at this point, what value does refactoring add? Derick immediately answers his own question: So why should you follow the refactor portion of red/green/refactor? When you have added code that makes the system less readable, less understandable, less expressive of the domain or concern’s intentions, less architecturally sound, less DRY, etc, then you should refactor it. I couldn’t state it more precise. From my personal perspective, I’d add the following: You have to keep in mind that real-world software systems are usually quite large and there are dozens or even hundreds of occasions where micro-refactorings like the above can be applied. It’s the sum of them all that counts. And to have a good overall quality of the system (e.g. in terms of the Code Duplication Percentage metric) you have to be pedantic on the individual, seemingly trivial cases. My job regularly requires the reading and understanding of ‘foreign’ code. So code quality/readability really makes a HUGE difference for me – sometimes it can be even the difference between project success and failure… Conclusions The above described development process emerged over the years, and there were mainly two things that guided its evolution (you might call it eternal principles, personal beliefs, or anything in between): Test-driven development is the normal, natural way of writing software, code-first is exceptional. So ‘doing TDD or not’ is not a question. And good, stable code can only reliably be produced by doing TDD (yes, I know: many will strongly disagree here again, but I’ve never seen high-quality code – and high-quality code is code that stood the test of time and causes low maintenance costs – that was produced code-first…) It’s the production code that pays our bills in the end. (Though I have seen customers these days who demand an acceptance test battery as part of the final delivery. Things seem to go into the right direction…). The test code serves ‘only’ to make the production code work. But it’s the number of delivered features which solely counts at the end of the day - no matter how much test code you wrote or how good it is. With these two things in mind, I tried to optimize my coding process for coding speed – or, in business terms: productivity - without sacrificing the principles of TDD (more than I’d do either way…).  As a result, I consider a ratio of about 3-5/1 for test code vs. production code as normal and desirable. In other words: roughly 60-80% of my code is test code (This might sound heavy, but that is mainly due to the fact that software development standards only begin to evolve. The entire software development profession is very young, historically seen; only at the very beginning, and there are no viable standards yet. If you think about software development as a kind of casting process, where the test code is the mold and the resulting production code is the final product, then the above ratio sounds no longer extraordinary…) Although the above might look like very much unnecessary work at first sight, it’s not. With the aid of the mentioned add-ins, doing all the above is a matter of minutes, sometimes seconds (while writing this post took hours and days…). The most important thing is to have the right tools at hand. Slow developer machines or the lack of a tool or something like that - for ‘saving’ a few 100 bucks -  is just not acceptable and a very bad decision in business terms (though I quite some times have seen and heard that…). Production of high-quality products needs the usage of high-quality tools. This is a platitude that every craftsman knows… The here described round-trip will take me about five to ten minutes in my real-world development practice. I guess it’s about 30% more time compared to developing the ‘traditional’ (code-first) way. But the so manufactured ‘product’ is of much higher quality and massively reduces maintenance costs, which is by far the single biggest cost factor, as I showed in this previous post: It's the maintenance, stupid! (or: Something is rotten in developerland.). In the end, this is a highly cost-effective way of software development… But on the other hand, there clearly is a trade-off here: coding speed vs. code quality/later maintenance costs. The here described development method might be a perfect fit for the overwhelming majority of software projects, but there certainly are some scenarios where it’s not - e.g. if time-to-market is crucial for a software project. So this is a business decision in the end. It’s just that you have to know what you’re doing and what consequences this might have… Some last words First, I’d like to thank Derick Bailey again. His two aforementioned posts (which I strongly recommend for reading) inspired me to think deeply about my own personal way of doing TDD and to clarify my thoughts about it. I wouldn’t have done that without this inspiration. I really enjoy that kind of discussions… I agree with him in all respects. But I don’t know (yet?) how to bring his insights into the described production process without slowing things down. The above described method proved to be very “good enough” in my practical experience. But of course, I’m open to suggestions here… My rationale for now is: If the test is initially red during the red-green-refactor cycle, the ‘right reason’ is: it actually calls the right method, but this method is not yet operational. Later on, when the cycle is finished and the tests become part of the regular, automated Continuous Integration process, ‘red’ certainly must occur for the ‘right reason’: in this phase, ‘red’ MUST mean nothing but an unfulfilled assertion - Fail By Assertion, Not By Anything Else!

    Read the article

  • Web Browser Control &ndash; Specifying the IE Version

    - by Rick Strahl
    I use the Internet Explorer Web Browser Control in a lot of my applications to display document type layout. HTML happens to be one of the most common document formats and displaying data in this format – even in desktop applications, is often way easier than using normal desktop technologies. One issue the Web Browser Control has that it’s perpetually stuck in IE 7 rendering mode by default. Even though IE 8 and now 9 have significantly upgraded the IE rendering engine to be more CSS and HTML compliant by default the Web Browser control will have none of it. IE 9 in particular – with its much improved CSS support and basic HTML 5 support is a big improvement and even though the IE control uses some of IE’s internal rendering technology it’s still stuck in the old IE 7 rendering by default. This applies whether you’re using the Web Browser control in a WPF application, a WinForms app, a FoxPro or VB classic application using the ActiveX control. Behind the scenes all these UI platforms use the COM interfaces and so you’re stuck by those same rules. Rendering Challenged To see what I’m talking about here are two screen shots rendering an HTML 5 doctype page that includes some CSS 3 functionality – rounded corners and border shadows - from an earlier post. One uses IE 9 as a standalone browser, and one uses a simple WPF form that includes the Web Browser control. IE 9 Browser:   Web Browser control in a WPF form: The IE 9 page displays this HTML correctly – you see the rounded corners and shadow displayed. Obviously the latter rendering using the Web Browser control in a WPF application is a bit lacking. Not only are the new CSS features missing but the page also renders in Internet Explorer’s quirks mode so all the margins, padding etc. behave differently by default, even though there’s a CSS reset applied on this page. If you’re building an application that intends to use the Web Browser control for a live preview of some HTML this is clearly undesirable. Feature Delegation via Registry Hacks Fortunately starting with Internet Explore 8 and later there’s a fix for this problem via a registry setting. You can specify a registry key to specify which rendering mode and version of IE should be used by that application. These are not global mind you – they have to be enabled for each application individually. There are two different sets of keys for 32 bit and 64 bit applications. 32 bit: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION Value Key: yourapplication.exe 64 bit: HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION Value Key: yourapplication.exe The value to set this key to is (taken from MSDN here) as decimal values: 9999 (0x270F) Internet Explorer 9. Webpages are displayed in IE9 Standards mode, regardless of the !DOCTYPE directive. 9000 (0x2328) Internet Explorer 9. Webpages containing standards-based !DOCTYPE directives are displayed in IE9 mode. 8888 (0x22B8) Webpages are displayed in IE8 Standards mode, regardless of the !DOCTYPE directive. 8000 (0x1F40) Webpages containing standards-based !DOCTYPE directives are displayed in IE8 mode. 7000 (0x1B58) Webpages containing standards-based !DOCTYPE directives are displayed in IE7 Standards mode.   The added key looks something like this in the Registry Editor: With this in place my Html Html Help Builder application which has wwhelp.exe as its main executable now works with HTML 5 and CSS 3 documents in the same way that Internet Explorer 9 does. Incidentally I accidentally added an ‘empty’ DWORD value of 0 to my EXE name and that worked as well giving me IE 9 rendering. Although not documented I suspect 0 (or an invalid value) will default to the installed browser. Don’t have a good way to test this but if somebody could try this with IE 8 installed that would be great: What happens when setting 9000 with IE 8 installed? What happens when setting 0 with IE 8 installed? Don’t forget to add Keys for Host Environments If you’re developing your application in Visual Studio and you run the debugger you may find that your application is still not rendering right, but if you run the actual generated EXE from Explorer or the OS command prompt it works. That’s because when you run the debugger in Visual Studio it wraps your application into a debugging host container. For this reason you might want to also add another registry key for yourapp.vshost.exe on your development machine. If you’re developing in Visual FoxPro make sure you add a key for vfp9.exe to see the rendering adjustments in the Visual FoxPro development environment. Cleaner HTML - no more HTML mangling! There are a number of additional benefits to setting up rendering of the Web Browser control to the IE 9 engine (or even the IE 8 engine) beyond the obvious rendering functionality. IE 9 actually returns your HTML in something that resembles the original HTML formatting, as opposed to the IE 7 default format which mangled the original HTML content. If you do the following in the WPF application: private void button2_Click(object sender, RoutedEventArgs e) { dynamic doc = this.webBrowser.Document; MessageBox.Show(doc.body.outerHtml); } you get different output depending on the rendering mode active. With the default IE 7 rendering you get: <BODY><DIV> <H1>Rounded Corners and Shadows - Creating Dialogs in CSS</H1> <DIV class=toolbarcontainer><A class=hoverbutton href="./"><IMG src="../../css/images/home.gif"> Home</A> <A class=hoverbutton href="RoundedCornersAndShadows.htm"><IMG src="../../css/images/refresh.gif"> Refresh</A> </DIV> <DIV class=containercontent> <FIELDSET><LEGEND>Plain Box</LEGEND><!-- Simple Box with rounded corners and shadow --> <DIV style="BORDER-BOTTOM: steelblue 2px solid; BORDER-LEFT: steelblue 2px solid; WIDTH: 550px; BORDER-TOP: steelblue 2px solid; BORDER-RIGHT: steelblue 2px solid" class="roundbox boxshadow"> <DIV style="BACKGROUND: khaki" class="boxcontenttext roundbox">Simple Rounded Corner Box. </DIV></DIV></FIELDSET> <FIELDSET><LEGEND>Box with Header</LEGEND> <DIV style="BORDER-BOTTOM: steelblue 2px solid; BORDER-LEFT: steelblue 2px solid; WIDTH: 550px; BORDER-TOP: steelblue 2px solid; BORDER-RIGHT: steelblue 2px solid" class="roundbox boxshadow"> <DIV class="gridheaderleft roundbox-top">Box with a Header</DIV> <DIV style="BACKGROUND: khaki" class="boxcontenttext roundbox-bottom">Simple Rounded Corner Box. </DIV></DIV></FIELDSET> <FIELDSET><LEGEND>Dialog Style Window</LEGEND> <DIV style="POSITION: relative; WIDTH: 450px" id=divDialog class="dialog boxshadow" jQuery16107208195684204002="2"> <DIV style="POSITION: relative" class=dialog-header> <DIV class=closebox></DIV>User Sign-in <DIV class=closebox jQuery16107208195684204002="3"></DIV></DIV> <DIV class=descriptionheader>This dialog is draggable and closable</DIV> <DIV class=dialog-content><LABEL>Username:</LABEL> <INPUT name=txtUsername value=" "> <LABEL>Password</LABEL> <INPUT name=txtPassword value=" "> <HR> <INPUT id=btnLogin value=Login type=button> </DIV> <DIV class=dialog-statusbar>Ready</DIV></DIV></FIELDSET> </DIV> <SCRIPT type=text/javascript>     $(document).ready(function () {         $("#divDialog")             .draggable({ handle: ".dialog-header" })             .closable({ handle: ".dialog-header",                 closeHandler: function () {                     alert("Window about to be closed.");                     return true;  // true closes - false leaves open                 }             });     }); </SCRIPT> </DIV></BODY> Now lest you think I’m out of my mind and create complete whacky HTML rooted in the last century, here’s the IE 9 rendering mode output which looks a heck of a lot cleaner and a lot closer to my original HTML of the page I’m accessing: <body> <div>         <h1>Rounded Corners and Shadows - Creating Dialogs in CSS</h1>     <div class="toolbarcontainer">         <a class="hoverbutton" href="./"> <img src="../../css/images/home.gif"> Home</a>         <a class="hoverbutton" href="RoundedCornersAndShadows.htm"> <img src="../../css/images/refresh.gif"> Refresh</a>     </div>         <div class="containercontent">     <fieldset>         <legend>Plain Box</legend>                <!-- Simple Box with rounded corners and shadow -->             <div style="border: 2px solid steelblue; width: 550px;" class="roundbox boxshadow">                              <div style="background: khaki;" class="boxcontenttext roundbox">                     Simple Rounded Corner Box.                 </div>             </div>     </fieldset>     <fieldset>         <legend>Box with Header</legend>         <div style="border: 2px solid steelblue; width: 550px;" class="roundbox boxshadow">                          <div class="gridheaderleft roundbox-top">Box with a Header</div>             <div style="background: khaki;" class="boxcontenttext roundbox-bottom">                 Simple Rounded Corner Box.             </div>         </div>     </fieldset>       <fieldset>         <legend>Dialog Style Window</legend>         <div style="width: 450px; position: relative;" id="divDialog" class="dialog boxshadow">             <div style="position: relative;" class="dialog-header">                 <div class="closebox"></div>                 User Sign-in             <div class="closebox"></div></div>             <div class="descriptionheader">This dialog is draggable and closable</div>                    <div class="dialog-content">                             <label>Username:</label>                 <input name="txtUsername" value=" " type="text">                 <label>Password</label>                 <input name="txtPassword" value=" " type="text">                                 <hr/>                                 <input id="btnLogin" value="Login" type="button">                        </div>             <div class="dialog-statusbar">Ready</div>         </div>     </fieldset>     </div> <script type="text/javascript">     $(document).ready(function () {         $("#divDialog")             .draggable({ handle: ".dialog-header" })             .closable({ handle: ".dialog-header",                 closeHandler: function () {                     alert("Window about to be closed.");                     return true;  // true closes - false leaves open                 }             });     }); </script>        </div> </body> IOW, in IE9 rendering mode IE9 is much closer (but not identical) to the original HTML from the page on the Web that we’re reading from. As a side note: Unfortunately, the browser feature emulation can't be applied against the Html Help (CHM) Engine in Windows which uses the Web Browser control (or COM interfaces anyway) to render Html Help content. I tried setting up hh.exe which is the help viewer, to use IE 9 rendering but a help file generated with CSS3 features will simply show in IE 7 mode. Bummer - this would have been a nice quick fix to allow help content served from CHM files to look better. HTML Editing leaves HTML formatting intact In the same vane, if you do any inline HTML editing in the control by setting content to be editable, IE 9’s control does a much more reasonable job of creating usable and somewhat valid HTML. It also leaves the original content alone other than the text your are editing or adding. No longer is the HTML output stripped of excess spaces and reformatted in IEs format. So if I do: private void button3_Click(object sender, RoutedEventArgs e) { dynamic doc = this.webBrowser.Document; doc.body.contentEditable = true; } and then make some changes to the document by typing into it using IE 9 mode, the document formatting stays intact and only the affected content is modified. The created HTML is reasonably clean (although it does lack proper XHTML formatting for things like <br/> <hr/>). This is very different from IE 7 mode which mangled the HTML as soon as the page was loaded into the control. Any editing you did stripped out all white space and lost all of your existing XHTML formatting. In IE 9 mode at least *most* of your original formatting stays intact. This is huge! In Html Help Builder I have supported HTML editing for a long time but the HTML mangling by the Web Browser control made it very difficult to edit the HTML later. Previously IE would mangle the HTML by stripping out spaces, upper casing all tags and converting many XHTML safe tags to its HTML 3 tags. Now IE leaves most of my document alone while editing, and creates cleaner and more compliant markup (with exception of self-closing elements like BR/HR). The end result is that I now have HTML editing in place that's much cleaner and actually capable of being manually edited. Caveats, Caveats, Caveats It wouldn't be Internet Explorer if there weren't some major compatibility issues involved in using this various browser version interaction. The biggest thing I ran into is that there are odd differences in some of the COM interfaces and what they return. I specifically ran into a problem with the document.selection.createRange() function which with IE 7 compatibility returns an expected text range object. When running in IE 8 or IE 9 mode however. I could not retrieve a valid text range with this code where loEdit is the WebBrowser control: loRange = loEdit.document.selection.CreateRange() The loRange object returned (here in FoxPro) had a length property of 0 but none of the other properties of the TextRange or TextRangeCollection objects were available. I figured this was due to some changed security settings but even after elevating the Intranet Security Zone and mucking with the other browser feature flags pertaining to security I had no luck. In the end I relented and used a JavaScript function in my editor document that returns a selection range object: function getselectionrange() { var range = document.selection.createRange(); return range; } and call that JavaScript function from my host applications code: *** Use a function in the document to get around HTML Editing issues loRange = loEdit.document.parentWindow.getselectionrange(.f.) and that does work correctly. This wasn't a big deal as I'm already loading a support script file into the editor page so all I had to do is add the function to this existing script file. You can find out more how to call script code in the Web Browser control from a host application in a previous post of mine. IE 8 and 9 also clamp down the security environment a little more than the default IE 7 control, so there may be other issues you run into. Other than the createRange() problem above I haven't seen anything else that is breaking in my code so far though and that's encouraging at least since it uses a lot of HTML document manipulation for the custom editor I've created (and would love to replace - any PROFESSIONAL alternatives anybody?) Registry Key Installation for your Application It’s important to remember that this registry setting is made per application, so most likely this is something you want to set up with your installer. Also remember that 32 and 64 bit settings require separate settings in the registry so if you’re creating your installer you most likely will want to set both keys in the registry preemptively for your application. I use Tarma Installer for all of my application installs and in Tarma I configure registry keys for both and set a flag to only install the latter key group in the 64 bit version: Because this setting is application specific you have to do this for every application you install unfortunately, but this also means that you can safely configure this setting in the registry because it is after only applied to your application. Another problem with install based installation is version detection. If IE 8 is installed I’d want 8000 for the value, if IE 9 is installed I want 9000. I can do this easily in code but in the installer this is much more difficult. I don’t have a good solution for this at the moment, but given that the app works with IE 7 mode now, IE 9 mode is just a bonus for the moment. If IE 9 is not installed and 9000 is used the default rendering will remain in use.   It sure would be nice if we could specify the IE rendering mode as a property, but I suspect the ActiveX container has to know before it loads what actual version to load up and once loaded can only load a single version of IE. This would account for this annoying application level configuration… Summary The registry feature emulation has been available for quite some time, but I just found out about it today and started experimenting around with it. I’m stoked to see that this is available as I’d pretty much given up in ever seeing any better rendering in the Web Browser control. Now at least my apps can take advantage of newer HTML features. Now if we could only get better HTML Editing support somehow <snicker>… ah can’t have everything.© Rick Strahl, West Wind Technologies, 2005-2011Posted in .NET  FoxPro  Windows  

    Read the article

< Previous Page | 120 121 122 123 124 125  | Next Page >