Search Results

Search found 1242 results on 50 pages for 'costs'.

Page 50/50 | < Previous Page | 46 47 48 49 50

Sending Javascript variables.

- by shinjuo

I have a page that allows a user to choose some things in a form and it will calculate the weight using javascript. It breaks it up into 5 variables that I need to send to another page. Originally I was just having it put the variable into a text box and then I was posting that text box. However I dont want to have 5 text boxes. So now I need to somehow send or post the five variables to another page. Here is my javascript function. I need to post weightBoxOne - weightBoxFive js function function getWeight(){ var weightBoxOne; var weightBoxTwo; var totalWeight; var box = .5; var quantity = document.dhform.quantity.value; var cardSize = document.dhform.cardSize.value; if(cardSize == 0.0141){ if(quantity <= 1000){ weightBoxOne = (quantity * cardSize) + box; totalWeight = weightBoxOne; }else if(quantity > 1000 && quantity <= 2000){ weightBoxOne = (1000 * cardSize) + box; weightBoxTwo = ((quantity - 1000) * cardSize) + box; totalWeight = weightBoxOne + weightBoxTwo; }else if(quantity > 2000 && quantity <= 3000){ weightBoxOne = (1000 * cardSize) + box; weightBoxTwo = (1000 * cardSize) + box; weightBoxThree = ((quantity - 2000) * cardSize) + box; totalWeight = weightBoxOne + weightBoxTwo + weightBoxThree; }else if(quantity > 3000 && quantity <= 4000){ weightBoxOne = (1000 * cardSize) + box; weightBoxTwo = (1000 * cardSize) + box; weightBoxThree = (1000 * cardSize) + box; weightBoxFour = ((quantity - 3000) * cardSize) + box; totalWeight = weightBoxOne + weightBoxTwo + weightBoxThree + weightBoxFour; }else{ weightBoxOne = (1000 * cardSize) + box; weightBoxTwo = (1000 * cardSize) + box; weightBoxThree = (1000 * cardSize) + box; weightBoxFour = (1000 * cardSize) + box; weightBoxFive = ((quantity - 4000) * cardSize) + box; totalWeight = weightBoxOne + weightBoxTwo + weightBoxThree + weightBoxFour + weightBoxFive; } }else if(cardSize == 0.00949){ if(quantity <= 4000){ weightBoxOne = (quantity * cardSize) + box; totalWeight = weightBoxOne; }else{ weightBoxOne = (4000 * cardSize) + box; weightBoxTwo = ((quantity - 4000) * cardSize) + box; totalWeight = weightBoxOne + weightBoxTwo; } } document.rates.weight.value = totalWeight; } //--> this is the form that was originally posting <form action="getRates.php" name="rates" method="post" onSubmit="popupform(this, 'join')"> <table style="width: 216px"> <tr> <td style="width: 115px; height: 49px;"><span class="style16">Weight</span><br/> <input type="text" id="weight" name="weight" size="10" maxlength="4"/> </td> <td align="right" style="width: 68px; height: 49px;" valign="top"><span class="style16">Zip Code</span><br/> <input type="text" id="zip" name="zip" size="10" maxlength="5"/> </td> </tr> <tr> <td style="width: 115px"> <input name="submit" type="submit" value="Get Rate Costs" style="width: 138px" />

Read the article
Effective optimization strategies on modern C++ compilers

- by user168715

I'm working on scientific code that is very performance-critical. An initial version of the code has been written and tested, and now, with profiler in hand, it's time to start shaving cycles from the hot spots. It's well-known that some optimizations, e.g. loop unrolling, are handled these days much more effectively by the compiler than by a programmer meddling by hand. Which techniques are still worthwhile? Obviously, I'll run everything I try through a profiler, but if there's conventional wisdom as to what tends to work and what doesn't, it would save me significant time. I know that optimization is very compiler- and architecture- dependent. I'm using Intel's C++ compiler targeting the Core 2 Duo, but I'm also interested in what works well for gcc, or for "any modern compiler." Here are some concrete ideas I'm considering: Is there any benefit to replacing STL containers/algorithms with hand-rolled ones? In particular, my program includes a very large priority queue (currently a std::priority_queue) whose manipulation is taking a lot of total time. Is this something worth looking into, or is the STL implementation already likely the fastest possible? Along similar lines, for std::vectors whose needed sizes are unknown but have a reasonably small upper bound, is it profitable to replace them with statically-allocated arrays? I've found that dynamic memory allocation is often a severe bottleneck, and that eliminating it can lead to significant speedups. As a consequence I'm interesting in the performance tradeoffs of returning large temporary data structures by value vs. returning by pointer vs. passing the result in by reference. Is there a way to reliably determine whether or not the compiler will use RVO for a given method (assuming the caller doesn't need to modify the result, of course)? How cache-aware do compilers tend to be? For example, is it worth looking into reordering nested loops? Given the scientific nature of the program, floating-point numbers are used everywhere. A significant bottleneck in my code used to be conversions from floating point to integers: the compiler would emit code to save the current rounding mode, change it, perform the conversion, then restore the old rounding mode --- even though nothing in the program ever changed the rounding mode! Disabling this behavior significantly sped up my code. Are there any similar floating-point-related gotchas I should be aware of? One consequence of C++ being compiled and linked separately is that the compiler is unable to do what would seem to be very simple optimizations, such as move method calls like strlen() out of the termination conditions of loop. Are there any optimization like this one that I should look out for because they can't be done by the compiler and must be done by hand? On the flip side, are there any techniques I should avoid because they are likely to interfere with the compiler's ability to automatically optimize code? Lastly, to nip certain kinds of answers in the bud: I understand that optimization has a cost in terms of complexity, reliability, and maintainability. For this particular application, increased performance is worth these costs. I understand that the best optimizations are often to improve the high-level algorithms, and this has already been done.

Read the article
When is a SQL function not a function?

- by Rob Farley

Should SQL Server even have functions? (Oh yeah – this is a T-SQL Tuesday post, hosted this month by Brad Schulz) Functions serve an important part of programming, in almost any language. A function is a piece of code that is designed to return something, as opposed to a piece of code which isn’t designed to return anything (which is known as a procedure). SQL Server is no different. You can call stored procedures, even from within other stored procedures, and you can call functions and use these in other queries. Stored procedures might query something, and therefore ‘return data’, but a function in SQL is considered to have the type of the thing returned, and can be used accordingly in queries. Consider the internal GETDATE() function. SELECT GETDATE(), SomeDatetimeColumn FROM dbo.SomeTable; There’s no logical difference between the field that is being returned by the function and the field that’s being returned by the table column. Both are the datetime field – if you didn’t have inside knowledge, you wouldn’t necessarily be able to tell which was which. And so as developers, we find ourselves wanting to create functions that return all kinds of things – functions which look up values based on codes, functions which do string manipulation, and so on. But it’s rubbish. Ok, it’s not all rubbish, but it mostly is. And this isn’t even considering the SARGability impact. It’s far more significant than that. (When I say the SARGability aspect, I mean “because you’re unlikely to have an index on the result of some function that’s applied to a column, so try to invert the function and query the column in an unchanged manner”) I’m going to consider the three main types of user-defined functions in SQL Server: Scalar Inline Table-Valued Multi-statement Table-Valued I could also look at user-defined CLR functions, including aggregate functions, but not today. I figure that most people don’t tend to get around to doing CLR functions, and I’m going to focus on the T-SQL-based user-defined functions. Most people split these types of function up into two types. So do I. Except that most people pick them based on ‘scalar or table-valued’. I’d rather go with ‘inline or not’. If it’s not inline, it’s rubbish. It really is. Let’s start by considering the two kinds of table-valued function, and compare them. These functions are going to return the sales for a particular salesperson in a particular year, from the AdventureWorks database. CREATE FUNCTION dbo.FetchSales_inline(@salespersonid int, @orderyear int) RETURNS TABLE AS RETURN ( SELECT e.LoginID as EmployeeLogin, o.OrderDate, o.SalesOrderID FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ) ; GO CREATE FUNCTION dbo.FetchSales_multi(@salespersonid int, @orderyear int) RETURNS @results TABLE ( EmployeeLogin nvarchar(512), OrderDate datetime, SalesOrderID int ) AS BEGIN INSERT @results (EmployeeLogin, OrderDate, SalesOrderID) SELECT e.LoginID, o.OrderDate, o.SalesOrderID FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ; RETURN END ; GO You’ll notice that I’m being nice and responsible with the use of the DATEADD function, so that I have SARGability on the OrderDate filter. Regular readers will be hoping I’ll show what’s going on in the execution plans here. Here I’ve run two SELECT * queries with the “Show Actual Execution Plan” option turned on. Notice that the ‘Query cost’ of the multi-statement version is just 2% of the ‘Batch cost’. But also notice there’s trickery going on. And it’s nothing to do with that extra index that I have on the OrderDate column. Trickery. Look at it – clearly, the first plan is showing us what’s going on inside the function, but the second one isn’t. The second one is blindly running the function, and then scanning the results. There’s a Sequence operator which is calling the TVF operator, and then calling a Table Scan to get the results of that function for the SELECT operator. But surely it still has to do all the work that the first one is doing... To see what’s actually going on, let’s look at the Estimated plan. Now, we see the same plans (almost) that we saw in the Actuals, but we have an extra one – the one that was used for the TVF. Here’s where we see the inner workings of it. You’ll probably recognise the right-hand side of the TVF’s plan as looking very similar to the first plan – but it’s now being called by a stack of other operators, including an INSERT statement to be able to populate the table variable that the multi-statement TVF requires. And the cost of the TVF is 57% of the batch! But it gets worse. Let’s consider what happens if we don’t need all the columns. We’ll leave out the EmployeeLogin column. Here, we see that the inline function call has been simplified down. It doesn’t need the Employee table. The join is redundant and has been eliminated from the plan, making it even cheaper. But the multi-statement plan runs the whole thing as before, only removing the extra column when the Table Scan is performed. A multi-statement function is a lot more powerful than an inline one. An inline function can only be the result of a single sub-query. It’s essentially the same as a parameterised view, because views demonstrate this same behaviour of extracting the definition of the view and using it in the outer query. A multi-statement function is clearly more powerful because it can contain far more complex logic. But a multi-statement function isn’t really a function at all. It’s a stored procedure. It’s wrapped up like a function, but behaves like a stored procedure. It would be completely unreasonable to expect that a stored procedure could be simplified down to recognise that not all the columns might be needed, but yet this is part of the pain associated with this procedural function situation. The biggest clue that a multi-statement function is more like a stored procedure than a function is the “BEGIN” and “END” statements that surround the code. If you try to create a multi-statement function without these statements, you’ll get an error – they are very much required. When I used to present on this kind of thing, I even used to call it “The Dangers of BEGIN and END”, and yes, I’ve written about this type of thing before in a similarly-named post over at my old blog. Now how about scalar functions... Suppose we wanted a scalar function to return the count of these. CREATE FUNCTION dbo.FetchSales_scalar(@salespersonid int, @orderyear int) RETURNS int AS BEGIN RETURN ( SELECT COUNT(*) FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ); END ; GO Notice the evil words? They’re required. Try to remove them, you just get an error. That’s right – any scalar function is procedural, despite the fact that you wrap up a sub-query inside that RETURN statement. It’s as ugly as anything. Hopefully this will change in future versions. Let’s have a look at how this is reflected in an execution plan. Here’s a query, its Actual plan, and its Estimated plan: SELECT e.LoginID, y.year, dbo.FetchSales_scalar(p.SalesPersonID, y.year) AS NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID; We see here that the cost of the scalar function is about twice that of the outer query. Nicely, the query optimizer has worked out that it doesn’t need the Employee table, but that’s a bit of a red herring here. There’s actually something way more significant going on. If I look at the properties of that UDF operator, it tells me that the Estimated Subtree Cost is 0.337999. If I just run the query SELECT dbo.FetchSales_scalar(281,2003); we see that the UDF cost is still unchanged. You see, this 0.0337999 is the cost of running the scalar function ONCE. But when we ran that query with the CROSS JOIN in it, we returned quite a few rows. 68 in fact. Could’ve been a lot more, if we’d had more salespeople or more years. And so we come to the biggest problem. This procedure (I don’t want to call it a function) is getting called 68 times – each one between twice as expensive as the outer query. And because it’s calling it in a separate context, there is even more overhead that I haven’t considered here. The cheek of it, to say that the Compute Scalar operator here costs 0%! I know a number of IT projects that could’ve used that kind of costing method, but that’s another story that I’m not going to go into here. Let’s look at a better way. Suppose our scalar function had been implemented as an inline one. Then it could have been expanded out like a sub-query. It could’ve run something like this: SELECT e.LoginID, y.year, (SELECT COUNT(*) FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = p.SalesPersonID AND o.OrderDate >= DATEADD(year,y.year-2000,'20000101') AND o.OrderDate < DATEADD(year,y.year-2000+1,'20000101') ) AS NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID; Don’t worry too much about the Scan of the SalesOrderHeader underneath a Nested Loop. If you remember from plenty of other posts on the matter, execution plans don’t push the data through. That Scan only runs once. The Index Spool sucks the data out of it and populates a structure that is used to feed the Stream Aggregate. The Index Spool operator gets called 68 times, but the Scan only once (the Number of Executions property demonstrates this). Here, the Query Optimizer has a full picture of what’s being asked, and can make the appropriate decision about how it accesses the data. It can simplify it down properly. To get this kind of behaviour from a function, we need it to be inline. But without inline scalar functions, we need to make our function be table-valued. Luckily, that’s ok. CREATE FUNCTION dbo.FetchSales_inline2(@salespersonid int, @orderyear int) RETURNS table AS RETURN (SELECT COUNT(*) as NumSales FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ); GO But we can’t use this as a scalar. Instead, we need to use it with the APPLY operator. SELECT e.LoginID, y.year, n.NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID OUTER APPLY dbo.FetchSales_inline2(p.SalesPersonID, y.year) AS n; And now, we get the plan that we want for this query. All we’ve done is tell the function that it’s returning a table instead of a single value, and removed the BEGIN and END statements. We’ve had to name the column being returned, but what we’ve gained is an actual inline simplifiable function. And if we wanted it to return multiple columns, it could do that too. I really consider this function to be superior to the scalar function in every way. It does need to be handled differently in the outer query, but in many ways it’s a more elegant method there too. The function calls can be put amongst the FROM clause, where they can then be used in the WHERE or GROUP BY clauses without fear of calling the function multiple times (another horrible side effect of functions). So please. If you see BEGIN and END in a function, remember it’s not really a function, it’s a procedure. And then fix it. @rob_farley

Read the article
The Windows Store... why did I sign up with this mess again?

- by FransBouma

Yesterday, Microsoft revealed that the Windows Store is now open to all developers in a wide range of countries and locations. For the people who think "wtf is the 'Windows Store'?", it's the central place where Windows 8 users will be able to find, download and purchase applications (or as we now have to say to not look like a computer illiterate: <accent style="Kentucky">aaaaappss</accent>) for Windows 8. As this is the store which is integrated into Windows 8, it's an interesting place for ISVs, as potential customers might very well look there first. This of course isn't true for all kinds of software, and developer tools in general aren't the kind of applications most users will download from the Windows store, but a presence there can't hurt. Now, this Windows Store hosts two kinds of applications: 'Metro-style' applications and 'Desktop' applications. The 'Metro-style' applications are applications created for the new 'Metro' UI which is present on Windows 8 desktop and Windows RT (the single color/big font fingerpaint-oriented UI). 'Desktop' applications are the applications we all run and use on Windows today. Our software are desktop applications. The Windows Store hosts all Metro-style applications locally in the store and handles the payment for these applications. This means you upload your application (sorry, 'app') to the store, jump through a lot of hoops, Microsoft verifies that your application is not violating a tremendous long list of rules and after everything is OK, it's published and hopefully you get customers and thus earn money. Money which Microsoft will pay you on a regular basis after customers buy your application. Desktop applications are not following this path however. Desktop applications aren't hosted by the Windows Store. Instead, the Windows Store more or less hosts a page with the application's information and where to get the goods. I.o.w.: it's nothing more than a product's Facebook page. Microsoft will simply redirect a visitor of the Windows Store to your website and the visitor will then use your site's system to purchase and download the application. This last bit of information is very important. So, this morning I started with fresh energy to register our company 'Solutions Design bv' at the Windows Store and our two applications, LLBLGen Pro and ORM Profiler. First I went to the Windows Store dashboard page. If you don't have an account, you have to log in or sign up if you don't have a live account. I signed in with my live account. After that, it greeted me with a page where I had to fill in a code which was mailed to me. My local mail server polls every several minutes for email so I had to kick it to get it immediately. I grabbed the code from the email and I was presented with a multi-step process to register myself as a company or as an individual. In red I was warned that this choice was permanent and not changeable. I chuckled: Microsoft apparently stores its data on paper, not in digital form. I chose 'company' and was presented with a lengthy form to fill out. On the form there were two strange remarks: Per company there can just be 1 (one, uno, not zero, not two or more) registered developer, and only that developer is able to upload stuff to the store. I have no idea how this works with large companies, oh the overhead nightmares... "Sorry, but John, our registered developer with the Windows Store is on holiday for 3 months, backpacking through Australia, no, he's not reachable at this point. M'yeah, sorry bud. Hey, did you fill in those TPS reports yesterday?" A separate Approver has to be specified, which has to be a different person than the registered developer. Apparently to Microsoft a company with just 1 person is not a company. Luckily we're with two people! *pfew*, dodged that one, otherwise I would be stuck forever: the choice I already made was not reversible! After I had filled out the form and it was all well and good and accepted by the Microsoft lackey who had to write it all down in some paper notebook ("Hey, be warned! It's a permanent choice! Written down in ink, can't be changed!"), I was presented with the question how I wanted to pay for all this. "Pay for what?" I wondered. Must be the paper they were scribbling the information on, I concluded. After all, there's a financial crisis going on! How could I forget! Silly me. "Ok fair enough". The price was 75 Euros, not the end of the world. I could only pay by credit card, so it was accepted quickly. Or so I thought. You see, Microsoft has a different idea about CC payments. In the normal world, you type in your CC number, some date, a name and a security code and that's it. But Microsoft wants to verify this even more. They want to make a verification purchase of a very small amount and are doing that with a special code in the description. You then have to type in that code in a special form in the Windows Store dashboard and after that you're verified. Of course they'll refund the small amount they pull from your card. Sounds simple, right? Well... no. The problem starts with the fact that I can't see the CC activity on some website: I have a bank issued CC card. I get the CC activity once a month on a piece of paper sent to me. The bank's online website doesn't show them. So it's possible I have to wait for this code till October 12th. One month. "So what, I'm not going to use it anyway, Desktop applications don't use the payment system", I thought. "Haha, you're so naive, dear developer!" Microsoft won't allow you to publish any applications till this verification is done. So no application publishing for a month. Wouldn't it be nice if things were, you know, digital, so things got done instantly? But of course, that lackey who scribbled everything in the Big Windows Store Registration Book isn't that quick. Can't blame him though. He's just doing his job. Now, after the payment was done, I was presented with a page which tells me Microsoft is going to use a third party company called 'Symantec', which will verify my identity again. The page explains to me that this could be done through email or phone and that they'll contact the Approver to verify my identity. "Phone?", I thought... that's a little drastic for a developer account to publish a single page of information about an external hosted software product, isn't it? On Facebook I just added a page, done. And paying you, Microsoft, took less information: you were happy to take my money before my identity was even 'verified' by this 3rd party's minions! "Double standards!", I roared. No-one cared. But it's the thought of getting it off your chest, you know. Luckily for me, everyone at Symantec was asleep when I was registering so they went for the fallback option in case phone calls were not possible: my Approver received an email. Imagine you have to explain the idiot web of security theater I was caught in to someone else who then has to reply a random person over the internet that I indeed was who I said I was. As she's a true sweetheart, she gave me the benefit of the doubt and assured that for now, I was who I said I was. Remember, this is for a desktop application, which is only a link to a website, some pictures and a piece of text. No file hosting, no payment processing, nothing, just a single page. Yeah, I also thought I was crazy. But we're not at the end of this quest yet. I clicked around in the confusing menus of the Windows Store dashboard and found the 'Desktop' section. I get a helpful screen with a warning in red that it can't find any certified 'apps'. True, I'm just getting started, buddy. I see a link: "Check the Windows apps you submitted for certification". Well, I haven't submitted anything, but let's see where it brings me. Oh the thrill of adventure! I click the link and I end up on this site: the hardware/desktop dashboard account registration. "Erm... but I just registered...", I mumbled to no-one in particular. Apparently for desktop registration / verification I have to register again, it tells me. But not only that, the desktop application has to be signed with a certificate. And not just some random el-cheapo certificate you can get at any mall's discount store. No, this certificate is special. It's precious. This certificate, the 'Microsoft Authenticode' Digital Certificate, is the only certificate that's acceptable, and jolly, it can be purchased from VeriSign for the price of only ... $99.-, but be quick, because this is a limited time offer! After that it's, I kid you not, $499.-. 500 dollars for a certificate to sign an executable. But, I do feel special, I got a special price. Only for me! I'm glowing. Not for long though. Here I started to wonder, what the benefit of it all was. I now again had to pay money for a shiny certificate which will add 'Solutions Design bv' to our installer as the publisher instead of 'unknown', while our customers download the file from our website. Not only that, but this was all about a Desktop application, which wasn't hosted by Microsoft. They only link to it. And make no mistake. These prices aren't single payments. Every year these have to be renewed. Like a membership of an exclusive club: you're special and privileged, but only if you cough up the dough. To give you an example how silly this all is: I added LLBLGen Pro and ORM Profiler to the Visual Studio Gallery some time ago. It's the same thing: it's a central place where one can find software which adds to / extends / works with Visual Studio. I could simply create the pages, add the information and they show up inside Visual Studio. No files are hosted at Microsoft, they're downloaded from our website. Exactly the same system. As I have to wait for the CC transcripts to arrive anyway, I can't proceed with publishing in this new shiny store. After the verification is complete I have to wait for verification of my software by Microsoft. Even Desktop applications need to be verified using a long list of rules which are mainly focused on Metro-style applications. Even while they're not hosted by Microsoft. I wonder what they'll find. "Your application wasn't approved. It violates rule 14 X sub D: it provides more value than our own competing framework". While I was writing this post, I tried to check something in the Windows Store Dashboard, to see whether I remembered it correctly. I was presented again with the question, after logging in with my live account, to enter the code that was just mailed to me. Not the previous code, a brand new one. Again I had to kick my mail server to pull the email to proceed. This was it. This 'experience' is so beyond miserable, I'm afraid I have to say goodbye for now to the 'Windows Store'. It's simply not worth my time. Now, about live accounts. You might know this: live accounts are tied to everything you do with Microsoft. So if you have an MSDN subscription, e.g. the one which costs over $5000.-, it's tied to this same live account. But the fun thing is, you can login with your live account to the MSDN subscriptions with just the account id and password. No additional code is mailed to you. While it gives you access to all Microsoft software available, including your licenses. Why the draconian security theater with this Windows Store, while all I want is to publish some desktop applications while on other Microsoft sites it's OK to simply sign in with your live account: no codes needed, no verification and no certificates? Microsoft, one thing you need with this store and that's: apps. Apps, apps, apps, apps, aaaaaaaaapps. Sorry, my bad, got carried away. I just can't stand the word 'app'. This store's shelves have to be filled to the brim with goods. But instead of being welcomed into the store with open arms, I have to fight an uphill battle with an endless list of rules and bullshit to earn the privilege to publish in this shiny store. As if I have to be thrilled to be one of the exclusive club called 'Windows Store Publishers'. As if Microsoft doesn't want it to succeed. Craig Stuntz sent me a link to an old blog post of his regarding code signing and uploading to Microsoft's old mobile store from back in the WinMo5 days: http://blogs.teamb.com/craigstuntz/2006/10/11/28357/. Good read and good background info about how little things changed over the years. I hope this helps Microsoft make things more clearer and smoother and also helps ISVs with their decision whether to go with the Windows Store scheme or ignore it. For now, I don't see the advantage of publishing there, especially not with the nonsense rules Microsoft cooked up. Perhaps it changes in the future, who knows.

Read the article
CodePlex Daily Summary for Tuesday, May 31, 2011

CodePlex Daily Summary for Tuesday, May 31, 2011Popular ReleasesNearforums - ASP.NET MVC forum engine: Nearforums v6.0: Version 6.0 of Nearforums, the ASP.NET MVC Forum Engine, containing new features: Authentication using Membership Provider for SQL Server and MySql Spam prevention: Flood Control Moderation: Flag messages Content management: Pages: Create pages (about us/contact/texts) through web administration Allow nearforums to run as an IIS subapp Migrated Facebook Connect to OAuth 2.0 Visit the project Roadmap for more details.NetOffice - The easiest way to use Office in .NET: NetOffice Release 0.8b: Changes: - fix critical issue 15922(AccessViolationException) once and for all update is strongly recommended Includes: - Runtime Binaries and Source Code for .NET Framework:......v2.0, v3.0, v3.5, v4.0 - Tutorials in C# and VB.Net:..............................................................COM Proxy Management, Events, etc. - Examples in C# and VB.Net:............................................................Excel, Word, Outlook, PowerPoint, Access - COMAddin Examples in C# and VB....Facebook Graph Toolkit: Facebook Graph Toolkit 1.5.4186: Updates the API in response to Facebook's recent change of policy: All Graph Api accessing feeds or posts must provide a AccessToken.SharePoint Farm Poster: SharePoint Farm Poster: SharePoint Farm Poster is generated by a PowerShell Script. Run this script under the Farm Admin Account. After downloading, unblock the file in the Property Window. Current version is beta : v0.3.0VCC: Latest build, v2.1.40530.0: Automatic drop of latest buildServiio for Windows Home Server: Beta Release 0.5.2.0: Ready for widespread beta. Synchronized build number to Serviio version to avoid confusion.AcDown????? - Anime&Comic Downloader: AcDown????? v3.0 Beta4: ??AcDown?????????????，??????????????，????、????。?????Acfun????? ????32??64? Windows XP/Vista/7 ????????????? ??：????????Windows XP???,?????????.NET Framework 2.0???(x86)?.NET Framework 2.0???(x64)，?????"?????????"??? ??v3.0 Beta4 2011-5-31?? ???Bilibili.us????? ???? ?? ???"????" ???Bilibili.us??? ??????? ?? ??????? ?? ???????? ?? ?? ???Bilibili.us?????(??????????????????) ??????(6.cn)?????(????) ?? ?????Acfun?????????? ?????????????? ???QQ???????? ????????????Discussion...Terraria Map Generator: TerrariaMapTool 1.0.0.2 Beta: Version 1.0.0.2 Beta Release - Now has a Gui - Draws backgrounds (May still not be exact) - Hopefully fixed support on DirectX 9 machine.CodeCopy Auto Code Converter: Code Copy v0.1: Full add-in, setup project source code and setup fileEnhSim: EnhSim 2.4.5 ALPHA: 2.4.5 ALPHAThis release supports WoW patch 4.1 at level 85 To use this release, you must have the Microsoft Visual C++ 2010 Redistributable Package installed. This can be downloaded from http://www.microsoft.com/downloads/en/details.aspx?FamilyID=A7B7A05E-6DE6-4D3A-A423-37BF0912DB84 To use the GUI you must have the .NET 4.0 Framework installed. This can be downloaded from http://www.microsoft.com/downloads/en/details.aspx?FamilyID=9cfb2d51-5ff4-4491-b0e5-b386f32c0992 - Added in the T12 s...TerrariViewer: TerrariViewer v2.4.1: Added Piggy Bank editor and fixed some minor bugs.Kooboo CMS: Kooboo CMS 3.02: What is new in kooboo cms 3.02 The most important updates of this version is the Kooboo site builder, an unique and creative web design tool, design an professional website and export to Kooboo CMS. See: http://www.sitekin.com Add Version contorl on View, Layout and other elements. Add user CMS language selection, user can select a language to use on their CMS backend. Add User profile provider, you can use now stop website user information on a SQL database. Previously it stored on XML...mojoPortal: 2.3.6.6: see release notes on mojoportal.com http://www.mojoportal.com/mojoportal-2366-released Note that we have separate deployment packages for .NET 3.5 and .NET 4.0 The deployment package downloads on this page are pre-compiled and ready for production deployment, they contain no C# source code. To download the source code see the Source Code Tab I recommend getting the latest source code using TortoiseHG, you can get the source code corresponding to this release here.Terraria World Creator: Terraria World Creator: Version 1.01 Fixed a bug that would cause the application to crash. Re-named the Application.VidCoder: 0.9.0: New startup UI for one-click scanning of discs or opening a file/folder. New seek bar on the preview window to make switching previews easier (you can click anywhere on the bar). Added gradient backgrounds to the main window to visually group the sections. Added Open Video File and Open Video Folder options to the File menu. Moved preview button to be in line with the other control buttons. Fixed settings getting in a weird state if they were saved without an output folder being chos...General Media Access WebService: 0.2.0.0 Beta: Updated GMA release with sorting/ordering mechanisms. Several bug fixes.Microsoft All-In-One Code Framework - a centralized code sample library: All-In-One Code Framework 2011-05-26: Alternatively, you can install Sample Browser or Sample Browser VS extension, and download the code samples from Sample Browser. Improved and Newly Added Examples:For an up-to-date code sample index, please refer to All-In-One Code Framework Sample Catalog. NEW Samples for Dynamics Sample Description Owner CSDynamicsNAVWebServices The code sample shows syntax for calling Dynamics NAV Web Services. Lars Lohndorf-Larsen NEW Samples for WPF Sample Description Owner CSWPFDataGridCustomS...Terraria World Viewer: Version 1.1: Update May 26th Added Chest Filtering, this allows chests only containing certain items to have their symbol drawn. (Its under advanced settings tab) GUI elements (checkboxes/etc) are persistant between uses of the application Beta Worlds (i.e. Release #38) will work properly Symbols can be enabled or disabled on a per symbol basis Chest Information tab which is just a dump of the current chest information Meterorite is now visible as a bright magenta pink Application defaults to ...MVC Controls Toolkit: Mvc Controls Toolkit 1.1 RC: *Added: Compatibility with jQuery 1.6.1 Rendering of enumerables with images and/or customizable strings improved the client side tempate engine added new parameters to the template definition binding all new knockout bindings helpers have been fully implemented added a new overload for defining the client-side ViewModel The SetTme method has the option to store the theme in a permanent cookie If no CSS class is provided for the watermark of a TypedTextBox the watermark class of the current t...patterns & practices: Project Silk: Project Silk - Documentation Only Drop - May 24: To get the latest code, please see the previous drop here. Guidance Chapters Ready for Review The following chapters (provided in CHM or PDF format) are ready for community review. Our team very much appreciates your feedback and technical review. All documentation feedback should be posted in the Issue Tracker; if required, a document can be attached along with the feedback. Architecture jQuery UI Widgets Server-Side Implementation Security Unit Testing Web Applications Widget Q...New Projects#liveDB: liveDB is an in-memory database engine for Microsoft .NET providing full ACID support, lightning fast performance and offering a significant reduction of development and operational costs. liveDB is built on Live Domain Technology(TM).8 hours: 8hours Private studyABox2d: A port of Box 2d game engine doing it has an exercise to study how the game engine work.ADempiere.NET: If I have enough time and support I we will translate this into .NETAlmonaster: Almonaster is a turn-based multi-player war game. It is free for all players and comes with absolutely no warranty. The game is fully web-based and requires no downloads, Javascript, Java or ActiveX controls. ASPone API: ASPone partnerské API (aplikacní programové rozhraní) je rozhraní pro vytvorené a urcené pro partnery spolecnosti ASPone, s.r.o. Pomocí tohoto aplikacního rozhraní mužete zautomatizovat radu úkonu, které by pomocí webového rozhraní mohly být casove nárocnejší nebo vyžadují interakci cloveka. API umožnuje zautomatizovat radu úkonu souvisejících se správou domén, doménových kontaktu, webhostingu, databází, serveru a mnoha dalších. Pro zjednodušení práce s API jsou již pripraveni dva ukázkový...CodeCopy Auto Code Converter: This add-in project converts c# and vb.net codes in visual studio.drms: Data Resource Management SystemDrop Down CheckBoxList control (DropDownCheckBoxes): DropDownCheckBoxes is an ASP.NET server control directly inheriting standard ASP.NET CheckBoxList control and fully it supports parent's API (except members responsible for rendering and styling). Thus in most cases CheckBoxList control can be simply replaced with DropDownCheckBoxes with no need to change any data binding code or event handlers. In normal state the control is displayed as a select (DropDownList) control. Clicking the expand button shows a list with check boxes. When the se...Extended Registration module for Orchard CMS: This project has a dependency on the Contrib.Profile module. With this module enabled, users must fill out any parts you add to the User ContentItem in the Registration page. Ideal if you require additional information from your users.GreenWay: Car navigation softwareHost Profiles: Host Profiles is small tool to control, switch and management the hosts file of the computer. The hosts file is located in "c:\windows\system32\drivers\etc\hosts".HRM System MVVM sample code: This is the sample WPF MVVM application that i've described in my blog posts. I hope to give you a clear view of mvvm and other commonly used patterns.Mi Game Library: Ever wanted to store all the games you own into one place that you, could then later come see and search also with your own personal wish list!Micorrhiza: Micorrhiza is a client-server solution written in C# for voice- and video-communications between users in local and global networks.MPlayer.NET for Windows Forms & WPF: MPlayer.NET is a wrapper around MPlayer executable. It's developed on .NET platform and includes visual controls for both Windows Forms and WPF applications.MyGet - NuGet-as-a-Service: This project is the source for http://myget.org. MyGet offers you the possibility to create your own, private, filtered NuGet feed for use in the Visual Studio Package Manager. It can contain packages from the official NuGet feed as well as your private packages, hosted on MyGet.MZExtensions: A collection of handy C# Extension Methods.NCAds: NCadsNetSync: Universal file synchronization agent.OLE 1C7.7: OLE 1C7.7 ?????????? ??????? ??? ??????? ? 1?7.7 ????????? OLE ??????????.Pear 2.5: Pear 2.5 is a web browser which has MetroUI which is also known for WP7. Pear 2.5's graphics is totally made up with MetroUI and looks stunning when browse. This version has 3 builds - 2 alpha builds and 1 gamma delta (beta) build. It's developed in VB.NET which is the easiest.ProjectOne: ProjectOne is a Open Community Information Sharing Website regarding Realty as its primary source.russomi: russomiSopaco Server Foundation 1.x: The one earlier version of my server infrastructure(SSF, Sopaco Server Foundation 1.x, owned by ??)。 Network Layer Based On MINA， message meta in 1.x is hard coded to 6bytes message header like this struct NetworkMessageHeader { short msgId; int msgLength; } struct NetworkMTray Timer: A simple timer/stopwatch which runs fromt he system tray. I started it as a hobby learning project to understand the Win32 API. Now open sourcing it to get more inputs about the same, and at the same time it may prove helpful to othersVENSOFT DIPERCAX: Proyecto Final del Curso de Proyectos II de la Universidad Privada del NorteWindows Phone Blog Menu: A Silverlight navigation control that looks like a Windows Phone 7. The live tiles are links to websites. Use this control on your blog or website to show your love for WP7. It is a creative way to link to external sites you are interested in.

Read the article
Stored Procedures with SSRS? Hmm… not so much

- by Rob Farley

Little Bobby Tables’ mother says you should always sanitise your data input. Except that I think she’s wrong. The SQL Injection aspect is for another post, where I’ll show you why I think SQL Injection is the same kind of attack as many other attacks, such as the old buffer overflow, but here I want to have a bit of a whinge about the way that some people sanitise data input, and even have a whinge about people who insist on using stored procedures for SSRS reports. Let me say that again, in case you missed it the first time: I want to have a whinge about people who insist on using stored procedures for SSRS reports. Let’s look at the data input sanitisation aspect – except that I’m going to call it ‘parameter validation’. I’m talking about code that looks like this: create procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin /* First check that @eomdate is a valid date */ if isdate(@eomdate) != 1 begin select 'Please enter a valid date' as ErrorMessage; return; end /* Then check that time has passed since @eomdate */ if datediff(day,@eomdate,sysdatetime()) < 5 begin select 'Sorry - EOM is not complete yet' as ErrorMessage; return; end /* If those checks have succeeded, return the data */ select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales from Sales.SalesOrderHeader where OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID order by SalesPersonID; end Notice that the code checks that a date has been entered. Seriously??!! This must only be to check for NULL values being passed in, because anything else would have to be a valid datetime to avoid an error. The other check is maybe fair enough, but I still don’t like it. The two problems I have with this stored procedure are the result sets and the small fact that the stored procedure even exists in the first place. But let’s consider the first one of these problems for starters. I’ll get to the second one in a moment. If you read Jes Borland (@grrl_geek)’s recent post about returning multiple result sets in Reporting Services, you’ll be aware that Reporting Services doesn’t support multiple results sets from a single query. And when it says ‘single query’, it includes ‘stored procedure call’. It’ll only handle the first result set that comes back. But that’s okay – we have RETURN statements, so our stored procedure will only ever return a single result set. Sometimes that result set might contain a single field called ErrorMessage, but it’s still only one result set. Except that it’s not okay, because Reporting Services needs to know what fields to expect. Your report needs to hook into your fields, so SSRS needs to have a way to get that information. For stored procs, it uses an option called FMTONLY. When Reporting Services tries to figure out what fields are going to be returned by a query (or stored procedure call), it doesn’t want to have to run the whole thing. That could take ages. (Maybe it’s seen some of the stored procedures I’ve had to deal with over the years!) So it turns on FMTONLY before it makes the call (and turns it off again afterwards). FMTONLY is designed to be able to figure out the shape of the output, without actually running the contents. It’s very useful, you might think. set fmtonly on exec dbo.GetMonthSummaryPerSalesPerson '20030401'; set fmtonly off Without the FMTONLY lines, this stored procedure returns a result set that has three columns and fourteen rows. But with FMTONLY turned on, those rows don’t come back. But what I do get back hurts Reporting Services. It doesn’t run the stored procedure at all. It just looks for anything that could be returned and pushes out a result set in that shape. Despite the fact that I’ve made sure that the logic will only ever return a single result set, the FMTONLY option kills me by returning three of them. It would have been much better to push these checks down into the query itself. alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID order by SalesPersonID; end Now if we run it with FMTONLY turned on, we get the single result set back. But let’s consider the execution plan when we pass in an invalid date. First let’s look at one that returns data. I’ve got a semi-useful index in place on OrderDate, which includes the SalesPersonID and TotalDue fields. It does the job, despite a hefty Sort operation. …compared to one that uses a future date: You might notice that the estimated costs are similar – the Index Seek is still 28%, the Sort is still 71%. But the size of that arrow coming out of the Index Seek is a whole bunch smaller. The coolest thing here is what’s going on with that Index Seek. Let’s look at some of the properties of it. Glance down it with me… Estimated CPU cost of 0.0005728, 387 estimated rows, estimated subtree cost of 0.0044385, ForceSeek false, Number of Executions 0. That’s right – it doesn’t run. So much for reading plans right-to-left... The key is the Filter on the left of it. It has a Startup Expression Predicate in it, which means that it doesn’t call anything further down the plan (to the right) if the predicate evaluates to false. Using this method, we can make sure that our stored procedure contains a single query, and therefore avoid any problems with multiple result sets. If we wanted, we could always use UNION ALL to make sure that we can return an appropriate error message. alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, /*Placeholder: */ '' as ErrorMessage from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID /* Now include the error messages */ union all select 0, 0, 0, 'Please enter a valid date' as ErrorMessage where isdate(@eomdate) != 1 union all select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage where datediff(day,@eomdate,sysdatetime()) < 5 order by SalesPersonID; end But still I don’t like it, because it’s now a stored procedure with a single query. And I don’t like stored procedures that should be functions. That’s right – I think this should be a function, and SSRS should call the function. And I apologise to those of you who are now planning a bonfire for me. Guy Fawkes’ night has already passed this year, so I think you miss out. (And I’m not going to remind you about when the PASS Summit is in 2012.) create function dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) returns table as return ( select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, '' as ErrorMessage from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID union all select 0, 0, 0, 'Please enter a valid date' as ErrorMessage where isdate(@eomdate) != 1 union all select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage where datediff(day,@eomdate,sysdatetime()) < 5 ); We’ve had to lose the ORDER BY – but that’s fine, as that’s a client thing anyway. We can have our reports leverage this stored query still, but we’re recognising that it’s a query, not a procedure. A procedure is designed to DO stuff, not just return data. We even get entries in sys.columns that confirm what the shape of the result set actually is, which makes sense, because a table-valued function is the right mechanism to return data. And we get so much more flexibility with this. If you haven’t seen the simplification stuff that I’ve preached on before, jump over to http://bit.ly/SimpleRob and watch the video of when I broke a microphone and nearly fell off the stage in Wales. You’ll see the impact of being able to have a simplifiable query. You can also read the procedural functions post I wrote recently, if you didn’t follow the link from a few paragraphs ago. So if we want the list of SalesPeople that made any kind of sales in a given month, we can do something like: select SalesPersonID from dbo.GetMonthSummaryPerSalesPerson(@eomonth) order by SalesPersonID; This doesn’t need to look up the TotalDue field, which makes a simpler plan. select * from dbo.GetMonthSummaryPerSalesPerson(@eomonth) where SalesPersonID is not null order by SalesPersonID; This one can avoid having to do the work on the rows that don’t have a SalesPersonID value, pushing the predicate into the Index Seek rather than filtering the results that come back to the report. If we had joins involved, we might see some of those being simplified out. We also get the ability to include query hints in individual reports. We shift from having a single-use stored procedure to having a reusable stored query – and isn’t that one of the main points of modularisation? Stored procedures in Reporting Services are just a bit limited for my liking. They’re useful in plenty of ways, but if you insist on using stored procedures all the time rather that queries that use functions – that’s rubbish. @rob_farley

Read the article
CodePlex Daily Summary for Tuesday, March 20, 2012

CodePlex Daily Summary for Tuesday, March 20, 2012Popular ReleasesNearforums - ASP.NET MVC forum engine: Nearforums v8.0: Version 8.0 of Nearforums, the ASP.NET MVC Forum Engine, containing new features: Internationalization Custom authentication provider Access control list for forums and threads Webdeploy package checksum: abc62990189cf0d488ef915d4a55e4b14169bc01BIDS Helper: BIDS Helper 1.6: This beta release is the first to support SQL Server 2012 (in addition to SQL Server 2005, 2008, and 2008 R2). Since it is marked as a beta release, we are looking for bug reports in the next few months as you use BIDS Helper on real projects. In addition to getting all existing BIDS Helper functionality working appropriately in SQL Server 2012 (SSDT), the following features are new... Analysis Services Tabular Smart Diff Tabular Actions Editor Tabular HideMemberIf Tabular Pre-Build ...JavaScript Web Resource Manager for Microsoft Dynamics CRM 2011: JavaScript Web Resource Manager (1.2.1420.191): BUG FIXED : When loading scripts from disk, the import of the web resource didn't do anything When scripts were saved to disk, it wasn't possible to edit them with an editorSQL Monitor - managing sql server performance: SQLMon 4.2 alpha 12: 1. improved process visualizer, now shows how many dead locks, and what are the locked objects 2. fixed some other problems.Json.NET: Json.NET 4.5 Release 1: New feature - Windows 8 Metro build New feature - JsonTextReader automatically reads ISO strings as dates New feature - Added DateFormatHandling to control whether dates are written in the MS format or ISO format, with ISO as the default New feature - Added DateTimeZoneHandling to control reading and writing DateTime time zone details New feature - Added async serialize/deserialize methods to JsonConvert New feature - Added Path to JsonReader/JsonWriter/ErrorContext and exceptions w...SCCM Client Actions Tool: SCCM Client Actions Tool v1.11: SCCM Client Actions Tool v1.11 is the latest version. It comes with following changes since last version: Fixed a bug when ping and cmd.exe kept running in endless loop after action progress was finished. Fixed update checking from Codeplex RSS feed. The tool is downloadable as a ZIP file that contains four files: ClientActionsTool.hta – The tool itself. Cmdkey.exe – command line tool for managing cached credentials. This is needed for alternate credentials feature when running the HTA...WebSocket4Net: WebSocket4Net 0.5: Changes in this release fixed the wss's default port bug improved JsonWebSocket supported set client access policy protocol for silverlight fixed a handshake issue in Silverlight fixed a bug that "Host" field in handshake hadn't contained port if the port is not default supported passing in Origin parameter for handshaking supported reacting pings from server side fixed a bug in data sending fixed the bug sending a closing handshake with no message which would cause an excepti...SuperWebSocket, a .NET WebSocket Server: SuperWebSocket 0.5: Changes included in this release: supported closing handshake queue checking improved JSON subprotocol supported sending ping from server to client fixed a bug about sending a closing handshake with no message refactored the code to improve protocol compatibility fixed a bug about sub protocol configuration loading in Mono improved BasicSubProtocol added JsonWebSocketSessionDaun Management Studio: Daun Management Studio 0.1 (Alpha Version): These are these the alpha application packages for Daun Management Studio to manage MongoDB Server. Please visit our official website http://www.daun-project.comSurvey™ - web survey & form engine: Survey™ 2.0: The new stable Survey™ Project 2.0.0.1 version contains many new features like: Technical changes: - Use of Jquery, ASTreeview, Tabs, Tooltips and new menuprovider Features & Bugfixes: Survey list and search function Folder structure for surveys New Menustructure Library list New Library fields User list and search functions Layout options for a survey with CSS, page header and footer New IP filter security feature Enhanced Token Management New Question fields as ID, Alias...RiP-Ripper & PG-Ripper: RiP-Ripper 2.9.28: changes NEW: Added Support for "PixHub.eu" linksSmartNet: V1.0.0.0: DY SmartNet ?????? V1.0callisto: callisto 2.0.21: Added an option to disable local host detection.Javascript .NET: Javascript .NET v0.6: Upgraded to the latest stable branch of v8 (/tags/3.9.18), and switched to using their scons build system. We no longer include v8 source code as part of this project's source code. Simultaneous multithreaded use of v8 now supported (v8 Isolates), although different contexts may not share objects or call each other. 64-bit .Net 4.0 DLL now included. (Download now includes x86 and x64 for both .Net 3.5 and .Net 4.0.)MyRouter (Virtual WiFi Router): MyRouter 1.0.6: This release should be more stable there were a few bug fixes including the x64 issue as well as an error popping up when MyRouter started this was caused by a NULL valueGoogle Books Downloader for Windows: Google Books Downloader-2.0.0.0.: Google Books DownloaderFinestra Virtual Desktops: 2.5.4501: This is a very minor update release. Please see the information about the 2.5 and 2.5.4500 releases for more information on recent changes. This update did not even have an automatic update triggered for it. Adds error checking and reporting to all threads, not only those with message loopsAcDown????? - Anime&Comic Downloader: AcDown????? v3.9.2: ?? ●AcDown??????????、??、??????，????1M，????，????，?????????????????????????。???????????Acfun、????(Bilibili)、??、??、YouTube、??、???、??????、SF????、????????????。??????AcPlay?????，??????、????????????????。 ● AcDown???????????????????????????，???，???????????????????。 ● AcDown???????C#??，????.NET Framework 2.0??。?????"Acfun?????"。 ????32??64? Windows XP/Vista/7/8 ????????????? ??：????????Windows XP???,?????????.NET Framework 2.0???(x86)，?????"?????????"??? ??????????????，??????????： ??"AcDo...ArcGIS Editor for OpenStreetMap: ArcGIS Editor for OSM 2.0 Release Candidate: Your feedback is welcome - and this is your last chance to get your fixes in for this version! Includes installer for both Feature Server extension and Desktop extension, enhanced functionality for the Desktop tools, and enhanced built-in Javascript Editor for the Feature Server component. This release candidate includes fixes to beta 4 that accommodate domain users for setting up the Server Component, and fixes for reporting/uploading references tracked in the revision table. See Code In-P...C.B.R. : Comic Book Reader: CBR 0.6: 20 Issue trackers are closed and a lot of bugs too Localize view is now MVVM and delete is working. Added the unused flag (take care that it goes to true only when displaying screen elements) Backstage - new input/output format choice control for the conversion Backstage - Add display, behaviour and register file type options in the extended options dialog Explorer list view has been transformed to a custom control. New group header, colunms order and size are saved Single insta...New Projects{3S} SQL Smart Security "Protect your T-SQL know-how!": {3S} SQL Smart Security is an add-in which can be installed in Microsoft SQL Server Management Studio (SSMS). It enables software companies to create a secured content for database objects. The add-in brings much higher level of protection in comparison with SQL Server built in WITH ENCRYPTION feature.BETA - Content Slider for SharePoint 2010 / Office 365: SharePoint Banner / SharePoint 2010 Sliding Banner / Content Slider tools in Office 365/ Sliding Content in SharePoint is a general tool which could be used for sliding Banners or any other sliding content to be placed on any Office 365 / SharePoint 2010 / SharePoint Foundation.BF3 Development Server: The main issue of this project is to deliver a test server to all developers working on RCon (Remote-Administration-Console) Tools for Battlefield 3. Actually the only possibility to test the work made is to hire a real Game Server. BizTalk Server 2010 TCP/IP Adapter: This project is migration of existing BizTalk server 2009 TCPIP adapter to BizTalk server 2010. I have made few configuration changes which are making this adapter and installation compatible to BizTalk Server 2010. I have not modified adapter source code.BryhtCommon: It`s for easy to develope WP7 ,it contains some useful method Bug.Net Defect Tracking Components: Bug.Net server-side controls and components to add defect (bug) tracking to your current ASP.NET website.Customize Survey With Client Object Model: Customize OOB Survey/Vote/Poll in Share?Point With Client Object Model Visit http://swatipoint.blogspot.in/2011/12/sharepoint-client-object-modellist.html for more detailsDropboxToDo: A simple todo, synchronizing by Dropbox or, in future, by SkydriveFoggy: Foggy is a WPF dashboard for FogBugz information.Fort Myers High School Website: A website for Fort Myers High School in Fort Myers, Florida. This website will allow for both students and parents to better interact with the school. Developed in ASP.net (C#).Gonte.DataAccess: Data access for NET. It's developed in C#.Gonte.ObjectModel: Metadata about objects. It's developed in C#Gonte.SqlGenerator: Sql Generator It's developed in C#.kLib: This project space is for datastructures and classes, which should always be available. Any developer should use these in their projects. Liuyi.Phone.CharmScreen: Liuyi.Phone.CharmScreen Liuyi windows phone appLoU: Lord of Ultima helper suite.Managing Supplies: This WP7 project is able to manage your own suppliesNAV Fixed Assets 2012: Changes related to 'Dossier Fiscal' in Microsoft Dynamics NAV 2009 - New Model 30 - Changes to model 31 and model 32NCAA Tourney DotNetNuke Module: The NCAA Tourney is a DotNetNuke 3.X - 6.X module that allows you to add a NCAA tournament to your portal. You can allow users to record their picks for the tournament and then manage the outcome of the tournament calculating the winner of your tourney based on customizable point system. The module has been designed to be very user friendly and efficient for the end user as well as the administrator of the tournament.nothing here anymore: nothing here anymoreOrchard Dream Store Project: A simple website using Orchard CMS. For a school projectPowerShell Management Library for TEM: A project to provide a PowerShell functionality for managing your Tivoli Endpoint Manager (built upon BigFix technology). You can locally or remotely manage endpoints and relays via these simple and easy to use PowerShell Module.PrismWebBuilder: Web Builder ProjectProjeto Northwind: Northwind - FPUQLCF: QLCFShipwire API: Shipwire makes it easier for consumers of Shipwire's international shipment fulfilment service to integrate their XML API quickly and easily. Current features are: Inventory Service Rate Service (Shipping Costs) Future features are: Order Entry Service Order Tracking ServiceSmith XNA tools: Smith's XNA tools is a set of useful that i make to improve some basics features to XNA and make the Game Design More Easy.ST Recover: ST Recover can read Atari ST floppy disks on a PC under Windows, including special formats as 800 or 900 KB and damaged or desynchronized disks, and produces standard .ST disk image files. Then the image files can be read in ST emulators as WinSTon or Steem.SyncSMS: Windows desktop client for the Android App SyncSMS. This code is not affiliated with SyncSMS in any way,tempzz: tempzzTruxtor: Truxtor modular concept for electronic gadgetsTT SA TEST1: TT SA Test 1VisualQuantCode: Neuroquants is a library in c# for quants It's developed in c#.Weeps: Generate Bass and drum line in type of midi to be guitar's backing track that was playing by userxxtest: xxtest???-????? "???????????": ?????-????????? ???????????? Digital Design 2012. ??? ????. ?????? ?: ?????????????????? ??????? ?????. ??????? ?????????? ????????? ????????. ?????????? ?????????????? ??????? ??????? ? ????? ???????????? Digital Design 2012.????? ???????????? Digital Design 2012. ??? ????. ?????? ?: ?????????????????? ??????? ?????. ??????? ?????????? ????????? ????????. ?????????? ?????????????? ??????? ??????? ? ????? ???????????? Digital Design 2012.

Read the article
Project Management Helps AmeriCares Deliver International Aid

- by Sylvie MacKenzie, PMP

Excerpt from PROFIT - ORACLE - by Alison Weiss Handle with Care Sound project management helps AmeriCares bring international aid to those in need. The stakes are always high for AmeriCares. On a mission to restore health and save lives during times of disaster, the nonprofit international relief and humanitarian aid organization delivers donated medicines, medical supplies, and humanitarian aid to people in the U.S. and around the globe. Founded in 1982 with the express mission of responding as quickly and efficiently as possible to help people in need, the Stamford, Connecticut-based AmeriCares has delivered more than US$10.5 billion in aid to 147 countries over the past three decades. Launch the Slideshow “It’s critically important to us that we steward all the donations and that the medical supplies and medicines get to people as quickly as possible with no loss,” says Kate Sears, senior vice president for finance and technology at AmeriCares. “Whether we’re shipping IV solutions to victims of cholera in Haiti or antibiotics to Somali famine victims, we need to get the medicines there sooner because it means more people will be helped and lives improved or even saved.” Ten years ago, the tracking systems used by AmeriCares associates were paper-based. In recent years, staff started using spreadsheets, but the tracking processes were not standardized between teams. “Every team was tracking completely different information,” says Megan McDermott, senior associate, Sub-Saharan Africa partnerships, at AmeriCares. “It was just a few key things. For example, we tracked the date a shipment was supposed to arrive and the date we got reports from our partner that a hospital received aid on their end.” While the data was accurate, much detail was being lost in the process. AmeriCares management knew it could do a better job of tracking this enterprise data and in 2011 took a significant step by implementing Oracle’s Primavera P6 Professional Project Management. “It’s a comprehensive solution that has helped us improve the monitoring and controlling processes. It has allowed us to do our distribution better,” says Sears. In addition, the implementation effort has been a change agent, helping AmeriCares leadership rethink project management across the entire organization. Initially, much of the focus was on standardizing processes, but staff members also learned the importance of thinking proactively to prevent possible problems and evaluating results to determine if goals and objectives are truly being met. Such data about process efficiency and overall results is critical not only to AmeriCares staff but also to the donors supporting the organization’s life-saving missions. Efficiency Saves Lives One of AmeriCares’ core operations is to gather product donations from the private sector, establish where the most-urgent needs are, and solicit monetary support to send the aid via ocean cargo or airlift to welfare- and health-oriented nongovernmental organizations, hospitals, health networks, and government ministries based in areas in need. In 2011 alone, AmeriCares sent more than 3,500 shipments to 95 countries in response to both ongoing humanitarian needs and more than two dozen emergencies, including deadly tornadoes and storms in the U.S. and the devastating tsunami in Japan. When it comes to nonprofits in general, donors want to know that the charitable organizations they support are using funds wisely. Typically, nonprofits are evaluated by donors in terms of efficiency, an area where AmeriCares has an excellent reputation: 98 percent of expenses go directly to supporting programs and less than 2 percent represent administrative and fundraising costs. Donors, however, should look at more than simple efficiency, says Peter York, senior partner and chief research and learning officer at TCC Group, a nonprofit consultancy headquartered in New York, New York. They should also look at whether organizations have the systems in place to sustain their missions and continue to thrive. An expert on nonprofit organizational management, York has spent years studying sustainable charitable organizations. He defines them as nonprofits that are able to achieve the ongoing financial support to stay relevant and continue doing core mission work. In his analysis of well over 2,500 larger nonprofits, York has found that many are not sustaining, and are actually scaling back in size. “One of the biggest challenges of nonprofit sustainability is the general public’s perception that every dollar donated has to go only to the delivery of service,” says York. “What our data shows is that there are some fundamental capacities that have to be there in order for organizations to sustain and grow.” York’s research highlights the importance of data-driven leadership at successful nonprofits. “You’ve got to have the tools, the systems, and the technologies to get objective information on what you do, the people you serve, and the results you’re achieving,” says York. “If leaders don’t have the knowledge and the data, they can’t make the strategic decisions about programs to take organizations to the next level.” Historically, AmeriCares associates have used time-tested and cost-effective strategies to ship and then track supplies from donation to delivery to their destinations in designated time frames. When disaster strikes, AmeriCares ships by air and generally pulls out all the stops to deliver the most urgently needed aid within the first few days and weeks. Then, as situations stabilize, AmeriCares turns to delivering sea containers for the postemergency and ongoing aid so often needed over the long term. According to McDermott, getting a shipment out the door is fairly complicated, requiring as many as five different AmeriCares teams collaborating together. The entire process can take months—from when products are received in the warehouse and deciding which recipients to allocate supplies to, to getting customs and governmental approvals in place, actually shipping products, and finally ensuring that the products are received in-country. Delivering that aid is no small affair. “Our volume exceeds half a billion dollars a year worth of donated medicines and medical supplies, so it’s a sizable logistical operation to bring these products in and get them out to the right place quickly to have the most impact,” says Sears. “We really pride ourselves on our controls and efficiencies.” Adding to that complexity is the fact that the longer it takes to deliver aid, the more dire the human need can be. Any time AmeriCares associates can shave off the complicated aid delivery process can translate into lives saved. “It’s really being able to track information consistently that will help us to see where are the bottlenecks and where can we work on improving our processes,” says McDermott. Setting a Standard Productivity and information management improvements were key objectives for AmeriCares when staff began the process of implementing Oracle’s Primavera solution. But before configuring the software, the staff needed to take the time to analyze the systems already in place. According to Greg Loop, manager of database systems at AmeriCares, the organization received guidance from several consultants, including Rich D’Addario, consulting project manager in the Primavera Global Business Unit at Oracle, who was instrumental in shepherding the critical requirements-gathering phase. D’Addario encouraged staff to begin documenting shipping processes by considering the order in which activities occur and which ones are dependent on others to get accomplished. This exercise helped everyone realize that to be more efficient, they needed to keep track of shipments in a more standard way. “The staff didn’t recognize formal project management methodology,” says D’Addario. “But they did understand what the most important things are and that if they go wrong, an entire project can go off course.” Before, if a boatload of supplies was being sent to Haiti and there was a problem somewhere, a lot of time was taken up finding out where the problem was—because staff was not tracking things in a standard way. As a result, even more time was needed to find possible solutions to the problem and alert recipients that the aid might be delayed. “For everyone to put on the project manager hat and standardize the way every single thing is done means that now the whole organization is on the same page as to what needs to occur from the time a hurricane hits Haiti and when a boat pulls in to unload supplies,” says D’Addario. With so much care taken to put a process foundation firmly in place, configuring the Primavera solution was actually quite simple. Specific templates were set up for different types of shipments, and dashboards were implemented to provide executives with clear overviews of every project in the system. AmeriCares’ Loop reports that system planning, refining, and testing, followed by writing up documentation and training, took approximately four months. The system went live in spring 2011 at AmeriCares’ Connecticut headquarters. While the nonprofit has an international presence, with warehouses in Europe and offices in Haiti, India, Japan, and Sri Lanka, most donated medicines come from U.S. entities and are shipped from the U.S. out to the rest of the world. In addition, all shipments are tracked from the U.S. office. AmeriCares doesn’t expect the Primavera system to take months off the shipping time, especially for sea containers. However, any time saved is still important because it will allow aid to be delivered to people more quickly at a lower overall cost. “If we can trim a day or two here or there, that can translate into lives that we’re saving, especially in emergency situations,” says Sears. A Cultural Change Beyond the measurable benefits that come with IT-driven process improvement, AmeriCares management is seeing a change in culture as a result of the Primavera project. One change has been treating every shipment of aid as a project, and everyone involved with facilitating shipments as a project manager. “This is a revolutionary concept for us,” says McDermott. “Before, we were used to thinking we were doing logistics—getting a container from point A to point B without looking at it as one project and really understanding what it meant to manage it.” AmeriCares staff is also happy to report that collaboration within the organization is much more efficient. When someone creates a shipment in the Primavera system, the same shared template is used, which means anyone can log in to the system to see the status of a shipment. Knowledgeable staff can access a shipment project to help troubleshoot a problem. Management can easily check the status of projects across the organization. “Dashboards are really useful,” says McDermott. “Instead of going into the details of each project, you can just see the high-level real-time information at a glance.” The new system is helping team members focus on proactively managing shipments rather than simply reacting when problems occur. For example, when a container is shipped, documents must be included for customs clearance. Now, the shipping template has built-in reminders to prompt team members to ask for copies of these documents from freight forwarders and to follow up with partners to discover if a shipment is on time. In the past, staff may not have worked on securing these documents until they’d been notified a shipment had arrived in-country. Another benefit of capturing and adopting best practices within the Primavera system is that staff training is easier. “Capturing the processes in documented steps and milestones allows us to teach new staff members how to do their jobs faster,” says Sears. “It provides them with the knowledge of their predecessors so they don’t have to keep reinventing the wheel.” With the Primavera system already generating positive results, management is eager to take advantage of advanced capabilities. Loop is working on integrating the company’s proprietary inventory management system with the Primavera system so that when logistics or warehousing operators input data, the information will automatically go into the Primavera system. In the past, this information had to be manually keyed into spreadsheets, often leading to errors. Mining Historical Data Another feature on the horizon for AmeriCares is utilizing Primavera P6 Professional Project Management reporting capabilities. As the system begins to include more historical data, management soon will be able to draw on this information to conduct analysis that has not been possible before and create customized reports. For example, at the beginning of the shipment process, staff will be able to use historical data to more accurately estimate how long the approval process should take for a particular country. This could help ensure that food and medicine with limited shelf lives do not get stuck in customs or used beyond their expiration dates. The historical data in the Primavera system will also help AmeriCares with better planning year to year. The nonprofit’s staff has always put together a plan at the beginning of the year, but this has been very challenging simply because it is impossible to predict disasters. Now, management will be able to look at historical data and see trends and statistics as they set current objectives and prepare for future need. In addition, this historical data will provide AmeriCares management with the ability to review year-end data and compare actual project results with goals set at the beginning of the year—to see if desired outcomes were achieved and if there are areas that need improvement. It’s this type of information that is so valuable to donors. And, according to York, project management software can play a critical role in generating the data to help nonprofits sustain and grow. “It is important to invest in systems to help replicate, expand, and deliver services,” says York. “Project management software can help because it encourages nonprofits to examine program or service changes and how to manage moving forward.” Sears believes that AmeriCares donors will support the return on investment the organization will achieve with the Primavera solution. “It won’t be financial returns, but rather how many more people we can help for a given dollar or how much more quickly we can respond to a need,” says Sears. “I think donors are receptive to such arguments.” And for AmeriCares, it is all about the future and increasing results. The project management environment currently may be quite simple, but IT staff plans to expand the complexity and functionality as the organization grows in its knowledge of project management and the goals it wants to achieve. “As we use the system over time, we’ll continue to refine our best practices and accumulate more data,” says Sears. “It will advance our ability to make better data-driven decisions.”

Read the article
Searching for Windows User SID's in C#

- by Ubiquitous Che

Context Context first - issues I'm trying to resolve are below. One of our clients has asked as to quote how long it would take for us to improve one of our applications. This application currently provides basic user authentication in the form of username/password combinations. This client would like the ability for their employees to log-in using the details of whatever Windows User account is currently logged in at the time of running the application. It's not a deal-breaker if I tell them know - but the client might be willing to pay the costs of development to add this feature to the application. It's worth looking into. Based on my hunting around, it seems like storing the user login details against Domain\Username will be problematic if those details are changed. But Windows User SID's aren't supposed to change at all. I've got the impression that it would be best to record Windows Users by SID - feel free to relieve me of that if I'm wrong. I've been having a fiddle with some Windows API calls. From within C#, grabbing the current user's SID is easy enough. I can already take any user's SID and process it using LookupAccountSid to get username and domain for display purposes. For the interested, my code for this is at the end of this post. That's just the tip of the iceberg, however. The two issues below are completely outside my experience. Not only do I not know how to implement them - I don't even known how to find out how to implement them, or what the pitfalls are on various systems. Any help getting myself aimed in the right direction would be very much appreciated. Issue 1) Getting hold of the local user at runtime is meaningless if that user hasn't been granted access to the application. We will need to add a new section to our application's 'administrator console' for adding Windows Users (or groups) and assigning within-app permissions against those users. Something like an 'Add Windows User Login' button that will raise a pop-up window that will allow the user to search for available Windows User accounts on the network (not just the local machine) to be added to the list of available application logins. If there's already a component in .NET or Windows that I can shanghai into doing this for me, it would make me a very happy man. Issue 2) I also want to know how to take a given Windows User SID and check it against a given Windows User Group (probably taken from a database). I'm not sure how to get started with this one either, though I expect it to be easier than the issue above. For the Interested [STAThread] static void Main(string[] args) { MessageBox.Show(WindowsUserManager.GetAccountNameFromSID(WindowsIdentity.GetCurrent().User.Value)); MessageBox.Show(WindowsUserManager.GetAccountNameFromSID("S-1-5-21-57989841-842925246-1957994488-1003")); } public static class WindowsUserManager { public static string GetAccountNameFromSID(string SID) { try { StringBuilder name = new StringBuilder(); uint cchName = (uint)name.Capacity; StringBuilder referencedDomainName = new StringBuilder(); uint cchReferencedDomainName = (uint)referencedDomainName.Capacity; WindowsUserManager.SID_NAME_USE sidUse; int err = (int)ESystemError.ERROR_SUCCESS; if (!WindowsUserManager.LookupAccountSid(null, SID, name, ref cchName, referencedDomainName, ref cchReferencedDomainName, out sidUse)) { err = Marshal.GetLastWin32Error(); if (err == (int)ESystemError.ERROR_INSUFFICIENT_BUFFER) { name.EnsureCapacity((int)cchName); referencedDomainName.EnsureCapacity((int)cchReferencedDomainName); err = WindowsUserManager.LookupAccountSid(null, SID, name, ref cchName, referencedDomainName, ref cchReferencedDomainName, out sidUse) ? (int)ESystemError.ERROR_SUCCESS : Marshal.GetLastWin32Error(); } } if (err != (int)ESystemError.ERROR_SUCCESS) throw new ApplicationException(String.Format("Could not retrieve acount name from SID. {0}", SystemExceptionManager.GetDescription(err))); return String.Format(@"{0}\{1}", referencedDomainName.ToString(), name.ToString()); } catch (Exception ex) { if (ex is ApplicationException) throw ex; throw new ApplicationException("Could not retrieve acount name from SID", ex); } } private enum SID_NAME_USE { SidTypeUser = 1, SidTypeGroup, SidTypeDomain, SidTypeAlias, SidTypeWellKnownGroup, SidTypeDeletedAccount, SidTypeInvalid, SidTypeUnknown, SidTypeComputer } [DllImport("advapi32.dll", EntryPoint = "GetLengthSid", CharSet = CharSet.Auto)] private static extern int GetLengthSid(IntPtr pSID); [DllImport("advapi32.dll", SetLastError = true)] private static extern bool ConvertStringSidToSid( string StringSid, out IntPtr ptrSid); [DllImport("advapi32.dll", CharSet = CharSet.Auto, SetLastError = true)] private static extern bool LookupAccountSid( string lpSystemName, [MarshalAs(UnmanagedType.LPArray)] byte[] Sid, StringBuilder lpName, ref uint cchName, StringBuilder ReferencedDomainName, ref uint cchReferencedDomainName, out SID_NAME_USE peUse); private static bool LookupAccountSid( string lpSystemName, string stringSid, StringBuilder lpName, ref uint cchName, StringBuilder ReferencedDomainName, ref uint cchReferencedDomainName, out SID_NAME_USE peUse) { byte[] SID = null; IntPtr SID_ptr = IntPtr.Zero; try { WindowsUserManager.ConvertStringSidToSid(stringSid, out SID_ptr); int err = SID_ptr == IntPtr.Zero ? Marshal.GetLastWin32Error() : (int)ESystemError.ERROR_SUCCESS; if (SID_ptr == IntPtr.Zero || err != (int)ESystemError.ERROR_SUCCESS) throw new ApplicationException(String.Format("'{0}' could not be converted to a SID byte array. {1}", stringSid, SystemExceptionManager.GetDescription(err))); int size = (int)GetLengthSid(SID_ptr); SID = new byte[size]; Marshal.Copy(SID_ptr, SID, 0, size); } catch (Exception ex) { if (ex is ApplicationException) throw ex; throw new ApplicationException(String.Format("'{0}' could not be converted to a SID byte array. {1}.", stringSid, ex.Message), ex); } finally { // Always want to release the SID_ptr (if it exists) to avoid memory leaks. if (SID_ptr != IntPtr.Zero) Marshal.FreeHGlobal(SID_ptr); } return WindowsUserManager.LookupAccountSid(lpSystemName, SID, lpName, ref cchName, ReferencedDomainName, ref cchReferencedDomainName, out peUse); } }

Read the article
Windows Azure: Backup Services Release, Hyper-V Recovery Manager, VM Enhancements, Enhanced Enterprise Management Support

- by ScottGu

This morning we released a huge set of updates to Windows Azure. These new capabilities include: Backup Services: General Availability of Windows Azure Backup Services Hyper-V Recovery Manager: Public preview of Windows Azure Hyper-V Recovery Manager Virtual Machines: Delete Attached Disks, Availability Set Warnings, SQL AlwaysOn Configuration Active Directory: Securely manage hundreds of SaaS applications Enterprise Management: Use Active Directory to Better Manage Windows Azure Windows Azure SDK 2.2: A massive update of our SDK + Visual Studio tooling support All of these improvements are now available to use immediately. Below are more details about them. Backup Service: General Availability Release of Windows Azure Backup Today we are releasing Windows Azure Backup Service as a general availability service. This release is now live in production, backed by an enterprise SLA, supported by Microsoft Support, and is ready to use for production scenarios. Windows Azure Backup is a cloud based backup solution for Windows Server which allows files and folders to be backed up and recovered from the cloud, and provides off-site protection against data loss. The service provides IT administrators and developers with the option to back up and protect critical data in an easily recoverable way from any location with no upfront hardware cost. Windows Azure Backup is built on the Windows Azure platform and uses Windows Azure blob storage for storing customer data. Windows Server uses the downloadable Windows Azure Backup Agent to transfer file and folder data securely and efficiently to the Windows Azure Backup Service. Along with providing cloud backup for Windows Server, Windows Azure Backup Service also provides capability to backup data from System Center Data Protection Manager and Windows Server Essentials, to the cloud. All data is encrypted onsite before it is sent to the cloud, and customers retain and manage the encryption key (meaning the data is stored entirely secured and can’t be decrypted by anyone but yourself). Getting Started To get started with the Windows Azure Backup Service, create a new Backup Vault within the Windows Azure Management Portal. Click New->Data Services->Recovery Services->Backup Vault to do this: Once the backup vault is created you’ll be presented with a simple tutorial that will help guide you on how to register your Windows Servers with it: Once the servers you want to backup are registered, you can use the appropriate local management interface (such as the Microsoft Management Console snap-in, System Center Data Protection Manager Console, or Windows Server Essentials Dashboard) to configure the scheduled backups and to optionally initiate recoveries. You can follow these tutorials to learn more about how to do this: Tutorial: Schedule Backups Using the Windows Azure Backup Agent This tutorial helps you with setting up a backup schedule for your registered Windows Servers. Additionally, it also explains how to use Windows PowerShell cmdlets to set up a custom backup schedule. Tutorial: Recover Files and Folders Using the Windows Azure Backup Agent This tutorial helps you with recovering data from a backup. Additionally, it also explains how to use Windows PowerShell cmdlets to do the same tasks. Below are some of the key benefits the Windows Azure Backup Service provides: Simple configuration and management. Windows Azure Backup Service integrates with the familiar Windows Server Backup utility in Windows Server, the Data Protection Manager component in System Center and Windows Server Essentials, in order to provide a seamless backup and recovery experience to a local disk, or to the cloud. Block level incremental backups. The Windows Azure Backup Agent performs incremental backups by tracking file and block level changes and only transferring the changed blocks, hence reducing the storage and bandwidth utilization. Different point-in-time versions of the backups use storage efficiently by only storing the changes blocks between these versions. Data compression, encryption and throttling. The Windows Azure Backup Agent ensures that data is compressed and encrypted on the server before being sent to the Windows Azure Backup Service over the network. As a result, the Windows Azure Backup Service only stores encrypted data in the cloud storage. The encryption key is not available to the Windows Azure Backup Service, and as a result the data is never decrypted in the service. Also, users can setup throttling and configure how the Windows Azure Backup service utilizes the network bandwidth when backing up or restoring information. Data integrity is verified in the cloud. In addition to the secure backups, the backed up data is also automatically checked for integrity once the backup is done. As a result, any corruptions which may arise due to data transfer can be easily identified and are fixed automatically. Configurable retention policies for storing data in the cloud. The Windows Azure Backup Service accepts and implements retention policies to recycle backups that exceed the desired retention range, thereby meeting business policies and managing backup costs. Hyper-V Recovery Manager: Now Available in Public Preview I’m excited to also announce the public preview of a new Windows Azure Service – the Windows Azure Hyper-V Recovery Manager (HRM). Windows Azure Hyper-V Recovery Manager helps protect your business critical services by coordinating the replication and recovery of System Center Virtual Machine Manager 2012 SP1 and System Center Virtual Machine Manager 2012 R2 private clouds at a secondary location. With automated protection, asynchronous ongoing replication, and orderly recovery, the Hyper-V Recovery Manager service can help you implement Disaster Recovery and restore important services accurately, consistently, and with minimal downtime. Application data in an Hyper-V Recovery Manager scenarios always travels on your on-premise replication channel. Only metadata (such as names of logical clouds, virtual machines, networks etc.) that is needed for orchestration is sent to Azure. All traffic sent to/from Azure is encrypted. You can begin using Windows Azure Hyper-V Recovery today by clicking New->Data Services->Recovery Services->Hyper-V Recovery Manager within the Windows Azure Management Portal. You can read more about Windows Azure Hyper-V Recovery Manager in Brad Anderson’s 9-part series, Transform the datacenter. To learn more about setting up Hyper-V Recovery Manager follow our detailed step-by-step guide. Virtual Machines: Delete Attached Disks, Availability Set Warnings, SQL AlwaysOn Today’s Windows Azure release includes a number of nice updates to Windows Azure Virtual Machines. These improvements include: Ability to Delete both VM Instances + Attached Disks in One Operation Prior to today’s release, when you deleted VMs within Windows Azure we would delete the VM instance – but not delete the drives attached to the VM. You had to manually delete these yourself from the storage account. With today’s update we’ve added a convenience option that now allows you to either retain or delete the attached disks when you delete the VM: We’ve also added the ability to delete a cloud service, its deployments, and its role instances with a single action. This can either be a cloud service that has production and staging deployments with web and worker roles, or a cloud service that contains virtual machines. To do this, simply select the Cloud Service within the Windows Azure Management Portal and click the “Delete” button: Warnings on Availability Sets with Only One Virtual Machine In Them One of the nice features that Windows Azure Virtual Machines supports is the concept of “Availability Sets”. An “availability set” allows you to define a tier/role (e.g. webfrontends, databaseservers, etc) that you can map Virtual Machines into – and when you do this Windows Azure separates them across fault domains and ensures that at least one of them is always available during servicing operations. This enables you to deploy applications in a high availability way. One issue we’ve seen some customers run into is where they define an availability set, but then forget to map more than one VM into it (which defeats the purpose of having an availability set). With today’s release we now display a warning in the Windows Azure Management Portal if you have only one virtual machine deployed in an availability set to help highlight this: You can learn more about configuring the availability of your virtual machines here. Configuring SQL Server Always On SQL Server Always On is a great feature that you can use with Windows Azure to enable high availability and DR scenarios with SQL Server. Today’s Windows Azure release makes it even easier to configure SQL Server Always On by enabling “Direct Server Return” endpoints to be configured and managed within the Windows Azure Management Portal. Previously, setting this up required using PowerShell to complete the endpoint configuration. Starting today you can enable this simply by checking the “Direct Server Return” checkbox: You can learn more about how to use direct server return for SQL Server AlwaysOn availability groups here. Active Directory: Application Access Enhancements This summer we released our initial preview of our Application Access Enhancements for Windows Azure Active Directory. This service enables you to securely implement single-sign-on (SSO) support against SaaS applications (including Office 365, SalesForce, Workday, Box, Google Apps, GitHub, etc) as well as LOB based applications (including ones built with the new Windows Azure AD support we shipped last week with ASP.NET and VS 2013). Since the initial preview we’ve enhanced our SAML federation capabilities, integrated our new password vaulting system, and shipped multi-factor authentication support. We've also turned on our outbound identity provisioning system and have it working with hundreds of additional SaaS Applications: Earlier this month we published an update on dates and pricing for when the service will be released in general availability form. In this blog post we announced our intention to release the service in general availability form by the end of the year. We also announced that the below features would be available in a free tier with it: SSO to every SaaS app we integrate with – Users can Single Sign On to any app we are integrated with at no charge. This includes all the top SAAS Apps and every app in our application gallery whether they use federation or password vaulting. Application access assignment and removal – IT Admins can assign access privileges to web applications to the users in their active directory assuring that every employee has access to the SAAS Apps they need. And when a user leaves the company or changes jobs, the admin can just as easily remove their access privileges assuring data security and minimizing IP loss User provisioning (and de-provisioning) – IT admins will be able to automatically provision users in 3rd party SaaS applications like Box, Salesforce.com, GoToMeeting, DropBox and others. We are working with key partners in the ecosystem to establish these connections, meaning you no longer have to continually update user records in multiple systems. Security and auditing reports – Security is a key priority for us. With the free version of these enhancements you'll get access to our standard set of access reports giving you visibility into which users are using which applications, when they were using them and where they are using them from. In addition, we'll alert you to un-usual usage patterns for instance when a user logs in from multiple locations at the same time. Our Application Access Panel – Users are logging in from every type of devices including Windows, iOS, & Android. Not all of these devices handle authentication in the same manner but the user doesn't care. They need to access their apps from the devices they love. Our Application Access Panel will support the ability for users to access access and launch their apps from any device and anywhere. You can learn more about our plans for application management with Windows Azure Active Directory here. Try out the preview and start using it today. Enterprise Management: Use Active Directory to Better Manage Windows Azure Windows Azure Active Directory provides the ability to manage your organization in a directory which is hosted entirely in the cloud, or alternatively kept in sync with an on-premises Windows Server Active Directory solution (allowing you to seamlessly integrate with the directory you already have). With today’s Windows Azure release we are integrating Windows Azure Active Directory even more within the core Windows Azure management experience, and enabling an even richer enterprise security offering. Specifically: 1) All Windows Azure accounts now have a default Windows Azure Active Directory created for them. You can create and map any users you want into this directory, and grant administrative rights to manage resources in Windows Azure to these users. 2) You can keep this directory entirely hosted in the cloud – or optionally sync it with your on-premises Windows Server Active Directory. Both options are free. The later approach is ideal for companies that wish to use their corporate user identities to sign-in and manage Windows Azure resources. It also ensures that if an employee leaves an organization, his or her access control rights to the company’s Windows Azure resources are immediately revoked. 3) The Windows Azure Service Management APIs have been updated to support using Windows Azure Active Directory credentials to sign-in and perform management operations. Prior to today’s release customers had to download and use management certificates (which were not scoped to individual users) to perform management operations. We still support this management certificate approach (don’t worry – nothing will stop working). But we think the new Windows Azure Active Directory authentication support enables an even easier and more secure way for customers to manage resources going forward. 4) The Windows Azure SDK 2.2 release (which is also shipping today) includes built-in support for the new Service Management APIs that authenticate with Windows Azure Active Directory, and now allow you to create and manage Windows Azure applications and resources directly within Visual Studio using your Active Directory credentials. This, combined with updated PowerShell scripts that also support Active Directory, enables an end-to-end enterprise authentication story with Windows Azure. Below are some details on how all of this works: Subscriptions within a Directory As part of today’s update, we have associated all existing Window Azure accounts with a Windows Azure Active Directory (and created one for you if you don’t already have one). When you login to the Windows Azure Management Portal you’ll now see the directory name in the URI of the browser. For example, in the screen-shot below you can see that I have a “scottgu” directory that my subscriptions are hosted within: Note that you can continue to use Microsoft Accounts (formerly known as Microsoft Live IDs) to sign-into Windows Azure. These map just fine to a Windows Azure Active Directory – so there is no need to create new usernames that are specific to a directory if you don’t want to. In the scenario above I’m actually logged in using my @hotmail.com based Microsoft ID which is now mapped to a “scottgu” active directory that was created for me. By default everything will continue to work just like you used to before. Manage your Directory You can manage an Active Directory (including the one we now create for you by default) by clicking the “Active Directory” tab in the left-hand side of the portal. This will list all of the directories in your account. Clicking one the first time will display a getting started page that provides documentation and links to perform common tasks with it: You can use the built-in directory management support within the Windows Azure Management Portal to add/remove/manage users within the directory, enable multi-factor authentication, associate a custom domain (e.g. mycompanyname.com) with the directory, and/or rename the directory to whatever friendly name you want (just click the configure tab to do this). You can also setup the directory to automatically sync with an on-premises Active Directory using the “Directory Integration” tab. Note that users within a directory by default do not have admin rights to login or manage Windows Azure based resources. You still need to explicitly grant them co-admin permissions on a subscription for them to login or manage resources in Windows Azure. You can do this by clicking the Settings tab on the left-hand side of the portal and then by clicking the administrators tab within it. Sign-In Integration within Visual Studio If you install the new Windows Azure SDK 2.2 release, you can now connect to Windows Azure from directly inside Visual Studio without having to download any management certificates. You can now just right-click on the “Windows Azure” icon within the Server Explorer and choose the “Connect to Windows Azure” context menu option to do so: Doing this will prompt you to enter the email address of the username you wish to sign-in with (make sure this account is a user in your directory with co-admin rights on a subscription): You can use either a Microsoft Account (e.g. Windows Live ID) or an Active Directory based Organizational account as the email. The dialog will update with an appropriate login prompt depending on which type of email address you enter: Once you sign-in you’ll see the Windows Azure resources that you have permissions to manage show up automatically within the Visual Studio server explorer and be available to start using: No downloading of management certificates required. All of the authentication was handled using your Windows Azure Active Directory! Manage Subscriptions across Multiple Directories If you have already have multiple directories and multiple subscriptions within your Windows Azure account, we have done our best to create a good default mapping of your subscriptions->directories as part of today’s update. If you don’t like the default subscription-to-directory mapping we have done you can click the Settings tab in the left-hand navigation of the Windows Azure Management Portal and browse to the Subscriptions tab within it: If you want to map a subscription under a different directory in your account, simply select the subscription from the list, and then click the “Edit Directory” button to choose which directory to map it to. Mapping a subscription to a different directory takes only seconds and will not cause any of the resources within the subscription to recycle or stop working. We’ve made the directory->subscription mapping process self-service so that you always have complete control and can map things however you want. Filtering By Directory and Subscription Within the Windows Azure Management Portal you can filter resources in the portal by subscription (allowing you to show/hide different subscriptions). If you have subscriptions mapped to multiple directory tenants, we also now have a filter drop-down that allows you to filter the subscription list by directory tenant. This filter is only available if you have multiple subscriptions mapped to multiple directories within your Windows Azure Account: Windows Azure SDK 2.2 Today we are also releasing a major update of our Windows Azure SDK. The Windows Azure SDK 2.2 release adds some great new features including: Visual Studio 2013 Support Integrated Windows Azure Sign-In support within Visual Studio Remote Debugging Cloud Services with Visual Studio Firewall Management support within Visual Studio for SQL Databases Visual Studio 2013 RTM VM Images for MSDN Subscribers Windows Azure Management Libraries for .NET Updated Windows Azure PowerShell Cmdlets and ScriptCenter I’ll post a follow-up blog shortly with more details about all of the above. Additional Updates In addition to the above enhancements, today’s release also includes a number of additional improvements: AutoScale: Richer time and date based scheduling support (set different rules on different dates) AutoScale: Ability to Scale to Zero Virtual Machines (very useful for Dev/Test scenarios) AutoScale: Support for time-based scheduling of Mobile Service AutoScale rules Operation Logs: Auditing support for Service Bus management operations Today we also shipped a major update to the Windows Azure SDK – Windows Azure SDK 2.2. It has so much goodness in it that I have a whole second blog post coming shortly on it! :-) Summary Today’s Windows Azure release enables a bunch of great new scenarios, and enables a much richer enterprise authentication offering. If you don’t already have a Windows Azure account, you can sign-up for a free trial and start using all of the above features today. Then visit the Windows Azure Developer Center to learn more about how to build apps with it. Hope this helps, Scott P.S. In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu

Read the article
Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article
Informed TDD – Kata “To Roman Numerals”

- by Ralf Westphal

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/05/28/informed-tdd-ndash-kata-ldquoto-roman-numeralsrdquo.aspxIn a comment on my article on what I call Informed TDD (ITDD) reader gustav asked how this approach would apply to the kata “To Roman Numerals”. And whether ITDD wasn´t a violation of TDD´s principle of leaving out “advanced topics like mocks”. I like to respond with this article to his questions. There´s more to say than fits into a commentary. Mocks and TDD I don´t see in how far TDD is avoiding or opposed to mocks. TDD and mocks are orthogonal. TDD is about pocess, mocks are about structure and costs. Maybe by moving forward in tiny red+green+refactor steps less need arises for mocks. But then… if the functionality you need to implement requires “expensive” resource access you can´t avoid using mocks. Because you don´t want to constantly run all your tests against the real resource. True, in ITDD mocks seem to be in almost inflationary use. That´s not what you usually see in TDD demonstrations. However, there´s a reason for that as I tried to explain. I don´t use mocks as proxies for “expensive” resource. Rather they are stand-ins for functionality not yet implemented. They allow me to get a test green on a high level of abstraction. That way I can move forward in a top-down fashion. But if you think of mocks as “advanced” or if you don´t want to use a tool like JustMock, then you don´t need to use mocks. You just need to stand the sight of red tests for a little longer ;-) Let me show you what I mean by that by doing a kata. ITDD for “To Roman Numerals” gustav asked for the kata “To Roman Numerals”. I won´t explain the requirements again. You can find descriptions and TDD demonstrations all over the internet, like this one from Corey Haines. Now here is, how I would do this kata differently. 1. Analyse A demonstration of TDD should never skip the analysis phase. It should be made explicit. The requirements should be formalized and acceptance test cases should be compiled. “Formalization” in this case to me means describing the API of the required functionality. “[D]esign a program to work with Roman numerals” like written in this “requirement document” is not enough to start software development. Coding should only begin, if the interface between the “system under development” and its context is clear. If this interface is not readily recognizable from the requirements, it has to be developed first. Exploration of interface alternatives might be in order. It might be necessary to show several interface mock-ups to the customer – even if that´s you fellow developer. Designing the interface is a task of it´s own. It should not be mixed with implementing the required functionality behind the interface. Unfortunately, though, this happens quite often in TDD demonstrations. TDD is used to explore the API and implement it at the same time. To me that´s a violation of the Single Responsibility Principle (SRP) which not only should hold for software functional units but also for tasks or activities. In the case of this kata the API fortunately is obvious. Just one function is needed: string ToRoman(int arabic). And it lives in a class ArabicRomanConversions. Now what about acceptance test cases? There are hardly any stated in the kata descriptions. Roman numerals are explained, but no specific test cases from the point of view of a customer. So I just “invent” some acceptance test cases by picking roman numerals from a wikipedia article. They are supposed to be just “typical examples” without special meaning. Given the acceptance test cases I then try to develop an understanding of the problem domain. I´ll spare you that. The domain is trivial and is explain in almost all kata descriptions. How roman numerals are built is not difficult to understand. What´s more difficult, though, might be to find an efficient solution to convert into them automatically. 2. Solve The usual TDD demonstration skips a solution finding phase. Like the interface exploration it´s mixed in with the implementation. But I don´t think this is how it should be done. I even think this is not how it really works for the people demonstrating TDD. They´re simplifying their true software development process because they want to show a streamlined TDD process. I doubt this is helping anybody. Before you code you better have a plan what to code. This does not mean you have to do “Big Design Up-Front”. It just means: Have a clear picture of the logical solution in your head before you start to build a physical solution (code). Evidently such a solution can only be as good as your understanding of the problem. If that´s limited your solution will be limited, too. Fortunately, in the case of this kata your understanding does not need to be limited. Thus the logical solution does not need to be limited or preliminary or tentative. That does not mean you need to know every line of code in advance. It just means you know the rough structure of your implementation beforehand. Because it should mirror the process described by the logical or conceptual solution. Here´s my solution approach: The arabic “encoding” of numbers represents them as an ordered set of powers of 10. Each digit is a factor to multiply a power of ten with. The “encoding” 123 is the short form for a set like this: {1*10^2, 2*10^1, 3*10^0}. And the number is the sum of the set members. The roman “encoding” is different. There is no base (like 10 for arabic numbers), there are just digits of different value, and they have to be written in descending order. The “encoding” XVI is short for [10, 5, 1]. And the number is still the sum of the members of this list. The roman “encoding” thus is simpler than the arabic. Each “digit” can be taken at face value. No multiplication with a base required. But what about IV which looks like a contradiction to the above rule? It is not – if you accept roman “digits” not to be limited to be single characters only. Usually I, V, X, L, C, D, M are viewed as “digits”, and IV, IX etc. are viewed as nuisances preventing a simple solution. All looks different, though, once IV, IX etc. are taken as “digits”. Then MCMLIV is just a sum: M+CM+L+IV which is 1000+900+50+4. Whereas before it would have been understood as M-C+M+L-I+V – which is more difficult because here some “digits” get subtracted. Here´s the list of roman “digits” with their values: {1, I}, {4, IV}, {5, V}, {9, IX}, {10, X}, {40, XL}, {50, L}, {90, XC}, {100, C}, {400, CD}, {500, D}, {900, CM}, {1000, M} Since I take IV, IX etc. as “digits” translating an arabic number becomes trivial. I just need to find the values of the roman “digits” making up the number, e.g. 1954 is made up of 1000, 900, 50, and 4. I call those “digits” factors. If I move from the highest factor (M=1000) to the lowest (I=1) then translation is a two phase process: Find all the factors Translate the factors found Compile the roman representation Translation is just a look-up. Finding, though, needs some calculation: Find the highest remaining factor fitting in the value Remember and subtract it from the value Repeat with remaining value and remaining factors Please note: This is just an algorithm. It´s not code, even though it might be close. Being so close to code in my solution approach is due to the triviality of the problem. In more realistic examples the conceptual solution would be on a higher level of abstraction. With this solution in hand I finally can do what TDD advocates: find and prioritize test cases. As I can see from the small process description above, there are two aspects to test: Test the translation Test the compilation Test finding the factors Testing the translation primarily means to check if the map of factors and digits is comprehensive. That´s simple, even though it might be tedious. Testing the compilation is trivial. Testing factor finding, though, is a tad more complicated. I can think of several steps: First check, if an arabic number equal to a factor is processed correctly (e.g. 1000=M). Then check if an arabic number consisting of two consecutive factors (e.g. 1900=[M,CM]) is processed correctly. Then check, if a number consisting of the same factor twice is processed correctly (e.g. 2000=[M,M]). Finally check, if an arabic number consisting of non-consecutive factors (e.g. 1400=[M,CD]) is processed correctly. I feel I can start an implementation now. If something becomes more complicated than expected I can slow down and repeat this process. 3. Implement First I write a test for the acceptance test cases. It´s red because there´s no implementation even of the API. That´s in conformance with “TDD lore”, I´d say: Next I implement the API: The acceptance test now is formally correct, but still red of course. This will not change even now that I zoom in. Because my goal is not to most quickly satisfy these tests, but to implement my solution in a stepwise manner. That I do by “faking” it: I just “assume” three functions to represent the transformation process of my solution: My hypothesis is that those three functions in conjunction produce correct results on the API-level. I just have to implement them correctly. That´s what I´m trying now – one by one. I start with a simple “detail function”: Translate(). And I start with all the test cases in the obvious equivalence partition: As you can see I dare to test a private method. Yes. That´s a white box test. But as you´ll see it won´t make my tests brittle. It serves a purpose right here and now: it lets me focus on getting one aspect of my solution right. Here´s the implementation to satisfy the test: It´s as simple as possible. Right how TDD wants me to do it: KISS. Now for the second equivalence partition: translating multiple factors. (It´a pattern: if you need to do something repeatedly separate the tests for doing it once and doing it multiple times.) In this partition I just need a single test case, I guess. Stepping up from a single translation to multiple translations is no rocket science: Usually I would have implemented the final code right away. Splitting it in two steps is just for “educational purposes” here. How small your implementation steps are is a matter of your programming competency. Some “see” the final code right away before their mental eye – others need to work their way towards it. Having two tests I find more important. Now for the next low hanging fruit: compilation. It´s even simpler than translation. A single test is enough, I guess. And normally I would not even have bothered to write that one, because the implementation is so simple. I don´t need to test .NET framework functionality. But again: if it serves the educational purpose… Finally the most complicated part of the solution: finding the factors. There are several equivalence partitions. But still I decide to write just a single test, since the structure of the test data is the same for all partitions: Again, I´m faking the implementation first: I focus on just the first test case. No looping yet. Faking lets me stay on a high level of abstraction. I can write down the implementation of the solution without bothering myself with details of how to actually accomplish the feat. That´s left for a drill down with a test of the fake function: There are two main equivalence partitions, I guess: either the first factor is appropriate or some next. The implementation seems easy. Both test cases are green. (Of course this only works on the premise that there´s always a matching factor. Which is the case since the smallest factor is 1.) And the first of the equivalence partitions on the higher level also is satisfied: Great, I can move on. Now for more than a single factor: Interestingly not just one test becomes green now, but all of them. Great! You might say, then I must have done not the simplest thing possible. And I would reply: I don´t care. I did the most obvious thing. But I also find this loop very simple. Even simpler than a recursion of which I had thought briefly during the problem solving phase. And by the way: Also the acceptance tests went green: Mission accomplished. At least functionality wise. Now I´ve to tidy up things a bit. TDD calls for refactoring. Not uch refactoring is needed, because I wrote the code in top-down fashion. I faked it until I made it. I endured red tests on higher levels while lower levels weren´t perfected yet. But this way I saved myself from refactoring tediousness. At the end, though, some refactoring is required. But maybe in a different way than you would expect. That´s why I rather call it “cleanup”. First I remove duplication. There are two places where factors are defined: in Translate() and in Find_factors(). So I factor the map out into a class constant. Which leads to a small conversion in Find_factors(): And now for the big cleanup: I remove all tests of private methods. They are scaffolding tests to me. They only have temporary value. They are brittle. Only acceptance tests need to remain. However, I carry over the single “digit” tests from Translate() to the acceptance test. I find them valuable to keep, since the other acceptance tests only exercise a subset of all roman “digits”. This then is my final test class: And this is the final production code: Test coverage as reported by NCrunch is 100%: Reflexion Is this the smallest possible code base for this kata? Sure not. You´ll find more concise solutions on the internet. But LOC are of relatively little concern – as long as I can understand the code quickly. So called “elegant” code, however, often is not easy to understand. The same goes for KISS code – especially if left unrefactored, as it is often the case. That´s why I progressed from requirements to final code the way I did. I first understood and solved the problem on a conceptual level. Then I implemented it top down according to my design. I also could have implemented it bottom-up, since I knew some bottom of the solution. That´s the leaves of the functional decomposition tree. Where things became fuzzy, since the design did not cover any more details as with Find_factors(), I repeated the process in the small, so to speak: fake some top level, endure red high level tests, while first solving a simpler problem. Using scaffolding tests (to be thrown away at the end) brought two advantages: Encapsulation of the implementation details was not compromised. Naturally private methods could stay private. I did not need to make them internal or public just to be able to test them. I was able to write focused tests for small aspects of the solution. No need to test everything through the solution root, the API. The bottom line thus for me is: Informed TDD produces cleaner code in a systematic way. It conforms to core principles of programming: Single Responsibility Principle and/or Separation of Concerns. Distinct roles in development – being a researcher, being an engineer, being a craftsman – are represented as different phases. First find what, what there is. Then devise a solution. Then code the solution, manifest the solution in code. Writing tests first is a good practice. But it should not be taken dogmatic. And above all it should not be overloaded with purposes. And finally: moving from top to bottom through a design produces refactored code right away. Clean code thus almost is inevitable – and not left to a refactoring step at the end which is skipped often for different reasons. PS: Yes, I have done this kata several times. But that has only an impact on the time needed for phases 1 and 2. I won´t skip them because of that. And there are no shortcuts during implementation because of that.

Read the article
webserver horrible slow, sometimes incredible fast

- by dhanke

i am running a small community ( 6000+ Members ) on a non-virtual 64-bit ubuntu 11.04 system. I am not a Linux-pro, not even advanced, i just tried to setup a webserver, which does nothing special actually. Delivering some dynamic PHP and RoR websites is its task. So it might be that my configuration files do look horrible bad. Also, i might use the wrong vocabulary, so in doubt, please ask. Having a current all-time record of 520 registered users (board-accounts, no system-users) online at same time, average server-load is about 2.0 - 5.0. Meantime (~250 users) average server load value is at about 0.4 - 0.8, sometimes, on some expensive searches a bit higher. everything fine. From time to time however, the load increases up to 120 (120.0, not 12.0 ;) ). In this time, its hard to even connect via SSH, but when i reach the server, and use top/htop/iotop to see whats happening, i cannot identify any process causing high CPU load. iotop tells me about a current reading/writing speed of about approx. 70kb/s, which is quite equal to power-off i think. Memory-Usage is max. at ~ 12GB of 16GB, so swap remains empty. now the odd (at least for me:) waiting some minutes ( since i always get a bit into a panic when this happens, it feels like 5 minutes, but i suppose its more like 20-30 minutes) and the server is back to normal. everything continues as normal. another odd fact: when i run hdparm -tT /dev/sda, i get answer like: /dev/sda: Timing cached reads: 7180 MB in 2.00 seconds = 3591.13 MB/sec Timing buffered disk reads: 348 MB in 3.02 seconds = 115.41 MB/sec when i run the same command while the server is "frozen", the answer is like /dev/sda: <- takes about 5 minutes until this line appears Timing cached reads: 7180 MB in 2.00 seconds = 3591.13 MB/sec <- 5 more minutes Timing buffered disk reads: 348 MB in 3.02 seconds = 115.41 MB/sec <- another 5 minutes so the values are the same, but the quoted time is completely wrong. using time command as prefix also tells me that ~ 15 minutes were used. I searched in dmesg, /var/log/[messages|syslog] - nothing found. /var/log/errors however tells me that: Jul 4 20:28:30 localhost kernel: [19080.671415] INFO: task php5-fpm:27728 blocked for more than 120 seconds. Jul 4 20:28:30 localhost kernel: [19080.671419] "echo 0 /proc/sys/kernel/hung_task_timeout_secs" disables this message. multiple times. now that message does tell me that php5-fpm task was blocked or did block ? - but not if that is the cause or just one of the results of that "freeze". Anyone? to cut the long story short, i dont know where even to start analyzing. So if you can give me any advice by looking at following specs and configs, or ask me to provide more information, i`d be glad. Specs: 6 Core AMD Phenom(tm) II X6 1055T Processor * 16 Gigabyte Ram 2x 1.5 TB Seagate ST1500DL003-9VT16L via SATA 3 via SoftwareRaid (i suppose) Services: (due to service --status-all, those with [ + ]) nginx Webserver 1.0.14 mySQL 5.1.63 Server Ruby on Rails 2.3.11 ( passenger-nginx-module ) php5-fpm 5.3.6-13ubuntu3.7 SSH ido2db Further services: default crontab + nightly backup. syslog-ng Website consists of 2 subdomains, forum. and www. where forum is a phpBB3.x PHP-Board, and www a Ruby on Rails 2.3.11 application (portal). Mini-Note: sometimes i notice that the forum is pretty slow, in contrast to the always-fast (except for this "freeze") portal. Both share the same Database, but the portal is using it read-only. The Webserver is nginx, using phusion passenger module to communicate with the ruby-application. Also, for the forum it communicates with php5-fpm via socket: relevant nginx configuration parts ( with comments/questions starting by ; ) ; in case of freeze due to too high Filesystem activity, maybe adding a limit? #worker_rlimit_nofile 50000; user www-data; ; 6 cores, so i read 6 fits. maybe already wrong? worker_processes 6; pid /var/run/nginx.pid; events { worker_connections 1024; } http { passenger_root /var/lib/gems/1.8/gems/passenger-3.0.11; passenger_ruby /usr/bin/ruby1.8; ; the forum once featured a chat, which was working w/o websockets. ; so it was a hell of pull requests (deactivated now, freeze still happening) keepalive_timeout 65; keepalive_requests 50; gzip on; server { listen 80; server_name www.domain.tld; root /var/www/domain/rails/public; passenger_enabled on; } server { listen 80; server_name forum.domain.tld; location / { root /var/www/domain/forum; index index.php; } ; satic stuff to be handled by nginx location ~* ^/style/.+.(jpg|jpeg|gif|css|png|js|ico|xml)$ { access_log off; expires 30d; root /var/www/domain/forum/; } ; now the php magic, note the "backend"-fcgi_pass location ~ .php$ { fastcgi_split_path_info ^(.+\.php)(.*)$; fastcgi_pass backend; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME /var/www/domain/forum$fastcgi_script_name; include fastcgi_params; fastcgi_param QUERY_STRING $query_string; fastcgi_param REQUEST_METHOD $request_method; fastcgi_param CONTENT_TYPE $content_type; fastcgi_param CONTENT_LENGTH $content_length; fastcgi_intercept_errors on; fastcgi_ignore_client_abort off; fastcgi_connect_timeout 60; fastcgi_send_timeout 180; fastcgi_read_timeout 180; fastcgi_buffer_size 128k; fastcgi_buffers 256 16k; fastcgi_busy_buffers_size 256k; fastcgi_temp_file_write_size 256k; fastcgi_max_temp_file_size 0; } location ~ /\.ht { deny all; } } ;the php5-fpm socket. i read that /dev/shm/ whould be the fastes place for this. bad idea in general? upstream backend { server unix:/dev/shm/phpfpm; } ... } php5-fpm settings (i changed this values due to php5-fpm error log messages higher and higher.. (freeze-problem was there before as well)* listen = /dev/shm/phpfpm user = www-data group = www-data pm = dynamic ; holy, 4000! well, shinking this value to earth-level gave me ; 100s of 502 bad gateway commands. this values were quite stable. ; since there are only max 520 users online i dont get it, why i would need ; as many children as configured here. due to keep-alive maybe? ; asking questions is easier for me since restarting server will make ; my community-members angry ;) pm.max_children = 4000 pm.start_servers = 100 pm.min_spare_servers = 50 pm.max_spare_servers = 150 pm.max_requests = 10 pm.status_path = /status ping.path = /ping ping.response = pong slowlog = log/$pool.log.slow ;should i use rlimit? ;rlimit_files = 1024 chdir = / mysql/my.cnf [client] port = 3306 socket = /var/run/mysqld/mysqld.sock [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] user = mysql socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp skip-external-locking bind-address = 127.0.0.1 key_buffer = 16M max_allowed_packet = 16M thread_stack = 192K thread_cache_size = 8 myisam-recover = BACKUP ; high number, but less gives some phpBB errors. max_connections = 450 table_cache = 512 ; i read twice the cpu cores, bad? thread_concurrency = 12 join_buffer_size = 2084K concurrent_insert = 3 query_cache_limit = 64M query_cache_size = 512M query_cache_type = 1 log_error = /var/log/mysql/error.log log_slow_queries = /var/log/mysql/mysql-slow.log long_query_time = 2 expire_logs_days = 10 max_binlog_size = 100M low_priority_updates=1 [mysqldump] quick quote-names max_allowed_packet = 16M [isamchk] key_buffer = 16M !includedir /etc/mysql/conf.d/ I used smartctl already, hdds seem to be fine. /proc/mdstatus quotes: Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md3 : active raid1 sda3[1] 1459264192 blocks [2/1] [_U] md1 : active raid1 sda1[0] 3911680 blocks [2/1] [U_] unused devices: ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 127727 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 127727 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited I quote some questions in my configuration files, these are not (intentional) directly problem-related, but would be nice for me to know wether they are indeed questionable or done right. One additional Fact: my MYSQL-database is at 12GB size. i dont know if that does matter, but mytop sometimes shows me 4-5 seconds long insert queries, some are 20-30 seconds long. Its just a feeling that i am unable to prove (because i dont know how), but when i disable the database, the freeze seems not to happen. Example: i created a dummy rails application to see the development log. the app made some sql-queries, reads and inserts. the log quite often was like: DbTest Load (0.3ms) SELECT * FROM `db_test` WHERE (`db_test`.`id` = 31722) LIMIT 1 SQL (0.1ms) BEGIN DbTest Update (0.3ms) UPDATE `db_test` SET `updated_at` = '2012-07-04 23:32:34' WHERE `id` = 31722 - now the log stands still for 5-60 seconds. SQL (49.1ms) COMMIT - SQL-Update time in the log does not include freeze time Rendering test/index Completed in 96ms (View: 16, DB: 59) | 200 OK [http://localhost:9000/test] Bad part is: this mini-freeze here only happens from time to time as well. note: meanwhile i cannot even upload files via scp. I currently feel like running form bad to worse and back by googling for my server-problem due to immense lack of knowledge regarding server configurations. It still makes me wonder, why those problems even appear, since 250 users a time is not such a high amount, right? So my questions: whats wrong and how to fix? ;) or: what information can i provide to make the situation more clear? can you point at some critical bad configuration-line which i should consider to catch up in the documentation? are there any tools i can run to see some possible bottlenecks? any further advice? (next to: "pay someone who knows what he does" - its a private project, server costs enough already. :)) Thanks for your time and help. Best Regards, Daniel P.S.: i renamed the configfiles to domain.tld since i dont want to have any % more load to the server until its fixed. might be a exaggeratedly thought.. P.P.S: if i asked a complete duplicate question, sorry. my search results seemed to be quite specific in their own way.

Read the article
Linq to LLBLGen query problem

- by Jeroen Breuer

Hello, I've got a Stored Procedure and i'm trying to convert it to a Linq to LLBLGen query. The query in Linq to LLBGen works, but when I trace the query which is send to sql server it is far from perfect. This is the Stored Procedure: ALTER PROCEDURE [dbo].[spDIGI_GetAllUmbracoProducts] -- Add the parameters for the stored procedure. @searchText nvarchar(255), @startRowIndex int, @maximumRows int, @sortExpression nvarchar(255) AS BEGIN SET @startRowIndex = @startRowIndex + 1 SET @searchText = '%' + @searchText + '%' -- SET NOCOUNT ON added to prevent extra result sets from -- interfering with SELECT statements. SET NOCOUNT ON; -- This is the query which will fetch all the UmbracoProducts. -- This query also supports paging and sorting. WITH UmbracoOverview As ( SELECT ROW_NUMBER() OVER( ORDER BY CASE WHEN @sortExpression = 'productName' THEN umbracoProduct.productName WHEN @sortExpression = 'productCode' THEN umbracoProduct.productCode END ASC, CASE WHEN @sortExpression = 'productName DESC' THEN umbracoProduct.productName WHEN @sortExpression = 'productCode DESC' THEN umbracoProduct.productCode END DESC ) AS row_num, umbracoProduct.umbracoProductId, umbracoProduct.productName, umbracoProduct.productCode FROM umbracoProduct INNER JOIN product ON umbracoProduct.umbracoProductId = product.umbracoProductId WHERE (umbracoProduct.productName LIKE @searchText OR umbracoProduct.productCode LIKE @searchText OR product.code LIKE @searchText OR product.description LIKE @searchText OR product.descriptionLong LIKE @searchText OR product.unitCode LIKE @searchText) ) SELECT UmbracoOverview.UmbracoProductId, UmbracoOverview.productName, UmbracoOverview.productCode FROM UmbracoOverview WHERE (row_num >= @startRowIndex AND row_num < (@startRowIndex + @maximumRows)) -- This query will count all the UmbracoProducts. -- This query is used for paging inside ASP.NET. SELECT COUNT (umbracoProduct.umbracoProductId) AS CountNumber FROM umbracoProduct INNER JOIN product ON umbracoProduct.umbracoProductId = product.umbracoProductId WHERE (umbracoProduct.productName LIKE @searchText OR umbracoProduct.productCode LIKE @searchText OR product.code LIKE @searchText OR product.description LIKE @searchText OR product.descriptionLong LIKE @searchText OR product.unitCode LIKE @searchText) END This is my Linq to LLBLGen query: using System.Linq.Dynamic; var q = ( from up in MetaData.UmbracoProduct join p in MetaData.Product on up.UmbracoProductId equals p.UmbracoProductId where up.ProductCode.Contains(searchText) || up.ProductName.Contains(searchText) || p.Code.Contains(searchText) || p.Description.Contains(searchText) || p.DescriptionLong.Contains(searchText) || p.UnitCode.Contains(searchText) select new UmbracoProductOverview { UmbracoProductId = up.UmbracoProductId, ProductName = up.ProductName, ProductCode = up.ProductCode } ).OrderBy(sortExpression); //Save the count in HttpContext.Current.Items. This value will only be saved during 1 single HTTP request. HttpContext.Current.Items["AllProductsCount"] = q.Count(); //Returns the results paged. return q.Skip(startRowIndex).Take(maximumRows).ToList<UmbracoProductOverview>(); This is my Initial expression to process: value(SD.LLBLGen.Pro.LinqSupportClasses.DataSource`1[Eurofysica.DB.EntityClasses.UmbracoProductEntity]).Join(value(SD.LLBLGen.Pro.LinqSupportClasses.DataSource`1[Eurofysica.DB.EntityClasses.ProductEntity]), up => up.UmbracoProductId, p => p.UmbracoProductId, (up, p) => new <>f__AnonymousType0`2(up = up, p = p)).Where(<>h__TransparentIdentifier0 => (((((<>h__TransparentIdentifier0.up.ProductCode.Contains(value(Eurofysica.BusinessLogic.BLL.Controllers.UmbracoProductController+<>c__DisplayClass1).searchText) || <>h__TransparentIdentifier0.up.ProductName.Contains(value(Eurofysica.BusinessLogic.BLL.Controllers.UmbracoProductController+<>c__DisplayClass1).searchText)) || <>h__TransparentIdentifier0.p.Code.Contains(value(Eurofysica.BusinessLogic.BLL.Controllers.UmbracoProductController+<>c__DisplayClass1).searchText)) || <>h__TransparentIdentifier0.p.Description.Contains(value(Eurofysica.BusinessLogic.BLL.Controllers.UmbracoProductController+<>c__DisplayClass1).searchText)) || <>h__TransparentIdentifier0.p.DescriptionLong.Contains(value(Eurofysica.BusinessLogic.BLL.Controllers.UmbracoProductController+<>c__DisplayClass1).searchText)) || <>h__TransparentIdentifier0.p.UnitCode.Contains(value(Eurofysica.BusinessLogic.BLL.Controllers.UmbracoProductController+<>c__DisplayClass1).searchText))).Select(<>h__TransparentIdentifier0 => new UmbracoProductOverview() {UmbracoProductId = <>h__TransparentIdentifier0.up.UmbracoProductId, ProductName = <>h__TransparentIdentifier0.up.ProductName, ProductCode = <>h__TransparentIdentifier0.up.ProductCode}).OrderBy( => .ProductName).Count() Now this is how the queries look like that are send to sql server: Select query: Query: SELECT [LPA_L2].[umbracoProductId] AS [UmbracoProductId], [LPA_L2].[productName] AS [ProductName], [LPA_L2].[productCode] AS [ProductCode] FROM ( [eurofysica].[dbo].[umbracoProduct] [LPA_L2] INNER JOIN [eurofysica].[dbo].[product] [LPA_L3] ON [LPA_L2].[umbracoProductId] = [LPA_L3].[umbracoProductId]) WHERE ( ( ( ( ( ( ( ( [LPA_L2].[productCode] LIKE @ProductCode1) OR ( [LPA_L2].[productName] LIKE @ProductName2)) OR ( [LPA_L3].[code] LIKE @Code3)) OR ( [LPA_L3].[description] LIKE @Description4)) OR ( [LPA_L3].[descriptionLong] LIKE @DescriptionLong5)) OR ( [LPA_L3].[unitCode] LIKE @UnitCode6)))) Parameter: @ProductCode1 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". Parameter: @ProductName2 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". Parameter: @Code3 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". Parameter: @Description4 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". Parameter: @DescriptionLong5 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". Parameter: @UnitCode6 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". Count query: Query: SELECT TOP 1 COUNT(*) AS [LPAV_] FROM (SELECT [LPA_L2].[umbracoProductId] AS [UmbracoProductId], [LPA_L2].[productName] AS [ProductName], [LPA_L2].[productCode] AS [ProductCode] FROM ( [eurofysica].[dbo].[umbracoProduct] [LPA_L2] INNER JOIN [eurofysica].[dbo].[product] [LPA_L3] ON [LPA_L2].[umbracoProductId] = [LPA_L3].[umbracoProductId]) WHERE ( ( ( ( ( ( ( ( [LPA_L2].[productCode] LIKE @ProductCode1) OR ( [LPA_L2].[productName] LIKE @ProductName2)) OR ( [LPA_L3].[code] LIKE @Code3)) OR ( [LPA_L3].[description] LIKE @Description4)) OR ( [LPA_L3].[descriptionLong] LIKE @DescriptionLong5)) OR ( [LPA_L3].[unitCode] LIKE @UnitCode6))))) [LPA_L1] Parameter: @ProductCode1 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". Parameter: @ProductName2 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". Parameter: @Code3 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". Parameter: @Description4 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". Parameter: @DescriptionLong5 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". Parameter: @UnitCode6 : String. Length: 2. Precision: 0. Scale: 0. Direction: Input. Value: "%%". As you can see no sorting or paging is done (like in my Stored Procedure). This is probably done inside the code after all the results are fetched. This costs a lot of performance! Does anybody know how I can convert my Stored Procedure to Linq to LLBLGen the proper way?

Read the article
Compass - Lucene Full text search. Structure and Best Practice.

- by Rob

Hi, I have played about with the tutorial and Compass itself for a bit now. I have just started to ramp up the use of it and have found that the performance slows drastically. I am certain that this is due to my mappings and the relationships that I have between entities and was looking for suggestions about how this should be best done. Also as a side question I wanted to know if a variable is in an @searchableComponent but is not defined as @searchable when the component object is pulled out of Compass results will you be able to access that variable? I have 3 main classes that I want to search on - Provider, Location and Activity. They are all inter-related - a Provider can have many locations and activites and has an address; A Location has 1 provider, many activities and an address; An activity has 1 provider and many locations. I have a join table between activity and Location called ActivityLocation that can later provider additional information about the relationship. My classes are mapped to compass as shown below for provider location activity and address. This works but gives a huge index and searches on it are comparatively slow, any advice would be great. Cheers, Rob @Searchable public class AbstractActivity extends LightEntity implements Serializable { /** * Serialisation ID */ private static final long serialVersionUID = 3445339493203407152L; @SearchableId (name="actID") private Integer activityId =-1; @SearchableComponent() private Provider provider; @SearchableComponent(prefix = "activity") private Category category; private String status; @SearchableProperty (name = "activityName") @SearchableMetaData (name = "activityshortactivityName") private String activityName; @SearchableProperty (name = "shortDescription") @SearchableMetaData (name = "activityshortDescription") private String shortDescription; @SearchableProperty (name = "abRating") @SearchableMetaData (name = "activityabRating") private Integer abRating; private String contactName; private String phoneNumber; private String faxNumber; private String email; private String url; @SearchableProperty (name = "completed") @SearchableMetaData (name = "activitycompleted") private Boolean completed= false; @SearchableProperty (name = "isprivate") @SearchableMetaData (name = "activityisprivate") private Boolean isprivate= false; private Boolean subs= false; private Boolean newsfeed= true; private Set news = new HashSet(0); private Set ActivitySession = new HashSet(0); private Set payments = new HashSet(0); private Set userclub = new HashSet(0); private Set ActivityOpeningTimes = new HashSet(0); private Set Events = new HashSet(0); private User creator; private Set userInterested = new HashSet(0); boolean freeEdit = false; private Integer activityType =0; @SearchableComponent (maxDepth=2) private Set activityLocations = new HashSet(0); private Double userRating = -1.00; Getters and Setters .... @Searchable public class AbstractLocation extends LightEntity implements Serializable { /** * Serialisation ID */ private static final long serialVersionUID = 3445339493203407152L; @SearchableId (name="locationID") private Integer locationId; @SearchableComponent (prefix = "location") private Category category; @SearchableComponent (maxDepth=1) private Provider provider; @SearchableProperty (name = "status") @SearchableMetaData (name = "locationstatus") private String status; @SearchableProperty private String locationName; @SearchableProperty (name = "shortDescription") @SearchableMetaData (name = "locationshortDescription") private String shortDescription; @SearchableProperty (name = "abRating") @SearchableMetaData (name = "locationabRating") private Integer abRating; private Integer boolUseProviderDetails; @SearchableProperty (name = "contactName") @SearchableMetaData (name = "locationcontactName") private String contactName; @SearchableComponent private Address address; @SearchableProperty (name = "phoneNumber") @SearchableMetaData (name = "locationphoneNumber") private String phoneNumber; @SearchableProperty (name = "faxNumber") @SearchableMetaData (name = "locationfaxNumber") private String faxNumber; @SearchableProperty (name = "email") @SearchableMetaData (name = "locationemail") private String email; @SearchableProperty (name = "url") @SearchableMetaData (name = "locationurl") private String url; @SearchableProperty (name = "completed") @SearchableMetaData (name = "locationcompleted") private Boolean completed= false; @SearchableProperty (name = "isprivate") @SearchableMetaData (name = "locationisprivate") private Boolean isprivate= false; @SearchableComponent private Set activityLocations = new HashSet(0); private Set LocationOpeningTimes = new HashSet(0); private Set events = new HashSet(0); @SearchableProperty (name = "adult_cost") @SearchableMetaData (name = "locationadult_cost") private String adult_cost =""; @SearchableProperty (name = "child_cost") @SearchableMetaData (name = "locationchild_cost") private String child_cost =""; @SearchableProperty (name = "family_cost") @SearchableMetaData (name = "locationfamily_cost") private String family_cost =""; private Double userRating = -1.00; private Set costs = new HashSet(0); private String cost_caveats =""; Getters and Setters .... @Searchable public class AbstractActivitylocations implements java.io.Serializable { /** * */ private static final long serialVersionUID = 1365110541466870626L; @SearchableId (name="id") private Integer id; @SearchableComponent private Activity activity; @SearchableComponent private Location location; Getters and Setters..... @Searchable public class AbstractProvider extends LightEntity implements Serializable { private static final long serialVersionUID = 3060354043429663058L; @SearchableId private Integer providerId = -1; @SearchableComponent (prefix = "provider") private Category category; @SearchableProperty (name = "businessName") @SearchableMetaData (name = "providerbusinessName") private String businessName; @SearchableProperty (name = "contactName") @SearchableMetaData (name = "providercontactName") private String contactName; @SearchableComponent private Address address; @SearchableProperty (name = "phoneNumber") @SearchableMetaData (name = "providerphoneNumber") private String phoneNumber; @SearchableProperty (name = "faxNumber") @SearchableMetaData (name = "providerfaxNumber") private String faxNumber; @SearchableProperty (name = "email") @SearchableMetaData (name = "provideremail") private String email; @SearchableProperty (name = "url") @SearchableMetaData (name = "providerurl") private String url; @SearchableProperty (name = "status") @SearchableMetaData (name = "providerstatus") private String status; @SearchableProperty (name = "notes") @SearchableMetaData (name = "providernotes") private String notes; @SearchableProperty (name = "shortDescription") @SearchableMetaData (name = "providershortDescription") private String shortDescription; private Boolean completed = false; private Boolean isprivate = false; private Double userRating = -1.00; private Integer ABRating = 1; @SearchableComponent private Set locations = new HashSet(0); @SearchableComponent private Set activities = new HashSet(0); private Set ProviderOpeningTimes = new HashSet(0); private User creator; boolean freeEdit = false; Getters and Setters... Thanks for reading!! Rob

Read the article
256 Windows Azure Worker Roles, Windows Kinect and a 90's Text-Based Ray-Tracer

- by Alan Smith

For a couple of years I have been demoing a simple render farm hosted in Windows Azure using worker roles and the Azure Storage service. At the start of the presentation I deploy an Azure application that uses 16 worker roles to render a 1,500 frame 3D ray-traced animation. At the end of the presentation, when the animation was complete, I would play the animation delete the Azure deployment. The standing joke with the audience was that it was that it was a “$2 demo”, as the compute charges for running the 16 instances for an hour was $1.92, factor in the bandwidth charges and it’s a couple of dollars. The point of the demo is that it highlights one of the great benefits of cloud computing, you pay for what you use, and if you need massive compute power for a short period of time using Windows Azure can work out very cost effective. The “$2 demo” was great for presenting at user groups and conferences in that it could be deployed to Azure, used to render an animation, and then removed in a one hour session. I have always had the idea of doing something a bit more impressive with the demo, and scaling it from a “$2 demo” to a “$30 demo”. The challenge was to create a visually appealing animation in high definition format and keep the demo time down to one hour. This article will take a run through how I achieved this. Ray Tracing Ray tracing, a technique for generating high quality photorealistic images, gained popularity in the 90’s with companies like Pixar creating feature length computer animations, and also the emergence of shareware text-based ray tracers that could run on a home PC. In order to render a ray traced image, the ray of light that would pass from the view point must be tracked until it intersects with an object. At the intersection, the color, reflectiveness, transparency, and refractive index of the object are used to calculate if the ray will be reflected or refracted. Each pixel may require thousands of calculations to determine what color it will be in the rendered image. Pin-Board Toys Having very little artistic talent and a basic understanding of maths I decided to focus on an animation that could be modeled fairly easily and would look visually impressive. I’ve always liked the pin-board desktop toys that become popular in the 80’s and when I was working as a 3D animator back in the 90’s I always had the idea of creating a 3D ray-traced animation of a pin-board, but never found the energy to do it. Even if I had a go at it, the render time to produce an animation that would look respectable on a 486 would have been measured in months. PolyRay Back in 1995 I landed my first real job, after spending three years being a beach-ski-climbing-paragliding-bum, and was employed to create 3D ray-traced animations for a CD-ROM that school kids would use to learn physics. I had got into the strange and wonderful world of text-based ray tracing, and was using a shareware ray-tracer called PolyRay. PolyRay takes a text file describing a scene as input and, after a few hours processing on a 486, produced a high quality ray-traced image. The following is an example of a basic PolyRay scene file. background Midnight_Blue static define matte surface { ambient 0.1 diffuse 0.7 } define matte_white texture { matte { color white } } define matte_black texture { matte { color dark_slate_gray } } define position_cylindrical 3 define lookup_sawtooth 1 define light_wood <0.6, 0.24, 0.1> define median_wood <0.3, 0.12, 0.03> define dark_wood <0.05, 0.01, 0.005> define wooden texture { noise surface { ambient 0.2 diffuse 0.7 specular white, 0.5 microfacet Reitz 10 position_fn position_cylindrical position_scale 1 lookup_fn lookup_sawtooth octaves 1 turbulence 1 color_map( [0.0, 0.2, light_wood, light_wood] [0.2, 0.3, light_wood, median_wood] [0.3, 0.4, median_wood, light_wood] [0.4, 0.7, light_wood, light_wood] [0.7, 0.8, light_wood, median_wood] [0.8, 0.9, median_wood, light_wood] [0.9, 1.0, light_wood, dark_wood]) } } define glass texture { surface { ambient 0 diffuse 0 specular 0.2 reflection white, 0.1 transmission white, 1, 1.5 }} define shiny surface { ambient 0.1 diffuse 0.6 specular white, 0.6 microfacet Phong 7 } define steely_blue texture { shiny { color black } } define chrome texture { surface { color white ambient 0.0 diffuse 0.2 specular 0.4 microfacet Phong 10 reflection 0.8 } } viewpoint { from <4.000, -1.000, 1.000> at <0.000, 0.000, 0.000> up <0, 1, 0> angle 60 resolution 640, 480 aspect 1.6 image_format 0 } light <-10, 30, 20> light <-10, 30, -20> object { disc <0, -2, 0>, <0, 1, 0>, 30 wooden } object { sphere <0.000, 0.000, 0.000>, 1.00 chrome } object { cylinder <0.000, 0.000, 0.000>, <0.000, 0.000, -4.000>, 0.50 chrome } After setting up the background and defining colors and textures, the viewpoint is specified. The “camera” is located at a point in 3D space, and it looks towards another point. The angle, image resolution, and aspect ratio are specified. Two lights are present in the image at defined coordinates. The three objects in the image are a wooden disc to represent a table top, and a sphere and cylinder that intersect to form a pin that will be used for the pin board toy in the final animation. When the image is rendered, the following image is produced. The pins are modeled with a chrome surface, so they reflect the environment around them. Note that the scale of the pin shaft is not correct, this will be fixed later. Modeling the Pin Board The frame of the pin-board is made up of three boxes, and six cylinders, the front box is modeled using a clear, slightly reflective solid, with the same refractive index of glass. The other shapes are modeled as metal. object { box <-5.5, -1.5, 1>, <5.5, 5.5, 1.2> glass } object { box <-5.5, -1.5, -0.04>, <5.5, 5.5, -0.09> steely_blue } object { box <-5.5, -1.5, -0.52>, <5.5, 5.5, -0.59> steely_blue } object { cylinder <-5.2, -1.2, 1.4>, <-5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, -1.2, 1.4>, <5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <-5.2, 5.2, 1.4>, <-5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, 5.2, 1.4>, <5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <0, -1.2, 1.4>, <0, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <0, 5.2, 1.4>, <0, 5.2, -0.74>, 0.2 steely_blue } In order to create the matrix of pins that make up the pin board I used a basic console application with a few nested loops to create two intersecting matrixes of pins, which models the layout used in the pin boards. The resulting image is shown below. The pin board contains 11,481 pins, with the scene file containing 23,709 lines of code. For the complete animation 2,000 scene files will be created, which is over 47 million lines of code. Each pin in the pin-board will slide out a specific distance when an object is pressed into the back of the board. This is easily modeled by setting the Z coordinate of the pin to a specific value. In order to set all of the pins in the pin-board to the correct position, a bitmap image can be used. The position of the pin can be set based on the color of the pixel at the appropriate position in the image. When the Windows Azure logo is used to set the Z coordinate of the pins, the following image is generated. The challenge now was to make a cool animation. The Azure Logo is fine, but it is static. Using a normal video to animate the pins would not work; the colors in the video would not be the same as the depth of the objects from the camera. In order to simulate the pin board accurately a series of frames from a depth camera could be used. Windows Kinect The Kenect controllers for the X-Box 360 and Windows feature a depth camera. The Kinect SDK for Windows provides a programming interface for Kenect, providing easy access for .NET developers to the Kinect sensors. The Kinect Explorer provided with the Kinect SDK is a great starting point for exploring Kinect from a developers perspective. Both the X-Box 360 Kinect and the Windows Kinect will work with the Kinect SDK, the Windows Kinect is required for commercial applications, but the X-Box Kinect can be used for hobby projects. The Windows Kinect has the advantage of providing a mode to allow depth capture with objects closer to the camera, which makes for a more accurate depth image for setting the pin positions. Creating a Depth Field Animation The depth field animation used to set the positions of the pin in the pin board was created using a modified version of the Kinect Explorer sample application. In order to simulate the pin board accurately, a small section of the depth range from the depth sensor will be used. Any part of the object in front of the depth range will result in a white pixel; anything behind the depth range will be black. Within the depth range the pixels in the image will be set to RGB values from 0,0,0 to 255,255,255. A screen shot of the modified Kinect Explorer application is shown below. The Kinect Explorer sample application was modified to include slider controls that are used to set the depth range that forms the image from the depth stream. This allows the fine tuning of the depth image that is required for simulating the position of the pins in the pin board. The Kinect Explorer was also modified to record a series of images from the depth camera and save them as a sequence JPEG files that will be used to animate the pins in the animation the Start and Stop buttons are used to start and stop the image recording. En example of one of the depth images is shown below. Once a series of 2,000 depth images has been captured, the task of creating the animation can begin. Rendering a Test Frame In order to test the creation of frames and get an approximation of the time required to render each frame a test frame was rendered on-premise using PolyRay. The output of the rendering process is shown below. The test frame contained 23,629 primitive shapes, most of which are the spheres and cylinders that are used for the 11,800 or so pins in the pin board. The 1280x720 image contains 921,600 pixels, but as anti-aliasing was used the number of rays that were calculated was 4,235,777, with 3,478,754,073 object boundaries checked. The test frame of the pin board with the depth field image applied is shown below. The tracing time for the test frame was 4 minutes 27 seconds, which means rendering the2,000 frames in the animation would take over 148 hours, or a little over 6 days. Although this is much faster that an old 486, waiting almost a week to see the results of an animation would make it challenging for animators to create, view, and refine their animations. It would be much better if the animation could be rendered in less than one hour. Windows Azure Worker Roles The cost of creating an on-premise render farm to render animations increases in proportion to the number of servers. The table below shows the cost of servers for creating a render farm, assuming a cost of $500 per server. Number of Servers Cost 1 $500 16 $8,000 256 $128,000 As well as the cost of the servers, there would be additional costs for networking, racks etc. Hosting an environment of 256 servers on-premise would require a server room with cooling, and some pretty hefty power cabling. The Windows Azure compute services provide worker roles, which are ideal for performing processor intensive compute tasks. With the scalability available in Windows Azure a job that takes 256 hours to complete could be perfumed using different numbers of worker roles. The time and cost of using 1, 16 or 256 worker roles is shown below. Number of Worker Roles Render Time Cost 1 256 hours $30.72 16 16 hours $30.72 256 1 hour $30.72 Using worker roles in Windows Azure provides the same cost for the 256 hour job, irrespective of the number of worker roles used. Provided the compute task can be broken down into many small units, and the worker role compute power can be used effectively, it makes sense to scale the application so that the task is completed quickly, making the results available in a timely fashion. The task of rendering 2,000 frames in an animation is one that can easily be broken down into 2,000 individual pieces, which can be performed by a number of worker roles. Creating a Render Farm in Windows Azure The architecture of the render farm is shown in the following diagram. The render farm is a hybrid application with the following components: · On-Premise o Windows Kinect – Used combined with the Kinect Explorer to create a stream of depth images. o Animation Creator – This application uses the depth images from the Kinect sensor to create scene description files for PolyRay. These files are then uploaded to the jobs blob container, and job messages added to the jobs queue. o Process Monitor – This application queries the role instance lifecycle table and displays statistics about the render farm environment and render process. o Image Downloader – This application polls the image queue and downloads the rendered animation files once they are complete. · Windows Azure o Azure Storage – Queues and blobs are used for the scene description files and completed frames. A table is used to store the statistics about the rendering environment. The architecture of each worker role is shown below. The worker role is configured to use local storage, which provides file storage on the worker role instance that can be use by the applications to render the image and transform the format of the image. The service definition for the worker role with the local storage configuration highlighted is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceDefinition name="CloudRay" > <WorkerRole name="CloudRayWorkerRole" vmsize="Small"> <Imports> </Imports> <ConfigurationSettings> <Setting name="DataConnectionString" /> </ConfigurationSettings> <LocalResources> <LocalStorage name="RayFolder" cleanOnRoleRecycle="true" /> </LocalResources> </WorkerRole> </ServiceDefinition> The two executable programs, PolyRay.exe and DTA.exe are included in the Azure project, with Copy Always set as the property. PolyRay will take the scene description file and render it to a Truevision TGA file. As the TGA format has not seen much use since the mid 90’s it is converted to a JPG image using Dave's Targa Animator, another shareware application from the 90’s. Each worker roll will use the following process to render the animation frames. 1. The worker process polls the job queue, if a job is available the scene description file is downloaded from blob storage to local storage. 2. PolyRay.exe is started in a process with the appropriate command line arguments to render the image as a TGA file. 3. DTA.exe is started in a process with the appropriate command line arguments convert the TGA file to a JPG file. 4. The JPG file is uploaded from local storage to the images blob container. 5. A message is placed on the images queue to indicate a new image is available for download. 6. The job message is deleted from the job queue. 7. The role instance lifecycle table is updated with statistics on the number of frames rendered by the worker role instance, and the CPU time used. The code for this is shown below. public override void Run() { // Set environment variables string polyRayPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), PolyRayLocation); string dtaPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), DTALocation); LocalResource rayStorage = RoleEnvironment.GetLocalResource("RayFolder"); string localStorageRootPath = rayStorage.RootPath; JobQueue jobQueue = new JobQueue("renderjobs"); JobQueue downloadQueue = new JobQueue("renderimagedownloadjobs"); CloudRayBlob sceneBlob = new CloudRayBlob("scenes"); CloudRayBlob imageBlob = new CloudRayBlob("images"); RoleLifecycleDataSource roleLifecycleDataSource = new RoleLifecycleDataSource(); Frames = 0; while (true) { // Get the render job from the queue CloudQueueMessage jobMsg = jobQueue.Get(); if (jobMsg != null) { // Get the file details string sceneFile = jobMsg.AsString; string tgaFile = sceneFile.Replace(".pi", ".tga"); string jpgFile = sceneFile.Replace(".pi", ".jpg"); string sceneFilePath = Path.Combine(localStorageRootPath, sceneFile); string tgaFilePath = Path.Combine(localStorageRootPath, tgaFile); string jpgFilePath = Path.Combine(localStorageRootPath, jpgFile); // Copy the scene file to local storage sceneBlob.DownloadFile(sceneFilePath); // Run the ray tracer. string polyrayArguments = string.Format("\"{0}\" -o \"{1}\" -a 2", sceneFilePath, tgaFilePath); Process polyRayProcess = new Process(); polyRayProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), polyRayPath); polyRayProcess.StartInfo.Arguments = polyrayArguments; polyRayProcess.Start(); polyRayProcess.WaitForExit(); // Convert the image string dtaArguments = string.Format(" {0} /FJ /P{1}", tgaFilePath, Path.GetDirectoryName (jpgFilePath)); Process dtaProcess = new Process(); dtaProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), dtaPath); dtaProcess.StartInfo.Arguments = dtaArguments; dtaProcess.Start(); dtaProcess.WaitForExit(); // Upload the image to blob storage imageBlob.UploadFile(jpgFilePath); // Add a download job. downloadQueue.Add(jpgFile); // Delete the render job message jobQueue.Delete(jobMsg); Frames++; } else { Thread.Sleep(1000); } // Log the worker role activity. roleLifecycleDataSource.Alive ("CloudRayWorker", RoleLifecycleDataSource.RoleLifecycleId, Frames); } } Monitoring Worker Role Instance Lifecycle In order to get more accurate statistics about the lifecycle of the worker role instances used to render the animation data was tracked in an Azure storage table. The following class was used to track the worker role lifecycles in Azure storage. public class RoleLifecycle : TableServiceEntity { public string ServerName { get; set; } public string Status { get; set; } public DateTime StartTime { get; set; } public DateTime EndTime { get; set; } public long SecondsRunning { get; set; } public DateTime LastActiveTime { get; set; } public int Frames { get; set; } public string Comment { get; set; } public RoleLifecycle() { } public RoleLifecycle(string roleName) { PartitionKey = roleName; RowKey = Utils.GetAscendingRowKey(); Status = "Started"; StartTime = DateTime.UtcNow; LastActiveTime = StartTime; EndTime = StartTime; SecondsRunning = 0; Frames = 0; } } A new instance of this class is created and added to the storage table when the role starts. It is then updated each time the worker renders a frame to record the total number of frames rendered and the total processing time. These statistics are used be the monitoring application to determine the effectiveness of use of resources in the render farm. Rendering the Animation The Azure solution was deployed to Windows Azure with the service configuration set to 16 worker role instances. This allows for the application to be tested in the cloud environment, and the performance of the application determined. When I demo the application at conferences and user groups I often start with 16 instances, and then scale up the application to the full 256 instances. The configuration to run 16 instances is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="16" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> About six minutes after deploying the application the first worker roles become active and start to render the first frames of the animation. The CloudRay Monitor application displays an icon for each worker role instance, with a number indicating the number of frames that the worker role has rendered. The statistics on the left show the number of active worker roles and statistics about the render process. The render time is the time since the first worker role became active; the CPU time is the total amount of processing time used by all worker role instances to render the frames. Five minutes after the first worker role became active the last of the 16 worker roles activated. By this time the first seven worker roles had each rendered one frame of the animation. With 16 worker roles u and running it can be seen that one hour and 45 minutes CPU time has been used to render 32 frames with a render time of just under 10 minutes. At this rate it would take over 10 hours to render the 2,000 frames of the full animation. In order to complete the animation in under an hour more processing power will be required. Scaling the render farm from 16 instances to 256 instances is easy using the new management portal. The slider is set to 256 instances, and the configuration saved. We do not need to re-deploy the application, and the 16 instances that are up and running will not be affected. Alternatively, the configuration file for the Azure service could be modified to specify 256 instances. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="256" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> Six minutes after the new configuration has been applied 75 new worker roles have activated and are processing their first frames. Five minutes later the full configuration of 256 worker roles is up and running. We can see that the average rate of frame rendering has increased from 3 to 12 frames per minute, and that over 17 hours of CPU time has been utilized in 23 minutes. In this test the time to provision 140 worker roles was about 11 minutes, which works out at about one every five seconds. We are now half way through the rendering, with 1,000 frames complete. This has utilized just under three days of CPU time in a little over 35 minutes. The animation is now complete, with 2,000 frames rendered in a little over 52 minutes. The CPU time used by the 256 worker roles is 6 days, 7 hours and 22 minutes with an average frame rate of 38 frames per minute. The rendering of the last 1,000 frames took 16 minutes 27 seconds, which works out at a rendering rate of 60 frames per minute. The frame counts in the server instances indicate that the use of a queue to distribute the workload has been very effective in distributing the load across the 256 worker role instances. The first 16 instances that were deployed first have rendered between 11 and 13 frames each, whilst the 240 instances that were added when the application was scaled have rendered between 6 and 9 frames each. Completed Animation I’ve uploaded the completed animation to YouTube, a low resolution preview is shown below. Pin Board Animation Created using Windows Kinect and 256 Windows Azure Worker Roles The animation can be viewed in 1280x720 resolution at the following link: http://www.youtube.com/watch?v=n5jy6bvSxWc Effective Use of Resources According to the CloudRay monitor statistics the animation took 6 days, 7 hours and 22 minutes CPU to render, this works out at 152 hours of compute time, rounded up to the nearest hour. As the usage for the worker role instances are billed for the full hour, it may have been possible to render the animation using fewer than 256 worker roles. When deciding the optimal usage of resources, the time required to provision and start the worker roles must also be considered. In the demo I started with 16 worker roles, and then scaled the application to 256 worker roles. It would have been more optimal to start the application with maybe 200 worker roles, and utilized the full hour that I was being billed for. This would, however, have prevented showing the ease of scalability of the application. The new management portal displays the CPU usage across the worker roles in the deployment. The average CPU usage across all instances is 93.27%, with over 99% used when all the instances are up and running. This shows that the worker role resources are being used very effectively. Grid Computing Scenarios Although I am using this scenario for a hobby project, there are many scenarios where a large amount of compute power is required for a short period of time. Windows Azure provides a great platform for developing these types of grid computing applications, and can work out very cost effective. · Windows Azure can provide massive compute power, on demand, in a matter of minutes. · The use of queues to manage the load balancing of jobs between role instances is a simple and effective solution. · Using a cloud-computing platform like Windows Azure allows proof-of-concept scenarios to be tested and evaluated on a very low budget. · No charges for inbound data transfer makes the uploading of large data sets to Windows Azure Storage services cost effective. (Transaction charges still apply.) Tips for using Windows Azure for Grid Computing Scenarios I found the implementation of a render farm using Windows Azure a fairly simple scenario to implement. I was impressed by ease of scalability that Azure provides, and by the short time that the application took to scale from 16 to 256 worker role instances. In this case it was around 13 minutes, in other tests it took between 10 and 20 minutes. The following tips may be useful when implementing a grid computing project in Windows Azure. · Using an Azure Storage queue to load-balance the units of work across multiple worker roles is simple and very effective. The design I have used in this scenario could easily scale to many thousands of worker role instances. · Windows Azure accounts are typically limited to 20 cores. If you need to use more than this, a call to support and a credit card check will be required. · Be aware of how the billing model works. You will be charged for worker role instances for the full clock our in which the instance is deployed. Schedule the workload to start just after the clock hour has started. · Monitor the utilization of the resources you are provisioning, ensure that you are not paying for worker roles that are idle. · If you are deploying third party applications to worker roles, you may well run into licensing issues. Purchasing software licenses on a per-processor basis when using hundreds of processors for a short time period would not be cost effective. · Third party software may also require installation onto the worker roles, which can be accomplished using start-up tasks. Bear in mind that adding a startup task and possible re-boot will add to the time required for the worker role instance to start and activate. An alternative may be to use a prepared VM and use VM roles. · Consider using the Windows Azure Autoscaling Application Block (WASABi) to autoscale the worker roles in your application. When using a large number of worker roles, the utilization must be carefully monitored, if the scaling algorithms are not optimal it could get very expensive!

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article

Search Results

Search found 1242 results on 50 pages for 'costs'.

Page 50/50 | < Previous Page | 46 47 48 49 50

- by shinjuo

- by user168715

- by Rob Farley

- by FransBouma

- by Rob Farley

- by Sylvie MacKenzie, PMP

- by Ubiquitous Che

- by ScottGu

- by Paul White

- by Ralf Westphal

- by dhanke

- by Jeroen Breuer

- by Rob

- by Alan Smith

- by rchrd

< Previous Page | 46 47 48 49 50