simplest - Page 45 - Developer IT

CodePlex Daily Summary for Thursday, October 11, 2012

CodePlex Daily Summary for Thursday, October 11, 2012Popular ReleasesOstrivDB: OstrivDB 0.1: - Storage configuration: objects serialization (Xml, Json), storage file compressing, data block size. - Caching for Select queries. - Indexing. - Batch of queries. - No special query language (LINQ used). - Integrated sorting and paging. - Multithreaded data processing.Mido: Mido v0.7: Mido is the simplest utility that helps to make watermarks on images and resize them. It has a very thin installer. The program has beta mark but it is able to draw watermark text, watermark images, resize pictures. Change list: + Opacity option + Stroke option + Bold, italic, underline options + Show progress during loading of images + Allow rotate watermart + Allow write multiline text as watermark + Add text aligment if text contains several lines + Add button 'clear custom position' + A...D3 Loot Tracker: 1.5.4: Fixed a bug where the server ip was not logged properly in the stats file.Captcha MVC: Captcha Mvc 2.1.2: v 2.1.2: Fixed problem with serialization. Made all classes from a namespace Jetbrains.Annotaions as the internal. Added autocomplete attribute and autocorrect attribute for captcha input element. Minor changes. v 2.1.1: Fixed problem with serialization. Minor changes. v 2.1: Added support for storing captcha in the session or cookie. See the updated example. Updated example. Minor changes. v 2.0.1: Added support for a partial captcha. Now you can easily customize the layout, s...DotNetNuke® Community Edition CMS: 06.02.04: Major Highlights Fixed issue where the module printing function was only visible to administrators Fixed issue where pane level skinning was being assigned to a default container for any content pane Fixed issue when using password aging and FB / Google authentication Fixed issue that was causing the DateEditControl to not load the assigned value Fixed issue that stopped additional profile properties to be displayed in the member directory after modifying the template Fixed er...Advanced DataGridView with Excel-like auto filter: 1.0.0.0: ?????? ??????Microsoft Ajax Minifier: Microsoft Ajax Minifier 4.69: Fix for issue #18766: build task should not build the output if it's newer than all the input files. Fix for Issue #18764: build taks -res switch not working. update build task to concatenate input source and then minify, rather than minify and then concatenate. include resource string-replacement root name in the assumed globals list. Stop replacing new Date().getTime() with +new Date -- the latter is smaller, but turns out it executes up to 45% slower. add CSS support for single-...WinRT XAML Toolkit: WinRT XAML Toolkit - 1.3.3: WinRT XAML Toolkit based on the Windows 8 RTM SDK. Download the latest source from the SOURCE CODE page. For compiled version use NuGet. You can add it to your project in Visual Studio by going to View/Other Windows/Package Manager Console and entering: PM> Install-Package winrtxamltoolkit Features Attachable Behaviors AwaitableUI extensions Controls Converters Debugging helpers Extension methods Imaging helpers IO helpers VisualTree helpers Samples Recent changes NOTE:...VidCoder: 1.4.4 Beta: Fixed inability to create new presets with "Save As".MCEBuddy 2.x: MCEBuddy 2.3.2: Changelog for 2.3.2 (32bit and 64bit) 1. Added support for generating XBMC XML NFO files for files in the conversion queue (store it along with the source video with source video name.nfo). Right click on the file in queue and select generate XML 2. UI bugifx, start and end trim box locations interchanged 3. Added support for removing commercials from non DVRMS/WTV files (MP4, AVI etc) 4. Now checking for Firewall port status before enabling (might help with some firewall problems) 5. User In...Sandcastle Help File Builder: SHFB v1.9.5.0 with Visual Studio Package: General InformationIMPORTANT: On some systems, the content of the ZIP file is blocked and the installer may fail to run. Before extracting it, right click on the ZIP file, select Properties, and click on the Unblock button if it is present in the lower right corner of the General tab in the properties dialog. This release supports the Sandcastle October 2012 Release (v2.7.1.0). It includes full support for generating, installing, and removing MS Help Viewer files. This new release suppor...ClosedXML - The easy way to OpenXML: ClosedXML 0.68.0: ClosedXML now resolves formulas! Yes it finally happened. If you call cell.Value and it has a formula the library will try to evaluate the formula and give you the result. For example: var wb = new XLWorkbook(); var ws = wb.AddWorksheet("Sheet1"); ws.Cell("A1").SetValue(1).CellBelow().SetValue(1); ws.Cell("B1").SetValue(1).CellBelow().SetValue(1); ws.Cell("C1").FormulaA1 = "\"The total value is: \" & SUM(A1:B2)"; var...Json.NET: Json.NET 4.5 Release 10: New feature - Added Portable build to NuGet package New feature - Added GetValue and TryGetValue with StringComparison to JObject Change - Improved duplicate object reference id error message Fix - Fixed error when comparing empty JObjects Fix - Fixed SecAnnotate warnings Fix - Fixed error when comparing DateTime JValue with a DateTimeOffset JValue Fix - Fixed serializer sometimes not using DateParseHandling setting Fix - Fixed error in JsonWriter.WriteToken when writing a DateT...Readable Passphrase Generator: KeePass Plugin 0.7.2: Changes: Tested against KeePass 2.20.1 Tested under Ubuntu 12.10 (and KeePass 2.20) Added GenerateAsUtf8 method returning the encrypted passphrase as a UTF8 byte array.TelerikMvcGridCustomBindingHelper: Version 1.0.15.279-RC3: TelerikMvcGridCustomBindingHelper 1.0.15.279 RC3 Release notes: This is a RC version (hopefully the last one), please test and report any error or problem you encounter. Configurable null handling when filtering (AcceptNullValuesWhenFiltering, NullSubstitutes and NullAliases) Internal DynamicWhereClause improvments GridGridCustomBindingHelper.UseProjections method now acept ProjectionsOptions parameter for easely determine which properties should be projected Isolate and hide some are...JSLint for Visual Studio 2010: 1.4.2: 1.4.2patterns & practices: Prism: Prism for .NET 4.5: This is a release does not include any functionality changes over Prism 4.1 Desktop. These assemblies target .NET 4.5. These assemblies also were compiled against updated dependencies: Unity 3.0 and Common Service Locator (Portable Class Library).Snoop, the WPF Spy Utility: Snoop 2.8.0: Snoop 2.8.0Announcing Snoop 2.8.0! It's been exactly six months since the last release, and this one has a bunch of goodies in it. In particular, there is now a PowerShell scripting tab, compliments of Bailey Ling. With this tab, the possibilities are limitless. It basically lets you automate/script the application that you are Snooping. Bailey has a couple blog posts (one and two) on his tab already, and I am sure more is to come. Please note that if you do not have PowerShell installed, y...Z3: Z3 4.1.2: Minor fixes. Now, z3 compiles with gcc 4.7.x.NET Micro Framework: .NET MF 4.3 (Beta): This is the 4.3 Beta version of the .NET Micro Framework. Feature List for v4.3 Support for Visual Studio 2012 (including the Windows Desktop Express version) All v4.2 QFEs features and bug fixes (PWM enhancements, lwIP and network driver reliability improvements, Analog Output, WinUSB and latest GCC support) Improved diagnostic information for deployment Decreased boot time Bug fixes Work Item 1736 - Create link for MFDeploy under start menu Work Item 1504 - Customizing lwIP o...New Projects.NET 4.0 Object/Function DLL interface: A Visual Basic .NET 4.0 Object / Function interface for quick and easy updating.AzureDirectory Library for Lucene.Net: This project allows you to create Lucene Indexes via a Lucene Directory object which uses Windows Azure BlobStorage for persistent storage.Building Modern Mobile Web Apps: This project provides guidance on building mobile web experiences using HTML5, CSS3, and JavaScript. Developing web apps for mobile browsers can be less forgiviC# Singleton Base-Class: C# Singleton Base Class is a single, simple C# class used to implement the Singleton pattern through inheritance.C++ Unit Test Library for Windows Store apps with Async Helper: This Visual Studio extension will install a project template for C++ Unit Test Library for Windows Store apps that contains async helper.Citrix Mobile Application SDK Samples: This site contains our latest prototype samples for the Citrix Mobile Application SDK for you to play with.eBrain Engine: eBrain Engine is a software to administer neuropsychological tests by means of advanced I/F devices like BCI P300 and eye-tracking systemsFairycake: A game about a little witch. Java. FedEx Connector for AbleCommerce Gold: This plugin provides FedEx rating and tracking services for the AbleCommerce shopping cart.Free - Simple Phone Book (SimPB): An alternative to backup your telephone contacts. Portable, easy to use, multilingual.GPXLocalTime2UTC: Time Format Converter, from Local to UTC, for GPX Files A small utility made to open GPX files (XML), search for the "time" tag, then transform the local time Grid Solutions Framework: Core classes to manage, analyze, and visualize real-time and historical data. This project combines the time-series framework and TVA code library projects.HireMe: HireMe is a HR application that works in conjunction with the HiremeMagazine.com website. The app will run in Android, iOS and WebOS.my-sim-asdf: my greate toolOdeToFoodMvc4: Source code for "Building Applications with ASP.NET MVC 4"OrgCharts for SharePoint: Months of planning, design, development and testing have gone into a truly phenomenal Org Chart experiencePDC BW2: A pkm editor application that tries to make easier legal pkm Packed with Legality Analysis, Event downloader and more...PI Payroll system: This is a payroll system for People Index, LLC.Project13251011: fsdQuickWick.NET - Agile Pproject Management: QuickWick.NET is a simple and effective web-based solution for agile project management.SharePoint 2010 Top Nav: This project was created when a project I was on required a Navigation scheme to manage site collections. SisGAC: Sistema de Gerenciamento de Artigos para CongressosTriDes_project_3AO: encryptohideVB6Doc: Visual Basic 6 Documentor (VB6 Doc)VideoChat: Simple VideoChat web-application on ASP.NET MVC4, SignalR, Wowza Media Server/Windows Updater Open: This is an alternative to the Windows Update software on all versions of Windows OS and the WSUS provided for corporations to update multiple machines.WOWAddonsUpdater: WPF App to update lua addons for the Blizzard World of warcraft gameyygua: email:huliang@yahoo.cnZJUDTS2: ????Silverlight?????????

Read the article

Parsing Concerns

- by Jesse

If you’ve ever written an application that accepts date and/or time inputs from an external source (a person, an uploaded file, posted XML, etc.) then you’ve no doubt had to deal with parsing some text representing a date into a data structure that a computer can understand. Similarly, you’ve probably also had to take values from those same data structure and turn them back into their original formats. Most (all?) suitably modern development platforms expose some kind of parsing and formatting functionality for turning text into dates and vice versa. In .NET, the DateTime data structure exposes ‘Parse’ and ‘ToString’ methods for this purpose. This post will focus mostly on parsing, though most of the examples and suggestions below can also be applied to the ToString method. The DateTime.Parse method is pretty permissive in the values that it will accept (though apparently not as permissive as some other languages) which makes it pretty easy to take some text provided by a user and turn it into a proper DateTime instance. Here are some examples (note that the resulting DateTime values are shown using the RFC1123 format): DateTime.Parse("3/12/2010"); //Fri, 12 Mar 2010 00:00:00 GMT DateTime.Parse("2:00 AM"); //Sat, 01 Jan 2011 02:00:00 GMT (took today's date as date portion) DateTime.Parse("5-15/2010"); //Sat, 15 May 2010 00:00:00 GMT DateTime.Parse("7/8"); //Fri, 08 Jul 2011 00:00:00 GMT DateTime.Parse("Thursday, July 1, 2010"); //Thu, 01 Jul 2010 00:00:00 GMT Dealing With Inaccuracy While the DateTime struct has the ability to store a date and time value accurate down to the millisecond, most date strings provided by a user are not going to specify values with that much precision. In each of the above examples, the Parse method was provided a partial value from which to construct a proper DateTime. This means it had to go ahead and assume what you meant and fill in the missing parts of the date and time for you. This is a good thing, especially when we’re talking about taking input from a user. We can’t expect that every person using our software to provide a year, day, month, hour, minute, second, and millisecond every time they need to express a date. That said, it’s important for developers to understand what assumptions the software might be making and plan accordingly. I think the assumptions that were made in each of the above examples were pretty reasonable, though if we dig into this method a little bit deeper we’ll find that there are a lot more assumptions being made under the covers than you might have previously known. One of the biggest assumptions that the DateTime.Parse method has to make relates to the format of the date represented by the provided string. Let’s consider this example input string: ‘10-02-15’. To some people. that might look like ‘15-Feb-2010’. To others, it might be ‘02-Oct-2015’. Like many things, it depends on where you’re from. This Is America! Most cultures around the world have adopted a “little-endian” or “big-endian” formats. (Source: Date And Time Notation By Country) In this context, a “little-endian” date format would list the date parts with the least significant first while the “big-endian” date format would list them with the most significant first. For example, a “little-endian” date would be “day-month-year” and “big-endian” would be “year-month-day”. It’s worth nothing here that ISO 8601 defines a “big-endian” format as the international standard. While I personally prefer “big-endian” style date formats, I think both styles make sense in that they follow some logical standard with respect to ordering the date parts by their significance. Here in the United States, however, we buck that trend by using what is, in comparison, a completely nonsensical format of “month/day/year”. Almost no other country in the world uses this format. I’ve been fortunate in my life to have done some international travel, so I’ve been aware of this difference for many years, but never really thought much about it. Until recently, I had been developing software for exclusively US-based audiences and remained blissfully ignorant of the different date formats employed by other countries around the world. The web application I work on is being rolled out to users in different countries, so I was recently tasked with updating it to support different date formats. As it turns out, .NET has a great mechanism for dealing with different date formats right out of the box. Supporting date formats for different cultures is actually pretty easy once you understand this mechanism. Pulling the Curtain Back On the Parse Method Have you ever taken a look at the different flavors (read: overloads) that the DateTime.Parse method comes in? In it’s simplest form, it takes a single string parameter and returns the corresponding DateTime value (if it can divine what the date value should be). You can optionally provide two additional parameters to this method: an ‘System.IFormatProvider’ and a ‘System.Globalization.DateTimeStyles’. Both of these optional parameters have some bearing on the assumptions that get made while parsing a date, but for the purposes of this article I’m going to focus on the ‘System.IFormatProvider’ parameter. The IFormatProvider exposes a single method called ‘GetFormat’ that returns an object to be used for determining the proper format for displaying and parsing things like numbers and dates. This interface plays a big role in the globalization capabilities that are built into the .NET Framework. The cornerstone of these globalization capabilities can be found in the ‘System.Globalization.CultureInfo’ class. To put it simply, the CultureInfo class is used to encapsulate information related to things like language, writing system, and date formats for a certain culture. Support for many cultures are “baked in” to the .NET Framework and there is capacity for defining custom cultures if needed (thought I’ve never delved into that). While the details of the CultureInfo class are beyond the scope of this post, so for now let me just point out that the CultureInfo class implements the IFormatInfo interface. This means that a CultureInfo instance created for a given culture can be provided to the DateTime.Parse method in order to tell it what date formats it should expect. So what happens when you don’t provide this value? Let’s crack this method open in Reflector: When no IFormatInfo parameter is provided (i.e. we use the simple DateTime.Parse(string) overload), the ‘DateTimeFormatInfo.CurrentInfo’ is used instead. Drilling down a bit further we can see the implementation of the DateTimeFormatInfo.CurrentInfo property: From this property we can determine that, in the absence of an IFormatProvider being specified, the DateTime.Parse method will assume that the provided date should be treated as if it were in the format defined by the CultureInfo object that is attached to the current thread. The culture specified by the CultureInfo instance on the current thread can vary depending on several factors, but if you’re writing an application where a single instance might be used by people from different cultures (i.e. a web application with an international user base), it’s important to know what this value is. Having a solid strategy for setting the current thread’s culture for each incoming request in an internationally used ASP .NET application is obviously important, and might make a good topic for a future post. For now, let’s think about what the implications of not having the correct culture set on the current thread. Let’s say you’re running an ASP .NET application on a server in the United States. The server was setup by English speakers in the United States, so it’s configured for US English. It exposes a web page where users can enter order data, one piece of which is an anticipated order delivery date. Most users are in the US, and therefore enter dates in a ‘month/day/year’ format. The application is using the DateTime.Parse(string) method to turn the values provided by the user into actual DateTime instances that can be stored in the database. This all works fine, because your users and your server both think of dates in the same way. Now you need to support some users in South America, where a ‘day/month/year’ format is used. The best case scenario at this point is a user will enter March 13, 2011 as ‘25/03/2011’. This would cause the call to DateTime.Parse to blow up since that value doesn’t look like a valid date in the US English culture (Note: In all likelihood you might be using the DateTime.TryParse(string) method here instead, but that method behaves the same way with regard to date formats). “But wait a minute”, you might be saying to yourself, “I thought you said that this was the best case scenario?” This scenario would prevent users from entering orders in the system, which is bad, but it could be worse! What if the order needs to be delivered a day earlier than that, on March 12, 2011? Now the user enters ‘12/03/2011’. Now the call to DateTime.Parse sees what it thinks is a valid date, but there’s just one problem: it’s not the right date. Now this order won’t get delivered until December 3, 2011. In my opinion, that kind of data corruption is a much bigger problem than having the Parse call fail. What To Do? My order entry example is a bit contrived, but I think it serves to illustrate the potential issues with accepting date input from users. There are some approaches you can take to make this easier on you and your users: Eliminate ambiguity by using a graphical date input control. I’m personally a fan of a jQuery UI Datepicker widget. It’s pretty easy to setup, can be themed to match the look and feel of your site, and has support for multiple languages and cultures. Be sure you have a way to track the culture preference of each user in your system. For a web application this could be done using something like a cookie or session state variable. Ensure that the current user’s culture is being applied correctly to DateTime formatting and parsing code. This can be accomplished by ensuring that each request has the handling thread’s CultureInfo set properly, or by using the Format and Parse method overloads that accept an IFormatProvider instance where the provided value is a CultureInfo object constructed using the current user’s culture preference. When in doubt, favor formats that are internationally recognizable. Using the string ‘2010-03-05’ is likely to be recognized as March, 5 2011 by users from most (if not all) cultures. Favor standard date format strings over custom ones. So far we’ve only talked about turning a string into a DateTime, but most of the same “gotchas” apply when doing the opposite. Consider this code: someDateValue.ToString("MM/dd/yyyy"); This will output the same string regardless of what the current thread’s culture is set to (with the exception of some cultures that don’t use the Gregorian calendar system, but that’s another issue all together). For displaying dates to users, it would be better to do this: someDateValue.ToString("d"); This standard format string of “d” will use the “short date format” as defined by the culture attached to the current thread (or provided in the IFormatProvider instance in the proper method overload). This means that it will honor the proper month/day/year, year/month/day, or day/month/year format for the culture. Knowing Your Audience The examples and suggestions shown above can go a long way toward getting an application in shape for dealing with date inputs from users in multiple cultures. There are some instances, however, where taking approaches like these would not be appropriate. In some cases, the provider or consumer of date values that pass through your application are not people, but other applications (or other portions of your own application). For example, if your site has a page that accepts a date as a query string parameter, you’ll probably want to format that date using invariant date format. Otherwise, the same URL could end up evaluating to a different page depending on the user that is viewing it. In addition, if your application exports data for consumption by other systems, it’s best to have an agreed upon format that all systems can use and that will not vary depending upon whether or not the users of the systems on either side prefer a month/day/year or day/month/year format. I’ll look more at some approaches for dealing with these situations in a future post. If you take away one thing from this post, make it an understanding of the importance of knowing where the dates that pass through your system come from and are going to. You will likely want to vary your parsing and formatting approach depending on your audience.

Read the article

I never really understood: what is CGI?

- by claws

CGI is a Comman Gateway Interface. As the name says, it is a "common" gateway interface for everything. It is so trivial and naive from the name. I feel that I understood this and I felt this every time I encountered this word. But frankly, I didn't. I'm still confused. I am a PHP programmer. I did lot of web development. user (client) request for page --- webserver(-embedded PHP interpreter) ---- Server side(PHP) Script --- MySQL Server. Now say my PHP Script can fetch results from MySQL Server && MATLAB Server && Some other server. So, now PHP Script is the CGI? because its interface for the between webserver & All other servers? I don't know. Sometimes they call CGI, a technology & othertimes they call CGI a program or someother server. What exactly is CGI? Whats the big deal with /cgi-bin/*.cgi? Whats up with this? I don't know what is this cgi-bin directory on the server for. I don't know why they have *.cgi extensions. Why does Perl always comes in the way. CGI & Perl (language). I also don't know whats up with these two. Almost all the time I keep hearing these two in combination "CGI & Perl". This book is another great example CGI Programming with Perl Why not "CGI Programming with PHP/JSP/ASP". I never saw such things. CGI Programming in C this confuses me a lot. in C?? Seriously?? I don't know what to say. I"m just confused. "in C"?? This changes everything. Program needs to be compiled and executed. This entirely changes my view of web programming. When do I compile? How does the program gets executed (because it will be a machine code, so it must execute as a independent process). How does it communicate with the web server? IPC? and interfacing with all the servers (in my example MATLAB & MySQL) using socket programming? I'm lost!! They say that CGI is depreciated. Its no more in use. Is it so? What is its latest update? Once, I ran into a situation where I had to give HTTP PUT request access to web server (Apache HTTPD). Its a long back. So, as far as I remember this is what I did: Edited the configuration file of Apache HTTPD to tell webserver to pass all HTTP PUT requests to some put.php ( I had to write this PHP script) Implement put.php to handle the request (save the file to the location mentioned) People said that I wrote a CGI Script. Seriously, I didn't have clue what they were talking about. Did I really write CGI Script? I hope you understood what my confusion is. (Because I myself don't know where I'm confused). I request you guys to keep your answer as simple as possible. I really can't understand any fancy technical terminology. At least not in this case. EDIT: I found this amazing tutorial "CGI Programming Is Simple!" - CGI Tutorial Which explains the concepts in simplest possible way. I've only have one complaint about this tutorial. Just to make what ever he explained complete he should have shown the C code he used for generating response for those GET / POST requests. I've also added link to this tutorial to Wikipedia's article : http://en.wikipedia.org/wiki/Common_Gateway_Interface

Read the article

Am I crazy? (How) should I create a jQuery content editor?

- by Brendon Muir

Ok, so I created a CMS mainly aimed at Primary Schools. It's getting fairly popular in New Zealand but the one thing I hate with a passion is the largely bad quality of in browser WYSIWYG editors. I've been using KTML (made by InterAKT which was purchased by Adobe a few years ago). In my opinion this editor does a lot of great things (image editing/management, thumbnailing and pretty good content editing). Unfortunately time has had its nasty way with this product and new browsers are beginning to break features and generally degrade the performance of this tool. It's also quite scary basing my livelihood on a defunct product! I've been hunting, in fact I regularly hunt around to see if anything has changed in the WYSIWYG arena. The closest thing I've seen that excites me is the WYSIHAT framework, but they've decided to ignore a pretty relevant editing paradigm which I'm going to outline below. This is the idea for my proposed editor, and I don't know of any existing products that can do this properly: Right, so the traditional model for editing let's say a Page in a CMS is to log into a 'back end' and click edit on the page. This will then load another screen with the editor in it and perhaps a few other fields. More advanced CMS's will maybe have several editing boxes that are for different portions of the page. Anyway, the big problem with this way of doing things is that the user is editing a document outside of the final context it will appear in. In the simplest terms, this means the page template. Many things can be wrong, e.g. the with of the editing area might be different to the width of the actual template area. The height is nearly always fixed because existing editors always seem to use IFRAMES for backward compatibility. And there are plenty of other beefs which I'm sure you're quite aware of if you're in this development area. Here's my editor utopia: You click 'Edit Page': The actual page (with its actual template) displays. Portions of the page have been marked as editable via a class name. You click on one of these areas (in my case it'd just be the big 'body' area in the middle of the template) and a editing bar drops down from the top of the screen with all your standard controls (bold, italic, insert image etc...). Iframes are never used, instead we rely on setting contentEditable to true on the DIV's in question. Firefox 2 and IE6 can go away, let's move on. You can edit the page knowing exactly how it will look when you save it. Because all the styles for this template are loaded, your headings will look correct, everything will be just dandy. Is this such a radical concept? Why are we still content with TinyMCE and that other editor that is too embarrassing to use because it sounds like a swear word!? Let's face the facts: I'm a JavaScript novice. I did once play around in this area using the Javascript Anthology from SitePoint as a guide. It was quite a cool learning experience, but they of course used the IFRAME to make their lives easier. I tried to go a different route and just use contentEditable and even tried to sidestep the native content editing routines (execCommand) and instead wrote my own. They kind of worked but there were always issues. Now we have jQuery, and a few libraries that abstract things like IE's lack of Range support. I'm wondering, am I crazy, or is it actually a good idea to try and build an editor around this editing paradigm using jQuery and relevant plugins to make the job easier? My actual questions: Where would you start? What plugins do you know of that would help the most? Is it worth it, or is there a magical project that already exists that I should join in on? What are the biggest hurdles to overcome in a project like this? Am I crazy? I hope this question has been posted on the right board. I figured it is a technical question as I'm wanting to know specific hurdles and pitfalls to watch out for and also if it is technically feasible with todays technology. Looking forward to hearing peoples thoughts and opinions.

Read the article

How should I implement simple caches with concurrency on Redis?

- by solublefish

Background I have a 2-tier web service - just my app server and an RDBMS. I want to move to a pool of identical app servers behind a load balancer. I currently cache a bunch of objects in-process. I hope to move them to a shared Redis. I have a dozen or so caches of simple, small-sized business objects. For example, I have a set of Foos. Each Foo has a unique FooId and an OwnerId. One "owner" may own multiple Foos. In a traditional RDBMS this is just a table with an index on the PK FooId and one on OwnerId. I'm caching this in one process simply: Dictionary<int,Foo> _cacheFooById; Dictionary<int,HashSet<int>> _indexFooIdsByOwnerId; Reads come straight from here, and writes go here and to the RDBMS. I usually have this invariant: "For a given group [say by OwnerId], the whole group is in cache or none of it is." So when I cache miss on a Foo, I pull that Foo and all the owner's other Foos from the RDBMS. Updates make sure to keep the index up to date and respect the invariant. When an owner calls GetMyFoos I never have to worry that some are cached and some aren't. What I did already The first/simplest answer seems to be to use plain ol' SET and GET with a composite key and json value: SET( "ServiceCache:Foo:" + theFoo.Id, JsonSerialize(theFoo)); I later decided I liked: HSET( "ServiceCache:Foo", theFoo.FooId, JsonSerialize(theFoo)); That lets me get all the values in one cache as HVALS. It also felt right - I'm literally moving hashtables to Redis, so perhaps my top-level items should be hashes. This works to first order. If my high-level code is like: UpdateCache(myFoo); AddToIndex(myFoo); That translates into: HSET ("ServiceCache:Foo", theFoo.FooId, JsonSerialize(theFoo)); var myFoos = JsonDeserialize( HGET ("ServiceCache:FooIndex", theFoo.OwnerId) ); myFoos.Add(theFoo.OwnerId); HSET ("ServiceCache:FooIndex", theFoo.OwnerId, JsonSerialize(myFoos)); However, this is broken in two ways. Two concurrent operations can read/modify/write at the same time. The latter "wins" the final HSET and the former's index update is lost. Another operation could read the index in between the first and second lines. It would miss a Foo that it should find. So how do I index properly? I think I could use a Redis set instead of a json-encoded value for the index. That would solve part of the problem since the "add-to-index-if-not-already-present" would be atomic. I also read about using MULTI as a "transaction" but it doesn't seem like it does what I want. Am I right that I can't really MULTI; HGET; {update}; HSET; EXEC since it doesn't even do the HGET before I issue the EXEC? I also read about using WATCH and MULTI for optimistic concurrency, then retrying on failure. But WATCH only works on top-level keys. So it's back to SET/GET instead of HSET/HGET. And now I need a new index-like-thing to support getting all the values in a given cache. If I understand it right, I can combine all these things to do the job. Something like: while(!succeeded) { WATCH( "ServiceCache:Foo:" + theFoo.FooId ); WATCH( "ServiceCache:FooIndexByOwner:" + theFoo.OwnerId ); WATCH( "ServiceCache:FooIndexAll" ); MULTI(); SET ("ServiceCache:Foo:" + theFoo.FooId, JsonSerialize(theFoo)); SADD ("ServiceCache:FooIndexByOwner:" + theFoo.OwnerId, theFoo.FooId); SADD ("ServiceCache:FooIndexAll", theFoo.FooId); EXEC(); //TODO somehow set succeeded properly } Finally I'd have to translate this pseudocode into real code depending how my client library uses WATCH/MULTI/EXEC; it looks like they need some sort of context to hook them together. All in all this seems like a lot of complexity for what has to be a very common case; I can't help but think there's a better, smarter, Redis-ish way to do things that I'm just not seeing. How do I lock properly? Even if I had no indexes, there's still a (probably rare) race condition. A: HGET - cache miss B: HGET - cache miss A: SELECT B: SELECT A: HSET C: HGET - cache hit C: UPDATE C: HSET B: HSET ** this is stale data that's clobbering C's update. Note that C could just be a really-fast A. Again I think WATCH, MULTI, retry would work, but... ick. I know in some places people use special Redis keys as locks for other objects. Is that a reasonable approach here? Should those be top-level keys like ServiceCache:FooLocks:{Id} or ServiceCache:Locks:Foo:{Id}? Or make a separate hash for them - ServiceCache:Locks with subkeys Foo:{Id}, or ServiceCache:Locks:Foo with subkeys {Id} ? How would I work around abandoned locks, say if a transaction (or a whole server) crashes while "holding" the lock?

Read the article

A Taxonomy of Numerical Methods v1

- by JoshReuben

Numerical Analysis – When, What, (but not how) Once you understand the Math & know C++, Numerical Methods are basically blocks of iterative & conditional math code. I found the real trick was seeing the forest for the trees – knowing which method to use for which situation. Its pretty easy to get lost in the details – so I’ve tried to organize these methods in a way that I can quickly look this up. I’ve included links to detailed explanations and to C++ code examples. I’ve tried to classify Numerical methods in the following broad categories: Solving Systems of Linear Equations Solving Non-Linear Equations Iteratively Interpolation Curve Fitting Optimization Numerical Differentiation & Integration Solving ODEs Boundary Problems Solving EigenValue problems Enjoy – I did ! Solving Systems of Linear Equations Overview Solve sets of algebraic equations with x unknowns The set is commonly in matrix form Gauss-Jordan Elimination http://en.wikipedia.org/wiki/Gauss%E2%80%93Jordan_elimination C++: http://www.codekeep.net/snippets/623f1923-e03c-4636-8c92-c9dc7aa0d3c0.aspx Produces solution of the equations & the coefficient matrix Efficient, stable 2 steps: · Forward Elimination – matrix decomposition: reduce set to triangular form (0s below the diagonal) or row echelon form. If degenerate, then there is no solution · Backward Elimination –write the original matrix as the product of ints inverse matrix & its reduced row-echelon matrix à reduce set to row canonical form & use back-substitution to find the solution to the set Elementary ops for matrix decomposition: · Row multiplication · Row switching · Add multiples of rows to other rows Use pivoting to ensure rows are ordered for achieving triangular form LU Decomposition http://en.wikipedia.org/wiki/LU_decomposition C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-lu-decomposition-for-solving.html Represent the matrix as a product of lower & upper triangular matrices A modified version of GJ Elimination Advantage – can easily apply forward & backward elimination to solve triangular matrices Techniques: · Doolittle Method – sets the L matrix diagonal to unity · Crout Method - sets the U matrix diagonal to unity Note: both the L & U matrices share the same unity diagonal & can be stored compactly in the same matrix Gauss-Seidel Iteration http://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method C++: http://www.nr.com/forum/showthread.php?t=722 Transform the linear set of equations into a single equation & then use numerical integration (as integration formulas have Sums, it is implemented iteratively). an optimization of Gauss-Jacobi: 1.5 times faster, requires 0.25 iterations to achieve the same tolerance Solving Non-Linear Equations Iteratively find roots of polynomials – there may be 0, 1 or n solutions for an n order polynomial use iterative techniques Iterative methods · used when there are no known analytical techniques · Requires set functions to be continuous & differentiable · Requires an initial seed value – choice is critical to convergence à conduct multiple runs with different starting points & then select best result · Systematic - iterate until diminishing returns, tolerance or max iteration conditions are met · bracketing techniques will always yield convergent solutions, non-bracketing methods may fail to converge Incremental method if a nonlinear function has opposite signs at 2 ends of a small interval x1 & x2, then there is likely to be a solution in their interval – solutions are detected by evaluating a function over interval steps, for a change in sign, adjusting the step size dynamically. Limitations – can miss closely spaced solutions in large intervals, cannot detect degenerate (coinciding) solutions, limited to functions that cross the x-axis, gives false positives for singularities Fixed point method http://en.wikipedia.org/wiki/Fixed-point_iteration C++: http://books.google.co.il/books?id=weYj75E_t6MC&pg=PA79&lpg=PA79&dq=fixed+point+method++c%2B%2B&source=bl&ots=LQ-5P_taoC&sig=lENUUIYBK53tZtTwNfHLy5PEWDk&hl=en&sa=X&ei=wezDUPW1J5DptQaMsIHQCw&redir_esc=y#v=onepage&q=fixed%20point%20method%20%20c%2B%2B&f=false Algebraically rearrange a solution to isolate a variable then apply incremental method Bisection method http://en.wikipedia.org/wiki/Bisection_method C++: http://numericalcomputing.wordpress.com/category/algorithms/ Bracketed - Select an initial interval, keep bisecting it ad midpoint into sub-intervals and then apply incremental method on smaller & smaller intervals – zoom in Adv: unaffected by function gradient à reliable Disadv: slow convergence False Position Method http://en.wikipedia.org/wiki/False_position_method C++: http://www.dreamincode.net/forums/topic/126100-bisection-and-false-position-methods/ Bracketed - Select an initial interval , & use the relative value of function at interval end points to select next sub-intervals (estimate how far between the end points the solution might be & subdivide based on this) Newton-Raphson method http://en.wikipedia.org/wiki/Newton's_method C++: http://www-users.cselabs.umn.edu/classes/Summer-2012/csci1113/index.php?page=./newt3 Also known as Newton's method Convenient, efficient Not bracketed – only a single initial guess is required to start iteration – requires an analytical expression for the first derivative of the function as input. Evaluates the function & its derivative at each step. Can be extended to the Newton MutiRoot method for solving multiple roots Can be easily applied to an of n-coupled set of non-linear equations – conduct a Taylor Series expansion of a function, dropping terms of order n, rewrite as a Jacobian matrix of PDs & convert to simultaneous linear equations !!! Secant Method http://en.wikipedia.org/wiki/Secant_method C++: http://forum.vcoderz.com/showthread.php?p=205230 Unlike N-R, can estimate first derivative from an initial interval (does not require root to be bracketed) instead of inputting it Since derivative is approximated, may converge slower. Is fast in practice as it does not have to evaluate the derivative at each step. Similar implementation to False Positive method Birge-Vieta Method http://mat.iitm.ac.in/home/sryedida/public_html/caimna/transcendental/polynomial%20methods/bv%20method.html C++: http://books.google.co.il/books?id=cL1boM2uyQwC&pg=SA3-PA51&lpg=SA3-PA51&dq=Birge-Vieta+Method+c%2B%2B&source=bl&ots=QZmnDTK3rC&sig=BPNcHHbpR_DKVoZXrLi4nVXD-gg&hl=en&sa=X&ei=R-_DUK2iNIjzsgbE5ID4Dg&redir_esc=y#v=onepage&q=Birge-Vieta%20Method%20c%2B%2B&f=false combines Horner's method of polynomial evaluation (transforming into lesser degree polynomials that are more computationally efficient to process) with Newton-Raphson to provide a computational speed-up Interpolation Overview Construct new data points for as close as possible fit within range of a discrete set of known points (that were obtained via sampling, experimentation) Use Taylor Series Expansion of a function f(x) around a specific value for x Linear Interpolation http://en.wikipedia.org/wiki/Linear_interpolation C++: http://www.hamaluik.com/?p=289 Straight line between 2 points à concatenate interpolants between each pair of data points Bilinear Interpolation http://en.wikipedia.org/wiki/Bilinear_interpolation C++: http://supercomputingblog.com/graphics/coding-bilinear-interpolation/2/ Extension of the linear function for interpolating functions of 2 variables – perform linear interpolation first in 1 direction, then in another. Used in image processing – e.g. texture mapping filter. Uses 4 vertices to interpolate a value within a unit cell. Lagrange Interpolation http://en.wikipedia.org/wiki/Lagrange_polynomial C++: http://www.codecogs.com/code/maths/approximation/interpolation/lagrange.php For polynomials Requires recomputation for all terms for each distinct x value – can only be applied for small number of nodes Numerically unstable Barycentric Interpolation http://epubs.siam.org/doi/pdf/10.1137/S0036144502417715 C++: http://www.gamedev.net/topic/621445-barycentric-coordinates-c-code-check/ Rearrange the terms in the equation of the Legrange interpolation by defining weight functions that are independent of the interpolated value of x Newton Divided Difference Interpolation http://en.wikipedia.org/wiki/Newton_polynomial C++: http://jee-appy.blogspot.co.il/2011/12/newton-divided-difference-interpolation.html Hermite Divided Differences: Interpolation polynomial approximation for a given set of data points in the NR form - divided differences are used to approximately calculate the various differences. For a given set of 3 data points , fit a quadratic interpolant through the data Bracketed functions allow Newton divided differences to be calculated recursively Difference table Cubic Spline Interpolation http://en.wikipedia.org/wiki/Spline_interpolation C++: https://www.marcusbannerman.co.uk/index.php/home/latestarticles/42-articles/96-cubic-spline-class.html Spline is a piecewise polynomial Provides smoothness – for interpolations with significantly varying data Use weighted coefficients to bend the function to be smooth & its 1st & 2nd derivatives are continuous through the edge points in the interval Curve Fitting A generalization of interpolating whereby given data points may contain noise à the curve does not necessarily pass through all the points Least Squares Fit http://en.wikipedia.org/wiki/Least_squares C++: http://www.ccas.ru/mmes/educat/lab04k/02/least-squares.c Residual – difference between observed value & expected value Model function is often chosen as a linear combination of the specified functions Determines: A) The model instance in which the sum of squared residuals has the least value B) param values for which model best fits data Straight Line Fit Linear correlation between independent variable and dependent variable Linear Regression http://en.wikipedia.org/wiki/Linear_regression C++: http://www.oocities.org/david_swaim/cpp/linregc.htm Special case of statistically exact extrapolation Leverage least squares Given a basis function, the sum of the residuals is determined and the corresponding gradient equation is expressed as a set of normal linear equations in matrix form that can be solved (e.g. using LU Decomposition) Can be weighted - Drop the assumption that all errors have the same significance –-> confidence of accuracy is different for each data point. Fit the function closer to points with higher weights Polynomial Fit - use a polynomial basis function Moving Average http://en.wikipedia.org/wiki/Moving_average C++: http://www.codeproject.com/Articles/17860/A-Simple-Moving-Average-Algorithm Used for smoothing (cancel fluctuations to highlight longer-term trends & cycles), time series data analysis, signal processing filters Replace each data point with average of neighbors. Can be simple (SMA), weighted (WMA), exponential (EMA). Lags behind latest data points – extra weight can be given to more recent data points. Weights can decrease arithmetically or exponentially according to distance from point. Parameters: smoothing factor, period, weight basis Optimization Overview Given function with multiple variables, find Min (or max by minimizing –f(x)) Iterative approach Efficient, but not necessarily reliable Conditions: noisy data, constraints, non-linear models Detection via sign of first derivative - Derivative of saddle points will be 0 Local minima Bisection method Similar method for finding a root for a non-linear equation Start with an interval that contains a minimum Golden Search method http://en.wikipedia.org/wiki/Golden_section_search C++: http://www.codecogs.com/code/maths/optimization/golden.php Bisect intervals according to golden ratio 0.618.. Achieves reduction by evaluating a single function instead of 2 Newton-Raphson Method Brent method http://en.wikipedia.org/wiki/Brent's_method C++: http://people.sc.fsu.edu/~jburkardt/cpp_src/brent/brent.cpp Based on quadratic or parabolic interpolation – if the function is smooth & parabolic near to the minimum, then a parabola fitted through any 3 points should approximate the minima – fails when the 3 points are collinear , in which case the denominator is 0 Simplex Method http://en.wikipedia.org/wiki/Simplex_algorithm C++: http://www.codeguru.com/cpp/article.php/c17505/Simplex-Optimization-Algorithm-and-Implemetation-in-C-Programming.htm Find the global minima of any multi-variable function Direct search – no derivatives required At each step it maintains a non-degenerative simplex – a convex hull of n+1 vertices. Obtains the minimum for a function with n variables by evaluating the function at n-1 points, iteratively replacing the point of worst result with the point of best result, shrinking the multidimensional simplex around the best point. Point replacement involves expanding & contracting the simplex near the worst value point to determine a better replacement point Oscillation can be avoided by choosing the 2nd worst result Restart if it gets stuck Parameters: contraction & expansion factors Simulated Annealing http://en.wikipedia.org/wiki/Simulated_annealing C++: http://code.google.com/p/cppsimulatedannealing/ Analogy to heating & cooling metal to strengthen its structure Stochastic method – apply random permutation search for global minima - Avoid entrapment in local minima via hill climbing Heating schedule - Annealing schedule params: temperature, iterations at each temp, temperature delta Cooling schedule – can be linear, step-wise or exponential Differential Evolution http://en.wikipedia.org/wiki/Differential_evolution C++: http://www.amichel.com/de/doc/html/ More advanced stochastic methods analogous to biological processes: Genetic algorithms, evolution strategies Parallel direct search method against multiple discrete or continuous variables Initial population of variable vectors chosen randomly – if weighted difference vector of 2 vectors yields a lower objective function value then it replaces the comparison vector Many params: #parents, #variables, step size, crossover constant etc Convergence is slow – many more function evaluations than simulated annealing Numerical Differentiation Overview 2 approaches to finite difference methods: · A) approximate function via polynomial interpolation then differentiate · B) Taylor series approximation – additionally provides error estimate Finite Difference methods http://en.wikipedia.org/wiki/Finite_difference_method C++: http://www.wpi.edu/Pubs/ETD/Available/etd-051807-164436/unrestricted/EAMPADU.pdf Find differences between high order derivative values - Approximate differential equations by finite differences at evenly spaced data points Based on forward & backward Taylor series expansion of f(x) about x plus or minus multiples of delta h. Forward / backward difference - the sums of the series contains even derivatives and the difference of the series contains odd derivatives – coupled equations that can be solved. Provide an approximation of the derivative within a O(h^2) accuracy There is also central difference & extended central difference which has a O(h^4) accuracy Richardson Extrapolation http://en.wikipedia.org/wiki/Richardson_extrapolation C++: http://mathscoding.blogspot.co.il/2012/02/introduction-richardson-extrapolation.html A sequence acceleration method applied to finite differences Fast convergence, high accuracy O(h^4) Derivatives via Interpolation Cannot apply Finite Difference method to discrete data points at uneven intervals – so need to approximate the derivative of f(x) using the derivative of the interpolant via 3 point Lagrange Interpolation Note: the higher the order of the derivative, the lower the approximation precision Numerical Integration Estimate finite & infinite integrals of functions More accurate procedure than numerical differentiation Use when it is not possible to obtain an integral of a function analytically or when the function is not given, only the data points are Newton Cotes Methods http://en.wikipedia.org/wiki/Newton%E2%80%93Cotes_formulas C++: http://www.siafoo.net/snippet/324 For equally spaced data points Computationally easy – based on local interpolation of n rectangular strip areas that is piecewise fitted to a polynomial to get the sum total area Evaluate the integrand at n+1 evenly spaced points – approximate definite integral by Sum Weights are derived from Lagrange Basis polynomials Leverage Trapezoidal Rule for default 2nd formulas, Simpson 1/3 Rule for substituting 3 point formulas, Simpson 3/8 Rule for 4 point formulas. For 4 point formulas use Bodes Rule. Higher orders obtain more accurate results Trapezoidal Rule uses simple area, Simpsons Rule replaces the integrand f(x) with a quadratic polynomial p(x) that uses the same values as f(x) for its end points, but adds a midpoint Romberg Integration http://en.wikipedia.org/wiki/Romberg's_method C++: http://code.google.com/p/romberg-integration/downloads/detail?name=romberg.cpp&can=2&q= Combines trapezoidal rule with Richardson Extrapolation Evaluates the integrand at equally spaced points The integrand must have continuous derivatives Each R(n,m) extrapolation uses a higher order integrand polynomial replacement rule (zeroth starts with trapezoidal) à a lower triangular matrix set of equation coefficients where the bottom right term has the most accurate approximation. The process continues until the difference between 2 successive diagonal terms becomes sufficiently small. Gaussian Quadrature http://en.wikipedia.org/wiki/Gaussian_quadrature C++: http://www.alglib.net/integration/gaussianquadratures.php Data points are chosen to yield best possible accuracy – requires fewer evaluations Ability to handle singularities, functions that are difficult to evaluate The integrand can include a weighting function determined by a set of orthogonal polynomials. Points & weights are selected so that the integrand yields the exact integral if f(x) is a polynomial of degree <= 2n+1 Techniques (basically different weighting functions): · Gauss-Legendre Integration w(x)=1 · Gauss-Laguerre Integration w(x)=e^-x · Gauss-Hermite Integration w(x)=e^-x^2 · Gauss-Chebyshev Integration w(x)= 1 / Sqrt(1-x^2) Solving ODEs Use when high order differential equations cannot be solved analytically Evaluated under boundary conditions RK for systems – a high order differential equation can always be transformed into a coupled first order system of equations Euler method http://en.wikipedia.org/wiki/Euler_method C++: http://rosettacode.org/wiki/Euler_method First order Runge–Kutta method. Simple recursive method – given an initial value, calculate derivative deltas. Unstable & not very accurate (O(h) error) – not used in practice A first-order method - the local error (truncation error per step) is proportional to the square of the step size, and the global error (error at a given time) is proportional to the step size In evolving solution between data points xn & xn+1, only evaluates derivatives at beginning of interval xn à asymmetric at boundaries Higher order Runge Kutta http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods C++: http://www.dreamincode.net/code/snippet1441.htm 2nd & 4th order RK - Introduces parameterized midpoints for more symmetric solutions à accuracy at higher computational cost Adaptive RK – RK-Fehlberg – estimate the truncation at each integration step & automatically adjust the step size to keep error within prescribed limits. At each step 2 approximations are compared – if in disagreement to a specific accuracy, the step size is reduced Boundary Value Problems Where solution of differential equations are located at 2 different values of the independent variable x à more difficult, because cannot just start at point of initial value – there may not be enough starting conditions available at the end points to produce a unique solution An n-order equation will require n boundary conditions – need to determine the missing n-1 conditions which cause the given conditions at the other boundary to be satisfied Shooting Method http://en.wikipedia.org/wiki/Shooting_method C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-shooting-method-for-solving.html Iteratively guess the missing values for one end & integrate, then inspect the discrepancy with the boundary values of the other end to adjust the estimate Given the starting boundary values u1 & u2 which contain the root u, solve u given the false position method (solving the differential equation as an initial value problem via 4th order RK), then use u to solve the differential equations. Finite Difference Method For linear & non-linear systems Higher order derivatives require more computational steps – some combinations for boundary conditions may not work though Improve the accuracy by increasing the number of mesh points Solving EigenValue Problems An eigenvalue can substitute a matrix when doing matrix multiplication à convert matrix multiplication into a polynomial EigenValue For a given set of equations in matrix form, determine what are the solution eigenvalue & eigenvectors Similar Matrices - have same eigenvalues. Use orthogonal similarity transforms to reduce a matrix to diagonal form from which eigenvalue(s) & eigenvectors can be computed iteratively Jacobi method http://en.wikipedia.org/wiki/Jacobi_method C++: http://people.sc.fsu.edu/~jburkardt/classes/acs2_2008/openmp/jacobi/jacobi.html Robust but Computationally intense – use for small matrices < 10x10 Power Iteration http://en.wikipedia.org/wiki/Power_iteration For any given real symmetric matrix, generate the largest single eigenvalue & its eigenvectors Simplest method – does not compute matrix decomposition à suitable for large, sparse matrices Inverse Iteration Variation of power iteration method – generates the smallest eigenvalue from the inverse matrix Rayleigh Method http://en.wikipedia.org/wiki/Rayleigh's_method_of_dimensional_analysis Variation of power iteration method Rayleigh Quotient Method Variation of inverse iteration method Matrix Tri-diagonalization Method Use householder algorithm to reduce an NxN symmetric matrix to a tridiagonal real symmetric matrix vua N-2 orthogonal transforms Whats Next Outside of Numerical Methods there are lots of different types of algorithms that I’ve learned over the decades: Data Mining – (I covered this briefly in a previous post: http://geekswithblogs.net/JoshReuben/archive/2007/12/31/ssas-dm-algorithms.aspx ) Search & Sort Routing Problem Solving Logical Theorem Proving Planning Probabilistic Reasoning Machine Learning Solvers (eg MIP) Bioinformatics (Sequence Alignment, Protein Folding) Quant Finance (I read Wilmott’s books – interesting) Sooner or later, I’ll cover the above topics as well.

Read the article

Entity Framework Code-First, OData & Windows Phone Client

- by Jon Galloway

Entity Framework Code-First is the coolest thing since sliced bread, Windows Phone is the hottest thing since Tickle-Me-Elmo and OData is just too great to ignore. As part of the Full Stack project, we wanted to put them together, which turns out to be pretty easy… once you know how. EF Code-First CTP5 is available now and there should be very few breaking changes in the release edition, which is due early in 2011. Note: EF Code-First evolved rapidly and many of the existing documents and blog posts which were written with earlier versions, may now be obsolete or at least misleading. Code-First? With traditional Entity Framework you start with a database and from that you generate “entities” – classes that bridge between the relational database and your object oriented program. With Code-First (Magic-Unicorn) (see Hanselman’s write up and this later write up by Scott Guthrie) the Entity Framework looks at classes you created and says “if I had created these classes, the database would have to have looked like this…” and creates the database for you! By deriving your entity collections from DbSet and exposing them via a class that derives from DbContext, you "turn on" database backing for your POCO with a minimum of code and no hidden designer or configuration files. POCO == Plain Old CLR Objects Your entity objects can be used throughout your applications - in web applications, console applications, Silverlight and Windows Phone applications, etc. In our case, we'll want to read and update data from a Windows Phone client application, so we'll expose the entities through a DataService and hook the Windows Phone client application to that data via proxies. Piece of Pie. Easy as cake. The Demo Architecture To see this at work, we’ll create an ASP.NET/MVC application which will act as the host for our Data Service. We’ll create an incredibly simple data layer using EF Code-First on top of SQLCE4 and we’ll expose the data in a WCF Data Service using the oData protocol. Our Windows Phone 7 client will instantiate the data context via a URI and load the data asynchronously. Setting up the Server project with MVC 3, EF Code First, and SQL CE 4 Create a new application of type ASP.NET MVC 3 and name it DeadSimpleServer. We need to add the latest SQLCE4 and Entity Framework Code First CTP's to our project. Fortunately, NuGet makes that really easy. Open the Package Manager Console (View / Other Windows / Package Manager Console) and type in "Install-Package EFCodeFirst.SqlServerCompact" at the PM> command prompt. Since NuGet handles dependencies for you, you'll see that it installs everything you need to use Entity Framework Code First in your project. PM> install-package EFCodeFirst.SqlServerCompact 'SQLCE (= 4.0.8435.1)' not installed. Attempting to retrieve dependency from source... Done 'EFCodeFirst (= 0.8)' not installed. Attempting to retrieve dependency from source... Done 'WebActivator (= 1.0.0.0)' not installed. Attempting to retrieve dependency from source... Done You are downloading SQLCE from Microsoft, the license agreement to which is available at http://173.203.67.148/licenses/SQLCE/EULA_ENU.rtf. Check the package for additional dependencies, which may come with their own license agreement(s). Your use of the package and dependencies constitutes your acceptance of their license agreements. If you do not accept the license agreement(s), then delete the relevant components from your device. Successfully installed 'SQLCE 4.0.8435.1' You are downloading EFCodeFirst from Microsoft, the license agreement to which is available at http://go.microsoft.com/fwlink/?LinkID=206497. Check the package for additional dependencies, which may come with their own license agreement(s). Your use of the package and dependencies constitutes your acceptance of their license agreements. If you do not accept the license agreement(s), then delete the relevant components from your device. Successfully installed 'EFCodeFirst 0.8' Successfully installed 'WebActivator 1.0.0.0' You are downloading EFCodeFirst.SqlServerCompact from Microsoft, the license agreement to which is available at http://173.203.67.148/licenses/SQLCE/EULA_ENU.rtf. Check the package for additional dependencies, which may come with their own license agreement(s). Your use of the package and dependencies constitutes your acceptance of their license agreements. If you do not accept the license agreement(s), then delete the relevant components from your device. Successfully installed 'EFCodeFirst.SqlServerCompact 0.8' Successfully added 'SQLCE 4.0.8435.1' to EfCodeFirst-CTP5 Successfully added 'EFCodeFirst 0.8' to EfCodeFirst-CTP5 Successfully added 'WebActivator 1.0.0.0' to EfCodeFirst-CTP5 Successfully added 'EFCodeFirst.SqlServerCompact 0.8' to EfCodeFirst-CTP5 Note: We're using SQLCE 4 with Entity Framework here because they work really well together from a development scenario, but you can of course use Entity Framework Code First with other databases supported by Entity framework. Creating The Model using EF Code First Now we can create our model class. Right-click the Models folder and select Add/Class. Name the Class Person.cs and add the following code: using System.Data.Entity; namespace DeadSimpleServer.Models { public class Person { public int ID { get; set; } public string Name { get; set; } } public class PersonContext : DbContext { public DbSet<Person> People { get; set; } } } Notice that the entity class Person has no special interfaces or base class. There's nothing special needed to make it work - it's just a POCO. The context we'll use to access the entities in the application is called PersonContext, but you could name it anything you wanted. The important thing is that it inherits DbContext and contains one or more DbSet which holds our entity collections. Adding Seed Data We need some testing data to expose from our service. The simplest way to get that into our database is to modify the CreateCeDatabaseIfNotExists class in AppStart_SQLCEEntityFramework.cs by adding some seed data to the Seed method: protected virtual void Seed( TContext context ) { var personContext = context as PersonContext; personContext.People.Add( new Person { ID = 1, Name = "George Washington" } ); personContext.People.Add( new Person { ID = 2, Name = "John Adams" } ); personContext.People.Add( new Person { ID = 3, Name = "Thomas Jefferson" } ); personContext.SaveChanges(); } The CreateCeDatabaseIfNotExists class name is pretty self-explanatory - when our DbContext is accessed and the database isn't found, a new one will be created and populated with the data in the Seed method. There's one more step to make that work - we need to uncomment a line in the Start method at the top of of the AppStart_SQLCEEntityFramework class and set the context name, as shown here, public static class AppStart_SQLCEEntityFramework { public static void Start() { DbDatabase.DefaultConnectionFactory = new SqlCeConnectionFactory("System.Data.SqlServerCe.4.0"); // Sets the default database initialization code for working with Sql Server Compact databases // Uncomment this line and replace CONTEXT_NAME with the name of your DbContext if you are // using your DbContext to create and manage your database DbDatabase.SetInitializer(new CreateCeDatabaseIfNotExists<PersonContext>()); } } Now our database and entity framework are set up, so we can expose data via WCF Data Services. Note: This is a bare-bones implementation with no administration screens. If you'd like to see how those are added, check out The Full Stack screencast series. Creating the oData Service using WCF Data Services Add a new WCF Data Service to the project (right-click the project / Add New Item / Web / WCF Data Service). We’ll be exposing all the data as read/write. Remember to reconfigure to control and minimize access as appropriate for your own application. Open the code behind for your service. In our case, the service was called PersonTestDataService.svc so the code behind class file is PersonTestDataService.svc.cs. using System.Data.Services; using System.Data.Services.Common; using System.ServiceModel; using DeadSimpleServer.Models; namespace DeadSimpleServer { [ServiceBehavior( IncludeExceptionDetailInFaults = true )] public class PersonTestDataService : DataService<PersonContext> { // This method is called only once to initialize service-wide policies. public static void InitializeService( DataServiceConfiguration config ) { config.SetEntitySetAccessRule( "*", EntitySetRights.All ); config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2; config.UseVerboseErrors = true; } } } We're enabling a few additional settings to make it easier to debug if you run into trouble. The ServiceBehavior attribute is set to include exception details in faults, and we're using verbose errors. You can remove both of these when your service is working, as your public production service shouldn't be revealing exception information. You can view the output of the service by running the application and browsing to http://localhost:[portnumber]/PersonTestDataService.svc/: <service xml:base="http://localhost:49786/PersonTestDataService.svc/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns="http://www.w3.org/2007/app"> <workspace> <atom:title>Default</atom:title> <collection href="People"> <atom:title>People</atom:title> </collection> </workspace> </service> This indicates that the service exposes one collection, which is accessible by browsing to http://localhost:[portnumber]/PersonTestDataService.svc/People <?xml version="1.0" encoding="iso-8859-1" standalone="yes"?> <feed xml:base=http://localhost:49786/PersonTestDataService.svc/ xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom"> <title type="text">People</title> <id>http://localhost:49786/PersonTestDataService.svc/People</id> <updated>2010-12-29T01:01:50Z</updated> <link rel="self" title="People" href="People" /> <entry> <id>http://localhost:49786/PersonTestDataService.svc/People(1)</id> <title type="text"></title> <updated>2010-12-29T01:01:50Z</updated> <author> <name /> </author> <link rel="edit" title="Person" href="People(1)" /> <category term="DeadSimpleServer.Models.Person" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" /> <content type="application/xml"> <m:properties> <d:ID m:type="Edm.Int32">1</d:ID> <d:Name>George Washington</d:Name> </m:properties> </content> </entry> <entry> ... </entry> </feed> Let's recap what we've done so far. But enough with services and XML - let's get this into our Windows Phone client application. Creating the DataServiceContext for the Client Use the latest DataSvcUtil.exe from http://odata.codeplex.com. As of today, that's in this download: http://odata.codeplex.com/releases/view/54698 You need to run it with a few options: /uri - This will point to the service URI. In this case, it's http://localhost:59342/PersonTestDataService.svc Pick up the port number from your running server (e.g., the server formerly known as Cassini). /out - This is the DataServiceContext class that will be generated. You can name it whatever you'd like. /Version - should be set to 2.0 /DataServiceCollection - Include this flag to generate collections derived from the DataServiceCollection base, which brings in all the ObservableCollection goodness that handles your INotifyPropertyChanged events for you. Here's the console session from when we ran it: <ListBox x:Name="MainListBox" Margin="0,0,-12,0" ItemsSource="{Binding}" SelectionChanged="MainListBox_SelectionChanged"> Next, to keep things simple, change the Binding on the two TextBlocks within the DataTemplate to Name and ID, <ListBox x:Name="MainListBox" Margin="0,0,-12,0" ItemsSource="{Binding}" SelectionChanged="MainListBox_SelectionChanged"> <ListBox.ItemTemplate> <DataTemplate> <StackPanel Margin="0,0,0,17" Width="432"> <TextBlock Text="{Binding Name}" TextWrapping="Wrap" Style="{StaticResource PhoneTextExtraLargeStyle}" /> <TextBlock Text="{Binding ID}" TextWrapping="Wrap" Margin="12,-6,12,0" Style="{StaticResource PhoneTextSubtleStyle}" /> </StackPanel> </DataTemplate> </ListBox.ItemTemplate> </ListBox> Getting The Context In the code-behind you’ll first declare a member variable to hold the context from the Entity Framework. This is named using convention over configuration. The db type is Person and the context is of type PersonContext, You initialize it by providing the URI, in this case using the URL obtained from the Cassini web server, PersonContext context = new PersonContext( new Uri( "http://localhost:49786/PersonTestDataService.svc/" ) ); Create a second member variable of type DataServiceCollection<Person> but do not initialize it, DataServiceCollection<Person> people; In the constructor you’ll initialize the DataServiceCollection using the PersonContext, public MainPage() { InitializeComponent(); people = new DataServiceCollection<Person>( context ); Finally, you’ll load the people collection using the LoadAsync method, passing in the fully specified URI for the People collection in the web service, people.LoadAsync( new Uri( "http://localhost:49786/PersonTestDataService.svc/People" ) ); Note that this method runs asynchronously and when it is finished the people collection is already populated. Thus, since we didn’t need or want to override any of the behavior we don’t implement the LoadCompleted. You can use the LoadCompleted event if you need to do any other UI updates, but you don't need to. The final code is as shown below: using System; using System.Data.Services.Client; using System.Windows; using System.Windows.Controls; using DeadSimpleServer.Models; using Microsoft.Phone.Controls; namespace WindowsPhoneODataTest { public partial class MainPage : PhoneApplicationPage { PersonContext context = new PersonContext( new Uri( "http://localhost:49786/PersonTestDataService.svc/" ) ); DataServiceCollection<Person> people; // Constructor public MainPage() { InitializeComponent(); // Set the data context of the listbox control to the sample data // DataContext = App.ViewModel; people = new DataServiceCollection<Person>( context ); people.LoadAsync( new Uri( "http://localhost:49786/PersonTestDataService.svc/People" ) ); DataContext = people; this.Loaded += new RoutedEventHandler( MainPage_Loaded ); } // Handle selection changed on ListBox private void MainListBox_SelectionChanged( object sender, SelectionChangedEventArgs e ) { // If selected index is -1 (no selection) do nothing if ( MainListBox.SelectedIndex == -1 ) return; // Navigate to the new page NavigationService.Navigate( new Uri( "/DetailsPage.xaml?selectedItem=" + MainListBox.SelectedIndex, UriKind.Relative ) ); // Reset selected index to -1 (no selection) MainListBox.SelectedIndex = -1; } // Load data for the ViewModel Items private void MainPage_Loaded( object sender, RoutedEventArgs e ) { if ( !App.ViewModel.IsDataLoaded ) { App.ViewModel.LoadData(); } } } } With people populated we can set it as the DataContext and run the application; you’ll find that the Name and ID are displayed in the list on the Mainpage. Here's how the pieces in the client fit together: Complete source code available here

Read the article

Rounded Corners and Shadows – Dialogs with CSS

- by Rick Strahl

Well, it looks like we’ve finally arrived at a place where at least all of the latest versions of main stream browsers support rounded corners and box shadows. The two CSS properties that make this possible are box-shadow and box-radius. Both of these CSS Properties now supported in all the major browsers as shown in this chart from QuirksMode: In it’s simplest form you can use box-shadow and border radius like this: .boxshadow { -moz-box-shadow: 3px 3px 5px #535353; -webkit-box-shadow: 3px 3px 5px #535353; box-shadow: 3px 3px 5px #535353; } .roundbox { -moz-border-radius: 6px 6px 6px 6px; -webkit-border-radius: 6px; border-radius: 6px 6px 6px 6px; } box-shadow: horizontal-shadow-pixels vertical-shadow-pixels blur-distance shadow-color box-shadow attributes specify the the horizontal and vertical offset of the shadow, the blur distance (to give the shadow a smooth soft look) and a shadow color. The spec also supports multiple shadows separated by commas using the attributes above but we’re not using that functionality here. box-radius: top-left-radius top-right-radius bottom-right-radius bottom-left-radius border-radius takes a pixel size for the radius for each corner going clockwise. CSS 3 also specifies each of the individual corner elements such as border-top-left-radius, but support for these is much less prevalent so I would recommend not using them for now until support improves. Instead use the single box-radius to specify all corners. Browser specific Support in older Browsers Notice that there are two variations: The actual CSS 3 properties (box-shadow and box-radius) and the browser specific ones (-moz, –webkit prefixes for FireFox and Chrome/Safari respectively) which work in slightly older versions of modern browsers before official CSS 3 support was added. The goal is to spread support as widely as possible and the prefix versions extend the range slightly more to those browsers that provided early support for these features. Notice that box-shadow and border-radius are used after the browser specific versions to ensure that the latter versions get precedence if the browser supports both (last assignment wins). Use the .boxshadow and .roundbox Styles in HTML To use these two styles create a simple rounded box with a shadow you can use HTML like this:  <div class="roundbox boxshadow" style="width: 550px; border: solid 2px steelblue"> <div class="boxcontenttext"> Simple Rounded Corner Box. </div> </div> which looks like this in the browser: This works across browsers and it’s pretty sweet and simple. Watch out for nested Elements! There are a couple of things to be aware of however when using rounded corners. Specifically, you need to be careful when you nest other non-transparent content into the rounded box. For example check out what happens when I change the inside <div> to have a colored background:  <div class="roundbox boxshadow" style="width: 550px; border: solid 2px steelblue"> <div class="boxcontenttext" style="background: khaki;"> Simple Rounded Corner Box. </div> </div> which renders like this: If you look closely you’ll find that the inside <div>’s corners are not rounded and so ‘poke out’ slightly over the rounded corners. It looks like the rounded corners are ‘broken’ up instead of a solid rounded line around the corner, which his pretty ugly. The bigger the radius the more drastic this effect becomes . To fix this issue the inner <div> also has have rounded corners at the same or slightly smaller radius than the outer <div>. The simple fix for this is to simply also apply the roundbox style to the inner <div> in addition to the boxcontenttext style already applied: <div class="boxcontenttext roundbox" style="background: khaki;"> The fixed display now looks proper: Separate Top and Bottom Elements This gets even a little more tricky if you have an element at the top or bottom only of the rounded box. What if you need to add something like a header or footer <div> that have non-transparent backgrounds which is a pretty common scenario? In those cases you want only the top or bottom corners rounded and not both. To make this work a couple of additional styles to round only the top and bottom corners can be created: .roundbox-top { -moz-border-radius: 4px 4px 0 0; -webkit-border-radius: 4px 4px 0 0; border-radius: 4px 4px 0 0; } .roundbox-bottom { -moz-border-radius: 0 0 4px 4px; -webkit-border-radius: 0 0 4px 4px; border-radius: 0 0 4px 4px; } Notice that radius used for the ‘inside’ rounding is smaller (4px) than the outside radius (6px). This is so the inner radius fills into the outer border – if you use the same size you may have some white space showing between inner and out rounded corners. Experiment with values to see what works – in my experimenting the behavior across browsers here is consistent (thankfully). These styles can be applied in addition to other styles to make only the top or bottom portions of an element rounded. For example imagine I have styles like this: .gridheader, .gridheaderbig, .gridheaderleft, .gridheaderright { padding: 4px 4px 4px 4px; background: #003399 url(images/vertgradient.png) repeat-x; text-align: center; font-weight: bold; text-decoration: none; color: khaki; } .gridheaderleft { text-align: left; } .gridheaderright { text-align: right; } .gridheaderbig { font-size: 135%; } If I just apply say gridheader by itself in HTML like this: <div class="roundbox boxshadow" style="width: 550px; border: solid 2px steelblue"> <div class="gridheaderleft">Box with a Header</div> <div class="boxcontenttext" style="background: khaki;"> Simple Rounded Corner Box. </div> </div> This results in a pretty funky display – again due to the fact that the inner elements render square rather than rounded corners: If you look close again you can see that both the header and the main content have square edges which jumps out at the eye. To fix this you can now apply the roundbox-top and roundbox-bottom to the header and content respectively: <div class="roundbox boxshadow" style="width: 550px; border: solid 2px steelblue"> <div class="gridheaderleft roundbox-top">Box with a Header</div> <div class="boxcontenttext roundbox-bottom" style="background: khaki;"> Simple Rounded Corner Box. </div> </div> Which now gives the proper display with rounded corners both on the top and bottom: All of this is sweet to be supported – at least by the newest browser – without having to resort to images and nasty JavaScripts solutions. While this is still not a mainstream feature yet for the majority of actually installed browsers, the majority of browser users are very likely to have this support as most browsers other than IE are actively pushing users to upgrade to newer versions. Since this is a ‘visual display only feature it degrades reasonably well in non-supporting browsers: You get an uninteresting square and non-shadowed browser box, but the display is still overall functional. The main sticking point – as always is Internet Explorer versions 8.0 and down as well as older versions of other browsers. With those browsers you get a functional view that is a little less interesting to look at obviously: but at least it’s still functional. Maybe that’s just one more incentive for people using older browsers to upgrade to a more modern browser :-) Creating Dialog Related Styles In a lot of my AJAX based applications I use pop up windows which effectively work like dialogs. Using the simple CSS behaviors above, it’s really easy to create some fairly nice looking overlaid windows with nothing but CSS. Here’s what a typical ‘dialog’ I use looks like: The beauty of this is that it’s plain CSS – no plug-ins or images (other than the gradients which are optional) required. Add jQuery-ui draggable (or ww.jquery.js as shown below) and you have a nice simple inline implementation of a dialog represented by a simple <div> tag. Here’s the HTML for this dialog: <div id="divDialog" class="dialog boxshadow" style="width: 450px;"> <div class="dialog-header"> <div class="closebox"></div> User Sign-in </div> <div class="dialog-content"> <label>Username:</label> <input type="text" name="txtUsername" value=" " /> <label>Password</label> <input type="text" name="txtPassword" value=" " /> <hr /> <input type="button" id="btnLogin" value="Login" /> </div> <div class="dialog-statusbar">Ready</div> </div> Most of this behavior is driven by the ‘dialog’ styles which are fairly basic and easy to understand. They do use a few support images for the gradients which are provided in the sample I’ve provided. Here’s what the CSS looks like: .dialog { background: White; overflow: hidden; border: solid 1px steelblue; -moz-border-radius: 6px 6px 4px 4px; -webkit-border-radius: 6px 6px 4px 4px; border-radius: 6px 6px 3px 3px; } .dialog-header { background-image: url(images/dialogheader.png); background-repeat: repeat-x; text-align: left; color: cornsilk; padding: 5px; padding-left: 10px; font-size: 1.02em; font-weight: bold; position: relative; -moz-border-radius: 4px 4px 0px 0px; -webkit-border-radius: 4px 4px 0px 0px; border-radius: 4px 4px 0px 0px; } .dialog-top { -moz-border-radius: 4px 4px 0px 0px; -webkit-border-radius: 4px 4px 0px 0px; border-radius: 4px 4px 0px 0px; } .dialog-bottom { -moz-border-radius: 0 0 3px 3px; -webkit-border-radius: 0 0 3px 3px; border-radius: 0 0 3px 3px; } .dialog-content { padding: 15px; } .dialog-statusbar, .dialog-toolbar { background: #eeeeee; background-image: url(images/dialogstrip.png); background-repeat: repeat-x; padding: 5px; padding-left: 10px; border-top: solid 1px silver; border-bottom: solid 1px silver; font-size: 0.8em; } .dialog-statusbar { -moz-border-radius: 0 0 3px 3px; -webkit-border-radius: 0 0 3px 3px; border-radius: 0 0 3px 3px; padding-right: 10px; } .closebox { position: absolute; right: 2px; top: 2px; background-image: url(images/close.gif); background-repeat: no-repeat; width: 14px; height: 14px; cursor: pointer; opacity: 0.60; filter: alpha(opacity="80"); } .closebox:hover { opacity: 1; filter: alpha(opacity="100"); } The main style is the dialog class which is the outer box. It has the rounded border that serves as the outline. Note that I didn’t add the box-shadow to this style because in some situations I just want the rounded box in an inline display that doesn’t have a shadow so it’s still applied separately. dialog-header, then has the rounded top corners and displays a typical dialog heading format. dialog-bottom and dialog-top then provide the same functionality as roundbox-top and roundbox-bottom described earlier but are provided mainly in the stylesheet for consistency to match the dialog’s round edges and making it easier to remember and find in Intellisense as it shows up in the same dialog- group. dialog-statusbar and dialog-toolbar are two elements I use a lot for floating windows – the toolbar serves for buttons and options and filters typically, while the status bar provides information specific to the floating window. Since the the status bar is always on the bottom of the dialog it automatically handles the rounding of the bottom corners. Finally there’s closebox style which is to be applied to an empty <div> tag in the header typically. What this does is render a close image that is by default low-lighted with a low opacity value, and then highlights when hovered over. All you’d have to do handle the close operation is handle the onclick of the <div>. Note that the <div> right aligns so typically you should specify it before any other content in the header. Speaking of closable – some time ago I created a closable jQuery plug-in that basically automates this process and can be applied against ANY element in a page, automatically removing or closing the element with some simple script code. Using this you can leave out the <div> tag for closable and just do the following: To make the above dialog closable (and draggable) which makes it effectively and overlay window, you’d add jQuery.js and ww.jquery.js to the page: <script type="text/javascript" src="../../scripts/jquery.min.js"></script> <script type="text/javascript" src="../../scripts/ww.jquery.min.js"></script> and then simply call: <script type="text/javascript"> $(document).ready(function () { $("#divDialog") .draggable({ handle: ".dialog-header" }) .closable({ handle: ".dialog-header", closeHandler: function () { alert("Window about to be closed."); return true; // true closes - false leaves open } }); }); </script> * ww.jquery.js emulates base features in jQuery-ui’s draggable. If jQuery-ui is loaded its draggable version will be used instead and voila you have now have a draggable and closable window – here in mid-drag: The dragging and closable behaviors are of course optional, but it’s the final touch that provides dialog like window behavior. Relief for older Internet Explorer Versions with CSS Pie If you want to get these features to work with older versions of Internet Explorer all the way back to version 6 you can check out CSS Pie. CSS Pie provides an Internet Explorer behavior file that attaches to specific CSS rules and simulates these behavior using script code in IE (mostly by implementing filters). You can simply add the behavior to each CSS style that uses box-shadow and border-radius like this: .boxshadow { -moz-box-shadow: 3px 3px 5px #535353; -webkit-box-shadow: 3px 3px 5px #535353; box-shadow: 3px 3px 5px #535353; behavior: url(scripts/PIE.htc); } .roundbox { -moz-border-radius: 6px 6px 6px 6px; -webkit-border-radius: 6px; border-radius: 6px 6px 6px 6px; behavior: url(scripts/PIE.htc); } CSS Pie requires the PIE.htc on your server and referenced from each CSS style that needs it. Note that the url() for IE behaviors is NOT CSS file relative as other CSS resources, but rather PAGE relative , so if you have more than one folder you probably need to reference the HTC file with a fixed path like this: behavior: url(/MyApp/scripts/PIE.htc); in the style. Small price to pay, but a royal pain if you have a common CSS file you use in many applications. Once the PIE.htc file has been copied and you have applied the behavior to each style that uses these new features Internet Explorer will render rounded corners and box shadows! Yay! Hurray for box-shadow and border-radius All of this functionality is very welcome natively in the browser. If you think this is all frivolous visual candy, you might be right :-), but if you take a look on the Web and search for rounded corner solutions that predate these CSS attributes you’ll find a boatload of stuff from image files, to custom drawn content to Javascript solutions that play tricks with a few images. It’s sooooo much easier to have this functionality built in and I for one am glad to see that’s it’s finally becoming standard in the box. Still remember that when you use these new CSS features, they are not universal, and are not going to be really soon. Legacy browsers, especially old versions of Internet Explorer that can’t be updated will continue to be around and won’t work with this shiny new stuff. I say screw ‘em: Let them get a decent recent browser or see a degraded and ugly UI. We have the luxury with this functionality in that it doesn’t typically affect usability – it just doesn’t look as nice. Resources Download the Sample The sample includes the styles and images and sample page as well as ww.jquery.js for the draggable/closable example. Online Sample Check out the sample described in this post online. Closable and Draggable Documentation Documentation for the closeable and draggable plug-ins in ww.jquery.js. You can also check out the full documentation for all the plug-ins contained in ww.jquery.js here. © Rick Strahl, West Wind Technologies, 2005-2011Posted in HTML CSS

Read the article

ASP.NET Frameworks and Raw Throughput Performance

- by Rick Strahl

A few days ago I had a curious thought: With all these different technologies that the ASP.NET stack has to offer, what's the most efficient technology overall to return data for a server request? When I started this it was mere curiosity rather than a real practical need or result. Different tools are used for different problems and so performance differences are to be expected. But still I was curious to see how the various technologies performed relative to each just for raw throughput of the request getting to the endpoint and back out to the client with as little processing in the actual endpoint logic as possible (aka Hello World!). I want to clarify that this is merely an informal test for my own curiosity and I'm sharing the results and process here because I thought it was interesting. It's been a long while since I've done any sort of perf testing on ASP.NET, mainly because I've not had extremely heavy load requirements and because overall ASP.NET performs very well even for fairly high loads so that often it's not that critical to test load performance. This post is not meant to make a point or even come to a conclusion which tech is better, but just to act as a reference to help understand some of the differences in perf and give a starting point to play around with this yourself. I've included the code for this simple project, so you can play with it and maybe add a few additional tests for different things if you like. Source Code on GitHub I looked at this data for these technologies: ASP.NET Web API ASP.NET MVC WebForms ASP.NET WebPages ASMX AJAX Services (couldn't get AJAX/JSON to run on IIS8 ) WCF Rest Raw ASP.NET HttpHandlers It's quite a mixed bag, of course and the technologies target different types of development. What started out as mere curiosity turned into a bit of a head scratcher as the results were sometimes surprising. What I describe here is more to satisfy my curiosity more than anything and I thought it interesting enough to discuss on the blog :-) First test: Raw Throughput The first thing I did is test raw throughput for the various technologies. This is the least practical test of course since you're unlikely to ever create the equivalent of a 'Hello World' request in a real life application. The idea here is to measure how much time a 'NOP' request takes to return data to the client. So for this request I create the simplest Hello World request that I could come up for each tech. Http Handler The first is the lowest level approach which is an HTTP handler. public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public bool IsReusable { get { return true; } } } WebForms Next I added a couple of ASPX pages - one using CodeBehind and one using only a markup page. The CodeBehind page simple does this in CodeBehind without any markup in the ASPX page: public partial class HelloWorld_CodeBehind : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { Response.Write("Hello World. Time is: " + DateTime.Now.ToString() ); Response.End(); } } while the Markup page only contains some static output via an expression:<%@ Page Language="C#" AutoEventWireup="false" CodeBehind="HelloWorld_Markup.aspx.cs" Inherits="AspNetFrameworksPerformance.HelloWorld_Markup" %> Hello World. Time is <%= DateTime.Now %> ASP.NET WebPages WebPages is the freestanding Razor implementation of ASP.NET. Here's the simple HelloWorld.cshtml page:Hello World @DateTime.Now WCF REST WCF REST was the token REST implementation for ASP.NET before WebAPI and the inbetween step from ASP.NET AJAX. I'd like to forget that this technology was ever considered for production use, but I'll include it here. Here's an OperationContract class: [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World" + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } } WCF REST can return arbitrary results by returning a Stream object and a content type. The code above turns the string result into a stream and returns that back to the client. ASP.NET AJAX (ASMX Services) I also wanted to test ASP.NET AJAX services because prior to WebAPI this is probably still the most widely used AJAX technology for the ASP.NET stack today. Unfortunately I was completely unable to get this running on my Windows 8 machine. Visual Studio 2012 removed adding of ASP.NET AJAX services, and when I tried to manually add the service and configure the script handler references it simply did not work - I always got a SOAP response for GET and POST operations. No matter what I tried I always ended up getting XML results even when explicitly adding the ScriptHandler. So, I didn't test this (but the code is there - you might be able to test this on a Windows 7 box). ASP.NET MVC Next up is probably the most popular ASP.NET technology at the moment: MVC. Here's the small controller: public class MvcPerformanceController : Controller { public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } } ASP.NET WebAPI Next up is WebAPI which looks kind of similar to MVC. Except here I have to use a StringContent result to return the response: public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } } Testing Take a minute to think about each of the technologies… and take a guess which you think is most efficient in raw throughput. The fastest should be pretty obvious, but the others - maybe not so much. The testing I did is pretty informal since it was mainly to satisfy my curiosity - here's how I did this: I used Apache Bench (ab.exe) from a full Apache HTTP installation to run and log the test results of hitting the server. ab.exe is a small executable that lets you hit a URL repeatedly and provides counter information about the number of requests, requests per second etc. ab.exe and the batch file are located in the \LoadTests folder of the project. An ab.exe command line looks like this: ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld which hits the specified URL 100,000 times with a load factor of 20 concurrent requests. This results in output like this: It's a great way to get a quick and dirty performance summary. Run it a few times to make sure there's not a large amount of varience. You might also want to do an IISRESET to clear the Web Server. Just make sure you do a short test run to warm up the server first - otherwise your first run is likely to be skewed downwards. ab.exe also allows you to specify headers and provide POST data and many other things if you want to get a little more fancy. Here all tests are GET requests to keep it simple. I ran each test: 100,000 iterations Load factor of 20 concurrent connections IISReset before starting A short warm up run for API and MVC to make sure startup cost is mitigated Here is the batch file I used for the test: IISRESET REM make sure you add REM C:\Program Files (x86)\Apache Software Foundation\Apache2.2\bin REM to your path so ab.exe can be found REM Warm up ab.exe -n100 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJsonab.exe -n100 -c20 http://localhost/aspnetperf/api/HelloWorldJson ab.exe -n100 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld ab.exe -n100000 -c20 http://localhost/aspnetperf/handler.ashx > handler.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_CodeBehind.aspx > AspxCodeBehind.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_Markup.aspx > AspxMarkup.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld > Wcf.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldCode > Mvc.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld > WebApi.txt I ran each of these tests 3 times and took the average score for Requests/second, with the machine otherwise idle. I did see a bit of variance when running many tests but the values used here are the medians. Part of this has to do with the fact I ran the tests on my local machine - result would probably more consistent running the load test on a separate machine hitting across the network. I ran these tests locally on my laptop which is a Dell XPS with quad core Sandibridge I7-2720QM @ 2.20ghz and a fast SSD drive on Windows 8. CPU load during tests ran to about 70% max across all 4 cores (IOW, it wasn't overloading the machine). Ideally you can try running these tests on a separate machine hitting the local machine. If I remember correctly IIS 7 and 8 on client OSs don't throttle so the performance here should be Results Ok, let's cut straight to the chase. Below are the results from the tests… It's not surprising that the handler was fastest. But it was a bit surprising to me that the next fastest was WebForms and especially Web Forms with markup over a CodeBehind page. WebPages also fared fairly well. MVC and WebAPI are a little slower and the slowest by far is WCF REST (which again I find surprising). As mentioned at the start the raw throughput tests are not overly practical as they don't test scripting performance for the HTML generation engines or serialization performances of the data engines. All it really does is give you an idea of the raw throughput for the technology from time of request to reaching the endpoint and returning minimal text data back to the client which indicates full round trip performance. But it's still interesting to see that Web Forms performs better in throughput than either MVC, WebAPI or WebPages. It'd be interesting to try this with a few pages that actually have some parsing logic on it, but that's beyond the scope of this throughput test. But what's also amazing about this test is the sheer amount of traffic that a laptop computer is handling. Even the slowest tech managed 5700 requests a second, which is one hell of a lot of requests if you extrapolate that out over a 24 hour period. Remember these are not static pages, but dynamic requests that are being served. Another test - JSON Data Service Results The second test I used a JSON result from several of the technologies. I didn't bother running WebForms and WebPages through this test since that doesn't make a ton of sense to return data from the them (OTOH, returning text from the APIs didn't make a ton of sense either :-) In these tests I have a small Person class that gets serialized and then returned to the client. The Person class looks like this: public class Person { public Person() { Id = 10; Name = "Rick"; Entered = DateTime.Now; } public int Id { get; set; } public string Name { get; set; } public DateTime Entered { get; set; } } Here are the updated handler classes that use Person: Handler public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { var action = context.Request.QueryString["action"]; if (action == "json") JsonRequest(context); else TextRequest(context); } public void TextRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public void JsonRequest(HttpContext context) { var json = JsonConvert.SerializeObject(new Person(), Formatting.None); context.Response.ContentType = "application/json"; context.Response.Write(json); } public bool IsReusable { get { return true; } } } This code adds a little logic to check for a action query string and route the request to an optional JSON result method. To generate JSON, I'm using the same JSON.NET serializer (JsonConvert.SerializeObject) used in Web API to create the JSON response. WCF REST [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World " + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } [OperationContract] [WebGet(ResponseFormat=WebMessageFormat.Json,BodyStyle=WebMessageBodyStyle.WrappedRequest)] public Person HelloWorldJson() { // Add your operation implementation here return new Person(); } } For WCF REST all I have to do is add a method with the Person result type. ASP.NET MVC public class MvcPerformanceController : Controller { // // GET: /MvcPerformance/ public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } public JsonResult HelloWorldJson() { return Json(new Person(), JsonRequestBehavior.AllowGet); } } For MVC all I have to do for a JSON response is return a JSON result. ASP.NET internally uses JavaScriptSerializer. ASP.NET WebAPI public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } [HttpGet] public Person HelloWorldJson() { return new Person(); } [HttpGet] public HttpResponseMessage HelloWorldJson2() { var response = new HttpResponseMessage(HttpStatusCode.OK); response.Content = new ObjectContent<Person>(new Person(), GlobalConfiguration.Configuration.Formatters.JsonFormatter); return response; } } Testing and Results To run these data requests I used the following ab.exe commands:REM JSON RESPONSES ab.exe -n100000 -c20 http://localhost/aspnetperf/Handler.ashx?action=json > HandlerJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJson > MvcJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorldJson > WebApiJson.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorldJson > WcfJson.txt The results from this test run are a bit interesting in that the WebAPI test improved performance significantly over returning plain string content. Here are the results: The performance for each technology drops a little bit except for WebAPI which is up quite a bit! From this test it appears that WebAPI is actually significantly better performing returning a JSON response, rather than a plain string response. Snag with Apache Benchmark and 'Length Failures' I ran into a little snag with Apache Benchmark, which was reporting failures for my Web API requests when serializing. As the graph shows performance improved significantly from with JSON results from 5580 to 6530 or so which is a 15% improvement (while all others slowed down by 3-8%). However, I was skeptical at first because the WebAPI test reports showed a bunch of errors on about 10% of the requests. Check out this report: Notice the Failed Request count. What the hey? Is WebAPI failing on roughly 10% of requests when sending JSON? Turns out: No it's not! But it took some sleuthing to figure out why it reports these failures. At first I thought that Web API was failing, and so to make sure I re-ran the test with Fiddler attached and runiisning the ab.exe test by using the -X switch: ab.exe -n100 -c10 -X localhost:8888 http://localhost/aspnetperf/api/HelloWorldJson which showed that indeed all requests where returning proper HTTP 200 results with full content. However ab.exe was reporting the errors. After some closer inspection it turned out that the dates varying in size altered the response length in dynamic output. For example: these two results: {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.841926-10:00"} {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.8519262-10:00"} are different in length for the number which results in 68 and 69 bytes respectively. The same URL produces different result lengths which is what ab.exe reports. I didn't notice at first bit the same is happening when running the ASHX handler with JSON.NET result since it uses the same serializer that varies the milliseconds. Moral: You can typically ignore Length failures in Apache Benchmark and when in doubt check the actual output with Fiddler. Note that the other failure values are accurate though. Another interesting Side Note: Perf drops over Time As I was running these tests repeatedly I was finding that performance steadily dropped from a startup peak to a 10-15% lower stable level. IOW, with Web API I'd start out with around 6500 req/sec and in subsequent runs it keeps dropping until it would stabalize somewhere around 5900 req/sec occasionally jumping lower. For these tests this is why I did the IIS RESET and warm up for individual tests. This is a little puzzling. Looking at Process Monitor while the test are running memory very quickly levels out as do handles and threads, on the first test run. Subsequent runs everything stays stable, but the performance starts going downwards. This applies to all the technologies - Handlers, Web Forms, MVC, Web API - curious to see if others test this and see similar results. Doing an IISRESET then resets everything and performance starts off at peak again… Summary As I stated at the outset, these were informal to satiate my curiosity not to prove that any technology is better or even faster than another. While there clearly are differences in performance the differences (other than WCF REST which was by far the slowest and the raw handler which was by far the highest) are relatively minor, so there is no need to feel that any one technology is a runaway standout in raw performance. Choosing a technology is about more than pure performance but also about the adequateness for the job and the easy of implementation. The strengths of each technology will make for any minor performance difference we see in these tests. However, to me it's important to get an occasional reality check and compare where new technologies are heading. Often times old stuff that's been optimized and designed for a time of less horse power can utterly blow the doors off newer tech and simple checks like this let you compare. Luckily we're seeing that much of the new stuff performs well even in V1.0 which is great. To me it was very interesting to see Web API perform relatively badly with plain string content, which originally led me to think that Web API might not be properly optimized just yet. For those that caught my Tweets late last week regarding WebAPI's slow responses was with String content which is in fact considerably slower. Luckily where it counts with serialized JSON and XML WebAPI actually performs better. But I do wonder what would make generic string content slower than serialized code? This stresses another point: Don't take a single test as the final gospel and don't extrapolate out from a single set of tests. Certainly Twitter can make you feel like a fool when you post something immediate that hasn't been fleshed out a little more <blush>. Egg on my face. As a result I ended up screwing around with this for a few hours today to compare different scenarios. Well worth the time… I hope you found this useful, if not for the results, maybe for the process of quickly testing a few requests for performance and charting out a comparison. Now onwards with more serious stuff… Resources Source Code on GitHub Apache HTTP Server Project (ab.exe is part of the binary distribution)© Rick Strahl, West Wind Technologies, 2005-2012Posted in ASP.NET Web Api Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article

ANTS CLR and Memory Profiler In Depth Review (Part 1 of 2 – CLR Profiler)

- by ToStringTheory

One of the things that people might not know about me, is my obsession to make my code as efficient as possible. Many people might not realize how much of a task or undertaking that this might be, but it is surely a task as monumental as climbing Mount Everest, except this time it is a challenge for the mind… In trying to make code efficient, there are many different factors that play a part – size of project or solution, tiers, language used, experience and training of the programmer, technologies used, maintainability of the code – the list can go on for quite some time. I spend quite a bit of time when developing trying to determine what is the best way to implement a feature to accomplish the efficiency that I look to achieve. One program that I have recently come to learn about – Red Gate ANTS Performance (CLR) and Memory profiler gives me tools to accomplish that job more efficiently as well. In this review, I am going to cover some of the features of the ANTS profiler set by compiling some hideous example code to test against. Notice As a member of the Geeks With Blogs Influencers program, one of the perks is the ability to review products, in exchange for a free license to the program. I have not let this affect my opinions of the product in any way, and Red Gate nor Geeks With Blogs has tried to influence my opinion regarding this product in any way. Introduction The ANTS Profiler pack provided by Red Gate was something that I had not heard of before receiving an email regarding an offer to review it for a license. Since I look to make my code efficient, it was a no brainer for me to try it out! One thing that I have to say took me by surprise is that upon downloading the program and installing it you fill out a form for your usual contact information. Sure enough within 2 hours, I received an email from a sales representative at Red Gate asking if she could help me to achieve the most out of my trial time so it wouldn’t go to waste. After replying to her and explaining that I was looking to review its feature set, she put me in contact with someone that setup a demo session to give me a quick rundown of its features via an online meeting. After having dealt with a massive ordeal with one of my utility companies and their complete lack of customer service, Red Gates friendly and helpful representatives were a breath of fresh air, and something I was thankful for. ANTS CLR Profiler The ANTS CLR profiler is the thing I want to focus on the most in this post, so I am going to dive right in now. Install was simple and took no time at all. It installed both the profiler for the CLR and Memory, but also visual studio extensions to facilitate the usage of the profilers (click any images for full size images): The Visual Studio menu options (under ANTS menu) Starting the CLR Performance Profiler from the start menu yields this window If you follow the instructions after launching the program from the start menu (Click File > New Profiling Session to start a new project), you are given a dialog with plenty of options for profiling: The New Session dialog. Lots of options. One thing I noticed is that the buttons in the lower right were half-covered by the panel of the application. If I had to guess, I would imagine that this is caused by my DPI settings being set to 125%. This is a problem I have seen in other applications as well that don’t scale well to different dpi scales. The profiler options give you the ability to profile: .NET Executable ASP.NET web application (hosted in IIS) ASP.NET web application (hosted in IIS express) ASP.NET web application (hosted in Cassini Web Development Server) SharePoint web application (hosted in IIS) Silverlight 4+ application Windows Service COM+ server XBAP (local XAML browser application) Attach to an already running .NET 4 process Choosing each option provides a varying set of other variables/options that one can set including options such as application arguments, operating path, record I/O performance performance counters to record (43 counters in all!), etc… All in all, they give you the ability to profile many different .Net project types, and make it simple to do so. In most cases of my using this application, I would be using the built in Visual Studio extensions, as they automatically start a new profiling project in ANTS with the options setup, and start your program, however RedGate has made it easy enough to profile outside of Visual Studio as well. On the flip side of this, as someone who lives most of their work life in Visual Studio, one thing I do wish is that instead of opening an entirely separate application/gui to perform profiling after launching, that instead they would provide a Visual Studio panel with the information, and integrate more of the profiling project information into Visual Studio. So, now that we have an idea of what options that the profiler gives us, its time to test its abilities and features. Horrendous Example Code – Prime Number Generator One of my interests besides development, is Physics and Math – what I went to college for. I have especially always been interested in prime numbers, as they are something of a mystery… So, I decided that I would go ahead and to test the abilities of the profiler, I would write a small program, website, and library to generate prime numbers in the quantity that you ask for. I am going to start off with some terrible code, and show how I would see the profiler being used as a development tool. First off, the IPrimes interface (all code is downloadable at the end of the post): interface IPrimes { IEnumerable<int> GetPrimes(int retrieve); } Simple enough, right? Anything that implements the interface will (hopefully) provide an IEnumerable of int, with the quantity specified in the parameter argument. Next, I am going to implement this interface in the most basic way: public class DumbPrimes : IPrimes { public IEnumerable<int> GetPrimes(int retrieve) { //store a list of primes already found var _foundPrimes = new List<int>() { 2, 3 }; //if i ask for 1 or two primes, return what asked for if (retrieve <= _foundPrimes.Count()) return _foundPrimes.Take(retrieve); //the next number to look at int _analyzing = 4; //since I already determined I don't have enough //execute at least once, and until quantity is sufficed do { //assume prime until otherwise determined bool isPrime = true; //start dividing at 2 //divide until number is reached, or determined not prime for (int i = 2; i < _analyzing && isPrime; i++) { //if (i) goes into _analyzing without a remainder, //_analyzing is NOT prime if (_analyzing % i == 0) isPrime = false; } //if it is prime, add to found list if (isPrime) _foundPrimes.Add(_analyzing); //increment number to analyze next _analyzing++; } while (_foundPrimes.Count() < retrieve); return _foundPrimes; } } This is the simplest way to get primes in my opinion. Checking each number by the straight definition of a prime – is it divisible by anything besides 1 and itself. I have included this code in a base class library for my solution, as I am going to use it to demonstrate a couple of features of ANTS. This class library is consumed by a simple non-MVVM WPF application, and a simple MVC4 website. I will not post the WPF code here inline, as it is simply an ObservableCollection<int>, a label, two textbox’s, and a button. Starting a new Profiling Session So, in Visual Studio, I have just completed my first stint developing the GUI and DumbPrimes IPrimes class, so now I want to check my codes efficiency by profiling it. All I have to do is build the solution (surprised initiating a profiling session doesn’t do this, but I suppose I can understand it), and then click the ANTS menu, followed by Profile Performance. I am then greeted by the profiler starting up and already monitoring my program live: You are provided with a realtime graph at the top, and a pane at the bottom giving you information on how to proceed. I am going to start by asking my program to show me the first 15000 primes: After the program finally began responding again (I did all the work on the main UI thread – how bad!), I stopped the profiler, which did kill the process of my program too. One important thing to note, is that the profiler by default wants to give you a lot of detail about the operation – line hit counts, time per line, percent time per line, etc… The important thing to remember is that this itself takes a lot of time. When running my program without the profiler attached, it can generate the 15000 primes in 5.18 seconds, compared to 74.5 seconds – almost a 1500 percent increase. While this may seem like a lot, remember that there is a trade off. It may be WAY more inefficient, however, I am able to drill down and make improvements to specific problem areas, and then decrease execution time all around. Analyzing the Profiling Session After clicking ‘Stop Profiling’, the process running my application stopped, and the entire execution time was automatically selected by ANTS, and the results shown below: Now there are a number of interesting things going on here, I am going to cover each in a section of its own: Real Time Performance Counter Bar (top of screen) At the top of the screen, is the real time performance bar. As your application is running, this will constantly update with the currently selected performance counters status. A couple of cool things to note are the fact that you can drag a selection around specific time periods to drill down the detail views in the lower 2 panels to information pertaining to only that period. After selecting a time period, you can bookmark a section and name it, so that it is easy to find later, or after reloaded at a later time. You can also zoom in, out, or fit the graph to the space provided – useful for drilling down. It may be hard to see, but at the top of the processor time graph below the time ticks, but above the red usage graph, there is a green bar. This bar shows at what times a method that is selected in the ‘Call tree’ panel is called. Very cool to be able to click on a method and see at what times it made an impact. As I said before, ANTS provides 43 different performance counters you can hook into. Click the arrow next to the Performance tab at the top will allow you to change between different counters if you have them selected: Method Call Tree, ADO.Net Database Calls, File IO – Detail Panel Red Gate really hit the mark here I think. When you select a section of the run with the graph, the call tree populates to fill a hierarchical tree of method calls, with information regarding each of the methods. By default, methods are hidden where the source is not provided (framework type code), however, Red Gate has integrated Reflector into ANTS, so even if you don’t have source for something, you can select a method and get the source if you want. Methods are also hidden where the impact is seen as insignificant – methods that are only executed for 1% of the time of the overall calling methods time; in other words, working on making them better is not where your efforts should be focused. – Smart! Source Panel – Detail Panel The source panel is where you can see line level information on your code, showing the code for the currently selected method from the Method Call Tree. If the code is not available, Reflector takes care of it and shows the code anyways! As you can notice, there does seem to be a problem with how ANTS determines what line is the actual line that a call is completed on. I have suspicions that this may be due to some of the inline code optimizations that the CLR applies upon compilation of the assembly. In a method with comments, the problem is much more severe: As you can see here, apparently the most offending code in my base library was a comment – *gasp*! Removing the comments does help quite a bit, however I hope that Red Gate works on their counter algorithm soon to improve the logic on positioning for statistics: I did a small test just to demonstrate the lines are correct without comments. For me, it isn’t a deal breaker, as I can usually determine the correct placements by looking at the application code in the region and determining what makes sense, but it is something that would probably build up some irritation with time. Feature – Suggest Method for Optimization A neat feature to really help those in need of a pointer, is the menu option under tools to automatically suggest methods to optimize/improve: Nice feature – clicking it filters the call tree and stars methods that it thinks are good candidates for optimization. I do wish that they would have made it more visible for those of use who aren’t great on sight: Process Integration I do think that this could have a place in my process. After experimenting with the profiler, I do think it would be a great benefit to do some development, testing, and then after all the bugs are worked out, use the profiler to check on things to make sure nothing seems like it is hogging more than its fair share. For example, with this program, I would have developed it, ran it, tested it – it works, but slowly. After looking at the profiler, and seeing the massive amount of time spent in 1 method, I might go ahead and try to re-implement IPrimes (I actually would probably rewrite the offending code, but so that I can distribute both sets of code easily, I’m just going to make another implementation of IPrimes). Using two pieces of knowledge about prime numbers can make this method MUCH more efficient – prime numbers fall into two buckets 6k+/-1 , and a number is prime if it is not divisible by any other primes before it: public class SmartPrimes : IPrimes { public IEnumerable<int> GetPrimes(int retrieve) { //store a list of primes already found var _foundPrimes = new List<int>() { 2, 3 }; //if i ask for 1 or two primes, return what asked for if (retrieve <= _foundPrimes.Count()) return _foundPrimes.Take(retrieve); //the next number to look at int _k = 1; //since I already determined I don't have enough //execute at least once, and until quantity is sufficed do { //assume prime until otherwise determined bool isPrime = true; int potentialPrime; //analyze 6k-1 //assign the value to potential potentialPrime = 6 * _k - 1; //if there are any primes that divise this, it is NOT a prime number //using PLINQ for quick boost isPrime = !_foundPrimes.AsParallel() .Any(prime => potentialPrime % prime == 0); //if it is prime, add to found list if (isPrime) _foundPrimes.Add(potentialPrime); if (_foundPrimes.Count() == retrieve) break; //analyze 6k+1 //assign the value to potential potentialPrime = 6 * _k + 1; //if there are any primes that divise this, it is NOT a prime number //using PLINQ for quick boost isPrime = !_foundPrimes.AsParallel() .Any(prime => potentialPrime % prime == 0); //if it is prime, add to found list if (isPrime) _foundPrimes.Add(potentialPrime); //increment k to analyze next _k++; } while (_foundPrimes.Count() < retrieve); return _foundPrimes; } } Now there are definitely more things I can do to help make this more efficient, but for the scope of this example, I think this is fine (but still hideous)! Profiling this now yields a happy surprise 27 seconds to generate the 15000 primes with the profiler attached, and only 1.43 seconds without. One important thing I wanted to call out though was the performance graph now: Notice anything odd? The %Processor time is above 100%. This is because there is now more than 1 core in the operation. A better label for the chart in my mind would have been %Core time, but to each their own. Another odd thing I noticed was that the profiler seemed to be spot on this time in my DumbPrimes class with line details in source, even with comments.. Odd. Profiling Web Applications The last thing that I wanted to cover, that means a lot to me as a web developer, is the great amount of work that Red Gate put into the profiler when profiling web applications. In my solution, I have a simple MVC4 application setup with 1 page, a single input form, that will output prime values as my WPF app did. Launching the profiler from Visual Studio as before, nothing is really different in the profiler window, however I did receive a UAC prompt for a Red Gate helper app to integrate with the web server without notification. After requesting 500, 1000, 2000, and 5000 primes, and looking at the profiler session, things are slightly different from before: As you can see, there are 4 spikes of activity in the processor time graph, but there is also something new in the call tree: That’s right – ANTS will actually group method calls by get/post operations, so it is easier to find out what action/page is giving the largest problems… Pretty cool in my mind! Overview Overall, I think that Red Gate ANTS CLR Profiler has a lot to offer, however I think it also has a long ways to go. 3 Biggest Pros: Ability to easily drill down from time graph, to method calls, to source code Wide variety of counters to choose from when profiling your application Excellent integration/grouping of methods being called from web applications by request – BRILLIANT! 3 Biggest Cons: Issue regarding line details in source view Nit pick – Processor time vs. Core time Nit pick – Lack of full integration with Visual Studio Ratings Ease of Use (7/10) – I marked down here because of the problems with the line level details and the extra work that that entails, and the lack of better integration with Visual Studio. Effectiveness (10/10) – I believe that the profiler does EXACTLY what it purports to do. Especially with its large variety of performance counters, a definite plus! Features (9/10) – Besides the real time performance monitoring, and the drill downs that I’ve shown here, ANTS also has great integration with ADO.Net, with the ability to show database queries run by your application in the profiler. This, with the line level details, the web request grouping, reflector integration, and various options to customize your profiling session I think create a great set of features! Customer Service (10/10) – My entire experience with Red Gate personnel has been nothing but good. their people are friendly, helpful, and happy! UI / UX (8/10) – The interface is very easy to get around, and all of the options are easy to find. With a little bit of poking around, you’ll be optimizing Hello World in no time flat! Overall (8/10) – Overall, I am happy with the Performance Profiler and its features, as well as with the service I received when working with the Red Gate personnel. I WOULD recommend you trying the application and seeing if it would fit into your process, BUT, remember there are still some kinks in it to hopefully be worked out. My next post will definitely be shorter (hopefully), but thank you for reading up to here, or skipping ahead! Please, if you do try the product, drop me a message and let me know what you think! I would love to hear any opinions you may have on the product. Code Feel free to download the code I used above – download via DropBox

Read the article

Informed TDD – Kata “To Roman Numerals”

- by Ralf Westphal

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/05/28/informed-tdd-ndash-kata-ldquoto-roman-numeralsrdquo.aspxIn a comment on my article on what I call Informed TDD (ITDD) reader gustav asked how this approach would apply to the kata “To Roman Numerals”. And whether ITDD wasn´t a violation of TDD´s principle of leaving out “advanced topics like mocks”. I like to respond with this article to his questions. There´s more to say than fits into a commentary. Mocks and TDD I don´t see in how far TDD is avoiding or opposed to mocks. TDD and mocks are orthogonal. TDD is about pocess, mocks are about structure and costs. Maybe by moving forward in tiny red+green+refactor steps less need arises for mocks. But then… if the functionality you need to implement requires “expensive” resource access you can´t avoid using mocks. Because you don´t want to constantly run all your tests against the real resource. True, in ITDD mocks seem to be in almost inflationary use. That´s not what you usually see in TDD demonstrations. However, there´s a reason for that as I tried to explain. I don´t use mocks as proxies for “expensive” resource. Rather they are stand-ins for functionality not yet implemented. They allow me to get a test green on a high level of abstraction. That way I can move forward in a top-down fashion. But if you think of mocks as “advanced” or if you don´t want to use a tool like JustMock, then you don´t need to use mocks. You just need to stand the sight of red tests for a little longer ;-) Let me show you what I mean by that by doing a kata. ITDD for “To Roman Numerals” gustav asked for the kata “To Roman Numerals”. I won´t explain the requirements again. You can find descriptions and TDD demonstrations all over the internet, like this one from Corey Haines. Now here is, how I would do this kata differently. 1. Analyse A demonstration of TDD should never skip the analysis phase. It should be made explicit. The requirements should be formalized and acceptance test cases should be compiled. “Formalization” in this case to me means describing the API of the required functionality. “[D]esign a program to work with Roman numerals” like written in this “requirement document” is not enough to start software development. Coding should only begin, if the interface between the “system under development” and its context is clear. If this interface is not readily recognizable from the requirements, it has to be developed first. Exploration of interface alternatives might be in order. It might be necessary to show several interface mock-ups to the customer – even if that´s you fellow developer. Designing the interface is a task of it´s own. It should not be mixed with implementing the required functionality behind the interface. Unfortunately, though, this happens quite often in TDD demonstrations. TDD is used to explore the API and implement it at the same time. To me that´s a violation of the Single Responsibility Principle (SRP) which not only should hold for software functional units but also for tasks or activities. In the case of this kata the API fortunately is obvious. Just one function is needed: string ToRoman(int arabic). And it lives in a class ArabicRomanConversions. Now what about acceptance test cases? There are hardly any stated in the kata descriptions. Roman numerals are explained, but no specific test cases from the point of view of a customer. So I just “invent” some acceptance test cases by picking roman numerals from a wikipedia article. They are supposed to be just “typical examples” without special meaning. Given the acceptance test cases I then try to develop an understanding of the problem domain. I´ll spare you that. The domain is trivial and is explain in almost all kata descriptions. How roman numerals are built is not difficult to understand. What´s more difficult, though, might be to find an efficient solution to convert into them automatically. 2. Solve The usual TDD demonstration skips a solution finding phase. Like the interface exploration it´s mixed in with the implementation. But I don´t think this is how it should be done. I even think this is not how it really works for the people demonstrating TDD. They´re simplifying their true software development process because they want to show a streamlined TDD process. I doubt this is helping anybody. Before you code you better have a plan what to code. This does not mean you have to do “Big Design Up-Front”. It just means: Have a clear picture of the logical solution in your head before you start to build a physical solution (code). Evidently such a solution can only be as good as your understanding of the problem. If that´s limited your solution will be limited, too. Fortunately, in the case of this kata your understanding does not need to be limited. Thus the logical solution does not need to be limited or preliminary or tentative. That does not mean you need to know every line of code in advance. It just means you know the rough structure of your implementation beforehand. Because it should mirror the process described by the logical or conceptual solution. Here´s my solution approach: The arabic “encoding” of numbers represents them as an ordered set of powers of 10. Each digit is a factor to multiply a power of ten with. The “encoding” 123 is the short form for a set like this: {1*10^2, 2*10^1, 3*10^0}. And the number is the sum of the set members. The roman “encoding” is different. There is no base (like 10 for arabic numbers), there are just digits of different value, and they have to be written in descending order. The “encoding” XVI is short for [10, 5, 1]. And the number is still the sum of the members of this list. The roman “encoding” thus is simpler than the arabic. Each “digit” can be taken at face value. No multiplication with a base required. But what about IV which looks like a contradiction to the above rule? It is not – if you accept roman “digits” not to be limited to be single characters only. Usually I, V, X, L, C, D, M are viewed as “digits”, and IV, IX etc. are viewed as nuisances preventing a simple solution. All looks different, though, once IV, IX etc. are taken as “digits”. Then MCMLIV is just a sum: M+CM+L+IV which is 1000+900+50+4. Whereas before it would have been understood as M-C+M+L-I+V – which is more difficult because here some “digits” get subtracted. Here´s the list of roman “digits” with their values: {1, I}, {4, IV}, {5, V}, {9, IX}, {10, X}, {40, XL}, {50, L}, {90, XC}, {100, C}, {400, CD}, {500, D}, {900, CM}, {1000, M} Since I take IV, IX etc. as “digits” translating an arabic number becomes trivial. I just need to find the values of the roman “digits” making up the number, e.g. 1954 is made up of 1000, 900, 50, and 4. I call those “digits” factors. If I move from the highest factor (M=1000) to the lowest (I=1) then translation is a two phase process: Find all the factors Translate the factors found Compile the roman representation Translation is just a look-up. Finding, though, needs some calculation: Find the highest remaining factor fitting in the value Remember and subtract it from the value Repeat with remaining value and remaining factors Please note: This is just an algorithm. It´s not code, even though it might be close. Being so close to code in my solution approach is due to the triviality of the problem. In more realistic examples the conceptual solution would be on a higher level of abstraction. With this solution in hand I finally can do what TDD advocates: find and prioritize test cases. As I can see from the small process description above, there are two aspects to test: Test the translation Test the compilation Test finding the factors Testing the translation primarily means to check if the map of factors and digits is comprehensive. That´s simple, even though it might be tedious. Testing the compilation is trivial. Testing factor finding, though, is a tad more complicated. I can think of several steps: First check, if an arabic number equal to a factor is processed correctly (e.g. 1000=M). Then check if an arabic number consisting of two consecutive factors (e.g. 1900=[M,CM]) is processed correctly. Then check, if a number consisting of the same factor twice is processed correctly (e.g. 2000=[M,M]). Finally check, if an arabic number consisting of non-consecutive factors (e.g. 1400=[M,CD]) is processed correctly. I feel I can start an implementation now. If something becomes more complicated than expected I can slow down and repeat this process. 3. Implement First I write a test for the acceptance test cases. It´s red because there´s no implementation even of the API. That´s in conformance with “TDD lore”, I´d say: Next I implement the API: The acceptance test now is formally correct, but still red of course. This will not change even now that I zoom in. Because my goal is not to most quickly satisfy these tests, but to implement my solution in a stepwise manner. That I do by “faking” it: I just “assume” three functions to represent the transformation process of my solution: My hypothesis is that those three functions in conjunction produce correct results on the API-level. I just have to implement them correctly. That´s what I´m trying now – one by one. I start with a simple “detail function”: Translate(). And I start with all the test cases in the obvious equivalence partition: As you can see I dare to test a private method. Yes. That´s a white box test. But as you´ll see it won´t make my tests brittle. It serves a purpose right here and now: it lets me focus on getting one aspect of my solution right. Here´s the implementation to satisfy the test: It´s as simple as possible. Right how TDD wants me to do it: KISS. Now for the second equivalence partition: translating multiple factors. (It´a pattern: if you need to do something repeatedly separate the tests for doing it once and doing it multiple times.) In this partition I just need a single test case, I guess. Stepping up from a single translation to multiple translations is no rocket science: Usually I would have implemented the final code right away. Splitting it in two steps is just for “educational purposes” here. How small your implementation steps are is a matter of your programming competency. Some “see” the final code right away before their mental eye – others need to work their way towards it. Having two tests I find more important. Now for the next low hanging fruit: compilation. It´s even simpler than translation. A single test is enough, I guess. And normally I would not even have bothered to write that one, because the implementation is so simple. I don´t need to test .NET framework functionality. But again: if it serves the educational purpose… Finally the most complicated part of the solution: finding the factors. There are several equivalence partitions. But still I decide to write just a single test, since the structure of the test data is the same for all partitions: Again, I´m faking the implementation first: I focus on just the first test case. No looping yet. Faking lets me stay on a high level of abstraction. I can write down the implementation of the solution without bothering myself with details of how to actually accomplish the feat. That´s left for a drill down with a test of the fake function: There are two main equivalence partitions, I guess: either the first factor is appropriate or some next. The implementation seems easy. Both test cases are green. (Of course this only works on the premise that there´s always a matching factor. Which is the case since the smallest factor is 1.) And the first of the equivalence partitions on the higher level also is satisfied: Great, I can move on. Now for more than a single factor: Interestingly not just one test becomes green now, but all of them. Great! You might say, then I must have done not the simplest thing possible. And I would reply: I don´t care. I did the most obvious thing. But I also find this loop very simple. Even simpler than a recursion of which I had thought briefly during the problem solving phase. And by the way: Also the acceptance tests went green: Mission accomplished. At least functionality wise. Now I´ve to tidy up things a bit. TDD calls for refactoring. Not uch refactoring is needed, because I wrote the code in top-down fashion. I faked it until I made it. I endured red tests on higher levels while lower levels weren´t perfected yet. But this way I saved myself from refactoring tediousness. At the end, though, some refactoring is required. But maybe in a different way than you would expect. That´s why I rather call it “cleanup”. First I remove duplication. There are two places where factors are defined: in Translate() and in Find_factors(). So I factor the map out into a class constant. Which leads to a small conversion in Find_factors(): And now for the big cleanup: I remove all tests of private methods. They are scaffolding tests to me. They only have temporary value. They are brittle. Only acceptance tests need to remain. However, I carry over the single “digit” tests from Translate() to the acceptance test. I find them valuable to keep, since the other acceptance tests only exercise a subset of all roman “digits”. This then is my final test class: And this is the final production code: Test coverage as reported by NCrunch is 100%: Reflexion Is this the smallest possible code base for this kata? Sure not. You´ll find more concise solutions on the internet. But LOC are of relatively little concern – as long as I can understand the code quickly. So called “elegant” code, however, often is not easy to understand. The same goes for KISS code – especially if left unrefactored, as it is often the case. That´s why I progressed from requirements to final code the way I did. I first understood and solved the problem on a conceptual level. Then I implemented it top down according to my design. I also could have implemented it bottom-up, since I knew some bottom of the solution. That´s the leaves of the functional decomposition tree. Where things became fuzzy, since the design did not cover any more details as with Find_factors(), I repeated the process in the small, so to speak: fake some top level, endure red high level tests, while first solving a simpler problem. Using scaffolding tests (to be thrown away at the end) brought two advantages: Encapsulation of the implementation details was not compromised. Naturally private methods could stay private. I did not need to make them internal or public just to be able to test them. I was able to write focused tests for small aspects of the solution. No need to test everything through the solution root, the API. The bottom line thus for me is: Informed TDD produces cleaner code in a systematic way. It conforms to core principles of programming: Single Responsibility Principle and/or Separation of Concerns. Distinct roles in development – being a researcher, being an engineer, being a craftsman – are represented as different phases. First find what, what there is. Then devise a solution. Then code the solution, manifest the solution in code. Writing tests first is a good practice. But it should not be taken dogmatic. And above all it should not be overloaded with purposes. And finally: moving from top to bottom through a design produces refactored code right away. Clean code thus almost is inevitable – and not left to a refactoring step at the end which is skipped often for different reasons. PS: Yes, I have done this kata several times. But that has only an impact on the time needed for phases 1 and 2. I won´t skip them because of that. And there are no shortcuts during implementation because of that.

Read the article

Linq-to-XML query to select specific sub-element based on additional criteria

- by BrianLy

My current LINQ query and example XML are below. What I'd like to do is select the primary email address from the email-addresses element into the User.Email property. The type element under the email-address element is set to primary when this is true. There may be more than one element under the email-addresses but only one will be marked primary. What is the simplest approach to take here? Current Linq Query (User.Email is currently empty): var users = from response in xdoc.Descendants("response") where response.Element("id") != null select new User { Id = (string)response.Element("id"), Name = (string)response.Element("full-name"), Email = (string)response.Element("email-addresses"), JobTitle = (string)response.Element("job-title"), NetworkId = (string)response.Element("network-id"), Type = (string)response.Element("type") }; Example XML: <?xml version="1.0" encoding="UTF-8"?> <response> <response> <contact> <phone-numbers/> <im> <provider></provider> <username></username> </im> <email-addresses> <email-address> <type>primary</type> <address>[email protected]</address> </email-address> </email-addresses> </contact> <job-title>Account Manager</job-title> <type>user</type> <expertise nil="true"></expertise> <summary nil="true"></summary> <kids-names nil="true"></kids-names> <location nil="true"></location> <guid nil="true"></guid> <timezone>Eastern Time (US & Canada)</timezone> <network-name>Domain</network-name> <full-name>Alice</full-name> <network-id>79629</network-id> <stats> <followers>2</followers> <updates>4</updates> <following>3</following> </stats> <mugshot-url> https://assets3.yammer.com/images/no_photo_small.gif</mugshot-url> <previous-companies/> <birth-date></birth-date> <name>alice</name> <web-url>https://www.yammer.com/domain.com/users/alice</web-url> <interests nil="true"></interests> <state>active</state> <external-urls/> <url>https://www.yammer.com/api/v1/users/1089943</url> <network-domains> <network-domain>domain.com</network-domain> </network-domains> <id>1089943</id> <schools/> <hire-date nil="true"></hire-date> <significant-other nil="true"></significant-other> </response> <response> <contact> <phone-numbers/> <im> <provider></provider> <username></username> </im> <email-addresses> <email-address> <type>primary</type> <address>[email protected]</address> </email-address> </email-addresses> </contact> <job-title>Office Manager</job-title> <type>user</type> <expertise nil="true"></expertise> <summary nil="true"></summary> <kids-names nil="true"></kids-names> <location nil="true"></location> <guid nil="true"></guid> <timezone>Eastern Time (US & Canada)</timezone> <network-name>Domain</network-name> <full-name>Bill</full-name> <network-id>79629</network-id> <stats> <followers>3</followers> <updates>1</updates> <following>1</following> </stats> <mugshot-url> https://assets3.yammer.com/images/no_photo_small.gif</mugshot-url> <previous-companies/> <birth-date></birth-date> <name>bill</name> <web-url>https://www.yammer.com/domain.com/users/bill</web-url> <interests nil="true"></interests> <state>active</state> <external-urls/> <url>https://www.yammer.com/api/v1/users/1089920</url> <network-domains> <network-domain>domain.com</network-domain> </network-domains> <id>1089920</id> <schools/> <hire-date nil="true"></hire-date> <significant-other nil="true"></significant-other> </response> </response>

Read the article

makefile pathing issues on OSX

- by Justin808

OK, I thought I would try one last update and see if it gets me anywhere. I've created a very small test case. This should not build anything, it just tests the path settings. Also I've setup the path so there are no spaces. The is the smallest, simplest test case I could come up with. This makefile will set the path, echo the path, run avr-gcc -v with the full path specified and then try to run it without the full path specified. It should find avr-gcc in the path on the second try, but does not. makefile TOOLCHAIN := /Users/justinzaun/Desktop/AVRBuilder.app/Contents/Resources/avrchain PATH := ${TOOLCHAIN}/bin:${PATH} export PATH all: @echo ${PATH} @echo -------- "${TOOLCHAIN}/bin/avr-gcc" -v @echo -------- avr-gcc -v output JUSTINs-MacBook-Air:Untitled justinzaun$ make /Users/justinzaun/Desktop/AVRBuilder.app/Contents/Resources/avrchain/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin -------- "/Users/justinzaun/Desktop/AVRBuilder.app/Contents/Resources/avrchain/bin/avr-gcc" -v Using built-in specs. COLLECT_GCC=/Users/justinzaun/Desktop/AVRBuilder.app/Contents/Resources/avrchain/bin/avr-gcc COLLECT_LTO_WRAPPER=/Users/justinzaun/Desktop/AVRBuilder.app/Contents/Resources/avrchain/bin/../libexec/gcc/avr/4.6.3/lto-wrapper Target: avr Configured with: /Users/justinzaun/Development/AVRBuilder/Packages/gccobj/../gcc/configure --prefix=/Users/justinzaun/Development/AVRBuilder/Packages/gccobj/../build/ --exec-prefix=/Users/justinzaun/Development/AVRBuilder/Packages/gccobj/../build/ --datadir=/Users/justinzaun/Development/AVRBuilder/Packages/gccobj/../build/ --target=avr --enable-languages=c,objc,c++ --disable-libssp --disable-lto --disable-nls --disable-libgomp --disable-gdbtk --disable-threads --enable-poison-system-directories Thread model: single gcc version 4.6.3 (GCC) -------- avr-gcc -v make: avr-gcc: No such file or directory make: *** [all] Error 1 JUSTINs-MacBook-Air:Untitled justinzaun$ Original Question I'm trying to set the path from within the makefile. I can't seem to do this on OSX. Setting the path with PATH := /new/bin/:$(PATH) does not work. See my makefile below. makefile PROJECTNAME = Untitled # Name of target controller # (e.g. 'at90s8515', see the available avr-gcc mmcu # options for possible values) MCU = atmega640 # id to use with programmer # default: PROGRAMMER_MCU=$(MCU) # In case the programer used, e.g avrdude, doesn't # accept the same MCU name as avr-gcc (for example # for ATmega8s, avr-gcc expects 'atmega8' and # avrdude requires 'm8') PROGRAMMER_MCU = $(MCU) # Source files # List C/C++/Assembly source files: # (list all files to compile, e.g. 'a.c b.cpp as.S'): # Use .cc, .cpp or .C suffix for C++ files, use .S # (NOT .s !!!) for assembly source code files. PRJSRC = main.c \ utils.c # additional includes (e.g. -I/path/to/mydir) INC = # libraries to link in (e.g. -lmylib) LIBS = # Optimization level, # use s (size opt), 1, 2, 3 or 0 (off) OPTLEVEL = s ### You should not have to touch anything below this line ### PATH := /Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR\ Builder.app/Contents/Resources/avrchain/bin:/usr/bin:/bin:$(PATH) CPATH := /Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR\ Builder.app/Contents/Resources/avrchain/include # HEXFORMAT -- format for .hex file output HEXFORMAT = ihex # compiler CFLAGS = -I. $(INC) -g -mmcu=$(MCU) -O$(OPTLEVEL) \ -fpack-struct -fshort-enums \ -funsigned-bitfields -funsigned-char \ -Wall -Wstrict-prototypes \ -Wa,-ahlms=$(firstword \ $(filter %.lst, $(<:.c=.lst))) # c++ specific flags CPPFLAGS = -fno-exceptions \ -Wa,-ahlms=$(firstword \ $(filter %.lst, $(<:.cpp=.lst)) \ $(filter %.lst, $(<:.cc=.lst)) \ $(filter %.lst, $(<:.C=.lst))) # assembler ASMFLAGS = -I. $(INC) -mmcu=$(MCU) \ -x assembler-with-cpp \ -Wa,-gstabs,-ahlms=$(firstword \ $(<:.S=.lst) $(<.s=.lst)) # linker LDFLAGS = -Wl,-Map,$(TRG).map -mmcu=$(MCU) \ -lm $(LIBS) ##### executables #### CC=avr-gcc OBJCOPY=avr-objcopy OBJDUMP=avr-objdump SIZE=avr-size AVRDUDE=avrdude REMOVE=rm -f ##### automatic target names #### TRG=$(PROJECTNAME).out DUMPTRG=$(PROJECTNAME).s HEXROMTRG=$(PROJECTNAME).hex HEXTRG=$(HEXROMTRG) $(PROJECTNAME).ee.hex # Start by splitting source files by type # C++ CPPFILES=$(filter %.cpp, $(PRJSRC)) CCFILES=$(filter %.cc, $(PRJSRC)) BIGCFILES=$(filter %.C, $(PRJSRC)) # C CFILES=$(filter %.c, $(PRJSRC)) # Assembly ASMFILES=$(filter %.S, $(PRJSRC)) # List all object files we need to create OBJDEPS=$(CFILES:.c=.o) \ $(CPPFILES:.cpp=.o) \ $(BIGCFILES:.C=.o) \ $(CCFILES:.cc=.o) \ $(ASMFILES:.S=.o) # Define all lst files. LST=$(filter %.lst, $(OBJDEPS:.o=.lst)) # All the possible generated assembly # files (.s files) GENASMFILES=$(filter %.s, $(OBJDEPS:.o=.s)) .SUFFIXES : .c .cc .cpp .C .o .out .s .S \ .hex .ee.hex .h .hh .hpp # Make targets: # all, disasm, stats, hex, writeflash/install, clean all: $(TRG) $(TRG): $(OBJDEPS) $(CC) $(LDFLAGS) -o $(TRG) $(OBJDEPS) #### Generating assembly #### # asm from C %.s: %.c $(CC) -S $(CFLAGS) $< -o $@ # asm from (hand coded) asm %.s: %.S $(CC) -S $(ASMFLAGS) $< > $@ # asm from C++ .cpp.s .cc.s .C.s : $(CC) -S $(CFLAGS) $(CPPFLAGS) $< -o $@ #### Generating object files #### # object from C .c.o: $(CC) $(CFLAGS) -c $< -o $@ # object from C++ (.cc, .cpp, .C files) .cc.o .cpp.o .C.o : $(CC) $(CFLAGS) $(CPPFLAGS) -c $< -o $@ # object from asm .S.o : $(CC) $(ASMFLAGS) -c $< -o $@ #### Generating hex files #### # hex files from elf .out.hex: $(OBJCOPY) -j .text \ -j .data \ -O $(HEXFORMAT) $< $@ .out.ee.hex: $(OBJCOPY) -j .eeprom \ --change-section-lma .eeprom=0 \ -O $(HEXFORMAT) $< $@ #### Information #### info: @echo PATH: @echo "$(PATH)" $(CC) -v which $(CC) #### Cleanup #### clean: $(REMOVE) $(TRG) $(TRG).map $(DUMPTRG) $(REMOVE) $(OBJDEPS) $(REMOVE) $(LST) $(REMOVE) $(GENASMFILES) $(REMOVE) $(HEXTRG) error JUSTINs-MacBook-Air:Untitled justinzaun$ make avr-gcc -I. -g -mmcu=atmega640 -Os -fpack-struct -fshort-enums -funsigned-bitfields -funsigned-char -Wall -Wstrict-prototypes -Wa,-ahlms=main.lst -c main.c -o main.o make: avr-gcc: No such file or directory make: *** [main.o] Error 1 JUSTINs-MacBook-Air:Untitled justinzaun$ If I change my CC= to include the full path: CC=/Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR\ Builder.app/Contents/Resources/avrchain/bin/avr-gcc then it finds it, but this doesn't seem the correct way to do things. For instance its trying to use the system as not the one in the correct path. update - Just to be sure, I'm adding the output of my ls command too so everyone knows the file exist. Also I've added a make info target to the makefile and showing that output as well. JUSTINs-MacBook-Air:Untitled justinzaun$ ls /Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR\ Builder.app/Contents/Resources/avrchain/bin ar avr-elfedit avr-man avr-strip objcopy as avr-g++ avr-nm avrdude objdump avr-addr2line avr-gcc avr-objcopy c++ ranlib avr-ar avr-gcc-4.6.3 avr-objdump g++ strip avr-as avr-gcov avr-ranlib gcc avr-c++ avr-gprof avr-readelf ld avr-c++filt avr-ld avr-size ld.bfd avr-cpp avr-ld.bfd avr-strings nm JUSTINs-MacBook-Air:Untitled justinzaun$ Output of make info with the \ in my path JUSTINs-MacBook-Air:Untitled justinzaun$ make info PATH: /Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR\ Builder.app/Contents/Resources/avrchain/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin avr-gcc -v make: avr-gcc: No such file or directory make: *** [info] Error 1 JUSTINs-MacBook-Air:Untitled justinzaun$ Output of make info with the \ not in my path JUSTINs-MacBook-Air:Untitled justinzaun$ make info PATH: /Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR Builder.app/Contents/Resources/avrchain/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin avr-gcc -v make: avr-gcc: No such file or directory make: *** [info] Error 1 JUSTINs-MacBook-Air:Untitled justinzaun$ update - When I have my CC set to include the full path as described above, this is the result of make info. JUSTINs-MacBook-Air:Untitled justinzaun$ make info PATH: /Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR Builder.app/Contents/Resources/avrchain/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin /Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR\ Builder.app/Contents/Resources/avrchain/bin/avr-gcc -v Using built-in specs. COLLECT_GCC=/Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR Builder.app/Contents/Resources/avrchain/bin/avr-gcc COLLECT_LTO_WRAPPER=/Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR Builder.app/Contents/Resources/avrchain/bin/../libexec/gcc/avr/4.6.3/lto-wrapper Target: avr Configured with: /Users/justinzaun/Development/AVRBuilder/Packages/gccobj/../gcc/configure --prefix=/Users/justinzaun/Development/AVRBuilder/Packages/gccobj/../build/ --exec-prefix=/Users/justinzaun/Development/AVRBuilder/Packages/gccobj/../build/ --datadir=/Users/justinzaun/Development/AVRBuilder/Packages/gccobj/../build/ --target=avr --enable-languages=c,objc,c++ --disable-libssp --disable-lto --disable-nls --disable-libgomp --disable-gdbtk --disable-threads --enable-poison-system-directories Thread model: single gcc version 4.6.3 (GCC) which /Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR\ Builder.app/Contents/Resources/avrchain/bin/avr-gcc /Users/justinzaun/Library/Developer/Xcode/DerivedData/AVR_Builder-gxiykwiwjywvoagykxvmotvncbyd/Build/Products/Debug/AVR Builder.app/Contents/Resources/avrchain/bin/avr-gcc JUSTINs-MacBook-Air:Untitled justinzaun$

Read the article

Help with Java Program for Prime numbers

- by Ben

Hello everyone, I was wondering if you can help me with this program. I have been struggling with it for hours and have just trashed my code because the TA doesn't like how I executed it. I am completely hopeless and if anyone can help me out step by step, I would greatly appreciate it. In this project you will write a Java program that reads a positive integer n from standard input, then prints out the first n prime numbers. We say that an integer m is divisible by a non-zero integer d if there exists an integer k such that m = k d , i.e. if d divides evenly into m. Equivalently, m is divisible by d if the remainder of m upon (integer) division by d is zero. We would also express this by saying that d is a divisor of m. A positive integer p is called prime if its only positive divisors are 1 and p. The one exception to this rule is the number 1 itself, which is considered to be non-prime. A positive integer that is not prime is called composite. Euclid showed that there are infinitely many prime numbers. The prime and composite sequences begin as follows: Primes: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, … Composites: 1, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28, … There are many ways to test a number for primality, but perhaps the simplest is to simply do trial divisions. Begin by dividing m by 2, and if it divides evenly, then m is not prime. Otherwise, divide by 3, then 4, then 5, etc. If at any point m is found to be divisible by a number d in the range 2 d m-1, then halt, and conclude that m is composite. Otherwise, conclude that m is prime. A moment’s thought shows that one need not do any trial divisions by numbers d which are themselves composite. For instance, if a trial division by 2 fails (i.e. has non-zero remainder, so m is odd), then a trial division by 4, 6, or 8, or any even number, must also fail. Thus to test a number m for primality, one need only do trial divisions by prime numbers less than m. Furthermore, it is not necessary to go all the way up to m-1. One need only do trial divisions of m by primes p in the range 2 p m . To see this, suppose m 1 is composite. Then there exist positive integers a and b such that 1 < a < m, 1 < b < m, and m = ab . But if both a m and b m , then ab m, contradicting that m = ab . Hence one of a or b must be less than or equal to m . To implement this process in java you will write a function called isPrime() with the following signature: static boolean isPrime(int m, int[] P) This function will return true or false according to whether m is prime or composite. The array argument P will contain a sufficient number of primes to do the testing. Specifically, at the time isPrime() is called, array P must contain (at least) all primes p in the range 2 p m . For instance, to test m = 53 for primality, one must do successive trial divisions by 2, 3, 5, and 7. We go no further since 11 53 . Thus a precondition for the function call isPrime(53, P) is that P[0] = 2 , P[1] = 3 , P[2] = 5, and P[3] = 7 . The return value in this case would be true since all these divisions fail. Similarly to test m =143 , one must do trial divisions by 2, 3, 5, 7, and 11 (since 13 143 ). The precondition for the function call isPrime(143, P) is therefore P[0] = 2 , P[1] = 3 , P[2] = 5, P[3] = 7 , and P[4] =11. The return value in this case would be false since 11 divides 143. Function isPrime() should contain a loop that steps through array P, doing trial divisions. This loop should terminate when 2 either a trial division succeeds, in which case false is returned, or until the next prime in P is greater than m , in which case true is returned. Function main() in this project will read the command line argument n, allocate an int array of length n, fill the array with primes, then print the contents of the array to stdout according to the format described below. In the context of function main(), we will refer to this array as Primes[]. Thus array Primes[] plays a dual role in this project. On the one hand, it is used to collect, store, and print the output data. On the other hand, it is passed to function isPrime() to test new integers for primality. Whenever isPrime() returns true, the newly discovered prime will be placed at the appropriate position in array Primes[]. This process works since, as explained above, the primes needed to test an integer m range only up to m , and all of these primes (and more) will already be stored in array Primes[] when m is tested. Of course it will be necessary to initialize Primes[0] = 2 manually, then proceed to test 3, 4, … for primality using function isPrime(). The following is an outline of the steps to be performed in function main(). • Check that the user supplied exactly one command line argument which can be interpreted as a positive integer n. If the command line argument is not a single positive integer, your program will print a usage message as specified in the examples below, then exit. • Allocate array Primes[] of length n and initialize Primes[0] = 2 . • Enter a loop which will discover subsequent primes and store them as Primes[1] , Primes[2], Primes[3] , ……, Primes[n -1] . This loop should contain an inner loop which walks through successive integers and tests them for primality by calling function isPrime() with appropriate arguments. • Print the contents of array Primes[] to stdout, 10 to a line separated by single spaces. In other words Primes[0] through Primes[9] will go on line 1, Primes[10] though Primes[19] will go on line 2, and so on. Note that if n is not a multiple of 10, then the last line of output will contain fewer than 10 primes. Your program, which will be called Prime.java, will produce output identical to that of the sample runs below. (As usual % signifies the unix prompt.) % java Prime Usage: java Prime [PositiveInteger] % java Prime xyz Usage: java Prime [PositiveInteger] % java Prime 10 20 Usage: java Prime [PositiveInteger] % java Prime 75 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 % 3 As you can see, inappropriate command line argument(s) generate a usage message which is similar to that of many unix commands. (Try doing the more command with no arguments to see such a message.) Your program will include a function called Usage() having signature static void Usage() that prints this message to stderr, then exits. Thus your program will contain three functions in all: main(), isPrime(), and Usage(). Each should be preceded by a comment block giving it’s name, a short description of it’s operation, and any necessary preconditions (such as those for isPrime().) See examples on the webpage.

Read the article

Will these optimizations to my Ruby implementation of diff improve performance in a Rails app?

- by grg-n-sox

<tl;dr> In source version control diff patch generation, would it be worth it to use the optimizations listed at the very bottom of this writing (see <optimizations>) in my Ruby implementation of diff for making diff patches? </tl;dr> <introduction> I am programming something I have never done before and there might already be tools out there to do the exact thing I am programming but at this point I am having too much fun to care so I am still going to do it from scratch, even if there is a tool for this. So anyways, I am working on a Ruby on Rails app and need a certain feature. Basically I want each entry in a table of mine, let's say for example a table of video games, to have a stored chunk of text that represents a review or something of the sort for that table entry. However, I want this text to be both editable by any registered user and also keep track of different submissions in a version control system. The simplest solution I could think of is just implement a solution that keeps track of the text body and the diff patch history of different versions of the text body as objects in Ruby and then serialize it, preferably in human readable form (so I'll most likely use YAML for this) for editing if needed due to corruption by a software bug or a mistake is made by an admin doing some version editing. So at first I just tried to dive in head first into this feature to find that the problem of generating a diff patch is more difficult that I thought to do efficiently. So I did some research and came across some ideas. Some I have implemented already and some I have not. However, it all pretty much revolves around the longest common subsequence problem, as you would already know if you have already done anything with diff or diff-like features, and optimization the function that solves it. Currently I have it so it truncates the compared versions of the text body from the beginning and end until non-matching lines are found. Then it solves the problem using a comparison matrix, but instead of incrementing the value stored in a cell when it finds a matching line like in most longest common subsequence algorithms I have seen examples of, I increment when I have a non-matching line so as to calculate edit distance instead of longest common subsequence. Although as far as I can tell between the two approaches, they are essentially two sides of the same coin so either could be used to derive an answer. It then back-traces through the comparison matrix and notes when there was an incrementation and in which adjacent cell (West, Northwest, or North) to determine that line's diff entry and assumes all other lines to be unchanged. Normally I would leave it at that, but since this is going into a Rails environment and not just some stand-alone Ruby script, I started getting worried about needing to optimize at least enough so if a spammer that somehow knew how I implemented the version control system and knew my worst case scenario entry still wouldn't be able to hit the server that bad. After some searching and reading of research papers and articles through the internet, I've come across several that seem decent but all seem to have pros and cons and I am having a hard time deciding how well in this situation that the pros and cons balance out. So are the ones listed here worth it? I have listed them with known pros and cons. </introduction> <optimizations> Chop the compared sequences into multiple chucks of subsequences by splitting where lines are unchanged, and then truncating each section of unchanged lines at the beginning and end of each section. Then solve the edit distance of each subsequence. Pro: Changes the time increase as the changed area gets bigger from a quadratic increase to something more similar to a linear increase. Con: Figuring out where to split already seems like you have to solve edit distance except now you don't care how it is changed. Would be fine if this was solvable by a process closer to solving hamming distance but a single insertion would throw this off. Use a cryptographic hash function to both convert all sequence elements into integers and ensure uniqueness. Then solve the edit distance comparing the hash integers instead of the sequence elements themselves. Pro: The operation of comparing two integers is faster than the operation of comparing two strings, so a slight performance gain is received after every comparison, which can be a lot overall. Con: Using a cryptographic hash function takes time to convert all the sequence elements and may end up costing more time to do the conversion that you gain back from the integer comparisons. You could use the built in hash function for a string but that will not guarantee uniqueness. Use lazy evaluation to only calculate the three center-most diagonals of the comparison matrix and then only calculate additional diagonals as needed. And then also use this approach to possibly remove the need on some comparisons to compare all three adjacent cells as desribed here. Pro: Can turn an algorithm that always takes O(n * m) time and make it so only worst case scenario is that time, best case becomes practically linear, and average case is somewhere between the two. Con: It is an algorithm I've only seen implemented in functional programming languages and I am having a difficult time comprehending how to convert this into Ruby based on how it is described at the site linked to above. Make a C module and do the hard work at the native level in C and just make a Ruby wrapper for it so Ruby can make all the calls to it that it needs. Pro: I have to imagine that evaluating something like this in could be a LOT faster. Con: I have no idea how Rails handles apps with ruby code that has C extensions and it hurts the portability of the app. This is an optimization for after the solving of edit distance, but idea is to store additional combined diffs with the ones produced by each version to make a delta-tree data structure with the most recently made diff as the root node of the tree so getting to any version takes worst case time of O(log n) instead of O(n). Pro: Would make going back to an old version a lot faster. Con: It would mean every new commit, the delta-tree would get a new root node that will cost time to reorganize the delta-tree for an operation that will be carried out a lot more often than going back a version, not to mention the unlikelihood it will be an old version. </optimizations> So are these things worth the effort?

Read the article

Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5

- by Shaun

Originally posted on: http://geekswithblogs.net/shaunxu/archive/2013/07/01/upload-file-to-windows-azure-blob-in-chunks-through-asp.net.aspxMany people are using Windows Azure Blob Storage to store their data in the cloud. Blob storage provides 99.9% availability with easy-to-use API through .NET SDK and HTTP REST. For example, we can store JavaScript files, images, documents in blob storage when we are building an ASP.NET web application on a Web Role in Windows Azure. Or we can store our VHD files in blob and mount it as a hard drive in our cloud service. If you are familiar with Windows Azure, you should know that there are two kinds of blob: page blob and block blob. The page blob is optimized for random read and write, which is very useful when you need to store VHD files. The block blob is optimized for sequential/chunk read and write, which has more common usage. Since we can upload block blob in blocks through BlockBlob.PutBlock, and them commit them as a whole blob with invoking the BlockBlob.PutBlockList, it is very powerful to upload large files, as we can upload blocks in parallel, and provide pause-resume feature. There are many documents, articles and blog posts described on how to upload a block blob. Most of them are focus on the server side, which means when you had received a big file, stream or binaries, how to upload them into blob storage in blocks through .NET SDK. But the problem is, how can we upload these large files from client side, for example, a browser. This questioned to me when I was working with a Chinese customer to help them build a network disk production on top of azure. The end users upload their files from the web portal, and then the files will be stored in blob storage from the Web Role. My goal is to find the best way to transform the file from client (end user’s machine) to the server (Web Role) through browser. In this post I will demonstrate and describe what I had done, to upload large file in chunks with high speed, and save them as blocks into Windows Azure Blob Storage. Traditional Upload, Works with Limitation The simplest way to implement this requirement is to create a web page with a form that contains a file input element and a submit button. 1: @using (Html.BeginForm("About", "Index", FormMethod.Post, new { enctype = "multipart/form-data" })) 2: { 3: <input type="file" name="file" /> 4: <input type="submit" value="upload" /> 5: } And then in the backend controller, we retrieve the whole content of this file and upload it in to the blob storage through .NET SDK. We can split the file in blocks and upload them in parallel and commit. The code had been well blogged in the community. 1: [HttpPost] 2: public ActionResult About(HttpPostedFileBase file) 3: { 4: var container = _client.GetContainerReference("test"); 5: container.CreateIfNotExists(); 6: var blob = container.GetBlockBlobReference(file.FileName); 7: var blockDataList = new Dictionary<string, byte[]>(); 8: using (var stream = file.InputStream) 9: { 10: var blockSizeInKB = 1024; 11: var offset = 0; 12: var index = 0; 13: while (offset < stream.Length) 14: { 15: var readLength = Math.Min(1024 * blockSizeInKB, (int)stream.Length - offset); 16: var blockData = new byte[readLength]; 17: offset += stream.Read(blockData, 0, readLength); 18: blockDataList.Add(Convert.ToBase64String(BitConverter.GetBytes(index)), blockData); 19: 20: index++; 21: } 22: } 23: 24: Parallel.ForEach(blockDataList, (bi) => 25: { 26: blob.PutBlock(bi.Key, new MemoryStream(bi.Value), null); 27: }); 28: blob.PutBlockList(blockDataList.Select(b => b.Key).ToArray()); 29: 30: return RedirectToAction("About"); 31: } This works perfect if we selected an image, a music or a small video to upload. But if I selected a large file, let’s say a 6GB HD-movie, after upload for about few minutes the page will be shown as below and the upload will be terminated. In ASP.NET there is a limitation of request length and the maximized request length is defined in the web.config file. It’s a number which less than about 4GB. So if we want to upload a really big file, we cannot simply implement in this way. Also, in Windows Azure, a cloud service network load balancer will terminate the connection if exceed the timeout period. From my test the timeout looks like 2 - 3 minutes. Hence, when we need to upload a large file we cannot just use the basic HTML elements. Besides the limitation mentioned above, the simple HTML file upload cannot provide rich upload experience such as chunk upload, pause and pause-resume. So we need to find a better way to upload large file from the client to the server. Upload in Chunks through HTML5 and JavaScript In order to break those limitation mentioned above we will try to upload the large file in chunks. This takes some benefit to us such as - No request size limitation: Since we upload in chunks, we can define the request size for each chunks regardless how big the entire file is. - No timeout problem: The size of chunks are controlled by us, which means we should be able to make sure request for each chunk upload will not exceed the timeout period of both ASP.NET and Windows Azure load balancer. It was a big challenge to upload big file in chunks until we have HTML5. There are some new features and improvements introduced in HTML5 and we will use them to implement our solution. In HTML5, the File interface had been improved with a new method called “slice”. It can be used to read part of the file by specifying the start byte index and the end byte index. For example if the entire file was 1024 bytes, file.slice(512, 768) will read the part of this file from the 512nd byte to 768th byte, and return a new object of interface called "Blob”, which you can treat as an array of bytes. In fact, a Blob object represents a file-like object of immutable, raw data. The File interface is based on Blob, inheriting blob functionality and expanding it to support files on the user's system. For more information about the Blob please refer here. File and Blob is very useful to implement the chunk upload. We will use File interface to represent the file the user selected from the browser and then use File.slice to read the file in chunks in the size we wanted. For example, if we wanted to upload a 10MB file with 512KB chunks, then we can read it in 512KB blobs by using File.slice in a loop. Assuming we have a web page as below. User can select a file, an input box to specify the block size in KB and a button to start upload. 1: <div> 2: <input type="file" id="upload_files" name="files[]" /><br /> 3: Block Size: <input type="number" id="block_size" value="512" name="block_size" />KB<br /> 4: <input type="button" id="upload_button_blob" name="upload" value="upload (blob)" /> 5: </div> Then we can have the JavaScript function to upload the file in chunks when user clicked the button. 1: <script type="text/javascript"> 1: 2: $(function () { 3: $("#upload_button_blob").click(function () { 4: }); 5: });</script> Firstly we need to ensure the client browser supports the interfaces we are going to use. Just try to invoke the File, Blob and FormData from the “window” object. If any of them is “undefined” the condition result will be “false” which means your browser doesn’t support these premium feature and it’s time for you to get your browser updated. FormData is another new feature we are going to use in the future. It could generate a temporary form for us. We will use this interface to create a form with chunk and associated metadata when invoked the service through ajax. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: if (window.File && window.Blob && window.FormData) { 4: alert("Your brwoser is awesome, let's rock!"); 5: } 6: else { 7: alert("Oh man plz update to a modern browser before try is cool stuff out."); 8: return; 9: } 10: }); Each browser supports these interfaces by their own implementation and currently the Blob, File and File.slice are supported by Chrome 21, FireFox 13, IE 10, Opera 12 and Safari 5.1 or higher. After that we worked on the files the user selected one by one since in HTML5, user can select multiple files in one file input box. 1: var files = $("#upload_files")[0].files; 2: for (var i = 0; i < files.length; i++) { 3: var file = files[i]; 4: var fileSize = file.size; 5: var fileName = file.name; 6: } Next, we calculated the start index and end index for each chunks based on the size the user specified from the browser. We put them into an array with the file name and the index, which will be used when we upload chunks into Windows Azure Blob Storage as blocks since we need to specify the target blob name and the block index. At the same time we will store the list of all indexes into another variant which will be used to commit blocks into blob in Azure Storage once all chunks had been uploaded successfully. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: var blockSizeInKB = $("#block_size").val(); 14: var blockSize = blockSizeInKB * 1024; 15: var blocks = []; 16: var offset = 0; 17: var index = 0; 18: var list = ""; 19: while (offset < fileSize) { 20: var start = offset; 21: var end = Math.min(offset + blockSize, fileSize); 22: 23: blocks.push({ 24: name: fileName, 25: index: index, 26: start: start, 27: end: end 28: }); 29: list += index + ","; 30: 31: offset = end; 32: index++; 33: } 34: } 35: }); Now we have all chunks’ information ready. The next step should be upload them one by one to the server side, and at the server side when received a chunk it will upload as a block into Blob Storage, and finally commit them with the index list through BlockBlobClient.PutBlockList. But since all these invokes are ajax calling, which means not synchronized call. So we need to introduce a new JavaScript library to help us coordinate the asynchronize operation, which named “async.js”. You can download this JavaScript library here, and you can find the document here. I will not explain this library too much in this post. We will put all procedures we want to execute as a function array, and pass into the proper function defined in async.js to let it help us to control the execution sequence, in series or in parallel. Hence we will define an array and put the function for chunk upload into this array. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: 5: // start to upload each files in chunks 6: var files = $("#upload_files")[0].files; 7: for (var i = 0; i < files.length; i++) { 8: var file = files[i]; 9: var fileSize = file.size; 10: var fileName = file.name; 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: ... ... 14: 15: // define the function array and push all chunk upload operation into this array 16: blocks.forEach(function (block) { 17: putBlocks.push(function (callback) { 18: }); 19: }); 20: } 21: }); 22: }); As you can see, I used File.slice method to read each chunks based on the start and end byte index we calculated previously, and constructed a temporary HTML form with the file name, chunk index and chunk data through another new feature in HTML5 named FormData. Then post this form to the backend server through jQuery.ajax. This is the key part of our solution. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: blocks.forEach(function (block) { 15: putBlocks.push(function (callback) { 16: // load blob based on the start and end index for each chunks 17: var blob = file.slice(block.start, block.end); 18: // put the file name, index and blob into a temporary from 19: var fd = new FormData(); 20: fd.append("name", block.name); 21: fd.append("index", block.index); 22: fd.append("file", blob); 23: // post the form to backend service (asp.net mvc controller action) 24: $.ajax({ 25: url: "/Home/UploadInFormData", 26: data: fd, 27: processData: false, 28: contentType: "multipart/form-data", 29: type: "POST", 30: success: function (result) { 31: if (!result.success) { 32: alert(result.error); 33: } 34: callback(null, block.index); 35: } 36: }); 37: }); 38: }); 39: } 40: }); Then we will invoke these functions one by one by using the async.js. And once all functions had been executed successfully I invoked another ajax call to the backend service to commit all these chunks (blocks) as the blob in Windows Azure Storage. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.series(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); That’s all in the client side. The outline of our logic would be - Calculate the start and end byte index for each chunks based on the block size. - Defined the functions of reading the chunk form file and upload the content to the backend service through ajax. - Execute the functions defined in previous step with “async.js”. - Commit the chunks by invoking the backend service in Windows Azure Storage finally. Save Chunks as Blocks into Blob Storage In above we finished the client size JavaScript code. It uploaded the file in chunks to the backend service which we are going to implement in this step. We will use ASP.NET MVC as our backend service, and it will receive the chunks, upload into Windows Azure Bob Storage in blocks, then finally commit as one blob. As in the client side we uploaded chunks by invoking the ajax call to the URL "/Home/UploadInFormData", I created a new action under the Index controller and it only accepts HTTP POST request. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: } 8: catch (Exception e) 9: { 10: error = e.ToString(); 11: } 12: 13: return new JsonResult() 14: { 15: Data = new 16: { 17: success = string.IsNullOrWhiteSpace(error), 18: error = error 19: } 20: }; 21: } Then I retrieved the file name, index and the chunk content from the Request.Form object, which was passed from our client side. And then, used the Windows Azure SDK to create a blob container (in this case we will use the container named “test”.) and create a blob reference with the blob name (same as the file name). Then uploaded the chunk as a block of this blob with the index, since in Blob Storage each block must have an index (ID) associated with so that finally we can put all blocks as one blob by specifying their block ID list. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var index = int.Parse(Request.Form["index"]); 9: var file = Request.Files[0]; 10: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 11: 12: var container = _client.GetContainerReference("test"); 13: container.CreateIfNotExists(); 14: var blob = container.GetBlockBlobReference(name); 15: blob.PutBlock(id, file.InputStream, null); 16: } 17: catch (Exception e) 18: { 19: error = e.ToString(); 20: } 21: 22: return new JsonResult() 23: { 24: Data = new 25: { 26: success = string.IsNullOrWhiteSpace(error), 27: error = error 28: } 29: }; 30: } Next, I created another action to commit the blocks into blob once all chunks had been uploaded. Similarly, I retrieved the blob name from the Request.Form. I also retrieved the chunks ID list, which is the block ID list from the Request.Form in a string format, split them as a list, then invoked the BlockBlob.PutBlockList method. After that our blob will be shown in the container and ready to be download. 1: [HttpPost] 2: public JsonResult Commit() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var list = Request.Form["list"]; 9: var ids = list 10: .Split(',') 11: .Where(id => !string.IsNullOrWhiteSpace(id)) 12: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 13: .ToArray(); 14: 15: var container = _client.GetContainerReference("test"); 16: container.CreateIfNotExists(); 17: var blob = container.GetBlockBlobReference(name); 18: blob.PutBlockList(ids); 19: } 20: catch (Exception e) 21: { 22: error = e.ToString(); 23: } 24: 25: return new JsonResult() 26: { 27: Data = new 28: { 29: success = string.IsNullOrWhiteSpace(error), 30: error = error 31: } 32: }; 33: } Now we finished all code we need. The whole process of uploading would be like this below. Below is the full client side JavaScript code. 1: <script type="text/javascript" src="~/Scripts/async.js"></script> 2: <script type="text/javascript"> 3: $(function () { 4: $("#upload_button_blob").click(function () { 5: // assert the browser support html5 6: if (window.File && window.Blob && window.FormData) { 7: alert("Your brwoser is awesome, let's rock!"); 8: } 9: else { 10: alert("Oh man plz update to a modern browser before try is cool stuff out."); 11: return; 12: } 13: 14: // start to upload each files in chunks 15: var files = $("#upload_files")[0].files; 16: for (var i = 0; i < files.length; i++) { 17: var file = files[i]; 18: var fileSize = file.size; 19: var fileName = file.name; 20: 21: // calculate the start and end byte index for each blocks(chunks) 22: // with the index, file name and index list for future using 23: var blockSizeInKB = $("#block_size").val(); 24: var blockSize = blockSizeInKB * 1024; 25: var blocks = []; 26: var offset = 0; 27: var index = 0; 28: var list = ""; 29: while (offset < fileSize) { 30: var start = offset; 31: var end = Math.min(offset + blockSize, fileSize); 32: 33: blocks.push({ 34: name: fileName, 35: index: index, 36: start: start, 37: end: end 38: }); 39: list += index + ","; 40: 41: offset = end; 42: index++; 43: } 44: 45: // define the function array and push all chunk upload operation into this array 46: var putBlocks = []; 47: blocks.forEach(function (block) { 48: putBlocks.push(function (callback) { 49: // load blob based on the start and end index for each chunks 50: var blob = file.slice(block.start, block.end); 51: // put the file name, index and blob into a temporary from 52: var fd = new FormData(); 53: fd.append("name", block.name); 54: fd.append("index", block.index); 55: fd.append("file", blob); 56: // post the form to backend service (asp.net mvc controller action) 57: $.ajax({ 58: url: "/Home/UploadInFormData", 59: data: fd, 60: processData: false, 61: contentType: "multipart/form-data", 62: type: "POST", 63: success: function (result) { 64: if (!result.success) { 65: alert(result.error); 66: } 67: callback(null, block.index); 68: } 69: }); 70: }); 71: }); 72: 73: // invoke the functions one by one 74: // then invoke the commit ajax call to put blocks into blob in azure storage 75: async.series(putBlocks, function (error, result) { 76: var data = { 77: name: fileName, 78: list: list 79: }; 80: $.post("/Home/Commit", data, function (result) { 81: if (!result.success) { 82: alert(result.error); 83: } 84: else { 85: alert("done!"); 86: } 87: }); 88: }); 89: } 90: }); 91: }); 92: </script> And below is the full ASP.NET MVC controller code. 1: public class HomeController : Controller 2: { 3: private CloudStorageAccount _account; 4: private CloudBlobClient _client; 5: 6: public HomeController() 7: : base() 8: { 9: _account = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("DataConnectionString")); 10: _client = _account.CreateCloudBlobClient(); 11: } 12: 13: public ActionResult Index() 14: { 15: ViewBag.Message = "Modify this template to jump-start your ASP.NET MVC application."; 16: 17: return View(); 18: } 19: 20: [HttpPost] 21: public JsonResult UploadInFormData() 22: { 23: var error = string.Empty; 24: try 25: { 26: var name = Request.Form["name"]; 27: var index = int.Parse(Request.Form["index"]); 28: var file = Request.Files[0]; 29: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 30: 31: var container = _client.GetContainerReference("test"); 32: container.CreateIfNotExists(); 33: var blob = container.GetBlockBlobReference(name); 34: blob.PutBlock(id, file.InputStream, null); 35: } 36: catch (Exception e) 37: { 38: error = e.ToString(); 39: } 40: 41: return new JsonResult() 42: { 43: Data = new 44: { 45: success = string.IsNullOrWhiteSpace(error), 46: error = error 47: } 48: }; 49: } 50: 51: [HttpPost] 52: public JsonResult Commit() 53: { 54: var error = string.Empty; 55: try 56: { 57: var name = Request.Form["name"]; 58: var list = Request.Form["list"]; 59: var ids = list 60: .Split(',') 61: .Where(id => !string.IsNullOrWhiteSpace(id)) 62: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 63: .ToArray(); 64: 65: var container = _client.GetContainerReference("test"); 66: container.CreateIfNotExists(); 67: var blob = container.GetBlockBlobReference(name); 68: blob.PutBlockList(ids); 69: } 70: catch (Exception e) 71: { 72: error = e.ToString(); 73: } 74: 75: return new JsonResult() 76: { 77: Data = new 78: { 79: success = string.IsNullOrWhiteSpace(error), 80: error = error 81: } 82: }; 83: } 84: } And if we selected a file from the browser we will see our application will upload chunks in the size we specified to the server through ajax call in background, and then commit all chunks in one blob. Then we can find the blob in our Windows Azure Blob Storage. Optimized by Parallel Upload In previous example we just uploaded our file in chunks. This solved the problem that ASP.NET MVC request content size limitation as well as the Windows Azure load balancer timeout. But it might introduce the performance problem since we uploaded chunks in sequence. In order to improve the upload performance we could modify our client side code a bit to make the upload operation invoked in parallel. The good news is that, “async.js” library provides the parallel execution function. If you remembered the code we invoke the service to upload chunks, it utilized “async.series” which means all functions will be executed in sequence. Now we will change this code to “async.parallel”. This will invoke all functions in parallel. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallel(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); In this way all chunks will be uploaded to the server side at the same time to maximize the bandwidth usage. This should work if the file was not very large and the chunk size was not very small. But for large file this might introduce another problem that too many ajax calls are sent to the server at the same time. So the best solution should be, upload the chunks in parallel with maximum concurrency limitation. The code below specified the concurrency limitation to 4, which means at the most only 4 ajax calls could be invoked at the same time. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallelLimit(putBlocks, 4, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); Summary In this post we discussed how to upload files in chunks to the backend service and then upload them into Windows Azure Blob Storage in blocks. We focused on the frontend side and leverage three new feature introduced in HTML 5 which are - File.slice: Read part of the file by specifying the start and end byte index. - Blob: File-like interface which contains the part of the file content. - FormData: Temporary form element that we can pass the chunk alone with some metadata to the backend service. Then we discussed the performance consideration of chunk uploading. Sequence upload cannot provide maximized upload speed, but the unlimited parallel upload might crash the browser and server if too many chunks. So we finally came up with the solution to upload chunks in parallel with the concurrency limitation. We also demonstrated how to utilize “async.js” JavaScript library to help us control the asynchronize call and the parallel limitation. Regarding the chunk size and the parallel limitation value there is no “best” value. You need to test vary composition and find out the best one for your particular scenario. It depends on the local bandwidth, client machine cores and the server side (Windows Azure Cloud Service Virtual Machine) cores, memory and bandwidth. Below is one of my performance test result. The client machine was Windows 8 IE 10 with 4 cores. I was using Microsoft Cooperation Network. The web site was hosted on Windows Azure China North data center (in Beijing) with one small web role (1.7GB 1 core CPU, 1.75GB memory with 100Mbps bandwidth). The test cases were - Chunk size: 512KB, 1MB, 2MB, 4MB. - Upload Mode: Sequence, parallel (unlimited), parallel with limit (4 threads, 8 threads). - Chunk Format: base64 string, binaries. - Target file: 100MB. - Each case was tested 3 times. Below is the test result chart. Some thoughts, but not guidance or best practice: - Parallel gets better performance than series. - No significant performance improvement between parallel 4 threads and 8 threads. - Transform with binaries provides better performance than base64. - In all cases, chunk size in 1MB - 2MB gets better performance. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article

MySQL Syslog Audit Plugin

- by jonathonc

This post shows the construction process of the Syslog Audit plugin that was presented at MySQL Connect 2012. It is based on an environment that has the appropriate development tools enabled including gcc,g++ and cmake. It also assumes you have downloaded the MySQL source code (5.5.16 or higher) and have compiled and installed the system into the /usr/local/mysql directory ready for use. The information provided below is designed to show the different components that make up a plugin, and specifically an audit type plugin, and how it comes together to be used within the MySQL service. The MySQL Reference Manual contains information regarding the plugin API and how it can be used, so please refer there for more detailed information. The code in this post is designed to give the simplest information necessary, so handling every return code, managing race conditions etc is not part of this example code. Let's start by looking at the most basic implementation of our plugin code as seen below: /* Copyright (c) 2012, Oracle and/or its affiliates. All rights reserved. Author: Jonathon Coombes Licence: GPL Description: An auditing plugin that logs to syslog and can adjust the loglevel via the system variables. */ #include <stdio.h> #include <string.h> #include <mysql/plugin_audit.h> #include <syslog.h> There is a commented header detailing copyright/licencing and meta-data information and then the include headers. The two important include statements for our plugin are the syslog.h plugin, which gives us the structures for syslog, and the plugin_audit.h include which has details regarding the audit specific plugin api. Note that we do not need to include the general plugin header plugin.h, as this is done within the plugin_audit.h file already. To implement our plugin within the current implementation we need to add it into our source code and compile. > cd /usr/local/src/mysql-5.5.28/plugin > mkdir audit_syslog > cd audit_syslog A simple CMakeLists.txt file is created to manage the plugin compilation: MYSQL_ADD_PLUGIN(audit_syslog audit_syslog.cc MODULE_ONLY) Run the cmake command at the top level of the source and then you can compile the plugin using the 'make' command. This results in a compiled audit_syslog.so library, but currently it is not much use to MySQL as there is no level of api defined to communicate with the MySQL service. Now we need to define the general plugin structure that enables MySQL to recognise the library as a plugin and be able to install/uninstall it and have it show up in the system. The structure is defined in the plugin.h file in the MySQL source code. /* Plugin library descriptor */ mysql_declare_plugin(audit_syslog) { MYSQL_AUDIT_PLUGIN, /* plugin type */ &audit_syslog_descriptor, /* descriptor handle */ "audit_syslog", /* plugin name */ "Author Name", /* author */ "Simple Syslog Audit", /* description */ PLUGIN_LICENSE_GPL, /* licence */ audit_syslog_init, /* init function */ audit_syslog_deinit, /* deinit function */ 0x0001, /* plugin version */ NULL, /* status variables */ NULL, /* system variables */ NULL, /* no reserves */ 0, /* no flags */ } mysql_declare_plugin_end; The general plugin descriptor above is standard for all plugin types in MySQL. The plugin type is defined along with the init/deinit functions and interface methods into the system for sharing information, and various other metadata information. The descriptors have an internally recognised version number so that plugins can be matched against the api on the running server. The other details are usually related to the type-specific methods and structures to implement the plugin. Each plugin has a type-specific descriptor as well which details how the plugin is implemented for the specific purpose of that plugin type. /* Plugin type-specific descriptor */ static struct st_mysql_audit audit_syslog_descriptor= { MYSQL_AUDIT_INTERFACE_VERSION, /* interface version */ NULL, /* release_thd function */ audit_syslog_notify, /* notify function */ { (unsigned long) MYSQL_AUDIT_GENERAL_CLASSMASK | MYSQL_AUDIT_CONNECTION_CLASSMASK } /* class mask */ }; In this particular case, the release_thd function has not been defined as it is not required. The important method for auditing is the notify function which is activated when an event occurs on the system. The notify function is designed to activate on an event and the implementation will determine how it is handled. For the audit_syslog plugin, the use of the syslog feature sends all events to the syslog for recording. The class mask allows us to determine what type of events are being seen by the notify function. There are currently two major types of event: 1. General Events: This includes general logging, errors, status and result type events. This is the main one for tracking the queries and operations on the database. 2. Connection Events: This group is based around user logins. It monitors connections and disconnections, but also if somebody changes user while connected. With most audit plugins, the principle behind the plugin is to track changes to the system over time and counters can be an important part of this process. The next step is to define and initialise the counters that are used to track the events in the service. There are 3 counters defined in total for our plugin - the # of general events, the # of connection events and the total number of events. static volatile int total_number_of_calls; /* Count MYSQL_AUDIT_GENERAL_CLASS event instances */ static volatile int number_of_calls_general; /* Count MYSQL_AUDIT_CONNECTION_CLASS event instances */ static volatile int number_of_calls_connection; The init and deinit functions for the plugin are there to be called when the plugin is activated and when it is terminated. These offer the best option to initialise the counters for our plugin: /* Initialize the plugin at server start or plugin installation. */ static int audit_syslog_init(void *arg __attribute__((unused))) { openlog("mysql_audit:",LOG_PID|LOG_PERROR|LOG_CONS,LOG_USER); total_number_of_calls= 0; number_of_calls_general= 0; number_of_calls_connection= 0; return(0); } The init function does a call to openlog to initialise the syslog functionality. The parameters are the service to log under ("mysql_audit" in this case), the syslog flags and the facility for the logging. Then each of the counters are initialised to zero and a success is returned. If the init function is not defined, it will return success by default. /* Terminate the plugin at server shutdown or plugin deinstallation. */ static int audit_syslog_deinit(void *arg __attribute__((unused))) { closelog(); return(0); } The deinit function will simply close our syslog connection and return success. Note that the syslog functionality is part of the glibc libraries and does not require any external factors. The function names are what we define in the general plugin structure, so these have to match otherwise there will be errors. The next step is to implement the event notifier function that was defined in the type specific descriptor (audit_syslog_descriptor) which is audit_syslog_notify. /* Event notifier function */ static void audit_syslog_notify(MYSQL_THD thd __attribute__((unused)), unsigned int event_class, const void *event) { total_number_of_calls++; if (event_class == MYSQL_AUDIT_GENERAL_CLASS) { const struct mysql_event_general *event_general= (const struct mysql_event_general *) event; number_of_calls_general++; syslog(audit_loglevel,"%lu: User: %s Command: %s Query: %s\n", event_general->general_thread_id, event_general->general_user, event_general->general_command, event_general->general_query ); } else if (event_class == MYSQL_AUDIT_CONNECTION_CLASS) { const struct mysql_event_connection *event_connection= (const struct mysql_event_connection *) event; number_of_calls_connection++; syslog(audit_loglevel,"%lu: User: %s@%s[%s] Event: %d Status: %d\n", event_connection->thread_id, event_connection->user, event_connection->host, event_connection->ip, event_connection->event_subclass, event_connection->status ); } } In the case of an event, the notifier function is called. The first step is to increment the total number of events that have occurred in our database.The event argument is then cast into the appropriate event structure depending on the class type, of general event or connection event. The event type counters are incremented and details are sent via the syslog() function out to the system log. There are going to be different line formats and information returned since the general events have different data compared to the connection events, even though some of the details overlap, for example, user, thread id, host etc. On compiling the code now, there should be no errors and the resulting audit_syslog.so can be loaded into the server and ready to use. Log into the server and type: mysql> INSTALL PLUGIN audit_syslog SONAME 'audit_syslog.so'; This will install the plugin and will start updating the syslog immediately. Note that the audit plugin attaches to the immediate thread and cannot be uninstalled while that thread is active. This means that you cannot run the UNISTALL command until you log into a different connection (thread) on the server. Once the plugin is loaded, the system log will show output such as the following: Oct 8 15:33:21 machine mysql_audit:[8337]: 87: User: root[root] @ localhost [] Command: (null) Query: INSTALL PLUGIN audit_syslog SONAME 'audit_syslog.so' Oct 8 15:33:21 machine mysql_audit:[8337]: 87: User: root[root] @ localhost [] Command: Query Query: INSTALL PLUGIN audit_syslog SONAME 'audit_syslog.so' Oct 8 15:33:40 machine mysql_audit:[8337]: 87: User: root[root] @ localhost [] Command: (null) Query: show tables Oct 8 15:33:40 machine mysql_audit:[8337]: 87: User: root[root] @ localhost [] Command: Query Query: show tables Oct 8 15:33:43 machine mysql_audit:[8337]: 87: User: root[root] @ localhost [] Command: (null) Query: select * from t1 Oct 8 15:33:43 machine mysql_audit:[8337]: 87: User: root[root] @ localhost [] Command: Query Query: select * from t1 It appears that two of each event is being shown, but in actuality, these are two separate event types - the result event and the status event. This could be refined further by changing the audit_syslog_notify function to handle the different event sub-types in a different manner. So far, it seems that the logging is working with events showing up in the syslog output. The issue now is that the counters created earlier to track the number of events by type are not accessible when the plugin is being run. Instead there needs to be a way to expose the plugin specific information to the service and vice versa. This could be done via the information_schema plugin api, but for something as simple as counters, the obvious choice is the system status variables. This is done using the standard structure and the declaration: /* Plugin status variables for SHOW STATUS */ static struct st_mysql_show_var audit_syslog_status[]= { { "Audit_syslog_total_calls", (char *) &total_number_of_calls, SHOW_INT }, { "Audit_syslog_general_events", (char *) &number_of_calls_general, SHOW_INT }, { "Audit_syslog_connection_events", (char *) &number_of_calls_connection, SHOW_INT }, { 0, 0, SHOW_INT } }; The structure is simply the name that will be displaying in the mysql service, the address of the associated variables, and the data type being used for the counter. It is finished with a blank structure to show that there are no more variables. Remember that status variables may have the same name for variables from other plugin, so it is considered appropriate to add the plugin name at the start of the status variable name to avoid confusion. Looking at the status variables in the mysql client shows something like the following: mysql> show global status like "audit%"; +--------------------------------+-------+ | Variable_name | Value | +--------------------------------+-------+ | Audit_syslog_connection_events | 1 | | Audit_syslog_general_events | 2 | | Audit_syslog_total_calls | 3 | +--------------------------------+-------+ 3 rows in set (0.00 sec) The final connectivity piece for the plugin is to allow the interactive change of the logging level between the plugin and the system. This requires the ability to send changes via the mysql service through to the plugin. This is done using the system variables interface and defining a single variable to keep track of the active logging level for the facility. /* Plugin system variables for SHOW VARIABLES */ static MYSQL_SYSVAR_STR(loglevel, audit_loglevel, PLUGIN_VAR_RQCMDARG, "User can specify the log level for auditing", audit_loglevel_check, audit_loglevel_update, "LOG_NOTICE"); static struct st_mysql_sys_var* audit_syslog_sysvars[] = { MYSQL_SYSVAR(loglevel), NULL }; So now the system variable 'loglevel' is defined for the plugin and associated to the global variable 'audit_loglevel'. The check or validation function is defined to make sure that no garbage values are attempted in the update of the variable. The update function is used to save the new value to the variable. Note that the audit_syslog_sysvars structure is defined in the general plugin descriptor to associate the link between the plugin and the system and how much they interact. Next comes the implementation of the validation function and the update function for the system variable. It is worth noting that if you have a simple numeric such as integers for the variable types, the validate function is often not required as MySQL will handle the automatic check and validation of simple types. /* longest valid value */ #define MAX_LOGLEVEL_SIZE 100 /* hold the valid values */ static const char *possible_modes[]= { "LOG_ERROR", "LOG_WARNING", "LOG_NOTICE", NULL }; static int audit_loglevel_check( THD* thd, /*!< in: thread handle */ struct st_mysql_sys_var* var, /*!< in: pointer to system variable */ void* save, /*!< out: immediate result for update function */ struct st_mysql_value* value) /*!< in: incoming string */ { char buff[MAX_LOGLEVEL_SIZE]; const char *str; const char **found; int length; length= sizeof(buff); if (!(str= value->val_str(value, buff, &length))) return 1; /* We need to return a pointer to a locally allocated value in "save". Here we pick to search for the supplied value in an global array of constant strings and return a pointer to one of them. The other possiblity is to use the thd_alloc() function to allocate a thread local buffer instead of the global constants. */ for (found= possible_modes; *found; found++) { if (!strcmp(*found, str)) { *(const char**)save= *found; return 0; } } return 1; } The validation function is simply to take the value being passed in via the SET GLOBAL VARIABLE command and check if it is one of the pre-defined values allowed in our possible_values array. If it is found to be valid, then the value is assigned to the save variable ready for passing through to the update function. static void audit_loglevel_update( THD* thd, /*!< in: thread handle */ struct st_mysql_sys_var* var, /*!< in: system variable being altered */ void* var_ptr, /*!< out: pointer to dynamic variable */ const void* save) /*!< in: pointer to temporary storage */ { /* assign the new value so that the server can read it */ *(char **) var_ptr= *(char **) save; /* assign the new value to the internal variable */ audit_loglevel= *(char **) save; } Since all the validation has been done already, the update function is quite simple for this plugin. The first part is to update the system variable pointer so that the server can read the value. The second part is to update our own global plugin variable for tracking the value. Notice that the save variable is passed in as a void type to allow handling of various data types, so it must be cast to the appropriate data type when assigning it to the variables. Looking at how the latest changes affect the usage of the plugin and the interaction within the server shows: mysql> show global variables like "audit%"; +-----------------------+------------+ | Variable_name | Value | +-----------------------+------------+ | audit_syslog_loglevel | LOG_NOTICE | +-----------------------+------------+ 1 row in set (0.00 sec) mysql> set global audit_syslog_loglevel="LOG_ERROR"; Query OK, 0 rows affected (0.00 sec) mysql> show global status like "audit%"; +--------------------------------+-------+ | Variable_name | Value | +--------------------------------+-------+ | Audit_syslog_connection_events | 1 | | Audit_syslog_general_events | 11 | | Audit_syslog_total_calls | 12 | +--------------------------------+-------+ 3 rows in set (0.00 sec) mysql> show global variables like "audit%"; +-----------------------+-----------+ | Variable_name | Value | +-----------------------+-----------+ | audit_syslog_loglevel | LOG_ERROR | +-----------------------+-----------+ 1 row in set (0.00 sec) So now we have a plugin that will audit the events on the system and log the details to the system log. It allows for interaction to see the number of different events within the server details and provides a mechanism to change the logging level interactively via the standard system methods of the SET command. A more complex auditing plugin may have more detailed code, but each of the above areas is what will be involved and simply expanded on to add more functionality. With the above skeleton code, it is now possible to create your own audit plugins to implement your own auditing requirements. If, however, you are not of the coding persuasion, then you could always consider the option of the MySQL Enterprise Audit plugin that is available to purchase.

Read the article

How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article

Zen and the Art of File and Folder Organization

- by Mark Virtue

Is your desk a paragon of neatness, or does it look like a paper-bomb has gone off? If you’ve been putting off getting organized because the task is too huge or daunting, or you don’t know where to start, we’ve got 40 tips to get you on the path to zen mastery of your filing system. For all those readers who would like to get their files and folders organized, or, if they’re already organized, better organized—we have compiled a complete guide to getting organized and staying organized, a comprehensive article that will hopefully cover every possible tip you could want. Signs that Your Computer is Poorly Organized If your computer is a mess, you’re probably already aware of it. But just in case you’re not, here are some tell-tale signs: Your Desktop has over 40 icons on it “My Documents” contains over 300 files and 60 folders, including MP3s and digital photos You use the Windows’ built-in search facility whenever you need to find a file You can’t find programs in the out-of-control list of programs in your Start Menu You save all your Word documents in one folder, all your spreadsheets in a second folder, etc Any given file that you’re looking for may be in any one of four different sets of folders But before we start, here are some quick notes: We’re going to assume you know what files and folders are, and how to create, save, rename, copy and delete them The organization principles described in this article apply equally to all computer systems. However, the screenshots here will reflect how things look on Windows (usually Windows 7). We will also mention some useful features of Windows that can help you get organized. Everyone has their own favorite methodology of organizing and filing, and it’s all too easy to get into “My Way is Better than Your Way” arguments. The reality is that there is no perfect way of getting things organized. When I wrote this article, I tried to keep a generalist and objective viewpoint. I consider myself to be unusually well organized (to the point of obsession, truth be told), and I’ve had 25 years experience in collecting and organizing files on computers. So I’ve got a lot to say on the subject. But the tips I have described here are only one way of doing it. Hopefully some of these tips will work for you too, but please don’t read this as any sort of “right” way to do it. At the end of the article we’ll be asking you, the reader, for your own organization tips. Why Bother Organizing At All? For some, the answer to this question is self-evident. And yet, in this era of powerful desktop search software (the search capabilities built into the Windows Vista and Windows 7 Start Menus, and third-party programs like Google Desktop Search), the question does need to be asked, and answered. I have a friend who puts every file he ever creates, receives or downloads into his My Documents folder and doesn’t bother filing them into subfolders at all. He relies on the search functionality built into his Windows operating system to help him find whatever he’s looking for. And he always finds it. He’s a Search Samurai. For him, filing is a waste of valuable time that could be spent enjoying life! It’s tempting to follow suit. On the face of it, why would anyone bother to take the time to organize their hard disk when such excellent search software is available? Well, if all you ever want to do with the files you own is to locate and open them individually (for listening, editing, etc), then there’s no reason to ever bother doing one scrap of organization. But consider these common tasks that are not achievable with desktop search software: Find files manually. Often it’s not convenient, speedy or even possible to utilize your desktop search software to find what you want. It doesn’t work 100% of the time, or you may not even have it installed. Sometimes its just plain faster to go straight to the file you want, if you know it’s in a particular sub-folder, rather than trawling through hundreds of search results. Find groups of similar files (e.g. all your “work” files, all the photos of your Europe holiday in 2008, all your music videos, all the MP3s from Dark Side of the Moon, all your letters you wrote to your wife, all your tax returns). Clever naming of the files will only get you so far. Sometimes it’s the date the file was created that’s important, other times it’s the file format, and other times it’s the purpose of the file. How do you name a collection of files so that they’re easy to isolate based on any of the above criteria? Short answer, you can’t. Move files to a new computer. It’s time to upgrade your computer. How do you quickly grab all the files that are important to you? Or you decide to have two computers now – one for home and one for work. How do you quickly isolate only the work-related files to move them to the work computer? Synchronize files to other computers. If you have more than one computer, and you need to mirror some of your files onto the other computer (e.g. your music collection), then you need a way to quickly determine which files are to be synced and which are not. Surely you don’t want to synchronize everything? Choose which files to back up. If your backup regime calls for multiple backups, or requires speedy backups, then you’ll need to be able to specify which files are to be backed up, and which are not. This is not possible if they’re all in the same folder. Finally, if you’re simply someone who takes pleasure in being organized, tidy and ordered (me! me!), then you don’t even need a reason. Being disorganized is simply unthinkable. Tips on Getting Organized Here we present our 40 best tips on how to get organized. Or, if you’re already organized, to get better organized. Tip #1. Choose Your Organization System Carefully The reason that most people are not organized is that it takes time. And the first thing that takes time is deciding upon a system of organization. This is always a matter of personal preference, and is not something that a geek on a website can tell you. You should always choose your own system, based on how your own brain is organized (which makes the assumption that your brain is, in fact, organized). We can’t instruct you, but we can make suggestions: You may want to start off with a system based on the users of the computer. i.e. “My Files”, “My Wife’s Files”, My Son’s Files”, etc. Inside “My Files”, you might then break it down into “Personal” and “Business”. You may then realize that there are overlaps. For example, everyone may want to share access to the music library, or the photos from the school play. So you may create another folder called “Family”, for the “common” files. You may decide that the highest-level breakdown of your files is based on the “source” of each file. In other words, who created the files. You could have “Files created by ME (business or personal)”, “Files created by people I know (family, friends, etc)”, and finally “Files created by the rest of the world (MP3 music files, downloaded or ripped movies or TV shows, software installation files, gorgeous desktop wallpaper images you’ve collected, etc).” This system happens to be the one I use myself. See below: Mark is for files created by meVC is for files created by my company (Virtual Creations)Others is for files created by my friends and familyData is the rest of the worldAlso, Settings is where I store the configuration files and other program data files for my installed software (more on this in tip #34, below). Each folder will present its own particular set of requirements for further sub-organization. For example, you may decide to organize your music collection into sub-folders based on the artist’s name, while your digital photos might get organized based on the date they were taken. It can be different for every sub-folder! Another strategy would be based on “currentness”. Files you have yet to open and look at live in one folder. Ones that have been looked at but not yet filed live in another place. Current, active projects live in yet another place. All other files (your “archive”, if you like) would live in a fourth folder. (And of course, within that last folder you’d need to create a further sub-system based on one of the previous bullet points). Put some thought into this – changing it when it proves incomplete can be a big hassle! Before you go to the trouble of implementing any system you come up with, examine a wide cross-section of the files you own and see if they will all be able to find a nice logical place to sit within your system. Tip #2. When You Decide on Your System, Stick to It! There’s nothing more pointless than going to all the trouble of creating a system and filing all your files, and then whenever you create, receive or download a new file, you simply dump it onto your Desktop. You need to be disciplined – forever! Every new file you get, spend those extra few seconds to file it where it belongs! Otherwise, in just a month or two, you’ll be worse off than before – half your files will be organized and half will be disorganized – and you won’t know which is which! Tip #3. Choose the Root Folder of Your Structure Carefully Every data file (document, photo, music file, etc) that you create, own or is important to you, no matter where it came from, should be found within one single folder, and that one single folder should be located at the root of your C: drive (as a sub-folder of C:\). In other words, do not base your folder structure in standard folders like “My Documents”. If you do, then you’re leaving it up to the operating system engineers to decide what folder structure is best for you. And every operating system has a different system! In Windows 7 your files are found in C:\Users\YourName, whilst on Windows XP it was C:\Documents and Settings\YourName\My Documents. In UNIX systems it’s often /home/YourName. These standard default folders tend to fill up with junk files and folders that are not at all important to you. “My Documents” is the worst offender. Every second piece of software you install, it seems, likes to create its own folder in the “My Documents” folder. These folders usually don’t fit within your organizational structure, so don’t use them! In fact, don’t even use the “My Documents” folder at all. Allow it to fill up with junk, and then simply ignore it. It sounds heretical, but: Don’t ever visit your “My Documents” folder! Remove your icons/links to “My Documents” and replace them with links to the folders you created and you care about! Create your own file system from scratch! Probably the best place to put it would be on your D: drive – if you have one. This way, all your files live on one drive, while all the operating system and software component files live on the C: drive – simply and elegantly separated. The benefits of that are profound. Not only are there obvious organizational benefits (see tip #10, below), but when it comes to migrate your data to a new computer, you can (sometimes) simply unplug your D: drive and plug it in as the D: drive of your new computer (this implies that the D: drive is actually a separate physical disk, and not a partition on the same disk as C:). You also get a slight speed improvement (again, only if your C: and D: drives are on separate physical disks). Warning: From tip #12, below, you will see that it’s actually a good idea to have exactly the same file system structure – including the drive it’s filed on – on all of the computers you own. So if you decide to use the D: drive as the storage system for your own files, make sure you are able to use the D: drive on all the computers you own. If you can’t ensure that, then you can still use a clever geeky trick to store your files on the D: drive, but still access them all via the C: drive (see tip #17, below). If you only have one hard disk (C:), then create a dedicated folder that will contain all your files – something like C:\Files. The name of the folder is not important, but make it a single, brief word. There are several reasons for this: When creating a backup regime, it’s easy to decide what files should be backed up – they’re all in the one folder! If you ever decide to trade in your computer for a new one, you know exactly which files to migrate You will always know where to begin a search for any file If you synchronize files with other computers, it makes your synchronization routines very simple. It also causes all your shortcuts to continue to work on the other machines (more about this in tip #24, below). Once you’ve decided where your files should go, then put all your files in there – Everything! Completely disregard the standard, default folders that are created for you by the operating system (“My Music”, “My Pictures”, etc). In fact, you can actually relocate many of those folders into your own structure (more about that below, in tip #6). The more completely you get all your data files (documents, photos, music, etc) and all your configuration settings into that one folder, then the easier it will be to perform all of the above tasks. Once this has been done, and all your files live in one folder, all the other folders in C:\ can be thought of as “operating system” folders, and therefore of little day-to-day interest for us. Here’s a screenshot of a nicely organized C: drive, where all user files are located within the \Files folder: Tip #4. Use Sub-Folders This would be our simplest and most obvious tip. It almost goes without saying. Any organizational system you decide upon (see tip #1) will require that you create sub-folders for your files. Get used to creating folders on a regular basis. Tip #5. Don’t be Shy About Depth Create as many levels of sub-folders as you need. Don’t be scared to do so. Every time you notice an opportunity to group a set of related files into a sub-folder, do so. Examples might include: All the MP3s from one music CD, all the photos from one holiday, or all the documents from one client. It’s perfectly okay to put files into a folder called C:\Files\Me\From Others\Services\WestCo Bank\Statements\2009. That’s only seven levels deep. Ten levels is not uncommon. Of course, it’s possible to take this too far. If you notice yourself creating a sub-folder to hold only one file, then you’ve probably become a little over-zealous. On the other hand, if you simply create a structure with only two levels (for example C:\Files\Work) then you really haven’t achieved any level of organization at all (unless you own only six files!). Your “Work” folder will have become a dumping ground, just like your Desktop was, with most likely hundreds of files in it. Tip #6. Move the Standard User Folders into Your Own Folder Structure Most operating systems, including Windows, create a set of standard folders for each of its users. These folders then become the default location for files such as documents, music files, digital photos and downloaded Internet files. In Windows 7, the full list is shown below: Some of these folders you may never use nor care about (for example, the Favorites folder, if you’re not using Internet Explorer as your browser). Those ones you can leave where they are. But you may be using some of the other folders to store files that are important to you. Even if you’re not using them, Windows will still often treat them as the default storage location for many types of files. When you go to save a standard file type, it can become annoying to be automatically prompted to save it in a folder that’s not part of your own file structure. But there’s a simple solution: Move the folders you care about into your own folder structure! If you do, then the next time you go to save a file of the corresponding type, Windows will prompt you to save it in the new, moved location. Moving the folders is easy. Simply drag-and-drop them to the new location. Here’s a screenshot of the default My Music folder being moved to my custom personal folder (Mark): Tip #7. Name Files and Folders Intelligently This is another one that almost goes without saying, but we’ll say it anyway: Do not allow files to be created that have meaningless names like Document1.doc, or folders called New Folder (2). Take that extra 20 seconds and come up with a meaningful name for the file/folder – one that accurately divulges its contents without repeating the entire contents in the name. Tip #8. Watch Out for Long Filenames Another way to tell if you have not yet created enough depth to your folder hierarchy is that your files often require really long names. If you need to call a file Johnson Sales Figures March 2009.xls (which might happen to live in the same folder as Abercrombie Budget Report 2008.xls), then you might want to create some sub-folders so that the first file could be simply called March.xls, and living in the Clients\Johnson\Sales Figures\2009 folder. A well-placed file needs only a brief filename! Tip #9. Use Shortcuts! Everywhere! This is probably the single most useful and important tip we can offer. A shortcut allows a file to be in two places at once. Why would you want that? Well, the file and folder structure of every popular operating system on the market today is hierarchical. This means that all objects (files and folders) always live within exactly one parent folder. It’s a bit like a tree. A tree has branches (folders) and leaves (files). Each leaf, and each branch, is supported by exactly one parent branch, all the way back to the root of the tree (which, incidentally, is exactly why C:\ is called the “root folder” of the C: drive). That hard disks are structured this way may seem obvious and even necessary, but it’s only one way of organizing data. There are others: Relational databases, for example, organize structured data entirely differently. The main limitation of hierarchical filing structures is that a file can only ever be in one branch of the tree – in only one folder – at a time. Why is this a problem? Well, there are two main reasons why this limitation is a problem for computer users: The “correct” place for a file, according to our organizational rationale, is very often a very inconvenient place for that file to be located. Just because it’s correctly filed doesn’t mean it’s easy to get to. Your file may be “correctly” buried six levels deep in your sub-folder structure, but you may need regular and speedy access to this file every day. You could always move it to a more convenient location, but that would mean that you would need to re-file back to its “correct” location it every time you’d finished working on it. Most unsatisfactory. A file may simply “belong” in two or more different locations within your file structure. For example, say you’re an accountant and you have just completed the 2009 tax return for John Smith. It might make sense to you to call this file 2009 Tax Return.doc and file it under Clients\John Smith. But it may also be important to you to have the 2009 tax returns from all your clients together in the one place. So you might also want to call the file John Smith.doc and file it under Tax Returns\2009. The problem is, in a purely hierarchical filing system, you can’t put it in both places. Grrrrr! Fortunately, Windows (and most other operating systems) offers a way for you to do exactly that: It’s called a “shortcut” (also known as an “alias” on Macs and a “symbolic link” on UNIX systems). Shortcuts allow a file to exist in one place, and an icon that represents the file to be created and put anywhere else you please. In fact, you can create a dozen such icons and scatter them all over your hard disk. Double-clicking on one of these icons/shortcuts opens up the original file, just as if you had double-clicked on the original file itself. Consider the following two icons: The one on the left is the actual Word document, while the one on the right is a shortcut that represents the Word document. Double-clicking on either icon will open the same file. There are two main visual differences between the icons: The shortcut will have a small arrow in the lower-left-hand corner (on Windows, anyway) The shortcut is allowed to have a name that does not include the file extension (the “.docx” part, in this case) You can delete the shortcut at any time without losing any actual data. The original is still intact. All you lose is the ability to get to that data from wherever the shortcut was. So why are shortcuts so great? Because they allow us to easily overcome the main limitation of hierarchical file systems, and put a file in two (or more) places at the same time. You will always have files that don’t play nice with your organizational rationale, and can’t be filed in only one place. They demand to exist in two places. Shortcuts allow this! Furthermore, they allow you to collect your most often-opened files and folders together in one spot for convenient access. The cool part is that the original files stay where they are, safe forever in their perfectly organized location. So your collection of most often-opened files can – and should – become a collection of shortcuts! If you’re still not convinced of the utility of shortcuts, consider the following well-known areas of a typical Windows computer: The Start Menu (and all the programs that live within it) The Quick Launch bar (or the Superbar in Windows 7) The “Favorite folders” area in the top-left corner of the Windows Explorer window (in Windows Vista or Windows 7) Your Internet Explorer Favorites or Firefox Bookmarks Each item in each of these areas is a shortcut! Each of those areas exist for one purpose only: For convenience – to provide you with a collection of the files and folders you access most often. It should be easy to see by now that shortcuts are designed for one single purpose: To make accessing your files more convenient. Each time you double-click on a shortcut, you are saved the hassle of locating the file (or folder, or program, or drive, or control panel icon) that it represents. Shortcuts allow us to invent a golden rule of file and folder organization: “Only ever have one copy of a file – never have two copies of the same file. Use a shortcut instead” (this rule doesn’t apply to copies created for backup purposes, of course!) There are also lesser rules, like “don’t move a file into your work area – create a shortcut there instead”, and “any time you find yourself frustrated with how long it takes to locate a file, create a shortcut to it and place that shortcut in a convenient location.” So how to we create these massively useful shortcuts? There are two main ways: “Copy” the original file or folder (click on it and type Ctrl-C, or right-click on it and select Copy): Then right-click in an empty area of the destination folder (the place where you want the shortcut to go) and select Paste shortcut: Right-drag (drag with the right mouse button) the file from the source folder to the destination folder. When you let go of the mouse button at the destination folder, a menu pops up: Select Create shortcuts here. Note that when shortcuts are created, they are often named something like Shortcut to Budget Detail.doc (windows XP) or Budget Detail – Shortcut.doc (Windows 7). If you don’t like those extra words, you can easily rename the shortcuts after they’re created, or you can configure Windows to never insert the extra words in the first place (see our article on how to do this). And of course, you can create shortcuts to folders too, not just to files! Bottom line: Whenever you have a file that you’d like to access from somewhere else (whether it’s convenience you’re after, or because the file simply belongs in two places), create a shortcut to the original file in the new location. Tip #10. Separate Application Files from Data Files Any digital organization guru will drum this rule into you. Application files are the components of the software you’ve installed (e.g. Microsoft Word, Adobe Photoshop or Internet Explorer). Data files are the files that you’ve created for yourself using that software (e.g. Word Documents, digital photos, emails or playlists). Software gets installed, uninstalled and upgraded all the time. Hopefully you always have the original installation media (or downloaded set-up file) kept somewhere safe, and can thus reinstall your software at any time. This means that the software component files are of little importance. Whereas the files you have created with that software is, by definition, important. It’s a good rule to always separate unimportant files from important files. So when your software prompts you to save a file you’ve just created, take a moment and check out where it’s suggesting that you save the file. If it’s suggesting that you save the file into the same folder as the software itself, then definitely don’t follow that suggestion. File it in your own folder! In fact, see if you can find the program’s configuration option that determines where files are saved by default (if it has one), and change it. Tip #11. Organize Files Based on Purpose, Not on File Type If you have, for example a folder called Work\Clients\Johnson, and within that folder you have two sub-folders, Word Documents and Spreadsheets (in other words, you’re separating “.doc” files from “.xls” files), then chances are that you’re not optimally organized. It makes little sense to organize your files based on the program that created them. Instead, create your sub-folders based on the purpose of the file. For example, it would make more sense to create sub-folders called Correspondence and Financials. It may well be that all the files in a given sub-folder are of the same file-type, but this should be more of a coincidence and less of a design feature of your organization system. Tip #12. Maintain the Same Folder Structure on All Your Computers In other words, whatever organizational system you create, apply it to every computer that you can. There are several benefits to this: There’s less to remember. No matter where you are, you always know where to look for your files If you copy or synchronize files from one computer to another, then setting up the synchronization job becomes very simple Shortcuts can be copied or moved from one computer to another with ease (assuming the original files are also copied/moved). There’s no need to find the target of the shortcut all over again on the second computer Ditto for linked files (e.g Word documents that link to data in a separate Excel file), playlists, and any files that reference the exact file locations of other files. This applies even to the drive that your files are stored on. If your files are stored on C: on one computer, make sure they’re stored on C: on all your computers. Otherwise all your shortcuts, playlists and linked files will stop working! Tip #13. Create an “Inbox” Folder Create yourself a folder where you store all files that you’re currently working on, or that you haven’t gotten around to filing yet. You can think of this folder as your “to-do” list. You can call it “Inbox” (making it the same metaphor as your email system), or “Work”, or “To-Do”, or “Scratch”, or whatever name makes sense to you. It doesn’t matter what you call it – just make sure you have one! Once you have finished working on a file, you then move it from the “Inbox” to its correct location within your organizational structure. You may want to use your Desktop as this “Inbox” folder. Rightly or wrongly, most people do. It’s not a bad place to put such files, but be careful: If you do decide that your Desktop represents your “to-do” list, then make sure that no other files find their way there. In other words, make sure that your “Inbox”, wherever it is, Desktop or otherwise, is kept free of junk – stray files that don’t belong there. So where should you put this folder, which, almost by definition, lives outside the structure of the rest of your filing system? Well, first and foremost, it has to be somewhere handy. This will be one of your most-visited folders, so convenience is key. Putting it on the Desktop is a great option – especially if you don’t have any other folders on your Desktop: the folder then becomes supremely easy to find in Windows Explorer: You would then create shortcuts to this folder in convenient spots all over your computer (“Favorite Links”, “Quick Launch”, etc). Tip #14. Ensure You have Only One “Inbox” Folder Once you’ve created your “Inbox” folder, don’t use any other folder location as your “to-do list”. Throw every incoming or created file into the Inbox folder as you create/receive it. This keeps the rest of your computer pristine and free of randomly created or downloaded junk. The last thing you want to be doing is checking multiple folders to see all your current tasks and projects. Gather them all together into one folder. Here are some tips to help ensure you only have one Inbox: Set the default “save” location of all your programs to this folder. Set the default “download” location for your browser to this folder. If this folder is not your desktop (recommended) then also see if you can make a point of not putting “to-do” files on your desktop. This keeps your desktop uncluttered and Zen-like: (the Inbox folder is in the bottom-right corner) Tip #15. Be Vigilant about Clearing Your “Inbox” Folder This is one of the keys to staying organized. If you let your “Inbox” overflow (i.e. allow there to be more than, say, 30 files or folders in there), then you’re probably going to start feeling like you’re overwhelmed: You’re not keeping up with your to-do list. Once your Inbox gets beyond a certain point (around 30 files, studies have shown), then you’ll simply start to avoid it. You may continue to put files in there, but you’ll be scared to look at it, fearing the “out of control” feeling that all overworked, chaotic or just plain disorganized people regularly feel. So, here’s what you can do: Visit your Inbox/to-do folder regularly (at least five times per day). Scan the folder regularly for files that you have completed working on and are ready for filing. File them immediately. Make it a source of pride to keep the number of files in this folder as small as possible. If you value peace of mind, then make the emptiness of this folder one of your highest (computer) priorities If you know that a particular file has been in the folder for more than, say, six weeks, then admit that you’re not actually going to get around to processing it, and move it to its final resting place. Tip #16. File Everything Immediately, and Use Shortcuts for Your Active Projects As soon as you create, receive or download a new file, store it away in its “correct” folder immediately. Then, whenever you need to work on it (possibly straight away), create a shortcut to it in your “Inbox” (“to-do”) folder or your desktop. That way, all your files are always in their “correct” locations, yet you still have immediate, convenient access to your current, active files. When you finish working on a file, simply delete the shortcut. Ideally, your “Inbox” folder – and your Desktop – should contain no actual files or folders. They should simply contain shortcuts. Tip #17. Use Directory Symbolic Links (or Junctions) to Maintain One Unified Folder Structure Using this tip, we can get around a potential hiccup that we can run into when creating our organizational structure – the issue of having more than one drive on our computer (C:, D:, etc). We might have files we need to store on the D: drive for space reasons, and yet want to base our organized folder structure on the C: drive (or vice-versa). Your chosen organizational structure may dictate that all your files must be accessed from the C: drive (for example, the root folder of all your files may be something like C:\Files). And yet you may still have a D: drive and wish to take advantage of the hundreds of spare Gigabytes that it offers. Did you know that it’s actually possible to store your files on the D: drive and yet access them as if they were on the C: drive? And no, we’re not talking about shortcuts here (although the concept is very similar). By using the shell command mklink, you can essentially take a folder that lives on one drive and create an alias for it on a different drive (you can do lots more than that with mklink – for a full rundown on this programs capabilities, see our dedicated article). These aliases are called directory symbolic links (and used to be known as junctions). You can think of them as “virtual” folders. They function exactly like regular folders, except they’re physically located somewhere else. For example, you may decide that your entire D: drive contains your complete organizational file structure, but that you need to reference all those files as if they were on the C: drive, under C:\Files. If that was the case you could create C:\Files as a directory symbolic link – a link to D:, as follows: mklink /d c:\files d:\ Or it may be that the only files you wish to store on the D: drive are your movie collection. You could locate all your movie files in the root of your D: drive, and then link it to C:\Files\Media\Movies, as follows: mklink /d c:\files\media\movies d:\ (Needless to say, you must run these commands from a command prompt – click the Start button, type cmd and press Enter) Tip #18. Customize Your Folder Icons This is not strictly speaking an organizational tip, but having unique icons for each folder does allow you to more quickly visually identify which folder is which, and thus saves you time when you’re finding files. An example is below (from my folder that contains all files downloaded from the Internet): To learn how to change your folder icons, please refer to our dedicated article on the subject. Tip #19. Tidy Your Start Menu The Windows Start Menu is usually one of the messiest parts of any Windows computer. Every program you install seems to adopt a completely different approach to placing icons in this menu. Some simply put a single program icon. Others create a folder based on the name of the software. And others create a folder based on the name of the software manufacturer. It’s chaos, and can make it hard to find the software you want to run. Thankfully we can avoid this chaos with useful operating system features like Quick Launch, the Superbar or pinned start menu items. Even so, it would make a lot of sense to get into the guts of the Start Menu itself and give it a good once-over. All you really need to decide is how you’re going to organize your applications. A structure based on the purpose of the application is an obvious candidate. Below is an example of one such structure: In this structure, Utilities means software whose job it is to keep the computer itself running smoothly (configuration tools, backup software, Zip programs, etc). Applications refers to any productivity software that doesn’t fit under the headings Multimedia, Graphics, Internet, etc. In case you’re not aware, every icon in your Start Menu is a shortcut and can be manipulated like any other shortcut (copied, moved, deleted, etc). With the Windows Start Menu (all version of Windows), Microsoft has decided that there be two parallel folder structures to store your Start Menu shortcuts. One for you (the logged-in user of the computer) and one for all users of the computer. Having two parallel structures can often be redundant: If you are the only user of the computer, then having two parallel structures is totally redundant. Even if you have several users that regularly log into the computer, most of your installed software will need to be made available to all users, and should thus be moved out of the “just you” version of the Start Menu and into the “all users” area. To take control of your Start Menu, so you can start organizing it, you’ll need to know how to access the actual folders and shortcut files that make up the Start Menu (both versions of it). To find these folders and files, click the Start button and then right-click on the All Programs text (Windows XP users should right-click on the Start button itself): The Open option refers to the “just you” version of the Start Menu, while the Open All Users option refers to the “all users” version. Click on the one you want to organize. A Windows Explorer window then opens with your chosen version of the Start Menu selected. From there it’s easy. Double-click on the Programs folder and you’ll see all your folders and shortcuts. Now you can delete/rename/move until it’s just the way you want it. Note: When you’re reorganizing your Start Menu, you may want to have two Explorer windows open at the same time – one showing the “just you” version and one showing the “all users” version. You can drag-and-drop between the windows. Tip #20. Keep Your Start Menu Tidy Once you have a perfectly organized Start Menu, try to be a little vigilant about keeping it that way. Every time you install a new piece of software, the icons that get created will almost certainly violate your organizational structure. So to keep your Start Menu pristine and organized, make sure you do the following whenever you install a new piece of software: Check whether the software was installed into the “just you” area of the Start Menu, or the “all users” area, and then move it to the correct area. Remove all the unnecessary icons (like the “Read me” icon, the “Help” icon (you can always open the help from within the software itself when it’s running), the “Uninstall” icon, the link(s)to the manufacturer’s website, etc) Rename the main icon(s) of the software to something brief that makes sense to you. For example, you might like to rename Microsoft Office Word 2010 to simply Word Move the icon(s) into the correct folder based on your Start Menu organizational structure And don’t forget: when you uninstall a piece of software, the software’s uninstall routine is no longer going to be able to remove the software’s icon from the Start Menu (because you moved and/or renamed it), so you’ll need to remove that icon manually. Tip #21. Tidy C:\ The root of your C: drive (C:\) is a common dumping ground for files and folders – both by the users of your computer and by the software that you install on your computer. It can become a mess. There’s almost no software these days that requires itself to be installed in C:\. 99% of the time it can and should be installed into C:\Program Files. And as for your own files, well, it’s clear that they can (and almost always should) be stored somewhere else. In an ideal world, your C:\ folder should look like this (on Windows 7): Note that there are some system files and folders in C:\ that are usually and deliberately “hidden” (such as the Windows virtual memory file pagefile.sys, the boot loader file bootmgr, and the System Volume Information folder). Hiding these files and folders is a good idea, as they need to stay where they are and are almost never needed to be opened or even seen by you, the user. Hiding them prevents you from accidentally messing with them, and enhances your sense of order and well-being when you look at your C: drive folder. Tip #22. Tidy Your Desktop The Desktop is probably the most abused part of a Windows computer (from an organization point of view). It usually serves as a dumping ground for all incoming files, as well as holding icons to oft-used applications, plus some regularly opened files and folders. It often ends up becoming an uncontrolled mess. See if you can avoid this. Here’s why… Application icons (Word, Internet Explorer, etc) are often found on the Desktop, but it’s unlikely that this is the optimum place for them. The “Quick Launch” bar (or the Superbar in Windows 7) is always visible and so represents a perfect location to put your icons. You’ll only be able to see the icons on your Desktop when all your programs are minimized. It might be time to get your application icons off your desktop… You may have decided that the Inbox/To-do folder on your computer (see tip #13, above) should be your Desktop. If so, then enough said. Simply be vigilant about clearing it and preventing it from being polluted by junk files (see tip #15, above). On the other hand, if your Desktop is not acting as your “Inbox” folder, then there’s no reason for it to have any data files or folders on it at all, except perhaps a couple of shortcuts to often-opened files and folders (either ongoing or current projects). Everything else should be moved to your “Inbox” folder. In an ideal world, it might look like this: Tip #23. Move Permanent Items on Your Desktop Away from the Top-Left Corner When files/folders are dragged onto your desktop in a Windows Explorer window, or when shortcuts are created on your Desktop from Internet Explorer, those icons are always placed in the top-left corner – or as close as they can get. If you have other files, folders or shortcuts that you keep on the Desktop permanently, then it’s a good idea to separate these permanent icons from the transient ones, so that you can quickly identify which ones the transients are. An easy way to do this is to move all your permanent icons to the right-hand side of your Desktop. That should keep them separated from incoming items. Tip #24. Synchronize If you have more than one computer, you’ll almost certainly want to share files between them. If the computers are permanently attached to the same local network, then there’s no need to store multiple copies of any one file or folder – shortcuts will suffice. However, if the computers are not always on the same network, then you will at some point need to copy files between them. For files that need to permanently live on both computers, the ideal way to do this is to synchronize the files, as opposed to simply copying them. We only have room here to write a brief summary of synchronization, not a full article. In short, there are several different types of synchronization: Where the contents of one folder are accessible anywhere, such as with Dropbox Where the contents of any number of folders are accessible anywhere, such as with Windows Live Mesh Where any files or folders from anywhere on your computer are synchronized with exactly one other computer, such as with the Windows “Briefcase”, Microsoft SyncToy, or (much more powerful, yet still free) SyncBack from 2BrightSparks. This only works when both computers are on the same local network, at least temporarily. A great advantage of synchronization solutions is that once you’ve got it configured the way you want it, then the sync process happens automatically, every time. Click a button (or schedule it to happen automatically) and all your files are automagically put where they’re supposed to be. If you maintain the same file and folder structure on both computers, then you can also sync files depend upon the correct location of other files, like shortcuts, playlists and office documents that link to other office documents, and the synchronized files still work on the other computer! Tip #25. Hide Files You Never Need to See If you have your files well organized, you will often be able to tell if a file is out of place just by glancing at the contents of a folder (for example, it should be pretty obvious if you look in a folder that contains all the MP3s from one music CD and see a Word document in there). This is a good thing – it allows you to determine if there are files out of place with a quick glance. Yet sometimes there are files in a folder that seem out of place but actually need to be there, such as the “folder art” JPEGs in music folders, and various files in the root of the C: drive. If such files never need to be opened by you, then a good idea is to simply hide them. Then, the next time you glance at the folder, you won’t have to remember whether that file was supposed to be there or not, because you won’t see it at all! To hide a file, simply right-click on it and choose Properties: Then simply tick the Hidden tick-box: Tip #26. Keep Every Setup File These days most software is downloaded from the Internet. Whenever you download a piece of software, keep it. You’ll never know when you need to reinstall the software. Further, keep with it an Internet shortcut that links back to the website where you originally downloaded it, in case you ever need to check for updates. See tip #33 below for a full description of the excellence of organizing your setup files. Tip #27. Try to Minimize the Number of Folders that Contain Both Files and Sub-folders Some of the folders in your organizational structure will contain only files. Others will contain only sub-folders. And you will also have some folders that contain both files and sub-folders. You will notice slight improvements in how long it takes you to locate a file if you try to avoid this third type of folder. It’s not always possible, of course – you’ll always have some of these folders, but see if you can avoid it. One way of doing this is to take all the leftover files that didn’t end up getting stored in a sub-folder and create a special “Miscellaneous” or “Other” folder for them. Tip #28. Starting a Filename with an Underscore Brings it to the Top of a List Further to the previous tip, if you name that “Miscellaneous” or “Other” folder in such a way that its name begins with an underscore “_”, then it will appear at the top of the list of files/folders. The screenshot below is an example of this. Each folder in the list contains a set of digital photos. The folder at the top of the list, _Misc, contains random photos that didn’t deserve their own dedicated folder: Tip #29. Clean Up those CD-ROMs and (shudder!) Floppy Disks Have you got a pile of CD-ROMs stacked on a shelf of your office? Old photos, or files you archived off onto CD-ROM (or even worse, floppy disks!) because you didn’t have enough disk space at the time? In the meantime have you upgraded your computer and now have 500 Gigabytes of space you don’t know what to do with? If so, isn’t it time you tidied up that stack of disks and filed them into your gorgeous new folder structure? So what are you waiting for? Bite the bullet, copy them all back onto your computer, file them in their appropriate folders, and then back the whole lot up onto a shiny new 1000Gig external hard drive! Useful Folders to Create This next section suggests some useful folders that you might want to create within your folder structure. I’ve personally found them to be indispensable. The first three are all about convenience – handy folders to create and then put somewhere that you can always access instantly. For each one, it’s not so important where the actual folder is located, but it’s very important where you put the shortcut(s) to the folder. You might want to locate the shortcuts: On your Desktop In your “Quick Launch” area (or pinned to your Windows 7 Superbar) In your Windows Explorer “Favorite Links” area Tip #30. Create an “Inbox” (“To-Do”) Folder This has already been mentioned in depth (see tip #13), but we wanted to reiterate its importance here. This folder contains all the recently created, received or downloaded files that you have not yet had a chance to file away properly, and it also may contain files that you have yet to process. In effect, it becomes a sort of “to-do list”. It doesn’t have to be called “Inbox” – you can call it whatever you want. Tip #31. Create a Folder where Your Current Projects are Collected Rather than going hunting for them all the time, or dumping them all on your desktop, create a special folder where you put links (or work folders) for each of the projects you’re currently working on. You can locate this folder in your “Inbox” folder, on your desktop, or anywhere at all – just so long as there’s a way of getting to it quickly, such as putting a link to it in Windows Explorer’s “Favorite Links” area: Tip #32. Create a Folder for Files and Folders that You Regularly Open You will always have a few files that you open regularly, whether it be a spreadsheet of your current accounts, or a favorite playlist. These are not necessarily “current projects”, rather they’re simply files that you always find yourself opening. Typically such files would be located on your desktop (or even better, shortcuts to those files). Why not collect all such shortcuts together and put them in their own special folder? As with the “Current Projects” folder (above), you would want to locate that folder somewhere convenient. Below is an example of a folder called “Quick links”, with about seven files (shortcuts) in it, that is accessible through the Windows Quick Launch bar: See tip #37 below for a full explanation of the power of the Quick Launch bar. Tip #33. Create a “Set-ups” Folder A typical computer has dozens of applications installed on it. For each piece of software, there are often many different pieces of information you need to keep track of, including: The original installation setup file(s). This can be anything from a simple 100Kb setup.exe file you downloaded from a website, all the way up to a 4Gig ISO file that you copied from a DVD-ROM that you purchased. The home page of the software manufacturer (in case you need to look up something on their support pages, their forum or their online help) The page containing the download link for your actual file (in case you need to re-download it, or download an upgraded version) The serial number Your proof-of-purchase documentation Any other template files, plug-ins, themes, etc that also need to get installed For each piece of software, it’s a great idea to gather all of these files together and put them in a single folder. The folder can be the name of the software (plus possibly a very brief description of what it’s for – in case you can’t remember what the software does based in its name). Then you would gather all of these folders together into one place, and call it something like “Software” or “Setups”. If you have enough of these folders (I have several hundred, being a geek, collected over 20 years), then you may want to further categorize them. My own categorization structure is based on “platform” (operating system): The last seven folders each represents one platform/operating system, while _Operating Systems contains set-up files for installing the operating systems themselves. _Hardware contains ROMs for hardware I own, such as routers. Within the Windows folder (above), you can see the beginnings of the vast library of software I’ve compiled over the years: An example of a typical application folder looks like this: Tip #34. Have a “Settings” Folder We all know that our documents are important. So are our photos and music files. We save all of these files into folders, and then locate them afterwards and double-click on them to open them. But there are many files that are important to us that can’t be saved into folders, and then searched for and double-clicked later on. These files certainly contain important information that we need, but are often created internally by an application, and saved wherever that application feels is appropriate. A good example of this is the “PST” file that Outlook creates for us and uses to store all our emails, contacts, appointments and so forth. Another example would be the collection of Bookmarks that Firefox stores on your behalf. And yet another example would be the customized settings and configuration files of our all our software. Granted, most Windows programs store their configuration in the Registry, but there are still many programs that use configuration files to store their settings. Imagine if you lost all of the above files! And yet, when people are backing up their computers, they typically only back up the files they know about – those that are stored in the “My Documents” folder, etc. If they had a hard disk failure or their computer was lost or stolen, their backup files would not include some of the most vital files they owned. Also, when migrating to a new computer, it’s vital to ensure that these files make the journey. It can be a very useful idea to create yourself a folder to store all your “settings” – files that are important to you but which you never actually search for by name and double-click on to open them. Otherwise, next time you go to set up a new computer just the way you want it, you’ll need to spend hours recreating the configuration of your previous computer! So how to we get our important files into this folder? Well, we have a few options: Some programs (such as Outlook and its PST files) allow you to place these files wherever you want. If you delve into the program’s options, you will find a setting somewhere that controls the location of the important settings files (or “personal storage” – PST – when it comes to Outlook) Some programs do not allow you to change such locations in any easy way, but if you get into the Registry, you can sometimes find a registry key that refers to the location of the file(s). Simply move the file into your Settings folder and adjust the registry key to refer to the new location. Some programs stubbornly refuse to allow their settings files to be placed anywhere other then where they stipulate. When faced with programs like these, you have three choices: (1) You can ignore those files, (2) You can copy the files into your Settings folder (let’s face it – settings don’t change very often), or (3) you can use synchronization software, such as the Windows Briefcase, to make synchronized copies of all your files in your Settings folder. All you then have to do is to remember to run your sync software periodically (perhaps just before you run your backup software!). There are some other things you may decide to locate inside this new “Settings” folder: Exports of registry keys (from the many applications that store their configurations in the Registry). This is useful for backup purposes or for migrating to a new computer Notes you’ve made about all the specific customizations you have made to a particular piece of software (so that you’ll know how to do it all again on your next computer) Shortcuts to webpages that detail how to tweak certain aspects of your operating system or applications so they are just the way you like them (such as how to remove the words “Shortcut to” from the beginning of newly created shortcuts). In other words, you’d want to create shortcuts to half the pages on the How-To Geek website! Here’s an example of a “Settings” folder: Windows Features that Help with Organization This section details some of the features of Microsoft Windows that are a boon to anyone hoping to stay optimally organized. Tip #35. Use the “Favorite Links” Area to Access Oft-Used Folders Once you’ve created your great new filing system, work out which folders you access most regularly, or which serve as great starting points for locating the rest of the files in your folder structure, and then put links to those folders in your “Favorite Links” area of the left-hand side of the Windows Explorer window (simply called “Favorites” in Windows 7): Some ideas for folders you might want to add there include: Your “Inbox” folder (or whatever you’ve called it) – most important! The base of your filing structure (e.g. C:\Files) A folder containing shortcuts to often-accessed folders on other computers around the network (shown above as Network Folders) A folder containing shortcuts to your current projects (unless that folder is in your “Inbox” folder) Getting folders into this area is very simple – just locate the folder you’re interested in and drag it there! Tip #36. Customize the Places Bar in the File/Open and File/Save Boxes Consider the screenshot below: The highlighted icons (collectively known as the “Places Bar”) can be customized to refer to any folder location you want, allowing instant access to any part of your organizational structure. Note: These File/Open and File/Save boxes have been superseded by new versions that use the Windows Vista/Windows 7 “Favorite Links”, but the older versions (shown above) are still used by a surprisingly large number of applications. The easiest way to customize these icons is to use the Group Policy Editor, but not everyone has access to this program. If you do, open it up and navigate to: User Configuration > Administrative Templates > Windows Components > Windows Explorer > Common Open File Dialog If you don’t have access to the Group Policy Editor, then you’ll need to get into the Registry. Navigate to: HKEY_CURRENT_USER \ Software \ Microsoft \ Windows \ CurrentVersion \ Policies \ comdlg32 \ Placesbar It should then be easy to make the desired changes. Log off and log on again to allow the changes to take effect. Tip #37. Use the Quick Launch Bar as a Application and File Launcher That Quick Launch bar (to the right of the Start button) is a lot more useful than people give it credit for. Most people simply have half a dozen icons in it, and use it to start just those programs. But it can actually be used to instantly access just about anything in your filing system: For complete instructions on how to set this up, visit our dedicated article on this topic. Tip #38. Put a Shortcut to Windows Explorer into Your Quick Launch Bar This is only necessary in Windows Vista and Windows XP. The Microsoft boffins finally got wise and added it to the Windows 7 Superbar by default. Windows Explorer – the program used for managing your files and folders – is one of the most useful programs in Windows. Anyone who considers themselves serious about being organized needs instant access to this program at any time. A great place to create a shortcut to this program is in the Windows XP and Windows Vista “Quick Launch” bar: To get it there, locate it in your Start Menu (usually under “Accessories”) and then right-drag it down into your Quick Launch bar (and create a copy). Tip #39. Customize the Starting Folder for Your Windows 7 Explorer Superbar Icon If you’re on Windows 7, your Superbar will include a Windows Explorer icon. Clicking on the icon will launch Windows Explorer (of course), and will start you off in your “Libraries” folder. Libraries may be fine as a starting point, but if you have created yourself an “Inbox” folder, then it would probably make more sense to start off in this folder every time you launch Windows Explorer. To change this default/starting folder location, then first right-click the Explorer icon in the Superbar, and then right-click Properties:Then, in Target field of the Windows Explorer Properties box that appears, type %windir%\explorer.exe followed by the path of the folder you wish to start in. For example: %windir%\explorer.exe C:\Files If that folder happened to be on the Desktop (and called, say, “Inbox”), then you would use the following cleverness: %windir%\explorer.exe shell:desktop\Inbox Then click OK and test it out. Tip #40. Ummmmm…. No, that’s it. I can’t think of another one. That’s all of the tips I can come up with. I only created this one because 40 is such a nice round number… Case Study – An Organized PC To finish off the article, I have included a few screenshots of my (main) computer (running Vista). The aim here is twofold: To give you a sense of what it looks like when the above, sometimes abstract, tips are applied to a real-life computer, and To offer some ideas about folders and structure that you may want to steal to use on your own PC. Let’s start with the C: drive itself. Very minimal. All my files are contained within C:\Files. I’ll confine the rest of the case study to this folder: That folder contains the following: Mark: My personal files VC: My business (Virtual Creations, Australia) Others contains files created by friends and family Data contains files from the rest of the world (can be thought of as “public” files, usually downloaded from the Net) Settings is described above in tip #34 The Data folder contains the following sub-folders: Audio: Radio plays, audio books, podcasts, etc Development: Programmer and developer resources, sample source code, etc (see below) Humour: Jokes, funnies (those emails that we all receive) Movies: Downloaded and ripped movies (all legal, of course!), their scripts, DVD covers, etc. Music: (see below) Setups: Installation files for software (explained in full in tip #33) System: (see below) TV: Downloaded TV shows Writings: Books, instruction manuals, etc (see below) The Music folder contains the following sub-folders: Album covers: JPEG scans Guitar tabs: Text files of guitar sheet music Lists: e.g. “Top 1000 songs of all time” Lyrics: Text files MIDI: Electronic music files MP3 (representing 99% of the Music folder): MP3s, either ripped from CDs or downloaded, sorted by artist/album name Music Video: Video clips Sheet Music: usually PDFs The Data\Writings folder contains the following sub-folders: (all pretty self-explanatory) The Data\Development folder contains the following sub-folders: Again, all pretty self-explanatory (if you’re a geek) The Data\System folder contains the following sub-folders: These are usually themes, plug-ins and other downloadable program-specific resources. The Mark folder contains the following sub-folders: From Others: Usually letters that other people (friends, family, etc) have written to me For Others: Letters and other things I have created for other people Green Book: None of your business Playlists: M3U files that I have compiled of my favorite songs (plus one M3U playlist file for every album I own) Writing: Fiction, philosophy and other musings of mine Mark Docs: Shortcut to C:\Users\Mark Settings: Shortcut to C:\Files\Settings\Mark The Others folder contains the following sub-folders: The VC (Virtual Creations, my business – I develop websites) folder contains the following sub-folders: And again, all of those are pretty self-explanatory. Conclusion These tips have saved my sanity and helped keep me a productive geek, but what about you? What tips and tricks do you have to keep your files organized? Please share them with us in the comments. Come on, don’t be shy… Similar Articles Productive Geek Tips Fix For When Windows Explorer in Vista Stops Showing File NamesWhy Did Windows Vista’s Music Folder Icon Turn Yellow?Print or Create a Text File List of the Contents in a Directory the Easy WayCustomize the Windows 7 or Vista Send To MenuAdd Copy To / Move To on Windows 7 or Vista Right-Click Menu TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips Acronis Online Backup DVDFab 6 Revo Uninstaller Pro Registry Mechanic 9 for Windows Track Daily Goals With 42Goals Video Toolbox is a Superb Online Video Editor Fun with 47 charts and graphs Tomorrow is Mother’s Day Check the Average Speed of YouTube Videos You’ve Watched OutlookStatView Scans and Displays General Usage Statistics

Read the article

Scripting with the Sun ZFS Storage 7000 Appliance

- by Geoff Ongley

The Sun ZFS Storage 7000 appliance has a user friendly and easy to understand graphical web based interface we call the "BUI" or "Browser User Interface".This interface is very useful for many tasks, but in some cases a script (or workflow) may be more appropriate, such as:Repetitive tasksTasks which work on (or obtain information about) a large number of shares or usersTasks which are triggered by an alert threshold (workflows)Tasks where you want a only very basic input, but a consistent output (workflows)The appliance scripting language is based on ECMAscript 3 (close to javascript). I'm not going to cover ECMAscript 3 in great depth (I'm far from an expert here), but I would like to show you some neat things you can do with the appliance, to get you started based on what I have found from my own playing around.I'm making the assumption you have some sort of programming background, and understand variables, arrays, functions to some extent - but of course if something is not clear, please let me know so I can fix it up or clarify it.Variable Declarations and ArraysVariablesECMAScript is a dynamically and weakly typed language. If you don't know what that means, google is your friend - but at a high level it means we can just declare variables with no specific type and on the fly.For example, I can declare a variable and use it straight away in the middle of my code, for example:projects=list();Which makes projects an array of values that are returned from the list(); function (which is usable in most contexts). With this kind of variable, I can do things like:projects.length (this property on array tells you how many objects are in it, good for for loops etc). Alternatively, I could say:projects=3;and now projects is just a simple number.Should we declare variables like this so loosely? In my opinion, the answer is no - I feel it is a better practice to declare variables you are going to use, before you use them - and given them an initial value. You can do so as follows:var myVariable=0;To demonstrate the ability to just randomly assign and change the type of variables, you can create a simple script at the cli as follows (bold for input):fishy10:> script("." to run)> run("cd /");("." to run)> run ("shares");("." to run)> var projects;("." to run)> projects=list();("." to run)> printf("Number of projects is: %d\n",projects.length);("." to run)> projects=152;("." to run)> printf("Value of the projects variable as an integer is now: %d\n",projects);("." to run)> .Number of projects is: 7Value of the projects variable as an integer is now: 152You can also confirm this behaviour by checking the typeof variable we are dealing with:fishy10:> script("." to run)> run("cd /");("." to run)> run ("shares");("." to run)> var projects;("." to run)> projects=list();("." to run)> printf("var projects is of type %s\n",typeof(projects));("." to run)> projects=152;("." to run)> printf("var projects is of type %s\n",typeof(projects));("." to run)> .var projects is of type objectvar projects is of type numberArraysSo you likely noticed that we have already touched on arrays, as the list(); (in the shares context) stored an array into the 'projects' variable.But what if you want to declare your own array? Easy! This is very similar to Java and other languages, we just instantiate a brand new "Array" object using the keyword new:var myArray = new Array();will create an array called "myArray".A quick example:fishy10:> script("." to run)> testArray = new Array();("." to run)> testArray[0]="This";("." to run)> testArray[1]="is";("." to run)> testArray[2]="just";("." to run)> testArray[3]="a";("." to run)> testArray[4]="test";("." to run)> for (i=0; i < testArray.length; i++)("." to run)> {("." to run)> printf("Array element %d is %s\n",i,testArray[i]);("." to run)> }("." to run)> .Array element 0 is ThisArray element 1 is isArray element 2 is justArray element 3 is aArray element 4 is testWorking With LoopsFor LoopFor loops are very similar to those you will see in C, java and several other languages. One of the key differences here is, as you were made aware earlier, we can be a bit more sloppy with our variable declarations.The general way you would likely use a for loop is as follows:for (variable; test-case; modifier for variable){}For example, you may wish to declare a variable i as 0; and a MAX_ITERATIONS variable to determine how many times this loop should repeat:var i=0;var MAX_ITERATIONS=10;And then, use this variable to be tested against some case existing (has i reached MAX_ITERATIONS? - if not, increment i using i++);for (i=0; i < MAX_ITERATIONS; i++){ // some work to do}So lets run something like this on the appliance:fishy10:> script("." to run)> var i=0;("." to run)> var MAX_ITERATIONS=10;("." to run)> for (i=0; i < MAX_ITERATIONS; i++)("." to run)> {("." to run)> printf("The number is %d\n",i);("." to run)> }("." to run)> .The number is 0The number is 1The number is 2The number is 3The number is 4The number is 5The number is 6The number is 7The number is 8The number is 9While LoopWhile loops again are very similar to other languages, we loop "while" a condition is met. For example:fishy10:> script("." to run)> var isTen=false;("." to run)> var counter=0;("." to run)> while(isTen==false)("." to run)> {("." to run)> if (counter==10) ("." to run)> { ("." to run)> isTen=true; ("." to run)> } ("." to run)> printf("Counter is %d\n",counter);("." to run)> counter++; ("." to run)> }("." to run)> printf("Loop has ended and Counter is %d\n",counter);("." to run)> .Counter is 0Counter is 1Counter is 2Counter is 3Counter is 4Counter is 5Counter is 6Counter is 7Counter is 8Counter is 9Counter is 10Loop has ended and Counter is 11So what do we notice here? Something has actually gone wrong - counter will technically be 11 once the loop completes... Why is this?Well, if we have a loop like this, where the 'while' condition that will end the loop may be set based on some other condition(s) existing (such as the counter has reached 10) - we must ensure that we terminate this iteration of the loop when the condition is met - otherwise the rest of the code will be followed which may not be desirable. In other words, like in other languages, we will only ever check the loop condition once we are ready to perform the next iteration, so any other code after we set "isTen" to be true, will still be executed as we can see it was above.We can avoid this by adding a break into our loop once we know we have set the condition - this will stop the rest of the logic being processed in this iteration (and as such, counter will not be incremented). So lets try that again:fishy10:> script("." to run)> var isTen=false;("." to run)> var counter=0;("." to run)> while(isTen==false)("." to run)> {("." to run)> if (counter==10) ("." to run)> { ("." to run)> isTen=true; ("." to run)> break;("." to run)> } ("." to run)> printf("Counter is %d\n",counter);("." to run)> counter++; ("." to run)> }("." to run)> printf("Loop has ended and Counter is %d\n", counter);("." to run)> .Counter is 0Counter is 1Counter is 2Counter is 3Counter is 4Counter is 5Counter is 6Counter is 7Counter is 8Counter is 9Loop has ended and Counter is 10Much better!Methods to Obtain and Manipulate DataGet MethodThe get method allows you to get simple properties from an object, for example a quota from a user. The syntax is fairly simple:var myVariable=get('property');An example of where you may wish to use this, is when you are getting a bunch of information about a user (such as quota information when in a shares context):var users=list();for(k=0; k < users.length; k++){ user=users[k]; run('select ' + user); var username=get('name'); var usage=get('usage'); var quota=get('quota');...Which you can then use to your advantage - to print or manipulate infomation (you could change a user's information with a set method, based on the information returned from the get method). The set method is explained next.Set MethodThe set method can be used in a simple manner, similar to get. The syntax for set is:set('property','value'); // where value is a string, if it was a number, you don't need quotesFor example, we could set the quota on a share as follows (first observing the initial value):fishy10:shares default/test-geoff> script("." to run)> var currentQuota=get('quota');("." to run)> printf("Current Quota is: %s\n",currentQuota);("." to run)> set('quota','30G');("." to run)> run('commit');("." to run)> currentQuota=get('quota');("." to run)> printf("Current Quota is: %s\n",currentQuota);("." to run)> .Current Quota is: 0Current Quota is: 32212254720This shows us using both the get and set methods as can be used in scripts, of course when only setting an individual share, the above is overkill - it would be much easier to set it manually at the cli using 'set quota=3G' and then 'commit'.List MethodThe list method can be very powerful, especially in more complex scripts which iterate over large amounts of data and manipulate it if so desired. The general way you will use list is as follows:var myVar=list();Which will make "myVar" an array, containing all the objects in the relevant context (this could be a list of users, shares, projects, etc). You can then gather or manipulate data very easily.We could list all the shares and mountpoints in a given project for example:fishy10:shares another-project> script("." to run)> var shares=list();("." to run)> for (i=0; i < shares.length; i++)("." to run)> {("." to run)> run('select ' + shares[i]);("." to run)> var mountpoint=get('mountpoint');("." to run)> printf("Share %s discovered, has mountpoint %s\n",shares[i],mountpoint);("." to run)> run('done');("." to run)> }("." to run)> .Share and-another discovered, has mountpoint /export/another-project/and-anotherShare another-share discovered, has mountpoint /export/another-project/another-shareShare bob discovered, has mountpoint /export/another-projectShare more-shares-for-all discovered, has mountpoint /export/another-project/more-shares-for-allShare yep discovered, has mountpoint /export/another-project/yepWriting More Complex and Re-Usable CodeFunctionsThe best way to be able to write more complex code is to use functions to split up repeatable or reusable sections of your code. This also makes your more complex code easier to read and understand for other programmers.We write functions as follows:function functionName(variable1,variable2,...,variableN){}For example, we could have a function that takes a project name as input, and lists shares for that project (assuming we're already in the 'project' context - context is important!):function getShares(proj){ run('select ' + proj); shares=list(); printf("Project: %s\n", proj); for(j=0; j < shares.length; j++) { printf("Discovered share: %s\n",shares[i]); } run('done'); // exit selected project}Commenting your CodeLike any other language, a large part of making it readable and understandable is to comment it. You can use the same comment style as in C and Java amongst other languages.In other words, sngle line comments use://at the beginning of the comment.Multi line comments use:/*at the beginning, and:*/ at the end.For example, here we will use both:fishy10:> script("." to run)> // This is a test comment("." to run)> printf("doing some work...\n");("." to run)> /* This is a multi-line("." to run)> comment which I will span across("." to run)> three lines in total */("." to run)> printf("doing some more work...\n");("." to run)> .doing some work...doing some more work...Your comments do not have to be on their own, they can begin (particularly with single line comments this is handy) at the end of a statement, for examplevar projects=list(); // The variable projects is an array containing all projects on the system.Try and Catch StatementsYou may be used to using try and catch statements in other languages, and they can (and should) be utilised in your code to catch expected or unexpected error conditions, that you do NOT wish to stop your code from executing (if you do not catch these errors, your script will exit!):try{ // do some work}catch(err) // Catch any error that could occur{ // do something here under the error condition}For example, you may wish to only execute some code if a context can be reached. If you can't perform certain actions under certain circumstances, that may be perfectly acceptable.For example if you want to test a condition that only makes sense when looking at a SMB/NFS share, but does not make sense when you hit an iscsi or FC LUN, you don't want to stop all processing of other shares you may not have covered yet.For example we may wish to obtain quota information on all shares for all users on a share (but this makes no sense for a LUN):function getShareQuota(shar) // Get quota for each user of this share{ run('select ' + shar); printf(" SHARE: %s\n", shar); try { run('users'); printf(" %20s %11s %11s %3s\n","Username","Usage(G)","Quota(G)","Quota(%)"); printf(" %20s %11s %11s %4s\n","--------","--------","--------","----"); users=list(); for(k=0; k < users.length; k++) { user=users[k]; getUserQuota(user); } run('done'); // exit user context } catch(err) { printf(" SKIPPING %s - This is NOT a NFS or CIFs share, not looking for users\n", shar); } run('done'); // done with this share}Running Scripts Remotely over SSHAs you have likely noticed, writing and running scripts for all but the simplest jobs directly on the appliance is not going to be a lot of fun.There's a couple of choices on what you can do here:Create scripts on a remote system and run them over sshCreate scripts, wrapping them in workflow code, so they are stored on the appliance and can be triggered under certain circumstances (like a threshold being reached)We'll cover the first one here, and then cover workflows later on (as these are for the most part just scripts with some wrapper information around them).Creating a SSH Public/Private SSH Key PairLog on to your handy Solaris box (You wouldn't be using any other OS, right? :P) and use ssh-keygen to create a pair of ssh keys. I'm storing this separate to my normal key:[geoff@lightning ~] ssh-keygen -t rsa -b 1024Generating public/private rsa key pair.Enter file in which to save the key (/export/home/geoff/.ssh/id_rsa): /export/home/geoff/.ssh/nas_key_rsaEnter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /export/home/geoff/.ssh/nas_key_rsa.Your public key has been saved in /export/home/geoff/.ssh/nas_key_rsa.pub.The key fingerprint is:7f:3d:53:f0:2a:5e:8b:2d:94:2a:55:77:66:5c:9b:14 geoff@lightningInstalling the Public Key on the ApplianceOn your Solaris host, observe the public key:[geoff@lightning ~] cat .ssh/nas_key_rsa.pub ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAvYfK3RIaAYmMHBOvyhKM41NaSmcgUMC3igPN5gUKJQvSnYmjuWG6CBr1CkF5UcDji7v19jG3qAD5lAMFn+L0CxgRr8TNaAU+hA4/tpAGkjm+dKYSyJgEdMIURweyyfUFXoerweR8AWW5xlovGKEWZTAfvJX9Zqvh8oMQ5UJLUUc= geoff@lightningNow, copy and paste everything after "ssh-rsa" and before "user@hostname" - in this case, geoff@lightning. That is, this bit:AAAAB3NzaC1yc2EAAAABIwAAAIEAvYfK3RIaAYmMHBOvyhKM41NaSmcgUMC3igPN5gUKJQvSnYmjuWG6CBr1CkF5UcDji7v19jG3qAD5lAMFn+L0CxgRr8TNaAU+hA4/tpAGkjm+dKYSyJgEdMIURweyyfUFXoerweR8AWW5xlovGKEWZTAfvJX9Zqvh8oMQ5UJLUUc=Logon to your appliance and get into the preferences -> keys area for this user (root):[geoff@lightning ~] ssh [email protected]: Last login: Mon Dec 6 17:13:28 2010 from 192.168.0.2fishy10:> configuration usersfishy10:configuration users> select rootfishy10:configuration users root> preferences fishy10:configuration users root preferences> keysOR do it all in one hit:fishy10:> configuration users select root preferences keysNow, we create a new public key that will be accepted for this user and set the type to RSA:fishy10:configuration users root preferences keys> createfishy10:configuration users root preferences key (uncommitted)> set type=RSASet the key itself using the string copied previously (between ssh-rsa and user@host), and set the key ensuring you put double quotes around it (eg. set key="<key>"):fishy10:configuration users root preferences key (uncommitted)> set key="AAAAB3NzaC1yc2EAAAABIwAAAIEAvYfK3RIaAYmMHBOvyhKM41NaSmcgUMC3igPN5gUKJQvSnYmjuWG6CBr1CkF5UcDji7v19jG3qAD5lAMFn+L0CxgRr8TNaAU+hA4/tpAGkjm+dKYSyJgEdMIURweyyfUFXoerweR8AWW5xlovGKEWZTAfvJX9Zqvh8oMQ5UJLUUc="Now set the comment for this key (do not use spaces):fishy10:configuration users root preferences key (uncommitted)> set comment="LightningRSAKey" Commit the new key:fishy10:configuration users root preferences key (uncommitted)> commitVerify the key is there:fishy10:configuration users root preferences keys> lsKeys:NAME MODIFIED TYPE COMMENT key-000 2010-10-25 20:56:42 RSA cycloneRSAKey key-001 2010-12-6 17:44:53 RSA LightningRSAKey As you can see, we now have my new key, and a previous key I have created on this appliance.Running your Script over SSH from a Remote SystemHere I have created a basic test script, and saved it as test.ecma3:[geoff@lightning ~] cat test.ecma3 script// This is a test script, By Geoff Ongley 2010.printf("Testing script remotely over ssh\n");.Now, we can run this script remotely with our keyless login:[geoff@lightning ~] ssh -i .ssh/nas_key_rsa root@fishy10 < test.ecma3Pseudo-terminal will not be allocated because stdin is not a terminal.Testing script remotely over sshPutting it Together - An Example Completed Quota Gathering ScriptSo now we have a lot of the basics to creating a script, let us do something useful, like, find out how much every user is using, on every share on the system (you will recognise some of the code from my previous examples): script/************************************** Quick and Dirty Quota Check script ** Written By Geoff Ongley ** 25 October 2010 **************************************/function getUserQuota(usr){ run('select ' + usr); var username=get('name'); var usage=get('usage'); var quota=get('quota'); var usage_g=usage / 1073741824; // convert bytes to gigabytes var quota_g=quota / 1073741824; // as above var quota_percent=0 if (quota > 0) { quota_percent=(usage / quota)*(100/1); } printf(" %20s %8.2f %8.2f %d%%\n",username,usage_g,quota_g,quota_percent); run('done'); // done with this selected user}function getShareQuota(shar){ //printf("DEBUG: selecting share %s\n", shar); run('select ' + shar); printf(" SHARE: %s\n", shar); try { run('users'); printf(" %20s %11s %11s %3s\n","Username","Usage(G)","Quota(G)","Quota(%)"); printf(" %20s %11s %11s %4s\n","--------","--------","--------","--------"); users=list(); for(k=0; k < users.length; k++) { user=users[k]; getUserQuota(user); } run('done'); // exit user context } catch(err) { printf(" SKIPPING %s - This is NOT a NFS or CIFs share, not looking for users\n", shar); } run('done'); // done with this share}function getShares(proj){ //printf("DEBUG: selecting project %s\n",proj); run('select ' + proj); shares=list(); printf("Project: %s\n", proj); for(j=0; j < shares.length; j++) { share=shares[j]; getShareQuota(share); } run('done'); // exit selected project}function getProjects(){ run('cd /'); run('shares'); projects=list(); for (i=0; i < projects.length; i++) { var project=projects[i]; getShares(project); } run('done'); // exit context for all projects}getProjects();.Which can be run as follows, and will print information like this:[geoff@lightning ~/FISHWORKS_SCRIPTS] ssh -i ~/.ssh/nas_key_rsa root@fishy10 < get_quota_utilisation.ecma3Pseudo-terminal will not be allocated because stdin is not a terminal.Project: another-project SHARE: and-another Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- nobody 0.00 0.00 0% geoffro 0.05 0.00 0% Billy 0.10 0.00 0% root 0.00 0.00 0% testing-user 0.05 0.00 0% SHARE: another-share Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- root 0.00 0.00 0% nobody 0.00 0.00 0% geoffro 0.05 0.49 9% testing-user 0.05 0.02 249% Billy 0.10 0.29 33% SHARE: bob Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- nobody 0.00 0.00 0% root 0.00 0.00 0% SHARE: more-shares-for-all Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- Billy 0.10 0.00 0% testing-user 0.05 0.00 0% nobody 0.00 0.00 0% root 0.00 0.00 0% geoffro 0.05 0.00 0% SHARE: yep Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- root 0.00 0.00 0% nobody 0.00 0.00 0% Billy 0.10 0.01 999% testing-user 0.05 0.49 9% geoffro 0.05 0.00 0%Project: default SHARE: Test-LUN SKIPPING Test-LUN - This is NOT a NFS or CIFs share, not looking for users SHARE: test-geoff Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- geoffro 0.05 0.00 0% root 3.18 10.00 31% uucp 0.00 0.00 0% nobody 0.59 0.49 119%^CKilled by signal 2.Creating a WorkflowWorkflows are scripts that we store on the appliance, and can have the script execute either on request (even from the BUI), or on an event such as a threshold being met.Workflow BasicsA workflow allows you to create a simple process that can be executed either via the BUI interface interactively, or by an alert being raised (for some threshold being reached, for example).The basics parameters you will have to set for your "workflow object" (notice you're creating a variable, that embodies ECMAScript) are as follows (parameters is optional):name: A name for this workflowdescription: A Description for the workflowparameters: A set of input parameters (useful when you need user input to execute the workflow)execute: The code, the script itself to execute, which will be function (parameters)With parameters, you can specify things like this (slightly modified sample taken from the System Administration Guide): ...parameters: variableParam1: { label: 'Name of Share', type: 'String' }, variableParam2 { label: 'Share Size', type: 'size' },execute: ....}; Note the commas separating the sections of name, parameters, execute, and so on. This is important!Also - there is plenty of properties you can set on the parameters for your workflow, these are described in the Sun ZFS Storage System Administration Guide.Creating a Basic Workflow from a Basic ScriptTo make a basic script into a basic workflow, you need to wrap the following around your script to create a 'workflow' object:var workflow = {name: 'Get User Quotas',description: 'Displays Quota Utilisation for each user on each share',execute: function() {// (basic script goes here, minus the "script" at the beginning, and "." at the end)}};However, it appears (at least in my experience to date) that the workflow object may only be happy with one function in the execute parameter - either that or I'm doing something wrong. As far as I can tell, after execute: you should only have a basic one function context like so:execute: function(){}To deal with this, and to give an example similar to our script earlier, I have created another simple quota check, to show the same basic functionality, but in a workflow format:var workflow = {name: 'Get User Quotas',description: 'Displays Quota Utilisation for each user on each share',execute: function () { run('cd /'); run('shares'); projects=list(); for (i=0; i < projects.length; i++) { run('select ' + projects[i]); shares=list('filesystem'); printf("Project: %s\n", projects[i]); for(j=0; j < shares.length; j++) { run('select ' +shares[j]); try { run('users'); printf(" SHARE: %s\n", shares[j]); printf(" %20s %11s %11s %3s\n","Username","Usage(G)","Quota(G)","Quota(%)"); printf(" %20s %11s %11s %4s\n","--------","--------","--------","-------"); users=list(); for(k=0; k < users.length; k++) { run('select ' + users[k]); username=get('name'); usage=get('usage'); quota=get('quota'); usage_g=usage / 1073741824; // convert bytes to gigabytes quota_g=quota / 1073741824; // as above quota_percent=0 if (quota > 0) { quota_percent=(usage / quota)*(100/1); } printf(" %20s %8.2f %8.2f %d%%\n",username,usage_g,quota_g,quota_percent); run('done'); } run('done'); // exit user context } catch(err) { // printf(" %s is a LUN, Not looking for users\n", shares[j]); } run('done'); // exit selected share context } run('done'); // exit project context } }};SummaryThe Sun ZFS Storage 7000 Appliance offers lots of different and interesting features to Sun/Oracle customers, including the world renowned Analytics. Hopefully the above will help you to think of new creative things you could be doing by taking advantage of one of the other neat features, the internal scripting engine!Some references are below to help you continue learning more, I'll update this post as I do the same! Enjoy...More information on ECMAScript 3A complete reference to ECMAScript 3 which will help you learn more of the details you may be interested in, can be found here:http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdfMore Information on Administering the Sun ZFS Storage 7000The Sun ZFS Storage 7000 System Administration guide can be a useful reference point, and can be found here:http://wikis.sun.com/download/attachments/186238602/2010_Q3_2_ADMIN.pdf

Search Results

Search found 1120 results on 45 pages for 'simplest'.

Page 45/45 | < Previous Page | 41 42 43 44 45

- by Jesse

- by claws

- by Brendon Muir

- by solublefish

- by JoshReuben

- by Jon Galloway

- by Rick Strahl

- by Rick Strahl

- by ToStringTheory

- by Ralf Westphal

- by BrianLy

- by Justin808

- by Ben

- by grg-n-sox

- by Shaun

- by jonathonc

- by rchrd

- by Mark Virtue

- by Geoff Ongley

< Previous Page | 41 42 43 44 45