Search Results

Search found 86 results on 4 pages for 'emerge'.

Page 4/4 | < Previous Page | 1 2 3 4 

  • Are there programs that iteratively write new programs?

    - by chris
    For about a year I have been thinking about writing a program that writes programs. This would primarily be a playful exercise that might teach me some new concepts. My inspiration came from negentropy and the ability for order to emerge from chaos and new chaos to arise out of order in infinite succession. To be more specific, the program would start by writing a short random string. If the string compiles the programs will log it for later comparison. If the string does not compile the program will try to rewrite it until it does compile. As more strings (mini 'useless' programs) are logged they can be parsed for similarities and used to generate a grammar. This grammar can then be drawn on to write more strings that have a higher probability of compilation than purely random strings. This is obviously more than a little silly, but I thought it would be fun to try and grow a program like this. And as a byproduct I get a bunch of unique programs that I can visualize and call art. I'll probably write this in Ruby due to its simple syntax and dynamic compilation and then I will visualize in processing using ruby-processing. What I would like to know is: Is there a name for this type of programming? What currently exists in this field? Who are the primary contributors? BONUS! - In what ways can I procedurally assign value to output programs beyond compiles(y/n)? I may want to extend the functionality of this program to generate a program based on parameters, but I want the program to define those parameters through running the programs that compile and assigning meaning to the programs output. This question is probably more involved than reasonable for a bonus, but if you can think of a simple way to get something like this done in less than 23 lines or one hyperlink, please toss it into your response. I know that this is not quite meta-programming and from the little I know of AI and generative algorithms they are usually more goal oriented than what I am thinking. What would be optimal is a program that continually rewrites and improves itself so I don't have to ^_^

    Read the article

  • Trouble with site-to-site OpenVPN & pfSense not passing traffic

    - by JohnCC
    I'm trying to get an OpenVPN tunnel going on pfSense 1.2.3-RELEASE running on embedded routers. I have a local LAN 10.34.43.0/254. The remote LAN is 10.200.1.0/24. The local pfSense is configured as the client, and the remote is configured as the server. My OpenVPN tunnel is using the IP range 10.99.89.0/24 internally. There are also some additional LANs on the remote side routed through the tunnel, but the issue is not with those since my connectivity fails before that point in the chain. The tunnel comes up fine and the logs look healthy. What I find is this:- I can ping and telnet to the remote LAN and the additional remote LANs from the local pfSense box's shell. I cannot ping or telnet to any remote LANs from the local network. I cannot ping or telnet to the local network from the remote LAN or the remote pfSense box's shell. If I tcpdump the tun interfaces on both sides and ping from the local LAN, I see the packets hit the tunnel locally, but they do not appear on the remote side (nor do they appear on the remote LAN interface if I tcpdump that). If I tcpdump the tun interfaces on both sides and ping from the local pfSense shell, I see the packets hit the tunnel locally, and exit the remote side. I can also tcpdump the remote LAN interface and see them pass there too. If I tcpdump the tun interfaces on both sides and ping from the remote pfSense shell, I see the packets hit the remote tun but they do not emerge from the local one. Here is the config file the remote side is using:- #user nobody #group nobody daemon keepalive 10 60 ping-timer-rem persist-tun persist-key dev tun proto udp cipher BF-CBC up /etc/rc.filter_configure down /etc/rc.filter_configure server 10.99.89.0 255.255.255.0 client-config-dir /var/etc/openvpn_csc push "route 10.200.1.0 255.255.255.0" lport <port> route 10.34.43.0 255.255.255.0 ca /var/etc/openvpn_server0.ca cert /var/etc/openvpn_server0.cert key /var/etc/openvpn_server0.key dh /var/etc/openvpn_server0.dh comp-lzo push "route 205.217.5.128 255.255.255.224" push "route 205.217.5.64 255.255.255.224" push "route 165.193.147.128 255.255.255.224" push "route 165.193.147.32 255.255.255.240" push "route 192.168.1.16 255.255.255.240" push "route 192.168.2.16 255.255.255.240" Here is the local config:- writepid /var/run/openvpn_client0.pid #user nobody #group nobody daemon keepalive 10 60 ping-timer-rem persist-tun persist-key dev tun proto udp cipher BF-CBC up /etc/rc.filter_configure down /etc/rc.filter_configure remote <host> <port> client lport 1194 ifconfig 10.99.89.2 10.99.89.1 ca /var/etc/openvpn_client0.ca cert /var/etc/openvpn_client0.cert key /var/etc/openvpn_client0.key comp-lzo You can see the relevant parts of the routing tables extracted from pfSense here http://pastie.org/5365800 The local firewall permits all ICMP from the LAN, and my PC is allowed everything to anywhere. The remote firewall treats its LAN as trusted and permits all traffic on that interface. Can anyone suggest why this is not working, and what I could try next?

    Read the article

  • TFS, G.I. Joe and Under-doing

    If I were to rank the most consistently irritating parts of my work day, using TFS would come in first by a wide margin. Even repeated network outages this week seem like a pleasant reprieve from this monolithic beast. This is not a reflexive anti-Microsoft feeling, that attitude just wouldnt work for a consultant who does .NET development. It is also not an utter dismissal of TFS as worthless; Ive seen people use it effectively on several projects. So why? Ill start with a laundry list of shortcomings. An out of the box UI for work items that is insultingly bad, a source control system that is confoundingly fragile when handling merges, folder renames and long file names, the arcane XML wizardry necessary to customize a template and a build system that adds an extra layer of oddness on top of msbuild. Im sure my legion of readers will soon point out to me how I can work around all these issues, how this is fixed in TFS 2010 or with this add-in, and how once you have everything set up, youre fine. And theyd be right, any one of these problems could be worked around. If not dirty laundry, what else? I thought about it for a while, and came to the conclusion that TFS is so irritating to me because it represents a vision of software development that I find unappealing. To expand upon this, lets start with some wisdom from those great PSAs at the end of the G.I. Joe cartoons of the 80s: Now you know, and knowing is half the battle. In software development, Id go further and say knowing is more than half the battle. Understanding the dimensions of the problem you are trying to solve, the needs of the users, the value that your software can provide are more than half the battle. Implementation of this understanding is not easy, but it is not even possible without this knowledge. Assuming we have a fixed amount of time and mental energy for any project, why does this spell trouble for TFS? If you think about what TFS is doing, its offering you a huge array of options to track the day to day implementation of your project. From tasks, to code churn, to test coverage. All valuable metrics, but only in exchange for valuable time to get it all working. In addition, when you have a shiny toy like TFS, the temptation is to feel obligated to use it. So the push from TFS is to encourage a project manager and team to focus on process and metrics around process. You can get great visibility, and graphs to show your project stakeholders, but none of that is important if you are not implementing the right product. Not just unimportant, these activities can be harmful as they drain your time and sap your creativity away from the rest of the project. To be more concrete, lets suppose your organization has invested the time to create a template for your projects and trained people in how to use it, so there is no longer a big investment of time for each project to get up and running. First, Id challenge if that template could be specific enough to be full featured and still applicable for any project. Second, the very existence of this template would be a indication to a project manager that the success of their project was somehow directly related to fitting management of that project into this format. Again, while the capabilities are wonderful, the mirage is there; just get everything into TFS and your project will run smoothly. Ill close the loop on this first topic by proposing a thought experiment. Think of the projects youve worked on. How many times have you been chagrined to discover youve implemented the wrong feature, misunderstood how a feature should work or just plain spent too much time on a screen that nobody uses? That sounds like a really worthwhile area to invest time in improving. How about going back to these projects and thinking about how many times you wished you had optimized the state change flow of your tasks or been embarrassed to not have a code churn report linked back to the latest changeset? With thanks to the Real American Heroes, Ill move on to a more current influence, that of the developers at 37signals, and their philosophy towards software development. This philosophy, fully detailed in the books Getting Real and Rework, is a vision of software that under does the competition. This is software that is deliberately limited in functionality in order to concentrate fully on making sure ever feature that is there is awesome and needed. Why is this relevant? Well, in one of those fun seeming paradoxes in life, constraints can be a spark for creativity. Think Twitter, the small screen of an iPhone, the limitations of HTML for applications, the low memory limits of older or embedded system. As long as there is some freedom within those constraints, amazing things emerge. For project management, some of the most respected people in the industry recommend using just index cards, pens and tape. They argue that with change the constant in software development, your process should be as limited (yet rigorous) as possible. Looking at TFS, this is not a system designed to under do anybody. It is a big jumble of components and options, with every feature you could think of. Predictably this means many basic functions are hard to use. For task management, many people just use an Excel spreadsheet linked up to TFS. Not a stirring endorsement of the tooling there. TFS as a whole would be far more appealing to me if there was less of it, but better. Id cut 50% of the features to make the other half really amaze and inspire me. And thats really the heart of the matter. TFS has great promise and I want to believe it can work better. But ultimately it focuses your attention on a lot of stuff that doesnt really matter and then clamps down your creativity in a mess of forms and dialogs obscuring what does.   --- Relevant Links --- All those great G.I. Joe PSAs are on YouTube, including lots of mashed up versions. A simple Google search will get you on the right track.Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

    Read the article

  • Preserving Permalinks

    - by Daniel Moth
    One of the things that gets me on a rant is websites that break permalinks. If you have posted something somewhere and there is a public URL pointing to it, that URL should never ever return a 404. You are breaking all websites that ever linked to you and you are breaking all search engine links to your content (that others will try and follow). It is a pet peeve of mine. So when I had to move my blog, obviously I would preserve the root URL (www.danielmoth.com/Blog/), but I also wanted to preserve every URL my blog has generated over the years. To be clear, our focus here is on the URL formatting, not the content migration which I'll talk about in my next post. In this post, I'll describe my solution first and then what it solves. 1. The IIS7 Rewrite Module and web.config There are a few ways you can map an old URL to a new one (so when requests to the old URL come in, they get redirected to the new one). The new blog engine I use (dasBlog) has built-in functionality to do that (Scott refers to it here). Instead, the way I chose to address the issue was to use the IIS7 rewrite module. The IIS7 rewrite module allows redirecting URLs based on pattern matching, regular expressions and, of course, hardcoded full URLs for things that don't fall into any pattern. You can configure it visually from IIS Manager using a handy dialog that allows testing patterns against input URLs. Here is what mine looked like after configuring a few rules: To learn more about this technology check out this video, the reference page and this overview blog post; all 3 pages have a collection of related resources at the bottom worth checking out too. All the visual configuration ends up in a web.config file at the root folder of your website. If you are on a shared hosting service, probably the only way you can use the Rewrite Module is by directly editing the web.config file. Next, I'll describe the URLs I had to map and how that manifested itself in the web.config file. What I did was create the rules locally using the GUI, and then took the generated web.config file and uploaded it to my live site. You can view my web.config here. 2. Monthly Archives Observe the difference between the way the two blog engines generate this type of URL Blogger: /Blog/2004_07_01_mothblog_archive.html dasBlog: /Blog/default,month,2004-07.aspx In my web.config file, the rule that deals with this is the one named "monthlyarchive_redirect". 3. Categories Observe the difference between the way the two blog engines generate this type of URL Blogger: /Blog/labels/Personal.html dasBlog: /Blog/CategoryView,category,Personal.aspx In my web.config file the rule that deals with this is the one named "category_redirect". 4. Posts Observe the difference between the way the two blog engines generate this type of URL Blogger: /Blog/2004/07/hello-world.html dasBlog: /Blog/Hello-World.aspx In my web.config file the rule that deals with this is the one named "post_redirect". Note: The decision is taken to use dasBlog URLs that do not include the date info (see the description of my Appearance settings). If we included the date info then it would have to include the day part, which blogger did not generate. This makes it impossible to redirect correctly and to have a single permalink for blog posts moving forward. An implication of this decision, is that no two blog posts can have the same title. The tool I will describe in my next post (inelegantly) deals with duplicates, but not with triplicates or higher. 5. Unhandled by a generic rule Unfortunately, the two blog engines use different rules for generating URLs for blog posts. Most of the time the conversion is as simple as the example of the previous section where a post titled "Hello World" generates a URL with the words separated by a hyphen. Some times that is not the case, for example: /Blog/2006/05/medc-wrap-up.html /Blog/MEDC-Wrapup.aspx or /Blog/2005/01/best-of-moth-2004.html /Blog/Best-Of-The-Moth-2004.aspx or /Blog/2004/11/more-windows-mobile-2005-details.html /Blog/More-Windows-Mobile-2005-Details-Emerge.aspx In short, blogger does not add words to the title beyond ~39 characters, it drops some words from the title generation (e.g. a, an, on, the), and it preserve hyphens that appear in the title. For this reason, we need to detect these and explicitly list them for redirects (no regular expression can help here because the full set of rules is not listed anywhere). In my web.config file the rule that deals with this is the one named "Redirect rule1 for FullRedirects" combined with the rewriteMap named "StaticRedirects". Note: The tool I describe in my next post will detect all the URLs that need to be explicitly redirected and will list them in a file ready for you to copy them to your web.config rewriteMap. 6. C# code doing the same as the web.config I wrote some naive code that does the same thing as the web.config: given a string it will return a new string converted according to the 3 rules above. It does not take into account the 4th case where an explicit hard-coded conversion is needed (the tool I present in the next post does take that into account). static string REGEX_post_redirect = "[0-9]{4}/[0-9]{2}/([0-9a-z-]+).html"; static string REGEX_category_redirect = "labels/([_0-9a-z-% ]+).html"; static string REGEX_monthlyarchive_redirect = "([0-9]{4})_([0-9]{2})_[0-9]{2}_mothblog_archive.html"; static string Redirect(string oldUrl) { GroupCollection g; if (RunRegExOnIt(oldUrl, REGEX_post_redirect, 2, out g)) return string.Concat(g[1].Value, ".aspx"); if (RunRegExOnIt(oldUrl, REGEX_category_redirect, 2, out g)) return string.Concat("CategoryView,category,", g[1].Value, ".aspx"); if (RunRegExOnIt(oldUrl, REGEX_monthlyarchive_redirect, 3, out g)) return string.Concat("default,month,", g[1].Value, "-", g[2], ".aspx"); return string.Empty; } static bool RunRegExOnIt(string toRegEx, string pattern, int groupCount, out GroupCollection g) { if (pattern.Length == 0) { g = null; return false; } g = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled).Match(toRegEx).Groups; return (g.Count == groupCount); } Comments about this post welcome at the original blog.

    Read the article

  • The last MVVM you'll ever need?

    - by Nuri Halperin
    As my MVC projects mature and grow, the need to have some omnipresent, ambient model properties quickly emerge. The application no longer has only one dynamic pieced of data on the page: A sidebar with a shopping cart, some news flash on the side – pretty common stuff. The rub is that a controller is invoked in context of a single intended request. The rest of the data, even though it could be just as dynamic, is expected to appear on it's own. There are many solutions to this scenario. MVVM prescribes creating elaborate objects which expose your new data as a property on some uber-object with more properties exposing the "side show" ambient data. The reason I don't love this approach is because it forces fairly acute awareness of the view, and soon enough you have many MVVM objects laying around, and views have to start doing null-checks in order to ensure you really supplied all the values before binding to them. Ick. Just as unattractive is the ViewData dictionary. It's not strongly typed, and in both this and the MVVM approach someone has to populate these properties – n'est pas? Where does that live? With MVC2, we get the formerly-futures  feature Html.RenderAction(). The feature allows you plant a line in a view, of the format: <% Html.RenderAction("SessionInterest", "Session"); %> While this syntax looks very clean, I can't help being bothered by it. MVC was touting a very strong separation of concerns, the Model taking on the role of the business logic, the controller handling route and performing minimal view-choosing operations and the views strictly focused on rendering out angled-bracket tags. The RenderAction() syntax has the view calling some controller and invoking it inline with it's runtime rendering. This – to my taste – embeds too much  knowledge of controllers into the view's code – which was allegedly forbidden.  The one way flow "Controller Receive Data –> Controller invoke Model –> Controller select view –> Controller Hand data to view" now gets a "View calls controller and gets it's own data" which is not so one-way anymore. Ick. I toyed with some other solutions a bit, including some base controllers, special view classes etc. My current favorite though is making use of the ExpandoObject and dynamic features with C# 4.0. If you follow Phil Haack or read a bit from David Heyden you can see the general picture emerging. The game changer is that using the new dynamic syntax, one can sprout properties on an object and make use of them in the view. Well that beats having a bunch of uni-purpose MVVM's any day! Rather than statically exposed properties, we'll just use the capability of adding members at runtime. Armed with new ideas and syntax, I went to work: First, I created a factory method to enrich the focuse object: public static class ModelExtension { public static dynamic Decorate(this Controller controller, object mainValue) { dynamic result = new ExpandoObject(); result.Value = mainValue; result.SessionInterest = CodeCampBL.SessoinInterest(); result.TagUsage = CodeCampBL.TagUsage(); return result; } } This gives me a nice fluent way to have the controller add the rest of the ambient "side show" items (SessionInterest, TagUsage in this demo) and expose them all as the Model: public ActionResult Index() { var data = SyndicationBL.Refresh(TWEET_SOURCE_URL); dynamic result = this.Decorate(data); return View(result); } So now what remains is that my view knows to expect a dynamic object (rather than statically typed) so that the ASP.NET page compiler won't barf: <%@ Page Language="C#" Title="Ambient Demo" MasterPageFile="~/Views/Shared/Ambient.Master" Inherits="System.Web.Mvc.ViewPage<dynamic>" %> Notice the generic ViewPage<dynamic>. It doesn't work otherwise. In the page itself, Model.Value property contains the main data returned from the controller. The nice thing about this, is that the master page (Ambient.Master) also inherits from the generic ViewMasterPage<dynamic>. So rather than the page worrying about all this ambient stuff, the side bars and panels for ambient data all reside in a master page, and can be rendered using the RenderPartial() syntax: <% Html.RenderPartial("TagCloud", Model.SessionInterest as Dictionary<string, int>); %> Note here that a cast is necessary. This is because although dynamic is magic, it can't figure out what type this property is, and wants you to give it a type so its binder can figure out the right property to bind to at runtime. I use as, you can cast if you like. So there we go – no violation of MVC, no explosion of MVVM models and voila – right? Well, I could not let this go without a tweak or two more. The first thing to improve, is that some views may not need all the properties. In that case, it would be a waste of resources to populate every property. The solution to this is simple: rather than exposing properties, I change d the factory method to expose lambdas - Func<T> really. So only if and when a view accesses a member of the dynamic object does it load the data. public static class ModelExtension { // take two.. lazy loading! public static dynamic LazyDecorate(this Controller c, object mainValue) { dynamic result = new ExpandoObject(); result.Value = mainValue; result.SessionInterest = new Func<Dictionary<string, int>>(() => CodeCampBL.SessoinInterest()); result.TagUsage = new Func<Dictionary<string, int>>(() => CodeCampBL.TagUsage()); return result; } } Now that lazy loading is in place, there's really no reason not to hook up all and any possible ambient property. Go nuts! Add them all in – they won't get invoked unless used. This now requires changing the signature of usage on the ambient properties methods –adding some parenthesis to the master view: <% Html.RenderPartial("TagCloud", Model.SessionInterest() as Dictionary<string, int>); %> And, of course, the controller needs to call LazyDecorate() rather than the old Decorate(). The final touch is to introduce a convenience method to the my Controller class , so that the tedium of calling Decorate() everywhere goes away. This is done quite simply by adding a bunch of methods, matching View(object), View(string,object) signatures of the Controller class: public ActionResult Index() { var data = SyndicationBL.Refresh(TWEET_SOURCE_URL); return AmbientView(data); } //these methods can reside in a base controller for the solution: public ViewResult AmbientView(dynamic data) { dynamic result = ModelExtension.LazyDecorate(this, data); return View(result); } public ViewResult AmbientView(string viewName, dynamic data) { dynamic result = ModelExtension.LazyDecorate(this, data); return View(viewName, result); } The call to AmbientView now replaces any call the View() that requires the ambient data. DRY sattisfied, lazy loading and no need to replace core pieces of the MVC pipeline. I call this a good MVC day. Enjoy!

    Read the article

  • Reviewing Retail Predictions for 2011

    - by David Dorf
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} I've been busy thinking about what 2012 and beyond will look like for retail, and I have some interesting predictions to share.  But before I go there, let’s first review this year’s predictions before making new ones for 2012. 1. Alternate Payments We've seen several alternate payment schemes emerge over the last two years, and 2011 may be the year one of them takes hold. Any competition that can drive down fees will be good for everyone. I'm betting that Apple will add NFC chips to their next version of the iPhone, then enable payments in stores using iTunes accounts on the backend. Paypal will continue to make inroads, and Isis will announce a pilot. The iPhone 4S did not contain an NFC chip, so we’ll have to continuing waiting for the iPhone 5. PayPal announced its moving into in-store payments, and Google launched its wallet in selected cities.  Overall I think the payment scene is heating up and that trend will continue. 2. Engineered Systems The industry is moving toward purpose-built appliances that are optimized across the entire stack. Oracle calls these "engineered systems" and the first two examples are Exadata and Exalogic, but there are other examples from other vendors. These are particularly important to the retail industry because of the volume of data that must be processed. There should be continued adoption in 2011. Oracle reports that Exadata is its fasting growing product, and at the recent OpenWorld it announced the SuperCluster and Exalytics products, both continuing the engineered systems trend. SAP’s HANA continues to receive attention, and IBM also seems to be moving in this direction. 3. Social Analytics There are lots of tools that provide insight into how a brand is perceived across popular internet sites, but as far as I know, these tools are not industry specific. The next step needs to mine the data and determine how it should influence retail operations. The data needs to help retailers determine how they create promotions, which products to stock, and how to keep consumers engaged. Social data alone does not provide the answers, but its one more data point that will help retailers make better decisions. Look for some vendor consolidation to help make this happen. In March, Salesforce.com acquired leading social monitoring vendor Radian6 and followed up with acquisitions of Heroku and Model Metrics. The notion of Social CRM seems to be going more mainstream now. 4. 2-D Barcodes Look for more QRCodes on shelf-tags, in newspaper circulars, and on billboards. It's a great portal from the physical world into the digital one that buys us time until augmented reality matures further. Nobody wants to type "www", backslash, and ".com" on their phones. QRCodes are everywhere. ‘Nuff said. 5. In the words of Microsoft, "To the Cloud!" My favorite "cloud application" is Evernote. If you take notes on your work laptop, you will inevitably need those notes on your home PC. And if you manage to solve that problem, you'll need to access them from your mobile phone. Evernote stores your notes in the cloud and provides easy ways to access them. Being able to access a service from anywhere and not having to worry about backups, upgrades, etc. is great. Retailers will start to rely on cloud services, both public and private, in the coming year. There were no shortage of announcements in this area: Amazon’s cloud-based Kindle Fire, Apple’s iCloud, Oracle’s Public Cloud, etc. I saw an interesting presentation showing how BevMo moved their systems to the cloud.  Seems like retailers are starting to consider the cloud for specific uses. 6. F-CommerceTop of Form Move over "E" and "M" so we can introduce "F-Commerce," which should go mainstream in 2011. Already several retailers have created small stores on Facebook, and it won't be long before Facebook becomes a full-fledged channel in the omni-channel world of retail. The battle between Facebook and Google will heat up over retail, where both stand to make lots of money. JCPenney and ASOS both put their entire catalogs on Facebook, and lots of other retailers have connected Facebook to their e-commerce site. I still think selling from the newsfeed is the best approach, and several retailers are trying that approach as well. I just don’t see Google+ as a threat to Facebook, so I think that battle is over.  I called 2011 The Year of F-Commerce, and that was probably accurate. Its good to look back at predictions, but we also have to think about what was missed.  I didn't see Amazon entering the tablet business with such a splash, although in hindsight it was obvious. Nor did I think HP would fall so far so fast.  Look for my 2012 predictions coming soon.

    Read the article

  • ANTS CLR and Memory Profiler In Depth Review (Part 2 of 2 &ndash; Memory Profiler)

    - by ToStringTheory
    One of the things that people might not know about me, is my obsession to make my code as efficient as possible. Many people might not realize how much of a task or undertaking that this might be, but it is surely a task as monumental as climbing Mount Everest, except this time it is a challenge for the mind… In trying to make code efficient, there are many different factors that play a part – size of project or solution, tiers, language used, experience and training of the programmer, technologies used, maintainability of the code – the list can go on for quite some time. I spend quite a bit of time when developing trying to determine what is the best way to implement a feature to accomplish the efficiency that I look to achieve. One program that I have recently come to learn about – Red Gate ANTS Performance (CLR) and Memory profiler gives me tools to accomplish that job more efficiently as well. In this review, I am going to cover some of the features of the ANTS memory profiler set by compiling some hideous example code to test against. Notice As a member of the Geeks With Blogs Influencers program, one of the perks is the ability to review products, in exchange for a free license to the program. I have not let this affect my opinions of the product in any way, and Red Gate nor Geeks With Blogs has tried to influence my opinion regarding this product in any way. Introduction – Part 2 In my last post, I reviewed the feature packed Red Gate ANTS Performance Profiler.  Separate from the Red Gate Performance Profiler is the Red Gate ANTS Memory Profiler – a simple, easy to use utility for checking how your application is handling memory management…  A tool that I wish I had had many times in the past.  This post will be focusing on the ANTS Memory Profiler and its tool set. The memory profiler has a large assortment of features just like the Performance Profiler, with the new session looking nearly exactly alike: ANTS Memory Profiler Memory profiling is not something that I have to do very often…  In the past, the few cases I’ve had to find a memory leak in an application I have usually just had to trace the code of the operations being performed to look for oddities…  Sadly, I have come across more undisposed/non-using’ed IDisposable objects, usually from ADO.Net than I would like to ever see.  Support is not fun, however using ANTS Memory Profiler makes this task easier.  For this round of testing, I am going to use the same code from my previous example, using the WPF application. This time, I will choose the ‘Profile Memory’ option from the ANTS menu in Visual Studio, which launches the solution in its currently configured state/start-up project, and then launches the ANTS Memory Profiler to help.  It prepopulates all of the fields with the current project information, and all I have to do is select the ‘Start Profiling’ option. When the window comes up, it is actually quite barren, just giving ideas on how to work the profiler.  You start by getting to the point in your application that you want to profile, and then taking a ‘Memory Snapshot’.  This performs a full garbage collection, and snapshots the managed heap.  Using the same WPF app as before, I will go ahead and take a snapshot now. As you can see, ANTS is already giving me lots of information regarding the snapshot, however this is just a snapshot.  The whole point of the profiler is to perform an action, usually one where a memory problem is being noticed, and then take another snapshot and perform a diff between them to see what has changed.  I am going to go ahead and generate 5000 primes, and then take another snapshot: As you can see, ANTS is already giving me a lot of new information about this snapshot compared to the last.  Information such as difference in memory usage, fragmentation, class usage, etc…  If you take more snapshots, you can use the dropdown at the top to set your actual comparison snapshots. If you beneath the timeline, you will see a breadcrumb trail showing how best to approach profiling memory using ANTS.  When you first do the comparison, you start on the Summary screen.  You can either use the charts at the bottom, or switch to the class list screen to get to the next step.  Here is the class list screen: As you can see, it lists information about all of the instances between the snapshots, as well as at the bottom giving you a way to filter by telling ANTS what your problem is.  I am going to go ahead and select the Int16[] to look at the Instance Categorizer Using the instance categorizer, you can travel backwards to see where all of the instances are coming from.  It may be hard to see in this image, but hopefully the lightbox (click on it) will help: I can see that all of these instances are rooted to the application through the UI TextBlock control.  This image will probably be even harder to see, however using the ‘Instance Retention Graph’, you can trace an objects memory inheritance up the chain to see its roots as well.  This is a simple example, as this is simply a known element.  Usually you would be profiling an actual problem, and comparing those differences.  I know in the past, I have spotted a problem where a new context was created per page load, and it was rooted into the application through an event.  As the application began to grow, performance and reliability problems started to emerge.  A tool like this would have been a great way to identify the problem quickly. Overview Overall, I think that the Red Gate ANTS Memory Profiler is a great utility for debugging those pesky leaks.  3 Biggest Pros: Easy to use interface with lots of options for configuring profiling session Intuitive and helpful interface for drilling down from summary, to instance, to root graphs ANTS provides an API for controlling the profiler. Not many options, but still helpful. 2 Biggest Cons: Inability to automatically snapshot the memory by interval Lack of complete integration with Visual Studio via an extension panel Ratings Ease of Use (9/10) – I really do believe that they have brought simplicity to the once difficult task of memory profiling.  I especially liked how it stepped you further into the drilldown by directing you towards the best options. Effectiveness (10/10) – I believe that the profiler does EXACTLY what it purports to do.  Features (7/10) – A really great set of features all around in the application, however, I would like to see some ability for automatically triggering snapshots based on intervals or framework level items such as events. Customer Service (10/10) – My entire experience with Red Gate personnel has been nothing but good.  their people are friendly, helpful, and happy! UI / UX (9/10) – The interface is very easy to get around, and all of the options are easy to find.  With a little bit of poking around, you’ll be optimizing Hello World in no time flat! Overall (9/10) – Overall, I am happy with the Memory Profiler and its features, as well as with the service I received when working with the Red Gate personnel.  Thank you for reading up to here, or skipping ahead – I told you it would be shorter!  Please, if you do try the product, drop me a message and let me know what you think!  I would love to hear any opinions you may have on the product. Code Feel free to download the code I used above – download via DropBox

    Read the article

  • How about a new platform for your next API&hellip; a CMS?

    - by Elton Stoneman
    Originally posted on: http://geekswithblogs.net/EltonStoneman/archive/2014/05/22/how-about-a-new-platform-for-your-next-apihellip-a.aspxSay what? I’m seeing a type of API emerge which serves static or long-lived resources, which are mostly read-only and have a controlled process to update the data that gets served. Think of something like an app configuration API, where you want a central location for changeable settings. You could use this server side to store database connection strings and keep all your instances in sync, or it could be used client side to push changes out to all users (and potentially driving A/B or MVT testing). That’s a good candidate for a RESTful API which makes proper use of HTTP expiration and validation caching to minimise traffic, but really you want a front end UI where you can edit the current config that the API returns and publish your changes. Sound like a Content Mangement System would be a good fit? I’ve been looking at that and it’s a great fit for this scenario. You get a lot of what you need out of the box, the amount of custom code you need to write is minimal, and you get a whole lot of extra stuff from using CMS which is very useful, but probably not something you’d build if you had to put together a quick UI over your API content (like a publish workflow, fine-grained security and an audit trail). You typically use a CMS for HTML resources, but it’s simple to expose JSON instead – or to do content negotiation to support both, so you can open a resource in a browser and see a nice visual representation, or request it with: Accept=application/json and get the same content rendered as JSON for the app to use. Enter Umbraco Umbraco is an open source .NET CMS that’s been around for a while. It has very good adoption, a lively community and a good release cycle. It’s easy to use, has all the functionality you need for a CMS-driven API, and it’s scalable (although you won’t necessarily put much scale on the CMS layer). In the rest of this post, I’ll build out a simple app config API using Umbraco. We’ll define the structure of the configuration resource by creating a new Document Type and setting custom properties; then we’ll build a very simple Razor template to return configuration documents as JSON; then create a resource and see how it looks. And we’ll look at how you could build this into a wider solution. If you want to try this for yourself, it’s ultra easy – there’s an Umbraco image in the Azure Website gallery, so all you need to to is create a new Website, select Umbraco from the image and complete the installation. It will create a SQL Azure website to store all the content, as well as a Website instance for editing and accessing content. They’re standard Azure resources, so you can scale them as you need. The default install creates a starter site for some HTML content, which you can use to learn your way around (or just delete). 1. Create Configuration Document Type In Umbraco you manage content by creating and modifying documents, and every document has a known type, defining what properties it holds. We’ll create a new Document Type to describe some basic config settings. In the Settings section from the left navigation (spanner icon), expand Document Types and Master, hit the ellipsis and select to create a new Document Type: This will base your new type off the Master type, which gives you some existing properties that we’ll use – like the Page Title which will be the resource URL. In the Generic Properties tab for the new Document Type, you set the properties you’ll be able to edit and return for the resource: Here I’ve added a text string where I’ll set a default cache lifespan, an image which I can use for a banner display, and a date which could show the user when the next release is due. This is the sort of thing that sits nicely in an app config API. It’s likely to change during the life of the product, but not very often, so it’s good to have a centralised place where you can make and publish changes easily and safely. It also enables A/B and MVT testing, as you can change the response each client gets based on your set logic, and their apps will behave differently without needing a release. 2. Define the response template Now we’ve defined the structure of the resource (as a document), in Umbraco we can define a C# Razor template to say how that resource gets rendered to the client. If you only want to provide JSON, it’s easy to render the content of the document by building each property in the response (Umbraco uses dynamic objects so you can specify document properties as object properties), or you can support content negotiation with very little effort. Here’s a template to render the document as HTML or JSON depending on the Accept header, using JSON.NET for the API rendering: @inherits Umbraco.Web.Mvc.UmbracoTemplatePage @using Newtonsoft.Json @{ Layout = null; } @if(UmbracoContext.HttpContext.Request.Headers["accept"] != null &amp;&amp; UmbracoContext.HttpContext.Request.Headers["accept"] == "application/json") { Response.ContentType = "application/json"; @Html.Raw(JsonConvert.SerializeObject(new { cacheLifespan = CurrentPage.cacheLifespan, bannerImageUrl = CurrentPage.bannerImage, nextReleaseDate = CurrentPage.nextReleaseDate })) } else { <h1>App configuration</h1> <p>Cache lifespan: <b>@CurrentPage.cacheLifespan</b></p> <p>Banner Image: </p> <img src="@CurrentPage.bannerImage"> <p>Next Release Date: <b>@CurrentPage.nextReleaseDate</b></p> } That’s a rough-and ready example of what you can do. You could make it completely generic and just render all the document’s properties as JSON, but having a specific template for each resource gives you control over what gets sent out. And the templates are evaluated at run-time, so if you need to change the output – or extend it, say to add caching response headers – you just edit the template and save, and the next client request gets rendered from the new template. No code to build and ship. 3. Create the content With your document type created, in  the Content pane you can create a new instance of that document, where Umbraco gives you a nice UI to input values for the properties we set up on the Document Type: Here I’ve set the cache lifespan to an xs:duration value, uploaded an image for the banner and specified a release date. Each property gets the appropriate input control – text box, file upload and date picker. At the top of the page is the name of the resource – myapp in this example. That specifies the URL for the resource, so if I had a DNS entry pointing to my Umbraco instance, I could access the config with a URL like http://static.x.y.z.com/config/myapp. The setup is all done now, so when we publish this resource it’ll be available to access.  4. Access the resource Now if you open  that URL in the browser, you’ll see the HTML version rendered: - complete with the  image and formatted date. Umbraco lets you save changes and preview them before publishing, so the HTML view could be a good way of showing editors their changes in a usable view, before they confirm them. If you browse the same URL from a REST client, specifying the Accept=application/json request header, you get this response:   That’s the exact same resource, with a managed UI to publish it, being accessed as HTML or JSON with a tiny amount of effort. 5. The wider landscape If you have fairy stable content to expose as an API, I think  this approach is really worth considering. Umbraco scales very nicely, but in a typical solution you probably wouldn’t need it to. When you have additional requirements, like logging API access requests - but doing it out-of-band so clients aren’t impacted, you can put a very thin API layer on top of Umbraco, and cache the CMS responses in your API layer:   Here the API does a passthrough to CMS, so the CMS still controls the content, but it caches the response. If the response is cached for 1 minute, then Umbraco only needs to handle 1 request per minute (multiplied by the number of API instances), so if you need to support 1000s of request per second, you’re scaling a thin, simple API layer rather than having to scale the more complex CMS infrastructure (including the database). This diagram also shows an approach to logging, by asynchronously publishing a message to a queue (Redis in this case), which can be picked up later and persisted by a different process. Does it work? Beautifully. Using Azure, I spiked the solution above (including the Redis logging framework which I’ll blog about later) in half a day. That included setting up different roles in Umbraco to demonstrate a managed workflow for publishing changes, and a couple of document types representing different resources. Is it maintainable? We have three moving parts, which are all managed resources in Azure –  an Azure Website for Umbraco which may need a couple of instances for HA (or may not, depending on how long the content can be cached), a message queue (Redis is in preview in Azure, but you can easily use Service Bus Queues if performance is less of a concern), and the Web Role for the API. Two of the components are off-the-shelf, from open source projects, and the only custom code is the API which is very simple. Does it scale? Pretty nicely. With a single Umbraco instance running as an Azure Website, and with 4x instances for my API layer (Standard sized Web Roles), I got just under 4,000 requests per second served reliably, with a Worker Role in the background saving the access logs. So we had a nice UI to publish app config changes, with a friendly Web preview and a publishing workflow, capable of supporting 14 million requests in an hour, with less than a day’s effort. Worth considering if you’re publishing long-lived resources through your API.

    Read the article

  • Making a Case For The Command Line

    - by Jesse Taber
    Originally posted on: http://geekswithblogs.net/GruffCode/archive/2013/06/30/making-a-case-for-the-command-line.aspxI have had an idea percolating in the back of my mind for over a year now that I’ve just recently started to implement. This idea relates to building out “internal tools” to ease the maintenance and on-going support of a software system. The system that I currently work on is (mostly) web-based, so we traditionally we have built these internal tools in the form of pages within the app that are only accessible by our developers and support personnel. These pages allow us to perform tasks within the system that, for one reason or another, we don’t want to let our end users perform (e.g. mass create/update/delete operations on data, flipping switches that turn paid modules of the system on or off, etc). When we try to build new tools like this we often struggle with the level of effort required to build them. Effort Required Creating a whole new page in an existing web application can be a fairly large undertaking. You need to create the page and ensure it will have a layout that is consistent with the other pages in the app. You need to decide what types of input controls need to go onto the page. You need to ensure that everything uses the same style as the rest of the site. You need to figure out what the text on the page should say. Then, when you figure out that you forgot about an input that should really be present you might have to go back and re-work the entire thing. Oh, and in addition to all of that, you still have to, you know, write the code that actually performs the task. Everything other than the code that performs the task at hand is just overhead. We don’t need a fancy date picker control in a nicely styled page for the vast majority of our internal tools. We don’t even really need a page, for that matter. We just need a way to issue a command to the application and have it, in turn, execute the code that we’ve written to accomplish a given task. All we really need is a simple console application! Plumbing Problems A former co-worker of mine, John Sonmez, always advocated the Unix philosophy for building internal tools: start with something that runs at the command line, and then build a UI on top of that if you need to. John’s idea has a lot of merit, and we tried building out some internal tools as simple Console applications. Unfortunately, this was often easier said that done. Doing a “File –> New Project” to build out a tool for a mature system can be pretty daunting because that new project is totally empty.  In our case, the web application code had a lot of of “plumbing” built in: it managed authentication and authorization, it handled database connection management for our multi-tenanted architecture, it managed all of the context that needs to follow a user around the application such as their timezone and regional/language settings. In addition, the configuration file for the web application  (a web.config in our case because this is an ASP .NET application) is large and would need to be reproduced into a similar configuration file for a Console application. While most of these problems are could be solved pretty easily with some refactoring of the codebase, building Console applications for internal tools still potentially suffers from one pretty big drawback: you’d have to execute them on a machine with network access to all of the needed resources. Obviously, our web servers can easily communicate the the database servers and can publish messages to our service bus, but the same is not true for all of our developer and support personnel workstations. We could have everyone run these tools remotely via RDP or SSH, but that’s a bit cumbersome and certainly a lot less convenient than having the tools built into the web application that is so easily accessible. Mix and Match So we need a way to build tools that are easily accessible via the web application but also don’t require the overhead of creating a user interface. This is where my idea comes into play: why not just build a command line interface into the web application? If it’s part of the web application we get all of the plumbing that comes along with that code, and we’re executing everything on the web servers which means we’ll have access to any external resources that we might need. Rather than having to incur the overhead of creating a brand new page for each tool that we want to build, we can create one new page that simply accepts a command in text form and executes it as a request on the web server. In this way, we can focus on writing the code to accomplish the task. If the tool ends up being heavily used, then (and only then) should we consider spending the time to build a better user experience around it. To be clear, I’m not trying to downplay the importance of building great user experiences into your system; we should all strive to provide the best UX possible to our end users. I’m only advocating this sort of bare-bones interface for internal consumption by the technical staff that builds and supports the software. This command line interface should be the “back end” to a highly polished and eye-pleasing public face. Implementation As I mentioned at the beginning of this post, this is an idea that I’ve had for awhile but have only recently started building out. I’ve outlined some general guidelines and design goals for this effort as follows: Text in, text out: In the interest of keeping things as simple as possible, I want this interface to be purely text-based. Users will submit commands as plain text, and the application will provide responses in plain text. Obviously this text will be “wrapped” within the context of HTTP requests and responses, but I don’t want to have to think about HTML or CSS when taking input from the user or displaying responses back to the user. Task-oriented code only: After building the initial “harness” for this interface, the only code that should need to be written to create a new internal tool should be code that is expressly needed to accomplish the task that the tool is intended to support. If we want to encourage and enable ourselves to build good tooling, we need to lower the barriers to entry as much as possible. Built-in documentation: One of the great things about most command line utilities is the ‘help’ switch that provides usage guidelines and details about the arguments that the utility accepts. Our web-based command line utility should allow us to build the documentation for these tools directly into the code of the tools themselves. I finally started trying to implement this idea when I heard about a fantastic open-source library called CLAP (Command Line Auto Parser) that lets me meet the guidelines outlined above. CLAP lets you define classes with public methods that can be easily invoked from the command line. Here’s a quick example of the code that would be needed to create a new tool to do something within your system: 1: public class CustomerTools 2: { 3: [Verb] 4: public void UpdateName(int customerId, string firstName, string lastName) 5: { 6: //invoke internal services/domain objects/hwatever to perform update 7: } 8: } This is just a regular class with a single public method (though you could have as many methods as you want). The method is decorated with the ‘Verb’ attribute that tells the CLAP library that it is a method that can be invoked from the command line. Here is how you would invoke that code: Parser.Run(args, new CustomerTools()); Note that ‘args’ is just a string[] that would normally be passed passed in from the static Main method of a Console application. Also, CLAP allows you to pass in multiple classes that define [Verb] methods so you can opt to organize the code that CLAP will invoke in any way that you like. You can invoke this code from a command line application like this: SomeExe UpdateName -customerId:123 -firstName:Jesse -lastName:Taber ‘SomeExe’ in this example just represents the name of .exe that is would be created from our Console application. CLAP then interprets the arguments passed in order to find the method that should be invoked and automatically parses out the parameters that need to be passed in. After a quick spike, I’ve found that invoking the ‘Parser’ class can be done from within the context of a web application just as easily as it can from within the ‘Main’ method entry point of a Console application. There are, however, a few sticking points that I’m working around: Splitting arguments into the ‘args’ array like the command line: When you invoke a standard .NET console application you get the arguments that were passed in by the user split into a handy array (this is the ‘args’ parameter referenced above). Generally speaking they get split by whitespace, but it’s also clever enough to handle things like ignoring whitespace in a phrase that is surrounded by quotes. We’ll need to re-create this logic within our web application so that we can give the ‘args’ value to CLAP just like a console application would. Providing a response to the user: If you were writing a console application, you might just use Console.WriteLine to provide responses to the user as to the progress and eventual outcome of the command. We can’t use Console.WriteLine within a web application, so I’ll need to find another way to provide feedback to the user. Preferably this approach would allow me to use the same handler classes from both a Console application and a web application, so some kind of strategy pattern will likely emerge from this effort. Submitting files: Often an internal tool needs to support doing some kind of operation in bulk, and the easiest way to submit the data needed to support the bulk operation is in a file. Getting the file uploaded and available to the CLAP handler classes will take a little bit of effort. Mimicking the console experience: This isn’t really a requirement so much as a “nice to have”. To start out, the command-line interface in the web application will probably be a single ‘textarea’ control with a button to submit the contents to a handler that will pass it along to CLAP to be parsed and run. I think it would be interesting to use some javascript and CSS trickery to change that page into something with more of a “shell” interface look and feel. I’ll be blogging more about this effort in the future and will include some code snippets (or maybe even a full blown example app) as I progress. I also think that I’ll probably end up either submitting some pull requests to the CLAP project or possibly forking/wrapping it into a more web-friendly package and open sourcing that.

    Read the article

  • urgent help needed to convert arabic html to pdf

    - by Mariam
    <div> <table border="1" width="500px"> <tr> <td colspan="2"> aspdotnetcodebook ????? ???????</td> </tr> <tr> <td> cell1 </td> <td> cell2 </td> </tr> <tr> <td colspan="2"> <asp:Label ID="lblLabel" runat="server" Text=""></asp:Label> <img alt="" src="logo.gif" style="width: 174px; height: 40px" /></td> </tr> <tr> <td colspan="2" dir="rtl"> <h1> <img alt="" height="168" src="http://a.cksource.com/c/1/inc/img/demo-little-red.jpg" style="margin-left: 10px; margin-right: 10px; float: left;" width="120" />????? ????? ??? ??? ?? ?? ??</h1> <p> &quot;<b>Little Red Riding Hood</b>&quot; is a famous <a href="http://en.wikipedia.org/wiki/Fairy_tale" title="Fairy tale">fairy tale</a> about a young girl&#39;s encounter with a wolf. The story has been changed considerably in its history and subject to numerous modern adaptations and readings.</p> <table align="right" border="1" cellpadding="1" cellspacing="1" style="width: 200px;"> <caption> <strong>International Names</strong></caption> <tr> <td> ????? ???????</td> <td> &nbsp;</td> </tr> <tr> <td> Italian</td> <td> <i>Cappuccetto Rosso</i></td> </tr> <tr> <td> Spanish</td> <td> <i>Caperucita Roja</i></td> </tr> </table> <p> The version most widely known today is based on the <a href="http://en.wikipedia.org/wiki/Brothers_Grimm" title="Brothers Grimm"> Brothers Grimm</a> variant. It is about a girl called Little Red Riding Hood, after the red <a href="http://en.wikipedia.org/wiki/Hood_(headgear%2529" title="Hood (headgear)">hooded</a> <a href="http://en.wikipedia.org/wiki/Cape" title="Cape">cape</a> or <a href="http://en.wikipedia.org/wiki/Cloak" title="Cloak">cloak</a> she wears. The girl walks through the woods to deliver food to her sick grandmother.</p> <p> A wolf wants to eat the girl but is afraid to do so in public. He approaches the girl, and she naïvely tells him where she is going. He suggests the girl pick some flowers, which she does. In the meantime, he goes to the grandmother&#39;s house and gains entry by pretending to be the girl. He swallows the grandmother whole, and waits for the girl, disguised as the grandmother.</p> <p> When the girl arrives, she notices he looks very strange to be her grandma. In most retellings, this eventually culminates with Little Red Riding Hood saying, &quot;My, what big teeth you have!&quot;<br /> To which the wolf replies, &quot;The better to eat you with,&quot; and swallows her whole, too.</p> <p> A <a href="http://en.wikipedia.org/wiki/Hunter" title="Hunter">hunter</a>, however, comes to the rescue and cuts the wolf open. Little Red Riding Hood and her grandmother emerge unharmed. They fill the wolf&#39;s body with heavy stones, which drown him when he falls into a well. Other versions of the story have had the grandmother shut in the closet instead of eaten, and some have Little Red Riding Hood saved by the hunter as the wolf advances on her rather than after she is eaten.</p> <p> The tale makes the clearest contrast between the safe world of the village and the dangers of the <a href="http://en.wikipedia.org/wiki/Enchanted_forest" title="Enchanted forest">forest</a>, conventional antitheses that are essentially medieval, though no written versions are as old as that.</p> </td> </tr> </table> </div> i use itextsharp to convert this content which is stored in DB to pdf file to be downloaded to the user i cant achieve this

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

< Previous Page | 1 2 3 4