chemistry - Page 3 - Developer IT

WPF: Problem with TreeView databinding

- by Am

Hi, I have a tree view defined as follows: <TreeView Grid.Row="0" Grid.Column="0" Margin="0" FlowDirection="LeftToRight" ItemTemplate="{StaticResource NavigationHeaderTemplate}" Name="TreeView2"> </TreeView> The data binding is: public class ViewTag : INotifyPropertyChanged { private string _tagName; public string TagName { get { return _tagName; } set { _tagName = value; PropertyChanged(this, new PropertyChangedEventArgs("Tag Name")); } } private ObservableCollection<ViewTag> _childTags; public ObservableCollection<ViewTag> ChildTags { get { return _childTags; } set { _childTags = value; OnPropertyChanged(new PropertyChangedEventArgs("Child Tags")); } } #region INotifyPropertyChanged Members public event PropertyChangedEventHandler PropertyChanged; public void OnPropertyChanged(PropertyChangedEventArgs e) { if (PropertyChanged != null) PropertyChanged(this, e); } #endregion public ViewTag(string tagName, ObservableCollection<ViewTag> childTags) { _tagName = tagName; _childTags = childTags; } } And my test binding is: List<ViewTag> tempTags = new List<ViewTag>(); ViewTag t1, t2, t3, t4, t5, t6; t1 = new ViewTag("Computers", null); t2 = new ViewTag("Chemistry", null); t3 = new ViewTag("Physics", null); var t123 = new ObservableCollection<ViewTag>(); t123.Add(t1); t123.Add(t2); t123.Add(t3); t4 = new ViewTag("Science", t123); var t1234 = new ObservableCollection<ViewTag>(); t1234.Add(t4); t5 = new ViewTag("All Items", t1234); t6 = new ViewTag("Untagged", null); var tall = new ObservableCollection<ViewTag>(); tall.Add(t5); tall.Add(t6); xy.Add(new ViewNavigationTree() { Header = "Tags", Image = "img/tags2.ico", Children = tall }); var rootFolders = eDataAccessLayer.RepositoryFacrory.Instance.MonitoredDirectoriesRepository.Directories.ToList(); var viewFolders = new ObservableCollection<ViewTag>(); foreach (var vf in rootFolders) { viewFolders.Add(new ViewTag(vf.FullPath, null)); } xy.Add(new ViewNavigationTree() { Header = "Folders", Image = "img/folder_16x16.png", Children = viewFolders }); xy.Add(new ViewNavigationTree() { Header = "Authors", Image = "img/user_16x16.png", Children = null }); xy.Add(new ViewNavigationTree() { Header = "Publishers", Image = "img/powerplant_32.png", Children = null }); TreeView2.ItemsSource = xy; Problem is, the tree only shows: + Tags All Items Untagged + Folders dir 1 dir 2 ... Authors Publishers The items I added under "All Items" aren't displayed. Being a WPF nub, i can't put my finger on the problem. Any help will be greatly appriciated.

Read the article

Webcast Q&A: Qualcomm Provides a Seamless Experience for Customers with Oracle WebCenter

- by kellsey.ruppel

Last Thursday we had the second webcast in our WebCenter in Action webcast series, "Qualcomm Provides a Seamless Experience for Customers with Oracle WebCenter, where customer Michael Chander from Qualcomm and Vince Casarez & Gourav Goyal from Oracle Partner Keste shared how Oracle WebCenter is powering Qualcomm’s externally facing website and providing a seamless experience for their customers. In case you missed it, here's a recap of the Q&A. Mike Chandler, Qualcomm Q: Did you run into any issues when integrating all of the different applications together?A: Definitely, our main challenges were in the area of user provisioning and security propagation, all the standard stuff you might expect when hooking up SSO for authentication and authorization. In addition, we spent several iterations getting the UI’s in sync. While everyone was given the same digital material to build too, each team interpreted and implemented it their own way. Initially as a user navigated, if you were looking for it, you could slight variations in color or font or width , stuff like that. So we had to pull all the developers responsible for the UI together and get pixel level agreement on a lot of things so we could ensure seamless transitions across applications. Q: What has been the biggest benefit your end users have seen?A: Wow, there have been several. An SSO enabled environment was huge a win for our users. The portal application that this replaced had not really been invested in by the business. With this project, we had full business participation and backing, and it really showed in some key areas like the shopping experience. For example, while ordering in the previous site, the items did not have any pictures or really usable descriptions. A tremendous amount of work was done to try and make the site more intuitive and user friendly. Site performance has also drastically improved thanks to new hardware, improved database design, and of course the fact that ADF has made great strides in runtime performance. Q: Was there any resistance internally when implementing the solution? If so, how did you overcome that?A: Within a large company, I’m sure there is always going to be competition for large projects, as there was here. Once we got through the technical analysis and settled on the technology choices, it was actually no resistance to implementing the solution. This project was fully driven by the business with the aim of long term growth. I can confidently say that the fact that this project was given the utmost importance by both the business and IT really help put down any resistance that you would typically see while implementing a new solution. Q: Given the performance, what do you estimate to be the top end capacity of the system? A:I think our top end capacity is really only limited by our hardware. I’m comfortable saying we could grow 10x on our current hardware, both in terms of transactions and users. We can easily spin up new JVM instances if needed. We already use less JVM’s than we had planned. In addition, ADF is doing a very good job with his connection pooling and application module pooling, so we see a very good ratio of users connected to the systems vs db connections, without impacting performace. Q: What's the overview or summary of feedback from the users interacting with the site?A: Feedback has been overwhelmingly positive from both the business and our customers. They’re very happy with the new SSO environment , the new LAF, and the performance of the site. Of course, it’s not all roses. No matter what, there are always going to be people that don’t like the layout or the color scheme, etc. By and large though, customers are happy and the business is happy. Q: Can you describe the impressions about the site before and after the project within Qualcomm?A: Before the project, the site worked and people were using it, but most people were not happy with it. It was slow and tended to be a bit tempermental, for example a user would perform a transaction and the system would throw and unexpected error. The user could back up and retry the steps and things would work fine, so why didn’t work the first time?. From a UI perspective, we’d hear comments like it looked like it was built by a high school student. Vince Casarez & Gourav Goyal, Keste Q: Did you run into any obstacles when implementing the solution?A: It's interesting some people call them "obstacles" on this project we just called them "dependencies". There were both technical and business related dependencies that we had to work out. Mike points out the SSO dependencies and the coordination and synchronization between the teams to have a seamless login experience and a seamless end user experience. There was also a set of dependencies on the User Acceptance testing to make sure that everyone understood the use cases for how the system would be used. With a branching into a new market and trying to match a simple user experience as many consumer sites have today, there was always a tendency for the team members to provide their suggestions on how things could be simpler. But with all the work up front on the user design and getting the business driving this set of experiences, this minimized the downstream suggestions that tend to distract a team. In this case, all the work up front allowed us to enumerate the "dependencies" and keep the distractions to a minimum. Q: Was there a lot of custom work that needed to be done for this particular solution?A: The focus for this particular solution was really on the custom processes. The interesting thing is that with the data flows and the integration with applications, there are some pre-built integrations, but realistically for the process flow, we had to build those. The framework and tooling we used made things easier so we didn’t have to implement core functionality, like transitioning from screen to screen or from flow to flow. The design feature of Task Flows really helped speed the development and keep the component infrastructure in line with the dynamic processes. Task flows and other elements like Skins are core to the infrastructure or technology stack of Oracle. This then allowed the team to center the project focus around the business flows and use cases to meet the core requirements and keep the project on time. Q: What do you think were the keys to success for rolling out WebCenter?A: The 5 main keys to success were: 1) Sponsorship from the whole organization around this project from senior executive agreement, business owners driving functionality, and IT development alignment; 2) Upfront design planning and use case definition to clearly define the project scope and requirements; 3) Focussed development and project management aligned with the top level goals and drivers; 4) User acceptance and usability testing along the way to identify potential issues and direct resolution of the issues; and 5) Constant prioritization of the issues for development to fix by the business. It also helps to have great team chemistry and really smart people working on the project. If you missed the webcast, be sure to catch the replay to see a live demonstration of WebCenter in action! Qualcomm Provides a Seamless Experience for Customers with Oracle WebCenter from Oracle WebCenter

Read the article

Get Asynchronous HttpResponse through Silverlight (F#)

- by jack2010

I am a newbie with F# and SL and playing with getting asynchronous HttpResponse through Silverlight. The following is the F# code pieces, which is tested on VS2010 and Window7 and works well, but the improvement is necessary. Any advices and discussion, especially the callback part, are welcome and great thanks. module JSONExample open System open System.IO open System.Net open System.Text open System.Web open System.Security.Authentication open System.Runtime.Serialization [<DataContract>] type Result<'TResult> = { [<field: DataMember(Name="code") >] Code:string [<field: DataMember(Name="result") >] Result:'TResult array [<field: DataMember(Name="message") >] Message:string } // The elements in the list [<DataContract>] type ChemicalElement = { [<field: DataMember(Name="name") >] Name:string [<field: DataMember(Name="boiling_point") >] BoilingPoint:string [<field: DataMember(Name="atomic_mass") >] AtomicMass:string } //http://blogs.msdn.com/b/dsyme/archive/2007/10/11/introducing-f-asynchronous-workflows.aspx //http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!194.entry type System.Net.HttpWebRequest with member x.GetResponseAsync() = Async.FromBeginEnd(x.BeginGetResponse, x.EndGetResponse) type RequestState () = let mutable request : WebRequest = null let mutable response : WebResponse = null let mutable responseStream : Stream = null member this.Request with get() = request and set v = request <- v member this.Response with get() = response and set v = response <- v member this.ResponseStream with get() = responseStream and set v = responseStream <- v let allDone = new System.Threading.ManualResetEvent(false) let getHttpWebRequest (query:string) = let query = query.Replace("'","\"") let queryUrl = sprintf "http://api.freebase.com/api/service/mqlread?query=%s" "{\"query\":"+query+"}" let request : HttpWebRequest = downcast WebRequest.Create(queryUrl) request.Method <- "GET" request.ContentType <- "application/x-www-form-urlencoded" request let GetAsynResp (request : HttpWebRequest) (callback: AsyncCallback) = let myRequestState = new RequestState() myRequestState.Request <- request let asyncResult = request.BeginGetResponse(callback, myRequestState) () // easy way to get it to run syncrnously w/ the asynch methods let GetSynResp (request : HttpWebRequest) : HttpWebResponse = let response = request.GetResponseAsync() |> Async.RunSynchronously downcast response let RespCallback (finish: Stream -> _) (asynchronousResult : IAsyncResult) = try let myRequestState : RequestState = downcast asynchronousResult.AsyncState let myWebRequest1 : WebRequest = myRequestState.Request myRequestState.Response <- myWebRequest1.EndGetResponse(asynchronousResult) let responseStream = myRequestState.Response.GetResponseStream() myRequestState.ResponseStream <- responseStream finish responseStream myRequestState.Response.Close() () with | :? WebException as e -> printfn "WebException raised!" printfn "\n%s" e.Message printfn "\n%s" (e.Status.ToString()) () | _ as e -> printfn "Exception raised!" printfn "Source : %s" e.Source printfn "Message : %s" e.Message () let printResults (stream: Stream)= let result = try use reader = new StreamReader(stream) reader.ReadToEnd(); finally () let data = Encoding.Unicode.GetBytes(result); let stream = new MemoryStream() stream.Write(data, 0, data.Length); stream.Position <- 0L let JsonSerializer = Json.DataContractJsonSerializer(typeof<Result<ChemicalElement>>) let result = JsonSerializer.ReadObject(stream) :?> Result<ChemicalElement> if result.Code<>"/api/status/ok" then raise (InvalidOperationException(result.Message)) else result.Result |> Array.iter(fun element->printfn "%A" element) let test = // Call Query (w/ generics telling it you wand an array of ChemicalElement back, the query string is wackyJSON too –I didn’t build it don’t ask me! let request = getHttpWebRequest "[{'type':'/chemistry/chemical_element','name':null,'boiling_point':null,'atomic_mass':null}]" //let response = GetSynResp request let response = GetAsynResp request (AsyncCallback (RespCallback printResults)) () ignore(test) System.Console.ReadLine() |> ignore

Read the article

add the same qtreewidgetitems into the second column

- by srinu

hello i am using the following program to display the qtreewidget. main.cpp include include "qdomsimple.h" include include "qdomsimple.h" int main(int argc, char *argv[]) { QApplication a(argc, argv); QStringList filelist; filelist.push_back("C:\department1.xml"); filelist.push_back("C:\department2.xml"); filelist.push_back("C:\department3.xml"); QDOMSimple w(filelist); w.resize(260,200); w.show(); return a.exec(); } qdomsimple.cpp include "qdomsimple.h" include include QDOMSimple::QDOMSimple(QStringList strlst,QWidget *parent) : QWidget(parent) { k=0; // DOM document QDomDocument doc("title"); QStringList headerlabels; headerlabels.push_back("Chemistry"); headerlabels.push_back("Mechanical"); headerlabels.push_back("IT"); m_tree = new QTreeWidget( this ); m_tree->setColumnCount(3); m_tree->setHeaderLabels(headerlabels); QStringList::iterator it; for(it=strlst.begin();it<strlst.end();it++) { QFile file(*it); if ( file.open( QIODevice::ReadOnly | QIODevice::Text )) { // The tree view to be filled with xml data // (m_tree is a class member variable) // Creating the DOM tree doc.setContent( &file ); file.close(); // Root of the document QDomElement root = doc.documentElement(); // Taking the first child node of the root QDomNode child = root.firstChild(); // Setting the root as the header of the tree //QTreeWidgetItem* header = new QTreeWidgetItem; //header->setText(k,root.nodeName()); //m_tree->setHeaderItem(header); // Parse until the end of document while (!child.isNull()) { //Convert a DOM node to DOM element QDomElement element = child.toElement(); //Parse only if the node is a really an element if (!element.isNull()) { //Parse the element recursively parseElement( element,0); //Go to the next sibling child = child.nextSiblingElement(); } } //m_tree->setGeometry( QApplication::desktop()->availableGeometry() ); //setGeometry( QApplication::desktop()->availableGeometry() ); } k++; } } void QDOMSimple::parseElement( QDomElement& aElement, QTreeWidgetItem *aParentItem ) { // A map of all attributes of an element QDomNamedNodeMap attrMap = aElement.attributes(); // List all attributes QStringList attrList; for ( int i = 0; i < attrMap.count(); i++ ) { // Attribute name //QString attr = attrMap.item( i ).nodeName(); //attr.append( "-" ); /* Attribute value QString attr; attr.append( attrMap.item( i ).nodeValue() );*/ //attrList.append( attr ); attrList.append(attrMap.item( i).nodeValue()); } QTreeWidgetItem* item; // Create a new view item for elements having child nodes if (aParentItem) { item = new QTreeWidgetItem(aParentItem); } // Create a new view item for elements without child nodes else { item = new QTreeWidgetItem( m_tree ); } //Set tag name and the text QString tagNText; tagNText.append( aElement.tagName() ); //tagNText.append( "------" ); //tagNText.append( aElement.text() ); item->setText(0, tagNText ); // Append attributes to the element node of the tree for ( int i = 0; i < attrList.count(); i++ ) { QTreeWidgetItem* attrItem = new QTreeWidgetItem( item ); attrItem->setText(0, attrList[i] ); } // Repeat the process recursively for child elements QDomElement child = aElement.firstChildElement(); while (!child.isNull()) { parseElement( child, item ); child = child.nextSiblingElement(); } } QDOMSimple::~QDOMSimple() { } for this i got the qtreewidget like this +file1 +file2 +file3 but actual wanted output is +file1 +file2 +file3 i don't know how to do it.Thanks in advance

Read the article

CodePlex Daily Summary for Sunday, October 06, 2013

CodePlex Daily Summary for Sunday, October 06, 2013Popular ReleasesMedia Companion: Media Companion MC3.580b: Fixed IMDB Actor names and Actor Roles, empty <actor> entries in movie nfo, and actor scraping during initial movie scrape. Revision HistoryEvent-Based Components AppBuilder: AB3.Iteration.53: Iteration 53 (Feature): Allow drag&drop of existing component (flow, step) from component list to chart. Duplicate names are automatically recognized and solved. By the color of the draged component you can see what kind of component (flow or step) is currently draged. New: AddExistingComponentFlow, PartDragDropEventHandler, ExistingStepPreparerPulse: Pulse 0.6.7.3: Pulse is now accepting donations. To donate by Bitcoin or PayPal see https://pulse.codeplex.com/wikipage?title=Donations Lots of updates in v0.6.7.3: (Feature) New option allows you to disable wallpaper changing when a full screen application is running. This way Pulse doesn't slow down/lag your videos and games :) (Fix) Some users were getting Wallbase errors when logging in. This has been fixed. (Feature) Right click a provider and you can now make a copy of it by selecting the "Dupl...MoreTerra (Terraria World Viewer): MoreTerra 1.11.1: Release 1.11.1 =========== =Bug Fixes= =========== Added more tile blocks (Clouds, crimstone) Added items (binoculars, rope, Pirahna Gun) Added ores (Lead, Tin) Chests now work, I broke them yesterday. =============== =Known Issues= =============== I am having trouble with new background walls. So you will see a red outline for crimson then a pink inside. Same with where I think the queen bee lives.VG-Ripper & PG-Ripper: PG-Ripper 1.4.19: NEW: Added Option to login as Guest NEW: Added Menu Option to delete an Forum Account NEW: Added Support for "ImageTeam.org links FIXED: Fixed Ripping of http://forum.babeunion.com ForumsSimpleExcelReportMaker: Serm 0.03: SourceCode and Sample .Net Framework 3.5 AnyCPU compile.Application Architecture Guidelines: App Architecture Guidelines 3.0.8: This document is an overview of software qualities, principles, patterns, practices, tools and libraries.fastJSON: v2.0.22: 2.0.22 - added .net 3.5 project - now compiling to 'output' directory - added signed assembly - version numbers will stay at 2.0.0.0 for drop in compatibility - file version will reflect the build number - bug fix deserializing to dictionaries instead of dataset when type is not definedResponsive SharePoint: Bootstrap 3 for SharePoint 2013 - Alpha 0.1: Bootstrap 3 for SharePoint 2013 Alpha version 0.1 NOTE - This is an alpha version, there are bound to be issues. Please help us solve them by contributing in our Discussion. Publishing - The source for Twitter Bootstrap 3.0.0 integrated into SharePoint 2013 for a site with Publishing enabled. Non-Publishing - A master page and branding assets for Twitter Bootstrap 3.0.0 integrated into SharePoint 2013 without Publishing enabled. PageLayoutSampleContent - Sample content for included page l...C++ AMP Conformance Test Suite: C++ AMP Conformance Test Suite 1.0.0: This release contains following changes from previous release: Removed the tests that were testing Microsoft specific behavior not part of open specification. The test suite now contains two folders, containing set of test cases, named ‘Tests’ and ‘TestsWithProp'. The set of tests under these two folders are identical except one difference. The set of test cases under directory ‘TestsWithProp’ makes use of ‘properties’ (which the compiler being tested should handle as mentioned in the open ...ASP.NET dhtmxGantt Class: dhtmlxGantt2.vb class: This is the latest class based on work performed. For more information read the project description and get the source files from dhtmlx.comExpressiveDataGenerators: Alpha 2: Fix serveral bugs, more testsQuickTestsFramework: 1.0.0: First release with stable API.VS Tiny Extension for TortoiseGit: 0.1c: + Icons revised + Push button disappeared when IDE loads the menu instead of toolbar. + Detected twice loading and prevented. + About box deprecated. + Next version will have major improvements. NEW: Visual Studio 2013 Support!BlackJumboDog: Ver5.9.6: 2013.09.30 Ver5.9.6 (1)SMTP???????、???????????????? (2)WinAPI??????? (3)Web???????CGI???????????????????????PayBox payment gateway provider for NB_Store: NB_Store_Gateway_01.00.02_PayBox: Paybox DNN module installMicrosoft Ajax Minifier: Microsoft Ajax Minifier 5.2: Mostly internal code tweaks. added -nosize switch to turn off the size- and gzip-calculations done after minification. removed the comments in the build targets script for the old AjaxMin build task (discussion #458831). Fixed an issue with extended Unicode characters encoded inside a string literal with adjacent \uHHHH\uHHHH sequences. Fixed an IndexOutOfRange exception when encountering a CSS identifier that's a single underscore character (_). In previous builds, the net35 and net20...AJAX Control Toolkit: September 2013 Release: AJAX Control Toolkit Release Notes - September 2013 Release (Updated) Version 7.1005September 2013 release of the AJAX Control Toolkit. AJAX Control Toolkit .NET 4.5 – AJAX Control Toolkit for .NET 4.5 and sample site (Recommended). AJAX Control Toolkit .NET 4 – AJAX Control Toolkit for .NET 4 and sample site (Recommended). AJAX Control Toolkit .NET 3.5 – AJAX Control Toolkit for .NET 3.5 and sample site (Recommended). Important UpdateThis release has been updated to fix three issues: Up...WDTVHubGen - Adds Metadata, thumbnails and subtitles to WDTV Live Hubs: WDTVHubGen.v2.1.4.apifix-alpha: WDTVHubGen.v2.1.4.apifix-alpha is for testers to figure out if we got the NEW api plugged in ok. thanksVisual Log Parser: VisualLogParser: Portable Visual Log Parser for Dotnet 4.0New ProjectsBasic4Android (B4A) Charting Framework: dhtlmxCharts, GoogleCharts etc: Basic4Android (B4A) mobile charting framework.Client Meeting Tool: This site facilitate users to create and schedule meetings for an event.FoodScan: This app focuses on implementing diet monitoring application for Malaysian overweight and obese adolescents using AR technique on Windows Phone 8.Hello Team foundation server: Try to use team foundation server and compare it with GitKDG C# Password Generator: C# password generator developed by KDG.KDG's C# Password Generator: C# password generator that uses Random to create strong passwords based on user input.Meta: Meta is the EECS 111 programming language at Northwestern University. Meta is a dynamically-typed scheme-like language built on .NET. This is its home.Monoscript: Allows using Mono and C# for scripting on Unix. Source files are automatically compiled and executed. Caching is employed to avoid recompiling unchanged files.mtdsharp: Developed by Chris Hyndman, Alec KC, Lu Huang and Merrill Huang for CS 196 at the University of Illinois.Planr.me: Planr is a time management website currently in the early development stageProject Hermes: This very project is currently closed door and under core development. The project description and other works would be published soon.Remindme for Windows Phone 8: Simple, open source Pocket client for Windows Phone 8Remindme for WinRT: Simple, open source Pocket client for WinRT and Windows 8.SQL Server Periodic Table with Molecules: This a SQL Server Database intended to be used by students and researchers for Chemistry and Physics projects. Tesseract: The Tesseract Project aims to easily display and rotate 4 Dimensional Objects in 3D Test Case Manager: A Windows Application which extends Microsoft Test Manager. Features: * one click search * test case export * better test case reader * extended edit modeTest Project for Assignment 1: This is a test ProjectTorah File: Torah File is an project that allow you to use Torah Bible and Mishneh for the computer by type of the programming languages that will be able to use the ToUSAePay nopCommerce Payment Plugin: A simple plugin for nopCommerce to use the USAePay SOAP API interface for processing credit cards.Veterinaria Dr Leo: Este es nuestro Proyecto del curso Calidad y Pruebas de Software 2013-2 Arevalo Ticlla, Susan. Chalán Malca, Elvis. Cruzado Asencio, Gustavo.Visual Studio Test Extensions: The Visual Studio Test Extensions provides extensions and tools for the Visual Studio MSTest engine. It allows to execute unit tests in a separate AppDomain.WSAAD7COM1052: Central repository for 7COM1052 - Web Scripting & Application Development (COM)wscc2013online: this is a project related to Web Application Development at the University of Hertfordshire

Read the article

Red Gate Coder interviews: Alex Davies

- by Michael Williamson

Alex Davies has been a software engineer at Red Gate since graduating from university, and is currently busy working on .NET Demon. We talked about tackling parallel programming with his actors framework, a scientific approach to debugging, and how JavaScript is going to affect the programming languages we use in years to come. So, if we start at the start, how did you get started in programming? When I was seven or eight, I was given a BBC Micro for Christmas. I had asked for a Game Boy, but my dad thought it would be better to give me a proper computer. For a year or so, I only played games on it, but then I found the user guide for writing programs in it. I gradually started doing more stuff on it and found it fun. I liked creating. As I went into senior school I continued to write stuff on there, trying to write games that weren’t very good. I got a real computer when I was fourteen and found ways to write BASIC on it. Visual Basic to start with, and then something more interesting than that. How did you learn to program? Was there someone helping you out? Absolutely not! I learnt out of a book, or by experimenting. I remember the first time I found a loop, I was like “Oh my God! I don’t have to write out the same line over and over and over again any more. It’s amazing!” When did you think this might be something that you actually wanted to do as a career? For a long time, I thought it wasn’t something that you would do as a career, because it was too much fun to be a career. I thought I’d do chemistry at university and some kind of career based on chemical engineering. And then I went to a careers fair at school when I was seventeen or eighteen, and it just didn’t interest me whatsoever. I thought “I could be a programmer, and there’s loads of money there, and I’m good at it, and it’s fun”, but also that I shouldn’t spoil my hobby. Now I don’t really program in my spare time any more, which is a bit of a shame, but I program all the rest of the time, so I can live with it. Do you think you learnt much about programming at university? Yes, definitely! I went into university knowing how to make computers do anything I wanted them to do. However, I didn’t have the language to talk about algorithms, so the algorithms course in my first year was massively important. Learning other language paradigms like functional programming was really good for breadth of understanding. Functional programming influences normal programming through design rather than actually using it all the time. I draw inspiration from it to write imperative programs which I think is actually becoming really fashionable now, but I’ve been doing it for ages. I did it first! There were also some courses on really odd programming languages, a bit of Prolog, a little bit of C. Having a little bit of each of those is something that I would have never done on my own, so it was important. And then there are knowledge-based courses which are about not programming itself but things that have been programmed like TCP. Those are really important for examples for how to approach things. Did you do any internships while you were at university? Yeah, I spent both of my summers at the same company. I thought I could code well before I went there. Looking back at the crap that I produced, it was only surpassed in its crappiness by all of the other code already in that company. I’m so much better at writing nice code now than I used to be back then. Was there just not a culture of looking after your code? There was, they just didn’t hire people for their abilities in that area. They hired people for raw IQ. The first indicator of it going wrong was that they didn’t have any computer scientists, which is a bit odd in a programming company. But even beyond that they didn’t have people who learnt architecture from anyone else. Most of them had started straight out of university, so never really had experience or mentors to learn from. There wasn’t the experience to draw from to teach each other. In the second half of my second internship, I was being given tasks like looking at new technologies and teaching people stuff. Interns shouldn’t be teaching people how to do their jobs! All interns are going to have little nuggets of things that you don’t know about, but they shouldn’t consistently be the ones who know the most. It’s not a good environment to learn. I was going to ask how you found working with people who were more experienced than you… When I reached Red Gate, I found some people who were more experienced programmers than me, and that was difficult. I’ve been coding since I was tiny. At university there were people who were cleverer than me, but there weren’t very many who were more experienced programmers than me. During my internship, I didn’t find anyone who I classed as being a noticeably more experienced programmer than me. So, it was a shock to the system to have valid criticisms rather than just formatting criticisms. However, Red Gate’s not so big on the actual code review, at least it wasn’t when I started. We did an entire product release and then somebody looked over all of the UI of that product which I’d written and say what they didn’t like. By that point, it was way too late and I’d disagree with them. Do you think the lack of code reviews was a bad thing? I think if there’s going to be any oversight of new people, then it should be continuous rather than chunky. For me I don’t mind too much, I could go out and get oversight if I wanted it, and in those situations I felt comfortable without it. If I was managing the new person, then maybe I’d be keener on oversight and then the right way to do it is continuously and in very, very small chunks. Have you had any significant projects you’ve worked on outside of a job? When I was a teenager I wrote all sorts of stuff. I used to write games, I derived how to do isomorphic projections myself once. I didn’t know what the word was so I couldn’t Google for it, so I worked it out myself. It was horrifically complicated. But it sort of tailed off when I started at university, and is now basically zero. If I do side-projects now, they tend to be work-related side projects like my actors framework, NAct, which I started in a down tools week. Could you explain a little more about NAct? It is a little C# framework for writing parallel code more easily. Parallel programming is difficult when you need to write to shared data. Sometimes parallel programming is easy because you don’t need to write to shared data. When you do need to access shared data, you could just have your threads pile in and do their work, but then you would screw up the data because the threads would trample on each other’s toes. You could lock, but locks are really dangerous if you’re using more than one of them. You get interactions like deadlocks, and that’s just nasty. Actors instead allows you to say this piece of data belongs to this thread of execution, and nobody else can read it. If you want to read it, then ask that thread of execution for a piece of it by sending a message, and it will send the data back by a message. And that avoids deadlocks as long as you follow some obvious rules about not making your actors sit around waiting for other actors to do something. There are lots of ways to write actors, NAct allows you to do it as if it was method calls on other objects, which means you get all the strong type-safety that C# programmers like. Do you think that this is suitable for the majority of parallel programming, or do you think it’s only suitable for specific cases? It’s suitable for most difficult parallel programming. If you’ve just got a hundred web requests which are all independent of each other, then I wouldn’t bother because it’s easier to just spin them up in separate threads and they can proceed independently of each other. But where you’ve got difficult parallel programming, where you’ve got multiple threads accessing multiple bits of data in multiple ways at different times, then actors is at least as good as all other ways, and is, I reckon, easier to think about. When you’re using actors, you presumably still have to write your code in a different way from you would otherwise using single-threaded code. You can’t use actors with any methods that have return types, because you’re not allowed to call into another actor and wait for it. If you want to get a piece of data out of another actor, then you’ve got to use tasks so that you can use “async” and “await” to await asynchronously for it. But other than that, you can still stick things in classes so it’s not too different really. Rather than having thousands of objects with mutable state, you can use component-orientated design, where there are only a few mutable classes which each have a small number of instances. Then there can be thousands of immutable objects. If you tend to do that anyway, then actors isn’t much of a jump. If I’ve already built my system without any parallelism, how hard is it to add actors to exploit all eight cores on my desktop? Usually pretty easy. If you can identify even one boundary where things look like messages and you have components where some objects live on one side and these other objects live on the other side, then you can have a granddaddy object on one side be an actor and it will parallelise as it goes across that boundary. Not too difficult. If we do get 1000-core desktop PCs, do you think actors will scale up? It’s hard. There are always in the order of twenty to fifty actors in my whole program because I tend to write each component as actors, and I tend to have one instance of each component. So this won’t scale to a thousand cores. What you can do is write data structures out of actors. I use dictionaries all over the place, and if you need a dictionary that is going to be accessed concurrently, then you could build one of those out of actors in no time. You can use queuing to marshal requests between different slices of the dictionary which are living on different threads. So it’s like a distributed hash table but all of the chunks of it are on the same machine. That means that each of these thousand processors has cached one small piece of the dictionary. I reckon it wouldn’t be too big a leap to start doing proper parallelism. Do you think it helps if actors get baked into the language, similarly to Erlang? Erlang is excellent in that it has thread-local garbage collection. C# doesn’t, so there’s a limit to how well C# actors can possibly scale because there’s a single garbage collected heap shared between all of them. When you do a global garbage collection, you’ve got to stop all of the actors, which is seriously expensive, whereas in Erlang garbage collections happen per-actor, so they’re insanely cheap. However, Erlang deviated from all the sensible language design that people have used recently and has just come up with crazy stuff. You can definitely retrofit thread-local garbage collection to .NET, and then it’s quite well-suited to support actors, even if it’s not baked into the language. Speaking of language design, do you have a favourite programming language? I’ll choose a language which I’ve never written before. I like the idea of Scala. It sounds like C#, only with some of the niggles gone. I enjoy writing static types. It means you don’t have to writing tests so much. When you say it doesn’t have some of the niggles? C# doesn’t allow the use of a property as a method group. It doesn’t have Scala case classes, or sum types, where you can do a switch statement and the compiler checks that you’ve checked all the cases, which is really useful in functional-style programming. Pattern-matching, in other words. That’s actually the major niggle. C# is pretty good, and I’m quite happy with C#. And what about going even further with the type system to remove the need for tests to something like Haskell? Or is that a step too far? I’m quite a pragmatist, I don’t think I could deal with trying to write big systems in languages with too few other users, especially when learning how to structure things. I just don’t know anyone who can teach me, and the Internet won’t teach me. That’s the main reason I wouldn’t use it. If I turned up at a company that writes big systems in Haskell, I would have no objection to that, but I wouldn’t instigate it. What about things in C#? For instance, there’s contracts in C#, so you can try to statically verify a bit more about your code. Do you think that’s useful, or just not worthwhile? I’ve not really tried it. My hunch is that it needs to be built into the language and be quite mathematical for it to work in real life, and that doesn’t seem to have ended up true for C# contracts. I don’t think anyone who’s tried them thinks they’re any good. I might be wrong. On a slightly different note, how do you like to debug code? I think I’m quite an odd debugger. I use guesswork extremely rarely, especially if something seems quite difficult to debug. I’ve been bitten spending hours and hours on guesswork and not being scientific about debugging in the past, so now I’m scientific to a fault. What I want is to see the bug happening in the debugger, to step through the bug happening. To watch the program going from a valid state to an invalid state. When there’s a bug and I can’t work out why it’s happening, I try to find some piece of evidence which places the bug in one section of the code. From that experiment, I binary chop on the possible causes of the bug. I suppose that means binary chopping on places in the code, or binary chopping on a stage through a processing cycle. Basically, I’m very stupid about how I debug. I won’t make any guesses, I won’t use any intuition, I will only identify the experiment that’s going to binary chop most effectively and repeat rather than trying to guess anything. I suppose it’s quite top-down. Is most of the time then spent in the debugger? Absolutely, if at all possible I will never debug using print statements or logs. I don’t really hold much stock in outputting logs. If there’s any bug which can be reproduced locally, I’d rather do it in the debugger than outputting logs. And with SmartAssembly error reporting, there’s not a lot that can’t be either observed in an error report and just fixed, or reproduced locally. And in those other situations, maybe I’ll use logs. But I hate using logs. You stare at the log, trying to guess what’s going on, and that’s exactly what I don’t like doing. You have to just look at it and see does this look right or wrong. We’ve covered how you get to grip with bugs. How do you get to grips with an entire codebase? I watch it in the debugger. I find little bugs and then try to fix them, and mostly do it by watching them in the debugger and gradually getting an understanding of how the code works using my process of binary chopping. I have to do a lot of reading and watching code to choose where my slicing-in-half experiment is going to be. The last time I did it was SmartAssembly. The old code was a complete mess, but at least it did things top to bottom. There wasn’t too much of some of the big abstractions where flow of control goes all over the place, into a base class and back again. Code’s really hard to understand when that happens. So I like to choose a little bug and try to fix it, and choose a bigger bug and try to fix it. Definitely learn by doing. I want to always have an aim so that I get a little achievement after every few hours of debugging. Once I’ve learnt the codebase I might be able to fix all the bugs in an hour, but I’d rather be using them as an aim while I’m learning the codebase. If I was a maintainer of a codebase, what should I do to make it as easy as possible for you to understand? Keep distinct concepts in different places. And name your stuff so that it’s obvious which concepts live there. You shouldn’t have some variable that gets set miles up the top of somewhere, and then is read miles down to choose some later behaviour. I’m talking from a very much SmartAssembly point of view because the old SmartAssembly codebase had tons and tons of these things, where it would read some property of the code and then deal with it later. Just thousands of variables in scope. Loads of things to think about. If you can keep concepts separate, then it aids me in my process of fixing bugs one at a time, because each bug is going to more or less be understandable in the one place where it is. And what about tests? Do you think they help at all? I’ve never had the opportunity to learn a codebase which has had tests, I don’t know what it’s like! What about when you’re actually developing? How useful do you find tests in finding bugs or regressions? Finding regressions, absolutely. Running bits of code that would be quite hard to run otherwise, definitely. It doesn’t happen very often that a test finds a bug in the first place. I don’t really buy nebulous promises like tests being a good way to think about the spec of the code. My thinking goes something like “This code works at the moment, great, ship it! Ah, there’s a way that this code doesn’t work. Okay, write a test, demonstrate that it doesn’t work, fix it, use the test to demonstrate that it’s now fixed, and keep the test for future regressions.” The most valuable tests are for bugs that have actually happened at some point, because bugs that have actually happened at some point, despite the fact that you think you’ve fixed them, are way more likely to appear again than new bugs are. Does that mean that when you write your code the first time, there are no tests? Often. The chance of there being a bug in a new feature is relatively unaffected by whether I’ve written a test for that new feature because I’m not good enough at writing tests to think of bugs that I would have written into the code. So not writing regression tests for all of your code hasn’t affected you too badly? There are different kinds of features. Some of them just always work, and are just not flaky, they just continue working whatever you throw at them. Maybe because the type-checker is particularly effective around them. Writing tests for those features which just tend to always work is a waste of time. And because it’s a waste of time I’ll tend to wait until a feature has demonstrated its flakiness by having bugs in it before I start trying to test it. You can get a feel for whether it’s going to be flaky code as you’re writing it. I try to write it to make it not flaky, but there are some things that are just inherently flaky. And very occasionally, I’ll think “this is going to be flaky” as I’m writing, and then maybe do a test, but not most of the time. How do you think your programming style has changed over time? I’ve got clearer about what the right way of doing things is. I used to flip-flop a lot between different ideas. Five years ago I came up with some really good ideas and some really terrible ideas. All of them seemed great when I thought of them, but they were quite diverse ideas, whereas now I have a smaller set of reliable ideas that are actually good for structuring code. So my code is probably more similar to itself than it used to be back in the day, when I was trying stuff out. I’ve got more disciplined about encapsulation, I think. There are operational things like I use actors more now than I used to, and that forces me to use immutability more than I used to. The first code that I wrote in Red Gate was the memory profiler UI, and that was an actor, I just didn’t know the name of it at the time. I don’t really use object-orientation. By object-orientation, I mean having n objects of the same type which are mutable. I want a constant number of objects that are mutable, and they should be different types. I stick stuff in dictionaries and then have one thing that owns the dictionary and puts stuff in and out of it. That’s definitely a pattern that I’ve seen recently. I think maybe I’m doing functional programming. Possibly. It’s plausible. If you had to summarise the essence of programming in a pithy sentence, how would you do it? Programming is the form of art that, without losing any of the beauty of architecture or fine art, allows you to produce things that people love and you make money from. So you think it’s an art rather than a science? It’s a little bit of engineering, a smidgeon of maths, but it’s not science. Like architecture, programming is on that boundary between art and engineering. If you want to do it really nicely, it’s mostly art. You can get away with doing architecture and programming entirely by having a good engineering mind, but you’re not going to produce anything nice. You’re not going to have joy doing it if you’re an engineering mind. Architects who are just engineering minds are not going to enjoy their job. I suppose engineering is the foundation on which you build the art. Exactly. How do you think programming is going to change over the next ten years? There will be an unfortunate shift towards dynamically-typed languages, because of JavaScript. JavaScript has an unfair advantage. JavaScript’s unfair advantage will cause more people to be exposed to dynamically-typed languages, which means other dynamically-typed languages crop up and the best features go into dynamically-typed languages. Then people conflate the good features with the fact that it’s dynamically-typed, and more investment goes into dynamically-typed languages. They end up better, so people use them. What about the idea of compiling other languages, possibly statically-typed, to JavaScript? It’s a reasonable idea. I would like to do it, but I don’t think enough people in the world are going to do it to make it pick up. The hordes of beginners are the lifeblood of a language community. They are what makes there be good tools and what makes there be vibrant community websites. And any particular thing which is the same as JavaScript only with extra stuff added to it, although it might be technically great, is not going to have the hordes of beginners. JavaScript is always to be quickest and easiest way for a beginner to start programming in the browser. And dynamically-typed languages are great for beginners. Compilers are pretty scary and beginners don’t write big code. And having your errors come up in the same place, whether they’re statically checkable errors or not, is quite nice for a beginner. If someone asked me to teach them some programming, I’d teach them JavaScript. If dynamically-typed languages are great for beginners, when do you think the benefits of static typing start to kick in? The value of having a statically typed program is in the tools that rely on the static types to produce a smooth IDE experience rather than actually telling me my compile errors. And only once you’re experienced enough a programmer that having a really smooth IDE experience makes a blind bit of difference, does static typing make a blind bit of difference. So it’s not really about size of codebase. If I go and write up a tiny program, I’m still going to get value out of writing it in C# using ReSharper because I’m experienced with C# and ReSharper enough to be able to write code five times faster if I have that help. Any other visions of the future? Nobody’s going to use actors. Because everyone’s going to be running on single-core VMs connected over network-ready protocols like JSON over HTTP. So, parallelism within one operating system is going to die. But until then, you should use actors. More Red Gater Coder interviews

Read the article

How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article

C++/boost generator module, feedback/critic please

- by aaa

hello. I wrote this generator, and I think to submit to boost people. Can you give me some feedback about it it basically allows to collapse multidimensional loops to flat multi-index queue. Loop can be boost lambda expressions. Main reason for doing this is to make parallel loops easier and separate algorithm from controlling structure (my fieldwork is computational chemistry where deep loops are common) 1 #ifndef _GENERATOR_HPP_ 2 #define _GENERATOR_HPP_ 3 4 #include <boost/array.hpp> 5 #include <boost/lambda/lambda.hpp> 6 #include <boost/noncopyable.hpp> 7 8 #include <boost/mpl/bool.hpp> 9 #include <boost/mpl/int.hpp> 10 #include <boost/mpl/for_each.hpp> 11 #include <boost/mpl/range_c.hpp> 12 #include <boost/mpl/vector.hpp> 13 #include <boost/mpl/transform.hpp> 14 #include <boost/mpl/erase.hpp> 15 16 #include <boost/fusion/include/vector.hpp> 17 #include <boost/fusion/include/for_each.hpp> 18 #include <boost/fusion/include/at_c.hpp> 19 #include <boost/fusion/mpl.hpp> 20 #include <boost/fusion/include/as_vector.hpp> 21 22 #include <memory> 23 24 /** 25 for loop generator which can use lambda expressions. 26 27 For example: 28 @code 29 using namespace generator; 30 using namespace boost::lambda; 31 make_for(N, N, range(bind(std::max<int>, _1, _2), N), range(_2, _3+1)); 32 // equivalent to pseudocode 33 // for l=0,N: for k=0,N: for j=max(l,k),N: for i=k,j 34 @endcode 35 36 If range is given as upper bound only, 37 lower bound is assumed to be default constructed 38 Lambda placeholders may only reference first three indices. 39 */ 40 41 namespace generator { 42 namespace detail { 43 44 using boost::lambda::constant_type; 45 using boost::lambda::constant; 46 47 /// lambda expression identity 48 template<class E, class enable = void> 49 struct lambda { 50 typedef E type; 51 }; 52 53 /// transform/construct constant lambda expression from non-lambda 54 template<class E> 55 struct lambda<E, typename boost::disable_if< 56 boost::lambda::is_lambda_functor<E> >::type> 57 { 58 struct constant : boost::lambda::constant_type<E>::type { 59 typedef typename boost::lambda::constant_type<E>::type base_type; 60 constant() : base_type(boost::lambda::constant(E())) {} 61 constant(const E &e) : base_type(boost::lambda::constant(e)) {} 62 }; 63 typedef constant type; 64 }; 65 66 /// range functor 67 template<class L, class U> 68 struct range_ { 69 typedef boost::array<int,4> index_type; 70 range_(U upper) : bounds_(typename lambda<L>::type(), upper) {} 71 range_(L lower, U upper) : bounds_(lower, upper) {} 72 73 template< typename T, size_t N> 74 T lower(const boost::array<T,N> &index) { 75 return bound<0>(index); 76 } 77 78 template< typename T, size_t N> 79 T upper(const boost::array<T,N> &index) { 80 return bound<1>(index); 81 } 82 83 private: 84 template<bool b, typename T> 85 T bound(const boost::array<T,1> &index) { 86 return (boost::fusion::at_c(bounds_))(index[0]); 87 } 88 89 template<bool b, typename T> 90 T bound(const boost::array<T,2> &index) { 91 return (boost::fusion::at_c(bounds_))(index[0], index[1]); 92 } 93 94 template<bool b, typename T, size_t N> 95 T bound(const boost::array<T,N> &index) { 96 using boost::fusion::at_c; 97 return (at_c(bounds_))(index[0], index[1], index[2]); 98 } 99 100 boost::fusion::vector<typename lambda<L>::type, 101 typename lambda::type> bounds_; 102 }; 103 104 template<typename T, size_t N> 105 struct for_base { 106 typedef boost::array<T,N> value_type; 107 virtual ~for_base() {} 108 virtual value_type next() = 0; 109 }; 110 111 /// N-index generator 112 template<typename T, size_t N, class R, class I> 113 struct for_ : for_base<T,N> { 114 typedef typename for_base<T,N>::value_type value_type; 115 typedef R range_tuple; 116 for_(const range_tuple &r) : r_(r), state_(true) { 117 boost::fusion::for_each(r_, initialize(index)); 118 } 119 /// @return new generator 120 for_* new_() { return new for_(r_); } 121 /// @return next index value and increment 122 value_type next() { 123 value_type next; 124 using namespace boost::lambda; 125 typename value_type::iterator n = next.begin(); 126 typename value_type::iterator i = index.begin(); 127 boost::mpl::for_each(*(var(n))++ = var(i)[_1]); 128 129 state_ = advance<N>(r_, index); 130 return next; 131 } 132 /// @return false if out of bounds, true otherwise 133 operator bool() { return state_; } 134 135 private: 136 /// initialize indices 137 struct initialize { 138 value_type &index_; 139 mutable size_t i_; 140 initialize(value_type &index) : index_(index), i_(0) {} 141 template<class R_> void operator()(R_& r) const { 142 index_[i_++] = r.lower(index_); 143 } 144 }; 145 146 /// advance index[0:M) 147 template<size_t M> 148 struct advance { 149 /// stop recursion 150 struct stop { 151 stop(R r, value_type &index) {} 152 }; 153 /// advance index 154 /// @param r range tuple 155 /// @param index index array 156 advance(R &r, value_type &index) : index_(index), i_(0) { 157 namespace fusion = boost::fusion; 158 index[M-1] += 1; // increment index 159 fusion::for_each(r, *this); // update indices 160 state_ = index[M-1] >= fusion::at_c<M-1>(r).upper(index); 161 if (state_) { // out of bounds 162 typename boost::mpl::if_c<(M > 1), 163 advance<M-1>, stop>::type(r, index); 164 } 165 } 166 /// apply lower bound of range to index 167 template<typename R_> void operator()(R_& r) const { 168 if (i_ >= M) index_[i_] = r.lower(index_); 169 ++i_; 170 } 171 /// @return false if out of bounds, true otherwise 172 operator bool() { return state_; } 173 private: 174 value_type &index_; ///< index array reference 175 mutable size_t i_; ///< running index 176 bool state_; ///< out of bounds state 177 }; 178 179 value_type index; 180 range_tuple r_; 181 bool state_; 182 }; 183 184 185 /// polymorphic generator template base 186 template<typename T,size_t N> 187 struct For : boost::noncopyable { 188 typedef boost::array<T,N> value_type; 189 /// @return next index value and increment 190 value_type next() { return for_->next(); } 191 /// @return false if out of bounds, true otherwise 192 operator bool() const { return for_; } 193 protected: 194 /// reset smart pointer 195 void reset(for_base<T,N> *f) { for_.reset(f); } 196 std::auto_ptr<for_base<T,N> > for_; 197 }; 198 199 /// range [T,R) type 200 template<typename T, typename R> 201 struct range_type { 202 typedef range_<T,R> type; 203 }; 204 205 /// range identity specialization 206 template<typename T, class L, class U> 207 struct range_type<T, range_<L,U> > { 208 typedef range_<L,U> type; 209 }; 210 211 namespace fusion = boost::fusion; 212 namespace mpl = boost::mpl; 213 214 template<typename T, size_t N, class R1, class R2, class R3, class R4> 215 struct range_tuple { 216 // full range vector 217 typedef typename mpl::vector<R1,R2,R3,R4> v; 218 typedef typename mpl::end<v>::type end; 219 typedef typename mpl::advance_c<typename mpl::begin<v>::type, N>::type pos; 220 // [0:N) range vector 221 typedef typename mpl::erase<v, pos, end>::type t; 222 // transform into proper range fusion::vector 223 typedef typename fusion::result_of::as_vector< 224 typename mpl::transform<t,range_type<T, mpl::_1> >::type 225 >::type type; 226 }; 227 228 229 template<typename T, size_t N, 230 class R1, class R2, class R3, class R4, 231 class O> 232 struct for_type { 233 typedef typename range_tuple<T,N,R1,R2,R3,R4>::type range_tuple; 234 typedef for_<T, N, range_tuple, O> type; 235 }; 236 237 } // namespace detail 238 239 240 /// default index order, [0:N) 241 template<size_t N> 242 struct order { 243 typedef boost::mpl::range_c<size_t,0, N> type; 244 }; 245 246 /// N-loop generator, 0 < N <= 5 247 /// @tparam T index type 248 /// @tparam N number of indices/loops 249 /// @tparam R1,... range types 250 /// @tparam O index order 251 template<typename T, size_t N, 252 class R1, class R2 = void, class R3 = void, class R4 = void, 253 class O = typename order<N>::type> 254 struct for_ : detail::for_type<T, N, R1, R2, R3, R4, O>::type { 255 typedef typename detail::for_type<T, N, R1, R2, R3, R4, O>::type base_type; 256 typedef typename base_type::range_tuple range_tuple; 257 for_(const range_tuple &range) : base_type(range) {} 258 }; 259 260 /// loop range [L:U) 261 /// @tparam L lower bound type 262 /// @tparam U upper bound type 263 /// @return range 264 template<class L, class U> 265 detail::range_<L,U> range(L lower, U upper) { 266 return detail::range_<L,U>(lower, upper); 267 } 268 269 /// make 4-loop generator with specified index ordering 270 template<typename T, class R1, class R2, class R3, class R4, class O> 271 for_<T, 4, R1, R2, R3, R4, O> 272 make_for(R1 r1, R2 r2, R3 r3, R4 r4, const O&) { 273 typedef for_<T, 4, R1, R2, R3, R4, O> F; 274 return F(F::range_tuple(r1, r2, r3, r4)); 275 } 276 277 /// polymorphic generator template forward declaration 278 template<typename T,size_t N> 279 struct For; 280 281 /// polymorphic 4-loop generator 282 template<typename T> 283 struct For<T,4> : detail::For<T,4> { 284 /// generator with default index ordering 285 template<class R1, class R2, class R3, class R4> 286 For(R1 r1, R2 r2, R3 r3, R4 r4) { 287 this->reset(make_for<T>(r1, r2, r3, r4).new_()); 288 } 289 /// generator with specified index ordering 290 template<class R1, class R2, class R3, class R4, class O> 291 For(R1 r1, R2 r2, R3 r3, R4 r4, O o) { 292 this->reset(make_for<T>(r1, r2, r3, r4, o).new_()); 293 } 294 }; 295 296 } 297 298 299 #endif /* _GENERATOR_HPP_ */

Developer IT