Search Results

Search found 1614 results on 65 pages for 'emps (expensive managemen'.

Page 65/65 | < Previous Page | 61 62 63 64 65

The broken Promise of the Mobile Web

- by Rick Strahl

High end mobile devices have been with us now for almost 7 years and they have utterly transformed the way we access information. Mobile phones and smartphones that have access to the Internet and host smart applications are in the hands of a large percentage of the population of the world. In many places even very remote, cell phones and even smart phones are a common sight. I’ll never forget when I was in India in 2011 I was up in the Southern Indian mountains riding an elephant out of a tiny local village, with an elephant herder in front riding atop of the elephant in front of us. He was dressed in traditional garb with the loin wrap and head cloth/turban as did quite a few of the locals in this small out of the way and not so touristy village. So we’re slowly trundling along in the forest and he’s lazily using his stick to guide the elephant and… 10 minutes in he pulls out his cell phone from his sash and starts texting. In the middle of texting a huge pig jumps out from the side of the trail and he takes a picture running across our path in the jungle! So yeah, mobile technology is very pervasive and it’s reached into even very buried and unexpected parts of this world. Apps are still King Apps currently rule the roost when it comes to mobile devices and the applications that run on them. If there’s something that you need on your mobile device your first step usually is to look for an app, not use your browser. But native app development remains a pain in the butt, with the requirement to have to support 2 or 3 completely separate platforms. There are solutions that try to bridge that gap. Xamarin is on a tear at the moment, providing their cross-device toolkit to build applications using C#. While Xamarin tools are impressive – and also *very* expensive – they only address part of the development madness that is app development. There are still specific device integration isssues, dealing with the different developer programs, security and certificate setups and all that other noise that surrounds app development. There’s also PhoneGap/Cordova which provides a hybrid solution that involves creating local HTML/CSS/JavaScript based applications, and then packaging them to run in a specialized App container that can run on most mobile device platforms using a WebView interface. This allows for using of HTML technology, but it also still requires all the set up, configuration of APIs, security keys and certification and submission and deployment process just like native applications – you actually lose many of the benefits that Web based apps bring. The big selling point of Cordova is that you get to use HTML have the ability to build your UI once for all platforms and run across all of them – but the rest of the app process remains in place. Apps can be a big pain to create and manage especially when we are talking about specialized or vertical business applications that aren’t geared at the mainstream market and that don’t fit the ‘store’ model. If you’re building a small intra department application you don’t want to deal with multiple device platforms and certification etc. for various public or corporate app stores. That model is simply not a good fit both from the development and deployment perspective. Even for commercial, big ticket apps, HTML as a UI platform offers many advantages over native, from write-once run-anywhere, to remote maintenance, single point of management and failure to having full control over the application as opposed to have the app store overloads censor you. In a lot of ways Web based HTML/CSS/JavaScript applications have so much potential for building better solutions based on existing Web technologies for the very same reasons a lot of content years ago moved off the desktop to the Web. To me the Web as a mobile platform makes perfect sense, but the reality of today’s Mobile Web unfortunately looks a little different… Where’s the Love for the Mobile Web? Yet here we are in the middle of 2014, nearly 7 years after the first iPhone was released and brought the promise of rich interactive information at your fingertips, and yet we still don’t really have a solid mobile Web platform. I know what you’re thinking: “But we have lots of HTML/JavaScript/CSS features that allows us to build nice mobile interfaces”. I agree to a point – it’s actually quite possible to build nice looking, rich and capable Web UI today. We have media queries to deal with varied display sizes, CSS transforms for smooth animations and transitions, tons of CSS improvements in CSS 3 that facilitate rich layout, a host of APIs geared towards mobile device features and lately even a number of JavaScript framework choices that facilitate development of multi-screen apps in a consistent manner. Personally I’ve been working a lot with AngularJs and heavily modified Bootstrap themes to build mobile first UIs and that’s been working very well to provide highly usable and attractive UI for typical mobile business applications. From the pure UI perspective things actually look very good. Not just about the UI But it’s not just about the UI - it’s also about integration with the mobile device. When it comes to putting all those pieces together into what amounts to a consolidated platform to build mobile Web applications, I think we still have a ways to go… there are a lot of missing pieces to make it all work together and integrate with the device more smoothly, and more importantly to make it work uniformly across the majority of devices. I think there are a number of reasons for this. Slow Standards Adoption HTML standards implementations and ratification has been dreadfully slow, and browser vendors all seem to pick and choose different pieces of the technology they implement. The end result is that we have a capable UI platform that’s missing some of the infrastructure pieces to make it whole on mobile devices. There’s lots of potential but what is lacking that final 10% to build truly compelling mobile applications that can compete favorably with native applications. Some of it is the fragmentation of browsers and the slow evolution of the mobile specific HTML APIs. A host of mobile standards exist but many of the standards are in the early review stage and they have been there stuck for long periods of time and seem to move at a glacial pace. Browser vendors seem even slower to implement them, and for good reason – non-ratified standards mean that implementations may change and vendor implementations tend to be experimental and likely have to be changed later. Neither Vendors or developers are not keen on changing standards. This is the typical chicken and egg scenario, but without some forward momentum from some party we end up stuck in the mud. It seems that either the standards bodies or the vendors need to carry the torch forward and that doesn’t seem to be happening quickly enough. Mobile Device Integration just isn’t good enough Current standards are not far reaching enough to address a number of the use case scenarios necessary for many mobile applications. While not every application needs to have access to all mobile device features, almost every mobile application could benefit from some integration with other parts of the mobile device platform. Integration with GPS, phone, media, messaging, notifications, linking and contacts system are benefits that are unique to mobile applications and could be widely used, but are mostly (with the exception of GPS) inaccessible for Web based applications today. Unfortunately trying to do most of this today only with a mobile Web browser is a losing battle. Aside from PhoneGap/Cordova’s app centric model with its own custom API accessing mobile device features and the token exception of the GeoLocation API, most device integration features are not widely supported by the current crop of mobile browsers. For example there’s no usable messaging API that allows access to SMS or contacts from HTML. Even obvious components like the Media Capture API are only implemented partially by mobile devices. There are alternatives and workarounds for some of these interfaces by using browser specific code, but that’s might ugly and something that I thought we were trying to leave behind with newer browser standards. But it’s not quite working out that way. It’s utterly perplexing to me that mobile standards like Media Capture and Streams, Media Gallery Access, Responsive Images, Messaging API, Contacts Manager API have only minimal or no traction at all today. Keep in mind we’ve had mobile browsers for nearly 7 years now, and yet we still have to think about how to get access to an image from the image gallery or the camera on some devices? Heck Windows Phone IE Mobile just gained the ability to upload images recently in the Windows 8.1 Update – that’s feature that HTML has had for 20 years! These are simple concepts and common problems that should have been solved a long time ago. It’s extremely frustrating to see build 90% of a mobile Web app with relative ease and then hit a brick wall for the remaining 10%, which often can be show stoppers. The remaining 10% have to do with platform integration, browser differences and working around the limitations that browsers and ‘pinned’ applications impose on HTML applications. The maddening part is that these limitations seem arbitrary as they could easily work on all mobile platforms. For example, SMS has a URL Moniker interface that sort of works on Android, works badly with iOS (only works if the address is already in the contact list) and not at all on Windows Phone. There’s no reason this shouldn’t work universally using the same interface – after all all phones have supported SMS since before the year 2000! But, it doesn’t have to be this way Change can happen very quickly. Take the GeoLocation API for example. Geolocation has taken off at the very beginning of the mobile device era and today it works well, provides the necessary security (a big concern for many mobile APIs), and is supported by just about all major mobile and even desktop browsers today. It handles security concerns via prompts to avoid unwanted access which is a model that would work for most other device APIs in a similar fashion. One time approval and occasional re-approval if code changes or caches expire. Simple and only slightly intrusive. It all works well, even though GeoLocation actually has some physical limitations, such as representing the current location when no GPS device is present. Yet this is a solved problem, where other APIs that are conceptually much simpler to implement have failed to gain any traction at all. Technically none of these APIs should be a problem to implement, but it appears that the momentum is just not there. Inadequate Web Application Linking and Activation Another important piece of the puzzle missing is the integration of HTML based Web applications. Today HTML based applications are not first class citizens on mobile operating systems. When talking about HTML based content there’s a big difference between content and applications. Content is great for search engine discovery and plain browser usage. Content is usually accessed intermittently and permanent linking is not so critical for this type of content. But applications have different needs. Applications need to be started up quickly and must be easily switchable to support a multi-tasking user workflow. Therefore, it’s pretty crucial that mobile Web apps are integrated into the underlying mobile OS and work with the standard task management features. Unfortunately this integration is not as smooth as it should be. It starts with actually trying to find mobile Web applications, to ‘installing’ them onto a phone in an easily accessible manner in a prominent position. The experience of discovering a Mobile Web ‘App’ and making it sticky is by no means as easy or satisfying. Today the way you’d go about this is: Open the browser Search for a Web Site in the browser with your search engine of choice Hope that you find the right site Hope that you actually find a site that works for your mobile device Click on the link and run the app in a fully chrome’d browser instance (read tiny surface area) Pin the app to the home screen (with all the limitations outline above) Hope you pointed at the right URL when you pinned Even for you and me as developers, there are a few steps in there that are painful and annoying, but think about the average user. First figuring out how to search for a specific site or URL? And then pinning the app and hopefully from the right location? You’ve probably lost more than half of your audience at that point. This experience sucks. For developers too this process is painful since app developers can’t control the shortcut creation directly. This problem often gets solved by crazy coding schemes, with annoying pop-ups that try to get people to create shortcuts via fancy animations that are both annoying and add overhead to each and every application that implements this sort of thing differently. And that’s not the end of it - getting the link onto the home screen with an application icon varies quite a bit between browsers. Apple’s non-standard meta tags are prominent and they work with iOS and Android (only more recent versions), but not on Windows Phone. Windows Phone instead requires you to create an actual screen or rather a partial screen be captured for a shortcut in the tile manager. Who had that brilliant idea I wonder? Surprisingly Chrome on recent Android versions seems to actually get it right – icons use pngs, pinning is easy and pinned applications properly behave like standalone apps and retain the browser’s active page state and content. Each of the platforms has a different way to specify icons (WP doesn’t allow you to use an icon image at all), and the most widely used interface in use today is a bunch of Apple specific meta tags that other browsers choose to support. The question is: Why is there no standard implementation for installing shortcuts across mobile platforms using an official format rather than a proprietary one? Then there’s iOS and the crazy way it treats home screen linked URLs using a crazy hybrid format that is neither as capable as a Web app running in Safari nor a WebView hosted application. Moving off the Web ‘app’ link when switching to another app actually causes the browser and preview it to ‘blank out’ the Web application in the Task View (see screenshot on the right). Then, when the ‘app’ is reactivated it ends up completely restarting the browser with the original link. This is crazy behavior that you can’t easily work around. In some situations you might be able to store the application state and restore it using LocalStorage, but for many scenarios that involve complex data sources (like say Google Maps) that’s not a possibility. The only reason for this screwed up behavior I can think of is that it is deliberate to make Web apps a pain in the butt to use and forcing users trough the App Store/PhoneGap/Cordova route. App linking and management is a very basic problem – something that we essentially have solved in every desktop browser – yet on mobile devices where it arguably matters a lot more to have easy access to web content we have to jump through hoops to have even a remotely decent linking/activation experience across browsers. Where’s the Money? It’s not surprising that device home screen integration and Mobile Web support in general is in such dismal shape – the mobile OS vendors benefit financially from App store sales and have little to gain from Web based applications that bypass the App store and the cash cow that it presents. On top of that, platform specific vendor lock-in of both end users and developers who have invested in hardware, apps and consumables is something that mobile platform vendors actually aspire to. Web based interfaces that are cross-platform are the anti-thesis of that and so again it’s no surprise that the mobile Web is on a struggling path. But – that may be changing. More and more we’re seeing operations shifting to services that are subscription based or otherwise collect money for usage, and that may drive more progress into the Web direction in the end . Nothing like the almighty dollar to drive innovation forward. Do we need a Mobile Web App Store? As much as I dislike moderated experiences in today’s massive App Stores, they do at least provide one single place to look for apps for your device. I think we could really use some sort of registry, that could provide something akin to an app store for mobile Web apps, to make it easier to actually find mobile applications. This could take the form of a specialized search engine, or maybe a more formal store/registry like structure. Something like apt-get/chocolatey for Web apps. It could be curated and provide at least some feedback and reviews that might help with the integrity of applications. Coupled to that could be a native application on each platform that would allow searching and browsing of the registry and then also handle installation in the form of providing the home screen linking, plus maybe an initial security configuration that determines what features are allowed access to for the app. I’m not holding my breath. In order for this sort of thing to take off and gain widespread appeal, a lot of coordination would be required. And in order to get enough traction it would have to come from a well known entity – a mobile Web app store from a no name source is unlikely to gain high enough usage numbers to make a difference. In a way this would eliminate some of the freedom of the Web, but of course this would also be an optional search path in addition to the standard open Web search mechanisms to find and access content today. Security Security is a big deal, and one of the perceived reasons why so many IT professionals appear to be willing to go back to the walled garden of deployed apps is that Apps are perceived as safe due to the official review and curation of the App stores. Curated stores are supposed to protect you from malware, illegal and misleading content. It doesn’t always work out that way and all the major vendors have had issues with security and the review process at some time or another. Security is critical, but I also think that Web applications in general pose less of a security threat than native applications, by nature of the sandboxed browser and JavaScript environments. Web applications run externally completely and in the HTML and JavaScript sandboxes, with only a very few controlled APIs allowing access to device specific features. And as discussed earlier – security for any device interaction can be granted the same for mobile applications through a Web browser, as they can for native applications either via explicit policies loaded from the Web, or via prompting as GeoLocation does today. Security is important, but it’s certainly solvable problem for Web applications even those that need to access device hardware. Security shouldn’t be a reason for Web apps to be an equal player in mobile applications. Apps are winning, but haven’t we been here before? So now we’re finding ourselves back in an era of installed app, rather than Web based and managed apps. Only it’s even worse today than with Desktop applications, in that the apps are going through a gatekeeper that charges a toll and censors what you can and can’t do in your apps. Frankly it’s a mystery to me why anybody would buy into this model and why it’s lasted this long when we’ve already been through this process. It’s crazy… It’s really a shame that this regression is happening. We have the technology to make mobile Web apps much more prominent, but yet we’re basically held back by what seems little more than bureaucracy, partisan bickering and self interest of the major parties involved. Back in the day of the desktop it was Internet Explorer’s 98+% market shareholding back the Web from improvements for many years – now it’s the combined mobile OS market in control of the mobile browsers. If mobile Web apps were allowed to be treated the same as native apps with simple ways to install and run them consistently and persistently, that would go a long way to making mobile applications much more usable and seriously viable alternatives to native apps. But as it is mobile apps have a severe disadvantage in placement and operation. There are a few bright spots in all of this. Mozilla’s FireFoxOs is embracing the Web for it’s mobile OS by essentially building every app out of HTML and JavaScript based content. It supports both packaged and certified package modes (that can be put into the app store), and Open Web apps that are loaded and run completely off the Web and can also cache locally for offline operation using a manifest. Open Web apps are treated as full class citizens in FireFoxOS and run using the same mechanism as installed apps. Unfortunately FireFoxOs is getting a slow start with minimal device support and specifically targeting the low end market. We can hope that this approach will change and catch on with other vendors, but that’s also an uphill battle given the conflict of interest with platform lock in that it represents. Recent versions of Android also seem to be working reasonably well with mobile application integration onto the desktop and activation out of the box. Although it still uses the Apple meta tags to find icons and behavior settings, everything at least works as you would expect – icons to the desktop on pinning, WebView based full screen activation, and reliable application persistence as the browser/app is treated like a real application. Hopefully iOS will at some point provide this same level of rudimentary Web app support. What’s also interesting to me is that Microsoft hasn’t picked up on the obvious need for a solid Web App platform. Being a distant third in the mobile OS war, Microsoft certainly has nothing to lose and everything to gain by using fresh ideas and expanding into areas that the other major vendors are neglecting. But instead Microsoft is trying to beat the market leaders at their own game, fighting on their adversary’s terms instead of taking a new tack. Providing a kick ass mobile Web platform that takes the lead on some of the proposed mobile APIs would be something positive that Microsoft could do to improve its miserable position in the mobile device market. Where are we at with Mobile Web? It sure sounds like I’m really down on the Mobile Web, right? I’ve built a number of mobile apps in the last year and while overall result and response has been very positive to what we were able to accomplish in terms of UI, getting that final 10% that required device integration dialed was an absolute nightmare on every single one of them. Big compromises had to be made and some features were left out or had to be modified for some devices. In two cases we opted to go the Cordova route in order to get the integration we needed, along with the extra pain involved in that process. Unless you’re not integrating with device features and you don’t care deeply about a smooth integration with the mobile desktop, mobile Web development is fraught with frustration. So, yes I’m frustrated! But it’s not for lack of wanting the mobile Web to succeed. I am still a firm believer that we will eventually arrive a much more functional mobile Web platform that allows access to the most common device features in a sensible way. It wouldn't be difficult for device platform vendors to make Web based applications first class citizens on mobile devices. But unfortunately it looks like it will still be some time before this happens. So, what’s your experience building mobile Web apps? Are you finding similar issues? Just giving up on raw Web applications and building PhoneGap apps instead? Completely skipping the Web and going native? Leave a comment for discussion. Resources Rick Strahl on DotNet Rocks talking about Mobile Web© Rick Strahl, West Wind Technologies, 2005-2014Posted in HTML5 Mobile Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Ancillary Objects: Separate Debug ELF Files For Solaris

- by Ali Bahrami

We introduced a new object ELF object type in Solaris 11 Update 1 called the Ancillary Object. This posting describes them, using material originally written during their development, the PSARC arc case, and the Solaris Linker and Libraries Manual. ELF objects contain allocable sections, which are mapped into memory at runtime, and non-allocable sections, which are present in the file for use by debuggers and observability tools, but which are not mapped or used at runtime. Typically, all of these sections exist within a single object file. Ancillary objects allow them to instead go into a separate file. There are different reasons given for wanting such a feature. One can debate whether the added complexity is worth the benefit, and in most cases it is not. However, one important case stands out — customers with very large 32-bit objects who are not ready or able to make the transition to 64-bits. We have customers who build extremely large 32-bit objects. Historically, the debug sections in these objects have used the stabs format, which is limited, but relatively compact. In recent years, the industry has transitioned to the powerful but verbose DWARF standard. In some cases, the size of these debug sections is large enough to push the total object file size past the fundamental 4GB limit for 32-bit ELF object files. The best, and ultimately only, solution to overly large objects is to transition to 64-bits. However, consider environments where: Hundreds of users may be executing the code on large shared systems. (32-bits use less memory and bus bandwidth, and on sparc runs just as fast as 64-bit code otherwise). Complex finely tuned code, where the original authors may no longer be available. Critical production code, that was expensive to qualify and bring online, and which is otherwise serving its intended purpose without issue. Users in these risk adverse and/or high scale categories have good reasons to push 32-bits objects to the limit before moving on. Ancillary objects offer these users a longer runway. Design The design of ancillary objects is intended to be simple, both to help human understanding when examining elfdump output, and to lower the bar for debuggers such as dbx to support them. The primary and ancillary objects have the same set of section headers, with the same names, in the same order (i.e. each section has the same index in both files). A single added section of type SHT_SUNW_ANCILLARY is added to both objects, containing information that allows a debugger to identify and validate both files relative to each other. Given one of these files, the ancillary section allows you to identify the other. Allocable sections go in the primary object, and non-allocable ones go into the ancillary object. A small set of non-allocable objects, notably the symbol table, are copied into both objects. As noted above, most sections are only written to one of the two objects, but both objects have the same section header array. The section header in the file that does not contain the section data is tagged with the SHF_SUNW_ABSENT section header flag to indicate its placeholder status. Compiler writers and others who produce objects can set the SUNW_SHF_PRIMARY section header flag to mark non-allocable sections that should go to the primary object rather than the ancillary. If you don't request an ancillary object, the Solaris ELF format is unchanged. Users who don't use ancillary objects do not pay for the feature. This is important, because they exist to serve a small subset of our users, and must not complicate the common case. If you do request an ancillary object, the runtime behavior of the primary object will be the same as that of a normal object. There is no added runtime cost. The primary and ancillary object together represent a logical single object. This is facilitated by the use of a single set of section headers. One can easily imagine a tool that can merge a primary and ancillary object into a single file, or the reverse. (Note that although this is an interesting intellectual exercise, we don't actually supply such a tool because there's little practical benefit above and beyond using ld to create the files). Among the benefits of this approach are: There is no need for per-file symbol tables to reflect the contents of each file. The same symbol table that would be produced for a standard object can be used. The section contents are identical in either case — there is no need to alter data to accommodate multiple files. It is very easy for a debugger to adapt to these new files, and the processing involved can be encapsulated in input/output routines. Most of the existing debugger implementation applies without modification. The limit of a 4GB 32-bit output object is now raised to 4GB of code, and 4GB of debug data. There is also the future possibility (not currently supported) to support multiple ancillary objects, each of which could contain up to 4GB of additional debug data. It must be noted however that the 32-bit DWARF debug format is itself inherently 32-bit limited, as it uses 32-bit offsets between debug sections, so the ability to employ multiple ancillary object files may not turn out to be useful. Using Ancillary Objects (From the Solaris Linker and Libraries Guide) By default, objects contain both allocable and non-allocable sections. Allocable sections are the sections that contain executable code and the data needed by that code at runtime. Non-allocable sections contain supplemental information that is not required to execute an object at runtime. These sections support the operation of debuggers and other observability tools. The non-allocable sections in an object are not loaded into memory at runtime by the operating system, and so, they have no impact on memory use or other aspects of runtime performance no matter their size. For convenience, both allocable and non-allocable sections are normally maintained in the same file. However, there are situations in which it can be useful to separate these sections. To reduce the size of objects in order to improve the speed at which they can be copied across wide area networks. To support fine grained debugging of highly optimized code requires considerable debug data. In modern systems, the debugging data can easily be larger than the code it describes. The size of a 32-bit object is limited to 4 Gbytes. In very large 32-bit objects, the debug data can cause this limit to be exceeded and prevent the creation of the object. To limit the exposure of internal implementation details. Traditionally, objects have been stripped of non-allocable sections in order to address these issues. Stripping is effective, but destroys data that might be needed later. The Solaris link-editor can instead write non-allocable sections to an ancillary object. This feature is enabled with the -z ancillary command line option. $ ld ... -z ancillary[=outfile] ...By default, the ancillary file is given the same name as the primary output object, with a .anc file extension. However, a different name can be provided by providing an outfile value to the -z ancillary option. When -z ancillary is specified, the link-editor performs the following actions. All allocable sections are written to the primary object. In addition, all non-allocable sections containing one or more input sections that have the SHF_SUNW_PRIMARY section header flag set are written to the primary object. All remaining non-allocable sections are written to the ancillary object. The following non-allocable sections are written to both the primary object and ancillary object. .shstrtab The section name string table. .symtab The full non-dynamic symbol table. .symtab_shndx The symbol table extended index section associated with .symtab. .strtab The non-dynamic string table associated with .symtab. .SUNW_ancillary Contains the information required to identify the primary and ancillary objects, and to identify the object being examined. The primary object and all ancillary objects contain the same array of sections headers. Each section has the same section index in every file. Although the primary and ancillary objects all define the same section headers, the data for most sections will be written to a single file as described above. If the data for a section is not present in a given file, the SHF_SUNW_ABSENT section header flag is set, and the sh_size field is 0. This organization makes it possible to acquire a full list of section headers, a complete symbol table, and a complete list of the primary and ancillary objects from either of the primary or ancillary objects. The following example illustrates the underlying implementation of ancillary objects. An ancillary object is created by adding the -z ancillary command line option to an otherwise normal compilation. The file utility shows that the result is an executable named a.out, and an associated ancillary object named a.out.anc. $ cat hello.c #include <stdio.h> int main(int argc, char **argv) { (void) printf("hello, world\n"); return (0); } $ cc -g -zancillary hello.c $ file a.out a.out.anc a.out: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, ancillary object a.out.anc a.out.anc: ELF 32-bit LSB ancillary 80386 Version 1, primary object a.out $ ./a.out hello worldThe resulting primary object is an ordinary executable that can be executed in the usual manner. It is no different at runtime than an executable built without the use of ancillary objects, and then stripped of non-allocable content using the strip or mcs commands. As previously described, the primary object and ancillary objects contain the same section headers. To see how this works, it is helpful to use the elfdump utility to display these section headers and compare them. The following table shows the section header information for a selection of headers from the previous link-edit example. Index Section Name Type Primary Flags Ancillary Flags Primary Size Ancillary Size 13 .text PROGBITS ALLOC EXECINSTR ALLOC EXECINSTR SUNW_ABSENT 0x131 0 20 .data PROGBITS WRITE ALLOC WRITE ALLOC SUNW_ABSENT 0x4c 0 21 .symtab SYMTAB 0 0 0x450 0x450 22 .strtab STRTAB STRINGS STRINGS 0x1ad 0x1ad 24 .debug_info PROGBITS SUNW_ABSENT 0 0 0x1a7 28 .shstrtab STRTAB STRINGS STRINGS 0x118 0x118 29 .SUNW_ancillary SUNW_ancillary 0 0 0x30 0x30 The data for most sections is only present in one of the two files, and absent from the other file. The SHF_SUNW_ABSENT section header flag is set when the data is absent. The data for allocable sections needed at runtime are found in the primary object. The data for non-allocable sections used for debugging but not needed at runtime are placed in the ancillary file. A small set of non-allocable sections are fully present in both files. These are the .SUNW_ancillary section used to relate the primary and ancillary objects together, the section name string table .shstrtab, as well as the symbol table.symtab, and its associated string table .strtab. It is possible to strip the symbol table from the primary object. A debugger that encounters an object without a symbol table can use the .SUNW_ancillary section to locate the ancillary object, and access the symbol contained within. The primary object, and all associated ancillary objects, contain a .SUNW_ancillary section that allows all the objects to be identified and related together. $ elfdump -T SUNW_ancillary a.out a.out.anc a.out: Ancillary Section: .SUNW_ancillary index tag value [0] ANC_SUNW_CHECKSUM 0x8724 [1] ANC_SUNW_MEMBER 0x1 a.out [2] ANC_SUNW_CHECKSUM 0x8724 [3] ANC_SUNW_MEMBER 0x1a3 a.out.anc [4] ANC_SUNW_CHECKSUM 0xfbe2 [5] ANC_SUNW_NULL 0 a.out.anc: Ancillary Section: .SUNW_ancillary index tag value [0] ANC_SUNW_CHECKSUM 0xfbe2 [1] ANC_SUNW_MEMBER 0x1 a.out [2] ANC_SUNW_CHECKSUM 0x8724 [3] ANC_SUNW_MEMBER 0x1a3 a.out.anc [4] ANC_SUNW_CHECKSUM 0xfbe2 [5] ANC_SUNW_NULL 0 The ancillary sections for both objects contain the same number of elements, and are identical except for the first element. Each object, starting with the primary object, is introduced with a MEMBER element that gives the file name, followed by a CHECKSUM that identifies the object. In this example, the primary object is a.out, and has a checksum of 0x8724. The ancillary object is a.out.anc, and has a checksum of 0xfbe2. The first element in a .SUNW_ancillary section, preceding the MEMBER element for the primary object, is always a CHECKSUM element, containing the checksum for the file being examined. The presence of a .SUNW_ancillary section in an object indicates that the object has associated ancillary objects. The names of the primary and all associated ancillary objects can be obtained from the ancillary section from any one of the files. It is possible to determine which file is being examined from the larger set of files by comparing the first checksum value to the checksum of each member that follows. Debugger Access and Use of Ancillary Objects Debuggers and other observability tools must merge the information found in the primary and ancillary object files in order to build a complete view of the object. This is equivalent to processing the information from a single file. This merging is simplified by the primary object and ancillary objects containing the same section headers, and a single symbol table. The following steps can be used by a debugger to assemble the information contained in these files. Starting with the primary object, or any of the ancillary objects, locate the .SUNW_ancillary section. The presence of this section identifies the object as part of an ancillary group, contains information that can be used to obtain a complete list of the files and determine which of those files is the one currently being examined. Create a section header array in memory, using the section header array from the object being examined as an initial template. Open and read each file identified by the .SUNW_ancillary section in turn. For each file, fill in the in-memory section header array with the information for each section that does not have the SHF_SUNW_ABSENT flag set. The result will be a complete in-memory copy of the section headers with pointers to the data for all sections. Once this information has been acquired, the debugger can proceed as it would in the single file case, to access and control the running program. Note - The ELF definition of ancillary objects provides for a single primary object, and an arbitrary number of ancillary objects. At this time, the Oracle Solaris link-editor only produces a single ancillary object containing all non-allocable sections. This may change in the future. Debuggers and other observability tools should be written to handle the general case of multiple ancillary objects. ELF Implementation Details (From the Solaris Linker and Libraries Guide) To implement ancillary objects, it was necessary to extend the ELF format to add a new object type (ET_SUNW_ANCILLARY), a new section type (SHT_SUNW_ANCILLARY), and 2 new section header flags (SHF_SUNW_ABSENT, SHF_SUNW_PRIMARY). In this section, I will detail these changes, in the form of diffs to the Solaris Linker and Libraries manual. Part IV ELF Application Binary Interface Chapter 13: Object File Format Object File Format Edit Note: This existing section at the beginning of the chapter describes the ELF header. There's a table of object file types, which now includes the new ET_SUNW_ANCILLARY type. e_type Identifies the object file type, as listed in the following table. NameValueMeaning ET_NONE0No file type ET_REL1Relocatable file ET_EXEC2Executable file ET_DYN3Shared object file ET_CORE4Core file ET_LOSUNW0xfefeStart operating system specific range ET_SUNW_ANCILLARY0xfefeAncillary object file ET_HISUNW0xfefdEnd operating system specific range ET_LOPROC0xff00Start processor-specific range ET_HIPROC0xffffEnd processor-specific range Sections Edit Note: This overview section defines the section header structure, and provides a high level description of known sections. It was updated to define the new SHF_SUNW_ABSENT and SHF_SUNW_PRIMARY flags and the new SHT_SUNW_ANCILLARY section. ... sh_type Categorizes the section's contents and semantics. Section types and their descriptions are listed in Table 13-5. sh_flags Sections support 1-bit flags that describe miscellaneous attributes. Flag definitions are listed in Table 13-8. ... Table 13-5 ELF Section Types, sh_type NameValue . . . SHT_LOSUNW0x6fffffee SHT_SUNW_ancillary0x6fffffee . . . ... SHT_LOSUNW - SHT_HISUNW Values in this inclusive range are reserved for Oracle Solaris OS semantics. SHT_SUNW_ANCILLARY Present when a given object is part of a group of ancillary objects. Contains information required to identify all the files that make up the group. See Ancillary Section. ... Table 13-8 ELF Section Attribute Flags NameValue . . . SHF_MASKOS0x0ff00000 SHF_SUNW_NODISCARD0x00100000 SHF_SUNW_ABSENT0x00200000 SHF_SUNW_PRIMARY0x00400000 SHF_MASKPROC0xf0000000 . . . ... SHF_SUNW_ABSENT Indicates that the data for this section is not present in this file. When ancillary objects are created, the primary object and any ancillary objects, will all have the same section header array, to facilitate merging them to form a complete view of the object, and to allow them to use the same symbol tables. Each file contains a subset of the section data. The data for allocable sections is written to the primary object while the data for non-allocable sections is written to an ancillary file. The SHF_SUNW_ABSENT flag is used to indicate that the data for the section is not present in the object being examined. When the SHF_SUNW_ABSENT flag is set, the sh_size field of the section header must be 0. An application encountering an SHF_SUNW_ABSENT section can choose to ignore the section, or to search for the section data within one of the related ancillary files. SHF_SUNW_PRIMARY The default behavior when ancillary objects are created is to write all allocable sections to the primary object and all non-allocable sections to the ancillary objects. The SHF_SUNW_PRIMARY flag overrides this behavior. Any output section containing one more input section with the SHF_SUNW_PRIMARY flag set is written to the primary object without regard for its allocable status. ... Two members in the section header, sh_link, and sh_info, hold special information, depending on section type. Table 13-9 ELF sh_link and sh_info Interpretation sh_typesh_linksh_info . . . SHT_SUNW_ANCILLARY The section header index of the associated string table. 0 . . . Special Sections Edit Note: This section describes the sections used in Solaris ELF objects, using the types defined in the previous description of section types. It was updated to define the new .SUNW_ancillary (SHT_SUNW_ANCILLARY) section. Various sections hold program and control information. Sections in the following table are used by the system and have the indicated types and attributes. Table 13-10 ELF Special Sections NameTypeAttribute . . . .SUNW_ancillarySHT_SUNW_ancillaryNone . . . ... .SUNW_ancillary Present when a given object is part of a group of ancillary objects. Contains information required to identify all the files that make up the group. See Ancillary Section for details. ... Ancillary Section Edit Note: This new section provides the format reference describing the layout of a .SUNW_ancillary section and the meaning of the various tags. Note that these sections use the same tag/value concept used for dynamic and capabilities sections, and will be familiar to anyone used to working with ELF. In addition to the primary output object, the Solaris link-editor can produce one or more ancillary objects. Ancillary objects contain non-allocable sections that would normally be written to the primary object. When ancillary objects are produced, the primary object and all of the associated ancillary objects contain a SHT_SUNW_ancillary section, containing information that identifies these related objects. Given any one object from such a group, the ancillary section provides the information needed to identify and interpret the others. This section contains an array of the following structures. See sys/elf.h. typedef struct { Elf32_Word a_tag; union { Elf32_Word a_val; Elf32_Addr a_ptr; } a_un; } Elf32_Ancillary; typedef struct { Elf64_Xword a_tag; union { Elf64_Xword a_val; Elf64_Addr a_ptr; } a_un; } Elf64_Ancillary; For each object with this type, a_tag controls the interpretation of a_un. a_val These objects represent integer values with various interpretations. a_ptr These objects represent file offsets or addresses. The following ancillary tags exist. Table 13-NEW1 ELF Ancillary Array Tags NameValuea_un ANC_SUNW_NULL0Ignored ANC_SUNW_CHECKSUM1a_val ANC_SUNW_MEMBER2a_ptr ANC_SUNW_NULL Marks the end of the ancillary section. ANC_SUNW_CHECKSUM Provides the checksum for a file in the c_val element. When ANC_SUNW_CHECKSUM precedes the first instance of ANC_SUNW_MEMBER, it provides the checksum for the object from which the ancillary section is being read. When it follows an ANC_SUNW_MEMBER tag, it provides the checksum for that member. ANC_SUNW_MEMBER Specifies an object name. The a_ptr element contains the string table offset of a null-terminated string, that provides the file name. An ancillary section must always contain an ANC_SUNW_CHECKSUM before the first instance of ANC_SUNW_MEMBER, identifying the current object. Following that, there should be an ANC_SUNW_MEMBER for each object that makes up the complete set of objects. Each ANC_SUNW_MEMBER should be followed by an ANC_SUNW_CHECKSUM for that object. A typical ancillary section will therefore be structured as: TagMeaning ANC_SUNW_CHECKSUMChecksum of this object ANC_SUNW_MEMBERName of object #1 ANC_SUNW_CHECKSUMChecksum for object #1 . . . ANC_SUNW_MEMBERName of object N ANC_SUNW_CHECKSUMChecksum for object N ANC_SUNW_NULL An object can therefore identify itself by comparing the initial ANC_SUNW_CHECKSUM to each of the ones that follow, until it finds a match. Related Other Work The GNU developers have also encountered the need/desire to support separate debug information files, and use the solution detailed at http://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html. At the current time, the separate debug file is constructed by building the standard object first, and then copying the debug data out of it in a separate post processing step, Hence, it is limited to a total of 4GB of code and debug data, just as a single object file would be. They are aware of this, and I have seen online comments indicating that they may add direct support for generating these separate files to their link-editor. It is worth noting that the GNU objcopy utility is available on Solaris, and that the Studio dbx debugger is able to use these GNU style separate debug files even on Solaris. Although this is interesting in terms giving Linux users a familiar environment on Solaris, the 4GB limit means it is not an answer to the problem of very large 32-bit objects. We have also encountered issues with objcopy not understanding Solaris-specific ELF sections, when using this approach. The GNU community also has a current effort to adapt their DWARF debug sections in order to move them to separate files before passing the relocatable objects to the linker. The details of Project Fission can be found at http://gcc.gnu.org/wiki/DebugFission. The goal of this project appears to be to reduce the amount of data seen by the link-editor. The primary effort revolves around moving DWARF data to separate .dwo files so that the link-editor never encounters them. The details of modifying the DWARF data to be usable in this form are involved — please see the above URL for details.

Read the article
Scaling-out Your Services by Message Bus based WCF Transport Extension – Part 1 – Background

- by Shaun

Cloud computing gives us more flexibility on the computing resource, we can provision and deploy an application or service with multiple instances over multiple machines. With the increment of the service instances, how to balance the incoming message and workload would become a new challenge. Currently there are two approaches we can use to pass the incoming messages to the service instances, I would like call them dispatcher mode and pulling mode. Dispatcher Mode The dispatcher mode introduces a role which takes the responsible to find the best service instance to process the request. The image below describes the sharp of this mode. There are four clients communicate with the service through the underlying transportation. For example, if we are using HTTP the clients might be connecting to the same service URL. On the server side there’s a dispatcher listening on this URL and try to retrieve all messages. When a message came in, the dispatcher will find a proper service instance to process it. There are three mechanism to find the instance: Round-robin: Dispatcher will always send the message to the next instance. For example, if the dispatcher sent the message to instance 2, then the next message will be sent to instance 3, regardless if instance 3 is busy or not at that moment. Random: Dispatcher will find a service instance randomly, and same as the round-robin mode it regardless if the instance is busy or not. Sticky: Dispatcher will send all related messages to the same service instance. This approach always being used if the service methods are state-ful or session-ful. But as you can see, all of these approaches are not really load balanced. The clients will send messages at any time, and each message might take different process duration on the server side. This means in some cases, some of the service instances are very busy while others are almost idle. For example, if we were using round-robin mode, it could be happened that most of the simple task messages were passed to instance 1 while the complex ones were sent to instance 3, even though instance 1 should be idle. This brings some problem in our architecture. The first one is that, the response to the clients might be longer than it should be. As it’s shown in the figure above, message 6 and 9 can be processed by instance 1 or instance 2, but in reality they were dispatched to the busy instance 3 since the dispatcher and round-robin mode. Secondly, if there are many requests came from the clients in a very short period, service instances might be filled by tons of pending tasks and some instances might be crashed. Third, if we are using some cloud platform to host our service instances, for example the Windows Azure, the computing resource is billed by service deployment period instead of the actual CPU usage. This means if any service instance is idle it is wasting our money! Last one, the dispatcher would be the bottleneck of our system since all incoming messages must be routed by the dispatcher. If we are using HTTP or TCP as the transport, the dispatcher would be a network load balance. If we wants more capacity, we have to scale-up, or buy a hardware load balance which is very expensive, as well as scaling-out the service instances. Pulling Mode Pulling mode doesn’t need a dispatcher to route the messages. All service instances are listening to the same transport and try to retrieve the next proper message to process if they are idle. Since there is no dispatcher in pulling mode, it requires some features on the transportation. The transportation must support multiple client connection and server listening. HTTP and TCP doesn’t allow multiple clients are listening on the same address and port, so it cannot be used in pulling mode directly. All messages in the transportation must be FIFO, which means the old message must be received before the new one. Message selection would be a plus on the transportation. This means both service and client can specify some selection criteria and just receive some specified kinds of messages. This feature is not mandatory but would be very useful when implementing the request reply and duplex WCF channel modes. Otherwise we must have a memory dictionary to store the reply messages. I will explain more about this in the following articles. Message bus, or the message queue would be best candidate as the transportation when using the pulling mode. First, it allows multiple application to listen on the same queue, and it’s FIFO. Some of the message bus also support the message selection, such as TIBCO EMS, RabbitMQ. Some others provide in memory dictionary which can store the reply messages, for example the Redis. The principle of pulling mode is to let the service instances self-managed. This means each instance will try to retrieve the next pending incoming message if they finished the current task. This gives us more benefit and can solve the problems we met with in the dispatcher mode. The incoming message will be received to the best instance to process, which means this will be very balanced. And it will not happen that some instances are busy while other are idle, since the idle one will retrieve more tasks to make them busy. Since all instances are try their best to be busy we can use less instances than dispatcher mode, which more cost effective. Since there’s no dispatcher in the system, there is no bottleneck. When we introduced more service instances, in dispatcher mode we have to change something to let the dispatcher know the new instances. But in pulling mode since all service instance are self-managed, there no extra change at all. If there are many incoming messages, since the message bus can queue them in the transportation, service instances would not be crashed. All above are the benefits using the pulling mode, but it will introduce some problem as well. The process tracking and debugging become more difficult. Since the service instances are self-managed, we cannot know which instance will process the message. So we need more information to support debug and track. Real-time response may not be supported. All service instances will process the next message after the current one has done, if we have some real-time request this may not be a good solution. Compare with the Pros and Cons above, the pulling mode would a better solution for the distributed system architecture. Because what we need more is the scalability, cost-effect and the self-management. WCF and WCF Transport Extensibility Windows Communication Foundation (WCF) is a framework for building service-oriented applications. In the .NET world WCF is the best way to implement the service. In this series I’m going to demonstrate how to implement the pulling mode on top of a message bus by extending the WCF. I don’t want to deep into every related field in WCF but will highlight its transport extensibility. When we implemented an RPC foundation there are many aspects we need to deal with, for example the message encoding, encryption, authentication and message sending and receiving. In WCF, each aspect is represented by a channel. A message will be passed through all necessary channels and finally send to the underlying transportation. And on the other side the message will be received from the transport and though the same channels until the business logic. This mode is called “Channel Stack” in WCF, and the last channel in the channel stack must always be a transport channel, which takes the responsible for sending and receiving the messages. As we are going to implement the WCF over message bus and implement the pulling mode scaling-out solution, we need to create our own transport channel so that the client and service can exchange messages over our bus. Before we deep into the transport channel, let’s have a look on the message exchange patterns that WCF defines. Message exchange pattern (MEP) defines how client and service exchange the messages over the transportation. WCF defines 3 basic MEPs which are datagram, Request-Reply and Duplex. Datagram: Also known as one-way, or fire-forgot mode. The message sent from the client to the service, and no need any reply from the service. The client doesn’t care about the message result at all. Request-Reply: Very common used pattern. The client send the request message to the service and wait until the reply message comes from the service. Duplex: The client sent message to the service, when the service processing the message it can callback to the client. When callback the service would be like a client while the client would be like a service. In WCF, each MEP represent some channels associated. MEP Channels Datagram IInputChannel, IOutputChannel Request-Reply IRequestChannel, IReplyChannel Duplex IDuplexChannel And the channels are created by ChannelListener on the server side, and ChannelFactory on the client side. The ChannelListener and ChannelFactory are created by the TransportBindingElement. The TransportBindingElement is created by the Binding, which can be defined as a new binding or from a custom binding. For more information about the transport channel mode, please refer to the MSDN document. The figure below shows the transport channel objects when using the request-reply MEP. And this is the datagram MEP. And this is the duplex MEP. After investigated the WCF transport architecture, channel mode and MEP, we finally identified what we should do to extend our message bus based transport layer. They are: Binding: (Optional) Defines the channel elements in the channel stack and added our transport binding element at the bottom of the stack. But we can use the build-in CustomBinding as well. TransportBindingElement: Defines which MEP is supported in our transport and create the related ChannelListener and ChannelFactory. This also defines the scheme of the endpoint if using this transport. ChannelListener: Create the server side channel based on the MEP it’s. We can have one ChannelListener to create channels for all supported MEPs, or we can have ChannelListener for each MEP. In this series I will use the second approach. ChannelFactory: Create the client side channel based on the MEP it’s. We can have one ChannelFactory to create channels for all supported MEPs, or we can have ChannelFactory for each MEP. In this series I will use the second approach. Channels: Based on the MEPs we want to support, we need to implement the channels accordingly. For example, if we want our transport support Request-Reply mode we should implement IRequestChannel and IReplyChannel. In this series I will implement all 3 MEPs listed above one by one. Scaffold: In order to make our transport extension works we also need to implement some scaffold stuff. For example we need some classes to send and receive message though out message bus. We also need some codes to read and write the WCF message, etc.. These are not necessary but would be very useful in our example. Message Bus There is only one thing remained before we can begin to implement our scaling-out support WCF transport, which is the message bus. As I mentioned above, the message bus must have some features to fulfill all the WCF MEPs. In my company we will be using TIBCO EMS, which is an enterprise message bus product. And I have said before we can use any message bus production if it’s satisfied with our requests. Here I would like to introduce an interface to separate the message bus from the WCF. This allows us to implement the bus operations by any kinds bus we are going to use. The interface would be like this. 1: public interface IBus : IDisposable 2: { 3: string SendRequest(string message, bool fromClient, string from, string to = null); 4: 5: void SendReply(string message, bool fromClient, string replyTo); 6: 7: BusMessage Receive(bool fromClient, string replyTo); 8: } There are only three methods for the bus interface. Let me explain one by one. The SendRequest method takes the responsible for sending the request message into the bus. The parameters description are: message: The WCF message content. fromClient: Indicates if this message was came from the client. from: The channel ID that this message was sent from. The channel ID will be generated when any kinds of channel was created, which will be explained in the following articles. to: The channel ID that this message should be received. In Request-Reply and Duplex MEP this is necessary since the reply message must be received by the channel which sent the related request message. The SendReply method takes the responsible for sending the reply message. It’s very similar as the previous one but no “from” parameter. This is because it’s no need to reply a reply message again in any MEPs. The Receive method takes the responsible for waiting for a incoming message, includes the request message and specified reply message. It returned a BusMessage object, which contains some information about the channel information. The code of the BusMessage class is 1: public class BusMessage 2: { 3: public string MessageID { get; private set; } 4: public string From { get; private set; } 5: public string ReplyTo { get; private set; } 6: public string Content { get; private set; } 7: 8: public BusMessage(string messageId, string fromChannelId, string replyToChannelId, string content) 9: { 10: MessageID = messageId; 11: From = fromChannelId; 12: ReplyTo = replyToChannelId; 13: Content = content; 14: } 15: } Now let’s implement a message bus based on the IBus interface. Since I don’t want you to buy and install the TIBCO EMS or any other message bus products, I will implement an in process memory bus. This bus is only for test and sample purpose. It can only be used if the service and client are in the same process. Very straightforward. 1: public class InProcMessageBus : IBus 2: { 3: private readonly ConcurrentDictionary<Guid, InProcMessageEntity> _queue; 4: private readonly object _lock; 5: 6: public InProcMessageBus() 7: { 8: _queue = new ConcurrentDictionary<Guid, InProcMessageEntity>(); 9: _lock = new object(); 10: } 11: 12: public string SendRequest(string message, bool fromClient, string from, string to = null) 13: { 14: var entity = new InProcMessageEntity(message, fromClient, from, to); 15: _queue.TryAdd(entity.ID, entity); 16: return entity.ID.ToString(); 17: } 18: 19: public void SendReply(string message, bool fromClient, string replyTo) 20: { 21: var entity = new InProcMessageEntity(message, fromClient, null, replyTo); 22: _queue.TryAdd(entity.ID, entity); 23: } 24: 25: public BusMessage Receive(bool fromClient, string replyTo) 26: { 27: InProcMessageEntity e = null; 28: while (true) 29: { 30: lock (_lock) 31: { 32: var entity = _queue 33: .Where(kvp => kvp.Value.FromClient == fromClient && (kvp.Value.To == replyTo || string.IsNullOrWhiteSpace(kvp.Value.To))) 34: .FirstOrDefault(); 35: if (entity.Key != Guid.Empty && entity.Value != null) 36: { 37: _queue.TryRemove(entity.Key, out e); 38: } 39: } 40: if (e == null) 41: { 42: Thread.Sleep(100); 43: } 44: else 45: { 46: return new BusMessage(e.ID.ToString(), e.From, e.To, e.Content); 47: } 48: } 49: } 50: 51: public void Dispose() 52: { 53: } 54: } The InProcMessageBus stores the messages in the objects of InProcMessageEntity, which can take some extra information beside the WCF message itself. 1: public class InProcMessageEntity 2: { 3: public Guid ID { get; set; } 4: public string Content { get; set; } 5: public bool FromClient { get; set; } 6: public string From { get; set; } 7: public string To { get; set; } 8: 9: public InProcMessageEntity() 10: : this(string.Empty, false, string.Empty, string.Empty) 11: { 12: } 13: 14: public InProcMessageEntity(string content, bool fromClient, string from, string to) 15: { 16: ID = Guid.NewGuid(); 17: Content = content; 18: FromClient = fromClient; 19: From = from; 20: To = to; 21: } 22: } Summary OK, now I have all necessary stuff ready. The next step would be implementing our WCF message bus transport extension. In this post I described two scaling-out approaches on the service side especially if we are using the cloud platform: dispatcher mode and pulling mode. And I compared the Pros and Cons of them. Then I introduced the WCF channel stack, channel mode and the transport extension part, and identified what we should do to create our own WCF transport extension, to let our WCF services using pulling mode based on a message bus. And finally I provided some classes that need to be used in the future posts that working against an in process memory message bus, for the demonstration purpose only. In the next post I will begin to implement the transport extension step by step. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article
Using the ASP.NET Cache to cache data in a Model or Business Object layer, without a dependency on System.Web in the layer - Part One.

- by Rhames

ASP.NET applications can make use of the System.Web.Caching.Cache object to cache data and prevent repeated expensive calls to a database or other store. However, ideally an application should make use of caching at the point where data is retrieved from the database, which typically is inside a Business Objects or Model layer. One of the key features of using a UI pattern such as Model-View-Presenter (MVP) or Model-View-Controller (MVC) is that the Model and Presenter (or Controller) layers are developed without any knowledge of the UI layer. Introducing a dependency on System.Web into the Model layer would break this independence of the Model from the View. This article gives a solution to this problem, using dependency injection to inject the caching implementation into the Model layer at runtime. This allows caching to be used within the Model layer, without any knowledge of the actual caching mechanism that will be used. Create a sample application to use the caching solution Create a test SQL Server database This solution uses a SQL Server database with the same Sales data used in my previous post on calculating running totals. The advantage of using this data is that it gives nice slow queries that will exaggerate the effect of using caching! To create the data, first create a new SQL database called CacheSample. Next run the following script to create the Sale table and populate it: USE CacheSample GO CREATE TABLE Sale(DayCount smallint, Sales money) CREATE CLUSTERED INDEX ndx_DayCount ON Sale(DayCount) go INSERT Sale VALUES (1,120) INSERT Sale VALUES (2,60) INSERT Sale VALUES (3,125) INSERT Sale VALUES (4,40) DECLARE @DayCount smallint, @Sales money SET @DayCount = 5 SET @Sales = 10 WHILE @DayCount < 5000 BEGIN INSERT Sale VALUES (@DayCount,@Sales) SET @DayCount = @DayCount + 1 SET @Sales = @Sales + 15 END Next create a stored procedure to calculate the running total, and return a specified number of rows from the Sale table, using the following script: USE [CacheSample] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO -- ============================================= -- Author: Robin -- Create date: -- Description: -- ============================================= CREATE PROCEDURE [dbo].[spGetRunningTotals] -- Add the parameters for the stored procedure here @HighestDayCount smallint = null AS BEGIN -- SET NOCOUNT ON added to prevent extra result sets from -- interfering with SELECT statements. SET NOCOUNT ON; IF @HighestDayCount IS NULL SELECT @HighestDayCount = MAX(DayCount) FROM dbo.Sale DECLARE @SaleTbl TABLE (DayCount smallint, Sales money, RunningTotal money) DECLARE @DayCount smallint, @Sales money, @RunningTotal money SET @RunningTotal = 0 SET @DayCount = 0 DECLARE rt_cursor CURSOR FOR SELECT DayCount, Sales FROM Sale ORDER BY DayCount OPEN rt_cursor FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales WHILE @@FETCH_STATUS = 0 AND @DayCount <= @HighestDayCount BEGIN SET @RunningTotal = @RunningTotal + @Sales INSERT @SaleTbl VALUES (@DayCount,@Sales,@RunningTotal) FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales END CLOSE rt_cursor DEALLOCATE rt_cursor SELECT DayCount, Sales, RunningTotal FROM @SaleTbl END GO Create the Sample ASP.NET application In Visual Studio create a new solution and add a class library project called CacheSample.BusinessObjects and an ASP.NET web application called CacheSample.UI. The CacheSample.BusinessObjects project will contain a single class to represent a Sale data item, with all the code to retrieve the sales from the database included in it for simplicity (normally I would at least have a separate Repository or other object that is responsible for retrieving data, and probably a data access layer as well, but for this sample I want to keep it simple). The C# code for the Sale class is shown below: using System; using System.Collections.Generic; using System.Data; using System.Data.SqlClient; namespace CacheSample.BusinessObjects { public class Sale { public Int16 DayCount { get; set; } public decimal Sales { get; set; } public decimal RunningTotal { get; set; } public static IEnumerable<Sale> GetSales(int? highestDayCount) { List<Sale> sales = new List<Sale>(); SqlParameter highestDayCountParameter = new SqlParameter("@HighestDayCount", SqlDbType.SmallInt); if (highestDayCount.HasValue) highestDayCountParameter.Value = highestDayCount; else highestDayCountParameter.Value = DBNull.Value; string connectionStr = System.Configuration.ConfigurationManager .ConnectionStrings["CacheSample"].ConnectionString; using(SqlConnection sqlConn = new SqlConnection(connectionStr)) using (SqlCommand sqlCmd = sqlConn.CreateCommand()) { sqlCmd.CommandText = "spGetRunningTotals"; sqlCmd.CommandType = CommandType.StoredProcedure; sqlCmd.Parameters.Add(highestDayCountParameter); sqlConn.Open(); using (SqlDataReader dr = sqlCmd.ExecuteReader()) { while (dr.Read()) { Sale newSale = new Sale(); newSale.DayCount = dr.GetInt16(0); newSale.Sales = dr.GetDecimal(1); newSale.RunningTotal = dr.GetDecimal(2); sales.Add(newSale); } } } return sales; } } } The static GetSale() method makes a call to the spGetRunningTotals stored procedure and then reads each row from the returned SqlDataReader into an instance of the Sale class, it then returns a List of the Sale objects, as IEnnumerable<Sale>. A reference to System.Configuration needs to be added to the CacheSample.BusinessObjects project so that the connection string can be read from the web.config file. In the CacheSample.UI ASP.NET project, create a single web page called ShowSales.aspx, and make this the default start up page. This page will contain a single button to call the GetSales() method and a label to display the results. The html mark up and the C# code behind are shown below: ShowSales.aspx <%@ Page Language="C#" AutoEventWireup="true" CodeBehind="ShowSales.aspx.cs" Inherits="CacheSample.UI.ShowSales" %> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head runat="server"> <title>Cache Sample - Show All Sales</title> </head> <body> <form id="form1" runat="server"> <div> <asp:Button ID="btnTest1" runat="server" onclick="btnTest1_Click" Text="Get All Sales" />     <asp:Label ID="lblResults" runat="server"></asp:Label> </div> </form> </body> </html> ShowSales.aspx.cs using System; using System.Collections.Generic; using System.Linq; using System.Web; using System.Web.UI; using System.Web.UI.WebControls; using CacheSample.BusinessObjects; namespace CacheSample.UI { public partial class ShowSales : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { } protected void btnTest1_Click(object sender, EventArgs e) { System.Diagnostics.Stopwatch stopWatch = new System.Diagnostics.Stopwatch(); stopWatch.Start(); var sales = Sale.GetSales(null); var lastSales = sales.Last(); stopWatch.Stop(); lblResults.Text = string.Format( "Count of Sales: {0}, Last DayCount: {1}, Total Sales: {2}. Query took {3} ms", sales.Count(), lastSales.DayCount, lastSales.RunningTotal, stopWatch.ElapsedMilliseconds); } } } Finally we need to add a connection string to the CacheSample SQL Server database, called CacheSample, to the web.config file: <?xmlversion="1.0"?> <configuration> <connectionStrings> <addname="CacheSample" connectionString="data source=.\SQLEXPRESS;Integrated Security=SSPI;Initial Catalog=CacheSample" providerName="System.Data.SqlClient" /> </connectionStrings> <system.web> <compilationdebug="true"targetFramework="4.0" /> </system.web> </configuration> Run the application and click the button a few times to see how long each call to the database takes. On my system, each query takes about 450ms. Next I shall look at a solution to use the ASP.NET caching to cache the data returned by the query, so that subsequent requests to the GetSales() method are much faster. Adding Data Caching Support I am going to create my caching support in a separate project called CacheSample.Caching, so the next step is to add a class library to the solution. We shall be using the application configuration to define the implementation of our caching system, so we need a reference to System.Configuration adding to the project. ICacheProvider<T> Interface The first step in adding caching to our application is to define an interface, called ICacheProvider, in the CacheSample.Caching project, with methods to retrieve any data from the cache or to retrieve the data from the data source if it is not present in the cache. Dependency Injection will then be used to inject an implementation of this interface at runtime, allowing the users of the interface (i.e. the CacheSample.BusinessObjects project) to be completely unaware of how the caching is actually implemented. As data of any type maybe retrieved from the data source, it makes sense to use generics in the interface, with a generic type parameter defining the data type associated with a particular instance of the cache interface implementation. The C# code for the ICacheProvider interface is shown below: using System; using System.Collections.Generic; namespace CacheSample.Caching { public interface ICacheProvider { } public interface ICacheProvider<T> : ICacheProvider { T Fetch(string key, Func<T> retrieveData, DateTime? absoluteExpiry, TimeSpan? relativeExpiry); IEnumerable<T> Fetch(string key, Func<IEnumerable<T>> retrieveData, DateTime? absoluteExpiry, TimeSpan? relativeExpiry); } } The empty non-generic interface will be used as a type in a Dictionary generic collection later to store instances of the ICacheProvider<T> implementation for reuse, I prefer to use a base interface when doing this, as I think the alternative of using object makes for less clear code. The ICacheProvider<T> interface defines two overloaded Fetch methods, the difference between these is that one will return a single instance of the type T and the other will return an IEnumerable<T>, providing support for easy caching of collections of data items. Both methods will take a key parameter, which will uniquely identify the cached data, a delegate of type Func<T> or Func<IEnumerable<T>> which will provide the code to retrieve the data from the store if it is not present in the cache, and absolute or relative expiry policies to define when a cached item should expire. Note that at present there is no support for cache dependencies, but I shall be showing a method of adding this in part two of this article. CacheProviderFactory Class We need a mechanism of creating instances of our ICacheProvider<T> interface, using Dependency Injection to get the implementation of the interface. To do this we shall create a CacheProviderFactory static class in the CacheSample.Caching project. This factory will provide a generic static method called GetCacheProvider<T>(), which shall return instances of ICacheProvider<T>. We can then call this factory method with the relevant data type (for example the Sale class in the CacheSample.BusinessObject project) to get a instance of ICacheProvider for that type (e.g. call CacheProviderFactory.GetCacheProvider<Sale>() to get the ICacheProvider<Sale> implementation). The C# code for the CacheProviderFactory is shown below: using System; using System.Collections.Generic; using CacheSample.Caching.Configuration; namespace CacheSample.Caching { public static class CacheProviderFactory { private static Dictionary<Type, ICacheProvider> cacheProviders = new Dictionary<Type, ICacheProvider>(); private static object syncRoot = new object(); ///<summary> /// Factory method to create or retrieve an implementation of the /// ICacheProvider interface for type <typeparamref name="T"/>. ///</summary> ///<typeparam name="T"> /// The type that this cache provider instance will work with ///</typeparam> ///<returns>An instance of the implementation of ICacheProvider for type ///<typeparamref name="T"/>, as specified by the application /// configuration</returns> public static ICacheProvider<T> GetCacheProvider<T>() { ICacheProvider<T> cacheProvider = null; // Get the Type reference for the type parameter T Type typeOfT = typeof(T); // Lock the access to the cacheProviders dictionary // so multiple threads can work with it lock (syncRoot) { // First check if an instance of the ICacheProvider implementation // already exists in the cacheProviders dictionary for the type T if (cacheProviders.ContainsKey(typeOfT)) cacheProvider = (ICacheProvider<T>)cacheProviders[typeOfT]; else { // There is not already an instance of the ICacheProvider in // cacheProviders for the type T // so we need to create one // Get the Type reference for the application's implementation of // ICacheProvider from the configuration Type cacheProviderType = Type.GetType(CacheProviderConfigurationSection.Current. CacheProviderType); if (cacheProviderType != null) { // Now get a Type reference for the Cache Provider with the // type T generic parameter Type typeOfCacheProviderTypeForT = cacheProviderType.MakeGenericType(new Type[] { typeOfT }); if (typeOfCacheProviderTypeForT != null) { // Create the instance of the Cache Provider and add it to // the cacheProviders dictionary for future use cacheProvider = (ICacheProvider<T>)Activator. CreateInstance(typeOfCacheProviderTypeForT); cacheProviders.Add(typeOfT, cacheProvider); } } } } return cacheProvider; } } } As this code uses Activator.CreateInstance() to create instances of the ICacheProvider<T> implementation, which is a slow process, the factory class maintains a Dictionary of the previously created instances so that a cache provider needs to be created only once for each type. The type of the implementation of ICacheProvider<T> is read from a custom configuration section in the application configuration file, via the CacheProviderConfigurationSection class, which is described below. CacheProviderConfigurationSection Class The implementation of ICacheProvider<T> will be specified in a custom configuration section in the application’s configuration. To handle this create a folder in the CacheSample.Caching project called Configuration, and add a class called CacheProviderConfigurationSection to this folder. This class will extend the System.Configuration.ConfigurationSection class, and will contain a single string property called CacheProviderType. The C# code for this class is shown below: using System; using System.Configuration; namespace CacheSample.Caching.Configuration { internal class CacheProviderConfigurationSection : ConfigurationSection { public static CacheProviderConfigurationSection Current { get { return (CacheProviderConfigurationSection) ConfigurationManager.GetSection("cacheProvider"); } } [ConfigurationProperty("type", IsRequired=true)] public string CacheProviderType { get { return (string)this["type"]; } } } } Adding Data Caching to the Sales Class We now have enough code in place to add caching to the GetSales() method in the CacheSample.BusinessObjects.Sale class, even though we do not yet have an implementation of the ICacheProvider<T> interface. We need to add a reference to the CacheSample.Caching project to CacheSample.BusinessObjects so that we can use the ICacheProvider<T> interface within the GetSales() method. Once the reference is added, we can first create a unique string key based on the method name and the parameter value, so that the same cache key is used for repeated calls to the method with the same parameter values. Then we get an instance of the cache provider for the Sales type, using the CacheProviderFactory, and pass the existing code to retrieve the data from the database as the retrievalMethod delegate in a call to the Cache Provider Fetch() method. The C# code for the modified GetSales() method is shown below: public static IEnumerable<Sale> GetSales(int? highestDayCount) { string cacheKey = string.Format("CacheSample.BusinessObjects.GetSalesWithCache({0})", highestDayCount); return CacheSample.Caching.CacheProviderFactory. GetCacheProvider<Sale>().Fetch(cacheKey, delegate() { List<Sale> sales = new List<Sale>(); SqlParameter highestDayCountParameter = new SqlParameter("@HighestDayCount", SqlDbType.SmallInt); if (highestDayCount.HasValue) highestDayCountParameter.Value = highestDayCount; else highestDayCountParameter.Value = DBNull.Value; string connectionStr = System.Configuration.ConfigurationManager. ConnectionStrings["CacheSample"].ConnectionString; using (SqlConnection sqlConn = new SqlConnection(connectionStr)) using (SqlCommand sqlCmd = sqlConn.CreateCommand()) { sqlCmd.CommandText = "spGetRunningTotals"; sqlCmd.CommandType = CommandType.StoredProcedure; sqlCmd.Parameters.Add(highestDayCountParameter); sqlConn.Open(); using (SqlDataReader dr = sqlCmd.ExecuteReader()) { while (dr.Read()) { Sale newSale = new Sale(); newSale.DayCount = dr.GetInt16(0); newSale.Sales = dr.GetDecimal(1); newSale.RunningTotal = dr.GetDecimal(2); sales.Add(newSale); } } } return sales; }, null, new TimeSpan(0, 10, 0)); } This example passes the code to retrieve the Sales data from the database to the Cache Provider as an anonymous method, however it could also be written as a lambda. The main advantage of using an anonymous function (method or lambda) is that the code inside the anonymous function can access the parameters passed to the GetSales() method. Finally the absolute expiry is set to null, and the relative expiry set to 10 minutes, to indicate that the cache entry should be removed 10 minutes after the last request for the data. As the ICacheProvider<T> has a Fetch() method that returns IEnumerable<T>, we can simply return the results of the Fetch() method to the caller of the GetSales() method. This should be all that is needed for the GetSales() method to now retrieve data from a cache after the first time the data has be retrieved from the database. Implementing a ASP.NET Cache Provider The final step is to actually implement the ICacheProvider<T> interface, and add the implementation details to the web.config file for the dependency injection. The cache provider implementation needs to have access to System.Web. Therefore it could be placed in the CacheSample.UI project, or in its own project that has a reference to System.Web. Implementing the Cache Provider in a separate project is my favoured approach. Create a new project inside the solution called CacheSample.CacheProvider, and add references to System.Web and CacheSample.Caching to this project. Add a class to the project called AspNetCacheProvider. Make the class a generic class by adding the generic parameter <T> and indicate that the class implements ICacheProvider<T>. The C# code for the AspNetCacheProvider class is shown below: using System; using System.Collections.Generic; using System.Linq; using System.Web; using System.Web.Caching; using CacheSample.Caching; namespace CacheSample.CacheProvider { public class AspNetCacheProvider<T> : ICacheProvider<T> { #region ICacheProvider<T> Members public T Fetch(string key, Func<T> retrieveData, DateTime? absoluteExpiry, TimeSpan? relativeExpiry) { return FetchAndCache<T>(key, retrieveData, absoluteExpiry, relativeExpiry); } public IEnumerable<T> Fetch(string key, Func<IEnumerable<T>> retrieveData, DateTime? absoluteExpiry, TimeSpan? relativeExpiry) { return FetchAndCache<IEnumerable<T>>(key, retrieveData, absoluteExpiry, relativeExpiry); } #endregion #region Helper Methods private U FetchAndCache<U>(string key, Func<U> retrieveData, DateTime? absoluteExpiry, TimeSpan? relativeExpiry) { U value; if (!TryGetValue<U>(key, out value)) { value = retrieveData(); if (!absoluteExpiry.HasValue) absoluteExpiry = Cache.NoAbsoluteExpiration; if (!relativeExpiry.HasValue) relativeExpiry = Cache.NoSlidingExpiration; HttpContext.Current.Cache.Insert(key, value, null, absoluteExpiry.Value, relativeExpiry.Value); } return value; } private bool TryGetValue<U>(string key, out U value) { object cachedValue = HttpContext.Current.Cache.Get(key); if (cachedValue == null) { value = default(U); return false; } else { try { value = (U)cachedValue; return true; } catch { value = default(U); return false; } } } #endregion } } The two interface Fetch() methods call a private method called FetchAndCache(). This method first checks for a element in the HttpContext.Current.Cache with the specified cache key, and if so tries to cast this to the specified type (either T or IEnumerable<T>). If the cached element is found, the FetchAndCache() method simply returns it. If it is not found in the cache, the method calls the retrievalMethod delegate to get the data from the data source, and then adds this to the HttpContext.Current.Cache. The final step is to add the AspNetCacheProvider class to the relevant custom configuration section in the CacheSample.UI.Web.Config file. To do this there needs to be a <configSections> element added as the first element in <configuration>. This will match a custom section called <cacheProvider> with the CacheProviderConfigurationSection. Then we add a <cacheProvider> element, with a type property set to the fully qualified assembly name of the AspNetCacheProvider class, as shown below: <?xmlversion="1.0"?> <configuration> <configSections> <sectionname="cacheProvider" type="CacheSample.Base.Configuration.CacheProviderConfigurationSection, CacheSample.Base" /> </configSections> <connectionStrings> <addname="CacheSample" connectionString="data source=.\SQLEXPRESS;Integrated Security=SSPI;Initial Catalog=CacheSample" providerName="System.Data.SqlClient" /> </connectionStrings> <cacheProvidertype="CacheSample.CacheProvider.AspNetCacheProvider`1, CacheSample.CacheProvider, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null"> </cacheProvider> <system.web> <compilationdebug="true"targetFramework="4.0" /> </system.web> </configuration> One point to note is that the fully qualified assembly name of the AspNetCacheProvider class includes the notation `1 after the class name, which indicates that it is a generic class with a single generic type parameter. The CacheSample.UI project needs to have references added to CacheSample.Caching and CacheSample.CacheProvider so that the actual application is aware of the relevant cache provider implementation. Conclusion After implementing this solution, you should have a working cache provider mechanism, that will allow the middle and data access layers to implement caching support when retrieving data, without any knowledge of the actually caching implementation. If the UI is not ASP.NET based, if for example it is Winforms or WPF, the implementation of ICacheProvider<T> would be written around whatever technology is available. It could even be a standalone caching system that takes full responsibility for adding and removing items from a global store. The next part of this article will show how this caching mechanism may be extended to provide support for cache dependencies, such as the System.Web.Caching.SqlCacheDependency. Another possible extension would be to cache the cache provider implementations instead of storing them in a static Dictionary in the CacheProviderFactory. This would prevent a build up of seldom used cache providers in the application memory, as they could be removed from the cache if not used often enough, although in reality there are probably unlikely to be vast numbers of cache provider implementation instances, as most applications do not have a massive number of business object or model types.

Read the article
webserver horrible slow, sometimes incredible fast

- by dhanke

i am running a small community ( 6000+ Members ) on a non-virtual 64-bit ubuntu 11.04 system. I am not a Linux-pro, not even advanced, i just tried to setup a webserver, which does nothing special actually. Delivering some dynamic PHP and RoR websites is its task. So it might be that my configuration files do look horrible bad. Also, i might use the wrong vocabulary, so in doubt, please ask. Having a current all-time record of 520 registered users (board-accounts, no system-users) online at same time, average server-load is about 2.0 - 5.0. Meantime (~250 users) average server load value is at about 0.4 - 0.8, sometimes, on some expensive searches a bit higher. everything fine. From time to time however, the load increases up to 120 (120.0, not 12.0 ;) ). In this time, its hard to even connect via SSH, but when i reach the server, and use top/htop/iotop to see whats happening, i cannot identify any process causing high CPU load. iotop tells me about a current reading/writing speed of about approx. 70kb/s, which is quite equal to power-off i think. Memory-Usage is max. at ~ 12GB of 16GB, so swap remains empty. now the odd (at least for me:) waiting some minutes ( since i always get a bit into a panic when this happens, it feels like 5 minutes, but i suppose its more like 20-30 minutes) and the server is back to normal. everything continues as normal. another odd fact: when i run hdparm -tT /dev/sda, i get answer like: /dev/sda: Timing cached reads: 7180 MB in 2.00 seconds = 3591.13 MB/sec Timing buffered disk reads: 348 MB in 3.02 seconds = 115.41 MB/sec when i run the same command while the server is "frozen", the answer is like /dev/sda: <- takes about 5 minutes until this line appears Timing cached reads: 7180 MB in 2.00 seconds = 3591.13 MB/sec <- 5 more minutes Timing buffered disk reads: 348 MB in 3.02 seconds = 115.41 MB/sec <- another 5 minutes so the values are the same, but the quoted time is completely wrong. using time command as prefix also tells me that ~ 15 minutes were used. I searched in dmesg, /var/log/[messages|syslog] - nothing found. /var/log/errors however tells me that: Jul 4 20:28:30 localhost kernel: [19080.671415] INFO: task php5-fpm:27728 blocked for more than 120 seconds. Jul 4 20:28:30 localhost kernel: [19080.671419] "echo 0 /proc/sys/kernel/hung_task_timeout_secs" disables this message. multiple times. now that message does tell me that php5-fpm task was blocked or did block ? - but not if that is the cause or just one of the results of that "freeze". Anyone? to cut the long story short, i dont know where even to start analyzing. So if you can give me any advice by looking at following specs and configs, or ask me to provide more information, i`d be glad. Specs: 6 Core AMD Phenom(tm) II X6 1055T Processor * 16 Gigabyte Ram 2x 1.5 TB Seagate ST1500DL003-9VT16L via SATA 3 via SoftwareRaid (i suppose) Services: (due to service --status-all, those with [ + ]) nginx Webserver 1.0.14 mySQL 5.1.63 Server Ruby on Rails 2.3.11 ( passenger-nginx-module ) php5-fpm 5.3.6-13ubuntu3.7 SSH ido2db Further services: default crontab + nightly backup. syslog-ng Website consists of 2 subdomains, forum. and www. where forum is a phpBB3.x PHP-Board, and www a Ruby on Rails 2.3.11 application (portal). Mini-Note: sometimes i notice that the forum is pretty slow, in contrast to the always-fast (except for this "freeze") portal. Both share the same Database, but the portal is using it read-only. The Webserver is nginx, using phusion passenger module to communicate with the ruby-application. Also, for the forum it communicates with php5-fpm via socket: relevant nginx configuration parts ( with comments/questions starting by ; ) ; in case of freeze due to too high Filesystem activity, maybe adding a limit? #worker_rlimit_nofile 50000; user www-data; ; 6 cores, so i read 6 fits. maybe already wrong? worker_processes 6; pid /var/run/nginx.pid; events { worker_connections 1024; } http { passenger_root /var/lib/gems/1.8/gems/passenger-3.0.11; passenger_ruby /usr/bin/ruby1.8; ; the forum once featured a chat, which was working w/o websockets. ; so it was a hell of pull requests (deactivated now, freeze still happening) keepalive_timeout 65; keepalive_requests 50; gzip on; server { listen 80; server_name www.domain.tld; root /var/www/domain/rails/public; passenger_enabled on; } server { listen 80; server_name forum.domain.tld; location / { root /var/www/domain/forum; index index.php; } ; satic stuff to be handled by nginx location ~* ^/style/.+.(jpg|jpeg|gif|css|png|js|ico|xml)$ { access_log off; expires 30d; root /var/www/domain/forum/; } ; now the php magic, note the "backend"-fcgi_pass location ~ .php$ { fastcgi_split_path_info ^(.+\.php)(.*)$; fastcgi_pass backend; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME /var/www/domain/forum$fastcgi_script_name; include fastcgi_params; fastcgi_param QUERY_STRING $query_string; fastcgi_param REQUEST_METHOD $request_method; fastcgi_param CONTENT_TYPE $content_type; fastcgi_param CONTENT_LENGTH $content_length; fastcgi_intercept_errors on; fastcgi_ignore_client_abort off; fastcgi_connect_timeout 60; fastcgi_send_timeout 180; fastcgi_read_timeout 180; fastcgi_buffer_size 128k; fastcgi_buffers 256 16k; fastcgi_busy_buffers_size 256k; fastcgi_temp_file_write_size 256k; fastcgi_max_temp_file_size 0; } location ~ /\.ht { deny all; } } ;the php5-fpm socket. i read that /dev/shm/ whould be the fastes place for this. bad idea in general? upstream backend { server unix:/dev/shm/phpfpm; } ... } php5-fpm settings (i changed this values due to php5-fpm error log messages higher and higher.. (freeze-problem was there before as well)* listen = /dev/shm/phpfpm user = www-data group = www-data pm = dynamic ; holy, 4000! well, shinking this value to earth-level gave me ; 100s of 502 bad gateway commands. this values were quite stable. ; since there are only max 520 users online i dont get it, why i would need ; as many children as configured here. due to keep-alive maybe? ; asking questions is easier for me since restarting server will make ; my community-members angry ;) pm.max_children = 4000 pm.start_servers = 100 pm.min_spare_servers = 50 pm.max_spare_servers = 150 pm.max_requests = 10 pm.status_path = /status ping.path = /ping ping.response = pong slowlog = log/$pool.log.slow ;should i use rlimit? ;rlimit_files = 1024 chdir = / mysql/my.cnf [client] port = 3306 socket = /var/run/mysqld/mysqld.sock [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] user = mysql socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp skip-external-locking bind-address = 127.0.0.1 key_buffer = 16M max_allowed_packet = 16M thread_stack = 192K thread_cache_size = 8 myisam-recover = BACKUP ; high number, but less gives some phpBB errors. max_connections = 450 table_cache = 512 ; i read twice the cpu cores, bad? thread_concurrency = 12 join_buffer_size = 2084K concurrent_insert = 3 query_cache_limit = 64M query_cache_size = 512M query_cache_type = 1 log_error = /var/log/mysql/error.log log_slow_queries = /var/log/mysql/mysql-slow.log long_query_time = 2 expire_logs_days = 10 max_binlog_size = 100M low_priority_updates=1 [mysqldump] quick quote-names max_allowed_packet = 16M [isamchk] key_buffer = 16M !includedir /etc/mysql/conf.d/ I used smartctl already, hdds seem to be fine. /proc/mdstatus quotes: Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md3 : active raid1 sda3[1] 1459264192 blocks [2/1] [_U] md1 : active raid1 sda1[0] 3911680 blocks [2/1] [U_] unused devices: ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 127727 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 127727 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited I quote some questions in my configuration files, these are not (intentional) directly problem-related, but would be nice for me to know wether they are indeed questionable or done right. One additional Fact: my MYSQL-database is at 12GB size. i dont know if that does matter, but mytop sometimes shows me 4-5 seconds long insert queries, some are 20-30 seconds long. Its just a feeling that i am unable to prove (because i dont know how), but when i disable the database, the freeze seems not to happen. Example: i created a dummy rails application to see the development log. the app made some sql-queries, reads and inserts. the log quite often was like: DbTest Load (0.3ms) SELECT * FROM `db_test` WHERE (`db_test`.`id` = 31722) LIMIT 1 SQL (0.1ms) BEGIN DbTest Update (0.3ms) UPDATE `db_test` SET `updated_at` = '2012-07-04 23:32:34' WHERE `id` = 31722 - now the log stands still for 5-60 seconds. SQL (49.1ms) COMMIT - SQL-Update time in the log does not include freeze time Rendering test/index Completed in 96ms (View: 16, DB: 59) | 200 OK [http://localhost:9000/test] Bad part is: this mini-freeze here only happens from time to time as well. note: meanwhile i cannot even upload files via scp. I currently feel like running form bad to worse and back by googling for my server-problem due to immense lack of knowledge regarding server configurations. It still makes me wonder, why those problems even appear, since 250 users a time is not such a high amount, right? So my questions: whats wrong and how to fix? ;) or: what information can i provide to make the situation more clear? can you point at some critical bad configuration-line which i should consider to catch up in the documentation? are there any tools i can run to see some possible bottlenecks? any further advice? (next to: "pay someone who knows what he does" - its a private project, server costs enough already. :)) Thanks for your time and help. Best Regards, Daniel P.S.: i renamed the configfiles to domain.tld since i dont want to have any % more load to the server until its fixed. might be a exaggeratedly thought.. P.P.S: if i asked a complete duplicate question, sorry. my search results seemed to be quite specific in their own way.

Read the article
Apache + Codeigniter + New Server + Unexpected Errors

- by ngl5000

Alright here is the situation: I use to have my codeigniter site at bluehost were I did not have root access, I have since moved that site to rackspace. I have not changed any of the PHP code yet there has been some unexpected behavior. Unexpected Behavior: http://mysite.com/robots.txt Both old and new resolve to the robots file http://mysite.com/robots.txt/ The old bluehost setup resolves to my codeigniter 404 error page. The rackspace config resolves to: Not Found The requested URL /robots.txt/ was not found on this server. **This instance leads me to believe that there could be a problem with my mod rewrites or lack there of. The first one produces the error correctly through php while it seems the second senario lets the server handle this error. The next instance of this problem is even more troubling: 'http://mysite.com/search/term/9 x 1-1%2F2 white/' New site results in: Bad Request Your browser sent a request that this server could not understand. Old site results in: The actual page being loaded and the search term being unencoded. I have to assume that this has something to do with the fact that when I went to the new server I went from root level htaccess file to httpd.conf file and virtual server default and default-ssl. Here they are: Default file: <VirtualHost *:80> ServerAdmin webmaster@localhost ServerName mysite.com DocumentRoot /var/www <Directory /> Options +FollowSymLinks AllowOverride None </Directory> <Directory /var/www> Options -Indexes +FollowSymLinks -MultiViews AllowOverride None Order allow,deny allow from all RewriteEngine On RewriteBase / # force no www. (also does the IP thing) RewriteCond %{HTTPS} !=on RewriteCond %{HTTP_HOST} !^mysite\.com [NC] RewriteRule ^(.*)$ http://mysite.com/$1 [R=301,L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule ^(.+)\.(\d+)\.(js|css|png|jpg|gif)$ $1.$3 [L] # index.php remove any index.php parts RewriteCond %{THE_REQUEST} /index\.(php|html) RewriteRule (.*)index\.(php|html)(.*)$ /$1$3 [r=301,L] # codeigniter direct RewriteCond $0 !^(index\.php|assets|robots\.txt|sitemap\.xml|favicon\.ico) RewriteRule ^.*$ index.php [L] </Directory> ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/ <Directory "/usr/lib/cgi-bin"> AllowOverride None Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch Order allow,deny Allow from all </Directory> ErrorLog ${APACHE_LOG_DIR}/error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn CustomLog ${APACHE_LOG_DIR}/access.log combined Alias /doc/ "/usr/share/doc/" <Directory "/usr/share/doc/"> Options Indexes MultiViews FollowSymLinks AllowOverride None Order deny,allow Deny from all Allow from 127.0.0.0/255.0.0.0 ::1/128 </Directory> </VirtualHost> Default-ssl File <IfModule mod_ssl.c> <VirtualHost _default_:443> ServerAdmin webmaster@localhost ServerName mysite.com DocumentRoot /var/www <Directory /> Options +FollowSymLinks AllowOverride None </Directory> <Directory /var/www> Options -Indexes +FollowSymLinks -MultiViews AllowOverride None Order allow,deny allow from all RewriteEngine On RewriteBase / RewriteCond %{SERVER_PORT} !^443 RewriteRule ^ https://mysite.com%{REQUEST_URI} [R=301,L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule ^(.+)\.(\d+)\.(js|css|png|jpg|gif)$ $1.$3 [L] # index.php remove any index.php parts RewriteCond %{THE_REQUEST} /index\.(php|html) RewriteRule (.*)index\.(php|html)(.*)$ /$1$3 [r=301,L] # codeigniter direct RewriteCond $0 !^(index\.php|assets|robots\.txt|sitemap\.xml|favicon\.ico) RewriteRule ^.*$ index.php [L] </Directory> ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/ <Directory "/usr/lib/cgi-bin"> AllowOverride None Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch Order allow,deny Allow from all </Directory> ErrorLog ${APACHE_LOG_DIR}/error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn CustomLog ${APACHE_LOG_DIR}/ssl_access.log combined Alias /doc/ "/usr/share/doc/" <Directory "/usr/share/doc/"> Options Indexes MultiViews FollowSymLinks AllowOverride None Order deny,allow Deny from all Allow from 127.0.0.0/255.0.0.0 ::1/128 </Directory> # SSL Engine Switch: # Enable/Disable SSL for this virtual host. SSLEngine on # Use our self-signed certificate by default SSLCertificateFile /etc/apache2/ssl/certs/www.mysite.com.crt SSLCertificateKeyFile /etc/apache2/ssl/private/www.mysite.com.key # A self-signed (snakeoil) certificate can be created by installing # the ssl-cert package. See # /usr/share/doc/apache2.2-common/README.Debian.gz for more info. # If both key and certificate are stored in the same file, only the # SSLCertificateFile directive is needed. # SSLCertificateFile /etc/ssl/certs/ssl-cert-snakeoil.pem # SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key # Server Certificate Chain: # Point SSLCertificateChainFile at a file containing the # concatenation of PEM encoded CA certificates which form the # certificate chain for the server certificate. Alternatively # the referenced file can be the same as SSLCertificateFile # when the CA certificates are directly appended to the server # certificate for convinience. #SSLCertificateChainFile /etc/apache2/ssl.crt/server-ca.crt # Certificate Authority (CA): # Set the CA certificate verification path where to find CA # certificates for client authentication or alternatively one # huge file containing all of them (file must be PEM encoded) # Note: Inside SSLCACertificatePath you need hash symlinks # to point to the certificate files. Use the provided # Makefile to update the hash symlinks after changes. #SSLCACertificatePath /etc/ssl/certs/ #SSLCACertificateFile /etc/apache2/ssl.crt/ca-bundle.crt # Certificate Revocation Lists (CRL): # Set the CA revocation path where to find CA CRLs for client # authentication or alternatively one huge file containing all # of them (file must be PEM encoded) # Note: Inside SSLCARevocationPath you need hash symlinks # to point to the certificate files. Use the provided # Makefile to update the hash symlinks after changes. #SSLCARevocationPath /etc/apache2/ssl.crl/ #SSLCARevocationFile /etc/apache2/ssl.crl/ca-bundle.crl # Client Authentication (Type): # Client certificate verification type and depth. Types are # none, optional, require and optional_no_ca. Depth is a # number which specifies how deeply to verify the certificate # issuer chain before deciding the certificate is not valid. #SSLVerifyClient require #SSLVerifyDepth 10 # Access Control: # With SSLRequire you can do per-directory access control based # on arbitrary complex boolean expressions containing server # variable checks and other lookup directives. The syntax is a # mixture between C and Perl. See the mod_ssl documentation # for more details. #<Location /> #SSLRequire ( %{SSL_CIPHER} !~ m/^(EXP|NULL)/ \ # and %{SSL_CLIENT_S_DN_O} eq "Snake Oil, Ltd." \ # and %{SSL_CLIENT_S_DN_OU} in {"Staff", "CA", "Dev"} \ # and %{TIME_WDAY} >= 1 and %{TIME_WDAY} <= 5 \ # and %{TIME_HOUR} >= 8 and %{TIME_HOUR} <= 20 ) \ # or %{REMOTE_ADDR} =~ m/^192\.76\.162\.[0-9]+$/ #</Location> # SSL Engine Options: # Set various options for the SSL engine. # o FakeBasicAuth: # Translate the client X.509 into a Basic Authorisation. This means that # the standard Auth/DBMAuth methods can be used for access control. The # user name is the `one line' version of the client's X.509 certificate. # Note that no password is obtained from the user. Every entry in the user # file needs this password: `xxj31ZMTZzkVA'. # o ExportCertData: # This exports two additional environment variables: SSL_CLIENT_CERT and # SSL_SERVER_CERT. These contain the PEM-encoded certificates of the # server (always existing) and the client (only existing when client # authentication is used). This can be used to import the certificates # into CGI scripts. # o StdEnvVars: # This exports the standard SSL/TLS related `SSL_*' environment variables. # Per default this exportation is switched off for performance reasons, # because the extraction step is an expensive operation and is usually # useless for serving static content. So one usually enables the # exportation for CGI and SSI requests only. # o StrictRequire: # This denies access when "SSLRequireSSL" or "SSLRequire" applied even # under a "Satisfy any" situation, i.e. when it applies access is denied # and no other module can change it. # o OptRenegotiate: # This enables optimized SSL connection renegotiation handling when SSL # directives are used in per-directory context. #SSLOptions +FakeBasicAuth +ExportCertData +StrictRequire <FilesMatch "\.(cgi|shtml|phtml|php)$"> SSLOptions +StdEnvVars </FilesMatch> <Directory /usr/lib/cgi-bin> SSLOptions +StdEnvVars </Directory> # SSL Protocol Adjustments: # The safe and default but still SSL/TLS standard compliant shutdown # approach is that mod_ssl sends the close notify alert but doesn't wait for # the close notify alert from client. When you need a different shutdown # approach you can use one of the following variables: # o ssl-unclean-shutdown: # This forces an unclean shutdown when the connection is closed, i.e. no # SSL close notify alert is send or allowed to received. This violates # the SSL/TLS standard but is needed for some brain-dead browsers. Use # this when you receive I/O errors because of the standard approach where # mod_ssl sends the close notify alert. # o ssl-accurate-shutdown: # This forces an accurate shutdown when the connection is closed, i.e. a # SSL close notify alert is send and mod_ssl waits for the close notify # alert of the client. This is 100% SSL/TLS standard compliant, but in # practice often causes hanging connections with brain-dead browsers. Use # this only for browsers where you know that their SSL implementation # works correctly. # Notice: Most problems of broken clients are also related to the HTTP # keep-alive facility, so you usually additionally want to disable # keep-alive for those clients, too. Use variable "nokeepalive" for this. # Similarly, one has to force some clients to use HTTP/1.0 to workaround # their broken HTTP/1.1 implementation. Use variables "downgrade-1.0" and # "force-response-1.0" for this. BrowserMatch "MSIE [2-6]" \ nokeepalive ssl-unclean-shutdown \ downgrade-1.0 force-response-1.0 # MSIE 7 and newer should be able to use keepalive BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown httpd.conf File Just a lot of stuff from html5 boiler plate, I will post it if need be Old htaccess file <IfModule mod_rewrite.c> # index.php remove any index.php parts RewriteCond %{THE_REQUEST} /index\.(php|html) RewriteRule (.*)index\.(php|html)(.*)$ /$1$3 [r=301,L] RewriteCond $1 !^(index\.php|assets|robots\.txt|sitemap\.xml|favicon\.ico) RewriteRule ^(.*)/$ /$1 [r=301,L] # codeigniter direct RewriteCond $1 !^(index\.php|assets|robots\.txt|sitemap\.xml|favicon\.ico) RewriteRule ^(.*)$ /index.php/$1 [L] </IfModule> Any Help would be hugely appreciated!!

Read the article
PC powers off at random times

- by Timo Huovinen

Short Version After experiencing some problems with Mobo batteries my PC started to power off at random times, the power off is instant and sudden and does not restart afterwards, need help figuring out the cause. Facts: Powers off when PC is playing games Powers off when PC is idle Powers off when PC is in safe mode Powers off when PC is in BIOS Powers off when PC is booted through a Windows installation USB Replaced the motherboard battery several times Replaced the 650W PSU with a 750W PSU Replaced the RAM Swapped the RAM between slots Re-applied thermal paste to the CPU Checked if the motherboard touches the case Nothing is overclocked PC Specs PC specs: OS: Windows 7 Ultimate SP1 RAM: klingston 1333MHz 4GB stick CPU: AMD Phenom II x4 955 Mobo: Gigabyte 88GMA-UD2H rev 2.2 Motherboard battery: CR2032 3v HDD: 500GB Seagate ST3500418AS ATA Device Graphics: ATI/AMD Radeon HD 6870 Very Long version Around 10 months ago I built a brand new gaming PC. Around 6 months ago it's time setting in windows started resetting to the year 2010. I swapped the Motherboard battery for a new one of the exact same size and shape and voltage, and the problems disappeared...for around 2 weeks. Then the same problem happened again, time gets reset, I swapped the battery again, and the problem was gone for good and everything was great for about 3 months.. then another problem started happening, the PC started to power off suddenly and without warning at completely random times, sometimes the PC works for and hour, sometimes 5 minutes. So I read on the forums that it might be either the PSU or the motherboard Battery or RAM or HDD or the Graphics card or the CPU or the motherboard or the drivers or a Virus or Grounding issues, or something short circuiting, basically it can be anything... I spent some days researching, and decided to remove the possibility of a virus. I reset the CMOS, cleared all BIOS settings and reinstalled windows 7 after a full format of the HDD, but the random power off kept happening. I then disabled the restart on error option in windows and looked at the event log for error events, but they did not help me figure out the problem. Network list service depends on network location awareness the dependency group failed to start Source Kernel Power Event 41 Task Category 63 Source Disk Event ID 11 Task Category None The driver detected a controller error on device disk I took apart the PC, every little piece, re-applied some expensive thermal paste to the CPU, and double checked that none of the pieces are touching the PC case. The problem was gone, the PC no longer powered off randomly I re-attached the graphics card and all was good for 4 months... then the power off problem appeared again, but was happening at high intervals, the PC would shutdown once in 2 days on average, at random points in time, sometimes when it's idle all day long, sometimes when it's running CRYSIS 2. I checked the CPU temperature, because I know that AMD CPU's have a built in protection mechanism that switches off the PC if the CPU gets too hot, and the Temp was 50C system temp, and 45C CPU after running the PC all day long (I did not do tests to see if there are any temperature spikes, don't know how to do them) Originally the PSU that powered the PC was 650Watts and had one 4 pin cable to power the CPU, I replaced it with a new 750Watts PSU which has two 4 pin cables for the CPU, but the problem remained. I removed the graphics card and let the motherboard use the built in one, but the PC kept suddenly powering off at random times. I took apart the PC completely again, and re-applied thermal paste to the CPU, added lots of insulation, and checked for any type of short-circuit possibility again and again, but the problem remained. The problem was like that for some months. I replaced the Battery a couple of times over the time, changed lots of options in windows, and tried everything I could, but it kept powering off, so I stopped using the PC as much as I used to, just living with the random power offs from time to time, until a couple of days ago, when the power off happens almost immediately after powering on the PC. I replaced the RAM with a brand new one, but that did not help. Took apart the PC again, checked for anything anywhere that might cause it, found some small scratches on the very edge of the motherboard to the left of the PCI express x16 slot. This might cause the problem, I thought, but the scratch looks very superficial, not deep at all, and if the scratch did harm the motherboard, wouldn't it cause it to not start at all? And why did it start to power off a while ago, and then suddenly stop powering off? The scratches could not have vanished??? did chkdsk \d but it powered off when it was at 75% I removed the hard disks, the graphics card, while I fiddled with the BIOS settings, and suddenly the PC shut down while I was looking at the BIOS version. This makes me realize, it is not caused by: HDD, Windows, Drivers or the Graphics card I cleared the CMOS again, updated the BIOS from F5 to F6f beta, but that did not help, it might even seem that the PC powers off even sooner. The shutdown even happened to me while I booted through a windows 7 installation USB and was in the repair console. I removed one of the cables powering the CPU, now only one 4pin cable powers it, and it worked for 30mins after doing that, which makes me think that it's the CPU overheating, and because it gets less power, it overheats slower? The things that I am still considering: CPU overheating (does not seem to overheat, maybe false readings?) Motherboard short circuiting (faulty motherboard?) I desperately need some advice in what is faulty, is it a faulty Motherboard or an overheating CPU? or maybe something else? I have been breaking my head over this problem over a span of 6 months. I'm not sure if this is a good place to ask this question, if it is not, then tell me where I can get some experienced help. More info I have also discovered a mysterious piece that seems to have fallen out of the motherboard i119.photobucket.com/albums/o126/yurikolovsky/strangepiece.jpg What is it? Looks like each time that it powers off the datetime gets reset I also found another forum post tomshardware.co.uk/forum/… except I don't have Integrated PeripheralsUSB Keyboard Function option in BIOS :S Comments summary (asked by Random moderator) Q. tell me, if the computer restarts, is it immediately? Does it take a second and then restarts? Do you see (BSOD) or hear (PSU, short circuit) any suspicious when it happens? After reading trough it, it remains the mainboard that is faulty. – JohannesM A. Immediate power off, all the fans stop instantly, all the light turn off instantly, no sound or anything, and it remains off until I turn it back on. Thanks for the feedback, faulty motherboard is what I fear. Q. Try stress-testing the system with Prime95 and see if errors or shutdowns occur when the CPU is under full load. – speakr A. Prime95 heat stress test peaked CPU heat at 60C after 5mins, it powered off after 30mins of testing in the middle of the test with no errors, Prime95 Heat test or the stress-testing with low RAM usage (small or in-place FFTs) do not report errors while testing for 10-60 mins. The power off does not seem like it is affected by Prime95 at all Makes me wonder if it's a CPU or Motherboard issue at all. Q. I had similar random/intermittent problems with my old board. It gave one of a few different symptoms: keyboard and/or mouse would die and/or the RAM wouldn't work and/or it would shut down. It was in bad shape. One problems was that my old PSU had literally burned the connector on it (browned around the pins), another was that a broken lead inside the layers of the PCB would work sometimes if it happened to be hot or if I bent the board—by jamming a hunk of wood behind it. I managed to keep the board alive for several years, but eventually nothing I did would make it work correctly anymore. – Synetech A. I will try that as the last resort, ok? ;) Q. Have you tried a different power cord, surge protector, outlet (on a different circuit). It's worth a shot just to ensure it's not subpar wiring or a week circuit (dips in power may cause shutdown if the PSU can't pull enough juice from the wall). – Kyle A. yes, I attached the PC to an entirely different outlet on a different circuit and the problem persists. After connecting it to a different outlet after starting the PC it gave me 3 long beeps and 1 short one, then the PC immediately proceeded to boot up normally. Q. Re-check your mainboard manual and all PSU connections to your mainboard to be sure that nothing is missing (e.g. 12V ATX 4-pin/6-pin connector). If you can provoke shutdowns with Prime95, then consider buying new hardware -- a stable system should run Prime95 for 24h without any errors. Prime95 mentions errors in the log when they occur and gives a summary after the stress test was stopped manually (e.g. "0 errors, 0 warnings", if all is fine) – speakr A. Re-checked, there are no more PSU connectors that I can physically connect, except the one ATX 4-pin (there are 2 that power the CPU) that I disconnected on purpose, I have reconnected it but the problem persists. Q. With one PC I had a short curcuit. The power button on the front plate had its cables soldered, but not isolated, and the contacts were very close to the metal case. A heavier touch was enough to cause a shutdown. The PC's vibration could be enough – ott-- A. yes, it seems to switch off with even the lightest touch, I switched on the PC, then pulled out the front panel power cable that connects to the motherboard so the power button does not work anymore, after 5 mins of working like that, with the power button completely disconnected, just sitting idle, the PC powered off again, I don't think it's the power button. Q. I wonder if you dare to operate components without the case, that is remove motherboard, power, disk ( just put the motherboard on a wooden desk). Don't bend the adapters when running like that. – ott-- A. yes, I do dare to do that, but only tomorrow, too tired/late right now.

Read the article
Red Gate Coder interviews: Alex Davies

- by Michael Williamson

Alex Davies has been a software engineer at Red Gate since graduating from university, and is currently busy working on .NET Demon. We talked about tackling parallel programming with his actors framework, a scientific approach to debugging, and how JavaScript is going to affect the programming languages we use in years to come. So, if we start at the start, how did you get started in programming? When I was seven or eight, I was given a BBC Micro for Christmas. I had asked for a Game Boy, but my dad thought it would be better to give me a proper computer. For a year or so, I only played games on it, but then I found the user guide for writing programs in it. I gradually started doing more stuff on it and found it fun. I liked creating. As I went into senior school I continued to write stuff on there, trying to write games that weren’t very good. I got a real computer when I was fourteen and found ways to write BASIC on it. Visual Basic to start with, and then something more interesting than that. How did you learn to program? Was there someone helping you out? Absolutely not! I learnt out of a book, or by experimenting. I remember the first time I found a loop, I was like “Oh my God! I don’t have to write out the same line over and over and over again any more. It’s amazing!” When did you think this might be something that you actually wanted to do as a career? For a long time, I thought it wasn’t something that you would do as a career, because it was too much fun to be a career. I thought I’d do chemistry at university and some kind of career based on chemical engineering. And then I went to a careers fair at school when I was seventeen or eighteen, and it just didn’t interest me whatsoever. I thought “I could be a programmer, and there’s loads of money there, and I’m good at it, and it’s fun”, but also that I shouldn’t spoil my hobby. Now I don’t really program in my spare time any more, which is a bit of a shame, but I program all the rest of the time, so I can live with it. Do you think you learnt much about programming at university? Yes, definitely! I went into university knowing how to make computers do anything I wanted them to do. However, I didn’t have the language to talk about algorithms, so the algorithms course in my first year was massively important. Learning other language paradigms like functional programming was really good for breadth of understanding. Functional programming influences normal programming through design rather than actually using it all the time. I draw inspiration from it to write imperative programs which I think is actually becoming really fashionable now, but I’ve been doing it for ages. I did it first! There were also some courses on really odd programming languages, a bit of Prolog, a little bit of C. Having a little bit of each of those is something that I would have never done on my own, so it was important. And then there are knowledge-based courses which are about not programming itself but things that have been programmed like TCP. Those are really important for examples for how to approach things. Did you do any internships while you were at university? Yeah, I spent both of my summers at the same company. I thought I could code well before I went there. Looking back at the crap that I produced, it was only surpassed in its crappiness by all of the other code already in that company. I’m so much better at writing nice code now than I used to be back then. Was there just not a culture of looking after your code? There was, they just didn’t hire people for their abilities in that area. They hired people for raw IQ. The first indicator of it going wrong was that they didn’t have any computer scientists, which is a bit odd in a programming company. But even beyond that they didn’t have people who learnt architecture from anyone else. Most of them had started straight out of university, so never really had experience or mentors to learn from. There wasn’t the experience to draw from to teach each other. In the second half of my second internship, I was being given tasks like looking at new technologies and teaching people stuff. Interns shouldn’t be teaching people how to do their jobs! All interns are going to have little nuggets of things that you don’t know about, but they shouldn’t consistently be the ones who know the most. It’s not a good environment to learn. I was going to ask how you found working with people who were more experienced than you… When I reached Red Gate, I found some people who were more experienced programmers than me, and that was difficult. I’ve been coding since I was tiny. At university there were people who were cleverer than me, but there weren’t very many who were more experienced programmers than me. During my internship, I didn’t find anyone who I classed as being a noticeably more experienced programmer than me. So, it was a shock to the system to have valid criticisms rather than just formatting criticisms. However, Red Gate’s not so big on the actual code review, at least it wasn’t when I started. We did an entire product release and then somebody looked over all of the UI of that product which I’d written and say what they didn’t like. By that point, it was way too late and I’d disagree with them. Do you think the lack of code reviews was a bad thing? I think if there’s going to be any oversight of new people, then it should be continuous rather than chunky. For me I don’t mind too much, I could go out and get oversight if I wanted it, and in those situations I felt comfortable without it. If I was managing the new person, then maybe I’d be keener on oversight and then the right way to do it is continuously and in very, very small chunks. Have you had any significant projects you’ve worked on outside of a job? When I was a teenager I wrote all sorts of stuff. I used to write games, I derived how to do isomorphic projections myself once. I didn’t know what the word was so I couldn’t Google for it, so I worked it out myself. It was horrifically complicated. But it sort of tailed off when I started at university, and is now basically zero. If I do side-projects now, they tend to be work-related side projects like my actors framework, NAct, which I started in a down tools week. Could you explain a little more about NAct? It is a little C# framework for writing parallel code more easily. Parallel programming is difficult when you need to write to shared data. Sometimes parallel programming is easy because you don’t need to write to shared data. When you do need to access shared data, you could just have your threads pile in and do their work, but then you would screw up the data because the threads would trample on each other’s toes. You could lock, but locks are really dangerous if you’re using more than one of them. You get interactions like deadlocks, and that’s just nasty. Actors instead allows you to say this piece of data belongs to this thread of execution, and nobody else can read it. If you want to read it, then ask that thread of execution for a piece of it by sending a message, and it will send the data back by a message. And that avoids deadlocks as long as you follow some obvious rules about not making your actors sit around waiting for other actors to do something. There are lots of ways to write actors, NAct allows you to do it as if it was method calls on other objects, which means you get all the strong type-safety that C# programmers like. Do you think that this is suitable for the majority of parallel programming, or do you think it’s only suitable for specific cases? It’s suitable for most difficult parallel programming. If you’ve just got a hundred web requests which are all independent of each other, then I wouldn’t bother because it’s easier to just spin them up in separate threads and they can proceed independently of each other. But where you’ve got difficult parallel programming, where you’ve got multiple threads accessing multiple bits of data in multiple ways at different times, then actors is at least as good as all other ways, and is, I reckon, easier to think about. When you’re using actors, you presumably still have to write your code in a different way from you would otherwise using single-threaded code. You can’t use actors with any methods that have return types, because you’re not allowed to call into another actor and wait for it. If you want to get a piece of data out of another actor, then you’ve got to use tasks so that you can use “async” and “await” to await asynchronously for it. But other than that, you can still stick things in classes so it’s not too different really. Rather than having thousands of objects with mutable state, you can use component-orientated design, where there are only a few mutable classes which each have a small number of instances. Then there can be thousands of immutable objects. If you tend to do that anyway, then actors isn’t much of a jump. If I’ve already built my system without any parallelism, how hard is it to add actors to exploit all eight cores on my desktop? Usually pretty easy. If you can identify even one boundary where things look like messages and you have components where some objects live on one side and these other objects live on the other side, then you can have a granddaddy object on one side be an actor and it will parallelise as it goes across that boundary. Not too difficult. If we do get 1000-core desktop PCs, do you think actors will scale up? It’s hard. There are always in the order of twenty to fifty actors in my whole program because I tend to write each component as actors, and I tend to have one instance of each component. So this won’t scale to a thousand cores. What you can do is write data structures out of actors. I use dictionaries all over the place, and if you need a dictionary that is going to be accessed concurrently, then you could build one of those out of actors in no time. You can use queuing to marshal requests between different slices of the dictionary which are living on different threads. So it’s like a distributed hash table but all of the chunks of it are on the same machine. That means that each of these thousand processors has cached one small piece of the dictionary. I reckon it wouldn’t be too big a leap to start doing proper parallelism. Do you think it helps if actors get baked into the language, similarly to Erlang? Erlang is excellent in that it has thread-local garbage collection. C# doesn’t, so there’s a limit to how well C# actors can possibly scale because there’s a single garbage collected heap shared between all of them. When you do a global garbage collection, you’ve got to stop all of the actors, which is seriously expensive, whereas in Erlang garbage collections happen per-actor, so they’re insanely cheap. However, Erlang deviated from all the sensible language design that people have used recently and has just come up with crazy stuff. You can definitely retrofit thread-local garbage collection to .NET, and then it’s quite well-suited to support actors, even if it’s not baked into the language. Speaking of language design, do you have a favourite programming language? I’ll choose a language which I’ve never written before. I like the idea of Scala. It sounds like C#, only with some of the niggles gone. I enjoy writing static types. It means you don’t have to writing tests so much. When you say it doesn’t have some of the niggles? C# doesn’t allow the use of a property as a method group. It doesn’t have Scala case classes, or sum types, where you can do a switch statement and the compiler checks that you’ve checked all the cases, which is really useful in functional-style programming. Pattern-matching, in other words. That’s actually the major niggle. C# is pretty good, and I’m quite happy with C#. And what about going even further with the type system to remove the need for tests to something like Haskell? Or is that a step too far? I’m quite a pragmatist, I don’t think I could deal with trying to write big systems in languages with too few other users, especially when learning how to structure things. I just don’t know anyone who can teach me, and the Internet won’t teach me. That’s the main reason I wouldn’t use it. If I turned up at a company that writes big systems in Haskell, I would have no objection to that, but I wouldn’t instigate it. What about things in C#? For instance, there’s contracts in C#, so you can try to statically verify a bit more about your code. Do you think that’s useful, or just not worthwhile? I’ve not really tried it. My hunch is that it needs to be built into the language and be quite mathematical for it to work in real life, and that doesn’t seem to have ended up true for C# contracts. I don’t think anyone who’s tried them thinks they’re any good. I might be wrong. On a slightly different note, how do you like to debug code? I think I’m quite an odd debugger. I use guesswork extremely rarely, especially if something seems quite difficult to debug. I’ve been bitten spending hours and hours on guesswork and not being scientific about debugging in the past, so now I’m scientific to a fault. What I want is to see the bug happening in the debugger, to step through the bug happening. To watch the program going from a valid state to an invalid state. When there’s a bug and I can’t work out why it’s happening, I try to find some piece of evidence which places the bug in one section of the code. From that experiment, I binary chop on the possible causes of the bug. I suppose that means binary chopping on places in the code, or binary chopping on a stage through a processing cycle. Basically, I’m very stupid about how I debug. I won’t make any guesses, I won’t use any intuition, I will only identify the experiment that’s going to binary chop most effectively and repeat rather than trying to guess anything. I suppose it’s quite top-down. Is most of the time then spent in the debugger? Absolutely, if at all possible I will never debug using print statements or logs. I don’t really hold much stock in outputting logs. If there’s any bug which can be reproduced locally, I’d rather do it in the debugger than outputting logs. And with SmartAssembly error reporting, there’s not a lot that can’t be either observed in an error report and just fixed, or reproduced locally. And in those other situations, maybe I’ll use logs. But I hate using logs. You stare at the log, trying to guess what’s going on, and that’s exactly what I don’t like doing. You have to just look at it and see does this look right or wrong. We’ve covered how you get to grip with bugs. How do you get to grips with an entire codebase? I watch it in the debugger. I find little bugs and then try to fix them, and mostly do it by watching them in the debugger and gradually getting an understanding of how the code works using my process of binary chopping. I have to do a lot of reading and watching code to choose where my slicing-in-half experiment is going to be. The last time I did it was SmartAssembly. The old code was a complete mess, but at least it did things top to bottom. There wasn’t too much of some of the big abstractions where flow of control goes all over the place, into a base class and back again. Code’s really hard to understand when that happens. So I like to choose a little bug and try to fix it, and choose a bigger bug and try to fix it. Definitely learn by doing. I want to always have an aim so that I get a little achievement after every few hours of debugging. Once I’ve learnt the codebase I might be able to fix all the bugs in an hour, but I’d rather be using them as an aim while I’m learning the codebase. If I was a maintainer of a codebase, what should I do to make it as easy as possible for you to understand? Keep distinct concepts in different places. And name your stuff so that it’s obvious which concepts live there. You shouldn’t have some variable that gets set miles up the top of somewhere, and then is read miles down to choose some later behaviour. I’m talking from a very much SmartAssembly point of view because the old SmartAssembly codebase had tons and tons of these things, where it would read some property of the code and then deal with it later. Just thousands of variables in scope. Loads of things to think about. If you can keep concepts separate, then it aids me in my process of fixing bugs one at a time, because each bug is going to more or less be understandable in the one place where it is. And what about tests? Do you think they help at all? I’ve never had the opportunity to learn a codebase which has had tests, I don’t know what it’s like! What about when you’re actually developing? How useful do you find tests in finding bugs or regressions? Finding regressions, absolutely. Running bits of code that would be quite hard to run otherwise, definitely. It doesn’t happen very often that a test finds a bug in the first place. I don’t really buy nebulous promises like tests being a good way to think about the spec of the code. My thinking goes something like “This code works at the moment, great, ship it! Ah, there’s a way that this code doesn’t work. Okay, write a test, demonstrate that it doesn’t work, fix it, use the test to demonstrate that it’s now fixed, and keep the test for future regressions.” The most valuable tests are for bugs that have actually happened at some point, because bugs that have actually happened at some point, despite the fact that you think you’ve fixed them, are way more likely to appear again than new bugs are. Does that mean that when you write your code the first time, there are no tests? Often. The chance of there being a bug in a new feature is relatively unaffected by whether I’ve written a test for that new feature because I’m not good enough at writing tests to think of bugs that I would have written into the code. So not writing regression tests for all of your code hasn’t affected you too badly? There are different kinds of features. Some of them just always work, and are just not flaky, they just continue working whatever you throw at them. Maybe because the type-checker is particularly effective around them. Writing tests for those features which just tend to always work is a waste of time. And because it’s a waste of time I’ll tend to wait until a feature has demonstrated its flakiness by having bugs in it before I start trying to test it. You can get a feel for whether it’s going to be flaky code as you’re writing it. I try to write it to make it not flaky, but there are some things that are just inherently flaky. And very occasionally, I’ll think “this is going to be flaky” as I’m writing, and then maybe do a test, but not most of the time. How do you think your programming style has changed over time? I’ve got clearer about what the right way of doing things is. I used to flip-flop a lot between different ideas. Five years ago I came up with some really good ideas and some really terrible ideas. All of them seemed great when I thought of them, but they were quite diverse ideas, whereas now I have a smaller set of reliable ideas that are actually good for structuring code. So my code is probably more similar to itself than it used to be back in the day, when I was trying stuff out. I’ve got more disciplined about encapsulation, I think. There are operational things like I use actors more now than I used to, and that forces me to use immutability more than I used to. The first code that I wrote in Red Gate was the memory profiler UI, and that was an actor, I just didn’t know the name of it at the time. I don’t really use object-orientation. By object-orientation, I mean having n objects of the same type which are mutable. I want a constant number of objects that are mutable, and they should be different types. I stick stuff in dictionaries and then have one thing that owns the dictionary and puts stuff in and out of it. That’s definitely a pattern that I’ve seen recently. I think maybe I’m doing functional programming. Possibly. It’s plausible. If you had to summarise the essence of programming in a pithy sentence, how would you do it? Programming is the form of art that, without losing any of the beauty of architecture or fine art, allows you to produce things that people love and you make money from. So you think it’s an art rather than a science? It’s a little bit of engineering, a smidgeon of maths, but it’s not science. Like architecture, programming is on that boundary between art and engineering. If you want to do it really nicely, it’s mostly art. You can get away with doing architecture and programming entirely by having a good engineering mind, but you’re not going to produce anything nice. You’re not going to have joy doing it if you’re an engineering mind. Architects who are just engineering minds are not going to enjoy their job. I suppose engineering is the foundation on which you build the art. Exactly. How do you think programming is going to change over the next ten years? There will be an unfortunate shift towards dynamically-typed languages, because of JavaScript. JavaScript has an unfair advantage. JavaScript’s unfair advantage will cause more people to be exposed to dynamically-typed languages, which means other dynamically-typed languages crop up and the best features go into dynamically-typed languages. Then people conflate the good features with the fact that it’s dynamically-typed, and more investment goes into dynamically-typed languages. They end up better, so people use them. What about the idea of compiling other languages, possibly statically-typed, to JavaScript? It’s a reasonable idea. I would like to do it, but I don’t think enough people in the world are going to do it to make it pick up. The hordes of beginners are the lifeblood of a language community. They are what makes there be good tools and what makes there be vibrant community websites. And any particular thing which is the same as JavaScript only with extra stuff added to it, although it might be technically great, is not going to have the hordes of beginners. JavaScript is always to be quickest and easiest way for a beginner to start programming in the browser. And dynamically-typed languages are great for beginners. Compilers are pretty scary and beginners don’t write big code. And having your errors come up in the same place, whether they’re statically checkable errors or not, is quite nice for a beginner. If someone asked me to teach them some programming, I’d teach them JavaScript. If dynamically-typed languages are great for beginners, when do you think the benefits of static typing start to kick in? The value of having a statically typed program is in the tools that rely on the static types to produce a smooth IDE experience rather than actually telling me my compile errors. And only once you’re experienced enough a programmer that having a really smooth IDE experience makes a blind bit of difference, does static typing make a blind bit of difference. So it’s not really about size of codebase. If I go and write up a tiny program, I’m still going to get value out of writing it in C# using ReSharper because I’m experienced with C# and ReSharper enough to be able to write code five times faster if I have that help. Any other visions of the future? Nobody’s going to use actors. Because everyone’s going to be running on single-core VMs connected over network-ready protocols like JSON over HTTP. So, parallelism within one operating system is going to die. But until then, you should use actors. More Red Gater Coder interviews

Read the article
A way of doing real-world test-driven development (and some thoughts about it)

- by Thomas Weller

Lately, I exchanged some arguments with Derick Bailey about some details of the red-green-refactor cycle of the Test-driven development process. In short, the issue revolved around the fact that it’s not enough to have a test red or green, but it’s also important to have it red or green for the right reasons. While for me, it’s sufficient to initially have a NotImplementedException in place, Derick argues that this is not totally correct (see these two posts: Red/Green/Refactor, For The Right Reasons and Red For The Right Reason: Fail By Assertion, Not By Anything Else). And he’s right. But on the other hand, I had no idea how his insights could have any practical consequence for my own individual interpretation of the red-green-refactor cycle (which is not really red-green-refactor, at least not in its pure sense, see the rest of this article). This made me think deeply for some days now. In the end I found out that the ‘right reason’ changes in my understanding depending on what development phase I’m in. To make this clear (at least I hope it becomes clear…) I started to describe my way of working in some detail, and then something strange happened: The scope of the article slightly shifted from focusing ‘only’ on the ‘right reason’ issue to something more general, which you might describe as something like 'Doing real-world TDD in .NET , with massive use of third-party add-ins’. This is because I feel that there is a more general statement about Test-driven development to make: It’s high time to speak about the ‘How’ of TDD, not always only the ‘Why’. Much has been said about this, and me myself also contributed to that (see here: TDD is not about testing, it's about how we develop software). But always justifying what you do is very unsatisfying in the long run, it is inherently defensive, and it costs time and effort that could be used for better and more important things. And frankly: I’m somewhat sick and tired of repeating time and again that the test-driven way of software development is highly preferable for many reasons - I don’t want to spent my time exclusively on stating the obvious… So, again, let’s say it clearly: TDD is programming, and programming is TDD. Other ways of programming (code-first, sometimes called cowboy-coding) are exceptional and need justification. – I know that there are many people out there who will disagree with this radical statement, and I also know that it’s not a description of the real world but more of a mission statement or something. But nevertheless I’m absolutely sure that in some years this statement will be nothing but a platitude. Side note: Some parts of this post read as if I were paid by Jetbrains (the manufacturer of the ReSharper add-in – R#), but I swear I’m not. Rather I think that Visual Studio is just not production-complete without it, and I wouldn’t even consider to do professional work without having this add-in installed... The three parts of a software component Before I go into some details, I first should describe my understanding of what belongs to a software component (assembly, type, or method) during the production process (i.e. the coding phase). Roughly, I come up with the three parts shown below: First, we need to have some initial sort of requirement. This can be a multi-page formal document, a vague idea in some programmer’s brain of what might be needed, or anything in between. In either way, there has to be some sort of requirement, be it explicit or not. – At the C# micro-level, the best way that I found to formulate that is to define interfaces for just about everything, even for internal classes, and to provide them with exhaustive xml comments. The next step then is to re-formulate these requirements in an executable form. This is specific to the respective programming language. - For C#/.NET, the Gallio framework (which includes MbUnit) in conjunction with the ReSharper add-in for Visual Studio is my toolset of choice. The third part then finally is the production code itself. It’s development is entirely driven by the requirements and their executable formulation. This is the delivery, the two other parts are ‘only’ there to make its production possible, to give it a decent quality and reliability, and to significantly reduce related costs down the maintenance timeline. So while the first two parts are not really relevant for the customer, they are very important for the developer. The customer (or in Scrum terms: the Product Owner) is not interested at all in how the product is developed, he is only interested in the fact that it is developed as cost-effective as possible, and that it meets his functional and non-functional requirements. The rest is solely a matter of the developer’s craftsmanship, and this is what I want to talk about during the remainder of this article… An example To demonstrate my way of doing real-world TDD, I decided to show the development of a (very) simple Calculator component. The example is deliberately trivial and silly, as examples always are. I am totally aware of the fact that real life is never that simple, but I only want to show some development principles here… The requirement As already said above, I start with writing down some words on the initial requirement, and I normally use interfaces for that, even for internal classes - the typical question “intf or not” doesn’t even come to mind. I need them for my usual workflow and using them automatically produces high componentized and testable code anyway. To think about their usage in every single situation would slow down the production process unnecessarily. So this is what I begin with: namespace Calculator { /// <summary> /// Defines a very simple calculator component for demo purposes. /// </summary> public interface ICalculator { /// <summary> /// Gets the result of the last successful operation. /// </summary> /// <value>The last result.</value> /// <remarks> /// Will be <see langword="null" /> before the first successful operation. /// </remarks> double? LastResult { get; } } // interface ICalculator } // namespace Calculator So, I’m not beginning with a test, but with a sort of code declaration - and still I insist on being 100% test-driven. There are three important things here: Starting this way gives me a method signature, which allows to use IntelliSense and AutoCompletion and thus eliminates the danger of typos - one of the most regular, annoying, time-consuming, and therefore expensive sources of error in the development process. In my understanding, the interface definition as a whole is more of a readable requirement document and technical documentation than anything else. So this is at least as much about documentation than about coding. The documentation must completely describe the behavior of the documented element. I normally use an IoC container or some sort of self-written provider-like model in my architecture. In either case, I need my components defined via service interfaces anyway. - I will use the LinFu IoC framework here, for no other reason as that is is very simple to use. The ‘Red’ (pt. 1) First I create a folder for the project’s third-party libraries and put the LinFu.Core dll there. Then I set up a test project (via a Gallio project template), and add references to the Calculator project and the LinFu dll. Finally I’m ready to write the first test, which will look like the following: namespace Calculator.Test { [TestFixture] public class CalculatorTest { private readonly ServiceContainer container = new ServiceContainer(); [Test] public void CalculatorLastResultIsInitiallyNull() { ICalculator calculator = container.GetService<ICalculator>(); Assert.IsNull(calculator.LastResult); } } // class CalculatorTest } // namespace Calculator.Test This is basically the executable formulation of what the interface definition states (part of). Side note: There’s one principle of TDD that is just plain wrong in my eyes: I’m talking about the Red is 'does not compile' thing. How could a compiler error ever be interpreted as a valid test outcome? I never understood that, it just makes no sense to me. (Or, in Derick’s terms: this reason is as wrong as a reason ever could be…) A compiler error tells me: Your code is incorrect, but nothing more. Instead, the ‘Red’ part of the red-green-refactor cycle has a clearly defined meaning to me: It means that the test works as intended and fails only if its assumptions are not met for some reason. Back to our Calculator. When I execute the above test with R#, the Gallio plugin will give me this output: So this tells me that the test is red for the wrong reason: There’s no implementation that the IoC-container could load, of course. So let’s fix that. With R#, this is very easy: First, create an ICalculator - derived type: Next, implement the interface members: And finally, move the new class to its own file: So far my ‘work’ was six mouse clicks long, the only thing that’s left to do manually here, is to add the Ioc-specific wiring-declaration and also to make the respective class non-public, which I regularly do to force my components to communicate exclusively via interfaces: This is what my Calculator class looks like as of now: using System; using LinFu.IoC.Configuration; namespace Calculator { [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { public double? LastResult { get { throw new NotImplementedException(); } } } } Back to the test fixture, we have to put our IoC container to work: [TestFixture] public class CalculatorTest { #region Fields private readonly ServiceContainer container = new ServiceContainer(); #endregion // Fields #region Setup/TearDown [FixtureSetUp] public void FixtureSetUp() { container.LoadFrom(AppDomain.CurrentDomain.BaseDirectory, "Calculator.dll"); } ... Because I have a R# live template defined for the setup/teardown method skeleton as well, the only manual coding here again is the IoC-specific stuff: two lines, not more… The ‘Red’ (pt. 2) Now, the execution of the above test gives the following result: This time, the test outcome tells me that the method under test is called. And this is the point, where Derick and I seem to have somewhat different views on the subject: Of course, the test still is worthless regarding the red/green outcome (or: it’s still red for the wrong reasons, in that it gives a false negative). But as far as I am concerned, I’m not really interested in the test outcome at this point of the red-green-refactor cycle. Rather, I only want to assert that my test actually calls the right method. If that’s the case, I will happily go on to the ‘Green’ part… The ‘Green’ Making the test green is quite trivial. Just make LastResult an automatic property: [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { public double? LastResult { get; private set; } } One more round… Now on to something slightly more demanding (cough…). Let’s state that our Calculator exposes an Add() method: ... /// <summary> /// Adds the specified operands. /// </summary> /// <param name="operand1">The operand1.</param> /// <param name="operand2">The operand2.</param> /// <returns>The result of the additon.</returns> /// <exception cref="ArgumentException"> /// Argument <paramref name="operand1"/> is < 0.<br/> /// -- or --<br/> /// Argument <paramref name="operand2"/> is < 0. /// </exception> double Add(double operand1, double operand2); } // interface ICalculator A remark: I sometimes hear the complaint that xml comment stuff like the above is hard to read. That’s certainly true, but irrelevant to me, because I read xml code comments with the CR_Documentor tool window. And using that, it looks like this: Apart from that, I’m heavily using xml code comments (see e.g. here for a detailed guide) because there is the possibility of automating help generation with nightly CI builds (using MS Sandcastle and the Sandcastle Help File Builder), and then publishing the results to some intranet location. This way, a team always has first class, up-to-date technical documentation at hand about the current codebase. (And, also very important for speeding up things and avoiding typos: You have IntelliSense/AutoCompletion and R# support, and the comments are subject to compiler checking…). Back to our Calculator again: Two more R# – clicks implement the Add() skeleton: ... public double Add(double operand1, double operand2) { throw new NotImplementedException(); } } // class Calculator As we have stated in the interface definition (which actually serves as our requirement document!), the operands are not allowed to be negative. So let’s start implementing that. Here’s the test: [Test] [Row(-0.5, 2)] public void AddThrowsOnNegativeOperands(double operand1, double operand2) { ICalculator calculator = container.GetService<ICalculator>(); Assert.Throws<ArgumentException>(() => calculator.Add(operand1, operand2)); } As you can see, I’m using a data-driven unit test method here, mainly for these two reasons: Because I know that I will have to do the same test for the second operand in a few seconds, I save myself from implementing another test method for this purpose. Rather, I only will have to add another Row attribute to the existing one. From the test report below, you can see that the argument values are explicitly printed out. This can be a valuable documentation feature even when everything is green: One can quickly review what values were tested exactly - the complete Gallio HTML-report (as it will be produced by the Continuous Integration runs) shows these values in a quite clear format (see below for an example). Back to our Calculator development again, this is what the test result tells us at the moment: So we’re red again, because there is not yet an implementation… Next we go on and implement the necessary parameter verification to become green again, and then we do the same thing for the second operand. To make a long story short, here’s the test and the method implementation at the end of the second cycle: // in CalculatorTest: [Test] [Row(-0.5, 2)] [Row(295, -123)] public void AddThrowsOnNegativeOperands(double operand1, double operand2) { ICalculator calculator = container.GetService<ICalculator>(); Assert.Throws<ArgumentException>(() => calculator.Add(operand1, operand2)); } // in Calculator: public double Add(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } throw new NotImplementedException(); } So far, we have sheltered our method from unwanted input, and now we can safely operate on the parameters without further caring about their validity (this is my interpretation of the Fail Fast principle, which is regarded here in more detail). Now we can think about the method’s successful outcomes. First let’s write another test for that: [Test] [Row(1, 1, 2)] public void TestAdd(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Add(operand1, operand2); Assert.AreEqual(expectedResult, result); } Again, I’m regularly using row based test methods for these kinds of unit tests. The above shown pattern proved to be extremely helpful for my development work, I call it the Defined-Input/Expected-Output test idiom: You define your input arguments together with the expected method result. There are two major benefits from that way of testing: In the course of refining a method, it’s very likely to come up with additional test cases. In our case, we might add tests for some edge cases like ‘one of the operands is zero’ or ‘the sum of the two operands causes an overflow’, or maybe there’s an external test protocol that has to be fulfilled (e.g. an ISO norm for medical software), and this results in the need of testing against additional values. In all these scenarios we only have to add another Row attribute to the test. Remember that the argument values are written to the test report, so as a side-effect this produces valuable documentation. (This can become especially important if the fulfillment of some sort of external requirements has to be proven). So your test method might look something like that in the end: [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 2)] [Row(0, 999999999, 999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, double.MaxValue)] [Row(4, double.MaxValue - 2.5, double.MaxValue)] public void TestAdd(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Add(operand1, operand2); Assert.AreEqual(expectedResult, result); } And this will produce the following HTML report (with Gallio): Not bad for the amount of work we invested in it, huh? - There might be scenarios where reports like that can be useful for demonstration purposes during a Scrum sprint review… The last requirement to fulfill is that the LastResult property is expected to store the result of the last operation. I don’t show this here, it’s trivial enough and brings nothing new… And finally: Refactor (for the right reasons) To demonstrate my way of going through the refactoring portion of the red-green-refactor cycle, I added another method to our Calculator component, namely Subtract(). Here’s the code (tests and production): // CalculatorTest.cs: [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 0)] [Row(0, 999999999, -999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, -double.MaxValue)] [Row(4, double.MaxValue - 2.5, -double.MaxValue)] public void TestSubtract(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Subtract(operand1, operand2); Assert.AreEqual(expectedResult, result); } [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 0)] [Row(0, 999999999, -999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, -double.MaxValue)] [Row(4, double.MaxValue - 2.5, -double.MaxValue)] public void TestSubtractGivesExpectedLastResult(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); calculator.Subtract(operand1, operand2); Assert.AreEqual(expectedResult, calculator.LastResult); } ... // ICalculator.cs: /// <summary> /// Subtracts the specified operands. /// </summary> /// <param name="operand1">The operand1.</param> /// <param name="operand2">The operand2.</param> /// <returns>The result of the subtraction.</returns> /// <exception cref="ArgumentException"> /// Argument <paramref name="operand1"/> is < 0.<br/> /// -- or --<br/> /// Argument <paramref name="operand2"/> is < 0. /// </exception> double Subtract(double operand1, double operand2); ... // Calculator.cs: public double Subtract(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } return (this.LastResult = operand1 - operand2).Value; } Obviously, the argument validation stuff that was produced during the red-green part of our cycle duplicates the code from the previous Add() method. So, to avoid code duplication and minimize the number of code lines of the production code, we do an Extract Method refactoring. One more time, this is only a matter of a few mouse clicks (and giving the new method a name) with R#: Having done that, our production code finally looks like that: using System; using LinFu.IoC.Configuration; namespace Calculator { [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { #region ICalculator public double? LastResult { get; private set; } public double Add(double operand1, double operand2) { ThrowIfOneOperandIsInvalid(operand1, operand2); return (this.LastResult = operand1 + operand2).Value; } public double Subtract(double operand1, double operand2) { ThrowIfOneOperandIsInvalid(operand1, operand2); return (this.LastResult = operand1 - operand2).Value; } #endregion // ICalculator #region Implementation (Helper) private static void ThrowIfOneOperandIsInvalid(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } } #endregion // Implementation (Helper) } // class Calculator } // namespace Calculator But is the above worth the effort at all? It’s obviously trivial and not very impressive. All our tests were green (for the right reasons), and refactoring the code did not change anything. It’s not immediately clear how this refactoring work adds value to the project. Derick puts it like this: STOP! Hold on a second… before you go any further and before you even think about refactoring what you just wrote to make your test pass, you need to understand something: if your done with your requirements after making the test green, you are not required to refactor the code. I know… I’m speaking heresy, here. Toss me to the wolves, I’ve gone over to the dark side! Seriously, though… if your test is passing for the right reasons, and you do not need to write any test or any more code for you class at this point, what value does refactoring add? Derick immediately answers his own question: So why should you follow the refactor portion of red/green/refactor? When you have added code that makes the system less readable, less understandable, less expressive of the domain or concern’s intentions, less architecturally sound, less DRY, etc, then you should refactor it. I couldn’t state it more precise. From my personal perspective, I’d add the following: You have to keep in mind that real-world software systems are usually quite large and there are dozens or even hundreds of occasions where micro-refactorings like the above can be applied. It’s the sum of them all that counts. And to have a good overall quality of the system (e.g. in terms of the Code Duplication Percentage metric) you have to be pedantic on the individual, seemingly trivial cases. My job regularly requires the reading and understanding of ‘foreign’ code. So code quality/readability really makes a HUGE difference for me – sometimes it can be even the difference between project success and failure… Conclusions The above described development process emerged over the years, and there were mainly two things that guided its evolution (you might call it eternal principles, personal beliefs, or anything in between): Test-driven development is the normal, natural way of writing software, code-first is exceptional. So ‘doing TDD or not’ is not a question. And good, stable code can only reliably be produced by doing TDD (yes, I know: many will strongly disagree here again, but I’ve never seen high-quality code – and high-quality code is code that stood the test of time and causes low maintenance costs – that was produced code-first…) It’s the production code that pays our bills in the end. (Though I have seen customers these days who demand an acceptance test battery as part of the final delivery. Things seem to go into the right direction…). The test code serves ‘only’ to make the production code work. But it’s the number of delivered features which solely counts at the end of the day - no matter how much test code you wrote or how good it is. With these two things in mind, I tried to optimize my coding process for coding speed – or, in business terms: productivity - without sacrificing the principles of TDD (more than I’d do either way…). As a result, I consider a ratio of about 3-5/1 for test code vs. production code as normal and desirable. In other words: roughly 60-80% of my code is test code (This might sound heavy, but that is mainly due to the fact that software development standards only begin to evolve. The entire software development profession is very young, historically seen; only at the very beginning, and there are no viable standards yet. If you think about software development as a kind of casting process, where the test code is the mold and the resulting production code is the final product, then the above ratio sounds no longer extraordinary…) Although the above might look like very much unnecessary work at first sight, it’s not. With the aid of the mentioned add-ins, doing all the above is a matter of minutes, sometimes seconds (while writing this post took hours and days…). The most important thing is to have the right tools at hand. Slow developer machines or the lack of a tool or something like that - for ‘saving’ a few 100 bucks - is just not acceptable and a very bad decision in business terms (though I quite some times have seen and heard that…). Production of high-quality products needs the usage of high-quality tools. This is a platitude that every craftsman knows… The here described round-trip will take me about five to ten minutes in my real-world development practice. I guess it’s about 30% more time compared to developing the ‘traditional’ (code-first) way. But the so manufactured ‘product’ is of much higher quality and massively reduces maintenance costs, which is by far the single biggest cost factor, as I showed in this previous post: It's the maintenance, stupid! (or: Something is rotten in developerland.). In the end, this is a highly cost-effective way of software development… But on the other hand, there clearly is a trade-off here: coding speed vs. code quality/later maintenance costs. The here described development method might be a perfect fit for the overwhelming majority of software projects, but there certainly are some scenarios where it’s not - e.g. if time-to-market is crucial for a software project. So this is a business decision in the end. It’s just that you have to know what you’re doing and what consequences this might have… Some last words First, I’d like to thank Derick Bailey again. His two aforementioned posts (which I strongly recommend for reading) inspired me to think deeply about my own personal way of doing TDD and to clarify my thoughts about it. I wouldn’t have done that without this inspiration. I really enjoy that kind of discussions… I agree with him in all respects. But I don’t know (yet?) how to bring his insights into the described production process without slowing things down. The above described method proved to be very “good enough” in my practical experience. But of course, I’m open to suggestions here… My rationale for now is: If the test is initially red during the red-green-refactor cycle, the ‘right reason’ is: it actually calls the right method, but this method is not yet operational. Later on, when the cycle is finished and the tests become part of the regular, automated Continuous Integration process, ‘red’ certainly must occur for the ‘right reason’: in this phase, ‘red’ MUST mean nothing but an unfulfilled assertion - Fail By Assertion, Not By Anything Else!

Read the article
value types in the vm

- by john.rose

value types in the vm p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} p.p2 {margin: 0.0px 0.0px 14.0px 0.0px; font: 14.0px Times} p.p3 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times} p.p4 {margin: 0.0px 0.0px 15.0px 0.0px; font: 14.0px Times} p.p5 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier} p.p6 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px} p.p7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p8 {margin: 0.0px 0.0px 0.0px 36.0px; text-indent: -36.0px; font: 14.0px Times; min-height: 18.0px} p.p9 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p10 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; color: #000000} li.li1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} li.li7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} span.s1 {font: 14.0px Courier} span.s2 {color: #000000} span.s3 {font: 14.0px Courier; color: #000000} ol.ol1 {list-style-type: decimal} Or, enduring values for a changing world. Introduction A value type is a data type which, generally speaking, is designed for being passed by value in and out of methods, and stored by value in data structures. The only value types which the Java language directly supports are the eight primitive types. Java indirectly and approximately supports value types, if they are implemented in terms of classes. For example, both Integer and String may be viewed as value types, especially if their usage is restricted to avoid operations appropriate to Object. In this note, we propose a definition of value types in terms of a design pattern for Java classes, accompanied by a set of usage restrictions. We also sketch the relation of such value types to tuple types (which are a JVM-level notion), and point out JVM optimizations that can apply to value types. This note is a thought experiment to extend the JVM’s performance model in support of value types. The demonstration has two phases. Initially the extension can simply use design patterns, within the current bytecode architecture, and in today’s Java language. But if the performance model is to be realized in practice, it will probably require new JVM bytecode features, changes to the Java language, or both. We will look at a few possibilities for these new features. An Axiom of Value In the context of the JVM, a value type is a data type equipped with construction, assignment, and equality operations, and a set of typed components, such that, whenever two variables of the value type produce equal corresponding values for their components, the values of the two variables cannot be distinguished by any JVM operation. Here are some corollaries: A value type is immutable, since otherwise a copy could be constructed and the original could be modified in one of its components, allowing the copies to be distinguished. Changing the component of a value type requires construction of a new value. The equals and hashCode operations are strictly component-wise. If a value type is represented by a JVM reference, that reference cannot be successfully synchronized on, and cannot be usefully compared for reference equality. A value type can be viewed in terms of what it doesn’t do. We can say that a value type omits all value-unsafe operations, which could violate the constraints on value types. These operations, which are ordinarily allowed for Java object types, are pointer equality comparison (the acmp instruction), synchronization (the monitor instructions), all the wait and notify methods of class Object, and non-trivial finalize methods. The clone method is also value-unsafe, although for value types it could be treated as the identity function. Finally, and most importantly, any side effect on an object (however visible) also counts as an value-unsafe operation. A value type may have methods, but such methods must not change the components of the value. It is reasonable and useful to define methods like toString, equals, and hashCode on value types, and also methods which are specifically valuable to users of the value type. Representations of Value Value types have two natural representations in the JVM, unboxed and boxed. An unboxed value consists of the components, as simple variables. For example, the complex number x=(1+2i), in rectangular coordinate form, may be represented in unboxed form by the following pair of variables: /*Complex x = Complex.valueOf(1.0, 2.0):*/ double x_re = 1.0, x_im = 2.0; These variables might be locals, parameters, or fields. Their association as components of a single value is not defined to the JVM. Here is a sample computation which computes the norm of the difference between two complex numbers: double distance(/*Complex x:*/ double x_re, double x_im, /*Complex y:*/ double y_re, double y_im) { /*Complex z = x.minus(y):*/ double z_re = x_re - y_re, z_im = x_im - y_im; /*return z.abs():*/ return Math.sqrt(z_re*z_re + z_im*z_im); } A boxed representation groups component values under a single object reference. The reference is to a ‘wrapper class’ that carries the component values in its fields. (A primitive type can naturally be equated with a trivial value type with just one component of that type. In that view, the wrapper class Integer can serve as a boxed representation of value type int.) The unboxed representation of complex numbers is practical for many uses, but it fails to cover several major use cases: return values, array elements, and generic APIs. The two components of a complex number cannot be directly returned from a Java function, since Java does not support multiple return values. The same story applies to array elements: Java has no ’array of structs’ feature. (Double-length arrays are a possible workaround for complex numbers, but not for value types with heterogeneous components.) By generic APIs I mean both those which use generic types, like Arrays.asList and those which have special case support for primitive types, like String.valueOf and PrintStream.println. Those APIs do not support unboxed values, and offer some problems to boxed values. Any ’real’ JVM type should have a story for returns, arrays, and API interoperability. The basic problem here is that value types fall between primitive types and object types. Value types are clearly more complex than primitive types, and object types are slightly too complicated. Objects are a little bit dangerous to use as value carriers, since object references can be compared for pointer equality, and can be synchronized on. Also, as many Java programmers have observed, there is often a performance cost to using wrapper objects, even on modern JVMs. Even so, wrapper classes are a good starting point for talking about value types. If there were a set of structural rules and restrictions which would prevent value-unsafe operations on value types, wrapper classes would provide a good notation for defining value types. This note attempts to define such rules and restrictions. Let’s Start Coding Now it is time to look at some real code. Here is a definition, written in Java, of a complex number value type. @ValueSafe public final class Complex implements java.io.Serializable { // immutable component structure: public final double re, im; private Complex(double re, double im) { this.re = re; this.im = im; } // interoperability methods: public String toString() { return "Complex("+re+","+im+")"; } public List<Double> asList() { return Arrays.asList(re, im); } public boolean equals(Complex c) { return re == c.re && im == c.im; } public boolean equals(@ValueSafe Object x) { return x instanceof Complex && equals((Complex) x); } public int hashCode() { return 31*Double.valueOf(re).hashCode() + Double.valueOf(im).hashCode(); } // factory methods: public static Complex valueOf(double re, double im) { return new Complex(re, im); } public Complex changeRe(double re2) { return valueOf(re2, im); } public Complex changeIm(double im2) { return valueOf(re, im2); } public static Complex cast(@ValueSafe Object x) { return x == null ? ZERO : (Complex) x; } // utility methods and constants: public Complex plus(Complex c) { return new Complex(re+c.re, im+c.im); } public Complex minus(Complex c) { return new Complex(re-c.re, im-c.im); } public double abs() { return Math.sqrt(re*re + im*im); } public static final Complex PI = valueOf(Math.PI, 0.0); public static final Complex ZERO = valueOf(0.0, 0.0); } This is not a minimal definition, because it includes some utility methods and other optional parts. The essential elements are as follows: The class is marked as a value type with an annotation. The class is final, because it does not make sense to create subclasses of value types. The fields of the class are all non-private and final. (I.e., the type is immutable and structurally transparent.) From the supertype Object, all public non-final methods are overridden. The constructor is private. Beyond these bare essentials, we can observe the following features in this example, which are likely to be typical of all value types: One or more factory methods are responsible for value creation, including a component-wise valueOf method. There are utility methods for complex arithmetic and instance creation, such as plus and changeIm. There are static utility constants, such as PI. The type is serializable, using the default mechanisms. There are methods for converting to and from dynamically typed references, such as asList and cast. The Rules In order to use value types properly, the programmer must avoid value-unsafe operations. A helpful Java compiler should issue errors (or at least warnings) for code which provably applies value-unsafe operations, and should issue warnings for code which might be correct but does not provably avoid value-unsafe operations. No such compilers exist today, but to simplify our account here, we will pretend that they do exist. A value-safe type is any class, interface, or type parameter marked with the @ValueSafe annotation, or any subtype of a value-safe type. If a value-safe class is marked final, it is in fact a value type. All other value-safe classes must be abstract. The non-static fields of a value class must be non-public and final, and all its constructors must be private. Under the above rules, a standard interface could be helpful to define value types like Complex. Here is an example: @ValueSafe public interface ValueType extends java.io.Serializable { // All methods listed here must get redefined. // Definitions must be value-safe, which means // they may depend on component values only. List<? extends Object> asList(); int hashCode(); boolean equals(@ValueSafe Object c); String toString(); } //@ValueSafe inherited from supertype: public final class Complex implements ValueType { … The main advantage of such a conventional interface is that (unlike an annotation) it is reified in the runtime type system. It could appear as an element type or parameter bound, for facilities which are designed to work on value types only. More broadly, it might assist the JVM to perform dynamic enforcement of the rules for value types. Besides types, the annotation @ValueSafe can mark fields, parameters, local variables, and methods. (This is redundant when the type is also value-safe, but may be useful when the type is Object or another supertype of a value type.) Working forward from these annotations, an expression E is defined as value-safe if it satisfies one or more of the following: The type of E is a value-safe type. E names a field, parameter, or local variable whose declaration is marked @ValueSafe. E is a call to a method whose declaration is marked @ValueSafe. E is an assignment to a value-safe variable, field reference, or array reference. E is a cast to a value-safe type from a value-safe expression. E is a conditional expression E0 ? E1 : E2, and both E1 and E2 are value-safe. Assignments to value-safe expressions and initializations of value-safe names must take their values from value-safe expressions. A value-safe expression may not be the subject of a value-unsafe operation. In particular, it cannot be synchronized on, nor can it be compared with the “==” operator, not even with a null or with another value-safe type. In a program where all of these rules are followed, no value-type value will be subject to a value-unsafe operation. Thus, the prime axiom of value types will be satisfied, that no two value type will be distinguishable as long as their component values are equal. More Code To illustrate these rules, here are some usage examples for Complex: Complex pi = Complex.valueOf(Math.PI, 0); Complex zero = pi.changeRe(0); //zero = pi; zero.re = 0; ValueType vtype = pi; @SuppressWarnings("value-unsafe") Object obj = pi; @ValueSafe Object obj2 = pi; obj2 = new Object(); // ok List<Complex> clist = new ArrayList<Complex>(); clist.add(pi); // (ok assuming List.add param is @ValueSafe) List<ValueType> vlist = new ArrayList<ValueType>(); vlist.add(pi); // (ok) List<Object> olist = new ArrayList<Object>(); olist.add(pi); // warning: "value-unsafe" boolean z = pi.equals(zero); boolean z1 = (pi == zero); // error: reference comparison on value type boolean z2 = (pi == null); // error: reference comparison on value type boolean z3 = (pi == obj2); // error: reference comparison on value type synchronized (pi) { } // error: synch of value, unpredictable result synchronized (obj2) { } // unpredictable result Complex qq = pi; qq = null; // possible NPE; warning: “null-unsafe" qq = (Complex) obj; // warning: “null-unsafe" qq = Complex.cast(obj); // OK @SuppressWarnings("null-unsafe") Complex empty = null; // possible NPE qq = empty; // possible NPE (null pollution) The Payoffs It follows from this that either the JVM or the java compiler can replace boxed value-type values with unboxed ones, without affecting normal computations. Fields and variables of value types can be split into their unboxed components. Non-static methods on value types can be transformed into static methods which take the components as value parameters. Some common questions arise around this point in any discussion of value types. Why burden the programmer with all these extra rules? Why not detect programs automagically and perform unboxing transparently? The answer is that it is easy to break the rules accidently unless they are agreed to by the programmer and enforced. Automatic unboxing optimizations are tantalizing but (so far) unreachable ideal. In the current state of the art, it is possible exhibit benchmarks in which automatic unboxing provides the desired effects, but it is not possible to provide a JVM with a performance model that assures the programmer when unboxing will occur. This is why I’m writing this note, to enlist help from, and provide assurances to, the programmer. Basically, I’m shooting for a good set of user-supplied “pragmas” to frame the desired optimization. Again, the important thing is that the unboxing must be done reliably, or else programmers will have no reason to work with the extra complexity of the value-safety rules. There must be a reasonably stable performance model, wherein using a value type has approximately the same performance characteristics as writing the unboxed components as separate Java variables. There are some rough corners to the present scheme. Since Java fields and array elements are initialized to null, value-type computations which incorporate uninitialized variables can produce null pointer exceptions. One workaround for this is to require such variables to be null-tested, and the result replaced with a suitable all-zero value of the value type. That is what the “cast” method does above. Generically typed APIs like List<T> will continue to manipulate boxed values always, at least until we figure out how to do reification of generic type instances. Use of such APIs will elicit warnings until their type parameters (and/or relevant members) are annotated or typed as value-safe. Retrofitting List<T> is likely to expose flaws in the present scheme, which we will need to engineer around. Here are a couple of first approaches: public interface java.util.List<@ValueSafe T> extends Collection<T> { … public interface java.util.List<T extends Object|ValueType> extends Collection<T> { … (The second approach would require disjunctive types, in which value-safety is “contagious” from the constituent types.) With more transformations, the return value types of methods can also be unboxed. This may require significant bytecode-level transformations, and would work best in the presence of a bytecode representation for multiple value groups, which I have proposed elsewhere under the title “Tuples in the VM”. But for starters, the JVM can apply this transformation under the covers, to internally compiled methods. This would give a way to express multiple return values and structured return values, which is a significant pain-point for Java programmers, especially those who work with low-level structure types favored by modern vector and graphics processors. The lack of multiple return values has a strong distorting effect on many Java APIs. Even if the JVM fails to unbox a value, there is still potential benefit to the value type. Clustered computing systems something have copy operations (serialization or something similar) which apply implicitly to command operands. When copying JVM objects, it is extremely helpful to know when an object’s identity is important or not. If an object reference is a copied operand, the system may have to create a proxy handle which points back to the original object, so that side effects are visible. Proxies must be managed carefully, and this can be expensive. On the other hand, value types are exactly those types which a JVM can “copy and forget” with no downside. Array types are crucial to bulk data interfaces. (As data sizes and rates increase, bulk data becomes more important than scalar data, so arrays are definitely accompanying us into the future of computing.) Value types are very helpful for adding structure to bulk data, so a successful value type mechanism will make it easier for us to express richer forms of bulk data. Unboxing arrays (i.e., arrays containing unboxed values) will provide better cache and memory density, and more direct data movement within clustered or heterogeneous computing systems. They require the deepest transformations, relative to today’s JVM. There is an impedance mismatch between value-type arrays and Java’s covariant array typing, so compromises will need to be struck with existing Java semantics. It is probably worth the effort, since arrays of unboxed value types are inherently more memory-efficient than standard Java arrays, which rely on dependent pointer chains. It may be sufficient to extend the “value-safe” concept to array declarations, and allow low-level transformations to change value-safe array declarations from the standard boxed form into an unboxed tuple-based form. Such value-safe arrays would not be convertible to Object[] arrays. Certain connection points, such as Arrays.copyOf and System.arraycopy might need additional input/output combinations, to allow smooth conversion between arrays with boxed and unboxed elements. Alternatively, the correct solution may have to wait until we have enough reification of generic types, and enough operator overloading, to enable an overhaul of Java arrays. Implicit Method Definitions The example of class Complex above may be unattractively complex. I believe most or all of the elements of the example class are required by the logic of value types. If this is true, a programmer who writes a value type will have to write lots of error-prone boilerplate code. On the other hand, I think nearly all of the code (except for the domain-specific parts like plus and minus) can be implicitly generated. Java has a rule for implicitly defining a class’s constructor, if no it defines no constructors explicitly. Likewise, there are rules for providing default access modifiers for interface members. Because of the highly regular structure of value types, it might be reasonable to perform similar implicit transformations on value types. Here’s an example of a “highly implicit” definition of a complex number type: public class Complex implements ValueType { // implicitly final public double re, im; // implicitly public final //implicit methods are defined elementwise from te fields: // toString, asList, equals(2), hashCode, valueOf, cast //optionally, explicit methods (plus, abs, etc.) would go here } In other words, with the right defaults, a simple value type definition can be a one-liner. The observant reader will have noticed the similarities (and suitable differences) between the explicit methods above and the corresponding methods for List<T>. Another way to abbreviate such a class would be to make an annotation the primary trigger of the functionality, and to add the interface(s) implicitly: public @ValueType class Complex { … // implicitly final, implements ValueType (But to me it seems better to communicate the “magic” via an interface, even if it is rooted in an annotation.) Implicitly Defined Value Types So far we have been working with nominal value types, which is to say that the sequence of typed components is associated with a name and additional methods that convey the intention of the programmer. A simple ordered pair of floating point numbers can be variously interpreted as (to name a few possibilities) a rectangular or polar complex number or Cartesian point. The name and the methods convey the intended meaning. But what if we need a truly simple ordered pair of floating point numbers, without any further conceptual baggage? Perhaps we are writing a method (like “divideAndRemainder”) which naturally returns a pair of numbers instead of a single number. Wrapping the pair of numbers in a nominal type (like “QuotientAndRemainder”) makes as little sense as wrapping a single return value in a nominal type (like “Quotient”). What we need here are structural value types commonly known as tuples. For the present discussion, let us assign a conventional, JVM-friendly name to tuples, roughly as follows: public class java.lang.tuple.$DD extends java.lang.tuple.Tuple { double $1, $2; } Here the component names are fixed and all the required methods are defined implicitly. The supertype is an abstract class which has suitable shared declarations. The name itself mentions a JVM-style method parameter descriptor, which may be “cracked” to determine the number and types of the component fields. The odd thing about such a tuple type (and structural types in general) is it must be instantiated lazily, in response to linkage requests from one or more classes that need it. The JVM and/or its class loaders must be prepared to spin a tuple type on demand, given a simple name reference, $xyz, where the xyz is cracked into a series of component types. (Specifics of naming and name mangling need some tasteful engineering.) Tuples also seem to demand, even more than nominal types, some support from the language. (This is probably because notations for non-nominal types work best as combinations of punctuation and type names, rather than named constructors like Function3 or Tuple2.) At a minimum, languages with tuples usually (I think) have some sort of simple bracket notation for creating tuples, and a corresponding pattern-matching syntax (or “destructuring bind”) for taking tuples apart, at least when they are parameter lists. Designing such a syntax is no simple thing, because it ought to play well with nominal value types, and also with pre-existing Java features, such as method parameter lists, implicit conversions, generic types, and reflection. That is a task for another day. Other Use Cases Besides complex numbers and simple tuples there are many use cases for value types. Many tuple-like types have natural value-type representations. These include rational numbers, point locations and pixel colors, and various kinds of dates and addresses. Other types have a variable-length ‘tail’ of internal values. The most common example of this is String, which is (mathematically) a sequence of UTF-16 character values. Similarly, bit vectors, multiple-precision numbers, and polynomials are composed of sequences of values. Such types include, in their representation, a reference to a variable-sized data structure (often an array) which (somehow) represents the sequence of values. The value type may also include ’header’ information. Variable-sized values often have a length distribution which favors short lengths. In that case, the design of the value type can make the first few values in the sequence be direct ’header’ fields of the value type. In the common case where the header is enough to represent the whole value, the tail can be a shared null value, or even just a null reference. Note that the tail need not be an immutable object, as long as the header type encapsulates it well enough. This is the case with String, where the tail is a mutable (but never mutated) character array. Field types and their order must be a globally visible part of the API. The structure of the value type must be transparent enough to have a globally consistent unboxed representation, so that all callers and callees agree about the type and order of components that appear as parameters, return types, and array elements. This is a trade-off between efficiency and encapsulation, which is forced on us when we remove an indirection enjoyed by boxed representations. A JVM-only transformation would not care about such visibility, but a bytecode transformation would need to take care that (say) the components of complex numbers would not get swapped after a redefinition of Complex and a partial recompile. Perhaps constant pool references to value types need to declare the field order as assumed by each API user. This brings up the delicate status of private fields in a value type. It must always be possible to load, store, and copy value types as coordinated groups, and the JVM performs those movements by moving individual scalar values between locals and stack. If a component field is not public, what is to prevent hostile code from plucking it out of the tuple using a rogue aload or astore instruction? Nothing but the verifier, so we may need to give it more smarts, so that it treats value types as inseparable groups of stack slots or locals (something like long or double). My initial thought was to make the fields always public, which would make the security problem moot. But public is not always the right answer; consider the case of String, where the underlying mutable character array must be encapsulated to prevent security holes. I believe we can win back both sides of the tradeoff, by training the verifier never to split up the components in an unboxed value. Just as the verifier encapsulates the two halves of a 64-bit primitive, it can encapsulate the the header and body of an unboxed String, so that no code other than that of class String itself can take apart the values. Similar to String, we could build an efficient multi-precision decimal type along these lines: public final class DecimalValue extends ValueType { protected final long header; protected private final BigInteger digits; public DecimalValue valueOf(int value, int scale) { assert(scale >= 0); return new DecimalValue(((long)value << 32) + scale, null); } public DecimalValue valueOf(long value, int scale) { if (value == (int) value) return valueOf((int)value, scale); return new DecimalValue(-scale, new BigInteger(value)); } } Values of this type would be passed between methods as two machine words. Small values (those with a significand which fits into 32 bits) would be represented without any heap data at all, unless the DecimalValue itself were boxed. (Note the tension between encapsulation and unboxing in this case. It would be better if the header and digits fields were private, but depending on where the unboxing information must “leak”, it is probably safer to make a public revelation of the internal structure.) Note that, although an array of Complex can be faked with a double-length array of double, there is no easy way to fake an array of unboxed DecimalValues. (Either an array of boxed values or a transposed pair of homogeneous arrays would be reasonable fallbacks, in a current JVM.) Getting the full benefit of unboxing and arrays will require some new JVM magic. Although the JVM emphasizes portability, system dependent code will benefit from using machine-level types larger than 64 bits. For example, the back end of a linear algebra package might benefit from value types like Float4 which map to stock vector types. This is probably only worthwhile if the unboxing arrays can be packed with such values. More Daydreams A more finely-divided design for dynamic enforcement of value safety could feature separate marker interfaces for each invariant. An empty marker interface Unsynchronizable could cause suitable exceptions for monitor instructions on objects in marked classes. More radically, a Interchangeable marker interface could cause JVM primitives that are sensitive to object identity to raise exceptions; the strangest result would be that the acmp instruction would have to be specified as raising an exception. @ValueSafe public interface ValueType extends java.io.Serializable, Unsynchronizable, Interchangeable { … public class Complex implements ValueType { // inherits Serializable, Unsynchronizable, Interchangeable, @ValueSafe … It seems possible that Integer and the other wrapper types could be retro-fitted as value-safe types. This is a major change, since wrapper objects would be unsynchronizable and their references interchangeable. It is likely that code which violates value-safety for wrapper types exists but is uncommon. It is less plausible to retro-fit String, since the prominent operation String.intern is often used with value-unsafe code. We should also reconsider the distinction between boxed and unboxed values in code. The design presented above obscures that distinction. As another thought experiment, we could imagine making a first class distinction in the type system between boxed and unboxed representations. Since only primitive types are named with a lower-case initial letter, we could define that the capitalized version of a value type name always refers to the boxed representation, while the initial lower-case variant always refers to boxed. For example: complex pi = complex.valueOf(Math.PI, 0); Complex boxPi = pi; // convert to boxed myList.add(boxPi); complex z = myList.get(0); // unbox Such a convention could perhaps absorb the current difference between int and Integer, double and Double. It might also allow the programmer to express a helpful distinction among array types. As said above, array types are crucial to bulk data interfaces, but are limited in the JVM. Extending arrays beyond the present limitations is worth thinking about; for example, the Maxine JVM implementation has a hybrid object/array type. Something like this which can also accommodate value type components seems worthwhile. On the other hand, does it make sense for value types to contain short arrays? And why should random-access arrays be the end of our design process, when bulk data is often sequentially accessed, and it might make sense to have heterogeneous streams of data as the natural “jumbo” data structure. These considerations must wait for another day and another note. More Work It seems to me that a good sequence for introducing such value types would be as follows: Add the value-safety restrictions to an experimental version of javac. Code some sample applications with value types, including Complex and DecimalValue. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. A staggered roll-out like this would decouple language changes from bytecode changes, which is always a convenient thing. A similar investigation should be applied (concurrently) to array types. In this case, it seems to me that the starting point is in the JVM: Add an experimental unboxing array data structure to a production JVM, perhaps along the lines of Maxine hybrids. No bytecode or language support is required at first; everything can be done with encapsulated unsafe operations and/or method handles. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. That’s enough musing me for now. Back to work!

Read the article
256 Windows Azure Worker Roles, Windows Kinect and a 90's Text-Based Ray-Tracer

- by Alan Smith

For a couple of years I have been demoing a simple render farm hosted in Windows Azure using worker roles and the Azure Storage service. At the start of the presentation I deploy an Azure application that uses 16 worker roles to render a 1,500 frame 3D ray-traced animation. At the end of the presentation, when the animation was complete, I would play the animation delete the Azure deployment. The standing joke with the audience was that it was that it was a “$2 demo”, as the compute charges for running the 16 instances for an hour was $1.92, factor in the bandwidth charges and it’s a couple of dollars. The point of the demo is that it highlights one of the great benefits of cloud computing, you pay for what you use, and if you need massive compute power for a short period of time using Windows Azure can work out very cost effective. The “$2 demo” was great for presenting at user groups and conferences in that it could be deployed to Azure, used to render an animation, and then removed in a one hour session. I have always had the idea of doing something a bit more impressive with the demo, and scaling it from a “$2 demo” to a “$30 demo”. The challenge was to create a visually appealing animation in high definition format and keep the demo time down to one hour. This article will take a run through how I achieved this. Ray Tracing Ray tracing, a technique for generating high quality photorealistic images, gained popularity in the 90’s with companies like Pixar creating feature length computer animations, and also the emergence of shareware text-based ray tracers that could run on a home PC. In order to render a ray traced image, the ray of light that would pass from the view point must be tracked until it intersects with an object. At the intersection, the color, reflectiveness, transparency, and refractive index of the object are used to calculate if the ray will be reflected or refracted. Each pixel may require thousands of calculations to determine what color it will be in the rendered image. Pin-Board Toys Having very little artistic talent and a basic understanding of maths I decided to focus on an animation that could be modeled fairly easily and would look visually impressive. I’ve always liked the pin-board desktop toys that become popular in the 80’s and when I was working as a 3D animator back in the 90’s I always had the idea of creating a 3D ray-traced animation of a pin-board, but never found the energy to do it. Even if I had a go at it, the render time to produce an animation that would look respectable on a 486 would have been measured in months. PolyRay Back in 1995 I landed my first real job, after spending three years being a beach-ski-climbing-paragliding-bum, and was employed to create 3D ray-traced animations for a CD-ROM that school kids would use to learn physics. I had got into the strange and wonderful world of text-based ray tracing, and was using a shareware ray-tracer called PolyRay. PolyRay takes a text file describing a scene as input and, after a few hours processing on a 486, produced a high quality ray-traced image. The following is an example of a basic PolyRay scene file. background Midnight_Blue static define matte surface { ambient 0.1 diffuse 0.7 } define matte_white texture { matte { color white } } define matte_black texture { matte { color dark_slate_gray } } define position_cylindrical 3 define lookup_sawtooth 1 define light_wood <0.6, 0.24, 0.1> define median_wood <0.3, 0.12, 0.03> define dark_wood <0.05, 0.01, 0.005> define wooden texture { noise surface { ambient 0.2 diffuse 0.7 specular white, 0.5 microfacet Reitz 10 position_fn position_cylindrical position_scale 1 lookup_fn lookup_sawtooth octaves 1 turbulence 1 color_map( [0.0, 0.2, light_wood, light_wood] [0.2, 0.3, light_wood, median_wood] [0.3, 0.4, median_wood, light_wood] [0.4, 0.7, light_wood, light_wood] [0.7, 0.8, light_wood, median_wood] [0.8, 0.9, median_wood, light_wood] [0.9, 1.0, light_wood, dark_wood]) } } define glass texture { surface { ambient 0 diffuse 0 specular 0.2 reflection white, 0.1 transmission white, 1, 1.5 }} define shiny surface { ambient 0.1 diffuse 0.6 specular white, 0.6 microfacet Phong 7 } define steely_blue texture { shiny { color black } } define chrome texture { surface { color white ambient 0.0 diffuse 0.2 specular 0.4 microfacet Phong 10 reflection 0.8 } } viewpoint { from <4.000, -1.000, 1.000> at <0.000, 0.000, 0.000> up <0, 1, 0> angle 60 resolution 640, 480 aspect 1.6 image_format 0 } light <-10, 30, 20> light <-10, 30, -20> object { disc <0, -2, 0>, <0, 1, 0>, 30 wooden } object { sphere <0.000, 0.000, 0.000>, 1.00 chrome } object { cylinder <0.000, 0.000, 0.000>, <0.000, 0.000, -4.000>, 0.50 chrome } After setting up the background and defining colors and textures, the viewpoint is specified. The “camera” is located at a point in 3D space, and it looks towards another point. The angle, image resolution, and aspect ratio are specified. Two lights are present in the image at defined coordinates. The three objects in the image are a wooden disc to represent a table top, and a sphere and cylinder that intersect to form a pin that will be used for the pin board toy in the final animation. When the image is rendered, the following image is produced. The pins are modeled with a chrome surface, so they reflect the environment around them. Note that the scale of the pin shaft is not correct, this will be fixed later. Modeling the Pin Board The frame of the pin-board is made up of three boxes, and six cylinders, the front box is modeled using a clear, slightly reflective solid, with the same refractive index of glass. The other shapes are modeled as metal. object { box <-5.5, -1.5, 1>, <5.5, 5.5, 1.2> glass } object { box <-5.5, -1.5, -0.04>, <5.5, 5.5, -0.09> steely_blue } object { box <-5.5, -1.5, -0.52>, <5.5, 5.5, -0.59> steely_blue } object { cylinder <-5.2, -1.2, 1.4>, <-5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, -1.2, 1.4>, <5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <-5.2, 5.2, 1.4>, <-5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, 5.2, 1.4>, <5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <0, -1.2, 1.4>, <0, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <0, 5.2, 1.4>, <0, 5.2, -0.74>, 0.2 steely_blue } In order to create the matrix of pins that make up the pin board I used a basic console application with a few nested loops to create two intersecting matrixes of pins, which models the layout used in the pin boards. The resulting image is shown below. The pin board contains 11,481 pins, with the scene file containing 23,709 lines of code. For the complete animation 2,000 scene files will be created, which is over 47 million lines of code. Each pin in the pin-board will slide out a specific distance when an object is pressed into the back of the board. This is easily modeled by setting the Z coordinate of the pin to a specific value. In order to set all of the pins in the pin-board to the correct position, a bitmap image can be used. The position of the pin can be set based on the color of the pixel at the appropriate position in the image. When the Windows Azure logo is used to set the Z coordinate of the pins, the following image is generated. The challenge now was to make a cool animation. The Azure Logo is fine, but it is static. Using a normal video to animate the pins would not work; the colors in the video would not be the same as the depth of the objects from the camera. In order to simulate the pin board accurately a series of frames from a depth camera could be used. Windows Kinect The Kenect controllers for the X-Box 360 and Windows feature a depth camera. The Kinect SDK for Windows provides a programming interface for Kenect, providing easy access for .NET developers to the Kinect sensors. The Kinect Explorer provided with the Kinect SDK is a great starting point for exploring Kinect from a developers perspective. Both the X-Box 360 Kinect and the Windows Kinect will work with the Kinect SDK, the Windows Kinect is required for commercial applications, but the X-Box Kinect can be used for hobby projects. The Windows Kinect has the advantage of providing a mode to allow depth capture with objects closer to the camera, which makes for a more accurate depth image for setting the pin positions. Creating a Depth Field Animation The depth field animation used to set the positions of the pin in the pin board was created using a modified version of the Kinect Explorer sample application. In order to simulate the pin board accurately, a small section of the depth range from the depth sensor will be used. Any part of the object in front of the depth range will result in a white pixel; anything behind the depth range will be black. Within the depth range the pixels in the image will be set to RGB values from 0,0,0 to 255,255,255. A screen shot of the modified Kinect Explorer application is shown below. The Kinect Explorer sample application was modified to include slider controls that are used to set the depth range that forms the image from the depth stream. This allows the fine tuning of the depth image that is required for simulating the position of the pins in the pin board. The Kinect Explorer was also modified to record a series of images from the depth camera and save them as a sequence JPEG files that will be used to animate the pins in the animation the Start and Stop buttons are used to start and stop the image recording. En example of one of the depth images is shown below. Once a series of 2,000 depth images has been captured, the task of creating the animation can begin. Rendering a Test Frame In order to test the creation of frames and get an approximation of the time required to render each frame a test frame was rendered on-premise using PolyRay. The output of the rendering process is shown below. The test frame contained 23,629 primitive shapes, most of which are the spheres and cylinders that are used for the 11,800 or so pins in the pin board. The 1280x720 image contains 921,600 pixels, but as anti-aliasing was used the number of rays that were calculated was 4,235,777, with 3,478,754,073 object boundaries checked. The test frame of the pin board with the depth field image applied is shown below. The tracing time for the test frame was 4 minutes 27 seconds, which means rendering the2,000 frames in the animation would take over 148 hours, or a little over 6 days. Although this is much faster that an old 486, waiting almost a week to see the results of an animation would make it challenging for animators to create, view, and refine their animations. It would be much better if the animation could be rendered in less than one hour. Windows Azure Worker Roles The cost of creating an on-premise render farm to render animations increases in proportion to the number of servers. The table below shows the cost of servers for creating a render farm, assuming a cost of $500 per server. Number of Servers Cost 1 $500 16 $8,000 256 $128,000 As well as the cost of the servers, there would be additional costs for networking, racks etc. Hosting an environment of 256 servers on-premise would require a server room with cooling, and some pretty hefty power cabling. The Windows Azure compute services provide worker roles, which are ideal for performing processor intensive compute tasks. With the scalability available in Windows Azure a job that takes 256 hours to complete could be perfumed using different numbers of worker roles. The time and cost of using 1, 16 or 256 worker roles is shown below. Number of Worker Roles Render Time Cost 1 256 hours $30.72 16 16 hours $30.72 256 1 hour $30.72 Using worker roles in Windows Azure provides the same cost for the 256 hour job, irrespective of the number of worker roles used. Provided the compute task can be broken down into many small units, and the worker role compute power can be used effectively, it makes sense to scale the application so that the task is completed quickly, making the results available in a timely fashion. The task of rendering 2,000 frames in an animation is one that can easily be broken down into 2,000 individual pieces, which can be performed by a number of worker roles. Creating a Render Farm in Windows Azure The architecture of the render farm is shown in the following diagram. The render farm is a hybrid application with the following components: · On-Premise o Windows Kinect – Used combined with the Kinect Explorer to create a stream of depth images. o Animation Creator – This application uses the depth images from the Kinect sensor to create scene description files for PolyRay. These files are then uploaded to the jobs blob container, and job messages added to the jobs queue. o Process Monitor – This application queries the role instance lifecycle table and displays statistics about the render farm environment and render process. o Image Downloader – This application polls the image queue and downloads the rendered animation files once they are complete. · Windows Azure o Azure Storage – Queues and blobs are used for the scene description files and completed frames. A table is used to store the statistics about the rendering environment. The architecture of each worker role is shown below. The worker role is configured to use local storage, which provides file storage on the worker role instance that can be use by the applications to render the image and transform the format of the image. The service definition for the worker role with the local storage configuration highlighted is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceDefinition name="CloudRay" > <WorkerRole name="CloudRayWorkerRole" vmsize="Small"> <Imports> </Imports> <ConfigurationSettings> <Setting name="DataConnectionString" /> </ConfigurationSettings> <LocalResources> <LocalStorage name="RayFolder" cleanOnRoleRecycle="true" /> </LocalResources> </WorkerRole> </ServiceDefinition> The two executable programs, PolyRay.exe and DTA.exe are included in the Azure project, with Copy Always set as the property. PolyRay will take the scene description file and render it to a Truevision TGA file. As the TGA format has not seen much use since the mid 90’s it is converted to a JPG image using Dave's Targa Animator, another shareware application from the 90’s. Each worker roll will use the following process to render the animation frames. 1. The worker process polls the job queue, if a job is available the scene description file is downloaded from blob storage to local storage. 2. PolyRay.exe is started in a process with the appropriate command line arguments to render the image as a TGA file. 3. DTA.exe is started in a process with the appropriate command line arguments convert the TGA file to a JPG file. 4. The JPG file is uploaded from local storage to the images blob container. 5. A message is placed on the images queue to indicate a new image is available for download. 6. The job message is deleted from the job queue. 7. The role instance lifecycle table is updated with statistics on the number of frames rendered by the worker role instance, and the CPU time used. The code for this is shown below. public override void Run() { // Set environment variables string polyRayPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), PolyRayLocation); string dtaPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), DTALocation); LocalResource rayStorage = RoleEnvironment.GetLocalResource("RayFolder"); string localStorageRootPath = rayStorage.RootPath; JobQueue jobQueue = new JobQueue("renderjobs"); JobQueue downloadQueue = new JobQueue("renderimagedownloadjobs"); CloudRayBlob sceneBlob = new CloudRayBlob("scenes"); CloudRayBlob imageBlob = new CloudRayBlob("images"); RoleLifecycleDataSource roleLifecycleDataSource = new RoleLifecycleDataSource(); Frames = 0; while (true) { // Get the render job from the queue CloudQueueMessage jobMsg = jobQueue.Get(); if (jobMsg != null) { // Get the file details string sceneFile = jobMsg.AsString; string tgaFile = sceneFile.Replace(".pi", ".tga"); string jpgFile = sceneFile.Replace(".pi", ".jpg"); string sceneFilePath = Path.Combine(localStorageRootPath, sceneFile); string tgaFilePath = Path.Combine(localStorageRootPath, tgaFile); string jpgFilePath = Path.Combine(localStorageRootPath, jpgFile); // Copy the scene file to local storage sceneBlob.DownloadFile(sceneFilePath); // Run the ray tracer. string polyrayArguments = string.Format("\"{0}\" -o \"{1}\" -a 2", sceneFilePath, tgaFilePath); Process polyRayProcess = new Process(); polyRayProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), polyRayPath); polyRayProcess.StartInfo.Arguments = polyrayArguments; polyRayProcess.Start(); polyRayProcess.WaitForExit(); // Convert the image string dtaArguments = string.Format(" {0} /FJ /P{1}", tgaFilePath, Path.GetDirectoryName (jpgFilePath)); Process dtaProcess = new Process(); dtaProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), dtaPath); dtaProcess.StartInfo.Arguments = dtaArguments; dtaProcess.Start(); dtaProcess.WaitForExit(); // Upload the image to blob storage imageBlob.UploadFile(jpgFilePath); // Add a download job. downloadQueue.Add(jpgFile); // Delete the render job message jobQueue.Delete(jobMsg); Frames++; } else { Thread.Sleep(1000); } // Log the worker role activity. roleLifecycleDataSource.Alive ("CloudRayWorker", RoleLifecycleDataSource.RoleLifecycleId, Frames); } } Monitoring Worker Role Instance Lifecycle In order to get more accurate statistics about the lifecycle of the worker role instances used to render the animation data was tracked in an Azure storage table. The following class was used to track the worker role lifecycles in Azure storage. public class RoleLifecycle : TableServiceEntity { public string ServerName { get; set; } public string Status { get; set; } public DateTime StartTime { get; set; } public DateTime EndTime { get; set; } public long SecondsRunning { get; set; } public DateTime LastActiveTime { get; set; } public int Frames { get; set; } public string Comment { get; set; } public RoleLifecycle() { } public RoleLifecycle(string roleName) { PartitionKey = roleName; RowKey = Utils.GetAscendingRowKey(); Status = "Started"; StartTime = DateTime.UtcNow; LastActiveTime = StartTime; EndTime = StartTime; SecondsRunning = 0; Frames = 0; } } A new instance of this class is created and added to the storage table when the role starts. It is then updated each time the worker renders a frame to record the total number of frames rendered and the total processing time. These statistics are used be the monitoring application to determine the effectiveness of use of resources in the render farm. Rendering the Animation The Azure solution was deployed to Windows Azure with the service configuration set to 16 worker role instances. This allows for the application to be tested in the cloud environment, and the performance of the application determined. When I demo the application at conferences and user groups I often start with 16 instances, and then scale up the application to the full 256 instances. The configuration to run 16 instances is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="16" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> About six minutes after deploying the application the first worker roles become active and start to render the first frames of the animation. The CloudRay Monitor application displays an icon for each worker role instance, with a number indicating the number of frames that the worker role has rendered. The statistics on the left show the number of active worker roles and statistics about the render process. The render time is the time since the first worker role became active; the CPU time is the total amount of processing time used by all worker role instances to render the frames. Five minutes after the first worker role became active the last of the 16 worker roles activated. By this time the first seven worker roles had each rendered one frame of the animation. With 16 worker roles u and running it can be seen that one hour and 45 minutes CPU time has been used to render 32 frames with a render time of just under 10 minutes. At this rate it would take over 10 hours to render the 2,000 frames of the full animation. In order to complete the animation in under an hour more processing power will be required. Scaling the render farm from 16 instances to 256 instances is easy using the new management portal. The slider is set to 256 instances, and the configuration saved. We do not need to re-deploy the application, and the 16 instances that are up and running will not be affected. Alternatively, the configuration file for the Azure service could be modified to specify 256 instances. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="256" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> Six minutes after the new configuration has been applied 75 new worker roles have activated and are processing their first frames. Five minutes later the full configuration of 256 worker roles is up and running. We can see that the average rate of frame rendering has increased from 3 to 12 frames per minute, and that over 17 hours of CPU time has been utilized in 23 minutes. In this test the time to provision 140 worker roles was about 11 minutes, which works out at about one every five seconds. We are now half way through the rendering, with 1,000 frames complete. This has utilized just under three days of CPU time in a little over 35 minutes. The animation is now complete, with 2,000 frames rendered in a little over 52 minutes. The CPU time used by the 256 worker roles is 6 days, 7 hours and 22 minutes with an average frame rate of 38 frames per minute. The rendering of the last 1,000 frames took 16 minutes 27 seconds, which works out at a rendering rate of 60 frames per minute. The frame counts in the server instances indicate that the use of a queue to distribute the workload has been very effective in distributing the load across the 256 worker role instances. The first 16 instances that were deployed first have rendered between 11 and 13 frames each, whilst the 240 instances that were added when the application was scaled have rendered between 6 and 9 frames each. Completed Animation I’ve uploaded the completed animation to YouTube, a low resolution preview is shown below. Pin Board Animation Created using Windows Kinect and 256 Windows Azure Worker Roles The animation can be viewed in 1280x720 resolution at the following link: http://www.youtube.com/watch?v=n5jy6bvSxWc Effective Use of Resources According to the CloudRay monitor statistics the animation took 6 days, 7 hours and 22 minutes CPU to render, this works out at 152 hours of compute time, rounded up to the nearest hour. As the usage for the worker role instances are billed for the full hour, it may have been possible to render the animation using fewer than 256 worker roles. When deciding the optimal usage of resources, the time required to provision and start the worker roles must also be considered. In the demo I started with 16 worker roles, and then scaled the application to 256 worker roles. It would have been more optimal to start the application with maybe 200 worker roles, and utilized the full hour that I was being billed for. This would, however, have prevented showing the ease of scalability of the application. The new management portal displays the CPU usage across the worker roles in the deployment. The average CPU usage across all instances is 93.27%, with over 99% used when all the instances are up and running. This shows that the worker role resources are being used very effectively. Grid Computing Scenarios Although I am using this scenario for a hobby project, there are many scenarios where a large amount of compute power is required for a short period of time. Windows Azure provides a great platform for developing these types of grid computing applications, and can work out very cost effective. · Windows Azure can provide massive compute power, on demand, in a matter of minutes. · The use of queues to manage the load balancing of jobs between role instances is a simple and effective solution. · Using a cloud-computing platform like Windows Azure allows proof-of-concept scenarios to be tested and evaluated on a very low budget. · No charges for inbound data transfer makes the uploading of large data sets to Windows Azure Storage services cost effective. (Transaction charges still apply.) Tips for using Windows Azure for Grid Computing Scenarios I found the implementation of a render farm using Windows Azure a fairly simple scenario to implement. I was impressed by ease of scalability that Azure provides, and by the short time that the application took to scale from 16 to 256 worker role instances. In this case it was around 13 minutes, in other tests it took between 10 and 20 minutes. The following tips may be useful when implementing a grid computing project in Windows Azure. · Using an Azure Storage queue to load-balance the units of work across multiple worker roles is simple and very effective. The design I have used in this scenario could easily scale to many thousands of worker role instances. · Windows Azure accounts are typically limited to 20 cores. If you need to use more than this, a call to support and a credit card check will be required. · Be aware of how the billing model works. You will be charged for worker role instances for the full clock our in which the instance is deployed. Schedule the workload to start just after the clock hour has started. · Monitor the utilization of the resources you are provisioning, ensure that you are not paying for worker roles that are idle. · If you are deploying third party applications to worker roles, you may well run into licensing issues. Purchasing software licenses on a per-processor basis when using hundreds of processors for a short time period would not be cost effective. · Third party software may also require installation onto the worker roles, which can be accomplished using start-up tasks. Bear in mind that adding a startup task and possible re-boot will add to the time required for the worker role instance to start and activate. An alternative may be to use a prepared VM and use VM roles. · Consider using the Windows Azure Autoscaling Application Block (WASABi) to autoscale the worker roles in your application. When using a large number of worker roles, the utilization must be carefully monitored, if the scaling algorithms are not optimal it could get very expensive!

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
Disable .htaccess from apache allowoverride none, still reads .htaccess files

- by John Magnolia

I have moved all of our .htaccess config into <Directory> blocks and set AllowOverride None in the default and default-ssl. Although after restarting apache it is still reading the .htaccess files. How can I completely turn off reading these files? Update of all files with "AllowOverride" /etc/apache2/mods-available/userdir.conf <IfModule mod_userdir.c> UserDir public_html UserDir disabled root <Directory /home/*/public_html> AllowOverride FileInfo AuthConfig Limit Indexes Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec <Limit GET POST OPTIONS> Order allow,deny Allow from all </Limit> <LimitExcept GET POST OPTIONS> Order deny,allow Deny from all </LimitExcept> </Directory> </IfModule> /etc/apache2/mods-available/alias.conf <IfModule alias_module> # # Aliases: Add here as many aliases as you need (with no limit). The format is # Alias fakename realname # # Note that if you include a trailing / on fakename then the server will # require it to be present in the URL. So "/icons" isn't aliased in this # example, only "/icons/". If the fakename is slash-terminated, then the # realname must also be slash terminated, and if the fakename omits the # trailing slash, the realname must also omit it. # # We include the /icons/ alias for FancyIndexed directory listings. If # you do not use FancyIndexing, you may comment this out. # Alias /icons/ "/usr/share/apache2/icons/" <Directory "/usr/share/apache2/icons"> Options Indexes MultiViews AllowOverride None Order allow,deny Allow from all </Directory> </IfModule> /etc/apache2/httpd.conf # # Directives to allow use of AWStats as a CGI # Alias /awstatsclasses "/usr/share/doc/awstats/examples/wwwroot/classes/" Alias /awstatscss "/usr/share/doc/awstats/examples/wwwroot/css/" Alias /awstatsicons "/usr/share/doc/awstats/examples/wwwroot/icon/" ScriptAlias /awstats/ "/usr/share/doc/awstats/examples/wwwroot/cgi-bin/" # # This is to permit URL access to scripts/files in AWStats directory. # <Directory "/usr/share/doc/awstats/examples/wwwroot"> Options None AllowOverride None Order allow,deny Allow from all </Directory> Alias /awstats-icon/ /usr/share/awstats/icon/ <Directory /usr/share/awstats/icon> Options None AllowOverride None Order allow,deny Allow from all </Directory> /etc/apache2/sites-available/default-ssl <IfModule mod_ssl.c> <VirtualHost _default_:443> ServerAdmin webmaster@localhost DocumentRoot /var/www <Directory /> Options FollowSymLinks AllowOverride None </Directory> <Directory /var/www/> Options Indexes FollowSymLinks MultiViews AllowOverride None </Directory> ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/ <Directory "/usr/lib/cgi-bin"> AllowOverride None Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch Order allow,deny Allow from all </Directory> ErrorLog ${APACHE_LOG_DIR}/error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn CustomLog ${APACHE_LOG_DIR}/ssl_access.log combined # SSL Engine Switch: # Enable/Disable SSL for this virtual host. SSLEngine on # A self-signed (snakeoil) certificate can be created by installing # the ssl-cert package. See # /usr/share/doc/apache2.2-common/README.Debian.gz for more info. # If both key and certificate are stored in the same file, only the # SSLCertificateFile directive is needed. SSLCertificateFile /etc/ssl/certs/ssl-cert-snakeoil.pem SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key # Server Certificate Chain: # Point SSLCertificateChainFile at a file containing the # concatenation of PEM encoded CA certificates which form the # certificate chain for the server certificate. Alternatively # the referenced file can be the same as SSLCertificateFile # when the CA certificates are directly appended to the server # certificate for convinience. #SSLCertificateChainFile /etc/apache2/ssl.crt/server-ca.crt # Certificate Authority (CA): # Set the CA certificate verification path where to find CA # certificates for client authentication or alternatively one # huge file containing all of them (file must be PEM encoded) # Note: Inside SSLCACertificatePath you need hash symlinks # to point to the certificate files. Use the provided # Makefile to update the hash symlinks after changes. #SSLCACertificatePath /etc/ssl/certs/ #SSLCACertificateFile /etc/apache2/ssl.crt/ca-bundle.crt # Certificate Revocation Lists (CRL): # Set the CA revocation path where to find CA CRLs for client # authentication or alternatively one huge file containing all # of them (file must be PEM encoded) # Note: Inside SSLCARevocationPath you need hash symlinks # to point to the certificate files. Use the provided # Makefile to update the hash symlinks after changes. #SSLCARevocationPath /etc/apache2/ssl.crl/ #SSLCARevocationFile /etc/apache2/ssl.crl/ca-bundle.crl # Client Authentication (Type): # Client certificate verification type and depth. Types are # none, optional, require and optional_no_ca. Depth is a # number which specifies how deeply to verify the certificate # issuer chain before deciding the certificate is not valid. #SSLVerifyClient require #SSLVerifyDepth 10 # Access Control: # With SSLRequire you can do per-directory access control based # on arbitrary complex boolean expressions containing server # variable checks and other lookup directives. The syntax is a # mixture between C and Perl. See the mod_ssl documentation # for more details. #<Location /> #SSLRequire ( %{SSL_CIPHER} !~ m/^(EXP|NULL)/ \ # and %{SSL_CLIENT_S_DN_O} eq "Snake Oil, Ltd." \ # and %{SSL_CLIENT_S_DN_OU} in {"Staff", "CA", "Dev"} \ # and %{TIME_WDAY} >= 1 and %{TIME_WDAY} <= 5 \ # and %{TIME_HOUR} >= 8 and %{TIME_HOUR} <= 20 ) \ # or %{REMOTE_ADDR} =~ m/^192\.76\.162\.[0-9]+$/ #</Location> # SSL Engine Options: # Set various options for the SSL engine. # o FakeBasicAuth: # Translate the client X.509 into a Basic Authorisation. This means that # the standard Auth/DBMAuth methods can be used for access control. The # user name is the `one line' version of the client's X.509 certificate. # Note that no password is obtained from the user. Every entry in the user # file needs this password: `xxj31ZMTZzkVA'. # o ExportCertData: # This exports two additional environment variables: SSL_CLIENT_CERT and # SSL_SERVER_CERT. These contain the PEM-encoded certificates of the # server (always existing) and the client (only existing when client # authentication is used). This can be used to import the certificates # into CGI scripts. # o StdEnvVars: # This exports the standard SSL/TLS related `SSL_*' environment variables. # Per default this exportation is switched off for performance reasons, # because the extraction step is an expensive operation and is usually # useless for serving static content. So one usually enables the # exportation for CGI and SSI requests only. # o StrictRequire: # This denies access when "SSLRequireSSL" or "SSLRequire" applied even # under a "Satisfy any" situation, i.e. when it applies access is denied # and no other module can change it. # o OptRenegotiate: # This enables optimized SSL connection renegotiation handling when SSL # directives are used in per-directory context. #SSLOptions +FakeBasicAuth +ExportCertData +StrictRequire <FilesMatch "\.(cgi|shtml|phtml|php)$"> SSLOptions +StdEnvVars </FilesMatch> <Directory /usr/lib/cgi-bin> SSLOptions +StdEnvVars </Directory> # SSL Protocol Adjustments: # The safe and default but still SSL/TLS standard compliant shutdown # approach is that mod_ssl sends the close notify alert but doesn't wait for # the close notify alert from client. When you need a different shutdown # approach you can use one of the following variables: # o ssl-unclean-shutdown: # This forces an unclean shutdown when the connection is closed, i.e. no # SSL close notify alert is send or allowed to received. This violates # the SSL/TLS standard but is needed for some brain-dead browsers. Use # this when you receive I/O errors because of the standard approach where # mod_ssl sends the close notify alert. # o ssl-accurate-shutdown: # This forces an accurate shutdown when the connection is closed, i.e. a # SSL close notify alert is send and mod_ssl waits for the close notify # alert of the client. This is 100% SSL/TLS standard compliant, but in # practice often causes hanging connections with brain-dead browsers. Use # this only for browsers where you know that their SSL implementation # works correctly. # Notice: Most problems of broken clients are also related to the HTTP # keep-alive facility, so you usually additionally want to disable # keep-alive for those clients, too. Use variable "nokeepalive" for this. # Similarly, one has to force some clients to use HTTP/1.0 to workaround # their broken HTTP/1.1 implementation. Use variables "downgrade-1.0" and # "force-response-1.0" for this. BrowserMatch "MSIE [2-6]" \ nokeepalive ssl-unclean-shutdown \ downgrade-1.0 force-response-1.0 # MSIE 7 and newer should be able to use keepalive BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown </VirtualHost> </IfModule> /etc/apache2/sites-available/default <VirtualHost *:80> ServerAdmin webmaster@localhost DocumentRoot /var/www <Directory /> Options FollowSymLinks AllowOverride None </Directory> <Directory /var/www/> Options -Indexes FollowSymLinks MultiViews AllowOverride None Order allow,deny allow from all </Directory> ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/ <Directory "/usr/lib/cgi-bin"> AllowOverride None Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch Order allow,deny Allow from all </Directory> Alias /delboy /usr/share/phpmyadmin <Directory /usr/share/phpmyadmin> # Restrict phpmyadmin access Order Deny,Allow Allow from all </Directory> ErrorLog ${APACHE_LOG_DIR}/error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn CustomLog ${APACHE_LOG_DIR}/access.log combined Alias /doc/ "/usr/share/doc/" <Directory "/usr/share/doc/"> Options Indexes MultiViews FollowSymLinks AllowOverride None Order deny,allow Deny from all Allow from 127.0.0.0/255.0.0.0 ::1/128 </Directory> </VirtualHost> /etc/apache2/conf.d/security # # Disable access to the entire file system except for the directories that # are explicitly allowed later. # # This currently breaks the configurations that come with some web application # Debian packages. # #<Directory /> # AllowOverride None # Order Deny,Allow # Deny from all #</Directory> # Changing the following options will not really affect the security of the # server, but might make attacks slightly more difficult in some cases. # # ServerTokens # This directive configures what you return as the Server HTTP response # Header. The default is 'Full' which sends information about the OS-Type # and compiled in modules. # Set to one of: Full | OS | Minimal | Minor | Major | Prod # where Full conveys the most information, and Prod the least. # #ServerTokens Minimal ServerTokens OS #ServerTokens Full # # Optionally add a line containing the server version and virtual host # name to server-generated pages (internal error documents, FTP directory # listings, mod_status and mod_info output etc., but not CGI generated # documents or custom error documents). # Set to "EMail" to also include a mailto: link to the ServerAdmin. # Set to one of: On | Off | EMail # #ServerSignature Off ServerSignature On # # Allow TRACE method # # Set to "extended" to also reflect the request body (only for testing and # diagnostic purposes). # # Set to one of: On | Off | extended # TraceEnable Off #TraceEnable On /etc/apache2/apache2.conf # # Based upon the NCSA server configuration files originally by Rob McCool. # # This is the main Apache server configuration file. It contains the # configuration directives that give the server its instructions. # See http://httpd.apache.org/docs/2.2/ for detailed information about # the directives. # # Do NOT simply read the instructions in here without understanding # what they do. They're here only as hints or reminders. If you are unsure # consult the online docs. You have been warned. # # The configuration directives are grouped into three basic sections: # 1. Directives that control the operation of the Apache server process as a # whole (the 'global environment'). # 2. Directives that define the parameters of the 'main' or 'default' server, # which responds to requests that aren't handled by a virtual host. # These directives also provide default values for the settings # of all virtual hosts. # 3. Settings for virtual hosts, which allow Web requests to be sent to # different IP addresses or hostnames and have them handled by the # same Apache server process. # # Configuration and logfile names: If the filenames you specify for many # of the server's control files begin with "/" (or "drive:/" for Win32), the # server will use that explicit path. If the filenames do *not* begin # with "/", the value of ServerRoot is prepended -- so "foo.log" # with ServerRoot set to "/etc/apache2" will be interpreted by the # server as "/etc/apache2/foo.log". # ### Section 1: Global Environment # # The directives in this section affect the overall operation of Apache, # such as the number of concurrent requests it can handle or where it # can find its configuration files. # # # ServerRoot: The top of the directory tree under which the server's # configuration, error, and log files are kept. # # NOTE! If you intend to place this on an NFS (or otherwise network) # mounted filesystem then please read the LockFile documentation (available # at <URL:http://httpd.apache.org/docs/2.2/mod/mpm_common.html#lockfile>); # you will save yourself a lot of trouble. # # Do NOT add a slash at the end of the directory path. # #ServerRoot "/etc/apache2" # # The accept serialization lock file MUST BE STORED ON A LOCAL DISK. # LockFile ${APACHE_LOCK_DIR}/accept.lock # # PidFile: The file in which the server should record its process # identification number when it starts. # This needs to be set in /etc/apache2/envvars # PidFile ${APACHE_PID_FILE} # # Timeout: The number of seconds before receives and sends time out. # Timeout 300 # # KeepAlive: Whether or not to allow persistent connections (more than # one request per connection). Set to "Off" to deactivate. # KeepAlive On # # MaxKeepAliveRequests: The maximum number of requests to allow # during a persistent connection. Set to 0 to allow an unlimited amount. # We recommend you leave this number high, for maximum performance. # MaxKeepAliveRequests 100 # # KeepAliveTimeout: Number of seconds to wait for the next request from the # same client on the same connection. # KeepAliveTimeout 4 ## ## Server-Pool Size Regulation (MPM specific) ## # prefork MPM # StartServers: number of server processes to start # MinSpareServers: minimum number of server processes which are kept spare # MaxSpareServers: maximum number of server processes which are kept spare # MaxClients: maximum number of server processes allowed to start # MaxRequestsPerChild: maximum number of requests a server process serves <IfModule mpm_prefork_module> StartServers 5 MinSpareServers 5 MaxSpareServers 10 MaxClients 150 MaxRequestsPerChild 500 </IfModule> # worker MPM # StartServers: initial number of server processes to start # MaxClients: maximum number of simultaneous client connections # MinSpareThreads: minimum number of worker threads which are kept spare # MaxSpareThreads: maximum number of worker threads which are kept spare # ThreadLimit: ThreadsPerChild can be changed to this maximum value during a # graceful restart. ThreadLimit can only be changed by stopping # and starting Apache. # ThreadsPerChild: constant number of worker threads in each server process # MaxRequestsPerChild: maximum number of requests a server process serves <IfModule mpm_worker_module> StartServers 2 MinSpareThreads 25 MaxSpareThreads 75 ThreadLimit 64 ThreadsPerChild 25 MaxClients 150 MaxRequestsPerChild 0 </IfModule> # event MPM # StartServers: initial number of server processes to start # MaxClients: maximum number of simultaneous client connections # MinSpareThreads: minimum number of worker threads which are kept spare # MaxSpareThreads: maximum number of worker threads which are kept spare # ThreadsPerChild: constant number of worker threads in each server process # MaxRequestsPerChild: maximum number of requests a server process serves <IfModule mpm_event_module> StartServers 2 MaxClients 150 MinSpareThreads 25 MaxSpareThreads 75 ThreadLimit 64 ThreadsPerChild 25 MaxRequestsPerChild 0 </IfModule> # These need to be set in /etc/apache2/envvars User ${APACHE_RUN_USER} Group ${APACHE_RUN_GROUP} # # AccessFileName: The name of the file to look for in each directory # for additional configuration directives. See also the AllowOverride # directive. # AccessFileName .htaccess # # The following lines prevent .htaccess and .htpasswd files from being # viewed by Web clients. # <Files ~ "^\.ht"> Order allow,deny Deny from all Satisfy all </Files> # # DefaultType is the default MIME type the server will use for a document # if it cannot otherwise determine one, such as from filename extensions. # If your server contains mostly text or HTML documents, "text/plain" is # a good value. If most of your content is binary, such as applications # or images, you may want to use "application/octet-stream" instead to # keep browsers from trying to display binary files as though they are # text. # DefaultType text/plain # # HostnameLookups: Log the names of clients or just their IP addresses # e.g., www.apache.org (on) or 204.62.129.132 (off). # The default is off because it'd be overall better for the net if people # had to knowingly turn this feature on, since enabling it means that # each client request will result in AT LEAST one lookup request to the # nameserver. # HostnameLookups Off # ErrorLog: The location of the error log file. # If you do not specify an ErrorLog directive within a <VirtualHost> # container, error messages relating to that virtual host will be # logged here. If you *do* define an error logfile for a <VirtualHost> # container, that host's errors will be logged there and not here. # ErrorLog ${APACHE_LOG_DIR}/error.log # # LogLevel: Control the number of messages logged to the error_log. # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. # LogLevel warn # Include module configuration: Include mods-enabled/*.load Include mods-enabled/*.conf # Include all the user configurations: Include httpd.conf # Include ports listing Include ports.conf # # The following directives define some format nicknames for use with # a CustomLog directive (see below). # If you are behind a reverse proxy, you might want to change %h into %{X-Forwarded-For}i # LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined LogFormat "%h %l %u %t \"%r\" %>s %O" common LogFormat "%{Referer}i -> %U" referer LogFormat "%{User-agent}i" agent # Include of directories ignores editors' and dpkg's backup files, # see README.Debian for details. # Include generic snippets of statements Include conf.d/ # Include the virtual host configurations: Include sites-enabled/

Read the article
UIImagePickerController, UIImage, Memory and More!

- by Itay

I've noticed that there are many questions about how to handle UIImage objects, especially in conjunction with UIImagePickerController and then displaying it in a view (usually a UIImageView). Here is a collection of common questions and their answers. Feel free to edit and add your own. I obviously learnt all this information from somewhere too. Various forum posts, StackOverflow answers and my own experimenting brought me to all these solutions. Credit goes to those who posted some sample code that I've since used and modified. I don't remember who you all are - but hats off to you! How Do I Select An Image From the User's Images or From the Camera? You use UIImagePickerController. The documentation for the class gives a decent overview of how one would use it, and can be found here. Basically, you create an instance of the class, which is a modal view controller, display it, and set yourself (or some class) to be the delegate. Then you'll get notified when a user selects some form of media (movie or image in 3.0 on the 3GS), and you can do whatever you want. My Delegate Was Called - How Do I Get The Media? The delegate method signature is the following: - (void)imagePickerController:(UIImagePickerController *)picker didFinishPickingMediaWithInfo:(NSDictionary *)info; You should put a breakpoint in the debugger to see what's in the dictionary, but you use that to extract the media. For example: UIImage* image = [info objectForKey:UIImagePickerControllerOriginalImage]; There are other keys that work as well, all in the documentation. OK, I Got The Image, But It Doesn't Have Any Geolocation Data. What gives? Unfortunately, Apple decided that we're not worthy of this information. When they load the data into the UIImage, they strip it of all the EXIF/Geolocation data. Can I Get To The Original File Representing This Image on the Disk? Nope. For security purposes, you only get the UIImage. How Can I Look At The Underlying Pixels of the UIImage? Since the UIImage is immutable, you can't look at the direct pixels. However, you can make a copy. The code to this looks something like this: UIImage* image = ...; // An image NSData* pixelData = (NSData*) CGDataProviderCopyData(CGImageGetDataProvider(image.CGImage)); unsigned char* pixelBytes = (unsigned char *)[pixelData bytes]; // Take away the red pixel, assuming 32-bit RGBA for(int i = 0; i < [pixelData length]; i += 4) { pixelBytes[i] = 0; // red pixelBytes[i+1] = pixelBytes[i+1]; // green pixelBytes[i+2] = pixelBytes[i+2]; // blue pixelBytes[i+3] = pixelBytes[i+3]; // alpha } However, note that CGDataProviderCopyData provides you with an "immutable" reference to the data - meaning you can't change it (and you may get a BAD_ACCESS error if you do). Look at the next question if you want to see how you can modify the pixels. How Do I Modify The Pixels of the UIImage? The UIImage is immutable, meaning you can't change it. Apple posted a great article on how to get a copy of the pixels and modify them, and rather than copy and paste it here, you should just go read the article. Once you have the bitmap context as they mention in the article, you can do something similar to this to get a new UIImage with the modified pixels: CGImageRef ref = CGBitmapContextCreateImage(bitmap); UIImage* newImage = [UIImage imageWithCGImage:ref]; Do remember to release your references though, otherwise you're going to be leaking quite a bit of memory. After I Select 3 Images From The Camera, I Run Out Of Memory. Help! You have to remember that even though on disk these images take up only a few hundred kilobytes at most, that's because they're compressed as a PNG or JPG. When they are loaded into the UIImage, they become uncompressed. A quick over-the-envelope calculation would be: width x height x 4 = bytes in memory That's assuming 32-bit pixels. If you have 16-bit pixels (some JPGs are stored as RGBA-5551), then you'd replace the 4 with a 2. Now, images taken with the camera are 1600 x 1200 pixels, so let's do the math: 1600 x 1200 x 4 = 7,680,000 bytes = ~8 MB 8 MB is a lot, especially when you have a limit of around 24 MB for your application. That's why you run out of memory. OK, I Understand Why I Have No Memory. What Do I Do? There is never any reason to display images at their full resolution. The iPhone has a screen of 480 x 320 pixels, so you're just wasting space. If you find yourself in this situation, ask yourself the following question: Do I need the full resolution image? If the answer is yes, then you should save it to disk for later use. If the answer is no, then read the next part. Once you've decided what to do with the full-resolution image, then you need to create a smaller image to use for displaying. Many times you might even want several sizes for your image: a thumbnail, a full-size one for displaying, and the original full-resolution image. OK, I'm Hooked. How Do I Resize the Image? Unfortunately, there is no defined way how to resize an image. Also, it's important to note that when you resize it, you'll get a new image - you're not modifying the old one. There are a couple of methods to do the resizing. I'll present them both here, and explain the pros and cons of each. Method 1: Using UIKit + (UIImage*)imageWithImage:(UIImage*)image scaledToSize:(CGSize)newSize; { // Create a graphics image context UIGraphicsBeginImageContext(newSize); // Tell the old image to draw in this new context, with the desired // new size [image drawInRect:CGRectMake(0,0,newSize.width,newSize.height)]; // Get the new image from the context UIImage* newImage = UIGraphicsGetImageFromCurrentImageContext(); // End the context UIGraphicsEndImageContext(); // Return the new image. return newImage; } This method is very simple, and works great. It will also deal with the UIImageOrientation for you, meaning that you don't have to care whether the camera was sideways when the picture was taken. However, this method is not thread safe, and since thumbnailing is a relatively expensive operation (approximately ~2.5s on a 3G for a 1600 x 1200 pixel image), this is very much an operation you may want to do in the background, on a separate thread. Method 2: Using CoreGraphics + (UIImage*)imageWithImage:(UIImage*)sourceImage scaledToSize:(CGSize)newSize; { CGFloat targetWidth = targetSize.width; CGFloat targetHeight = targetSize.height; CGImageRef imageRef = [sourceImage CGImage]; CGBitmapInfo bitmapInfo = CGImageGetBitmapInfo(imageRef); CGColorSpaceRef colorSpaceInfo = CGImageGetColorSpace(imageRef); if (bitmapInfo == kCGImageAlphaNone) { bitmapInfo = kCGImageAlphaNoneSkipLast; } CGContextRef bitmap; if (sourceImage.imageOrientation == UIImageOrientationUp || sourceImage.imageOrientation == UIImageOrientationDown) { bitmap = CGBitmapContextCreate(NULL, targetWidth, targetHeight, CGImageGetBitsPerComponent(imageRef), CGImageGetBytesPerRow(imageRef), colorSpaceInfo, bitmapInfo); } else { bitmap = CGBitmapContextCreate(NULL, targetHeight, targetWidth, CGImageGetBitsPerComponent(imageRef), CGImageGetBytesPerRow(imageRef), colorSpaceInfo, bitmapInfo); } if (sourceImage.imageOrientation == UIImageOrientationLeft) { CGContextRotateCTM (bitmap, radians(90)); CGContextTranslateCTM (bitmap, 0, -targetHeight); } else if (sourceImage.imageOrientation == UIImageOrientationRight) { CGContextRotateCTM (bitmap, radians(-90)); CGContextTranslateCTM (bitmap, -targetWidth, 0); } else if (sourceImage.imageOrientation == UIImageOrientationUp) { // NOTHING } else if (sourceImage.imageOrientation == UIImageOrientationDown) { CGContextTranslateCTM (bitmap, targetWidth, targetHeight); CGContextRotateCTM (bitmap, radians(-180.)); } CGContextDrawImage(bitmap, CGRectMake(0, 0, targetWidth, targetHeight), imageRef); CGImageRef ref = CGBitmapContextCreateImage(bitmap); UIImage* newImage = [UIImage imageWithCGImage:ref]; CGContextRelease(bitmap); CGImageRelease(ref); return newImage; } The benefit of this method is that it is thread-safe, plus it takes care of all the small things (using correct color space and bitmap info, dealing with image orientation) that the UIKit version does. How Do I Resize and Maintain Aspect Ratio (like the AspectFill option)? It is very similar to the method above, and it looks like this: + (UIImage*)imageWithImage:(UIImage*)sourceImage scaledToSizeWithSameAspectRatio:(CGSize)targetSize; { CGSize imageSize = sourceImage.size; CGFloat width = imageSize.width; CGFloat height = imageSize.height; CGFloat targetWidth = targetSize.width; CGFloat targetHeight = targetSize.height; CGFloat scaleFactor = 0.0; CGFloat scaledWidth = targetWidth; CGFloat scaledHeight = targetHeight; CGPoint thumbnailPoint = CGPointMake(0.0,0.0); if (CGSizeEqualToSize(imageSize, targetSize) == NO) { CGFloat widthFactor = targetWidth / width; CGFloat heightFactor = targetHeight / height; if (widthFactor > heightFactor) { scaleFactor = widthFactor; // scale to fit height } else { scaleFactor = heightFactor; // scale to fit width } scaledWidth = width * scaleFactor; scaledHeight = height * scaleFactor; // center the image if (widthFactor > heightFactor) { thumbnailPoint.y = (targetHeight - scaledHeight) * 0.5; } else if (widthFactor < heightFactor) { thumbnailPoint.x = (targetWidth - scaledWidth) * 0.5; } } CGImageRef imageRef = [sourceImage CGImage]; CGBitmapInfo bitmapInfo = CGImageGetBitmapInfo(imageRef); CGColorSpaceRef colorSpaceInfo = CGImageGetColorSpace(imageRef); if (bitmapInfo == kCGImageAlphaNone) { bitmapInfo = kCGImageAlphaNoneSkipLast; } CGContextRef bitmap; if (sourceImage.imageOrientation == UIImageOrientationUp || sourceImage.imageOrientation == UIImageOrientationDown) { bitmap = CGBitmapContextCreate(NULL, targetWidth, targetHeight, CGImageGetBitsPerComponent(imageRef), CGImageGetBytesPerRow(imageRef), colorSpaceInfo, bitmapInfo); } else { bitmap = CGBitmapContextCreate(NULL, targetHeight, targetWidth, CGImageGetBitsPerComponent(imageRef), CGImageGetBytesPerRow(imageRef), colorSpaceInfo, bitmapInfo); } // In the right or left cases, we need to switch scaledWidth and scaledHeight, // and also the thumbnail point if (sourceImage.imageOrientation == UIImageOrientationLeft) { thumbnailPoint = CGPointMake(thumbnailPoint.y, thumbnailPoint.x); CGFloat oldScaledWidth = scaledWidth; scaledWidth = scaledHeight; scaledHeight = oldScaledWidth; CGContextRotateCTM (bitmap, radians(90)); CGContextTranslateCTM (bitmap, 0, -targetHeight); } else if (sourceImage.imageOrientation == UIImageOrientationRight) { thumbnailPoint = CGPointMake(thumbnailPoint.y, thumbnailPoint.x); CGFloat oldScaledWidth = scaledWidth; scaledWidth = scaledHeight; scaledHeight = oldScaledWidth; CGContextRotateCTM (bitmap, radians(-90)); CGContextTranslateCTM (bitmap, -targetWidth, 0); } else if (sourceImage.imageOrientation == UIImageOrientationUp) { // NOTHING } else if (sourceImage.imageOrientation == UIImageOrientationDown) { CGContextTranslateCTM (bitmap, targetWidth, targetHeight); CGContextRotateCTM (bitmap, radians(-180.)); } CGContextDrawImage(bitmap, CGRectMake(thumbnailPoint.x, thumbnailPoint.y, scaledWidth, scaledHeight), imageRef); CGImageRef ref = CGBitmapContextCreateImage(bitmap); UIImage* newImage = [UIImage imageWithCGImage:ref]; CGContextRelease(bitmap); CGImageRelease(ref); return newImage; } The method we employ here is to create a bitmap with the desired size, but draw an image that is actually larger, thus maintaining the aspect ratio. So We've Got Our Scaled Images - How Do I Save Them To Disk? This is pretty simple. Remember that we want to save a compressed version to disk, and not the uncompressed pixels. Apple provides two functions that help us with this (documentation is here): NSData* UIImagePNGRepresentation(UIImage *image); NSData* UIImageJPEGRepresentation (UIImage *image, CGFloat compressionQuality); And if you want to use them, you'd do something like: UIImage* myThumbnail = ...; // Get some image NSData* imageData = UIImagePNGRepresentation(myThumbnail); Now we're ready to save it to disk, which is the final step (say into the documents directory): // Give a name to the file NSString* imageName = @"MyImage.png"; // Now, we have to find the documents directory so we can save it // Note that you might want to save it elsewhere, like the cache directory, // or something similar. NSArray* paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES); NSString* documentsDirectory = [paths objectAtIndex:0]; // Now we get the full path to the file NSString* fullPathToFile = [documentsDirectory stringByAppendingPathComponent:imageName]; // and then we write it out [imageData writeToFile:fullPathToFile atomically:NO]; You would repeat this for every version of the image you have. How Do I Load These Images Back Into Memory? Just look at the various UIImage initialization methods, such as +imageWithContentsOfFile: in the Apple documentation.

Read the article

Search Results

Search found 1614 results on 65 pages for 'emps (expensive managemen'.

Page 65/65 | < Previous Page | 61 62 63 64 65

- by Rick Strahl

- by Ali Bahrami

- by Shaun

- by Rhames

- by dhanke

- by ngl5000

- by Timo Huovinen

- by Michael Williamson

- by Thomas Weller

- by john.rose

- by Alan Smith

- by rchrd

- by John Magnolia

- by Itay

< Previous Page | 61 62 63 64 65