Search Results

Search found 7216 results on 289 pages for 'low cost'.

Page 287/289 | < Previous Page | 283 284 285 286 287 288 289 | Next Page >

The broken Promise of the Mobile Web

- by Rick Strahl

High end mobile devices have been with us now for almost 7 years and they have utterly transformed the way we access information. Mobile phones and smartphones that have access to the Internet and host smart applications are in the hands of a large percentage of the population of the world. In many places even very remote, cell phones and even smart phones are a common sight. I’ll never forget when I was in India in 2011 I was up in the Southern Indian mountains riding an elephant out of a tiny local village, with an elephant herder in front riding atop of the elephant in front of us. He was dressed in traditional garb with the loin wrap and head cloth/turban as did quite a few of the locals in this small out of the way and not so touristy village. So we’re slowly trundling along in the forest and he’s lazily using his stick to guide the elephant and… 10 minutes in he pulls out his cell phone from his sash and starts texting. In the middle of texting a huge pig jumps out from the side of the trail and he takes a picture running across our path in the jungle! So yeah, mobile technology is very pervasive and it’s reached into even very buried and unexpected parts of this world. Apps are still King Apps currently rule the roost when it comes to mobile devices and the applications that run on them. If there’s something that you need on your mobile device your first step usually is to look for an app, not use your browser. But native app development remains a pain in the butt, with the requirement to have to support 2 or 3 completely separate platforms. There are solutions that try to bridge that gap. Xamarin is on a tear at the moment, providing their cross-device toolkit to build applications using C#. While Xamarin tools are impressive – and also *very* expensive – they only address part of the development madness that is app development. There are still specific device integration isssues, dealing with the different developer programs, security and certificate setups and all that other noise that surrounds app development. There’s also PhoneGap/Cordova which provides a hybrid solution that involves creating local HTML/CSS/JavaScript based applications, and then packaging them to run in a specialized App container that can run on most mobile device platforms using a WebView interface. This allows for using of HTML technology, but it also still requires all the set up, configuration of APIs, security keys and certification and submission and deployment process just like native applications – you actually lose many of the benefits that Web based apps bring. The big selling point of Cordova is that you get to use HTML have the ability to build your UI once for all platforms and run across all of them – but the rest of the app process remains in place. Apps can be a big pain to create and manage especially when we are talking about specialized or vertical business applications that aren’t geared at the mainstream market and that don’t fit the ‘store’ model. If you’re building a small intra department application you don’t want to deal with multiple device platforms and certification etc. for various public or corporate app stores. That model is simply not a good fit both from the development and deployment perspective. Even for commercial, big ticket apps, HTML as a UI platform offers many advantages over native, from write-once run-anywhere, to remote maintenance, single point of management and failure to having full control over the application as opposed to have the app store overloads censor you. In a lot of ways Web based HTML/CSS/JavaScript applications have so much potential for building better solutions based on existing Web technologies for the very same reasons a lot of content years ago moved off the desktop to the Web. To me the Web as a mobile platform makes perfect sense, but the reality of today’s Mobile Web unfortunately looks a little different… Where’s the Love for the Mobile Web? Yet here we are in the middle of 2014, nearly 7 years after the first iPhone was released and brought the promise of rich interactive information at your fingertips, and yet we still don’t really have a solid mobile Web platform. I know what you’re thinking: “But we have lots of HTML/JavaScript/CSS features that allows us to build nice mobile interfaces”. I agree to a point – it’s actually quite possible to build nice looking, rich and capable Web UI today. We have media queries to deal with varied display sizes, CSS transforms for smooth animations and transitions, tons of CSS improvements in CSS 3 that facilitate rich layout, a host of APIs geared towards mobile device features and lately even a number of JavaScript framework choices that facilitate development of multi-screen apps in a consistent manner. Personally I’ve been working a lot with AngularJs and heavily modified Bootstrap themes to build mobile first UIs and that’s been working very well to provide highly usable and attractive UI for typical mobile business applications. From the pure UI perspective things actually look very good. Not just about the UI But it’s not just about the UI - it’s also about integration with the mobile device. When it comes to putting all those pieces together into what amounts to a consolidated platform to build mobile Web applications, I think we still have a ways to go… there are a lot of missing pieces to make it all work together and integrate with the device more smoothly, and more importantly to make it work uniformly across the majority of devices. I think there are a number of reasons for this. Slow Standards Adoption HTML standards implementations and ratification has been dreadfully slow, and browser vendors all seem to pick and choose different pieces of the technology they implement. The end result is that we have a capable UI platform that’s missing some of the infrastructure pieces to make it whole on mobile devices. There’s lots of potential but what is lacking that final 10% to build truly compelling mobile applications that can compete favorably with native applications. Some of it is the fragmentation of browsers and the slow evolution of the mobile specific HTML APIs. A host of mobile standards exist but many of the standards are in the early review stage and they have been there stuck for long periods of time and seem to move at a glacial pace. Browser vendors seem even slower to implement them, and for good reason – non-ratified standards mean that implementations may change and vendor implementations tend to be experimental and likely have to be changed later. Neither Vendors or developers are not keen on changing standards. This is the typical chicken and egg scenario, but without some forward momentum from some party we end up stuck in the mud. It seems that either the standards bodies or the vendors need to carry the torch forward and that doesn’t seem to be happening quickly enough. Mobile Device Integration just isn’t good enough Current standards are not far reaching enough to address a number of the use case scenarios necessary for many mobile applications. While not every application needs to have access to all mobile device features, almost every mobile application could benefit from some integration with other parts of the mobile device platform. Integration with GPS, phone, media, messaging, notifications, linking and contacts system are benefits that are unique to mobile applications and could be widely used, but are mostly (with the exception of GPS) inaccessible for Web based applications today. Unfortunately trying to do most of this today only with a mobile Web browser is a losing battle. Aside from PhoneGap/Cordova’s app centric model with its own custom API accessing mobile device features and the token exception of the GeoLocation API, most device integration features are not widely supported by the current crop of mobile browsers. For example there’s no usable messaging API that allows access to SMS or contacts from HTML. Even obvious components like the Media Capture API are only implemented partially by mobile devices. There are alternatives and workarounds for some of these interfaces by using browser specific code, but that’s might ugly and something that I thought we were trying to leave behind with newer browser standards. But it’s not quite working out that way. It’s utterly perplexing to me that mobile standards like Media Capture and Streams, Media Gallery Access, Responsive Images, Messaging API, Contacts Manager API have only minimal or no traction at all today. Keep in mind we’ve had mobile browsers for nearly 7 years now, and yet we still have to think about how to get access to an image from the image gallery or the camera on some devices? Heck Windows Phone IE Mobile just gained the ability to upload images recently in the Windows 8.1 Update – that’s feature that HTML has had for 20 years! These are simple concepts and common problems that should have been solved a long time ago. It’s extremely frustrating to see build 90% of a mobile Web app with relative ease and then hit a brick wall for the remaining 10%, which often can be show stoppers. The remaining 10% have to do with platform integration, browser differences and working around the limitations that browsers and ‘pinned’ applications impose on HTML applications. The maddening part is that these limitations seem arbitrary as they could easily work on all mobile platforms. For example, SMS has a URL Moniker interface that sort of works on Android, works badly with iOS (only works if the address is already in the contact list) and not at all on Windows Phone. There’s no reason this shouldn’t work universally using the same interface – after all all phones have supported SMS since before the year 2000! But, it doesn’t have to be this way Change can happen very quickly. Take the GeoLocation API for example. Geolocation has taken off at the very beginning of the mobile device era and today it works well, provides the necessary security (a big concern for many mobile APIs), and is supported by just about all major mobile and even desktop browsers today. It handles security concerns via prompts to avoid unwanted access which is a model that would work for most other device APIs in a similar fashion. One time approval and occasional re-approval if code changes or caches expire. Simple and only slightly intrusive. It all works well, even though GeoLocation actually has some physical limitations, such as representing the current location when no GPS device is present. Yet this is a solved problem, where other APIs that are conceptually much simpler to implement have failed to gain any traction at all. Technically none of these APIs should be a problem to implement, but it appears that the momentum is just not there. Inadequate Web Application Linking and Activation Another important piece of the puzzle missing is the integration of HTML based Web applications. Today HTML based applications are not first class citizens on mobile operating systems. When talking about HTML based content there’s a big difference between content and applications. Content is great for search engine discovery and plain browser usage. Content is usually accessed intermittently and permanent linking is not so critical for this type of content. But applications have different needs. Applications need to be started up quickly and must be easily switchable to support a multi-tasking user workflow. Therefore, it’s pretty crucial that mobile Web apps are integrated into the underlying mobile OS and work with the standard task management features. Unfortunately this integration is not as smooth as it should be. It starts with actually trying to find mobile Web applications, to ‘installing’ them onto a phone in an easily accessible manner in a prominent position. The experience of discovering a Mobile Web ‘App’ and making it sticky is by no means as easy or satisfying. Today the way you’d go about this is: Open the browser Search for a Web Site in the browser with your search engine of choice Hope that you find the right site Hope that you actually find a site that works for your mobile device Click on the link and run the app in a fully chrome’d browser instance (read tiny surface area) Pin the app to the home screen (with all the limitations outline above) Hope you pointed at the right URL when you pinned Even for you and me as developers, there are a few steps in there that are painful and annoying, but think about the average user. First figuring out how to search for a specific site or URL? And then pinning the app and hopefully from the right location? You’ve probably lost more than half of your audience at that point. This experience sucks. For developers too this process is painful since app developers can’t control the shortcut creation directly. This problem often gets solved by crazy coding schemes, with annoying pop-ups that try to get people to create shortcuts via fancy animations that are both annoying and add overhead to each and every application that implements this sort of thing differently. And that’s not the end of it - getting the link onto the home screen with an application icon varies quite a bit between browsers. Apple’s non-standard meta tags are prominent and they work with iOS and Android (only more recent versions), but not on Windows Phone. Windows Phone instead requires you to create an actual screen or rather a partial screen be captured for a shortcut in the tile manager. Who had that brilliant idea I wonder? Surprisingly Chrome on recent Android versions seems to actually get it right – icons use pngs, pinning is easy and pinned applications properly behave like standalone apps and retain the browser’s active page state and content. Each of the platforms has a different way to specify icons (WP doesn’t allow you to use an icon image at all), and the most widely used interface in use today is a bunch of Apple specific meta tags that other browsers choose to support. The question is: Why is there no standard implementation for installing shortcuts across mobile platforms using an official format rather than a proprietary one? Then there’s iOS and the crazy way it treats home screen linked URLs using a crazy hybrid format that is neither as capable as a Web app running in Safari nor a WebView hosted application. Moving off the Web ‘app’ link when switching to another app actually causes the browser and preview it to ‘blank out’ the Web application in the Task View (see screenshot on the right). Then, when the ‘app’ is reactivated it ends up completely restarting the browser with the original link. This is crazy behavior that you can’t easily work around. In some situations you might be able to store the application state and restore it using LocalStorage, but for many scenarios that involve complex data sources (like say Google Maps) that’s not a possibility. The only reason for this screwed up behavior I can think of is that it is deliberate to make Web apps a pain in the butt to use and forcing users trough the App Store/PhoneGap/Cordova route. App linking and management is a very basic problem – something that we essentially have solved in every desktop browser – yet on mobile devices where it arguably matters a lot more to have easy access to web content we have to jump through hoops to have even a remotely decent linking/activation experience across browsers. Where’s the Money? It’s not surprising that device home screen integration and Mobile Web support in general is in such dismal shape – the mobile OS vendors benefit financially from App store sales and have little to gain from Web based applications that bypass the App store and the cash cow that it presents. On top of that, platform specific vendor lock-in of both end users and developers who have invested in hardware, apps and consumables is something that mobile platform vendors actually aspire to. Web based interfaces that are cross-platform are the anti-thesis of that and so again it’s no surprise that the mobile Web is on a struggling path. But – that may be changing. More and more we’re seeing operations shifting to services that are subscription based or otherwise collect money for usage, and that may drive more progress into the Web direction in the end . Nothing like the almighty dollar to drive innovation forward. Do we need a Mobile Web App Store? As much as I dislike moderated experiences in today’s massive App Stores, they do at least provide one single place to look for apps for your device. I think we could really use some sort of registry, that could provide something akin to an app store for mobile Web apps, to make it easier to actually find mobile applications. This could take the form of a specialized search engine, or maybe a more formal store/registry like structure. Something like apt-get/chocolatey for Web apps. It could be curated and provide at least some feedback and reviews that might help with the integrity of applications. Coupled to that could be a native application on each platform that would allow searching and browsing of the registry and then also handle installation in the form of providing the home screen linking, plus maybe an initial security configuration that determines what features are allowed access to for the app. I’m not holding my breath. In order for this sort of thing to take off and gain widespread appeal, a lot of coordination would be required. And in order to get enough traction it would have to come from a well known entity – a mobile Web app store from a no name source is unlikely to gain high enough usage numbers to make a difference. In a way this would eliminate some of the freedom of the Web, but of course this would also be an optional search path in addition to the standard open Web search mechanisms to find and access content today. Security Security is a big deal, and one of the perceived reasons why so many IT professionals appear to be willing to go back to the walled garden of deployed apps is that Apps are perceived as safe due to the official review and curation of the App stores. Curated stores are supposed to protect you from malware, illegal and misleading content. It doesn’t always work out that way and all the major vendors have had issues with security and the review process at some time or another. Security is critical, but I also think that Web applications in general pose less of a security threat than native applications, by nature of the sandboxed browser and JavaScript environments. Web applications run externally completely and in the HTML and JavaScript sandboxes, with only a very few controlled APIs allowing access to device specific features. And as discussed earlier – security for any device interaction can be granted the same for mobile applications through a Web browser, as they can for native applications either via explicit policies loaded from the Web, or via prompting as GeoLocation does today. Security is important, but it’s certainly solvable problem for Web applications even those that need to access device hardware. Security shouldn’t be a reason for Web apps to be an equal player in mobile applications. Apps are winning, but haven’t we been here before? So now we’re finding ourselves back in an era of installed app, rather than Web based and managed apps. Only it’s even worse today than with Desktop applications, in that the apps are going through a gatekeeper that charges a toll and censors what you can and can’t do in your apps. Frankly it’s a mystery to me why anybody would buy into this model and why it’s lasted this long when we’ve already been through this process. It’s crazy… It’s really a shame that this regression is happening. We have the technology to make mobile Web apps much more prominent, but yet we’re basically held back by what seems little more than bureaucracy, partisan bickering and self interest of the major parties involved. Back in the day of the desktop it was Internet Explorer’s 98+% market shareholding back the Web from improvements for many years – now it’s the combined mobile OS market in control of the mobile browsers. If mobile Web apps were allowed to be treated the same as native apps with simple ways to install and run them consistently and persistently, that would go a long way to making mobile applications much more usable and seriously viable alternatives to native apps. But as it is mobile apps have a severe disadvantage in placement and operation. There are a few bright spots in all of this. Mozilla’s FireFoxOs is embracing the Web for it’s mobile OS by essentially building every app out of HTML and JavaScript based content. It supports both packaged and certified package modes (that can be put into the app store), and Open Web apps that are loaded and run completely off the Web and can also cache locally for offline operation using a manifest. Open Web apps are treated as full class citizens in FireFoxOS and run using the same mechanism as installed apps. Unfortunately FireFoxOs is getting a slow start with minimal device support and specifically targeting the low end market. We can hope that this approach will change and catch on with other vendors, but that’s also an uphill battle given the conflict of interest with platform lock in that it represents. Recent versions of Android also seem to be working reasonably well with mobile application integration onto the desktop and activation out of the box. Although it still uses the Apple meta tags to find icons and behavior settings, everything at least works as you would expect – icons to the desktop on pinning, WebView based full screen activation, and reliable application persistence as the browser/app is treated like a real application. Hopefully iOS will at some point provide this same level of rudimentary Web app support. What’s also interesting to me is that Microsoft hasn’t picked up on the obvious need for a solid Web App platform. Being a distant third in the mobile OS war, Microsoft certainly has nothing to lose and everything to gain by using fresh ideas and expanding into areas that the other major vendors are neglecting. But instead Microsoft is trying to beat the market leaders at their own game, fighting on their adversary’s terms instead of taking a new tack. Providing a kick ass mobile Web platform that takes the lead on some of the proposed mobile APIs would be something positive that Microsoft could do to improve its miserable position in the mobile device market. Where are we at with Mobile Web? It sure sounds like I’m really down on the Mobile Web, right? I’ve built a number of mobile apps in the last year and while overall result and response has been very positive to what we were able to accomplish in terms of UI, getting that final 10% that required device integration dialed was an absolute nightmare on every single one of them. Big compromises had to be made and some features were left out or had to be modified for some devices. In two cases we opted to go the Cordova route in order to get the integration we needed, along with the extra pain involved in that process. Unless you’re not integrating with device features and you don’t care deeply about a smooth integration with the mobile desktop, mobile Web development is fraught with frustration. So, yes I’m frustrated! But it’s not for lack of wanting the mobile Web to succeed. I am still a firm believer that we will eventually arrive a much more functional mobile Web platform that allows access to the most common device features in a sensible way. It wouldn't be difficult for device platform vendors to make Web based applications first class citizens on mobile devices. But unfortunately it looks like it will still be some time before this happens. So, what’s your experience building mobile Web apps? Are you finding similar issues? Just giving up on raw Web applications and building PhoneGap apps instead? Completely skipping the Web and going native? Leave a comment for discussion. Resources Rick Strahl on DotNet Rocks talking about Mobile Web© Rick Strahl, West Wind Technologies, 2005-2014Posted in HTML5 Mobile Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Quick guide to Oracle IRM 11g: Classification design

- by Simon Thorpe

Quick guide to Oracle IRM 11g indexThis is the final article in the quick guide to Oracle IRM. If you've followed everything prior you will now have a fully functional and tested Information Rights Management service. It doesn't matter if you've been following the 10g or 11g guide as this next article is common to both. ContentsWhy this is the most important part... Understanding the classification and standard rights model Identifying business use cases Creating an effective IRM classification modelOne single classification across the entire businessA context for each and every possible granular use caseWhat makes a good context? Deciding on the use of roles in the context Reviewing the features and security for context roles Summary Why this is the most important part...Now the real work begins, installing and getting an IRM system running is as simple as following instructions. However to actually have an IRM technology easily protecting your most sensitive information without interfering with your users existing daily work flows and be able to scale IRM across the entire business, requires thought into how confidential documents are created, used and distributed. This article is going to give you the information you need to ask the business the right questions so that you can deploy your IRM service successfully. The IRM team here at Oracle have over 10 years of experience in helping customers and it is important you understand the following to be successful in securing access to your most confidential information. Whatever you are trying to secure, be it mergers and acquisitions information, engineering intellectual property, health care documentation or financial reports. No matter what type of user is going to access the information, be they employees, contractors or customers, there are common goals you are always trying to achieve.Securing the content at the earliest point possible and do it automatically. Removing the dependency on the user to decide to secure the content reduces the risk of mistakes significantly and therefore results a more secure deployment. K.I.S.S. (Keep It Simple Stupid) Reduce complexity in the rights/classification model. Oracle IRM lets you make changes to access to documents even after they are secured which allows you to start with a simple model and then introduce complexity once you've understood how the technology is going to be used in the business. After an initial learning period you can review your implementation and start to make informed decisions based on user feedback and administration experience. Clearly communicate to the user, when appropriate, any changes to their existing work practice. You must make every effort to make the transition to sealed content as simple as possible. For external users you must help them understand why you are securing the documents and inform them the value of the technology to both your business and them. Before getting into the detail, I must pay homage to Martin White, Vice President of client services in SealedMedia, the company Oracle acquired and who created Oracle IRM. In the SealedMedia years Martin was involved with every single customer and was key to the design of certain aspects of the IRM technology, specifically the context model we will be discussing here. Listening carefully to customers and understanding the flexibility of the IRM technology, Martin taught me all the skills of helping customers build scalable, effective and simple to use IRM deployments. No matter how well the engineering department designed the software, badly designed and poorly executed projects can result in difficult to use and manage, and ultimately insecure solutions. The advice and information that follows was born with Martin and he's still delivering IRM consulting with customers and can be found at www.thinkers.co.uk. It is from Martin and others that Oracle not only has the most advanced, scalable and usable document security solution on the market, but Oracle and their partners have the most experience in delivering successful document security solutions. Understanding the classification and standard rights model The goal of any successful IRM deployment is to balance the increase in security the technology brings without over complicating the way people use secured content and avoid a significant increase in administration and maintenance. With Oracle it is possible to automate the protection of content, deploy the desktop software transparently and use authentication methods such that users can open newly secured content initially unaware the document is any different to an insecure one. That is until of course they attempt to do something for which they don't have any rights, such as copy and paste to an insecure application or try and print. Central to achieving this objective is creating a classification model that is simple to understand and use but also provides the right level of complexity to meet the business needs. In Oracle IRM the term used for each classification is a "context". A context defines the relationship between.A group of related documents The people that use the documents The roles that these people perform The rights that these people need to perform their role The context is the key to the success of Oracle IRM. It provides the separation of the role and rights of a user from the content itself. Documents are sealed to contexts but none of the rights, user or group information is stored within the content itself. Sealing only places information about the location of the IRM server that sealed it, the context applied to the document and a few other pieces of metadata that pertain only to the document. This important separation of rights from content means that millions of documents can be secured against a single classification and a user needs only one right assigned to be able to access all documents. If you have followed all the previous articles in this guide, you will be ready to start defining contexts to which your sensitive information will be protected. But before you even start with IRM, you need to understand how your own business uses and creates sensitive documents and emails. Identifying business use cases Oracle is able to support multiple classification systems, but usually there is one single initial need for the technology which drives a deployment. This need might be to protect sensitive mergers and acquisitions information, engineering intellectual property, financial documents. For this and every subsequent use case you must understand how users create and work with documents, to who they are distributed and how the recipients should interact with them. A successful IRM deployment should start with one well identified use case (we go through some examples towards the end of this article) and then after letting this use case play out in the business, you learn how your users work with content, how well your communication to the business worked and if the classification system you deployed delivered the right balance. It is at this point you can start rolling the technology out further. Creating an effective IRM classification model Once you have selected the initial use case you will address with IRM, you need to design a classification model that defines the access to secured documents within the use case. In Oracle IRM there is an inbuilt classification system called the "context" model. In Oracle IRM 11g it is possible to extend the server to support any rights classification model, but the majority of users who are not using an application integration (such as Oracle IRM within Oracle Beehive) are likely to be starting out with the built in context model. Before looking at creating a classification system with IRM, it is worth reviewing some recognized standards and methods for creating and implementing security policy. A very useful set of documents are the ISO 17799 guidelines and the SANS security policy templates. First task is to create a context against which documents are to be secured. A context consists of a group of related documents (all top secret engineering research), a list of roles (contributors and readers) which define how users can access documents and a list of users (research engineers) who have been given a role allowing them to interact with sealed content. Before even creating the first context it is wise to decide on a philosophy which will dictate the level of granularity, the question is, where do you start? At a department level? By project? By technology? First consider the two ends of the spectrum... One single classification across the entire business Imagine that instead of having separate contexts, one for engineering intellectual property, one for your financial data, one for human resources personally identifiable information, you create one context for all documents across the entire business. Whilst you may have immediate objections, there are some significant benefits in thinking about considering this. Document security classification decisions are simple. You only have one context to chose from! User provisioning is simple, just make sure everyone has a role in the only context in the business. Administration is very low, if you assign rights to groups from the business user repository you probably never have to touch IRM administration again. There are however some obvious downsides to this model.All users in have access to all IRM secured content. So potentially a sales person could access sensitive mergers and acquisition documents, if they can get their hands on a copy that is. You cannot delegate control of different documents to different parts of the business, this may not satisfy your regulatory requirements for the separation and delegation of duties. Changing a users role affects every single document ever secured. Even though it is very unlikely a business would ever use one single context to secure all their sensitive information, thinking about this scenario raises one very important point. Just having one single context and securing all confidential documents to it, whilst incurring some of the problems detailed above, has one huge value. Once secured, IRM protected content can ONLY be accessed by authorized users. Just think of all the sensitive documents in your business today, imagine if you could ensure that only everyone you trust could open them. Even if an employee lost a laptop or someone accidentally sent an email to the wrong recipient, only the right people could open that file. A context for each and every possible granular use case Now let's think about the total opposite of a single context design. What if you created a context for each and every single defined business need and created multiple contexts within this for each level of granularity? Let's take a use case where we need to protect engineering intellectual property. Imagine we have 6 different engineering groups, and in each we have a research department, a design department and manufacturing. The company information security policy defines 3 levels of information sensitivity... restricted, confidential and top secret. Then let's say that each group and department needs to define access to information from both internal and external users. Finally add into the mix that they want to review the rights model for each context every financial quarter. This would result in a huge amount of contexts. For example, lets just look at the resulting contexts for one engineering group. Q1FY2010 Restricted Internal - Engineering Group 1 - Research Q1FY2010 Restricted Internal - Engineering Group 1 - Design Q1FY2010 Restricted Internal - Engineering Group 1 - Manufacturing Q1FY2010 Restricted External- Engineering Group 1 - Research Q1FY2010 Restricted External - Engineering Group 1 - Design Q1FY2010 Restricted External - Engineering Group 1 - Manufacturing Q1FY2010 Confidential Internal - Engineering Group 1 - Research Q1FY2010 Confidential Internal - Engineering Group 1 - Design Q1FY2010 Confidential Internal - Engineering Group 1 - Manufacturing Q1FY2010 Confidential External - Engineering Group 1 - Research Q1FY2010 Confidential External - Engineering Group 1 - Design Q1FY2010 Confidential External - Engineering Group 1 - Manufacturing Q1FY2010 Top Secret Internal - Engineering Group 1 - Research Q1FY2010 Top Secret Internal - Engineering Group 1 - Design Q1FY2010 Top Secret Internal - Engineering Group 1 - Manufacturing Q1FY2010 Top Secret External - Engineering Group 1 - Research Q1FY2010 Top Secret External - Engineering Group 1 - Design Q1FY2010 Top Secret External - Engineering Group 1 - Manufacturing Now multiply the above by 6 for each engineering group, 18 contexts. You are then creating/reviewing another 18 every 3 months. After a year you've got 72 contexts. What would be the advantages of such a complex classification model? You can satisfy very granular rights requirements, for example only an authorized engineering group 1 researcher can create a top secret report for access internally, and his role will be reviewed on a very frequent basis. Your business may have very complex rights requirements and mapping this directly to IRM may be an obvious exercise. The disadvantages of such a classification model are significant...Huge administrative overhead. Someone in the business must manage, review and administrate each of these contexts. If the engineering group had a single administrator, they would have 72 classifications to reside over each year. From an end users perspective life will be very confusing. Imagine if a user has rights in just 6 of these contexts. They may be able to print content from one but not another, be able to edit content in 2 contexts but not the other 4. Such confusion at the end user level causes frustration and resistance to the use of the technology. Increased synchronization complexity. Imagine a user who after 3 years in the company ends up with over 300 rights in many different contexts across the business. This would result in long synchronization times as the client software updates all your offline rights. Hard to understand who can do what with what. Imagine being the VP of engineering and as part of an internal security audit you are asked the question, "What rights to researchers have to our top secret information?". In this complex model the answer is not simple, it would depend on many roles in many contexts. Of course this example is extreme, but it highlights that trying to build many barriers in your business can result in a nightmare of administration and confusion amongst users. In the real world what we need is a balance of the two. We need to seek an optimum number of contexts. Too many contexts are unmanageable and too few contexts does not give fine enough granularity. What makes a good context? Good context design derives mainly from how well you understand your business requirements to secure access to confidential information. Some customers I have worked with can tell me exactly the documents they wish to secure and know exactly who should be opening them. However there are some customers who know only of the government regulation that requires them to control access to certain types of information, they don't actually know where the documents are, how they are created or understand exactly who should have access. Therefore you need to know how to ask the business the right questions that lead to information which help you define a context. First ask these questions about a set of documentsWhat is the topic? Who are legitimate contributors on this topic? Who are the authorized readership? If the answer to any one of these is significantly different, then it probably merits a separate context. Remember that sealed documents are inherently secure and as such they cannot leak to your competitors, therefore it is better sealed to a broad context than not sealed at all. Simplicity is key here. Always revert to the first extreme example of a single classification, then work towards essential complexity. If there is any doubt, always prefer fewer contexts. Remember, Oracle IRM allows you to change your mind later on. You can implement a design now and continue to change and refine as you learn how the technology is used. It is easy to go from a simple model to a more complex one, it is much harder to take a complex model that is already embedded in the work practice of users and try to simplify it. It is also wise to take a single use case and address this first with the business. Don't try and tackle many different problems from the outset. Do one, learn from the process, refine it and then take what you have learned into the next use case, refine and continue. Once you have a good grasp of the technology and understand how your business will use it, you can then start rolling out the technology wider across the business. Deciding on the use of roles in the context Once you have decided on that first initial use case and a context to create let's look at the details you need to decide upon. For each context, identify; Administrative rolesBusiness owner, the person who makes decisions about who may or may not see content in this context. This is often the person who wanted to use IRM and drove the business purchase. They are the usually the person with the most at risk when sensitive information is lost. Point of contact, the person who will handle requests for access to content. Sometimes the same as the business owner, sometimes a trusted secretary or administrator. Context administrator, the person who will enact the decisions of the Business Owner. Sometimes the point of contact, sometimes a trusted IT person. Document related rolesContributors, the people who create and edit documents in this context. Reviewers, the people who are involved in reviewing documents but are not trusted to secure information to this classification. This role is not always necessary. (See later discussion on Published-work and Work-in-Progress) Readers, the people who read documents from this context. Some people may have several of the roles above, which is fine. What you are trying to do is understand and define how the business interacts with your sensitive information. These roles obviously map directly to roles available in Oracle IRM. Reviewing the features and security for context roles At this point we have decided on a classification of information, understand what roles people in the business will play when administrating this classification and how they will interact with content. The final piece of the puzzle in getting the information for our first context is to look at the permissions people will have to sealed documents. First think why are you protecting the documents in the first place? It is to prevent the loss of leaking of information to the wrong people. To control the information, making sure that people only access the latest versions of documents. You are not using Oracle IRM to prevent unauthorized people from doing legitimate work. This is an important point, with IRM you can erect many barriers to prevent access to content yet too many restrictions and authorized users will often find ways to circumvent using the technology and end up distributing unprotected originals. Because IRM is a security technology, it is easy to get carried away restricting different groups. However I would highly recommend starting with a simple solution with few restrictions. Ensure that everyone who reasonably needs to read documents can do so from the outset. Remember that with Oracle IRM you can change rights to content whenever you wish and tighten security. Always return to the fact that the greatest value IRM brings is that ONLY authorized users can access secured content, remember that simple "one context for the entire business" model. At the start of the deployment you really need to aim for user acceptance and therefore a simple model is more likely to succeed. As time passes and users understand how IRM works you can start to introduce more restrictions and complexity. Another key aspect to focus on is handling exceptions. If you decide on a context model where engineering can only access engineering information, and sales can only access sales data. Act quickly when a sales manager needs legitimate access to a set of engineering documents. Having a quick and effective process for permitting other people with legitimate needs to obtain appropriate access will be rewarded with acceptance from the user community. These use cases can often be satisfied by integrating IRM with a good Identity & Access Management technology which simplifies the process of assigning users the correct business roles. The big print issue... Printing is often an issue of contention, users love to print but the business wants to ensure sensitive information remains in the controlled digital world. There are many cases of physical document loss causing a business pain, it is often overlooked that IRM can help with this issue by limiting the ability to generate physical copies of digital content. However it can be hard to maintain a balance between security and usability when it comes to printing. Consider the following points when deciding about whether to give print rights. Oracle IRM sealed documents can contain watermarks that expose information about the user, time and location of access and the classification of the document. This information would reside in the printed copy making it easier to trace who printed it. Printed documents are slower to distribute in comparison to their digital counterparts, so time sensitive information in printed format may present a lower risk. Print activity is audited, therefore you can monitor and react to users abusing print rights. Summary In summary it is important to think carefully about the way you create your context model. As you ask the business these questions you may get a variety of different requirements. There may be special projects that require a context just for sensitive information created during the lifetime of the project. There may be a department that requires all information in the group is secured and you might have a few senior executives who wish to use IRM to exchange a small number of highly sensitive documents with a very small number of people. Oracle IRM, with its very flexible context classification system, can support all of these use cases. The trick is to introducing the complexity to deliver them at the right level. In another article i'm working on I will go through some examples of how Oracle IRM might map to existing business use cases. But for now, this article covers all the important questions you need to get your IRM service deployed and successfully protecting your most sensitive information.

Read the article
How to find and fix performance problems in ORM powered applications

- by FransBouma

Once in a while we get requests about how to fix performance problems with our framework. As it comes down to following the same steps and looking into the same things every single time, I decided to write a blogpost about it instead, so more people can learn from this and solve performance problems in their O/R mapper powered applications. In some parts it's focused on LLBLGen Pro but it's also usable for other O/R mapping frameworks, as the vast majority of performance problems in O/R mapper powered applications are not specific for a certain O/R mapper framework. Too often, the developer looks at the wrong part of the application, trying to fix what isn't a problem in that part, and getting frustrated that 'things are so slow with <insert your favorite framework X here>'. I'm in the O/R mapper business for a long time now (almost 10 years, full time) and as it's a small world, we O/R mapper developers know almost all tricks to pull off by now: we all know what to do to make task ABC faster and what compromises (because there are almost always compromises) to deal with if we decide to make ABC faster that way. Some O/R mapper frameworks are faster in X, others in Y, but you can be sure the difference is mainly a result of a compromise some developers are willing to deal with and others aren't. That's why the O/R mapper frameworks on the market today are different in many ways, even though they all fetch and save entities from and to a database. I'm not suggesting there's no room for improvement in today's O/R mapper frameworks, there always is, but it's not a matter of 'the slowness of the application is caused by the O/R mapper' anymore. Perhaps query generation can be optimized a bit here, row materialization can be optimized a bit there, but it's mainly coming down to milliseconds. Still worth it if you're a framework developer, but it's not much compared to the time spend inside databases and in user code: if a complete fetch takes 40ms or 50ms (from call to entity object collection), it won't make a difference for your application as that 10ms difference won't be noticed. That's why it's very important to find the real locations of the problems so developers can fix them properly and don't get frustrated because their quest to get a fast, performing application failed. Performance tuning basics and rules Finding and fixing performance problems in any application is a strict procedure with four prescribed steps: isolate, analyze, interpret and fix, in that order. It's key that you don't skip a step nor make assumptions: these steps help you find the reason of a problem which seems to be there, and how to fix it or leave it as-is. Skipping a step, or when you assume things will be bad/slow without doing analysis will lead to the path of premature optimization and won't actually solve your problems, only create new ones. The most important rule of finding and fixing performance problems in software is that you have to understand what 'performance problem' actually means. Most developers will say "when a piece of software / code is slow, you have a performance problem". But is that actually the case? If I write a Linq query which will aggregate, group and sort 5 million rows from several tables to produce a resultset of 10 rows, it might take more than a couple of milliseconds before that resultset is ready to be consumed by other logic. If I solely look at the Linq query, the code consuming the resultset of the 10 rows and then look at the time it takes to complete the whole procedure, it will appear to me to be slow: all that time taken to produce and consume 10 rows? But if you look closer, if you analyze and interpret the situation, you'll see it does a tremendous amount of work, and in that light it might even be extremely fast. With every performance problem you encounter, always do realize that what you're trying to solve is perhaps not a technical problem at all, but a perception problem. The second most important rule you have to understand is based on the old saying "Penny wise, Pound Foolish": the part which takes e.g. 5% of the total time T for a given task isn't worth optimizing if you have another part which takes a much larger part of the total time T for that same given task. Optimizing parts which are relatively insignificant for the total time taken is not going to bring you better results overall, even if you totally optimize that part away. This is the core reason why analysis of the complete set of application parts which participate in a given task is key to being successful in solving performance problems: No analysis -> no problem -> no solution. One warning up front: hunting for performance will always include making compromises. Fast software can be made maintainable, but if you want to squeeze as much performance out of your software, you will inevitably be faced with the dilemma of compromising one or more from the group {readability, maintainability, features} for the extra performance you think you'll gain. It's then up to you to decide whether it's worth it. In almost all cases it's not. The reason for this is simple: the vast majority of performance problems can be solved by implementing the proper algorithms, the ones with proven Big O-characteristics so you know the performance you'll get plus you know the algorithm will work. The time taken by the algorithm implementing code is inevitable: you already implemented the best algorithm. You might find some optimizations on the technical level but in general these are minor. Let's look at the four steps to see how they guide us through the quest to find and fix performance problems. Isolate The first thing you need to do is to isolate the areas in your application which are assumed to be slow. For example, if your application is a web application and a given page is taking several seconds or even minutes to load, it's a good candidate to check out. It's important to start with the isolate step because it allows you to focus on a single code path per area with a clear begin and end and ignore the rest. The rest of the steps are taken per identified problematic area. Keep in mind that isolation focuses on tasks in an application, not code snippets. A task is something that's started in your application by either another task or the user, or another program, and has a beginning and an end. You can see a task as a piece of functionality offered by your application. Analyze Once you've determined the problem areas, you have to perform analysis on the code paths of each area, to see where the performance problems occur and which areas are not the problem. This is a multi-layered effort: an application which uses an O/R mapper typically consists of multiple parts: there's likely some kind of interface (web, webservice, windows etc.), a part which controls the interface and business logic, the O/R mapper part and the RDBMS, all connected with either a network or inter-process connections provided by the OS or other means. Each of these parts, including the connectivity plumbing, eat up a part of the total time it takes to complete a task, e.g. load a webpage with all orders of a given customer X. To understand which parts participate in the task / area we're investigating and how much they contribute to the total time taken to complete the task, analysis of each participating task is essential. Start with the code you wrote which starts the task, analyze the code and track the path it follows through your application. What does the code do along the way, verify whether it's correct or not. Analyze whether you have implemented the right algorithms in your code for this particular area. Remember we're looking at one area at a time, which means we're ignoring all other code paths, just the code path of the current problematic area, from begin to end and back. Don't dig in and start optimizing at the code level just yet. We're just analyzing. If your analysis reveals big architectural stupidity, it's perhaps a good idea to rethink the architecture at this point. For the rest, we're analyzing which means we collect data about what could be wrong, for each participating part of the complete application. Reviewing the code you wrote is a good tool to get deeper understanding of what is going on for a given task but ultimately it lacks precision and overview what really happens: humans aren't good code interpreters, computers are. We therefore need to utilize tools to get deeper understanding about which parts contribute how much time to the total task, triggered by which other parts and for example how many times are they called. There are two different kind of tools which are necessary: .NET profilers and O/R mapper / RDBMS profilers. .NET profiling .NET profilers (e.g. dotTrace by JetBrains or Ants by Red Gate software) show exactly which pieces of code are called, how many times they're called, and the time it took to run that piece of code, at the method level and sometimes even at the line level. The .NET profilers are essential tools for understanding whether the time taken to complete a given task / area in your application is consumed by .NET code, where exactly in your code, the path to that code, how many times that code was called by other code and thus reveals where hotspots are located: the areas where a solution can be found. Importantly, they also reveal which areas can be left alone: remember our penny wise pound foolish saying: if a profiler reveals that a group of methods are fast, or don't contribute much to the total time taken for a given task, ignore them. Even if the code in them is perhaps complex and looks like a candidate for optimization: you can work all day on that, it won't matter. As we're focusing on a single area of the application, it's best to start profiling right before you actually activate the task/area. Most .NET profilers support this by starting the application without starting the profiling procedure just yet. You navigate to the particular part which is slow, start profiling in the profiler, in your application you perform the actions which are considered slow, and afterwards you get a snapshot in the profiler. The snapshot contains the data collected by the profiler during the slow action, so most data is produced by code in the area to investigate. This is important, because it allows you to stay focused on a single area. O/R mapper and RDBMS profiling .NET profilers give you a good insight in the .NET side of things, but not in the RDBMS side of the application. As this article is about O/R mapper powered applications, we're also looking at databases, and the software making it possible to consume the database in your application: the O/R mapper. To understand which parts of the O/R mapper and database participate how much to the total time taken for task T, we need different tools. There are two kind of tools focusing on O/R mappers and database performance profiling: O/R mapper profilers and RDBMS profilers. For O/R mapper profilers, you can look at LLBLGen Prof by hibernating rhinos or the Linq to Sql/LLBLGen Pro profiler by Huagati. Hibernating rhinos also have profilers for other O/R mappers like NHibernate (NHProf) and Entity Framework (EFProf) and work the same as LLBLGen Prof. For RDBMS profilers, you have to look whether the RDBMS vendor has a profiler. For example for SQL Server, the profiler is shipped with SQL Server, for Oracle it's build into the RDBMS, however there are also 3rd party tools. Which tool you're using isn't really important, what's important is that you get insight in which queries are executed during the task / area we're currently focused on and how long they took. Here, the O/R mapper profilers have an advantage as they collect the time it took to execute the query from the application's perspective so they also collect the time it took to transport data across the network. This is important because a query which returns a massive resultset or a resultset with large blob/clob/ntext/image fields takes more time to get transported across the network than a small resultset and a database profiler doesn't take this into account most of the time. Another tool to use in this case, which is more low level and not all O/R mappers support it (though LLBLGen Pro and NHibernate as well do) is tracing: most O/R mappers offer some form of tracing or logging system which you can use to collect the SQL generated and executed and often also other activity behind the scenes. While tracing can produce a tremendous amount of data in some cases, it also gives insight in what's going on. Interpret After we've completed the analysis step it's time to look at the data we've collected. We've done code reviews to see whether we've done anything stupid and which parts actually take place and if the proper algorithms have been implemented. We've done .NET profiling to see which parts are choke points and how much time they contribute to the total time taken to complete the task we're investigating. We've performed O/R mapper profiling and RDBMS profiling to see which queries were executed during the task, how many queries were generated and executed and how long they took to complete, including network transportation. All this data reveals two things: which parts are big contributors to the total time taken and which parts are irrelevant. Both aspects are very important. The parts which are irrelevant (i.e. don't contribute significantly to the total time taken) can be ignored from now on, we won't look at them. The parts which contribute a lot to the total time taken are important to look at. We now have to first look at the .NET profiler results, to see whether the time taken is consumed in our own code, in .NET framework code, in the O/R mapper itself or somewhere else. For example if most of the time is consumed by DbCommand.ExecuteReader, the time it took to complete the task is depending on the time the data is fetched from the database. If there was just 1 query executed, according to tracing or O/R mapper profilers / RDBMS profilers, check whether that query is optimal, uses indexes or has to deal with a lot of data. Interpret means that you follow the path from begin to end through the data collected and determine where, along the path, the most time is contributed. It also means that you have to check whether this was expected or is totally unexpected. My previous example of the 10 row resultset of a query which groups millions of rows will likely reveal that a long time is spend inside the database and almost no time is spend in the .NET code, meaning the RDBMS part contributes the most to the total time taken, the rest is compared to that time, irrelevant. Considering the vastness of the source data set, it's expected this will take some time. However, does it need tweaking? Perhaps all possible tweaks are already in place. In the interpret step you then have to decide that further action in this area is necessary or not, based on what the analysis results show: if the analysis results were unexpected and in the area where the most time is contributed to the total time taken is room for improvement, action should be taken. If not, you can only accept the situation and move on. In all cases, document your decision together with the analysis you've done. If you decide that the perceived performance problem is actually expected due to the nature of the task performed, it's essential that in the future when someone else looks at the application and starts asking questions you can answer them properly and new analysis is only necessary if situations changed. Fix After interpreting the analysis results you've concluded that some areas need adjustment. This is the fix step: you're actively correcting the performance problem with proper action targeted at the real cause. In many cases related to O/R mapper powered applications it means you'll use different features of the O/R mapper to achieve the same goal, or apply optimizations at the RDBMS level. It could also mean you apply caching inside your application (compromise memory consumption over performance) to avoid unnecessary re-querying data and re-consuming the results. After applying a change, it's key you re-do the analysis and interpretation steps: compare the results and expectations with what you had before, to see whether your actions had any effect or whether it moved the problem to a different part of the application. Don't fall into the trap to do partly analysis: do the full analysis again: .NET profiling and O/R mapper / RDBMS profiling. It might very well be that the changes you've made make one part faster but another part significantly slower, in such a way that the overall problem hasn't changed at all. Performance tuning is dealing with compromises and making choices: to use one feature over the other, to accept a higher memory footprint, to go away from the strict-OO path and execute queries directly onto the RDBMS, these are choices and compromises which will cross your path if you want to fix performance problems with respect to O/R mappers or data-access and databases in general. In most cases it's not a big issue: alternatives are often good choices too and the compromises aren't that hard to deal with. What is important is that you document why you made a choice, a compromise: which analysis data, which interpretation led you to the choice made. This is key for good maintainability in the years to come. Most common performance problems with O/R mappers Below is an incomplete list of common performance problems related to data-access / O/R mappers / RDBMS code. It will help you with fixing the hotspots you found in the interpretation step. SELECT N+1: (Lazy-loading specific). Lazy loading triggered performance bottlenecks. Consider a list of Orders bound to a grid. You have a Field mapped onto a related field in Order, Customer.CompanyName. Showing this column in the grid will make the grid fetch (indirectly) for each row the Customer row. This means you'll get for the single list not 1 query (for the orders) but 1+(the number of orders shown) queries. To solve this: use eager loading using a prefetch path to fetch the customers with the orders. SELECT N+1 is easy to spot with an O/R mapper profiler or RDBMS profiler: if you see a lot of identical queries executed at once, you have this problem. Prefetch paths using many path nodes or sorting, or limiting. Eager loading problem. Prefetch paths can help with performance, but as 1 query is fetched per node, it can be the number of data fetched in a child node is bigger than you think. Also consider that data in every node is merged on the client within the parent. This is fast, but it also can take some time if you fetch massive amounts of entities. If you keep fetches small, you can use tuning parameters like the ParameterizedPrefetchPathThreshold setting to get more optimal queries. Deep inheritance hierarchies of type Target Per Entity/Type. If you use inheritance of type Target per Entity / Type (each type in the inheritance hierarchy is mapped onto its own table/view), fetches will join subtype- and supertype tables in many cases, which can lead to a lot of performance problems if the hierarchy has many types. With this problem, keep inheritance to a minimum if possible, or switch to a hierarchy of type Target Per Hierarchy, which means all entities in the inheritance hierarchy are mapped onto the same table/view. Of course this has its own set of drawbacks, but it's a compromise you might want to take. Fetching massive amounts of data by fetching large lists of entities. LLBLGen Pro supports paging (and limiting the # of rows returned), which is often key to process through large sets of data. Use paging on the RDBMS if possible (so a query is executed which returns only the rows in the page requested). When using paging in a web application, be sure that you switch server-side paging on on the datasourcecontrol used. In this case, paging on the grid alone is not enough: this can lead to fetching a lot of data which is then loaded into the grid and paged there. Keep note that analyzing queries for paging could lead to the false assumption that paging doesn't occur, e.g. when the query contains a field of type ntext/image/clob/blob and DISTINCT can't be applied while it should have (e.g. due to a join): the datareader will do DISTINCT filtering on the client. this is a little slower but it does perform paging functionality on the data-reader so it won't fetch all rows even if the query suggests it does. Fetch massive amounts of data because blob/clob/ntext/image fields aren't excluded. LLBLGen Pro supports field exclusion for queries. You can exclude fields (also in prefetch paths) per query to avoid fetching all fields of an entity, e.g. when you don't need them for the logic consuming the resultset. Excluding fields can greatly reduce the amount of time spend on data-transport across the network. Use this optimization if you see that there's a big difference between query execution time on the RDBMS and the time reported by the .NET profiler for the ExecuteReader method call. Doing client-side aggregates/scalar calculations by consuming a lot of data. If possible, try to formulate a scalar query or group by query using the projection system or GetScalar functionality of LLBLGen Pro to do data consumption on the RDBMS server. It's far more efficient to process data on the RDBMS server than to first load it all in memory, then traverse the data in-memory to calculate a value. Using .ToList() constructs inside linq queries. It might be you use .ToList() somewhere in a Linq query which makes the query be run partially in-memory. Example: var q = from c in metaData.Customers.ToList() where c.Country=="Norway" select c; This will actually fetch all customers in-memory and do an in-memory filtering, as the linq query is defined on an IEnumerable<T>, and not on the IQueryable<T>. Linq is nice, but it can often be a bit unclear where some parts of a Linq query might run. Fetching all entities to delete into memory first. To delete a set of entities it's rather inefficient to first fetch them all into memory and then delete them one by one. It's more efficient to execute a DELETE FROM ... WHERE query on the database directly to delete the entities in one go. LLBLGen Pro supports this feature, and so do some other O/R mappers. It's not always possible to do this operation in the context of an O/R mapper however: if an O/R mapper relies on a cache, these kind of operations are likely not supported because they make it impossible to track whether an entity is actually removed from the DB and thus can be removed from the cache. Fetching all entities to update with an expression into memory first. Similar to the previous point: it is more efficient to update a set of entities directly with a single UPDATE query using an expression instead of fetching the entities into memory first and then updating the entities in a loop, and afterwards saving them. It might however be a compromise you don't want to take as it is working around the idea of having an object graph in memory which is manipulated and instead makes the code fully aware there's a RDBMS somewhere. Conclusion Performance tuning is almost always about compromises and making choices. It's also about knowing where to look and how the systems in play behave and should behave. The four steps I provided should help you stay focused on the real problem and lead you towards the solution. Knowing how to optimally use the systems participating in your own code (.NET framework, O/R mapper, RDBMS, network/services) is key for success as well as knowing what's going on inside the application you built. I hope you'll find this guide useful in tracking down performance problems and dealing with them in a useful way.

Read the article
Ancillary Objects: Separate Debug ELF Files For Solaris

- by Ali Bahrami

We introduced a new object ELF object type in Solaris 11 Update 1 called the Ancillary Object. This posting describes them, using material originally written during their development, the PSARC arc case, and the Solaris Linker and Libraries Manual. ELF objects contain allocable sections, which are mapped into memory at runtime, and non-allocable sections, which are present in the file for use by debuggers and observability tools, but which are not mapped or used at runtime. Typically, all of these sections exist within a single object file. Ancillary objects allow them to instead go into a separate file. There are different reasons given for wanting such a feature. One can debate whether the added complexity is worth the benefit, and in most cases it is not. However, one important case stands out — customers with very large 32-bit objects who are not ready or able to make the transition to 64-bits. We have customers who build extremely large 32-bit objects. Historically, the debug sections in these objects have used the stabs format, which is limited, but relatively compact. In recent years, the industry has transitioned to the powerful but verbose DWARF standard. In some cases, the size of these debug sections is large enough to push the total object file size past the fundamental 4GB limit for 32-bit ELF object files. The best, and ultimately only, solution to overly large objects is to transition to 64-bits. However, consider environments where: Hundreds of users may be executing the code on large shared systems. (32-bits use less memory and bus bandwidth, and on sparc runs just as fast as 64-bit code otherwise). Complex finely tuned code, where the original authors may no longer be available. Critical production code, that was expensive to qualify and bring online, and which is otherwise serving its intended purpose without issue. Users in these risk adverse and/or high scale categories have good reasons to push 32-bits objects to the limit before moving on. Ancillary objects offer these users a longer runway. Design The design of ancillary objects is intended to be simple, both to help human understanding when examining elfdump output, and to lower the bar for debuggers such as dbx to support them. The primary and ancillary objects have the same set of section headers, with the same names, in the same order (i.e. each section has the same index in both files). A single added section of type SHT_SUNW_ANCILLARY is added to both objects, containing information that allows a debugger to identify and validate both files relative to each other. Given one of these files, the ancillary section allows you to identify the other. Allocable sections go in the primary object, and non-allocable ones go into the ancillary object. A small set of non-allocable objects, notably the symbol table, are copied into both objects. As noted above, most sections are only written to one of the two objects, but both objects have the same section header array. The section header in the file that does not contain the section data is tagged with the SHF_SUNW_ABSENT section header flag to indicate its placeholder status. Compiler writers and others who produce objects can set the SUNW_SHF_PRIMARY section header flag to mark non-allocable sections that should go to the primary object rather than the ancillary. If you don't request an ancillary object, the Solaris ELF format is unchanged. Users who don't use ancillary objects do not pay for the feature. This is important, because they exist to serve a small subset of our users, and must not complicate the common case. If you do request an ancillary object, the runtime behavior of the primary object will be the same as that of a normal object. There is no added runtime cost. The primary and ancillary object together represent a logical single object. This is facilitated by the use of a single set of section headers. One can easily imagine a tool that can merge a primary and ancillary object into a single file, or the reverse. (Note that although this is an interesting intellectual exercise, we don't actually supply such a tool because there's little practical benefit above and beyond using ld to create the files). Among the benefits of this approach are: There is no need for per-file symbol tables to reflect the contents of each file. The same symbol table that would be produced for a standard object can be used. The section contents are identical in either case — there is no need to alter data to accommodate multiple files. It is very easy for a debugger to adapt to these new files, and the processing involved can be encapsulated in input/output routines. Most of the existing debugger implementation applies without modification. The limit of a 4GB 32-bit output object is now raised to 4GB of code, and 4GB of debug data. There is also the future possibility (not currently supported) to support multiple ancillary objects, each of which could contain up to 4GB of additional debug data. It must be noted however that the 32-bit DWARF debug format is itself inherently 32-bit limited, as it uses 32-bit offsets between debug sections, so the ability to employ multiple ancillary object files may not turn out to be useful. Using Ancillary Objects (From the Solaris Linker and Libraries Guide) By default, objects contain both allocable and non-allocable sections. Allocable sections are the sections that contain executable code and the data needed by that code at runtime. Non-allocable sections contain supplemental information that is not required to execute an object at runtime. These sections support the operation of debuggers and other observability tools. The non-allocable sections in an object are not loaded into memory at runtime by the operating system, and so, they have no impact on memory use or other aspects of runtime performance no matter their size. For convenience, both allocable and non-allocable sections are normally maintained in the same file. However, there are situations in which it can be useful to separate these sections. To reduce the size of objects in order to improve the speed at which they can be copied across wide area networks. To support fine grained debugging of highly optimized code requires considerable debug data. In modern systems, the debugging data can easily be larger than the code it describes. The size of a 32-bit object is limited to 4 Gbytes. In very large 32-bit objects, the debug data can cause this limit to be exceeded and prevent the creation of the object. To limit the exposure of internal implementation details. Traditionally, objects have been stripped of non-allocable sections in order to address these issues. Stripping is effective, but destroys data that might be needed later. The Solaris link-editor can instead write non-allocable sections to an ancillary object. This feature is enabled with the -z ancillary command line option. $ ld ... -z ancillary[=outfile] ...By default, the ancillary file is given the same name as the primary output object, with a .anc file extension. However, a different name can be provided by providing an outfile value to the -z ancillary option. When -z ancillary is specified, the link-editor performs the following actions. All allocable sections are written to the primary object. In addition, all non-allocable sections containing one or more input sections that have the SHF_SUNW_PRIMARY section header flag set are written to the primary object. All remaining non-allocable sections are written to the ancillary object. The following non-allocable sections are written to both the primary object and ancillary object. .shstrtab The section name string table. .symtab The full non-dynamic symbol table. .symtab_shndx The symbol table extended index section associated with .symtab. .strtab The non-dynamic string table associated with .symtab. .SUNW_ancillary Contains the information required to identify the primary and ancillary objects, and to identify the object being examined. The primary object and all ancillary objects contain the same array of sections headers. Each section has the same section index in every file. Although the primary and ancillary objects all define the same section headers, the data for most sections will be written to a single file as described above. If the data for a section is not present in a given file, the SHF_SUNW_ABSENT section header flag is set, and the sh_size field is 0. This organization makes it possible to acquire a full list of section headers, a complete symbol table, and a complete list of the primary and ancillary objects from either of the primary or ancillary objects. The following example illustrates the underlying implementation of ancillary objects. An ancillary object is created by adding the -z ancillary command line option to an otherwise normal compilation. The file utility shows that the result is an executable named a.out, and an associated ancillary object named a.out.anc. $ cat hello.c #include <stdio.h> int main(int argc, char **argv) { (void) printf("hello, world\n"); return (0); } $ cc -g -zancillary hello.c $ file a.out a.out.anc a.out: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, ancillary object a.out.anc a.out.anc: ELF 32-bit LSB ancillary 80386 Version 1, primary object a.out $ ./a.out hello worldThe resulting primary object is an ordinary executable that can be executed in the usual manner. It is no different at runtime than an executable built without the use of ancillary objects, and then stripped of non-allocable content using the strip or mcs commands. As previously described, the primary object and ancillary objects contain the same section headers. To see how this works, it is helpful to use the elfdump utility to display these section headers and compare them. The following table shows the section header information for a selection of headers from the previous link-edit example. Index Section Name Type Primary Flags Ancillary Flags Primary Size Ancillary Size 13 .text PROGBITS ALLOC EXECINSTR ALLOC EXECINSTR SUNW_ABSENT 0x131 0 20 .data PROGBITS WRITE ALLOC WRITE ALLOC SUNW_ABSENT 0x4c 0 21 .symtab SYMTAB 0 0 0x450 0x450 22 .strtab STRTAB STRINGS STRINGS 0x1ad 0x1ad 24 .debug_info PROGBITS SUNW_ABSENT 0 0 0x1a7 28 .shstrtab STRTAB STRINGS STRINGS 0x118 0x118 29 .SUNW_ancillary SUNW_ancillary 0 0 0x30 0x30 The data for most sections is only present in one of the two files, and absent from the other file. The SHF_SUNW_ABSENT section header flag is set when the data is absent. The data for allocable sections needed at runtime are found in the primary object. The data for non-allocable sections used for debugging but not needed at runtime are placed in the ancillary file. A small set of non-allocable sections are fully present in both files. These are the .SUNW_ancillary section used to relate the primary and ancillary objects together, the section name string table .shstrtab, as well as the symbol table.symtab, and its associated string table .strtab. It is possible to strip the symbol table from the primary object. A debugger that encounters an object without a symbol table can use the .SUNW_ancillary section to locate the ancillary object, and access the symbol contained within. The primary object, and all associated ancillary objects, contain a .SUNW_ancillary section that allows all the objects to be identified and related together. $ elfdump -T SUNW_ancillary a.out a.out.anc a.out: Ancillary Section: .SUNW_ancillary index tag value [0] ANC_SUNW_CHECKSUM 0x8724 [1] ANC_SUNW_MEMBER 0x1 a.out [2] ANC_SUNW_CHECKSUM 0x8724 [3] ANC_SUNW_MEMBER 0x1a3 a.out.anc [4] ANC_SUNW_CHECKSUM 0xfbe2 [5] ANC_SUNW_NULL 0 a.out.anc: Ancillary Section: .SUNW_ancillary index tag value [0] ANC_SUNW_CHECKSUM 0xfbe2 [1] ANC_SUNW_MEMBER 0x1 a.out [2] ANC_SUNW_CHECKSUM 0x8724 [3] ANC_SUNW_MEMBER 0x1a3 a.out.anc [4] ANC_SUNW_CHECKSUM 0xfbe2 [5] ANC_SUNW_NULL 0 The ancillary sections for both objects contain the same number of elements, and are identical except for the first element. Each object, starting with the primary object, is introduced with a MEMBER element that gives the file name, followed by a CHECKSUM that identifies the object. In this example, the primary object is a.out, and has a checksum of 0x8724. The ancillary object is a.out.anc, and has a checksum of 0xfbe2. The first element in a .SUNW_ancillary section, preceding the MEMBER element for the primary object, is always a CHECKSUM element, containing the checksum for the file being examined. The presence of a .SUNW_ancillary section in an object indicates that the object has associated ancillary objects. The names of the primary and all associated ancillary objects can be obtained from the ancillary section from any one of the files. It is possible to determine which file is being examined from the larger set of files by comparing the first checksum value to the checksum of each member that follows. Debugger Access and Use of Ancillary Objects Debuggers and other observability tools must merge the information found in the primary and ancillary object files in order to build a complete view of the object. This is equivalent to processing the information from a single file. This merging is simplified by the primary object and ancillary objects containing the same section headers, and a single symbol table. The following steps can be used by a debugger to assemble the information contained in these files. Starting with the primary object, or any of the ancillary objects, locate the .SUNW_ancillary section. The presence of this section identifies the object as part of an ancillary group, contains information that can be used to obtain a complete list of the files and determine which of those files is the one currently being examined. Create a section header array in memory, using the section header array from the object being examined as an initial template. Open and read each file identified by the .SUNW_ancillary section in turn. For each file, fill in the in-memory section header array with the information for each section that does not have the SHF_SUNW_ABSENT flag set. The result will be a complete in-memory copy of the section headers with pointers to the data for all sections. Once this information has been acquired, the debugger can proceed as it would in the single file case, to access and control the running program. Note - The ELF definition of ancillary objects provides for a single primary object, and an arbitrary number of ancillary objects. At this time, the Oracle Solaris link-editor only produces a single ancillary object containing all non-allocable sections. This may change in the future. Debuggers and other observability tools should be written to handle the general case of multiple ancillary objects. ELF Implementation Details (From the Solaris Linker and Libraries Guide) To implement ancillary objects, it was necessary to extend the ELF format to add a new object type (ET_SUNW_ANCILLARY), a new section type (SHT_SUNW_ANCILLARY), and 2 new section header flags (SHF_SUNW_ABSENT, SHF_SUNW_PRIMARY). In this section, I will detail these changes, in the form of diffs to the Solaris Linker and Libraries manual. Part IV ELF Application Binary Interface Chapter 13: Object File Format Object File Format Edit Note: This existing section at the beginning of the chapter describes the ELF header. There's a table of object file types, which now includes the new ET_SUNW_ANCILLARY type. e_type Identifies the object file type, as listed in the following table. NameValueMeaning ET_NONE0No file type ET_REL1Relocatable file ET_EXEC2Executable file ET_DYN3Shared object file ET_CORE4Core file ET_LOSUNW0xfefeStart operating system specific range ET_SUNW_ANCILLARY0xfefeAncillary object file ET_HISUNW0xfefdEnd operating system specific range ET_LOPROC0xff00Start processor-specific range ET_HIPROC0xffffEnd processor-specific range Sections Edit Note: This overview section defines the section header structure, and provides a high level description of known sections. It was updated to define the new SHF_SUNW_ABSENT and SHF_SUNW_PRIMARY flags and the new SHT_SUNW_ANCILLARY section. ... sh_type Categorizes the section's contents and semantics. Section types and their descriptions are listed in Table 13-5. sh_flags Sections support 1-bit flags that describe miscellaneous attributes. Flag definitions are listed in Table 13-8. ... Table 13-5 ELF Section Types, sh_type NameValue . . . SHT_LOSUNW0x6fffffee SHT_SUNW_ancillary0x6fffffee . . . ... SHT_LOSUNW - SHT_HISUNW Values in this inclusive range are reserved for Oracle Solaris OS semantics. SHT_SUNW_ANCILLARY Present when a given object is part of a group of ancillary objects. Contains information required to identify all the files that make up the group. See Ancillary Section. ... Table 13-8 ELF Section Attribute Flags NameValue . . . SHF_MASKOS0x0ff00000 SHF_SUNW_NODISCARD0x00100000 SHF_SUNW_ABSENT0x00200000 SHF_SUNW_PRIMARY0x00400000 SHF_MASKPROC0xf0000000 . . . ... SHF_SUNW_ABSENT Indicates that the data for this section is not present in this file. When ancillary objects are created, the primary object and any ancillary objects, will all have the same section header array, to facilitate merging them to form a complete view of the object, and to allow them to use the same symbol tables. Each file contains a subset of the section data. The data for allocable sections is written to the primary object while the data for non-allocable sections is written to an ancillary file. The SHF_SUNW_ABSENT flag is used to indicate that the data for the section is not present in the object being examined. When the SHF_SUNW_ABSENT flag is set, the sh_size field of the section header must be 0. An application encountering an SHF_SUNW_ABSENT section can choose to ignore the section, or to search for the section data within one of the related ancillary files. SHF_SUNW_PRIMARY The default behavior when ancillary objects are created is to write all allocable sections to the primary object and all non-allocable sections to the ancillary objects. The SHF_SUNW_PRIMARY flag overrides this behavior. Any output section containing one more input section with the SHF_SUNW_PRIMARY flag set is written to the primary object without regard for its allocable status. ... Two members in the section header, sh_link, and sh_info, hold special information, depending on section type. Table 13-9 ELF sh_link and sh_info Interpretation sh_typesh_linksh_info . . . SHT_SUNW_ANCILLARY The section header index of the associated string table. 0 . . . Special Sections Edit Note: This section describes the sections used in Solaris ELF objects, using the types defined in the previous description of section types. It was updated to define the new .SUNW_ancillary (SHT_SUNW_ANCILLARY) section. Various sections hold program and control information. Sections in the following table are used by the system and have the indicated types and attributes. Table 13-10 ELF Special Sections NameTypeAttribute . . . .SUNW_ancillarySHT_SUNW_ancillaryNone . . . ... .SUNW_ancillary Present when a given object is part of a group of ancillary objects. Contains information required to identify all the files that make up the group. See Ancillary Section for details. ... Ancillary Section Edit Note: This new section provides the format reference describing the layout of a .SUNW_ancillary section and the meaning of the various tags. Note that these sections use the same tag/value concept used for dynamic and capabilities sections, and will be familiar to anyone used to working with ELF. In addition to the primary output object, the Solaris link-editor can produce one or more ancillary objects. Ancillary objects contain non-allocable sections that would normally be written to the primary object. When ancillary objects are produced, the primary object and all of the associated ancillary objects contain a SHT_SUNW_ancillary section, containing information that identifies these related objects. Given any one object from such a group, the ancillary section provides the information needed to identify and interpret the others. This section contains an array of the following structures. See sys/elf.h. typedef struct { Elf32_Word a_tag; union { Elf32_Word a_val; Elf32_Addr a_ptr; } a_un; } Elf32_Ancillary; typedef struct { Elf64_Xword a_tag; union { Elf64_Xword a_val; Elf64_Addr a_ptr; } a_un; } Elf64_Ancillary; For each object with this type, a_tag controls the interpretation of a_un. a_val These objects represent integer values with various interpretations. a_ptr These objects represent file offsets or addresses. The following ancillary tags exist. Table 13-NEW1 ELF Ancillary Array Tags NameValuea_un ANC_SUNW_NULL0Ignored ANC_SUNW_CHECKSUM1a_val ANC_SUNW_MEMBER2a_ptr ANC_SUNW_NULL Marks the end of the ancillary section. ANC_SUNW_CHECKSUM Provides the checksum for a file in the c_val element. When ANC_SUNW_CHECKSUM precedes the first instance of ANC_SUNW_MEMBER, it provides the checksum for the object from which the ancillary section is being read. When it follows an ANC_SUNW_MEMBER tag, it provides the checksum for that member. ANC_SUNW_MEMBER Specifies an object name. The a_ptr element contains the string table offset of a null-terminated string, that provides the file name. An ancillary section must always contain an ANC_SUNW_CHECKSUM before the first instance of ANC_SUNW_MEMBER, identifying the current object. Following that, there should be an ANC_SUNW_MEMBER for each object that makes up the complete set of objects. Each ANC_SUNW_MEMBER should be followed by an ANC_SUNW_CHECKSUM for that object. A typical ancillary section will therefore be structured as: TagMeaning ANC_SUNW_CHECKSUMChecksum of this object ANC_SUNW_MEMBERName of object #1 ANC_SUNW_CHECKSUMChecksum for object #1 . . . ANC_SUNW_MEMBERName of object N ANC_SUNW_CHECKSUMChecksum for object N ANC_SUNW_NULL An object can therefore identify itself by comparing the initial ANC_SUNW_CHECKSUM to each of the ones that follow, until it finds a match. Related Other Work The GNU developers have also encountered the need/desire to support separate debug information files, and use the solution detailed at http://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html. At the current time, the separate debug file is constructed by building the standard object first, and then copying the debug data out of it in a separate post processing step, Hence, it is limited to a total of 4GB of code and debug data, just as a single object file would be. They are aware of this, and I have seen online comments indicating that they may add direct support for generating these separate files to their link-editor. It is worth noting that the GNU objcopy utility is available on Solaris, and that the Studio dbx debugger is able to use these GNU style separate debug files even on Solaris. Although this is interesting in terms giving Linux users a familiar environment on Solaris, the 4GB limit means it is not an answer to the problem of very large 32-bit objects. We have also encountered issues with objcopy not understanding Solaris-specific ELF sections, when using this approach. The GNU community also has a current effort to adapt their DWARF debug sections in order to move them to separate files before passing the relocatable objects to the linker. The details of Project Fission can be found at http://gcc.gnu.org/wiki/DebugFission. The goal of this project appears to be to reduce the amount of data seen by the link-editor. The primary effort revolves around moving DWARF data to separate .dwo files so that the link-editor never encounters them. The details of modifying the DWARF data to be usable in this form are involved — please see the above URL for details.

Read the article
MacBook Pro Late 2009 SATA Resets, Slowness

- by A Student at a University

My MacBook Pro runs slower the longer it's on. I am getting kernel warnings. The resets correlate with AC power connects and disconnects. I don't know if the warnings do. (How do I tell?) Are these bus CRC errors? Or something else? Can this damage the drive or corrupt data? What is it seeing that motivates these? 02:37:16 :[ 0.791992] ahci 0000:00:0b.0: PCI INT A -> Link[LSI0] -> GSI 20 (level, low) -> IRQ 20 02:37:16 :[ 0.792053] ahci 0000:00:0b.0: controller can't do PMP, turning off CAP_PMP 02:37:16 :[ 0.792104] ahci 0000:00:0b.0: AHCI 0001.0200 32 slots 6 ports 1.5 Gbps 0x3 impl IDE mode 02:37:16 :[ 0.792107] ahci 0000:00:0b.0: flags: 64bit ncq sntf pm led pio slum part boh 02:37:16 :[ 0.813473] scsi0 : ahci 02:37:16 :[ 0.823340] scsi1 : ahci 02:37:16 :[ 0.848164] ata1: SATA max UDMA/133 abar m8192@0xe7484000 port 0xe7484100 irq 43 02:37:16 :[ 0.848166] ata2: SATA max UDMA/133 abar m8192@0xe7484000 port 0xe7484180 irq 43 02:37:16 :[ 1.190132] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) 02:37:16 :[ 1.190153] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) 02:37:16 :[ 1.213568] ata1.00: ATA-8: OCZ-VERTEX2, 1.23, max UDMA/133 02:37:16 :[ 1.213572] ata1.00: 195371568 sectors, multi 1: LBA48 NCQ (depth 31/32) 02:37:16 :[ 1.227293] ata2.00: ATA-8: ST9500420ASG, 0002SDM1, max UDMA/133 02:37:16 :[ 1.227297] ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 31/32) 02:37:16 :[ 1.229570] ata2.00: configured for UDMA/133 02:37:16 :[ 1.240133] ata2: hard resetting link 02:37:16 :[ 1.260738] ata1.00: configured for UDMA/133 02:37:16 :[ 1.280122] ata1: hard resetting link 02:37:16 :[ 1.470125] usb 2-5: new high speed USB device using ehci_hcd and address 3 02:37:16 :[ 1.550165] firewire_core: created device fw0: GUID 58b035fffea99f5c, S800 02:37:16 :[ 1.631306] Initializing USB Mass Storage driver... 02:37:16 :[ 1.631392] scsi6 : usb-storage 2-5:1.0 02:37:16 :[ 1.631454] usbcore: registered new interface driver usb-storage 02:37:16 :[ 1.631455] USB Mass Storage support registered. 02:37:16 :[ 1.960128] usb 4-1: new full speed USB device using ohci_hcd and address 2 02:37:16 :[ 1.990101] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) 02:37:16 :[ 1.994215] ata2.00: configured for UDMA/133 02:37:16 :[ 1.994220] ata2: EH complete 02:37:16 :[ 2.030097] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) 02:37:16 :[ 2.090773] ata1.00: configured for UDMA/133 02:37:16 :[ 2.090778] ata1: EH complete 02:37:16 :[ 2.090931] scsi 0:0:0:0: Direct-Access ATA OCZ-VERTEX2 1.23 PQ: 0 ANSI: 5 02:37:16 :[ 2.091045] sd 0:0:0:0: Attached scsi generic sg0 type 0 02:37:16 :[ 2.091121] sd 0:0:0:0: [sda] 195371568 512-byte logical blocks: (100 GB/93.1 GiB) 02:37:16 :[ 2.091159] scsi 1:0:0:0: Direct-Access ATA ST9500420ASG 0002 PQ: 0 ANSI: 5 02:37:16 :[ 2.091163] sd 0:0:0:0: [sda] Write Protect is off 02:37:16 :[ 2.091183] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA 02:37:16 :[ 2.091252] sd 1:0:0:0: Attached scsi generic sg1 type 0 02:37:16 :[ 2.091337] sda: 02:37:16 :[ 2.091446] sd 1:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB) 02:37:16 :[ 2.091580] sd 1:0:0:0: [sdb] Write Protect is off 02:37:16 :[ 2.091637] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA 02:37:16 :[ 2.091756] sdb: sda1 sda2 02:37:16 :[ 2.093140] sd 0:0:0:0: [sda] Attached SCSI disk 02:37:16 :[ 2.093505] sdb1 sdb2 sdb3 02:37:16 :[ 2.093773] sd 1:0:0:0: [sdb] Attached SCSI disk 02:37:16 :[ 2.693899] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null) 02:37:16 :[ 5.483492] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro 02:37:16 :[ 7.905040] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null) 02:37:25 :[ 19.553095] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=600 02:37:25 :[ 19.555266] EXT4-fs (dm-2): re-mounted. Opts: commit=600 02:37:25 :[ 19.641533] ata1: hard resetting link 02:37:25 :[ 19.642084] ata2: hard resetting link 02:37:26 :[ 20.392606] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) 02:37:26 :[ 20.392610] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) 02:37:26 :[ 20.396697] ata2.00: configured for UDMA/133 02:37:26 :[ 20.396703] ata2: EH complete 02:37:26 :[ 20.451491] ata1.00: configured for UDMA/133 02:37:26 :[ 20.451498] ata1: EH complete 02:37:30 :[ 24.563725] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=600 02:37:30 :[ 24.565939] EXT4-fs (dm-2): re-mounted. Opts: commit=600 02:37:30 :[ 24.627246] ata1: hard resetting link 02:37:30 :[ 24.632250] ata2: hard resetting link 02:37:31 :[ 25.372582] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) 02:37:31 :[ 25.382615] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) 02:37:31 :[ 25.386782] ata2.00: configured for UDMA/133 02:37:31 :[ 25.386788] ata2: EH complete 02:37:31 :[ 25.431668] ata1.00: configured for UDMA/133 02:37:31 :[ 25.431674] ata1: EH complete 02:45:54 :[ 529.141844] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=0 02:45:55 :[ 529.544529] EXT4-fs (dm-2): re-mounted. Opts: commit=0 02:45:55 :[ 529.622561] ata1: limiting SATA link speed to 1.5 Gbps 02:45:55 :[ 529.622583] ata1: hard resetting link 02:45:55 :[ 529.622609] ata2: limiting SATA link speed to 1.5 Gbps 02:45:55 :[ 529.622624] ata2: hard resetting link 02:45:56 :[ 530.380135] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:45:56 :[ 530.380157] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:45:56 :[ 530.384305] ata2.00: configured for UDMA/133 02:45:56 :[ 530.384314] ata2: EH complete 02:45:56 :[ 530.399225] ata1.00: configured for UDMA/133 02:45:56 :[ 530.399233] ata1: EH complete 02:45:58 :[ 532.395990] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=600 02:45:58 :[ 532.518270] EXT4-fs (dm-2): re-mounted. Opts: commit=600 02:45:58 :[ 532.590983] ata1: hard resetting link 02:45:58 :[ 532.591045] ata2: hard resetting link 02:45:59 :[ 533.340147] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:45:59 :[ 533.340168] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:45:59 :[ 533.344416] ata2.00: configured for UDMA/133 02:45:59 :[ 533.344424] ata2: EH complete 02:45:59 :[ 533.360839] ata1.00: configured for UDMA/133 02:45:59 :[ 533.360847] ata1: EH complete 02:45:59 :[ 533.584449] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=0 02:45:59 :[ 533.586999] EXT4-fs (dm-2): re-mounted. Opts: commit=0 02:45:59 :[ 533.660132] ata2: hard resetting link 02:45:59 :[ 533.660151] ata1: hard resetting link 02:46:00 :[ 534.412536] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:00 :[ 534.412562] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:00 :[ 534.416768] ata2.00: configured for UDMA/133 02:46:00 :[ 534.416777] ata2: EH complete 02:46:00 :[ 534.431396] ata1.00: configured for UDMA/133 02:46:00 :[ 534.431401] ata1: EH complete 02:46:03 :[ 537.384649] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=600 02:46:03 :[ 537.504214] EXT4-fs (dm-2): re-mounted. Opts: commit=600 02:46:03 :[ 537.586002] ata1: hard resetting link 02:46:03 :[ 537.586036] ata2: hard resetting link 02:46:04 :[ 538.330147] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:04 :[ 538.330168] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:04 :[ 538.334389] ata2.00: configured for UDMA/133 02:46:04 :[ 538.334398] ata2: EH complete 02:46:04 :[ 538.343511] ata1.00: configured for UDMA/133 02:46:04 :[ 538.343519] ata1: EH complete 02:46:04 :[ 538.456413] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=0 02:46:04 :[ 538.459404] EXT4-fs (dm-2): re-mounted. Opts: commit=0 02:46:04 :[ 538.540138] ata1.00: limiting speed to UDMA/100:PIO4 02:46:04 :[ 538.540159] ata1: hard resetting link 02:46:04 :[ 538.540202] ata2.00: limiting speed to UDMA/100:PIO4 02:46:04 :[ 538.540220] ata2: hard resetting link 02:46:05 :[ 539.290054] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:05 :[ 539.290041] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:05 :[ 539.294100] ata2.00: configured for UDMA/100 02:46:05 :[ 539.294106] ata2: EH complete 02:46:05 :[ 539.314125] ata1.00: configured for UDMA/100 02:46:05 :[ 539.314132] ------------[ cut here ]------------ 02:46:05 :[ 539.314140] WARNING: at /build/buildd/linux-2.6.35/drivers/ata/libata-eh.c:3638 ata_eh_finish+0xdf/0xf0() 02:46:05 :[ 539.314144] Hardware name: MacBookPro5,3 02:46:05 :[ 539.314146] Modules linked in: michael_mic arc4 xt_multiport binfmt_misc rfcomm sco bnep l2cap parport_pc ppdev nvidia(P) ipt_REJECT xt_recent snd_hda_codec_cirrus xt_limit xt_tcpudp ipt_addrtype xt_state snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi applesmc led_class ip6table_filter lib80211_crypt_tkip snd_rawmidi snd_seq_midi_event ip6_tables input_polldev hid_apple snd_seq wl(P) snd_timer snd_seq_device snd joydev bcm5974 usbhid mbp_nvidia_bl uvcvideo btusb videodev v4l1_compat v4l2_compat_ioctl32 nf_nat_irc hid nf_conntrack_irc soundcore snd_page_alloc i2c_nforce2 coretemp lib80211 bluetooth nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack lp parport iptable_filter ip_tables x_tables usb_storage firewire_ohci firewire_core forcedeth crc_itu_t ahci libahci 02:46:05 :[ 539.314221] Pid: 202, comm: scsi_eh_0 Tainted: P 2.6.35-25-generic #44-Ubuntu 02:46:05 :[ 539.314224] Call Trace: 02:46:05 :[ 539.314233] [<ffffffff8106091f>] warn_slowpath_common+0x7f/0xc0 02:46:05 :[ 539.314237] [<ffffffff8106097a>] warn_slowpath_null+0x1a/0x20 02:46:05 :[ 539.314242] [<ffffffff813dc77f>] ata_eh_finish+0xdf/0xf0 02:46:05 :[ 539.314246] [<ffffffff813e441e>] sata_pmp_error_handler+0x2e/0x40 02:46:05 :[ 539.314256] [<ffffffffa00021bf>] ahci_error_handler+0x1f/0x90 [libahci] 02:46:05 :[ 539.314261] [<ffffffff813dd6d2>] ata_scsi_error+0x492/0x5e0 02:46:05 :[ 539.314266] [<ffffffff813b24cd>] scsi_error_handler+0x10d/0x190 02:46:05 :[ 539.314270] [<ffffffff813b23c0>] ? scsi_error_handler+0x0/0x190 02:46:05 :[ 539.314275] [<ffffffff8107f266>] kthread+0x96/0xa0 02:46:05 :[ 539.314280] [<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10 02:46:05 :[ 539.314284] [<ffffffff8107f1d0>] ? kthread+0x0/0xa0 02:46:05 :[ 539.314288] [<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10 02:46:05 :[ 539.314291] ---[ end trace 76dbffc2d5d49d9b ]--- 02:46:05 :[ 539.314296] ata1: EH complete 02:46:12 :[ 547.040117] ata1: hard resetting link 02:46:13 :[ 547.390144] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:13 :[ 547.408430] ata1.00: configured for UDMA/100 02:46:13 :[ 547.408438] ------------[ cut here ]------------ 02:46:13 :[ 547.408447] WARNING: at /build/buildd/linux-2.6.35/drivers/ata/libata-eh.c:3638 ata_eh_finish+0xdf/0xf0() 02:46:13 :[ 547.408451] Hardware name: MacBookPro5,3 02:46:13 :[ 547.408453] Modules linked in: michael_mic arc4 xt_multiport binfmt_misc rfcomm sco bnep l2cap parport_pc ppdev nvidia(P) ipt_REJECT xt_recent snd_hda_codec_cirrus xt_limit xt_tcpudp ipt_addrtype xt_state snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi applesmc led_class ip6table_filter lib80211_crypt_tkip snd_rawmidi snd_seq_midi_event ip6_tables input_polldev hid_apple snd_seq wl(P) snd_timer snd_seq_device snd joydev bcm5974 usbhid mbp_nvidia_bl uvcvideo btusb videodev v4l1_compat v4l2_compat_ioctl32 nf_nat_irc hid nf_conntrack_irc soundcore snd_page_alloc i2c_nforce2 coretemp lib80211 bluetooth nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack lp parport iptable_filter ip_tables x_tables usb_storage firewire_ohci firewire_core forcedeth crc_itu_t ahci libahci 02:46:13 :[ 547.408528] Pid: 202, comm: scsi_eh_0 Tainted: P W 2.6.35-25-generic #44-Ubuntu 02:46:13 :[ 547.408531] Call Trace: 02:46:13 :[ 547.408540] [<ffffffff8106091f>] warn_slowpath_common+0x7f/0xc0 02:46:13 :[ 547.408544] [<ffffffff8106097a>] warn_slowpath_null+0x1a/0x20 02:46:13 :[ 547.408549] [<ffffffff813dc77f>] ata_eh_finish+0xdf/0xf0 02:46:13 :[ 547.408553] [<ffffffff813e441e>] sata_pmp_error_handler+0x2e/0x40 02:46:13 :[ 547.408563] [<ffffffffa00021bf>] ahci_error_handler+0x1f/0x90 [libahci] 02:46:13 :[ 547.408567] [<ffffffff813dd6d2>] ata_scsi_error+0x492/0x5e0 02:46:13 :[ 547.408572] [<ffffffff813b24cd>] scsi_error_handler+0x10d/0x190 02:46:13 :[ 547.408577] [<ffffffff813b23c0>] ? scsi_error_handler+0x0/0x190 02:46:13 :[ 547.408582] [<ffffffff8107f266>] kthread+0x96/0xa0 02:46:13 :[ 547.408587] [<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10 02:46:13 :[ 547.408591] [<ffffffff8107f1d0>] ? kthread+0x0/0xa0 02:46:13 :[ 547.408595] [<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10 02:46:13 :[ 547.408598] ---[ end trace 76dbffc2d5d49d9c ]--- 02:46:13 :[ 547.408620] ata1: EH complete 02:46:13 :[ 547.562470] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=600 02:46:13 :[ 547.671380] EXT4-fs (dm-2): re-mounted. Opts: commit=600 02:46:13 :[ 547.738198] ata1.00: limiting speed to UDMA/33:PIO4 02:46:13 :[ 547.738218] ata1: hard resetting link 02:46:13 :[ 547.738274] ata2: hard resetting link 02:46:14 :[ 548.482561] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:14 :[ 548.484083] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:14 :[ 548.486809] ata2.00: configured for UDMA/100 02:46:14 :[ 548.486818] ata2: EH complete 02:46:14 :[ 548.498998] ata1.00: configured for UDMA/33 02:46:14 :[ 548.499004] ata1: EH complete 02:46:18 :[ 552.410499] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=600 02:46:18 :[ 552.522521] EXT4-fs (dm-2): re-mounted. Opts: commit=600 02:46:18 :[ 552.529684] ata1: hard resetting link 02:46:18 :[ 552.529723] ata2: hard resetting link 02:46:19 :[ 553.280059] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:19 :[ 553.280068] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:19 :[ 553.284141] ata2.00: configured for UDMA/100 02:46:19 :[ 553.284150] ata2: EH complete 02:46:19 :[ 553.301629] ata1.00: configured for UDMA/33 02:46:19 :[ 553.301637] ata1: EH complete 02:46:21 :[ 556.078830] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=0 02:46:21 :[ 556.180361] EXT4-fs (dm-2): re-mounted. Opts: commit=0 02:46:22 :[ 556.262612] ata1: hard resetting link 02:46:22 :[ 556.262617] ata2: hard resetting link 02:46:22 :[ 557.010050] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:22 :[ 557.010070] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:22 :[ 557.014069] ata2.00: configured for UDMA/100 02:46:22 :[ 557.014075] ata2: EH complete 02:46:22 :[ 557.023646] ata1.00: configured for UDMA/33 02:46:22 :[ 557.023654] ata1: EH complete 02:46:30 :[ 565.047438] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=600 02:46:30 :[ 565.051554] EXT4-fs (dm-2): re-mounted. Opts: commit=600 02:46:30 :[ 565.108332] ata1: hard resetting link 02:46:30 :[ 565.108389] ata2.00: limiting speed to UDMA/33:PIO4 02:46:30 :[ 565.108406] ata2: hard resetting link 02:46:31 :[ 565.850048] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:31 :[ 565.850068] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:31 :[ 565.854304] ata2.00: configured for UDMA/33 02:46:31 :[ 565.854313] ata2: EH complete 02:46:31 :[ 565.868477] ata1.00: configured for UDMA/33 02:46:31 :[ 565.868485] ata1: EH complete 02:46:35 :[ 569.265469] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=0 02:46:35 :[ 569.268139] EXT4-fs (dm-2): re-mounted. Opts: commit=0 02:46:35 :[ 569.340079] ata1: hard resetting link 02:46:35 :[ 569.340113] ata2: hard resetting link 02:46:35 :[ 570.092568] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:35 :[ 570.092589] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:46:35 :[ 570.096828] ata2.00: configured for UDMA/33 02:46:35 :[ 570.096837] ata2: EH complete 02:46:35 :[ 570.110727] ata1.00: configured for UDMA/33 02:46:35 :[ 570.110735] ata1: EH complete 02:47:04 :[ 598.528232] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=600 02:47:04 :[ 598.653973] EXT4-fs (dm-2): re-mounted. Opts: commit=600 02:47:04 :[ 598.730854] ata1: hard resetting link 02:47:04 :[ 598.730910] ata2: hard resetting link 02:47:05 :[ 599.480136] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:47:05 :[ 599.480159] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 02:47:05 :[ 599.484206] ata2.00: configured for UDMA/33 02:47:05 :[ 599.484213] ata2: EH complete 02:47:05 :[ 599.496699] ata1.00: configured for UDMA/33 02:47:05 :[ 599.496707] ata1: EH complete 04:45:59 :[ 7733.756548] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=0 04:45:59 :[ 7733.882748] EXT4-fs (dm-2): re-mounted. Opts: commit=0 04:45:59 :[ 7733.960142] ata1: hard resetting link 04:45:59 :[ 7733.960189] ata2: hard resetting link 04:46:00 :[ 7734.701926] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 04:46:00 :[ 7734.719939] ata1.00: configured for UDMA/33 04:46:00 :[ 7734.719946] ata1: EH complete 04:46:00 :[ 7734.722547] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 04:46:00 :[ 7734.726652] ata2.00: configured for UDMA/33 04:46:00 :[ 7734.726659] ata2: EH complete 04:46:02 :[ 7736.656465] ACPI: EC: GPE storm detected, transactions will use polling mode 13:38:49 :[39704.188621] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=600 13:38:49 :[39704.280588] EXT4-fs (dm-2): re-mounted. Opts: commit=600 13:38:49 :[39704.360819] ata1: hard resetting link 13:38:49 :[39704.360882] ata2: hard resetting link 13:38:50 :[39705.112956] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 13:38:50 :[39705.114435] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 13:38:50 :[39705.118673] ata2.00: configured for UDMA/33 13:38:50 :[39705.118682] ata2: EH complete 13:38:50 :[39705.127076] ata1.00: configured for UDMA/33 13:38:50 :[39705.127084] ata1: EH complete 13:39:49 :[39764.142463] applesmc: F1Mn: write arg fail 13:48:11 :[40267.025145] applesmc: FS! : read arg fail 13:52:53 :[40548.596735] applesmc: FS! : read arg fail 13:53:58 :[40613.972856] applesmc: FS! : read arg fail 13:54:08 :[40624.057339] applesmc: FS! : read arg fail 13:58:20 :[40875.397749] applesmc: TC0D: read data fail 14:16:56 :[41991.722054] applesmc: Th2H: read data fail 14:22:32 :[42327.991522] applesmc: light sensor data length set to 10 14:26:19 :[42554.788886] applesmc: F1Mn: write arg fail 14:32:36 :[42931.860443] applesmc: TC0F: read data fail 14:34:32 :[43048.041469] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=0 14:34:33 :[43048.185850] EXT4-fs (dm-2): re-mounted. Opts: commit=0 14:34:33 :[43048.270184] ata1: hard resetting link 14:34:33 :[43048.270224] ata2: hard resetting link 14:34:33 :[43049.030049] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 14:34:33 :[43049.030065] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 14:34:33 :[43049.034106] ata2.00: configured for UDMA/33 14:34:33 :[43049.034112] ata2: EH complete 14:34:33 :[43049.056952] ata1.00: configured for UDMA/33 14:34:33 :[43049.056959] ------------[ cut here ]------------ 14:34:33 :[43049.056968] WARNING: at /build/buildd/linux-2.6.35/drivers/ata/libata-eh.c:3638 ata_eh_finish+0xdf/0xf0() 14:34:33 :[43049.056971] Hardware name: MacBookPro5,3 14:34:33 :[43049.056973] Modules linked in: michael_mic arc4 xt_multiport binfmt_misc rfcomm sco bnep l2cap parport_pc ppdev nvidia(P) ipt_REJECT xt_recent snd_hda_codec_cirrus xt_limit xt_tcpudp ipt_addrtype xt_state snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi applesmc led_class ip6table_filter lib80211_crypt_tkip snd_rawmidi snd_seq_midi_event ip6_tables input_polldev hid_apple snd_seq wl(P) snd_timer snd_seq_device snd joydev bcm5974 usbhid mbp_nvidia_bl uvcvideo btusb videodev v4l1_compat v4l2_compat_ioctl32 nf_nat_irc hid nf_conntrack_irc soundcore snd_page_alloc i2c_nforce2 coretemp lib80211 bluetooth nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack lp parport iptable_filter ip_tables x_tables usb_storage firewire_ohci firewire_core forcedeth crc_itu_t ahci libahci 14:34:33 :[43049.057048] Pid: 202, comm: scsi_eh_0 Tainted: P W 2.6.35-25-generic #44-Ubuntu 14:34:33 :[43049.057052] Call Trace: 14:34:33 :[43049.057060] [<ffffffff8106091f>] warn_slowpath_common+0x7f/0xc0 14:34:33 :[43049.057064] [<ffffffff8106097a>] warn_slowpath_null+0x1a/0x20 14:34:33 :[43049.057069] [<ffffffff813dc77f>] ata_eh_finish+0xdf/0xf0 14:34:33 :[43049.057074] [<ffffffff813e441e>] sata_pmp_error_handler+0x2e/0x40 14:34:33 :[43049.057083] [<ffffffffa00021bf>] ahci_error_handler+0x1f/0x90 [libahci] 14:34:33 :[43049.057088] [<ffffffff813dd6d2>] ata_scsi_error+0x492/0x5e0 14:34:33 :[43049.057093] [<ffffffff813b24cd>] scsi_error_handler+0x10d/0x190 14:34:33 :[43049.057097] [<ffffffff813b23c0>] ? scsi_error_handler+0x0/0x190 14:34:33 :[43049.057102] [<ffffffff8107f266>] kthread+0x96/0xa0 14:34:33 :[43049.057107] [<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10 14:34:33 :[43049.057111] [<ffffffff8107f1d0>] ? kthread+0x0/0xa0 14:34:33 :[43049.057115] [<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10 14:34:33 :[43049.057118] ---[ end trace 76dbffc2d5d49d9d ]--- 14:34:33 :[43049.057123] ata1: EH complete 14:34:41 :[43057.012698] ata1: hard resetting link 14:34:42 :[43057.362780] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 14:34:42 :[43057.381432] ata1.00: configured for UDMA/33 14:34:42 :[43057.381441] ------------[ cut here ]------------ 14:34:42 :[43057.381450] WARNING: at /build/buildd/linux-2.6.35/drivers/ata/libata-eh.c:3638 ata_eh_finish+0xdf/0xf0() 14:34:42 :[43057.381453] Hardware name: MacBookPro5,3 14:34:42 :[43057.381455] Modules linked in: michael_mic arc4 xt_multiport binfmt_misc rfcomm sco bnep l2cap parport_pc ppdev nvidia(P) ipt_REJECT xt_recent snd_hda_codec_cirrus xt_limit xt_tcpudp ipt_addrtype xt_state snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi applesmc led_class ip6table_filter lib80211_crypt_tkip snd_rawmidi snd_seq_midi_event ip6_tables input_polldev hid_apple snd_seq wl(P) snd_timer snd_seq_device snd joydev bcm5974 usbhid mbp_nvidia_bl uvcvideo btusb videodev v4l1_compat v4l2_compat_ioctl32 nf_nat_irc hid nf_conntrack_irc soundcore snd_page_alloc i2c_nforce2 coretemp lib80211 bluetooth nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack lp parport iptable_filter ip_tables x_tables usb_storage firewire_ohci firewire_core forcedeth crc_itu_t ahci libahci 14:34:42 :[43057.381530] Pid: 202, comm: scsi_eh_0 Tainted: P W 2.6.35-25-generic #44-Ubuntu 14:34:42 :[43057.381533] Call Trace: 14:34:42 :[43057.381542] [<ffffffff8106091f>] warn_slowpath_common+0x7f/0xc0 14:34:42 :[43057.381546] [<ffffffff8106097a>] warn_slowpath_null+0x1a/0x20 14:34:42 :[43057.381551] [<ffffffff813dc77f>] ata_eh_finish+0xdf/0xf0 14:34:42 :[43057.381556] [<ffffffff813e441e>] sata_pmp_error_handler+0x2e/0x40 14:34:42 :[43057.381565] [<ffffffffa00021bf>] ahci_error_handler+0x1f/0x90 [libahci] 14:34:42 :[43057.381569] [<ffffffff813dd6d2>] ata_scsi_error+0x492/0x5e0 14:34:42 :[43057.381575] [<ffffffff813b24cd>] scsi_error_handler+0x10d/0x190 14:34:42 :[43057.381579] [<ffffffff813b23c0>] ? scsi_error_handler+0x0/0x190 14:34:42 :[43057.381584] [<ffffffff8107f266>] kthread+0x96/0xa0 14:34:42 :[43057.381589] [<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10 14:34:42 :[43057.381594] [<ffffffff8107f1d0>] ? kthread+0x0/0xa0 14:34:42 :[43057.381598] [<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10 14:34:42 :[43057.381601] ---[ end trace 76dbffc2d5d49d9e ]--- 14:34:42 :[43057.381624] ata1: EH complete 14:34:42 :[43057.557887] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=600 14:34:42 :[43057.560517] EXT4-fs (dm-2): re-mounted. Opts: commit=600 14:34:42 :[43057.621194] ata1: hard resetting link 14:34:42 :[43057.621252] ata2: hard resetting link 14:34:43 :[43058.370141] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 14:34:43 :[43058.370162] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 14:34:43 :[43058.374407] ata2.00: configured for UDMA/33 14:34:43 :[43058.374415] ata2: EH complete 14:34:43 :[43058.381989] ata1.00: configured for UDMA/33 14:34:43 :[43058.381996] ata1: EH complete 14:34:43 :[43058.616228] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=600 14:34:43 :[43058.618931] EXT4-fs (dm-2): re-mounted. Opts: commit=600 14:34:43 :[43058.626687] ata1: hard resetting link 14:34:43 :[43058.626731] ata2: hard resetting link 14:34:44 :[43059.372908] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 14:34:44 :[43059.372932] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 14:34:44 :[43059.376997] ata2.00: configured for UDMA/33 14:34:44 :[43059.377003] ata2: EH complete 14:34:44 :[43059.392576] ata1.00: configured for UDMA/33 14:34:44 :[43059.392585] ata1: EH complete 15:48:19 :[47474.710860] ata1: hard resetting link 15:48:19 :[47474.710882] ata2: hard resetting link 15:48:20 :[47475.460144] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 15:48:20 :[47475.460169] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 15:48:20 :[47475.473709] ata1.00: configured for UDMA/33 15:48:20 :[47475.473717] ata1: EH complete 15:48:20 :[47475.727960] ata2.00: configured for UDMA/33 15:48:20 :[47475.727969] ata2: EH complete 16:29:39 :[49954.295017] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,commit=0 16:29:39 :[49954.622307] EXT4-fs (dm-2): re-mounted. Opts: commit=0 16:29:39 :[49954.710139] ata1: hard resetting link 16:29:39 :[49954.710174] ata2: hard resetting link 16:29:40 :[49955.460046] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 16:29:40 :[49955.460062] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) 16:29:40 :[49955.464138] ata2.00: configured for UDMA/33 16:29:40 :[49955.464144] ata2: EH complete 16:29:40 :[49955.473251] ata1.00: configured for UDMA/33 16:29:40 :[49955.473258] ata1: EH complete

Read the article
ASP.NET Frameworks and Raw Throughput Performance

- by Rick Strahl

A few days ago I had a curious thought: With all these different technologies that the ASP.NET stack has to offer, what's the most efficient technology overall to return data for a server request? When I started this it was mere curiosity rather than a real practical need or result. Different tools are used for different problems and so performance differences are to be expected. But still I was curious to see how the various technologies performed relative to each just for raw throughput of the request getting to the endpoint and back out to the client with as little processing in the actual endpoint logic as possible (aka Hello World!). I want to clarify that this is merely an informal test for my own curiosity and I'm sharing the results and process here because I thought it was interesting. It's been a long while since I've done any sort of perf testing on ASP.NET, mainly because I've not had extremely heavy load requirements and because overall ASP.NET performs very well even for fairly high loads so that often it's not that critical to test load performance. This post is not meant to make a point or even come to a conclusion which tech is better, but just to act as a reference to help understand some of the differences in perf and give a starting point to play around with this yourself. I've included the code for this simple project, so you can play with it and maybe add a few additional tests for different things if you like. Source Code on GitHub I looked at this data for these technologies: ASP.NET Web API ASP.NET MVC WebForms ASP.NET WebPages ASMX AJAX Services (couldn't get AJAX/JSON to run on IIS8 ) WCF Rest Raw ASP.NET HttpHandlers It's quite a mixed bag, of course and the technologies target different types of development. What started out as mere curiosity turned into a bit of a head scratcher as the results were sometimes surprising. What I describe here is more to satisfy my curiosity more than anything and I thought it interesting enough to discuss on the blog :-) First test: Raw Throughput The first thing I did is test raw throughput for the various technologies. This is the least practical test of course since you're unlikely to ever create the equivalent of a 'Hello World' request in a real life application. The idea here is to measure how much time a 'NOP' request takes to return data to the client. So for this request I create the simplest Hello World request that I could come up for each tech. Http Handler The first is the lowest level approach which is an HTTP handler. public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public bool IsReusable { get { return true; } } } WebForms Next I added a couple of ASPX pages - one using CodeBehind and one using only a markup page. The CodeBehind page simple does this in CodeBehind without any markup in the ASPX page: public partial class HelloWorld_CodeBehind : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { Response.Write("Hello World. Time is: " + DateTime.Now.ToString() ); Response.End(); } } while the Markup page only contains some static output via an expression:<%@ Page Language="C#" AutoEventWireup="false" CodeBehind="HelloWorld_Markup.aspx.cs" Inherits="AspNetFrameworksPerformance.HelloWorld_Markup" %> Hello World. Time is <%= DateTime.Now %> ASP.NET WebPages WebPages is the freestanding Razor implementation of ASP.NET. Here's the simple HelloWorld.cshtml page:Hello World @DateTime.Now WCF REST WCF REST was the token REST implementation for ASP.NET before WebAPI and the inbetween step from ASP.NET AJAX. I'd like to forget that this technology was ever considered for production use, but I'll include it here. Here's an OperationContract class: [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World" + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } } WCF REST can return arbitrary results by returning a Stream object and a content type. The code above turns the string result into a stream and returns that back to the client. ASP.NET AJAX (ASMX Services) I also wanted to test ASP.NET AJAX services because prior to WebAPI this is probably still the most widely used AJAX technology for the ASP.NET stack today. Unfortunately I was completely unable to get this running on my Windows 8 machine. Visual Studio 2012 removed adding of ASP.NET AJAX services, and when I tried to manually add the service and configure the script handler references it simply did not work - I always got a SOAP response for GET and POST operations. No matter what I tried I always ended up getting XML results even when explicitly adding the ScriptHandler. So, I didn't test this (but the code is there - you might be able to test this on a Windows 7 box). ASP.NET MVC Next up is probably the most popular ASP.NET technology at the moment: MVC. Here's the small controller: public class MvcPerformanceController : Controller { public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } } ASP.NET WebAPI Next up is WebAPI which looks kind of similar to MVC. Except here I have to use a StringContent result to return the response: public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } } Testing Take a minute to think about each of the technologies… and take a guess which you think is most efficient in raw throughput. The fastest should be pretty obvious, but the others - maybe not so much. The testing I did is pretty informal since it was mainly to satisfy my curiosity - here's how I did this: I used Apache Bench (ab.exe) from a full Apache HTTP installation to run and log the test results of hitting the server. ab.exe is a small executable that lets you hit a URL repeatedly and provides counter information about the number of requests, requests per second etc. ab.exe and the batch file are located in the \LoadTests folder of the project. An ab.exe command line looks like this: ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld which hits the specified URL 100,000 times with a load factor of 20 concurrent requests. This results in output like this: It's a great way to get a quick and dirty performance summary. Run it a few times to make sure there's not a large amount of varience. You might also want to do an IISRESET to clear the Web Server. Just make sure you do a short test run to warm up the server first - otherwise your first run is likely to be skewed downwards. ab.exe also allows you to specify headers and provide POST data and many other things if you want to get a little more fancy. Here all tests are GET requests to keep it simple. I ran each test: 100,000 iterations Load factor of 20 concurrent connections IISReset before starting A short warm up run for API and MVC to make sure startup cost is mitigated Here is the batch file I used for the test: IISRESET REM make sure you add REM C:\Program Files (x86)\Apache Software Foundation\Apache2.2\bin REM to your path so ab.exe can be found REM Warm up ab.exe -n100 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJsonab.exe -n100 -c20 http://localhost/aspnetperf/api/HelloWorldJson ab.exe -n100 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld ab.exe -n100000 -c20 http://localhost/aspnetperf/handler.ashx > handler.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_CodeBehind.aspx > AspxCodeBehind.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_Markup.aspx > AspxMarkup.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld > Wcf.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldCode > Mvc.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld > WebApi.txt I ran each of these tests 3 times and took the average score for Requests/second, with the machine otherwise idle. I did see a bit of variance when running many tests but the values used here are the medians. Part of this has to do with the fact I ran the tests on my local machine - result would probably more consistent running the load test on a separate machine hitting across the network. I ran these tests locally on my laptop which is a Dell XPS with quad core Sandibridge I7-2720QM @ 2.20ghz and a fast SSD drive on Windows 8. CPU load during tests ran to about 70% max across all 4 cores (IOW, it wasn't overloading the machine). Ideally you can try running these tests on a separate machine hitting the local machine. If I remember correctly IIS 7 and 8 on client OSs don't throttle so the performance here should be Results Ok, let's cut straight to the chase. Below are the results from the tests… It's not surprising that the handler was fastest. But it was a bit surprising to me that the next fastest was WebForms and especially Web Forms with markup over a CodeBehind page. WebPages also fared fairly well. MVC and WebAPI are a little slower and the slowest by far is WCF REST (which again I find surprising). As mentioned at the start the raw throughput tests are not overly practical as they don't test scripting performance for the HTML generation engines or serialization performances of the data engines. All it really does is give you an idea of the raw throughput for the technology from time of request to reaching the endpoint and returning minimal text data back to the client which indicates full round trip performance. But it's still interesting to see that Web Forms performs better in throughput than either MVC, WebAPI or WebPages. It'd be interesting to try this with a few pages that actually have some parsing logic on it, but that's beyond the scope of this throughput test. But what's also amazing about this test is the sheer amount of traffic that a laptop computer is handling. Even the slowest tech managed 5700 requests a second, which is one hell of a lot of requests if you extrapolate that out over a 24 hour period. Remember these are not static pages, but dynamic requests that are being served. Another test - JSON Data Service Results The second test I used a JSON result from several of the technologies. I didn't bother running WebForms and WebPages through this test since that doesn't make a ton of sense to return data from the them (OTOH, returning text from the APIs didn't make a ton of sense either :-) In these tests I have a small Person class that gets serialized and then returned to the client. The Person class looks like this: public class Person { public Person() { Id = 10; Name = "Rick"; Entered = DateTime.Now; } public int Id { get; set; } public string Name { get; set; } public DateTime Entered { get; set; } } Here are the updated handler classes that use Person: Handler public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { var action = context.Request.QueryString["action"]; if (action == "json") JsonRequest(context); else TextRequest(context); } public void TextRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public void JsonRequest(HttpContext context) { var json = JsonConvert.SerializeObject(new Person(), Formatting.None); context.Response.ContentType = "application/json"; context.Response.Write(json); } public bool IsReusable { get { return true; } } } This code adds a little logic to check for a action query string and route the request to an optional JSON result method. To generate JSON, I'm using the same JSON.NET serializer (JsonConvert.SerializeObject) used in Web API to create the JSON response. WCF REST [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World " + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } [OperationContract] [WebGet(ResponseFormat=WebMessageFormat.Json,BodyStyle=WebMessageBodyStyle.WrappedRequest)] public Person HelloWorldJson() { // Add your operation implementation here return new Person(); } } For WCF REST all I have to do is add a method with the Person result type. ASP.NET MVC public class MvcPerformanceController : Controller { // // GET: /MvcPerformance/ public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } public JsonResult HelloWorldJson() { return Json(new Person(), JsonRequestBehavior.AllowGet); } } For MVC all I have to do for a JSON response is return a JSON result. ASP.NET internally uses JavaScriptSerializer. ASP.NET WebAPI public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } [HttpGet] public Person HelloWorldJson() { return new Person(); } [HttpGet] public HttpResponseMessage HelloWorldJson2() { var response = new HttpResponseMessage(HttpStatusCode.OK); response.Content = new ObjectContent<Person>(new Person(), GlobalConfiguration.Configuration.Formatters.JsonFormatter); return response; } } Testing and Results To run these data requests I used the following ab.exe commands:REM JSON RESPONSES ab.exe -n100000 -c20 http://localhost/aspnetperf/Handler.ashx?action=json > HandlerJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJson > MvcJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorldJson > WebApiJson.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorldJson > WcfJson.txt The results from this test run are a bit interesting in that the WebAPI test improved performance significantly over returning plain string content. Here are the results: The performance for each technology drops a little bit except for WebAPI which is up quite a bit! From this test it appears that WebAPI is actually significantly better performing returning a JSON response, rather than a plain string response. Snag with Apache Benchmark and 'Length Failures' I ran into a little snag with Apache Benchmark, which was reporting failures for my Web API requests when serializing. As the graph shows performance improved significantly from with JSON results from 5580 to 6530 or so which is a 15% improvement (while all others slowed down by 3-8%). However, I was skeptical at first because the WebAPI test reports showed a bunch of errors on about 10% of the requests. Check out this report: Notice the Failed Request count. What the hey? Is WebAPI failing on roughly 10% of requests when sending JSON? Turns out: No it's not! But it took some sleuthing to figure out why it reports these failures. At first I thought that Web API was failing, and so to make sure I re-ran the test with Fiddler attached and runiisning the ab.exe test by using the -X switch: ab.exe -n100 -c10 -X localhost:8888 http://localhost/aspnetperf/api/HelloWorldJson which showed that indeed all requests where returning proper HTTP 200 results with full content. However ab.exe was reporting the errors. After some closer inspection it turned out that the dates varying in size altered the response length in dynamic output. For example: these two results: {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.841926-10:00"} {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.8519262-10:00"} are different in length for the number which results in 68 and 69 bytes respectively. The same URL produces different result lengths which is what ab.exe reports. I didn't notice at first bit the same is happening when running the ASHX handler with JSON.NET result since it uses the same serializer that varies the milliseconds. Moral: You can typically ignore Length failures in Apache Benchmark and when in doubt check the actual output with Fiddler. Note that the other failure values are accurate though. Another interesting Side Note: Perf drops over Time As I was running these tests repeatedly I was finding that performance steadily dropped from a startup peak to a 10-15% lower stable level. IOW, with Web API I'd start out with around 6500 req/sec and in subsequent runs it keeps dropping until it would stabalize somewhere around 5900 req/sec occasionally jumping lower. For these tests this is why I did the IIS RESET and warm up for individual tests. This is a little puzzling. Looking at Process Monitor while the test are running memory very quickly levels out as do handles and threads, on the first test run. Subsequent runs everything stays stable, but the performance starts going downwards. This applies to all the technologies - Handlers, Web Forms, MVC, Web API - curious to see if others test this and see similar results. Doing an IISRESET then resets everything and performance starts off at peak again… Summary As I stated at the outset, these were informal to satiate my curiosity not to prove that any technology is better or even faster than another. While there clearly are differences in performance the differences (other than WCF REST which was by far the slowest and the raw handler which was by far the highest) are relatively minor, so there is no need to feel that any one technology is a runaway standout in raw performance. Choosing a technology is about more than pure performance but also about the adequateness for the job and the easy of implementation. The strengths of each technology will make for any minor performance difference we see in these tests. However, to me it's important to get an occasional reality check and compare where new technologies are heading. Often times old stuff that's been optimized and designed for a time of less horse power can utterly blow the doors off newer tech and simple checks like this let you compare. Luckily we're seeing that much of the new stuff performs well even in V1.0 which is great. To me it was very interesting to see Web API perform relatively badly with plain string content, which originally led me to think that Web API might not be properly optimized just yet. For those that caught my Tweets late last week regarding WebAPI's slow responses was with String content which is in fact considerably slower. Luckily where it counts with serialized JSON and XML WebAPI actually performs better. But I do wonder what would make generic string content slower than serialized code? This stresses another point: Don't take a single test as the final gospel and don't extrapolate out from a single set of tests. Certainly Twitter can make you feel like a fool when you post something immediate that hasn't been fleshed out a little more <blush>. Egg on my face. As a result I ended up screwing around with this for a few hours today to compare different scenarios. Well worth the time… I hope you found this useful, if not for the results, maybe for the process of quickly testing a few requests for performance and charting out a comparison. Now onwards with more serious stuff… Resources Source Code on GitHub Apache HTTP Server Project (ab.exe is part of the binary distribution)© Rick Strahl, West Wind Technologies, 2005-2012Posted in ASP.NET Web Api Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
CodePlex Daily Summary for Wednesday, November 23, 2011

CodePlex Daily Summary for Wednesday, November 23, 2011Popular ReleasesVisual Leak Detector for Visual C++ 2008/2010: v2.2.1: Enhancements: * strdup and _wcsdup functions support added. * Preliminary support for VS 11 added. Bugs Fixed: * Low performance after upgrading from VLD v2.1. * Memory leaks with static linking fixed (disabled calloc support). * Runtime error R6002 fixed because of wrong memory dump format. * version.h fixed in installer. * Some PVS studio warning fixed.NetSqlAzMan - .NET SQL Authorization Manager: 3.6.0.10: 3.6.0.10 22-Nov-2011 Update: Removed PreEmptive Platform integration (PreEmptive analytics) Removed all PreEmptive attributes Removed PreEmptive.dll assembly references from all projects Added first support to ADAM/AD LDS Thanks to PatBea. Work Item 9775: http://netsqlazman.codeplex.com/workitem/9775Developer Team Article System Management: DTASM v1.3: ?? ??? ???? 3 ????? ???? ???? ????? ??? : - ????? ?????? ????? ???? ?? ??? ???? ????? ?? ??? ? ?? ???? ?????? ???? ?? ???? ????? ?? . - ??? ?? ???? ????? ???? ????? ???? ???? ?? ????? , ?????? ????? ????? ?? ??? . - ??? ??????? ??? ??? ???? ?? ????? ????? ????? .SharePoint 2010 FBA Pack: SharePoint 2010 FBA Pack 1.2.0: Web parts are now fully customizable via html templates (Issue #323) FBA Pack is now completely localizable using resource files. Thank you David Chen for submitting the code as well as Chinese translations of the FBA Pack! The membership request web part now gives the option of having the user enter the password and removing the captcha (Issue # 447) The FBA Pack will now work in a zone that does not have FBA enabled (Another zone must have FBA enabled, and the zone must contain the me...SharePoint 2010 Education Demo Project: Release SharePoint SP1 for Education Solutions: This release includes updates to the Content Packs for SharePoint SP1. All Content Packs have been updated to install successfully under SharePoint SP1SQL Monitor - tracking sql server activities: SQLMon 4.1 alpha 6: 1. improved support for schema 2. added find reference when right click on object list 3. added object rename supportBugNET Issue Tracker: BugNET 0.9.126: First stable release of version 0.9. Upgrades from 0.8 are fully supported and upgrades to future releases will also be supported. This release is now compiled against the .NET 4.0 framework and is a requirement. Because of this the web.config has significantly changed. After upgrading, you will need to configure the authentication settings for user registration and anonymous access again. Please see our installation / upgrade instructions for more details: http://wiki.bugnetproject.c...Anno 2070 Assistant: v0.1.0 (STABLE): Version 0.1.0 Features Production Chains Eco Production Chains (Complete) Tycoon Production Chains (Disabled - Incomplete) Tech Production Chains (Disabled - Incomplete) Supply (Disabled - Incomplete) Calculator (Disabled - Incomplete) Building Layouts Eco Building Layouts (Complete) Tycoon Building Layouts (Disabled - Incomplete) Tech Building Layouts (Disabled - Incomplete) Credits (Complete)Free SharePoint 2010 Sites Templates: SharePoint Server 2010 Sites Templates: here is the list of sites templates to be downloadedVsTortoise - a TortoiseSVN add-in for Microsoft Visual Studio: VsTortoise Build 30 Beta: Note: This release does not work with custom VsTortoise toolbars. These get removed every time when you shutdown Visual Studio. (#7940) Build 30 (beta)New: Support for TortoiseSVN 1.7 added. (the download contains both setups, for TortoiseSVN 1.6 and 1.7) New: OpenModifiedDocumentDialog displays conflicted files now. New: OpenModifiedDocument allows to group items by changelist now. Fix: OpenModifiedDocumentDialog caused Visual Studio 2010 to freeze sometimes. Fix: The installer didn...nopCommerce. Open source shopping cart (ASP.NET MVC): nopcommerce 2.30: Highlight features & improvements: • Performance optimization. • Back in stock notifications. • Product special price support. • Catalog mode (based on customer role) To see the full list of fixes and changes please visit the release notes page (http://www.nopCommerce.com/releasenotes.aspx).WPF Converters: WPF Converters V1.2.0.0: support for enumerations, value types, and reference types in the expression converter's equality operators the expression converter now handles DependencyProperty.UnsetValue as argument values correctly (#4062) StyleCop conformance (more or less)Json.NET: Json.NET 4.0 Release 4: Change - JsonTextReader.Culture is now CultureInfo.InvariantCulture by default Change - KeyValurPairConverter no longer cares about the order of the key and value properties Change - Time zone conversions now use new TimeZoneInfo instead of TimeZone Fix - Fixed boolean values sometimes being capitalized when converting to XML Fix - Fixed error when deserializing ConcurrentDictionary Fix - Fixed serializing some Uris returning the incorrect value Fix - Fixed occasional error when...Media Companion: MC 3.423b Weekly: Ensure .NET 4.0 Full Framework is installed. (Available from http://www.microsoft.com/download/en/details.aspx?id=17718) Ensure the NFO ID fix is applied when transitioning from versions prior to 3.416b. (Details here) Replaced 'Rebuild' with 'Refresh' throughout entire code. Rebuild will now be known as Refresh. mc_com.exe has been fully updated TV Show Resolutions... Resolved issue #206 - having to hit save twice when updating runtime manually Shrunk cache size and lowered loading times f...Delta Engine: Delta Engine Beta Preview v0.9.1: v0.9.1 beta release with lots of refactoring, fixes, new samples and support for iOS, Android and WP7 (you need a Marketplace account however). If you want a binary release for the games (like v0.9.0), just say so in the Forum or here and we will quickly prepare one. It is just not much different from v0.9.0, so I left it out this time. See http://DeltaEngine.net/Wiki.Roadmap for details.ASP.net Awesome Samples (Web-Forms): 1.0 samples: Demos and Tutorials for ASP.net Awesome VS2008 are in .NET 3.5 VS2010 are in .NET 4.0 (demos for the ASP.net Awesome jQuery Ajax Controls)SharpMap - Geospatial Application Framework for the CLR: SharpMap-0.9-AnyCPU-Trunk-2011.11.17: This is a build of SharpMap from the 0.9 development trunk as per 2011-11-17 For most applications the AnyCPU release is the recommended, but in case you need an x86 build that is included to. For some dataproviders (GDAL/OGR, SqLite, PostGis) you need to also referense the SharpMap.Extensions assembly For SqlServer Spatial you need to reference the SharpMap.SqlServerSpatial assemblyAJAX Control Toolkit: November 2011 Release: AJAX Control Toolkit Release Notes - November 2011 Release Version 51116November 2011 release of the AJAX Control Toolkit. AJAX Control Toolkit .NET 4 - Binary – AJAX Control Toolkit for .NET 4 and sample site (Recommended). AJAX Control Toolkit .NET 3.5 - Binary – AJAX Control Toolkit for .NET 3.5 and sample site (Recommended). Notes: - The current version of the AJAX Control Toolkit is not compatible with ASP.NET 2.0. The latest version that is compatible with ASP.NET 2.0 can be found h...Microsoft Ajax Minifier: Microsoft Ajax Minifier 4.36: Fix for issue #16908: string literals containing ASP.NET replacement syntax fail if the ASP.NET code contains the same character as the string literal delimiter. Also, we shouldn't be changing the delimiter for those literals or combining them with other literals; the developer may have specifically chosen the delimiter used because of possible content inserted by ASP.NET code. This logic is normally off; turn it on via the -aspnet command-line flag (or the Code.Settings.AllowEmbeddedAspNetBl...MVC Controls Toolkit: Mvc Controls Toolkit 1.5.5: Added: Now the DateRanteAttribute accepts complex expressions containing "Now" and "Today" as static minimum and maximum. Menu, MenuFor helpers capable of handling a "currently selected element". The developer can choose between using a standard nested menu based on a standard SimpleMenuItem class or specifying an item template based on a custom class. Added also helpers to build the tree structure containing all data items the menu takes infos from. Improved the pager. Now the developer ...New ProjectsActiveWorlds World Server Admin PowerShell SnapIn: The purpose of this PowerShell SnapIn is to provide a set of tools to administer the world server from PowerShell. It leverages the ActiveWorlds SDK .NET Wrapper to provide this functionality.Aigu: Enter special characters like you would on your mobile phone. For instance, if you want to type 'é', you just hold down 'e' and a menu will appear. Selected the desired character using the arrow keys and press 'enter'. Simple but powerful.Are you workaholic?: Are you a workaholic? Did your Doctor advice you not to stare at the computer monitor for a long time? Then this app is perfectly made for you. It runs in the background, and alerts you to take periodic rests for your eyes and body. What's more, It's open source (MS-PL).ATDIS PoC: privateAuto Version Web Assets: The AVWA project is an HTTP Module written in C# that is designed to allow for versioning of various web assets such as .CSS and .JS files. This allows you to publish new versions of these files without having to force the server or the client browsers to expire cache.Bachelor Thesis Algorithm Test Bed: Algorithm Test Bed for my Bachelor ThesisBase64: Simple application helps converting strings and files from or to Base64 string. You can use any encoding to convert while a sidebar previews decoded string for all other encodings.BoracayExpress: BoracayExpressC++ Framework for Test Driven Development: A testing framework for C++ written in C++.Class2Table: Class2Table aka Entity2Table. Easy tool that allows creation of SQL tables from .Net types.Code for Demos & Experiments: This is where I will post code from demos and presentationsCodeMaker: CodeMaker?????????： 1、?????????? 2、???? 3、????? 4、??Python????????? ConsoleCommand: ConsoleCommand provides certain .Net commands for access from javascript console engines. Included are commands to set the text and background colors, as well as list and extract resources compiled in a .Net dll. Converter: Character code conversion tools ???????? CryptoInator - self contained, self-encrypting, self-decrypting image viewer: Original developed to encrypt and store NemID images in Denmark. DAiBears: Something, something, botDelicious Notify Plugin: Lets you push a blog post straight to Delicious from Live WriterDeveloperFile: Compresses Javascripts using the YUI .NET project. Loops through the root folder and subfolders for files matching the debug extension and creates new files using the release extension. (File extensions must match exactly).DotNetNuke SharePoint File Explorer: A DotNetNuke SharePoint File ExplorerDouban FM: WP7 Douban FM appGame Lib: Game Library is a open-source game library to allow focusing on the fun part of a game. It is developed in C#, but will be ported to C++ and VB.net.Google reader notes to Delicious Export tool (WPF): Google reader discontinued note in reader features. Current google reader allows to export users old notes in JSON format, This App will parse the JSON file & upload it to it delicious , delicious is a good alternative for note in readerHtml Source Transmitter Control: This web control allows getting a source of a web page, that will displayed before submit. So, developer can store a view of the html page, that was before server exception. It helps to reproduce bugs and can be used with other logging systems.Ideopuzzle: A puzzle gameImageShack-Uploader: This project demonstrates how to upload files automated to imageshack.us and other image hosters with C#.Insert Acronym Tags: Lets you insert <acronym> and <abbr> tags into your blog entry more easily.Insert Quick Link: Allows you to paste a link into the Writer window and have the a window similar to the one in Writer where you can change what text is to appear, open in new window, etc.Insert Video Plugin: Allows you to insert a video into a blog entry from a multitude of different sitesIoCWrap: Provides a wrapper to the various IoC container implementations so that it is possible to switch to a different provider without changing any application code.kaveepoj: sharepoint projectKinect Quiz Engine: Fun quiz game for the Kinect.Klaverjas: Test application for testing different new technologies in .NET (WCF, DataServices, C# stuff, Entity...etc.)Man In The Middle: A cyberpunk themed action with puzzle and strategy elements. Made with XNA as part of a game development course at the IT University of Copenhagen by Bo Bendtsen, Jonas Flensbak, Daniel Kromand, Jess Rahbek & Darryl Woodford.MediaSelektor: Simple tool to select mediasMicajah Mindtouch Deki Wiki Copier: Small C# application to move data between 2 Deki Wiki installs or, more importantly, from a wik.is account to a locally installed systemMineFlagger: MineFlagger is a mine clearing game modeled after Microsoft’s Minesweeper. In addition to standard play, MineFlagger incorporates an AI for fun and training.myXbyqwrhjadsfasfhgf: myXbyqwrhjadsfasfhgfnatoop: natoopNauplius.KeyStore: Provides secure application key storage backed by SQL 2008 and Active Directory.ObjectDB: An object database written using C# 4 and Mono.Cecil.PaceR: PaceR is an attempt to encapsulate a lot of the common code functionality I use on different projects. Instead of recreating functionality from memory or worse, copying from older projects, I'd like to have a central location to maintain this common code. Parseq: Parseq is a Parser Combinator library written in C# (version 2.0).PowerShell Network Adapter Configuration module: PowerShell Network Adapter Configuration module is a PowerShell module which provides functions for managing network adapters using WMI.public traffic tracker: This is a university project for a .net course. We develop a public traffic tracker applications for Windows Phone 7 devices, that can give information about the actual positions of the nearest vehicle on a given line. The speciality is that we use only the GPS information of the users' WP7 devices, so this is a completely software solution without any hardware investment. The disatvantage is that for the real operation we would need a lot of active WP7 user.puyo: puyoRadioTroll: Projeto web Radio TrollRead Feed Community: Read Feed CommunityReviewer: Reviewer.dk - Dansk spil og anmeldelsessite.Rollout Sharepoint Solutions - ROSS: ROSS performs the following actions: - Delete sitecollection and restart services - 'Get Latest Version' from SourceSafe - Rebuild Solution - Install all wsp solutions - Create SiteCollections - Check for build en provisioning errors - Send email to developers if errors occurredSchool Management: school managementSQL File Executer: This project is a class library written in c# which is used for executing *.sql files in remote server. Simply one dll file. You include it in your web project, add using statement at the top of your page, pass the parameters inside. Rest, it will do.Startup Manager: Startup Manager launches all startup programs at a managed rate therefore meaning that your computer doesn't crash everytime it starts up and you can use it immediately.stetic: ...Test Infrastructure Guidance: The purpose of this project is to provide guidance to testers in using TFS effectively as an ALM solution. TFS is much more than a simple code repository. Used with Visual Studio it can form a powerful testing solution and remove a lot of pain in dealing with test infrastructure overhead.Tête-à-tête: Tete-a-tete is an address book with a built-in function to send electronic mail over the Internet.Tipeysh! - Add-in that helps you creating C/C++ header files on a single click: Are you also feel miserable when you need to create a new header file in your Visual Studio C/C++ project? Repeatedly choosing "new header file", then writing the annoying (but needed) "#ifndef" section, then writing the class name with it's "private", "protected" and "public" access modifiers... too much clicks and typewriting! Well, there is a solution: Tipeysh! is a simple, easy to use, very handy and configurable Visual Studio Add-In, compatible for both the 2005 and 2008 versions. Once ...UMN Dashboard Project: academic projUsersMOSS: UsersMOSS est une petite application permettant de consulter sur un serveur MOSS les sites web (SPWeb) les users (SPUser), et les groupes (SPGroup). Cette application utilise le modèle objet de MOSS pour inspecter le contenu des objets d'un serveur MOSS. Cette application est loin d'être professionnelle, ou même terminée, mais elle me rend très souvent service. Surtout ne l'utilisez pas sur un serveur de production car le gestion du GC n'est pas faite, ce qui peut provoquer des plantages de v...UtilityLibrary.Win32: UtilityLibrary.Win32UW iLearn: The iLearn activity inference platform is a suite of desktop and mobile tools for logging, modeling, and classifying sensor data for mobile devices. It was created at the University of Washington.VsDocGen: Dynamic javascript documentation generation directly from xml comment documented source code.Windows Live Spaces Photo Album plugin: This is going to be a plugin for Windows Live Writer that will allow you to browse a Windows Live Space Photo Album.Windows Live Writer Plugin for Amazon Books using CueCat: This Windows Live Writer Plugin is for users who use WLW and wish to use their CueCat to scan books. ItemLookups are run against Amazon via its AWS and book image, title, author, and publisher is returned. This project was first created by Scott Hanselman on MSDN's Coding4Fun! X7: X7 makes it easier for win7user to clean the system. You'll no longer have to delete useless stuff in your win7. It's developed in bat.xDT - Commander: Using this application, the user can assign shortcuts (short texts) for various links/URLs. These short texts will be typed into a Textbox to then launch/go to the target (similar to the "Run" program in Windows).

Read the article
ANTS Memory Profiler 7.0 Review

- by Michael B. McLaughlin

(This is my first review as a part of the GeeksWithBlogs.net Influencers program. It’s a program in which I (and the others who have been selected for it) get the opportunity to check out new products and services and write reviews about them. We don’t get paid for this, but we do generally get to keep a copy of the software or retain an account for some period of time on the service that we review. In this case I received a copy of Red Gate Software’s ANTS Memory Profiler 7.0, which was released in January. I don’t have any upgrade rights nor is my review guided, restrained, influenced, or otherwise controlled by Red Gate or anyone else. But I do get to keep the software license. I will always be clear about what I received whenever I do a review – I leave it up to you to decide whether you believe I can be objective. I believe I can be. If I used something and really didn’t like it, keeping a copy of it wouldn’t be worth anything to me. In that case though, I would simply uninstall/deactivate/whatever the software or service and tell the company what I didn’t like about it so they could (hopefully) make it better in the future. I don’t think it’d be polite to write up a terrible review, nor do I think it would be a particularly good use of my time. There are people who get paid for a living to review things, so I leave it to them to tell you what they think is bad and why. I’ll only spend my time telling you about things I think are good.) Overview of Common .NET Memory Problems When coming to land of managed memory from the wilds of unmanaged code, it’s easy to say to one’s self, “Wow! Now I never have to worry about memory problems again!” But this simply isn’t true. Managed code environments, such as .NET, make many, many things easier. You will never have to worry about memory corruption due to a bad pointer, for example (unless you’re working with unsafe code, of course). But managed code has its own set of memory concerns. For example, failing to unsubscribe from events when you are done with them leaves the publisher of an event with a reference to the subscriber. If you eliminate all your own references to the subscriber, then that memory is effectively lost since the GC won’t delete it because of the publishing object’s reference. When the publishing object itself becomes subject to garbage collection then you’ll get that memory back finally, but that could take a very long time depending of the life of the publisher. Another common source of resource leaks is failing to properly release unmanaged resources. When writing a class that contains members that hold unmanaged resources (e.g. any of the Stream-derived classes, IsolatedStorageFile, most classes ending in “Reader” or “Writer”), you should always implement IDisposable, making sure to use a properly written Dispose method. And when you are using an instance of a class that implements IDisposable, you should always make sure to use a 'using' statement in order to ensure that the object’s unmanaged resources are disposed of properly. (A ‘using’ statement is a nicer, cleaner looking, and easier to use version of a try-finally block. The compiler actually translates it as though it were a try-finally block. Note that Code Analysis warning 2202 (CA2202) will often be triggered by nested using blocks. A properly written dispose method ensures that it only runs once such that calling dispose multiple times should not be a problem. Nonetheless, CA2202 exists and if you want to avoid triggering it then you should write your code such that only the innermost IDisposable object uses a ‘using’ statement, with any outer code making use of appropriate try-finally blocks instead). Then, of course, there are situations where you are operating in a memory-constrained environment or else you want to limit or even eliminate allocations within a certain part of your program (e.g. within the main game loop of an XNA game) in order to avoid having the GC run. On the Xbox 360 and Windows Phone 7, for example, for every 1 MB of heap allocations you make, the GC runs; the added time of a GC collection can cause a game to drop frames or run slowly thereby making it look bad. Eliminating allocations (or else minimizing them and calling an explicit Collect at an appropriate time) is a common way of avoiding this (the other way is to simplify your heap so that the GC’s latency is low enough not to cause performance issues). ANTS Memory Profiler 7.0 When the opportunity to review Red Gate’s recently released ANTS Memory Profiler 7.0 arose, I jumped at it. In order to review it, I was given a free copy (which does not include upgrade rights for future versions) which I am allowed to keep. For those of you who are familiar with ANTS Memory Profiler, you can find a list of new features and enhancements here. If you are an experienced .NET developer who is familiar with .NET memory management issues, ANTS Memory Profiler is great. More importantly still, if you are new to .NET development or you have no experience or limited experience with memory profiling, ANTS Memory Profiler is awesome. From the very beginning, it guides you through the process of memory profiling. If you’re experienced and just want dive in however, it doesn’t get in your way. The help items GAHSFLASHDAJLDJA are well designed and located right next to the UI controls so that they are easy to find without being intrusive. When you first launch it, it presents you with a “Getting Started” screen that contains links to “Memory profiling video tutorials”, “Strategies for memory profiling”, and the “ANTS Memory Profiler forum”. I’m normally the kind of person who looks at a screen like that only to find the “Don’t show this again” checkbox. Since I was doing a review, though, I decided I should examine them. I was pleasantly surprised. The overview video clocks in at three minutes and fifty seconds. It begins by showing you how to get started profiling an application. It explains that profiling is done by taking memory snapshots periodically while your program is running and then comparing them. ANTS Memory Profiler (I’m just going to call it “ANTS MP” from here) analyzes these snapshots in the background while your application is running. It briefly mentions a new feature in Version 7, a new API that give you the ability to trigger snapshots from within your application’s source code (more about this below). You can also, and this is the more common way you would do it, take a memory snapshot at any time from within the ANTS MP window by clicking the “Take Memory Snapshot” button in the upper right corner. The overview video goes on to demonstrate a basic profiling session on an application that pulls information from a database and displays it. It shows how to switch which snapshots you are comparing, explains the different sections of the Summary view and what they are showing, and proceeds to show you how to investigate memory problems using the “Instance Categorizer” to track the path from an object (or set of objects) to the GC’s root in order to find what things along the path are holding a reference to it/them. For a set of objects, you can then click on it and get the “Instance List” view. This displays all of the individual objects (including their individual sizes, values, etc.) of that type which share the same path to the GC root. You can then click on one of the objects to generate an “Instance Retention Graph” view. This lets you track directly up to see the reference chain for that individual object. In the overview video, it turned out that there was an event handler which was holding on to a reference, thereby keeping a large number of strings that should have been freed in memory. Lastly the video shows the “Class List” view, which lets you dig in deeply to find problems that might not have been clear when following the previous workflow. Once you have at least one memory snapshot you can begin analyzing. The main interface is in the “Analysis” tab. You can also switch to the “Session Overview” tab, which gives you several bar charts highlighting basic memory data about the snapshots you’ve taken. If you hover over the individual bars (and the individual colors in bars that have more than one), you will see a detailed text description of what the bar is representing visually. The Session Overview is good for a quick summary of memory usage and information about the different heaps. You are going to spend most of your time in the Analysis tab, but it’s good to remember that the Session Overview is there to give you some quick feedback on basic memory usage stats. As described above in the summary of the overview video, there is a certain natural workflow to the Analysis tab. You’ll spin up your application and take some snapshots at various times such as before and after clicking a button to open a window or before and after closing a window. Taking these snapshots lets you examine what is happening with memory. You would normally expect that a lot of memory would be freed up when closing a window or exiting a document. By taking snapshots before and after performing an action like that you can see whether or not the memory is really being freed. If you already know an area that’s giving you trouble, you can run your application just like normal until just before getting to that part and then you can take a few strategic snapshots that should help you pin down the problem. Something the overview didn’t go into is how to use the “Filters” section at the bottom of ANTS MP together with the Class List view in order to narrow things down. The video tutorials page has a nice 3 minute intro video called “How to use the filters”. It’s a nice introduction and covers some of the basics. I’m going to cover a bit more because I think they’re a really neat, really helpful feature. Large programs can bring up thousands of classes. Even simple programs can instantiate far more classes than you might realize. In a basic .NET 4 WPF application for example (and when I say basic, I mean just MainWindow.xaml with a button added to it), the unfiltered Class List view will have in excess of 1000 classes (my simple test app had anywhere from 1066 to 1148 classes depending on which snapshot I was using as the “Current” snapshot). This is amazing in some ways as it shows you how in stark detail just how immensely powerful the WPF framework is. But hunting through 1100 classes isn’t productive, no matter how cool it is that there are that many classes instantiated and doing all sorts of awesome things. Let’s say you wanted to examine just the classes your application contains source code for (in my simple example, that would be the MainWindow and App). Under “Basic Filters”, click on “Classes with source” under “Show only…”. Voilà. Down from 1070 classes in the snapshot I was using as “Current” to 2 classes. If you then click on a class’s name, it will show you (to the right of the class name) two little icon buttons. Hover over them and you will see that you can click one to view the Instance Categorizer for the class and another to view the Instance List for the class. You can also show classes based on which heap they live on. If you chose both a Baseline snapshot and a Current snapshot then you can use the “Comparing snapshots” filters to show only: “New objects”; “Surviving objects”; “Survivors in growing classes”; or “Zombie objects” (if you aren’t sure what one of these means, you can click the helpful “?” in a green circle icon to bring up a popup that explains them and provides context). Remember that your selection(s) under the “Show only…” heading will still apply, so you should update those selections to make sure you are seeing the view you want. There are also links under the “What is my memory problem?” heading that can help you diagnose the problems you are seeing including one for “I don’t know which kind I have” for situations where you know generally that your application has some problems but aren’t sure what the behavior you have been seeing (OutOfMemoryExceptions, continually growing memory usage, larger memory use than expected at certain points in the program). The Basic Filters are not the only filters there are. “Filter by Object Type” gives you the ability to filter by: “Objects that are disposable”; “Objects that are/are not disposed”; “Objects that are/are not GC roots” (GC roots are things like static variables); and “Objects that implement _______”. “Objects that implement” is particularly neat. Once you check the box, you can then add one or more classes and interfaces that an object must implement in order to survive the filtering. Lastly there is “Filter by Reference”, which gives you the option to pare down the list based on whether an object is “Kept in memory exclusively by” a particular item, a class/interface, or a namespace; whether an object is “Referenced by” one or more of those choices; and whether an object is “Never referenced by” one or more of those choices. Remember that filtering is cumulative, so anything you had set in one of the filter sections still remains in effect unless and until you go back and change it. There’s quite a bit more to ANTS MP – it’s a very full featured product – but I think I touched on all of the most significant pieces. You can use it to debug: a .NET executable; an ASP.NET web application (running on IIS); an ASP.NET web application (running on Visual Studio’s built-in web development server); a Silverlight 4 browser application; a Windows service; a COM+ server; and even something called an XBAP (local XAML browser application). You can also attach to a .NET 4 process to profile an application that’s already running. The startup screen also has a large number of “Charting Options” that let you adjust which statistics ANTS MP should collect. The default selection is a good, minimal set. It’s worth your time to browse through the charting options to examine other statistics that may also help you diagnose a particular problem. The more statistics ANTS MP collects, the longer it will take to collect statistics. So just turning everything on is probably a bad idea. But the option to selectively add in additional performance counters from the extensive list could be a very helpful thing for your memory profiling as it lets you see additional data that might provide clues about a particular problem that has been bothering you. ANTS MP integrates very nicely with all versions of Visual Studio that support plugins (i.e. all of the non-Express versions). Just note that if you choose “Profile Memory” from the “ANTS” menu that it will launch profiling for whichever project you have set as the Startup project. One quick tip from my experience so far using ANTS MP: if you want to properly understand your memory usage in an application you’ve written, first create an “empty” version of the type of project you are going to profile (a WPF application, an XNA game, etc.) and do a quick profiling session on that so that you know the baseline memory usage of the framework itself. By “empty” I mean just create a new project of that type in Visual Studio then compile it and run it with profiling – don’t do anything special or add in anything (except perhaps for any external libraries you’re planning to use). The first thing I tried ANTS MP out on was a demo XNA project of an editor that I’ve been working on for quite some time that involves a custom extension to XNA’s content pipeline. The first time I ran it and saw the unmanaged memory usage I was convinced I had some horrible bug that was creating extra copies of texture data (the demo project didn’t have a lot of texture data so when I saw a lot of unmanaged memory I instantly figured I was doing something wrong). Then I thought to run an empty project through and when I saw that the amount of unmanaged memory was virtually identical, it dawned on me that the CLR itself sits in unmanaged memory and that (thankfully) there was nothing wrong with my code! Quite a relief. Earlier, when discussing the overview video, I mentioned the API that lets you take snapshots from within your application. I gave it a quick trial and it’s very easy to integrate and make use of and is a really nice addition (especially for projects where you want to know what, if any, allocations there are in a specific, complicated section of code). The only concern I had was that if I hadn’t watched the overview video I might never have known it existed. Even then it took me five minutes of hunting around Red Gate’s website before I found the “Taking snapshots from your code" article that explains what DLL you need to add as a reference and what method of what class you should call in order to take an automatic snapshot (including the helpful warning to wrap it in a try-catch block since, under certain circumstances, it can raise an exception, such as trying to call it more than 5 times in 30 seconds. The difficulty in discovering and then finding information about the automatic snapshots API was one thing I thought could use improvement. Another thing I think would make it even better would be local copies of the webpages it links to. Although I’m generally always connected to the internet, I imagine there are more than a few developers who aren’t or who are behind very restrictive firewalls. For them (and for me, too, if my internet connection happens to be down), it would be nice to have those documents installed locally or to have the option to download an additional “documentation” package that would add local copies. Another thing that I wish could be easier to manage is the Filters area. Finding and setting individual filters is very easy as is understanding what those filter do. And breaking it up into three sections (basic, by object, and by reference) makes sense. But I could easily see myself running a long profiling session and forgetting that I had set some filter a long while earlier in a different filter section and then spending quite a bit of time trying to figure out why some problem that was clearly visible in the data wasn’t showing up in, e.g. the instance list before remembering to check all the filters for that one setting that was only culling a few things from view. Some sort of indicator icon next to the filter section names that appears you have at least one filter set in that area would be a nice visual clue to remind me that “oh yeah, I told it to only show objects on the Gen 2 heap! That’s why I’m not seeing those instances of the SuperMagic class!” Something that would be nice (but that Red Gate cannot really do anything about) would be if this could be used in Windows Phone 7 development. If Microsoft and Red Gate could work together to make this happen (even if just on the WP7 emulator), that would be amazing. Especially given the memory constraints that apps and games running on mobile devices need to work within, a good memory profiler would be a phenomenally helpful tool. If anyone at Microsoft reads this, it’d be really great if you could make something like that happen. Perhaps even a (subsidized) custom version just for WP7 development. (For XNA games, of course, you can create a Windows version of the game and use ANTS MP on the Windows version in order to get a better picture of your memory situation. For Silverlight on WP7, though, there’s quite a bit of educated guess work and WeakReference creation followed by forced collections in order to find the source of a memory problem.) The only other thing I found myself wanting was a “Back” button. Between my Windows Phone 7, Zune, and other things, I’ve grown very used to having a “back stack” that lets me just navigate back to where I came from. The ANTS MP interface is surprisingly easy to use given how much it lets you do, and once you start using it for any amount of time, you learn all of the different areas such that you know where to go. And it does remember the state of the areas you were previously in, of course. So if you go to, e.g., the Instance Retention Graph from the Class List and then return back to the Class List, it will remember which class you had selected and all that other state information. Still, a “Back” button would be a welcome addition to a future release. Bottom Line ANTS Memory Profiler is not an inexpensive tool. But my time is valuable. I can easily see ANTS MP saving me enough time tracking down memory problems to justify it on a cost basis. More importantly to me, knowing what is happening memory-wise in my programs and having the confidence that my code doesn’t have any hidden time bombs in it that will cause it to OOM if I leave it running for longer than I do when I spin it up real quickly for debugging or just to see how a new feature looks and feels is a good feeling. It’s a feeling that I like having and want to continue to have. I got the current version for free in order to review it. Having done so, I’ve now added it to my must-have tools and will gladly lay out the money for the next version when it comes out. It has a 14 day free trial, so if you aren’t sure if it’s right for you or if you think it seems interesting but aren’t really sure if it’s worth shelling out the money for it, give it a try.

Read the article
Much Ado About Nothing: Stub Objects

- by user9154181

The Solaris 11 link-editor (ld) contains support for a new type of object that we call a stub object. A stub object is a shared object, built entirely from mapfiles, that supplies the same linking interface as the real object, while containing no code or data. Stub objects cannot be executed — the runtime linker will kill any process that attempts to load one. However, you can link to a stub object as a dependency, allowing the stub to act as a proxy for the real version of the object. You may well wonder if there is a point to producing an object that contains nothing but linking interface. As it turns out, stub objects are very useful for building large bodies of code such as Solaris. In the last year, we've had considerable success in applying them to one of our oldest and thorniest build problems. In this discussion, I will describe how we came to invent these objects, and how we apply them to building Solaris. This posting explains where the idea for stub objects came from, and details our long and twisty journey from hallway idea to standard link-editor feature. I expect that these details are mainly of interest to those who work on Solaris and its makefiles, those who have done so in the past, and those who work with other similar bodies of code. A subsequent posting will omit the history and background details, and instead discuss how to build and use stub objects. If you are mainly interested in what stub objects are, and don't care about the underlying software war stories, I encourage you to skip ahead. The Long Road To Stubs This all started for me with an email discussion in May of 2008, regarding a change request that was filed in 2002, entitled: 4631488 lib/Makefile is too patient: .WAITs should be reduced This CR encapsulates a number of cronic issues with Solaris builds: We build Solaris with a parallel make (dmake) that tries to build as much of the code base in parallel as possible. There is a lot of code to build, and we've long made use of parallelized builds to get the job done quicker. This is even more important in today's world of massively multicore hardware. Solaris contains a large number of executables and shared objects. Executables depend on shared objects, and shared objects can depend on each other. Before you can build an object, you need to ensure that the objects it needs have been built. This implies a need for serialization, which is in direct opposition to the desire to build everying in parallel. To accurately build objects in the right order requires an accurate set of make rules defining the things that depend on each other. This sounds simple, but the reality is quite complex. In practice, having programmers explicitly specify these dependencies is a losing strategy: It's really hard to get right. It's really easy to get it wrong and never know it because things build anyway. Even if you get it right, it won't stay that way, because dependencies between objects can change over time, and make cannot help you detect such drifing. You won't know that you got it wrong until the builds break. That can be a long time after the change that triggered the breakage happened, making it hard to connect the cause and the effect. Usually this happens just before a release, when the pressure is on, its hard to think calmly, and there is no time for deep fixes. As a poor compromise, the libraries in core Solaris were built using a set of grossly incomplete hand written rules, supplemented with a number of dmake .WAIT directives used to group the libraries into sets of non-interacting groups that can be built in parallel because we think they don't depend on each other. From time to time, someone will suggest that we could analyze the built objects themselves to determine their dependencies and then generate make rules based on those relationships. This is possible, but but there are complications that limit the usefulness of that approach: To analyze an object, you have to build it first. This is a classic chicken and egg scenario. You could analyze the results of a previous build, but then you're not necessarily going to get accurate rules for the current code. It should be possible to build the code without having a built workspace available. The analysis will take time, and remember that we're constantly trying to make builds faster, not slower. By definition, such an approach will always be approximate, and therefore only incremantally more accurate than the hand written rules described above. The hand written rules are fast and cheap, while this idea is slow and complex, so we stayed with the hand written approach. Solaris was built that way, essentially forever, because these are genuinely difficult problems that had no easy answer. The makefiles were full of build races in which the right outcomes happened reliably for years until a new machine or a change in build server workload upset the accidental balance of things. After figuring out what had happened, you'd mutter "How did that ever work?", add another incomplete and soon to be inaccurate make dependency rule to the system, and move on. This was not a satisfying solution, as we tend to be perfectionists in the Solaris group, but we didn't have a better answer. It worked well enough, approximately. And so it went for years. We needed a different approach — a new idea to cut the Gordian Knot. In that discussion from May 2008, my fellow linker-alien Rod Evans had the initial spark that lead us to a game changing series of realizations: The link-editor is used to link objects together, but it only uses the ELF metadata in the object, consisting of symbol tables, ELF versioning sections, and similar data. Notably, it does not look at, or understand, the machine code that makes an object useful at runtime. If you had an object that only contained the ELF metadata for a dependency, but not the code or data, the link-editor would find it equally useful for linking, and would never know the difference. Call it a stub object. In the core Solaris OS, we require all objects to be built with a link-editor mapfile that describes all of its publically available functions and data. Could we build a stub object using the mapfile for the real object? It ought to be very fast to build stub objects, as there are no input objects to process. Unlike the real object, stub objects would not actually require any dependencies, and so, all of the stubs for the entire system could be built in parallel. When building the real objects, one could link against the stub objects instead of the real dependencies. This means that all the real objects can be built built in parallel too, without any serialization. We could replace a system that requires perfect makefile rules with a system that requires no ordering rules whatsoever. The results would be considerably more robust. We immediately realized that this idea had potential, but also that there were many details to sort out, lots of work to do, and that perhaps it wouldn't really pan out. As is often the case, it would be necessary to do the work and see how it turned out. Following that conversation, I set about trying to build a stub object. We determined that a faithful stub has to do the following: Present the same set of global symbols, with the same ELF versioning, as the real object. Functions are simple — it suffices to have a symbol of the right type, possibly, but not necessarily, referencing a null function in its text segment. Copy relocations make data more complicated to stub. The possibility of a copy relocation means that when you create a stub, the data symbols must have the actual size of the real data. Any error in this will go uncaught at link time, and will cause tragic failures at runtime that are very hard to diagnose. For reasons too obscure to go into here, involving tentative symbols, it is also important that the data reside in bss, or not, matching its placement in the real object. If the real object has more than one symbol pointing at the same data item, we call these aliased symbols. All data symbols in the stub object must exhibit the same aliasing as the real object. We imagined the stub library feature working as follows: A command line option to ld tells it to produce a stub rather than a real object. In this mode, only mapfiles are examined, and any object or shared libraries on the command line are are ignored. The extra information needed (function or data, size, and bss details) would be added to the mapfile. When building the real object instead of the stub, the extra information for building stubs would be validated against the resulting object to ensure that they match. In exploring these ideas, I immediately run headfirst into the reality of the original mapfile syntax, a subject that I would later write about as The Problem(s) With Solaris SVR4 Link-Editor Mapfiles. The idea of extending that poor language was a non-starter. Until a better mapfile syntax became available, which seemed unlikely in 2008, the solution could not involve extentions to the mapfile syntax. Instead, we cooked up the idea (hack) of augmenting mapfiles with stylized comments that would carry the necessary information. A typical definition might look like: # DATA(i386) __iob 0x3c0 # DATA(amd64,sparcv9) __iob 0xa00 # DATA(sparc) __iob 0x140 iob; A further problem then became clear: If we can't extend the mapfile syntax, then there's no good way to extend ld with an option to produce stub objects, and to validate them against the real objects. The idea of having ld read comments in a mapfile and parse them for content is an unacceptable hack. The entire point of comments is that they are strictly for the human reader, and explicitly ignored by the tool. Taking all of these speed bumps into account, I made a new plan: A perl script reads the mapfiles, generates some small C glue code to produce empty functions and data definitions, compiles and links the stub object from the generated glue code, and then deletes the generated glue code. Another perl script used after both objects have been built, to compare the real and stub objects, using data from elfdump, and validate that they present the same linking interface. By June 2008, I had written the above, and generated a stub object for libc. It was a useful prototype process to go through, and it allowed me to explore the ideas at a deep level. Ultimately though, the result was unsatisfactory as a basis for real product. There were so many issues: The use of stylized comments were fine for a prototype, but not close to professional enough for shipping product. The idea of having to document and support it was a large concern. The ideal solution for stub objects really does involve having the link-editor accept the same arguments used to build the real object, augmented with a single extra command line option. Any other solution, such as our prototype script, will require makefiles to be modified in deeper ways to support building stubs, and so, will raise barriers to converting existing code. A validation script that rederives what the linker knew when it built an object will always be at a disadvantage relative to the actual linker that did the work. A stub object should be identifyable as such. In the prototype, there was no tag or other metadata that would let you know that they weren't real objects. Being able to identify a stub object in this way means that the file command can tell you what it is, and that the runtime linker can refuse to try and run a program that loads one. At that point, we needed to apply this prototype to building Solaris. As you might imagine, the task of modifying all the makefiles in the core Solaris code base in order to do this is a massive task, and not something you'd enter into lightly. The quality of the prototype just wasn't good enough to justify that sort of time commitment, so I tabled the project, putting it on my list of long term things to think about, and moved on to other work. It would sit there for a couple of years. Semi-coincidentally, one of the projects I tacked after that was to create a new mapfile syntax for the Solaris link-editor. We had wanted to do something about the old mapfile syntax for many years. Others before me had done some paper designs, and a great deal of thought had already gone into the features it should, and should not have, but for various reasons things had never moved beyond the idea stage. When I joined Sun in late 2005, I got involved in reviewing those things and thinking about the problem. Now in 2008, fresh from relearning for the Nth time why the old mapfile syntax was a huge impediment to linker progress, it seemed like the right time to tackle the mapfile issue. Paving the way for proper stub object support was not the driving force behind that effort, but I certainly had them in mind as I moved forward. The new mapfile syntax, which we call version 2, integrated into Nevada build snv_135 in in February 2010: 6916788 ld version 2 mapfile syntax PSARC/2009/688 Human readable and extensible ld mapfile syntax In order to prove that the new mapfile syntax was adequate for general purpose use, I had also done an overhaul of the ON consolidation to convert all mapfiles to use the new syntax, and put checks in place that would ensure that no use of the old syntax would creep back in. That work went back into snv_144 in June 2010: 6916796 OSnet mapfiles should use version 2 link-editor syntax That was a big putback, modifying 517 files, adding 18 new files, and removing 110 old ones. I would have done this putback anyway, as the work was already done, and the benefits of human readable syntax are obvious. However, among the justifications listed in CR 6916796 was this We anticipate adding additional features to the new mapfile language that will be applicable to ON, and which will require all sharable object mapfiles to use the new syntax. I never explained what those additional features were, and no one asked. It was premature to say so, but this was a reference to stub objects. By that point, I had already put together a working prototype link-editor with the necessary support for stub objects. I was pleased to find that building stubs was indeed very fast. On my desktop system (Ultra 24), an amd64 stub for libc can can be built in a fraction of a second: % ptime ld -64 -z stub -o stubs/libc.so.1 -G -hlibc.so.1 \ -ztext -zdefs -Bdirect ... real 0.019708910 user 0.010101680 sys 0.008528431 In order to go from prototype to integrated link-editor feature, I knew that I would need to prove that stub objects were valuable. And to do that, I knew that I'd have to switch the Solaris ON consolidation to use stub objects and evaluate the outcome. And in order to do that experiment, ON would first need to be converted to version 2 mapfiles. Sub-mission accomplished. Normally when you design a new feature, you can devise reasonably small tests to show it works, and then deploy it incrementally, letting it prove its value as it goes. The entire point of stub objects however was to demonstrate that they could be successfully applied to an extremely large and complex code base, and specifically to solve the Solaris build issues detailed above. There was no way to finesse the matter — in order to move ahead, I would have to successfully use stub objects to build the entire ON consolidation and demonstrate their value. In software, the need to boil the ocean can often be a warning sign that things are trending in the wrong direction. Conversely, sometimes progress demands that you build something large and new all at once. A big win, or a big loss — sometimes all you can do is try it and see what happens. And so, I spent some time staring at ON makefiles trying to get a handle on how things work, and how they'd have to change. It's a big and messy world, full of complex interactions, unspecified dependencies, special cases, and knowledge of arcane makefile features... ...and so, I backed away, put it down for a few months and did other work... ...until the fall, when I felt like it was time to stop thinking and pondering (some would say stalling) and get on with it. Without stubs, the following gives a simplified high level view of how Solaris is built: An initially empty directory known as the proto, and referenced via the ROOT makefile macro is established to receive the files that make up the Solaris distribution. A top level setup rule creates the proto area, and performs operations needed to initialize the workspace so that the main build operations can be launched, such as copying needed header files into the proto area. Parallel builds are launched to build the kernel (usr/src/uts), libraries (usr/src/lib), and commands. The install makefile target builds each item and delivers a copy to the proto area. All libraries and executables link against the objects previously installed in the proto, implying the need to synchronize the order in which things are built. Subsequent passes run lint, and do packaging. Given this structure, the additions to use stub objects are: A new second proto area is established, known as the stub proto and referenced via the STUBROOT makefile macro. The stub proto has the same structure as the real proto, but is used to hold stub objects. All files in the real proto are delivered as part of the Solaris product. In contrast, the stub proto is used to build the product, and then thrown away. A new target is added to library Makefiles called stub. This rule builds the stub objects. The ld command is designed so that you can build a stub object using the same ld command line you'd use to build the real object, with the addition of a single -z stub option. This means that the makefile rules for building the stub objects are very similar to those used to build the real objects, and many existing makefile definitions can be shared between them. A new target is added to the Makefiles called stubinstall which delivers the stub objects built by the stub rule into the stub proto. These rules reuse much of existing plumbing used by the existing install rule. The setup rule runs stubinstall over the entire lib subtree as part of its initialization. All libraries and executables link against the objects in the stub proto rather than the main proto, and can therefore be built in parallel without any synchronization. There was no small way to try this that would yield meaningful results. I would have to take a leap of faith and edit approximately 1850 makefiles and 300 mapfiles first, trusting that it would all work out. Once the editing was done, I'd type make and see what happened. This took about 6 weeks to do, and there were many dark days when I'd question the entire project, or struggle to understand some of the many twisted and complex situations I'd uncover in the makefiles. I even found a couple of new issues that required changes to the new stub object related code I'd added to ld. With a substantial amount of encouragement and help from some key people in the Solaris group, I eventually got the editing done and stub objects for the entire workspace built. I found that my desktop system could build all the stub objects in the workspace in roughly a minute. This was great news, as it meant that use of the feature is effectively free — no one was likely to notice or care about the cost of building them. After another week of typing make, fixing whatever failed, and doing it again, I succeeded in getting a complete build! The next step was to remove all of the make rules and .WAIT statements dedicated to controlling the order in which libraries under usr/src/lib are built. This came together pretty quickly, and after a few more speed bumps, I had a workspace that built cleanly and looked like something you might actually be able to integrate someday. This was a significant milestone, but there was still much left to do. I turned to doing full nightly builds. Every type of build (open, closed, OpenSolaris, export, domestic) had to be tried. Each type failed in a new and unique way, requiring some thinking and rework. As things came together, I became aware of things that could have been done better, simpler, or cleaner, and those things also required some rethinking, the seeking of wisdom from others, and some rework. After another couple of weeks, it was in close to final form. My focus turned towards the end game and integration. This was a huge workspace, and needed to go back soon, before changes in the gate would made merging increasingly difficult. At this point, I knew that the stub objects had greatly simplified the makefile logic and uncovered a number of race conditions, some of which had been there for years. I assumed that the builds were faster too, so I did some builds intended to quantify the speedup in build time that resulted from this approach. It had never occurred to me that there might not be one. And so, I was very surprised to find that the wall clock build times for a stock ON workspace were essentially identical to the times for my stub library enabled version! This is why it is important to always measure, and not just to assume. One can tell from first principles, based on all those removed dependency rules in the library makefile, that the stub object version of ON gives dmake considerably more opportunities to overlap library construction. Some hypothesis were proposed, and shot down: Could we have disabled dmakes parallel feature? No, a quick check showed things being build in parallel. It was suggested that we might be I/O bound, and so, the threads would be mostly idle. That's a plausible explanation, but system stats didn't really support it. Plus, the timing between the stub and non-stub cases were just too suspiciously identical. Are our machines already handling as much parallelism as they are capable of, and unable to exploit these additional opportunities? Once again, we didn't see the evidence to back this up. Eventually, a more plausible and obvious reason emerged: We build the libraries and commands (usr/src/lib, usr/src/cmd) in parallel with the kernel (usr/src/uts). The kernel is the long leg in that race, and so, wall clock measurements of build time are essentially showing how long it takes to build uts. Although it would have been nice to post a huge speedup immediately, we can take solace in knowing that stub objects simplify the makefiles and reduce the possibility of race conditions. The next step in reducing build time should be to find ways to reduce or overlap the uts part of the builds. When that leg of the build becomes shorter, then the increased parallelism in the libs and commands will pay additional dividends. Until then, we'll just have to settle for simpler and more robust. And so, I integrated the link-editor support for creating stub objects into snv_153 (November 2010) with 6993877 ld should produce stub objects PSARC/2010/397 ELF Stub Objects followed by the work to convert the ON consolidation in snv_161 (February 2011) with 7009826 OSnet should use stub objects 4631488 lib/Makefile is too patient: .WAITs should be reduced This was a huge putback, with 2108 modified files, 8 new files, and 2 removed files. Due to the size, I was allowed a window after snv_160 closed in which to do the putback. It went pretty smoothly for something this big, a few more preexisting race conditions would be discovered and addressed over the next few weeks, and things have been quiet since then. Conclusions and Looking Forward Solaris has been built with stub objects since February. The fact that developers no longer specify the order in which libraries are built has been a big success, and we've eliminated an entire class of build error. That's not to say that there are no build races left in the ON makefiles, but we've taken a substantial bite out of the problem while generally simplifying and improving things. The introduction of a stub proto area has also opened some interesting new possibilities for other build improvements. As this article has become quite long, and as those uses do not involve stub objects, I will defer that discussion to a future article.

Read the article
Towards Database Continuous Delivery – What Next after Continuous Integration? A Checklist

- by Ben Rees

.dbd-banner p{ font-size:0.75em; padding:0 0 10px; margin:0 } .dbd-banner p span{ color:#675C6D; } .dbd-banner p:last-child{ padding:0; } @media ALL and (max-width:640px){ .dbd-banner{ background:#f0f0f0; padding:5px; color:#333; margin-top: 5px; } } -- Database delivery patterns & practices STAGE 4 AUTOMATED DEPLOYMENT If you’ve been fortunate enough to get to the stage where you’ve implemented some sort of continuous integration process for your database updates, then hopefully you’re seeing the benefits of that investment – constant feedback on changes your devs are making, advanced warning of data loss (prior to the production release on Saturday night!), a nice suite of automated tests to check business logic, so you know it’s going to work when it goes live, and so on. But what next? What can you do to improve your delivery process further, moving towards a full continuous delivery process for your database? In this article I describe some of the issues you might need to tackle on the next stage of this journey, and how to plan to overcome those obstacles before they appear. Our Database Delivery Learning Program consists of four stages, really three – source controlling a database, running continuous integration processes, then how to set up automated deployment (the middle stage is split in two – basic and advanced continuous integration, making four stages in total). If you’ve managed to work through the first three of these stages – source control, basic, then advanced CI, then you should have a solid change management process set up where, every time one of your team checks in a change to your database (whether schema or static reference data), this change gets fully tested automatically by your CI server. But this is only part of the story. Great, we know that our updates work, that the upgrade process works, that the upgrade isn’t going to wipe our 4Tb of production data with a single DROP TABLE. But – how do you get this (fully tested) release live? Continuous delivery means being always ready to release your software at any point in time. There’s a significant gap between your latest version being tested, and it being easily releasable. Just a quick note on terminology – there’s a nice piece here from Atlassian on the difference between continuous integration, continuous delivery and continuous deployment. This piece also gives a nice description of the benefits of continuous delivery. These benefits have been summed up by Jez Humble at Thoughtworks as: “Continuous delivery is a set of principles and practices to reduce the cost, time, and risk of delivering incremental changes to users” There’s another really useful piece here on Simple-Talk about the need for continuous delivery and how it applies to the database written by Phil Factor – specifically the extra needs and complexities of implementing a full CD solution for the database (compared to just implementing CD for, say, a web app). So, hopefully you’re convinced of moving on the the next stage! The next step after CI is to get some sort of automated deployment (or “release management”) process set up. But what should I do next? What do I need to plan and think about for getting my automated database deployment process set up? Can’t I just install one of the many release management tools available and hey presto, I’m ready! If only it were that simple. Below I list some of the areas that it’s worth spending a little time on, where a little planning and prep could go a long way. It’s also worth pointing out, that this should really be an evolving process. Depending on your starting point of course, it can be a long journey from your current setup to a full continuous delivery pipeline. If you’ve got a CI mechanism in place, you’re certainly a long way down that path. Nevertheless, we’d recommend evolving your process incrementally. Pages 157 and 129-141 of the book on Continuous Delivery (by Jez Humble and Dave Farley) have some great guidance on building up a pipeline incrementally: http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912 For now, in this post, we’ll look at the following areas for your checklist: You and Your Team Environments The Deployment Process Rollback and Recovery Development Practices You and Your Team It’s a cliché in the DevOps community that “It’s not all about processes and tools, really it’s all about a culture”. As stated in this DevOps report from Puppet Labs: “DevOps processes and tooling contribute to high performance, but these practices alone aren’t enough to achieve organizational success. The most common barriers to DevOps adoption are cultural: lack of manager or team buy-in, or the value of DevOps isn’t understood outside of a specific group”. Like most clichés, there’s truth in there – if you want to set up a database continuous delivery process, you need to get your boss, your department, your company (if relevant) onside. Why? Because it’s an investment with the benefits coming way down the line. But the benefits are huge – for HP, in the book A Practical Approach to Large-Scale Agile Development: How HP Transformed LaserJet FutureSmart Firmware, these are summarized as: -2008 to present: overall development costs reduced by 40% -Number of programs under development increased by 140% -Development costs per program down 78% -Firmware resources now driving innovation increased by a factor of 8 (from 5% working on new features to 40% But what does this mean? It means that, when moving to the next stage, to make that extra investment in automating your deployment process, it helps a lot if everyone is convinced that this is a good thing. That they understand the benefits of automated deployment and are willing to make the effort to transform to a new way of working. Incidentally, if you’re ever struggling to convince someone of the value I’d strongly recommend just buying them a copy of this book – a great read, and a very practical guide to how it can really work at a large org. I’ve spoken to many customers who have implemented database CI who describe their deployment process as “The point where automation breaks down. Up to that point, the CI process runs, untouched by human hand, but as soon as that’s finished we revert to manual.” This deployment process can involve, for example, a DBA manually comparing an environment (say, QA) to production, creating the upgrade scripts, reading through them, checking them against an Excel document emailed to him/her the night before, turning to page 29 in his/her notebook to double-check how replication is switched off and on for deployments, and so on and so on. Painful, error-prone and lengthy. But the point is, if this is something like your deployment process, telling your DBA “We’re changing everything you do and your toolset next week, to automate most of your role – that’s okay isn’t it?” isn’t likely to go down well. There’s some work here to bring him/her onside – to explain what you’re doing, why there will still be control of the deployment process and so on. Or of course, if you’re the DBA looking after this process, you have to do a similar job in reverse. You may have researched and worked out how you’d like to change your methodology to start automating your painful release process, but do the dev team know this? What if they have to start producing different artifacts for you? Will they be happy with this? Worth talking to them, to find out. As well as talking to your DBA/dev team, the other group to get involved before implementation is your manager. And possibly your manager’s manager too. As mentioned, unless there’s buy-in “from the top”, you’re going to hit problems when the implementation starts to get rocky (and what tool/process implementations don’t get rocky?!). You need to have support from someone senior in your organisation – someone you can turn to when you need help with a delayed implementation, lack of resources or lack of progress. Actions: Get your DBA involved (or whoever looks after live deployments) and discuss what you’re planning to do or, if you’re the DBA yourself, get the dev team up-to-speed with your plans, Get your boss involved too and make sure he/she is bought in to the investment. Environments Where are you going to deploy to? And really this question is – what environments do you want set up for your deployment pipeline? Assume everyone has “Production”, but do you have a QA environment? Dedicated development environments for each dev? Proper pre-production? I’ve seen every setup under the sun, and there is often a big difference between “What we want, to do continuous delivery properly” and “What we’re currently stuck with”. Some of these differences are: What we want What we’ve got Each developer with their own dedicated database environment A single shared “development” environment, used by everyone at once An Integration box used to test the integration of all check-ins via the CI process, along with a full suite of unit-tests running on that machine In fact if you have a CI process running, you’re likely to have some sort of integration server running (even if you don’t call it that!). Whether you have a full suite of unit tests running is a different question… Separate QA environment used explicitly for manual testing prior to release “We just test on the dev environments, or maybe pre-production” A proper pre-production (or “staging”) box that matches production as closely as possible Hopefully a pre-production box of some sort. But does it match production closely!? A production environment reproducible from source control A production box which has drifted significantly from anything in source control The big question is – how much time and effort are you going to invest in fixing these issues? In reality this just involves figuring out which new databases you’re going to create and where they’ll be hosted – VMs? Cloud-based? What about size/data issues – what data are you going to include on dev environments? Does it need to be masked to protect access to production data? And often the amount of work here really depends on whether you’re working on a new, greenfield project, or trying to update an existing, brownfield application. There’s a world if difference between starting from scratch with 4 or 5 clean environments (reproducible from source control of course!), and trying to re-purpose and tweak a set of existing databases, with all of their surrounding processes and quirks. But for a proper release management process, ideally you have: Dedicated development databases, An Integration server used for testing continuous integration and running unit tests. [NB: This is the point at which deployments are automatic, without human intervention. Each deployment after this point is a one-click (but human) action], QA – QA engineers use a one-click deployment process to automatically* deploy chosen releases to QA for testing, Pre-production. The environment you use to test the production release process, Production. * A note on the use of the word “automatic” – when carrying out automated deployments this does not mean that the deployment is happening without human intervention (i.e. that something is just deploying over and over again). It means that the process of carrying out the deployment is automatic in that it’s not a person manually running through a checklist or set of actions. The deployment still requires a single-click from a user. Actions: Get your environments set up and ready, Set access permissions appropriately, Make sure everyone understands what the environments will be used for (it’s not a “free-for-all” with all environments to be accessed, played with and changed by development). The Deployment Process As described earlier, most existing database deployment processes are pretty manual. The following is a description of a process we hear very often when we ask customers “How do your database changes get live? How does your manual process work?” Check pre-production matches production (use a schema compare tool, like SQL Compare). Sometimes done by taking a backup from production and restoring in to pre-prod, Again, use a schema compare tool to find the differences between the latest version of the database ready to go live (i.e. what the team have been developing). This generates a script, User (generally, the DBA), reviews the script. This often involves manually checking updates against a spreadsheet or similar, Run the script on pre-production, and check there are no errors (i.e. it upgrades pre-production to what you hoped), If all working, run the script on production.* * this assumes there’s no problem with production drifting away from pre-production in the interim time period (i.e. someone has hacked something in to the production box without going through the proper change management process). This difference could undermine the validity of your pre-production deployment test. Red Gate is currently working on a free tool to detect this problem – sign up here at www.sqllighthouse.com, if you’re interested in testing early versions. There are several variations on this process – some better, some much worse! How do you automate this? In particular, step 3 – surely you can’t automate a DBA checking through a script, that everything is in order!? The key point here is to plan what you want in your new deployment process. There are so many options. At one extreme, pure continuous deployment – whenever a dev checks something in to source control, the CI process runs (including extensive and thorough testing!), before the deployment process keys in and automatically deploys that change to the live box. Not for the faint hearted – and really not something we recommend. At the other extreme, you might be more comfortable with a semi-automated process – the pre-production/production matching process is automated (with an error thrown if these environments don’t match), followed by a manual intervention, allowing for script approval by the DBA. One he/she clicks “Okay, I’m happy for that to go live”, the latter stages automatically take the script through to live. And anything in between of course – and other variations. But we’d strongly recommended sitting down with a whiteboard and your team, and spending a couple of hours mapping out “What do we do now?”, “What do we actually want?”, “What will satisfy our needs for continuous delivery, but still maintaining some sort of continuous control over the process?” NB: Most of what we’re discussing here is about production deployments. It’s important to note that you will also need to map out a deployment process for earlier environments (for example QA). However, these are likely to be less onerous, and many customers opt for a much more automated process for these boxes. Actions: Sit down with your team and a whiteboard, and draw out the answers to the questions above for your production deployments – “What do we do now?”, “What do we actually want?”, “What will satisfy our needs for continuous delivery, but still maintaining some sort of continuous control over the process?” Repeat for earlier environments (QA and so on). Rollback and Recovery If only every deployment went according to plan! Unfortunately they don’t – and when things go wrong, you need a rollback or recovery plan for what you’re going to do in that situation. Once you move in to a more automated database deployment process, you’re far more likely to be deploying more frequently than before. No longer once every 6 months, maybe now once per week, or even daily. Hence the need for a quick rollback or recovery process becomes paramount, and should be planned for. NB: These are mainly scenarios for handling rollbacks after the transaction has been committed. If a failure is detected during the transaction, the whole transaction can just be rolled back, no problem. There are various options, which we’ll explore in subsequent articles, things like: Immediately restore from backup, Have a pre-tested rollback script (remembering that really this is a “roll-forward” script – there’s not really such a thing as a rollback script for a database!) Have fallback environments – for example, using a blue-green deployment pattern. Different options have pros and cons – some are easier to set up, some require more investment in infrastructure; and of course some work better than others (the key issue with using backups, is loss of the interim transaction data that has been added between the failed deployment and the restore). The best mechanism will be primarily dependent on how your application works and how much you need a cast-iron failsafe mechanism. Actions: Work out an appropriate rollback strategy based on how your application and business works, your appetite for investment and requirements for a completely failsafe process. Development Practices This is perhaps the more difficult area for people to tackle. The process by which you can deploy database updates is actually intrinsically linked with the patterns and practices used to develop that database and linked application. So you need to decide whether you want to implement some changes to the way your developers actually develop the database (particularly schema changes) to make the deployment process easier. A good example is the pattern “Branch by abstraction”. Explained nicely here, by Martin Fowler, this is a process that can be used to make significant database changes (e.g. splitting a table) in a step-wise manner so that you can always roll back, without data loss – by making incremental updates to the database backward compatible. Slides 103-108 of the following slidedeck, from Niek Bartholomeus explain the process: https://speakerdeck.com/niekbartho/orchestration-in-meatspace As these slides show, by making a significant schema change in multiple steps – where each step can be rolled back without any loss of new data – this affords the release team the opportunity to have zero-downtime deployments with considerably less stress (because if an increment goes wrong, they can roll back easily). There are plenty more great patterns that can be implemented – the book Refactoring Databases, by Scott Ambler and Pramod Sadalage is a great read, if this is a direction you want to go in: http://www.amazon.com/Refactoring-Databases-Evolutionary-paperback-Addison-Wesley/dp/0321774515 But the question is – how much of this investment are you willing to make? How often are you making significant schema changes that would require these best practices? Again, there’s a difference here between migrating old projects and starting afresh – with the latter it’s much easier to instigate best practice from the start. Actions: For your business, work out how far down the path you want to go, amending your database development patterns to “best practice”. It’s a trade-off between implementing quality processes, and the necessity to do so (depending on how often you make complex changes). Socialise these changes with your development group. No-one likes having “best practice” changes imposed on them, so good to introduce these ideas and the rationale behind them early. Summary The next stages of implementing a continuous delivery pipeline for your database changes (once you have CI up and running) require a little pre-planning, if you want to get the most out of the work, and for the implementation to go smoothly. We’ve covered some of the checklist of areas to consider – mainly in the areas of “Getting the team ready for the changes that are coming” and “Planning our your pipeline, environments, patterns and practices for development”, though there will be more detail, depending on where you’re coming from – and where you want to get to. This article is part of our database delivery patterns & practices series on Simple Talk. Find more articles for version control, automated testing, continuous integration & deployment.

Read the article
Informed TDD – Kata “To Roman Numerals”

- by Ralf Westphal

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/05/28/informed-tdd-ndash-kata-ldquoto-roman-numeralsrdquo.aspxIn a comment on my article on what I call Informed TDD (ITDD) reader gustav asked how this approach would apply to the kata “To Roman Numerals”. And whether ITDD wasn´t a violation of TDD´s principle of leaving out “advanced topics like mocks”. I like to respond with this article to his questions. There´s more to say than fits into a commentary. Mocks and TDD I don´t see in how far TDD is avoiding or opposed to mocks. TDD and mocks are orthogonal. TDD is about pocess, mocks are about structure and costs. Maybe by moving forward in tiny red+green+refactor steps less need arises for mocks. But then… if the functionality you need to implement requires “expensive” resource access you can´t avoid using mocks. Because you don´t want to constantly run all your tests against the real resource. True, in ITDD mocks seem to be in almost inflationary use. That´s not what you usually see in TDD demonstrations. However, there´s a reason for that as I tried to explain. I don´t use mocks as proxies for “expensive” resource. Rather they are stand-ins for functionality not yet implemented. They allow me to get a test green on a high level of abstraction. That way I can move forward in a top-down fashion. But if you think of mocks as “advanced” or if you don´t want to use a tool like JustMock, then you don´t need to use mocks. You just need to stand the sight of red tests for a little longer ;-) Let me show you what I mean by that by doing a kata. ITDD for “To Roman Numerals” gustav asked for the kata “To Roman Numerals”. I won´t explain the requirements again. You can find descriptions and TDD demonstrations all over the internet, like this one from Corey Haines. Now here is, how I would do this kata differently. 1. Analyse A demonstration of TDD should never skip the analysis phase. It should be made explicit. The requirements should be formalized and acceptance test cases should be compiled. “Formalization” in this case to me means describing the API of the required functionality. “[D]esign a program to work with Roman numerals” like written in this “requirement document” is not enough to start software development. Coding should only begin, if the interface between the “system under development” and its context is clear. If this interface is not readily recognizable from the requirements, it has to be developed first. Exploration of interface alternatives might be in order. It might be necessary to show several interface mock-ups to the customer – even if that´s you fellow developer. Designing the interface is a task of it´s own. It should not be mixed with implementing the required functionality behind the interface. Unfortunately, though, this happens quite often in TDD demonstrations. TDD is used to explore the API and implement it at the same time. To me that´s a violation of the Single Responsibility Principle (SRP) which not only should hold for software functional units but also for tasks or activities. In the case of this kata the API fortunately is obvious. Just one function is needed: string ToRoman(int arabic). And it lives in a class ArabicRomanConversions. Now what about acceptance test cases? There are hardly any stated in the kata descriptions. Roman numerals are explained, but no specific test cases from the point of view of a customer. So I just “invent” some acceptance test cases by picking roman numerals from a wikipedia article. They are supposed to be just “typical examples” without special meaning. Given the acceptance test cases I then try to develop an understanding of the problem domain. I´ll spare you that. The domain is trivial and is explain in almost all kata descriptions. How roman numerals are built is not difficult to understand. What´s more difficult, though, might be to find an efficient solution to convert into them automatically. 2. Solve The usual TDD demonstration skips a solution finding phase. Like the interface exploration it´s mixed in with the implementation. But I don´t think this is how it should be done. I even think this is not how it really works for the people demonstrating TDD. They´re simplifying their true software development process because they want to show a streamlined TDD process. I doubt this is helping anybody. Before you code you better have a plan what to code. This does not mean you have to do “Big Design Up-Front”. It just means: Have a clear picture of the logical solution in your head before you start to build a physical solution (code). Evidently such a solution can only be as good as your understanding of the problem. If that´s limited your solution will be limited, too. Fortunately, in the case of this kata your understanding does not need to be limited. Thus the logical solution does not need to be limited or preliminary or tentative. That does not mean you need to know every line of code in advance. It just means you know the rough structure of your implementation beforehand. Because it should mirror the process described by the logical or conceptual solution. Here´s my solution approach: The arabic “encoding” of numbers represents them as an ordered set of powers of 10. Each digit is a factor to multiply a power of ten with. The “encoding” 123 is the short form for a set like this: {1*10^2, 2*10^1, 3*10^0}. And the number is the sum of the set members. The roman “encoding” is different. There is no base (like 10 for arabic numbers), there are just digits of different value, and they have to be written in descending order. The “encoding” XVI is short for [10, 5, 1]. And the number is still the sum of the members of this list. The roman “encoding” thus is simpler than the arabic. Each “digit” can be taken at face value. No multiplication with a base required. But what about IV which looks like a contradiction to the above rule? It is not – if you accept roman “digits” not to be limited to be single characters only. Usually I, V, X, L, C, D, M are viewed as “digits”, and IV, IX etc. are viewed as nuisances preventing a simple solution. All looks different, though, once IV, IX etc. are taken as “digits”. Then MCMLIV is just a sum: M+CM+L+IV which is 1000+900+50+4. Whereas before it would have been understood as M-C+M+L-I+V – which is more difficult because here some “digits” get subtracted. Here´s the list of roman “digits” with their values: {1, I}, {4, IV}, {5, V}, {9, IX}, {10, X}, {40, XL}, {50, L}, {90, XC}, {100, C}, {400, CD}, {500, D}, {900, CM}, {1000, M} Since I take IV, IX etc. as “digits” translating an arabic number becomes trivial. I just need to find the values of the roman “digits” making up the number, e.g. 1954 is made up of 1000, 900, 50, and 4. I call those “digits” factors. If I move from the highest factor (M=1000) to the lowest (I=1) then translation is a two phase process: Find all the factors Translate the factors found Compile the roman representation Translation is just a look-up. Finding, though, needs some calculation: Find the highest remaining factor fitting in the value Remember and subtract it from the value Repeat with remaining value and remaining factors Please note: This is just an algorithm. It´s not code, even though it might be close. Being so close to code in my solution approach is due to the triviality of the problem. In more realistic examples the conceptual solution would be on a higher level of abstraction. With this solution in hand I finally can do what TDD advocates: find and prioritize test cases. As I can see from the small process description above, there are two aspects to test: Test the translation Test the compilation Test finding the factors Testing the translation primarily means to check if the map of factors and digits is comprehensive. That´s simple, even though it might be tedious. Testing the compilation is trivial. Testing factor finding, though, is a tad more complicated. I can think of several steps: First check, if an arabic number equal to a factor is processed correctly (e.g. 1000=M). Then check if an arabic number consisting of two consecutive factors (e.g. 1900=[M,CM]) is processed correctly. Then check, if a number consisting of the same factor twice is processed correctly (e.g. 2000=[M,M]). Finally check, if an arabic number consisting of non-consecutive factors (e.g. 1400=[M,CD]) is processed correctly. I feel I can start an implementation now. If something becomes more complicated than expected I can slow down and repeat this process. 3. Implement First I write a test for the acceptance test cases. It´s red because there´s no implementation even of the API. That´s in conformance with “TDD lore”, I´d say: Next I implement the API: The acceptance test now is formally correct, but still red of course. This will not change even now that I zoom in. Because my goal is not to most quickly satisfy these tests, but to implement my solution in a stepwise manner. That I do by “faking” it: I just “assume” three functions to represent the transformation process of my solution: My hypothesis is that those three functions in conjunction produce correct results on the API-level. I just have to implement them correctly. That´s what I´m trying now – one by one. I start with a simple “detail function”: Translate(). And I start with all the test cases in the obvious equivalence partition: As you can see I dare to test a private method. Yes. That´s a white box test. But as you´ll see it won´t make my tests brittle. It serves a purpose right here and now: it lets me focus on getting one aspect of my solution right. Here´s the implementation to satisfy the test: It´s as simple as possible. Right how TDD wants me to do it: KISS. Now for the second equivalence partition: translating multiple factors. (It´a pattern: if you need to do something repeatedly separate the tests for doing it once and doing it multiple times.) In this partition I just need a single test case, I guess. Stepping up from a single translation to multiple translations is no rocket science: Usually I would have implemented the final code right away. Splitting it in two steps is just for “educational purposes” here. How small your implementation steps are is a matter of your programming competency. Some “see” the final code right away before their mental eye – others need to work their way towards it. Having two tests I find more important. Now for the next low hanging fruit: compilation. It´s even simpler than translation. A single test is enough, I guess. And normally I would not even have bothered to write that one, because the implementation is so simple. I don´t need to test .NET framework functionality. But again: if it serves the educational purpose… Finally the most complicated part of the solution: finding the factors. There are several equivalence partitions. But still I decide to write just a single test, since the structure of the test data is the same for all partitions: Again, I´m faking the implementation first: I focus on just the first test case. No looping yet. Faking lets me stay on a high level of abstraction. I can write down the implementation of the solution without bothering myself with details of how to actually accomplish the feat. That´s left for a drill down with a test of the fake function: There are two main equivalence partitions, I guess: either the first factor is appropriate or some next. The implementation seems easy. Both test cases are green. (Of course this only works on the premise that there´s always a matching factor. Which is the case since the smallest factor is 1.) And the first of the equivalence partitions on the higher level also is satisfied: Great, I can move on. Now for more than a single factor: Interestingly not just one test becomes green now, but all of them. Great! You might say, then I must have done not the simplest thing possible. And I would reply: I don´t care. I did the most obvious thing. But I also find this loop very simple. Even simpler than a recursion of which I had thought briefly during the problem solving phase. And by the way: Also the acceptance tests went green: Mission accomplished. At least functionality wise. Now I´ve to tidy up things a bit. TDD calls for refactoring. Not uch refactoring is needed, because I wrote the code in top-down fashion. I faked it until I made it. I endured red tests on higher levels while lower levels weren´t perfected yet. But this way I saved myself from refactoring tediousness. At the end, though, some refactoring is required. But maybe in a different way than you would expect. That´s why I rather call it “cleanup”. First I remove duplication. There are two places where factors are defined: in Translate() and in Find_factors(). So I factor the map out into a class constant. Which leads to a small conversion in Find_factors(): And now for the big cleanup: I remove all tests of private methods. They are scaffolding tests to me. They only have temporary value. They are brittle. Only acceptance tests need to remain. However, I carry over the single “digit” tests from Translate() to the acceptance test. I find them valuable to keep, since the other acceptance tests only exercise a subset of all roman “digits”. This then is my final test class: And this is the final production code: Test coverage as reported by NCrunch is 100%: Reflexion Is this the smallest possible code base for this kata? Sure not. You´ll find more concise solutions on the internet. But LOC are of relatively little concern – as long as I can understand the code quickly. So called “elegant” code, however, often is not easy to understand. The same goes for KISS code – especially if left unrefactored, as it is often the case. That´s why I progressed from requirements to final code the way I did. I first understood and solved the problem on a conceptual level. Then I implemented it top down according to my design. I also could have implemented it bottom-up, since I knew some bottom of the solution. That´s the leaves of the functional decomposition tree. Where things became fuzzy, since the design did not cover any more details as with Find_factors(), I repeated the process in the small, so to speak: fake some top level, endure red high level tests, while first solving a simpler problem. Using scaffolding tests (to be thrown away at the end) brought two advantages: Encapsulation of the implementation details was not compromised. Naturally private methods could stay private. I did not need to make them internal or public just to be able to test them. I was able to write focused tests for small aspects of the solution. No need to test everything through the solution root, the API. The bottom line thus for me is: Informed TDD produces cleaner code in a systematic way. It conforms to core principles of programming: Single Responsibility Principle and/or Separation of Concerns. Distinct roles in development – being a researcher, being an engineer, being a craftsman – are represented as different phases. First find what, what there is. Then devise a solution. Then code the solution, manifest the solution in code. Writing tests first is a good practice. But it should not be taken dogmatic. And above all it should not be overloaded with purposes. And finally: moving from top to bottom through a design produces refactored code right away. Clean code thus almost is inevitable – and not left to a refactoring step at the end which is skipped often for different reasons. PS: Yes, I have done this kata several times. But that has only an impact on the time needed for phases 1 and 2. I won´t skip them because of that. And there are no shortcuts during implementation because of that.

Read the article
ZFS for Database Log Files

- by user12620111

I've been troubled by drop outs in CPU usage in my application server, characterized by the CPUs suddenly going from close to 90% CPU busy to almost completely CPU idle for a few seconds. Here is an example of a drop out as shown by a snippet of vmstat data taken while the application server is under a heavy workload. # vmstat 1 kthr memory page disk faults cpu r b w swap free re mf pi po fr de sr s3 s4 s5 s6 in sy cs us sy id 1 0 0 130160176 116381952 0 16 0 0 0 0 0 0 0 0 0 207377 117715 203884 70 21 9 12 0 0 130160160 116381936 0 25 0 0 0 0 0 0 0 0 0 200413 117162 197250 70 20 9 11 0 0 130160176 116381920 0 16 0 0 0 0 0 0 1 0 0 203150 119365 200249 72 21 7 8 0 0 130160176 116377808 0 19 0 0 0 0 0 0 0 0 0 169826 96144 165194 56 17 27 0 0 0 130160176 116377800 0 16 0 0 0 0 0 0 0 0 1 10245 9376 9164 2 1 97 0 0 0 130160176 116377792 0 16 0 0 0 0 0 0 0 0 2 15742 12401 14784 4 1 95 0 0 0 130160176 116377776 2 16 0 0 0 0 0 0 1 0 0 19972 17703 19612 6 2 92 14 0 0 130160176 116377696 0 16 0 0 0 0 0 0 0 0 0 202794 116793 199807 71 21 8 9 0 0 130160160 116373584 0 30 0 0 0 0 0 0 18 0 0 203123 117857 198825 69 20 11 This behavior occurred consistently while the application server was processing synthetic transactions: HTTP requests from JMeter running on an external machine. I explored many theories trying to explain the drop outs, including: Unexpected JMeter behavior Network contention Java Garbage Collection Application Server thread pool problems Connection pool problems Database transaction processing Database I/O contention Graphing the CPU %idle led to a breakthrough: Several of the drop outs were 30 seconds apart. With that insight, I went digging through the data again and looking for other outliers that were 30 seconds apart. In the database server statistics, I found spikes in the iostat "asvc_t" (average response time of disk transactions, in milliseconds) for the disk drive that was being used for the database log files. Here is an example: extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 2053.6 0.0 8234.3 0.0 0.2 0.0 0.1 0 24 c3t60080E5...F4F6d0s0 0.0 2162.2 0.0 8652.8 0.0 0.3 0.0 0.1 0 28 c3t60080E5...F4F6d0s0 0.0 1102.5 0.0 10012.8 0.0 4.5 0.0 4.1 0 69 c3t60080E5...F4F6d0s0 0.0 74.0 0.0 7920.6 0.0 10.0 0.0 135.1 0 100 c3t60080E5...F4F6d0s0 0.0 568.7 0.0 6674.0 0.0 6.4 0.0 11.2 0 90 c3t60080E5...F4F6d0s0 0.0 1358.0 0.0 5456.0 0.0 0.6 0.0 0.4 0 55 c3t60080E5...F4F6d0s0 0.0 1314.3 0.0 5285.2 0.0 0.7 0.0 0.5 0 70 c3t60080E5...F4F6d0s0 Here is a little more information about my database configuration: The database and application server were running on two different SPARC servers. Storage for the database was on a storage array connected via 8 gigabit Fibre Channel Data storage and log file were on different physical disk drives Reliable low latency I/O is provided by battery backed NVRAM Highly available: Two Fibre Channel links accessed via MPxIO Two Mirrored cache controllers The log file physical disks were mirrored in the storage device Database log files on a ZFS Filesystem with cutting-edge technologies, such as copy-on-write and end-to-end checksumming Why would I be getting service time spikes in my high-end storage? First, I wanted to verify that the database log disk service time spikes aligned with the application server CPU drop outs, and they did: At first, I guessed that the disk service time spikes might be related to flushing the write through cache on the storage device, but I was unable to validate that theory. After searching the WWW for a while, I decided to try using a separate log device: # zpool add ZFS-db-41 log c3t60080E500017D55C000015C150A9F8A7d0 The ZFS log device is configured in a similar manner as described above: two physical disks mirrored in the storage array. This change to the database storage configuration eliminated the application server CPU drop outs: Here is the zpool configuration: # zpool status ZFS-db-41 pool: ZFS-db-41 state: ONLINE scan: none requested config: NAME STATE ZFS-db-41 ONLINE c3t60080E5...F4F6d0 ONLINE logs c3t60080E5...F8A7d0 ONLINE Now, the I/O spikes look like this: extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1053.5 0.0 4234.1 0.0 0.8 0.0 0.7 0 75 c3t60080E5...F8A7d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1131.8 0.0 4555.3 0.0 0.8 0.0 0.7 0 76 c3t60080E5...F8A7d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1167.6 0.0 4682.2 0.0 0.7 0.0 0.6 0 74 c3t60080E5...F8A7d0s0 0.0 162.2 0.0 19153.9 0.0 0.7 0.0 4.2 0 12 c3t60080E5...F4F6d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1247.2 0.0 4992.6 0.0 0.7 0.0 0.6 0 71 c3t60080E5...F8A7d0s0 0.0 41.0 0.0 70.0 0.0 0.1 0.0 1.6 0 2 c3t60080E5...F4F6d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1241.3 0.0 4989.3 0.0 0.8 0.0 0.6 0 75 c3t60080E5...F8A7d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1193.2 0.0 4772.9 0.0 0.7 0.0 0.6 0 71 c3t60080E5...F8A7d0s0 We can see the steady flow of 4k writes to the ZIL device from O_SYNC database log file writes. The spikes are from flushing the transaction group. Like almost all problems that I run into, once I thoroughly understand the problem, I find that other people have documented similar experiences. Thanks to all of you who have documented alternative approaches. Saved for another day: now that the problem is obvious, I should try "zfs:zfs_immediate_write_sz" as recommended in the ZFS Evil Tuning Guide. References: The ZFS Intent Log Solaris ZFS, Synchronous Writes and the ZIL Explained ZFS Evil Tuning Guide: Cache Flushes ZFS Evil Tuning Guide: Tuning ZFS for Database Performance

Read the article
Why is iTunes starting and stopping play randomly, and how do I stop it?

- by Chris R

Since yesterday morning my copy of iTunes has been starting and stopping randomly. If iTunes is not running, then it opens and sometimes begins playing, other times sits idle. Eventually, after a random interval it will begin playing a song, and then stop, and so on... Needless to say, it's driving me mad. (Mac OSX, 10.6.3, on a new-ish (< 1 year old) 24" iMac) I've made five changes to my system that may or may not be connected to this: My office phone was replaced with a Linksys IP Phone, which necessitated a change to my networking; where previously my Mac was connected directly to the office network port, now it is connected through the phone. My network connection now uses auto link detection in lieu of forcing 100Mbit I unpaired my bluetooth headset. I removed the USB audio device associated with another headset. I upgraded to Safari 5. I don't use it as a primary browser, but it's often open to run web apps that I'm developing. All of these things happened in pretty close proximity to each other, so one or more of them may be the culprit. One other thing that may or may not be related; for some reason my built-in microphone is no longer picking up audio. It seems like this might be connected to the iTunes issue, because it happened around the same time. In terms of things that I've tried in order to solve this, I'm at a bit of a loss. I followed the instructions at http://developer.apple.com/mac/library/technotes/tn2004/tn2124.html#SECLAUNCHDLOGGING to enable detailed launchd logging to see if I could track down which process was asking iTunes to open (when it's not already open) but I wasn't able to make heads or tails of the output. I'm not even sure if I'm looking in the right place, to be honest; it actually acts like something is activating the application with AppleScript, but I have no processes running that are doing that, as far as I know. I'm running a few apps that have iTunes integration: Adium, iChat with Chax, Quicksilver. None of these have been changed lately, so I consider them low risks of causing this, but it's not impossible. Moreover, I'm not using any of those features intentionally. This is a snippet of launchd debug logging from around the time it just launched: 10-06-09 9:14:29 AM com.apple.launchd[1] Dispatching kevent... 10-06-09 9:14:29 AM com.apple.launchd[1] KEVENT[0]: udata = 0x10002b230 data = 0x30 ident = 5 filter = EVFILT_READ flags = EV_ADD|EV_RECEIPT fflags = 0x0 10-06-09 9:14:29 AM com.apple.launchd[1] Dispatching kevent... 10-06-09 9:14:29 AM com.apple.launchd[1] KEVENT[0]: udata = 0x100802000 data = 0x0 ident = 26 filter = EVFILT_PROC flags = EV_ADD|EV_RECEIPT|EV_CLEAR fflags = NOTE_FORK 10-06-09 9:14:29 AM com.apple.launchd[1] (com.apple.coreservicesd[26]) Dispatching kevent callback. 10-06-09 9:14:29 AM com.apple.launchd[1] (com.apple.coreservicesd[26]) EVFILT_PROC event for job: 10-06-09 9:14:29 AM com.apple.launchd[1] KEVENT[0]: udata = 0x1004076f0 data = 0x0 ident = 26 filter = EVFILT_PROC flags = EV_ADD|EV_RECEIPT|EV_CLEAR fflags = NOTE_FORK 10-06-09 9:14:29 AM com.apple.launchd[1] (com.apple.coreservicesd[26]) fork()ed 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave) Conceived 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) Created PID 22197 anonymously by PPID 26 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) Looking up per user launchd for UID: 0 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) Per user launchd job found for UID: 505 10-06-09 9:14:29 AM com.apple.launchd[1] System: Looking up service com.apple.system.notification_center 10-06-09 9:14:29 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.system.notification_center 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) Looking up per user launchd for UID: 0 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) Per user launchd job found for UID: 505 10-06-09 9:14:29 AM com.apple.launchd[1] System: Looking up service com.apple.system.DirectoryService.libinfo_v1 10-06-09 9:14:29 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.system.DirectoryService.libinfo_v1 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) Looking up per user launchd for UID: 0 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) Per user launchd job found for UID: 505 10-06-09 9:14:29 AM com.apple.launchd[1] System: Looking up service com.apple.system.DirectoryService.membership_v1 10-06-09 9:14:29 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.system.DirectoryService.membership_v1 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) Looking up per user launchd for UID: 0 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) Per user launchd job found for UID: 505 10-06-09 9:14:29 AM com.apple.launchd[1] System: Looking up service com.apple.CoreServices.coreservicesd 10-06-09 9:14:29 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.CoreServices.coreservicesd 10-06-09 9:14:29 AM com.apple.launchd[1] Dispatching kevent... 10-06-09 9:14:29 AM com.apple.launchd[1] KEVENT[0]: udata = 0x100802000 data = 0x0 ident = 22197 filter = EVFILT_PROC flags = EV_ADD|EV_RECEIPT|EV_CLEAR fflags = NOTE_EXIT 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) Dispatching kevent callback. 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) EVFILT_PROC event for job: 10-06-09 9:14:29 AM com.apple.launchd[1] KEVENT[0]: udata = 0x100401720 data = 0x0 ident = 22197 filter = EVFILT_PROC flags = EV_ADD|EV_RECEIPT|EV_CLEAR fflags = NOTE_EXIT 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22197]) Reaping 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave) Total rusage: utime 0.000000 stime 0.000000 maxrss 0 ixrss 0 idrss 0 isrss 0 minflt 0 majflt 0 nswap 0 inblock 0 oublock 0 msgsnd 0 msgrcv 0 nsignals 0 nvcsw 0 nivcsw 0 10-06-09 9:14:29 AM com.apple.launchd[1] (0x100401720.anonymous.lssave) Removed 10-06-09 9:14:30 AM com.apple.launchd[1] Dispatching kevent... 10-06-09 9:14:30 AM com.apple.launchd[1] KEVENT[0]: udata = 0x100802000 data = 0x0 ident = 22197 filter = EVFILT_PROC flags = EV_ADD|EV_RECEIPT|EV_CLEAR|EV_EOF|EV_ONESHOT fflags = NOTE_REAP 10-06-09 9:14:32 AM com.apple.launchd[1] Dispatching kevent... 10-06-09 9:14:32 AM com.apple.launchd[1] KEVENT[0]: udata = 0x10002b230 data = 0x30 ident = 5 filter = EVFILT_READ flags = EV_ADD|EV_RECEIPT fflags = 0x0 10-06-09 9:14:33 AM com.apple.launchd[1] Dispatching kevent... 10-06-09 9:14:33 AM com.apple.launchd[1] KEVENT[0]: udata = 0x100802000 data = 0x0 ident = 143 filter = EVFILT_PROC flags = EV_ADD|EV_RECEIPT|EV_CLEAR fflags = NOTE_FORK 10-06-09 9:14:33 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Dispatching kevent callback. 10-06-09 9:14:33 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) EVFILT_PROC event for job: 10-06-09 9:14:33 AM com.apple.launchd[1] KEVENT[0]: udata = 0x10041e9a0 data = 0x0 ident = 143 filter = EVFILT_PROC flags = EV_ADD|EV_RECEIPT|EV_CLEAR fflags = NOTE_FORK 10-06-09 9:14:33 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) fork()ed 10-06-09 9:14:33 AM com.apple.launchd[1] System: Looking up service com.apple.distributed_notifications.2 10-06-09 9:14:33 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.distributed_notifications.2 10-06-09 9:14:33 AM com.apple.launchd[1] System: Looking up service com.apple.system.notification_center 10-06-09 9:14:33 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.system.notification_center 10-06-09 9:14:33 AM com.apple.launchd[1] System: Looking up service com.apple.system.DirectoryService.libinfo_v1 10-06-09 9:14:33 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.system.DirectoryService.libinfo_v1 10-06-09 9:14:33 AM com.apple.launchd[1] System: Looking up service com.apple.system.DirectoryService.membership_v1 10-06-09 9:14:33 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.system.DirectoryService.membership_v1 10-06-09 9:14:33 AM com.apple.launchd[1] System: Looking up service com.apple.CoreServices.coreservicesd 10-06-09 9:14:33 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.CoreServices.coreservicesd 10-06-09 9:14:33 AM com.apple.launchd[1] System: Looking up service com.apple.SystemConfiguration.configd 10-06-09 9:14:33 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.SystemConfiguration.configd 10-06-09 9:14:33 AM com.apple.launchd[1] System: Looking up service com.apple.audio.coreaudiod 10-06-09 9:14:33 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.audio.coreaudiod 10-06-09 9:14:34 AM com.apple.launchd[1] System: Looking up service com.apple.system.logger 10-06-09 9:14:34 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.system.logger 10-06-09 9:14:35 AM com.apple.launchd[1] Dispatching kevent... 10-06-09 9:14:35 AM com.apple.launchd[1] KEVENT[0]: udata = 0x10002b230 data = 0x30 ident = 5 filter = EVFILT_READ flags = EV_ADD|EV_RECEIPT fflags = 0x0 10-06-09 9:14:35 AM com.apple.launchd[1] System: Looking up service com.apple.DiskArbitration.diskarbitrationd 10-06-09 9:14:35 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.DiskArbitration.diskarbitrationd 10-06-09 9:14:35 AM com.apple.launchd[1] System: Looking up service com.apple.system.logger 10-06-09 9:14:35 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.system.logger 10-06-09 9:14:36 AM com.apple.launchd[1] System: Looking up service com.apple.FSEvents 10-06-09 9:14:36 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.FSEvents 10-06-09 9:14:36 AM com.apple.launchd[1] System: Looking up service com.apple.SystemConfiguration.configd 10-06-09 9:14:36 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.SystemConfiguration.configd 10-06-09 9:14:38 AM com.apple.launchd[1] Dispatching kevent... 10-06-09 9:14:38 AM com.apple.launchd[1] KEVENT[0]: udata = 0x10002b230 data = 0x30 ident = 5 filter = EVFILT_READ flags = EV_ADD|EV_RECEIPT fflags = 0x0 10-06-09 9:14:39 AM com.apple.launchd[1] Dispatching kevent... 10-06-09 9:14:39 AM com.apple.launchd[1] KEVENT[0]: udata = 0x100802000 data = 0x0 ident = 26 filter = EVFILT_PROC flags = EV_ADD|EV_RECEIPT|EV_CLEAR fflags = NOTE_FORK 10-06-09 9:14:39 AM com.apple.launchd[1] (com.apple.coreservicesd[26]) Dispatching kevent callback. 10-06-09 9:14:39 AM com.apple.launchd[1] (com.apple.coreservicesd[26]) EVFILT_PROC event for job: 10-06-09 9:14:39 AM com.apple.launchd[1] KEVENT[0]: udata = 0x1004076f0 data = 0x0 ident = 26 filter = EVFILT_PROC flags = EV_ADD|EV_RECEIPT|EV_CLEAR fflags = NOTE_FORK 10-06-09 9:14:39 AM com.apple.launchd[1] (com.apple.coreservicesd[26]) fork()ed 10-06-09 9:14:39 AM com.apple.launchd[1] (0x100401720.anonymous.lssave) Conceived 10-06-09 9:14:39 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22211]) Created PID 22211 anonymously by PPID 26 10-06-09 9:14:39 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22211]) Looking up per user launchd for UID: 0 10-06-09 9:14:39 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22211]) Per user launchd job found for UID: 505 10-06-09 9:14:39 AM com.apple.launchd[1] System: Looking up service com.apple.system.notification_center 10-06-09 9:14:39 AM com.apple.launchd[1] (com.apple.launchd.peruser.505[143]) Mach service lookup: com.apple.system.notification_center 10-06-09 9:14:39 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22211]) Looking up per user launchd for UID: 0 10-06-09 9:14:39 AM com.apple.launchd[1] (0x100401720.anonymous.lssave[22211]) Per user launchd job found for UID: 505 10-06-09 9:14:39 AM com.apple.launchd[1] System: Looking up service com.apple.system.DirectoryService.libinfo_v1

Read the article
FreeBSD performance tuning. Sysctls, loader.conf, kernel.

- by SaveTheRbtz

I wanted to share knowledge of tuning FreeBSD via sysctls, so i'm posting them with comments. Based on Igor Sysoev (author of nginx) presentation about FreeBSD tuning up to 100,000-200,000 active connections. Sysctls are for 7.x FreeBSD. Since 7.2 amd64 some of them are tuned well by default. Prior 7.0 some of them are boot only (set via /boot/loader.conf) or does not exist at all. Highload web server sysctls: # Max. backlog size kern.ipc.somaxconn=4096 # Shared memory // 7.2+ can use shared memory > 2Gb kern.ipc.shmmax=2147483648 # Sockets kern.ipc.maxsockets=204800 # Do not use lager sockbufs on 8.0 # ( http://old.nabble.com/Significant-performance-regression-for-increased-maxsockbuf-on-8.0-RELEASE-tt26745981.html#a26745981 ) kern.ipc.maxsockbuf=262144 # Recive clusters (on amd64 7.2+ 65k is default) # For such high value vm.kmem_size must be increased to 3G #kern.ipc.nmbclusters=229376 # Jumbo pagesize(4k/8k) clusters # Used as general packet storage for jumbo frames # can be monitored via `netstat -m` #kern.ipc.nmbjumbop=192000 # Jumbo 9k/16k clusters # If you are using them #kern.ipc.nmbjumbo9=24000 #kern.ipc.nmbjumbo16=10240 # Every socket is a file, so increase them kern.maxfiles=204800 kern.maxfilesperproc=200000 kern.maxvnodes=200000 # Turn off receive autotuning #net.inet.tcp.recvbuf_auto=0 # Small receive space, only usable on http-server, on file server this # should be increased to 65535 or even more #net.inet.tcp.recvspace=8192 # Small send space is useful for http servers that serve small files # Autotuned since 7.x net.inet.tcp.sendspace=16384 # This should be enabled if you going to use big spaces (>64k) #net.inet.tcp.rfc1323=1 # Turn this off on highspeed, lossless connections (LAN 1Gbit+) #net.inet.tcp.delayed_ack=0 # This feature is useful if you are serving data over modems, Gigabit Ethernet, # or even high speed WAN links (or any other link with a high bandwidth delay product), # especially if you are also using window scaling or have configured a large send window. # You can try setting it to 0 on fileserver with 1GBit+ interfaces # Automatically disables on small RTT ( http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_subr.c?#rev1.237 ) #net.inet.tcp.inflight.enable=0 # Disable randomizing of ports to avoid false RST # Before usage check SA here www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf # (it's also says that port randomization auto-disables at some conn.rates, but I didn't tested it thou) #net.inet.ip.portrange.randomized=0 # Increase portrange # For outgoing connections only. Good for seed-boxes and ftp servers. net.inet.ip.portrange.first=1024 net.inet.ip.portrange.last=65535 # Security net.inet.ip.redirect=0 net.inet.ip.sourceroute=0 net.inet.ip.accept_sourceroute=0 net.inet.icmp.maskrepl=0 net.inet.icmp.log_redirect=0 net.inet.icmp.drop_redirect=1 net.inet.tcp.drop_synfin=1 # Security net.inet.udp.blackhole=1 net.inet.tcp.blackhole=2 # Increases default TTL, sometimes useful # Default is 64 net.inet.ip.ttl=128 # Lessen max segment life to conserve resources # ACK waiting time in miliseconds (default: 30000 from RFC) net.inet.tcp.msl=5000 # Max bumber of timewait sockets net.inet.tcp.maxtcptw=40960 # Don't use tw on local connections # As of 15 Apr 2009. Igor Sysoev says that nolocaltimewait has some buggy realization. # So disable it or now till get fixed #net.inet.tcp.nolocaltimewait=1 # FIN_WAIT_2 state fast recycle net.inet.tcp.fast_finwait2_recycle=1 # Time before tcp keepalive probe is sent # default is 2 hours (7200000) #net.inet.tcp.keepidle=60000 # Should be increased until net.inet.ip.intr_queue_drops is zero net.inet.ip.intr_queue_maxlen=4096 # Interrupt handling via multiple CPU, but with context switch. # You can play with it. Default is 1; #net.isr.direct=0 # This is for routers only #net.inet.ip.forwarding=1 #net.inet.ip.fastforwarding=1 # This speed ups dummynet when channel isn't saturated net.inet.ip.dummynet.io_fast=1 # Increase dummynet(4) hash #net.inet.ip.dummynet.hash_size=2048 #net.inet.ip.dummynet.max_chain_len # Should be increased when you have A LOT of files on server # (Increase until vfs.ufs.dirhash_mem becames lower) vfs.ufs.dirhash_maxmem=67108864 # Explicit Congestion Notification (see http://en.wikipedia.org/wiki/Explicit_Congestion_Notification) net.inet.tcp.ecn.enable=1 # Flowtable - flow caching mechanism # Useful for routers #net.inet.flowtable.enable=1 #net.inet.flowtable.nmbflows=65535 # Extreme polling tuning #kern.polling.burst_max=1000 #kern.polling.each_burst=1000 #kern.polling.reg_frac=100 #kern.polling.user_frac=1 #kern.polling.idle_poll=0 # IPFW dynamic rules and timeouts tuning # Increase dyn_buckets till net.inet.ip.fw.curr_dyn_buckets is lower net.inet.ip.fw.dyn_buckets=65536 net.inet.ip.fw.dyn_max=65536 net.inet.ip.fw.dyn_ack_lifetime=120 net.inet.ip.fw.dyn_syn_lifetime=10 net.inet.ip.fw.dyn_fin_lifetime=2 net.inet.ip.fw.dyn_short_lifetime=10 # Make packets pass firewall only once when using dummynet # i.e. packets going thru pipe are passing out from firewall with accept #net.inet.ip.fw.one_pass=1 # shm_use_phys Wires all shared pages, making them unswappable # Use this to lessen Virtual Memory Manager's work when using Shared Mem. # Useful for databases #kern.ipc.shm_use_phys=1 /boot/loader.conf: # Accept filters for data, http and DNS requests # Usefull when your software uses select() instead of kevent/kqueue or when you under DDoS # DNS accf available on 8.0+ accf_data_load="YES" accf_http_load="YES" accf_dns_load="YES" # Async IO system calls aio_load="YES" # Adds NCQ support in FreeBSD # WARNING! all ad[0-9]+ devices will be renamed to ada[0-9]+ # 8.0+ only #ahci_load= #siis_load= # Increase kernel memory size to 3G. # # Use ONLY if you have KVA_PAGES in kernel configuration, and you have more than 3G RAM # Otherwise panic will happen on next reboot! # # It's required for high buffer sizes: kern.ipc.nmbjumbop, kern.ipc.nmbclusters, etc # Useful on highload stateful firewalls, proxies or ZFS fileservers # (FreeBSD 7.2+ amd64 users: Check that current value is lower!) #vm.kmem_size="3G" # Older versions of FreeBSD can't tune maxfiles on the fly #kern.maxfiles="200000" # Useful for databases # Sets maximum data size to 1G # (FreeBSD 7.2+ amd64 users: Check that current value is lower!) #kern.maxdsiz="1G" # Maximum buffer size(vfs.maxbufspace) # You can check current one via vfs.bufspace # Should be lowered/upped depending on server's load-type # Usually decreased to preserve kmem # (default is 200M) #kern.maxbcache="512M" # Sendfile buffers # For i386 only #kern.ipc.nsfbufs=10240 # syncache Hash table tuning net.inet.tcp.syncache.hashsize=1024 net.inet.tcp.syncache.bucketlimit=100 # Incresed hostcache net.inet.tcp.hostcache.hashsize="16384" net.inet.tcp.hostcache.bucketlimit="100" # TCP control-block Hash table tuning net.inet.tcp.tcbhashsize=4096 # Enable superpages, for 7.2+ only # Also read http://lists.freebsd.org/pipermail/freebsd-hackers/2009-November/030094.html vm.pmap.pg_ps_enabled=1 # Usefull if you are using Intel-Gigabit NIC #hw.em.rxd=4096 #hw.em.txd=4096 #hw.em.rx_process_limit="-1" # Also if you have ALOT interrupts on NIC - play with following parameters # NOTE: You should set them for every NIC #dev.em.0.rx_int_delay: 250 #dev.em.0.tx_int_delay: 250 #dev.em.0.rx_abs_int_delay: 250 #dev.em.0.tx_abs_int_delay: 250 # There is also multithreaded version of em drivers can be found here: # http://people.yandex-team.ru/~wawa/ # # for additional em monitoring and statistics use # `sysctl dev.em.0.stats=1 ; dmesg` # #Same tunings for igb #hw.igb.rxd=4096 #hw.igb.txd=4096 #hw.igb.rx_process_limit=100 # Some useful netisr tunables. See sysctl net.isr #net.isr.defaultqlimit=4096 #net.isr.maxqlimit: 10240 # Bind netisr threads to CPUs #net.isr.bindthreads=1 # Nicer boot logo =) loader_logo="beastie" And finally here is my additions to GENERIC kernel # Just some of them, see also # cat /sys/{i386,amd64,}/conf/NOTES # This one useful only on i386 #options KVA_PAGES=512 # You can play with HZ in environments with high interrupt rate (default is 1000) # 100 is for my notebook to prolong it's battery life #options HZ=100 # Polling is goot on network loads with high packet rates and low-end NICs # NB! Do not enable it if you want more than one netisr thread #options DEVICE_POLLING # Eliminate datacopy on socket read-write # To take advantage with zero copy sockets you should have an MTU of 8K(amd64) # (4k for i386). This req. is only for receiving data. # Read more in man zero_copy_sockets #options ZERO_COPY_SOCKETS # Support TCP sign. Used for IPSec options TCP_SIGNATURE options IPSEC # This ones can be loaded as modules. They described in loader.conf section #options ACCEPT_FILTER_DATA #options ACCEPT_FILTER_HTTP # Adding ipfw, also can be loaded as modules options IPFIREWALL options IPFIREWALL_VERBOSE options IPFIREWALL_VERBOSE_LIMIT=10 options IPFIREWALL_DEFAULT_TO_ACCEPT options IPFIREWALL_FORWARD # Adding kernel NAT options IPFIREWALL_NAT options LIBALIAS # Traffic shaping options DUMMYNET # Divert, i.e. for userspace NAT options IPDIVERT # This is for OpenBSD's pf firewall device pf device pflog # pf's QoS - ALTQ options ALTQ options ALTQ_CBQ # Class Bases Queuing (CBQ) options ALTQ_RED # Random Early Detection (RED) options ALTQ_RIO # RED In/Out options ALTQ_HFSC # Hierarchical Packet Scheduler (HFSC) options ALTQ_PRIQ # Priority Queuing (PRIQ) options ALTQ_NOPCC # Required for SMP build # Pretty console # Manual can be found here http://forums.freebsd.org/showthread.php?t=6134 #options VESA #options SC_PIXEL_MODE # Disable reboot on Ctrl Alt Del #options SC_DISABLE_REBOOT # Change normal|kernel messages color options SC_NORM_ATTR=(FG_GREEN|BG_BLACK) options SC_KERNEL_CONS_ATTR=(FG_YELLOW|BG_BLACK) # More scroll space options SC_HISTORY_SIZE=8192 # Adding hardware crypto device device crypto device cryptodev # Useful network interfaces device vlan device tap #Virtual Ethernet driver device gre #IP over IP tunneling device if_bridge #Bridge interface device pfsync #synchronization interface for PF device carp #Common Address Redundancy Protocol device enc #IPsec interface device lagg #Link aggregation interface device stf #IPv4-IPv6 port # Also for my notebook, but may be used with Opteron #device amdtemp # Support for ECMP. More than one route for destination # Works even with default route so one can use it as LB for two ISP # For now code is unstable and panics (panic: rtfree 2) on route deletions. #options RADIX_MPATH # Multicast routing #options MROUTING #options PIM # DTrace options KDTRACE_HOOKS # all architectures - enable general DTrace hooks options DDB_CTF # all architectures - kernel ELF linker loads CTF data #options KDTRACE_FRAME # amd64-only # Adaptive spining in lockmgr (8.x+) # See http://www.mail-archive.com/[email protected]/msg10782.html options ADAPTIVE_LOCKMGRS # UTF-8 in console (9.x+) #options TEKEN_UTF8 #options TEKEN_XTERM # NCQ support # WARNING! all ad[0-9]+ devices will be renamed to ada[0-9]+ #options ATA_CAM # FreeBSD 9+ # Deadlock resolver thread # For additional information see http://www.mail-archive.com/[email protected]/msg18124.html #options DEADLKRES PS. Also most of FreeBSD's limits can be monitored by # vmstat -z and # limits PPS. variety of network counters can be monitored via # netstat -s In FreeBSD-9 netstat's -Q option appeared, try following command to display netisr stats # netstat -Q PPPS. also see # man 7 tuning PPPPS. I wanted to thank FreeBSD community, especially author of nginx - Igor Sysoev, nginx-ru@ and FreeBSD-performance@ mailing lists for providing useful information about FreeBSD tuning. So here is the question: What tunings are you using on yours FreeBSD servers? You can also post your /etc/sysctl.conf, /boot/loader.conf, kernel options, etc with description of its' meaning (do not copy-paste from sysctl -d). Don't forget to specify server type (web, smb, gateway, etc) Let's share experience!

Read the article
high load average, high wait, dmesg raid error messages (debian nfs server)

- by John Stumbles

Debian 6 on HP proliant (2 CPU) with raid (2*1.5T RAID1 + 2*2T RAID1 joined RAID0 to make 3.5T) running mainly nfs & imapd (plus samba for windows share & local www for previewing web pages); with local ubuntu desktop client mounting $HOME, laptops accessing imap & odd files (e.g. videos) via nfs/smb; boxes connected 100baseT or wifi via home router/switch uname -a Linux prole 2.6.32-5-686 #1 SMP Wed Jan 11 12:29:30 UTC 2012 i686 GNU/Linux Setup has been working for months but prone to intermittently going very slow (user experience on desktop mounting $HOME from server, or laptop playing videos) and now consistently so bad I've had to delve into it to try to find what's wrong(!) Server seems OK at low load e.g. (laptop) client (with $HOME on local disk) connecting to server's imapd and nfs mounting RAID to access 1 file: top shows load ~ 0.1 or less, 0 wait but when (desktop) client mounts $HOME and starts user KDE session (all accessing server) then top shows e.g. top - 13:41:17 up 3:43, 3 users, load average: 9.29, 9.55, 8.27 Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie Cpu(s): 0.4%us, 0.4%sy, 0.0%ni, 49.0%id, 49.7%wa, 0.0%hi, 0.5%si, 0.0%st Mem: 903856k total, 851784k used, 52072k free, 171152k buffers Swap: 0k total, 0k used, 0k free, 476896k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3935 root 20 0 2456 1088 784 R 2 0.1 0:00.02 top 1 root 20 0 2028 680 584 S 0 0.1 0:01.14 init 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0 4 root 20 0 0 0 0 S 0 0.0 0:00.12 ksoftirqd/0 5 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1 7 root 20 0 0 0 0 S 0 0.0 0:00.16 ksoftirqd/1 8 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1 9 root 20 0 0 0 0 S 0 0.0 0:00.42 events/0 10 root 20 0 0 0 0 S 0 0.0 0:02.26 events/1 11 root 20 0 0 0 0 S 0 0.0 0:00.00 cpuset 12 root 20 0 0 0 0 S 0 0.0 0:00.00 khelper 13 root 20 0 0 0 0 S 0 0.0 0:00.00 netns 14 root 20 0 0 0 0 S 0 0.0 0:00.00 async/mgr 15 root 20 0 0 0 0 S 0 0.0 0:00.00 pm 16 root 20 0 0 0 0 S 0 0.0 0:00.02 sync_supers 17 root 20 0 0 0 0 S 0 0.0 0:00.02 bdi-default 18 root 20 0 0 0 0 S 0 0.0 0:00.00 kintegrityd/0 19 root 20 0 0 0 0 S 0 0.0 0:00.00 kintegrityd/1 20 root 20 0 0 0 0 S 0 0.0 0:00.02 kblockd/0 21 root 20 0 0 0 0 S 0 0.0 0:00.08 kblockd/1 22 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpid 23 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpi_notify 24 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpi_hotplug 25 root 20 0 0 0 0 S 0 0.0 0:00.00 kseriod 28 root 20 0 0 0 0 S 0 0.0 0:04.19 kondemand/0 29 root 20 0 0 0 0 S 0 0.0 0:02.93 kondemand/1 30 root 20 0 0 0 0 S 0 0.0 0:00.00 khungtaskd 31 root 20 0 0 0 0 S 0 0.0 0:00.18 kswapd0 32 root 25 5 0 0 0 S 0 0.0 0:00.00 ksmd 33 root 20 0 0 0 0 S 0 0.0 0:00.00 aio/0 34 root 20 0 0 0 0 S 0 0.0 0:00.00 aio/1 35 root 20 0 0 0 0 S 0 0.0 0:00.00 crypto/0 36 root 20 0 0 0 0 S 0 0.0 0:00.00 crypto/1 203 root 20 0 0 0 0 S 0 0.0 0:00.00 ksuspend_usbd 204 root 20 0 0 0 0 S 0 0.0 0:00.00 khubd 205 root 20 0 0 0 0 S 0 0.0 0:00.00 ata/0 206 root 20 0 0 0 0 S 0 0.0 0:00.00 ata/1 207 root 20 0 0 0 0 S 0 0.0 0:00.14 ata_aux 208 root 20 0 0 0 0 S 0 0.0 0:00.01 scsi_eh_0 dmesg suggests there's a disk problem: .............. (previous episode) [13276.966004] raid1:md0: read error corrected (8 sectors at 489900360 on sdc7) [13276.966043] raid1: sdb7: redirecting sector 489898312 to another mirror [13279.569186] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13279.569211] ata4.00: irq_stat 0x40000008 [13279.569230] ata4.00: failed command: READ FPDMA QUEUED [13279.569257] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13279.569262] res 41/40:00:05:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13279.569306] ata4.00: status: { DRDY ERR } [13279.569321] ata4.00: error: { UNC } [13279.575362] ata4.00: configured for UDMA/133 [13279.575388] ata4: EH complete [13283.169224] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13283.169246] ata4.00: irq_stat 0x40000008 [13283.169263] ata4.00: failed command: READ FPDMA QUEUED [13283.169289] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13283.169294] res 41/40:00:07:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13283.169331] ata4.00: status: { DRDY ERR } [13283.169345] ata4.00: error: { UNC } [13283.176071] ata4.00: configured for UDMA/133 [13283.176104] ata4: EH complete [13286.224814] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13286.224837] ata4.00: irq_stat 0x40000008 [13286.224853] ata4.00: failed command: READ FPDMA QUEUED [13286.224879] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13286.224884] res 41/40:00:06:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13286.224922] ata4.00: status: { DRDY ERR } [13286.224935] ata4.00: error: { UNC } [13286.231277] ata4.00: configured for UDMA/133 [13286.231303] ata4: EH complete [13288.802623] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13288.802646] ata4.00: irq_stat 0x40000008 [13288.802662] ata4.00: failed command: READ FPDMA QUEUED [13288.802688] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13288.802693] res 41/40:00:05:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13288.802731] ata4.00: status: { DRDY ERR } [13288.802745] ata4.00: error: { UNC } [13288.808901] ata4.00: configured for UDMA/133 [13288.808927] ata4: EH complete [13291.380430] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13291.380453] ata4.00: irq_stat 0x40000008 [13291.380470] ata4.00: failed command: READ FPDMA QUEUED [13291.380496] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13291.380501] res 41/40:00:05:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13291.380577] ata4.00: status: { DRDY ERR } [13291.380594] ata4.00: error: { UNC } [13291.386517] ata4.00: configured for UDMA/133 [13291.386543] ata4: EH complete [13294.347147] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13294.347169] ata4.00: irq_stat 0x40000008 [13294.347186] ata4.00: failed command: READ FPDMA QUEUED [13294.347211] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13294.347217] res 41/40:00:06:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13294.347254] ata4.00: status: { DRDY ERR } [13294.347268] ata4.00: error: { UNC } [13294.353556] ata4.00: configured for UDMA/133 [13294.353583] sd 3:0:0:0: [sdc] Unhandled sense code [13294.353590] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [13294.353599] sd 3:0:0:0: [sdc] Sense Key : Medium Error [current] [descriptor] [13294.353610] Descriptor sense data with sense descriptors (in hex): [13294.353616] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 [13294.353635] 23 05 6a 06 [13294.353644] sd 3:0:0:0: [sdc] Add. Sense: Unrecovered read error - auto reallocate failed [13294.353657] sd 3:0:0:0: [sdc] CDB: Read(10): 28 00 23 05 6a 00 00 00 08 00 [13294.353675] end_request: I/O error, dev sdc, sector 587557382 [13294.353726] ata4: EH complete [13294.366953] raid1:md0: read error corrected (8 sectors at 489900544 on sdc7) [13294.366992] raid1: sdc7: redirecting sector 489898496 to another mirror and they're happening quite frequently, which I guess is liable to account for the performance problem(?) # dmesg | grep mirror [12433.561822] raid1: sdc7: redirecting sector 489900464 to another mirror [12449.428933] raid1: sdb7: redirecting sector 489900504 to another mirror [12464.807016] raid1: sdb7: redirecting sector 489900512 to another mirror [12480.196222] raid1: sdb7: redirecting sector 489900520 to another mirror [12495.585413] raid1: sdb7: redirecting sector 489900528 to another mirror [12510.974424] raid1: sdb7: redirecting sector 489900536 to another mirror [12526.374933] raid1: sdb7: redirecting sector 489900544 to another mirror [12542.619938] raid1: sdc7: redirecting sector 489900608 to another mirror [12559.431328] raid1: sdc7: redirecting sector 489900616 to another mirror [12576.553866] raid1: sdc7: redirecting sector 489900624 to another mirror [12592.065265] raid1: sdc7: redirecting sector 489900632 to another mirror [12607.621121] raid1: sdc7: redirecting sector 489900640 to another mirror [12623.165856] raid1: sdc7: redirecting sector 489900648 to another mirror [12638.699474] raid1: sdc7: redirecting sector 489900656 to another mirror [12655.610881] raid1: sdc7: redirecting sector 489900664 to another mirror [12672.255617] raid1: sdc7: redirecting sector 489900672 to another mirror [12672.288746] raid1: sdc7: redirecting sector 489900680 to another mirror [12672.332376] raid1: sdc7: redirecting sector 489900688 to another mirror [12672.362935] raid1: sdc7: redirecting sector 489900696 to another mirror [12674.201177] raid1: sdc7: redirecting sector 489900704 to another mirror [12698.045050] raid1: sdc7: redirecting sector 489900712 to another mirror [12698.089309] raid1: sdc7: redirecting sector 489900720 to another mirror [12698.111999] raid1: sdc7: redirecting sector 489900728 to another mirror [12698.134006] raid1: sdc7: redirecting sector 489900736 to another mirror [12719.034376] raid1: sdc7: redirecting sector 489900744 to another mirror [12734.545775] raid1: sdc7: redirecting sector 489900752 to another mirror [12734.590014] raid1: sdc7: redirecting sector 489900760 to another mirror [12734.624050] raid1: sdc7: redirecting sector 489900768 to another mirror [12734.647308] raid1: sdc7: redirecting sector 489900776 to another mirror [12734.664657] raid1: sdc7: redirecting sector 489900784 to another mirror [12734.710642] raid1: sdc7: redirecting sector 489900792 to another mirror [12734.721919] raid1: sdc7: redirecting sector 489900800 to another mirror [12734.744732] raid1: sdc7: redirecting sector 489900808 to another mirror [12734.779330] raid1: sdc7: redirecting sector 489900816 to another mirror [12782.604564] raid1: sdb7: redirecting sector 1242934216 to another mirror [12798.264153] raid1: sdc7: redirecting sector 1242935080 to another mirror [13245.832193] raid1: sdb7: redirecting sector 489898296 to another mirror [13261.376929] raid1: sdb7: redirecting sector 489898304 to another mirror [13276.966043] raid1: sdb7: redirecting sector 489898312 to another mirror [13294.366992] raid1: sdc7: redirecting sector 489898496 to another mirror although the arrays are still running on all disks - they haven't given up on any yet: # cat /proc/mdstat Personalities : [raid1] [raid0] md10 : active raid0 md0[0] md1[1] 3368770048 blocks super 1.2 512k chunks md1 : active raid1 sde2[2] sdd2[1] 1464087824 blocks super 1.2 [2/2] [UU] md0 : active raid1 sdb7[0] sdc7[2] 1904684920 blocks super 1.2 [2/2] [UU] unused devices: <none> So I think I have some idea what the problem is but I am not a linux sysadmin expert by the remotest stretch of the imagination and would really appreciate some clue checking here with my diagnosis and what do I need to do: obviously I need to source another drive for sdc. (I'm guessing I could buy a larger drive if the price is right: I'm thinking that one day I'll need to grow the size of the array and that would be one less drive to replace with a larger one) then use mdadm to fail out the existing sdc, remove it and fit the new drive fdisk the new drive with the same size partition for the array as the old one had use mdadm to add the new drive into the array that sound OK?

Read the article
Tomcat 7 taking ages to start up after upgrade

- by Lawrence

I recently updated my server installation from Tomcat 6 to Tomcat 7, in order to take advantage of better connection pooling. My project uses Hibernate, for object persistance, a Mysql 5.5.20 database, and memcached for caching. When I was using Tomcat 6, Tomcat would start in about 8 seconds. After moving to Tomcat 7, it now takes between 75 - 80 seconds to start (this is on a Macbook pro 15", core i7 2Ghz, 8Gb of RAM). The only thing that has really changed between during the move from Tomcat 6 to 7 has been my context.xml file, which controls the connection pooling information: <Context antiJARLocking="true" reloadable="true" path=""> <Resource name="jdbc/test-db" auth="Container" type="javax.sql.DataSource" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" testOnBorrow="true" testOnReturn="false" testWhileIdle="true" validationQuery="SELECT 1" validationQueryTimeout="20000" validationInterval="30000" timeBetweenEvictionRunsMillis="60000" logValidationErrors="true" autoReconnect="true" username="webuser" password="xxxxxxx" driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://databasename.us-east-1.rds.amazonaws.com:3306/test-db" maxActive="15" minIdle="2" maxIdle="10" maxWait="10000" maxAge="7200000"/> </Context> Now, as you can see, the database is running on Amazon RDS (where our live servers are), and thus is about 200ms round trip time away from my machine. I have already checked that I have security permissions to that database from my machine, (and anyway, it connects after 75 secs, so it cant be that). My initial thought was that Tomcat 7 and hibernate are doing something weird (like pre-instantiating a bunch of connections or something), and the latency to the database is amplifying the effects. While trying to diagnose the problem, I used jstack to get a stack trace of the Tomcat 7 server while its doing its startup thing. Here is the stack trace... Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.12-b01-434 mixed mode): "Attach Listener" daemon prio=9 tid=7fa4c0038800 nid=0x10c39a000 waiting on condition [00000000] java.lang.Thread.State: RUNNABLE "Abandoned connection cleanup thread" daemon prio=5 tid=7fa4bb810000 nid=0x10f3ba000 in Object.wait() [10f3b9000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <7f40a0070> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <7f40a0070> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at com.mysql.jdbc.NonRegisteringDriver$1.run(NonRegisteringDriver.java:93) "PoolCleaner[545768040:1352724902327]" daemon prio=5 tid=7fa4be852800 nid=0x10e772000 in Object.wait() [10e771000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <7f40c7c90> (a java.util.TaskQueue) at java.util.TimerThread.mainLoop(Timer.java:509) - locked <7f40c7c90> (a java.util.TaskQueue) at java.util.TimerThread.run(Timer.java:462) "localhost-startStop-1" daemon prio=5 tid=7fa4bd034800 nid=0x10d66b000 runnable [10d668000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114) at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161) at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189) - locked <7f3673be0> (a com.mysql.jdbc.util.ReadAheadInputStream) at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2713) - locked <7f366a1c0> (a com.mysql.jdbc.JDBC4Connection) at com.mysql.jdbc.ConnectionImpl.configureClientCharacterSet(ConnectionImpl.java:1930) at com.mysql.jdbc.ConnectionImpl.initializePropsFromServer(ConnectionImpl.java:3571) at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2445) at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2215) - locked <7f366a1c0> (a com.mysql.jdbc.JDBC4Connection) at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:813) at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47) at sun.reflect.GeneratedConstructorAccessor10.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:399) at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:334) at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:278) at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:182) at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:699) at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:631) at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:485) at org.apache.tomcat.jdbc.pool.ConnectionPool.<init>(ConnectionPool.java:143) at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:116) - locked <7f34f0dc8> (a org.apache.tomcat.jdbc.pool.DataSource) at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:103) at org.apache.tomcat.jdbc.pool.DataSourceFactory.createDataSource(DataSourceFactory.java:539) at org.apache.tomcat.jdbc.pool.DataSourceFactory.getObjectInstance(DataSourceFactory.java:237) at org.apache.naming.factory.ResourceFactory.getObjectInstance(ResourceFactory.java:143) at javax.naming.spi.NamingManager.getObjectInstance(NamingManager.java:304) at org.apache.naming.NamingContext.lookup(NamingContext.java:843) at org.apache.naming.NamingContext.lookup(NamingContext.java:154) at org.apache.naming.NamingContext.lookup(NamingContext.java:831) at org.apache.naming.NamingContext.lookup(NamingContext.java:168) at org.apache.catalina.core.NamingContextListener.addResource(NamingContextListener.java:1061) at org.apache.catalina.core.NamingContextListener.createNamingContext(NamingContextListener.java:671) at org.apache.catalina.core.NamingContextListener.lifecycleEvent(NamingContextListener.java:270) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119) at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90) at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5173) - locked <7f46b07f0> (a org.apache.catalina.core.StandardContext) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) - locked <7f46b07f0> (a org.apache.catalina.core.StandardContext) at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559) at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) "Catalina-startStop-1" daemon prio=5 tid=7fa4b7a5e800 nid=0x10d568000 waiting on condition [10d567000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <7f480e970> (a java.util.concurrent.FutureTask$Sync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1123) - locked <7f453c630> (a org.apache.catalina.core.StandardHost) at org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:800) - locked <7f453c630> (a org.apache.catalina.core.StandardHost) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) - locked <7f453c630> (a org.apache.catalina.core.StandardHost) at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559) at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) "GC Daemon" daemon prio=2 tid=7fa4b9912800 nid=0x10d465000 in Object.wait() [10d464000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <7f4506d28> (a sun.misc.GC$LatencyLock) at sun.misc.GC$Daemon.run(GC.java:100) - locked <7f4506d28> (a sun.misc.GC$LatencyLock) "Low Memory Detector" daemon prio=5 tid=7fa4b480b800 nid=0x10c8ae000 runnable [00000000] java.lang.Thread.State: RUNNABLE "C2 CompilerThread1" daemon prio=9 tid=7fa4b480b000 nid=0x10c7ab000 waiting on condition [00000000] java.lang.Thread.State: RUNNABLE "C2 CompilerThread0" daemon prio=9 tid=7fa4b480a000 nid=0x10c6a8000 waiting on condition [00000000] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=9 tid=7fa4b4809800 nid=0x10c5a5000 runnable [00000000] java.lang.Thread.State: RUNNABLE "Surrogate Locker Thread (Concurrent GC)" daemon prio=5 tid=7fa4b4808800 nid=0x10c4a2000 waiting on condition [00000000] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=8 tid=7fa4b793f000 nid=0x10c297000 in Object.wait() [10c296000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <7f451c8f0> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <7f451c8f0> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=7fa4b793e000 nid=0x10c194000 in Object.wait() [10c193000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <7f452e168> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <7f452e168> (a java.lang.ref.Reference$Lock) "main" prio=5 tid=7fa4b7800800 nid=0x104329000 waiting on condition [104327000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <7f480e9a0> (a java.util.concurrent.FutureTask$Sync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1123) - locked <7f451fd90> (a org.apache.catalina.core.StandardEngine) at org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:302) - locked <7f451fd90> (a org.apache.catalina.core.StandardEngine) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) - locked <7f451fd90> (a org.apache.catalina.core.StandardEngine) at org.apache.catalina.core.StandardService.startInternal(StandardService.java:443) - locked <7f451fd90> (a org.apache.catalina.core.StandardEngine) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) - locked <7f453e810> (a org.apache.catalina.core.StandardService) at org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:732) - locked <7f4506d58> (a [Lorg.apache.catalina.Service;) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) - locked <7f44f7ba0> (a org.apache.catalina.core.StandardServer) at org.apache.catalina.startup.Catalina.start(Catalina.java:684) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:322) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:451) "VM Thread" prio=9 tid=7fa4b7939800 nid=0x10c091000 runnable "Gang worker#0 (Parallel GC Threads)" prio=9 tid=7fa4b7802000 nid=0x10772b000 runnable "Gang worker#1 (Parallel GC Threads)" prio=9 tid=7fa4b7802800 nid=0x10782e000 runnable "Gang worker#2 (Parallel GC Threads)" prio=9 tid=7fa4b7803000 nid=0x107931000 runnable "Gang worker#3 (Parallel GC Threads)" prio=9 tid=7fa4b7804000 nid=0x107a34000 runnable "Gang worker#4 (Parallel GC Threads)" prio=9 tid=7fa4b7804800 nid=0x107b37000 runnable "Gang worker#5 (Parallel GC Threads)" prio=9 tid=7fa4b7805000 nid=0x107c3a000 runnable "Gang worker#6 (Parallel GC Threads)" prio=9 tid=7fa4b7805800 nid=0x107d3d000 runnable "Gang worker#7 (Parallel GC Threads)" prio=9 tid=7fa4b7806800 nid=0x107e40000 runnable "Concurrent Mark-Sweep GC Thread" prio=9 tid=7fa4b78e3800 nid=0x10bd0b000 runnable "Gang worker#0 (Parallel CMS Threads)" prio=9 tid=7fa4b78e2800 nid=0x10b305000 runnable "Gang worker#1 (Parallel CMS Threads)" prio=9 tid=7fa4b78e3000 nid=0x10b408000 runnable "VM Periodic Task Thread" prio=10 tid=7fa4b4815800 nid=0x10c9b1000 waiting on condition "Exception Catcher Thread" prio=10 tid=7fa4b7801800 nid=0x104554000 runnable JNI global references: 919 The only thing I can figure out from this is that it looks like the mysql jdbc drivers might have something to do with the long start up (the various stack traces I took during the start up process all pretty much look the same as this). Could anyone shed some light on what might be causing this? Have I done something dense in my context.xml? Is hibernate perhaps to blame?

Read the article
I need advices: small memory footprint linux mail server with spam filtering

- by petermolnar

I have a VPS which is originally destined to be a webserver but some minimal mail capabilities are needed to be deployed as well, including sending and receiving as standalone server. The current setup is the following: Postfix reveices the mail, the users are in virtual tables, stored in MySQL on connection all servers are tested with policyd-weight service against some DNSBLs all mail is runs through SpamAssassin spamd with the help of spamc client the mail is then delivered with Dovecot 2' LDA (local delivery agent), virtual users as well As you saw... there's no virus scanner running, and that's for a reason: clamav eats all the memory possible and also, virus mails are all filtered out with this setup (I've tested the same with ClamAV enabled for 1,5 years, no virus mail ever got even to ClamAV) I don't use amavisd and I really don't want to. You only need that monster if you have plenty of memory and lots of simultaneous scanners. It's also a nightmare to fine tune by hand. I run policyd-weight instead of policyd and native DNSBLs in postfix. I don't like to send someone away because a single service listed them. Important statement: everything works fine. I receive very small amount of spam, nearly never get a false positive and most of the bad mail is stopped by policyd-weight. The only "problem" that I feel the services at total uses a bit much memory alltogether. I've already cut the modules of spamassassin (see below), but I'd really like to hear some advices how to cut the memory footprint as low as possible, mostly: what plugins SpamAssassin really needs and what are more or less useless, regarding to my current postfix & policyd-weight setup? SpamAssassin rules are also compiled with sa-compile (sa-update runs once a week from cron, compile runs right after that) These are some of the current configurations that may matter, please tell me if you need anything more. postfix/master.cf (parts only) dovecot unix - n n - - pipe flags=DRhu user=vmail:vmail argv=/usr/bin/spamc -e /usr/lib/dovecot/deliver -d ${recipient} -f {sender} postfix/main.cf (parts only) smtpd_helo_required = yes smtpd_helo_restrictions = permit_mynetworks, reject_invalid_hostname, permit smtpd_recipient_restrictions = permit_mynetworks, permit_sasl_authenticated, reject_invalid_hostname, reject_non_fqdn_hostname, reject_non_fqdn_recipient, reject_unknown_recipient_domain, reject_unauth_pipelining, reject_unauth_destination, check_policy_service inet:127.0.0.1:12525, permit policyd-weight.conf (parts only) $REJECTMSG = "550 Mail appeared to be SPAM or forged. Ask your Mail/DNS-Administrator to correct HELO and DNS MX settings or to get removed from DNSBLs"; $REJECTLEVEL = 4; $DEFER_STRING = 'IN_SPAMCOP= BOGUS_MX='; $DEFER_ACTION = '450'; $DEFER_LEVEL = 5; $DNSERRMSG = '450 No DNS entries for your MTA, HELO and Domain. Contact YOUR administrator'; # 1: ON, 0: OFF (default) # If ON request that ALL clients are only checked against RBLs $dnsbl_checks_only = 0; # 1: ON (default), 0: OFF # When set to ON it logs only RBLs which affect scoring (positive or negative) $LOG_BAD_RBL_ONLY = 1; ## DNSBL settings @dnsbl_score = ( # host, hit, miss, log name 'dnsbl.ahbl.org', 3, -1, 'dnsbl.ahbl.org', 'dnsbl.njabl.org', 3, -1, 'dnsbl.njabl.org', 'dnsbl.sorbs.net', 3, -1, 'dnsbl.sorbs.net', 'bl.spamcop.net', 3, -1, 'bl.spamcop.net', 'zen.spamhaus.org', 3, -1, 'zen.spamhaus.org', 'pbl.spamhaus.org', 3, -1, 'pbl.spamhaus.org', 'cbl.abuseat.org', 3, -1, 'cbl.abuseat.org', 'list.dsbl.org', 3, -1, 'list.dsbl.org', ); # If Client IP is listed in MORE DNSBLS than this var, it gets REJECTed immediately $MAXDNSBLHITS = 3; # alternatively, if the score of DNSBLs is ABOVE this level, reject immediately $MAXDNSBLSCORE = 9; $MAXDNSBLMSG = '550 Az levelezoszerveruk IP cime tul sok spamlistan talahato, kerjuk ellenorizze! / Your MTA is listed in too many DNSBLs; please check.'; ## RHSBL settings @rhsbl_score = ( 'multi.surbl.org', 4, 0, 'multi.surbl.org', 'rhsbl.ahbl.org', 4, 0, 'rhsbl.ahbl.org', 'dsn.rfc-ignorant.org', 4, 0, 'dsn.rfc-ignorant.org', # 'postmaster.rfc-ignorant.org', 0.1, 0, 'postmaster.rfc-ignorant.org', # 'abuse.rfc-ignorant.org', 0.1, 0, 'abuse.rfc-ignorant.org' ); # skip a RBL if this RBL had this many continuous errors $BL_ERROR_SKIP = 2; # skip a RBL for that many times $BL_SKIP_RELEASE = 10; ## cache stuff # must be a directory (add trailing slash) $LOCKPATH = '/var/run/policyd-weight/'; # socket path for the cache daemon. $SPATH = $LOCKPATH.'/polw.sock'; # how many seconds the cache may be idle before starting maintenance routines #NOTE: standard maintenance jobs happen regardless of this setting. $MAXIDLECACHE = 60; # after this number of requests do following maintenance jobs: checking for config changes $MAINTENANCE_LEVEL = 5; # negative (i.e. SPAM) result cache settings ################################## # set to 0 to disable caching for spam results. To this level the cache will be cleaned. $CACHESIZE = 2000; # at this number of entries cleanup takes place $CACHEMAXSIZE = 4000; $CACHEREJECTMSG = '550 temporarily blocked because of previous errors'; # after NTTL retries the cache entry is deleted $NTTL = 1; # client MUST NOT retry within this seconds in order to decrease TTL counter $NTIME = 30; # positve (i.,e. HAM) result cache settings ################################### # set to 0 to disable caching of HAM. To this number of entries the cache will be cleaned $POSCACHESIZE = 1000; # at this number of entries cleanup takes place $POSCACHEMAXSIZE = 2000; $POSCACHEMSG = 'using cached result'; #after PTTL requests the HAM entry must succeed one time the RBL checks again $PTTL = 60; # after $PTIME in HAM Cache the client must pass one time the RBL checks again. #Values must be nonfractal. Accepted time-units: s, m, h, d $PTIME = '3h'; # The client must pass this time the RBL checks in order to be listed as hard-HAM # After this time the client will pass immediately for PTTL within PTIME $TEMP_PTIME = '1d'; ## DNS settings # Retries for ONE DNS-Lookup $DNS_RETRIES = 1; # Retry-interval for ONE DNS-Lookup $DNS_RETRY_IVAL = 5; # max error count for unresponded queries in a complete policy query $MAXDNSERR = 3; $MAXDNSERRMSG = 'passed - too many local DNS-errors'; # persistent udp connection for DNS queries. #broken in Net::DNS version 0.51. Works with Net::DNS 0.53; DEFAULT: off $PUDP= 0; # Force the usage of Net::DNS for RBL lookups. # Normally policyd-weight tries to use a faster RBL lookup routine instead of Net::DNS $USE_NET_DNS = 0; # A list of space separated NS IPs # This overrides resolv.conf settings # Example: $NS = '1.2.3.4 1.2.3.5'; # DEFAULT: empty $NS = ''; # timeout for receiving from cache instance $IPC_TIMEOUT = 2; # If set to 1 policyd-weight closes connections to smtpd clients in order to avoid too many #established connections to one policyd-weight child $TRY_BALANCE = 0; # scores for checks, WARNING: they may manipulate eachother # or be factors for other scores. # HIT score, MISS Score @client_ip_eq_helo_score = (1.5, -1.25 ); @helo_score = (1.5, -2 ); @helo_score = (0, -2 ); @helo_from_mx_eq_ip_score= (1.5, -3.1 ); @helo_numeric_score= (2.5, 0 ); @from_match_regex_verified_helo= (1,-2 ); @from_match_regex_unverified_helo = (1.6, -1.5 ); @from_match_regex_failed_helo = (2.5, 0 ); @helo_seems_dialup = (1.5, 0 ); @failed_helo_seems_dialup= (2, 0 ); @helo_ip_in_client_subnet= (0,-1.2 ); @helo_ip_in_cl16_subnet = (0,-0.41 ); #@client_seems_dialup_score = (3.75, 0 ); @client_seems_dialup_score = (0, 0 ); @from_multiparted = (1.09, 0 ); @from_anon= (1.17, 0 ); @bogus_mx_score = (2.1, 0 ); @random_sender_score = (0.25, 0 ); @rhsbl_penalty_score = (3.1, 0 ); @enforce_dyndns_score = (3, 0 ); spamassassin/init.pre (I've put the .pre files together) loadplugin Mail::SpamAssassin::Plugin::Hashcash loadplugin Mail::SpamAssassin::Plugin::SPF loadplugin Mail::SpamAssassin::Plugin::Pyzor loadplugin Mail::SpamAssassin::Plugin::Razor2 loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold loadplugin Mail::SpamAssassin::Plugin::MIMEHeader loadplugin Mail::SpamAssassin::Plugin::ReplaceTags loadplugin Mail::SpamAssassin::Plugin::Check loadplugin Mail::SpamAssassin::Plugin::HTTPSMismatch loadplugin Mail::SpamAssassin::Plugin::URIDetail loadplugin Mail::SpamAssassin::Plugin::Bayes loadplugin Mail::SpamAssassin::Plugin::BodyEval loadplugin Mail::SpamAssassin::Plugin::DNSEval loadplugin Mail::SpamAssassin::Plugin::HTMLEval loadplugin Mail::SpamAssassin::Plugin::HeaderEval loadplugin Mail::SpamAssassin::Plugin::MIMEEval loadplugin Mail::SpamAssassin::Plugin::RelayEval loadplugin Mail::SpamAssassin::Plugin::URIEval loadplugin Mail::SpamAssassin::Plugin::WLBLEval loadplugin Mail::SpamAssassin::Plugin::VBounce loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody spamassassin/local.cf (parts) use_bayes 1 bayes_auto_learn 1 bayes_store_module Mail::SpamAssassin::BayesStore::MySQL bayes_sql_dsn DBI:mysql:db:127.0.0.1:3306 bayes_sql_username user bayes_sql_password pass bayes_ignore_header X-Bogosity bayes_ignore_header X-Spam-Flag bayes_ignore_header X-Spam-Status ### User settings user_scores_dsn DBI:mysql:db:127.0.0.1:3306 user_scores_sql_password user user_scores_sql_username pass user_scores_sql_custom_query SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '$GLOBAL' OR username = CONCAT('%',_DOMAIN_) ORDER BY username ASC # for better speed score DNS_FROM_AHBL_RHSBL 0 score __RFC_IGNORANT_ENVFROM 0 score DNS_FROM_RFC_DSN 0 score DNS_FROM_RFC_BOGUSMX 0 score __DNS_FROM_RFC_POST 0 score __DNS_FROM_RFC_ABUSE 0 score __DNS_FROM_RFC_WHOIS 0 UPDATE 01 As adaptr advised I remove policyd-weight and configured postfix postscreen, this resulted approximately -15-20 MB from RAM usage and a lot faster work. I'm not sure it's working at full capacity but it seems promising.

Read the article
PC powers off at random times

- by Timo Huovinen

Short Version After experiencing some problems with Mobo batteries my PC started to power off at random times, the power off is instant and sudden and does not restart afterwards, need help figuring out the cause. Facts: Powers off when PC is playing games Powers off when PC is idle Powers off when PC is in safe mode Powers off when PC is in BIOS Powers off when PC is booted through a Windows installation USB Replaced the motherboard battery several times Replaced the 650W PSU with a 750W PSU Replaced the RAM Swapped the RAM between slots Re-applied thermal paste to the CPU Checked if the motherboard touches the case Nothing is overclocked PC Specs PC specs: OS: Windows 7 Ultimate SP1 RAM: klingston 1333MHz 4GB stick CPU: AMD Phenom II x4 955 Mobo: Gigabyte 88GMA-UD2H rev 2.2 Motherboard battery: CR2032 3v HDD: 500GB Seagate ST3500418AS ATA Device Graphics: ATI/AMD Radeon HD 6870 Very Long version Around 10 months ago I built a brand new gaming PC. Around 6 months ago it's time setting in windows started resetting to the year 2010. I swapped the Motherboard battery for a new one of the exact same size and shape and voltage, and the problems disappeared...for around 2 weeks. Then the same problem happened again, time gets reset, I swapped the battery again, and the problem was gone for good and everything was great for about 3 months.. then another problem started happening, the PC started to power off suddenly and without warning at completely random times, sometimes the PC works for and hour, sometimes 5 minutes. So I read on the forums that it might be either the PSU or the motherboard Battery or RAM or HDD or the Graphics card or the CPU or the motherboard or the drivers or a Virus or Grounding issues, or something short circuiting, basically it can be anything... I spent some days researching, and decided to remove the possibility of a virus. I reset the CMOS, cleared all BIOS settings and reinstalled windows 7 after a full format of the HDD, but the random power off kept happening. I then disabled the restart on error option in windows and looked at the event log for error events, but they did not help me figure out the problem. Network list service depends on network location awareness the dependency group failed to start Source Kernel Power Event 41 Task Category 63 Source Disk Event ID 11 Task Category None The driver detected a controller error on device disk I took apart the PC, every little piece, re-applied some expensive thermal paste to the CPU, and double checked that none of the pieces are touching the PC case. The problem was gone, the PC no longer powered off randomly I re-attached the graphics card and all was good for 4 months... then the power off problem appeared again, but was happening at high intervals, the PC would shutdown once in 2 days on average, at random points in time, sometimes when it's idle all day long, sometimes when it's running CRYSIS 2. I checked the CPU temperature, because I know that AMD CPU's have a built in protection mechanism that switches off the PC if the CPU gets too hot, and the Temp was 50C system temp, and 45C CPU after running the PC all day long (I did not do tests to see if there are any temperature spikes, don't know how to do them) Originally the PSU that powered the PC was 650Watts and had one 4 pin cable to power the CPU, I replaced it with a new 750Watts PSU which has two 4 pin cables for the CPU, but the problem remained. I removed the graphics card and let the motherboard use the built in one, but the PC kept suddenly powering off at random times. I took apart the PC completely again, and re-applied thermal paste to the CPU, added lots of insulation, and checked for any type of short-circuit possibility again and again, but the problem remained. The problem was like that for some months. I replaced the Battery a couple of times over the time, changed lots of options in windows, and tried everything I could, but it kept powering off, so I stopped using the PC as much as I used to, just living with the random power offs from time to time, until a couple of days ago, when the power off happens almost immediately after powering on the PC. I replaced the RAM with a brand new one, but that did not help. Took apart the PC again, checked for anything anywhere that might cause it, found some small scratches on the very edge of the motherboard to the left of the PCI express x16 slot. This might cause the problem, I thought, but the scratch looks very superficial, not deep at all, and if the scratch did harm the motherboard, wouldn't it cause it to not start at all? And why did it start to power off a while ago, and then suddenly stop powering off? The scratches could not have vanished??? did chkdsk \d but it powered off when it was at 75% I removed the hard disks, the graphics card, while I fiddled with the BIOS settings, and suddenly the PC shut down while I was looking at the BIOS version. This makes me realize, it is not caused by: HDD, Windows, Drivers or the Graphics card I cleared the CMOS again, updated the BIOS from F5 to F6f beta, but that did not help, it might even seem that the PC powers off even sooner. The shutdown even happened to me while I booted through a windows 7 installation USB and was in the repair console. I removed one of the cables powering the CPU, now only one 4pin cable powers it, and it worked for 30mins after doing that, which makes me think that it's the CPU overheating, and because it gets less power, it overheats slower? The things that I am still considering: CPU overheating (does not seem to overheat, maybe false readings?) Motherboard short circuiting (faulty motherboard?) I desperately need some advice in what is faulty, is it a faulty Motherboard or an overheating CPU? or maybe something else? I have been breaking my head over this problem over a span of 6 months. I'm not sure if this is a good place to ask this question, if it is not, then tell me where I can get some experienced help. More info I have also discovered a mysterious piece that seems to have fallen out of the motherboard i119.photobucket.com/albums/o126/yurikolovsky/strangepiece.jpg What is it? Looks like each time that it powers off the datetime gets reset I also found another forum post tomshardware.co.uk/forum/… except I don't have Integrated PeripheralsUSB Keyboard Function option in BIOS :S Comments summary (asked by Random moderator) Q. tell me, if the computer restarts, is it immediately? Does it take a second and then restarts? Do you see (BSOD) or hear (PSU, short circuit) any suspicious when it happens? After reading trough it, it remains the mainboard that is faulty. – JohannesM A. Immediate power off, all the fans stop instantly, all the light turn off instantly, no sound or anything, and it remains off until I turn it back on. Thanks for the feedback, faulty motherboard is what I fear. Q. Try stress-testing the system with Prime95 and see if errors or shutdowns occur when the CPU is under full load. – speakr A. Prime95 heat stress test peaked CPU heat at 60C after 5mins, it powered off after 30mins of testing in the middle of the test with no errors, Prime95 Heat test or the stress-testing with low RAM usage (small or in-place FFTs) do not report errors while testing for 10-60 mins. The power off does not seem like it is affected by Prime95 at all Makes me wonder if it's a CPU or Motherboard issue at all. Q. I had similar random/intermittent problems with my old board. It gave one of a few different symptoms: keyboard and/or mouse would die and/or the RAM wouldn't work and/or it would shut down. It was in bad shape. One problems was that my old PSU had literally burned the connector on it (browned around the pins), another was that a broken lead inside the layers of the PCB would work sometimes if it happened to be hot or if I bent the board—by jamming a hunk of wood behind it. I managed to keep the board alive for several years, but eventually nothing I did would make it work correctly anymore. – Synetech A. I will try that as the last resort, ok? ;) Q. Have you tried a different power cord, surge protector, outlet (on a different circuit). It's worth a shot just to ensure it's not subpar wiring or a week circuit (dips in power may cause shutdown if the PSU can't pull enough juice from the wall). – Kyle A. yes, I attached the PC to an entirely different outlet on a different circuit and the problem persists. After connecting it to a different outlet after starting the PC it gave me 3 long beeps and 1 short one, then the PC immediately proceeded to boot up normally. Q. Re-check your mainboard manual and all PSU connections to your mainboard to be sure that nothing is missing (e.g. 12V ATX 4-pin/6-pin connector). If you can provoke shutdowns with Prime95, then consider buying new hardware -- a stable system should run Prime95 for 24h without any errors. Prime95 mentions errors in the log when they occur and gives a summary after the stress test was stopped manually (e.g. "0 errors, 0 warnings", if all is fine) – speakr A. Re-checked, there are no more PSU connectors that I can physically connect, except the one ATX 4-pin (there are 2 that power the CPU) that I disconnected on purpose, I have reconnected it but the problem persists. Q. With one PC I had a short curcuit. The power button on the front plate had its cables soldered, but not isolated, and the contacts were very close to the metal case. A heavier touch was enough to cause a shutdown. The PC's vibration could be enough – ott-- A. yes, it seems to switch off with even the lightest touch, I switched on the PC, then pulled out the front panel power cable that connects to the motherboard so the power button does not work anymore, after 5 mins of working like that, with the power button completely disconnected, just sitting idle, the PC powered off again, I don't think it's the power button. Q. I wonder if you dare to operate components without the case, that is remove motherboard, power, disk ( just put the motherboard on a wooden desk). Don't bend the adapters when running like that. – ott-- A. yes, I do dare to do that, but only tomorrow, too tired/late right now.

Read the article
MySQL query, 2 similar servers, 2 minute difference in execution times

- by mr12086

I had a similar question on stack overflow, but it seems to be more server/mysql setup related than coding. The queries below all execute instantly on our development server where as they can take upto 2 minutes 20 seconds. The query execution time seems to be affected by home ambiguous the LIKE string's are. If they closely match a country that has few matches it will take less time, and if you use something like 'ge' for germany - it will take longer to execute. But this doesn't always work out like that, at times its quite erratic. Sending data appears to be the culprit but why and what does that mean. Also memory on production looks to be quite low (free memory)? Production: Intel Quad Xeon E3-1220 3.1GHz 4GB DDR3 2x 1TB SATA in RAID1 Network speed 100Mb Ubuntu Development Intel Core i3-2100, 2C/4T, 3.10GHz 500 GB SATA - No RAID 4GB DDR3 UPDATE 2 : mysqltuner output: [prod] -------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.1.61-0ubuntu0.10.04.1 [OK] Operating on 64-bit architecture -------- Storage Engine Statistics ------------------------------------------- [--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 103M (Tables: 180) [--] Data in InnoDB tables: 491M (Tables: 19) [!!] Total fragmented tables: 38 -------- Security Recommendations ------------------------------------------- [OK] All database users have passwords assigned -------- Performance Metrics ------------------------------------------------- [--] Up for: 77d 4h 6m 1s (53M q [7.968 qps], 14M conn, TX: 87B, RX: 12B) [--] Reads / Writes: 98% / 2% [--] Total buffers: 58.0M global + 2.7M per thread (151 max threads) [OK] Maximum possible memory usage: 463.8M (11% of installed RAM) [OK] Slow queries: 0% (12K/53M) [OK] Highest usage of available connections: 22% (34/151) [OK] Key buffer size / total MyISAM indexes: 16.0M/10.6M [OK] Key buffer hit rate: 98.7% (162M cached / 2M reads) [OK] Query cache efficiency: 20.7% (7M cached / 36M selects) [!!] Query cache prunes per day: 3934 [OK] Sorts requiring temporary tables: 1% (3K temp sorts / 230K sorts) [!!] Joins performed without indexes: 71068 [OK] Temporary tables created on disk: 24% (3M on disk / 13M total) [OK] Thread cache hit rate: 99% (690 created / 14M connections) [!!] Table cache hit rate: 0% (64 open / 85M opened) [OK] Open file limit used: 12% (128/1K) [OK] Table locks acquired immediately: 99% (16M immediate / 16M locks) [!!] InnoDB data size / buffer pool: 491.9M/8.0M -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance Enable the slow query log to troubleshoot bad queries Adjust your join queries to always utilize indexes Increase table_cache gradually to avoid file descriptor limits Variables to adjust: query_cache_size (> 16M) join_buffer_size (> 128.0K, or always use indexes with joins) table_cache (> 64) innodb_buffer_pool_size (>= 491M) [dev] -------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.1.62-0ubuntu0.11.10.1 [!!] Switch to 64-bit OS - MySQL cannot currently use all of your RAM -------- Storage Engine Statistics ------------------------------------------- [--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 185M (Tables: 632) [--] Data in InnoDB tables: 967M (Tables: 38) [!!] Total fragmented tables: 73 -------- Security Recommendations ------------------------------------------- [OK] All database users have passwords assigned -------- Performance Metrics ------------------------------------------------- [--] Up for: 1d 2h 26m 9s (5K q [0.058 qps], 1K conn, TX: 4M, RX: 1M) [--] Reads / Writes: 99% / 1% [--] Total buffers: 58.0M global + 2.7M per thread (151 max threads) [OK] Maximum possible memory usage: 463.8M (11% of installed RAM) [OK] Slow queries: 0% (0/5K) [OK] Highest usage of available connections: 1% (2/151) [OK] Key buffer size / total MyISAM indexes: 16.0M/18.6M [OK] Key buffer hit rate: 99.9% (60K cached / 36 reads) [OK] Query cache efficiency: 44.5% (1K cached / 2K selects) [OK] Query cache prunes per day: 0 [OK] Sorts requiring temporary tables: 0% (0 temp sorts / 44 sorts) [OK] Temporary tables created on disk: 24% (162 on disk / 666 total) [OK] Thread cache hit rate: 99% (2 created / 1K connections) [!!] Table cache hit rate: 1% (64 open / 4K opened) [OK] Open file limit used: 8% (88/1K) [OK] Table locks acquired immediately: 100% (1K immediate / 1K locks) [!!] InnoDB data size / buffer pool: 967.7M/8.0M -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance Enable the slow query log to troubleshoot bad queries Increase table_cache gradually to avoid file descriptor limits Variables to adjust: table_cache (> 64) innodb_buffer_pool_size (>= 967M) UPDATE 1: When testing the queries listed here there is usually no more than one other query taking place, and usually none. Because production is actually handling apache requests that development gets very few of as it's only myself and 1 other who accesses it - could the 4GB of RAM be getting exhausted by using the single machine for both apache and mysql server? Production: sudo hdparm -tT /dev/sda /dev/sda: Timing cached reads: 24872 MB in 2.00 seconds = 12450.72 MB/sec Timing buffered disk reads: 368 MB in 3.00 seconds = 122.49 MB/sec sudo hdparm -tT /dev/sdb /dev/sdb: Timing cached reads: 24786 MB in 2.00 seconds = 12407.22 MB/sec Timing buffered disk reads: 350 MB in 3.00 seconds = 116.53 MB/sec Server version(mysql + ubuntu versions): 5.1.61-0ubuntu0.10.04.1 Development: sudo hdparm -tT /dev/sda /dev/sda: Timing cached reads: 10632 MB in 2.00 seconds = 5319.40 MB/sec Timing buffered disk reads: 400 MB in 3.01 seconds = 132.85 MB/sec Server version(mysql + ubuntu versions): 5.1.62-0ubuntu0.11.10.1 ORIGINAL DATA : This query is NOT the query in question but is related so ill post it. SELECT f.form_question_has_answer_id FROM form_question_has_answer f INNER JOIN project_company_has_user p ON f.form_question_has_answer_user_id = p.project_company_has_user_user_id INNER JOIN company c ON p.project_company_has_user_company_id = c.company_id INNER JOIN project p2 ON p.project_company_has_user_project_id = p2.project_id INNER JOIN user u ON p.project_company_has_user_user_id = u.user_id INNER JOIN form f2 ON p.project_company_has_user_project_id = f2.form_project_id WHERE (f2.form_template_name = 'custom' AND p.project_company_has_user_garbage_collection = 0 AND p.project_company_has_user_project_id = '29') AND (LCASE(c.company_country) LIKE '%ge%' OR LCASE(c.company_country) LIKE '%abcde%') AND f.form_question_has_answer_form_id = '174' And the explain plan for the above query is, run on both dev and production produce the same plan. +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+-------------+ | 1 | SIMPLE | p2 | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index | | 1 | SIMPLE | f | ref | form_question_has_answer_form_id,form_question_has_answer_user_id | form_question_has_answer_form_id | 4 | const | 796 | Using where | | 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | new_klarents.f.form_question_has_answer_user_id | 1 | Using index | | 1 | SIMPLE | p | ref | project_company_has_user_unique_key,project_company_has_user_user_id,project_company_has_user_company_id,project_company_has_user_project_id | project_company_has_user_user_id | 4 | new_klarents.f.form_question_has_answer_user_id | 1 | Using where | | 1 | SIMPLE | f2 | ref | form_project_id | form_project_id | 4 | const | 15 | Using where | | 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 4 | new_klarents.p.project_company_has_user_company_id | 1 | Using where | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+-------------+ This query takes 2 minutes ~20 seconds to execute. The query that is ACTUALLY being run on the server is this one: SELECT COUNT(*) AS num_results FROM (SELECT f.form_question_has_answer_id FROM form_question_has_answer f INNER JOIN project_company_has_user p ON f.form_question_has_answer_user_id = p.project_company_has_user_user_id INNER JOIN company c ON p.project_company_has_user_company_id = c.company_id INNER JOIN project p2 ON p.project_company_has_user_project_id = p2.project_id INNER JOIN user u ON p.project_company_has_user_user_id = u.user_id INNER JOIN form f2 ON p.project_company_has_user_project_id = f2.form_project_id WHERE (f2.form_template_name = 'custom' AND p.project_company_has_user_garbage_collection = 0 AND p.project_company_has_user_project_id = '29') AND (LCASE(c.company_country) LIKE '%ge%' OR LCASE(c.company_country) LIKE '%abcde%') AND f.form_question_has_answer_form_id = '174' GROUP BY f.form_question_has_answer_id;) dctrn_count_query; With explain plans (again same on dev and production): +----+-------------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+------------------------------+ | 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | | 2 | DERIVED | p2 | const | PRIMARY | PRIMARY | 4 | | 1 | Using index | | 2 | DERIVED | f | ref | form_question_has_answer_form_id,form_question_has_answer_user_id | form_question_has_answer_form_id | 4 | | 797 | Using where | | 2 | DERIVED | p | ref | project_company_has_user_unique_key,project_company_has_user_user_id,project_company_has_user_company_id,project_company_has_user_project_id,project_company_has_user_garbage_collection | project_company_has_user_user_id | 4 | new_klarents.f.form_question_has_answer_user_id | 1 | Using where | | 2 | DERIVED | f2 | ref | form_project_id | form_project_id | 4 | | 15 | Using where | | 2 | DERIVED | c | eq_ref | PRIMARY | PRIMARY | 4 | new_klarents.p.project_company_has_user_company_id | 1 | Using where | | 2 | DERIVED | u | eq_ref | PRIMARY | PRIMARY | 4 | new_klarents.p.project_company_has_user_user_id | 1 | Using where; Using index | +----+-------------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+------------------------------+ On the production server the information I have is as follows. Upon execution: +-------------+ | num_results | +-------------+ | 3 | +-------------+ 1 row in set (2 min 14.28 sec) Show profile: +--------------------------------+------------+ | Status | Duration | +--------------------------------+------------+ | starting | 0.000016 | | checking query cache for query | 0.000057 | | Opening tables | 0.004388 | | System lock | 0.000003 | | Table lock | 0.000036 | | init | 0.000030 | | optimizing | 0.000016 | | statistics | 0.000111 | | preparing | 0.000022 | | executing | 0.000004 | | Sorting result | 0.000002 | | Sending data | 136.213836 | | end | 0.000007 | | query end | 0.000002 | | freeing items | 0.004273 | | storing result in query cache | 0.000010 | | logging slow query | 0.000001 | | logging slow query | 0.000002 | | cleaning up | 0.000002 | +--------------------------------+------------+ On development the results are as follows. +-------------+ | num_results | +-------------+ | 3 | +-------------+ 1 row in set (0.08 sec) Again the profile for this query: +--------------------------------+----------+ | Status | Duration | +--------------------------------+----------+ | starting | 0.000022 | | checking query cache for query | 0.000148 | | Opening tables | 0.000025 | | System lock | 0.000008 | | Table lock | 0.000101 | | optimizing | 0.000035 | | statistics | 0.001019 | | preparing | 0.000047 | | executing | 0.000008 | | Sorting result | 0.000005 | | Sending data | 0.086565 | | init | 0.000015 | | optimizing | 0.000006 | | executing | 0.000020 | | end | 0.000004 | | query end | 0.000004 | | freeing items | 0.000028 | | storing result in query cache | 0.000005 | | removing tmp table | 0.000008 | | closing tables | 0.000008 | | logging slow query | 0.000002 | | cleaning up | 0.000005 | +--------------------------------+----------+ If i remove user and/or project innerjoins the query is reduced to 30s. Last bit of information I have: Mysqlserver and Apache are on the same box, there is only one box for production. Production output from top: before & after. top - 15:43:25 up 78 days, 12:11, 4 users, load average: 1.42, 0.99, 0.78 Tasks: 162 total, 2 running, 160 sleeping, 0 stopped, 0 zombie Cpu(s): 0.1%us, 50.4%sy, 0.0%ni, 49.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4037868k total, 3772580k used, 265288k free, 243704k buffers Swap: 3905528k total, 265384k used, 3640144k free, 1207944k cached top - 15:44:31 up 78 days, 12:13, 4 users, load average: 1.94, 1.23, 0.87 Tasks: 160 total, 2 running, 157 sleeping, 0 stopped, 1 zombie Cpu(s): 0.2%us, 50.6%sy, 0.0%ni, 49.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4037868k total, 3834300k used, 203568k free, 243736k buffers Swap: 3905528k total, 265384k used, 3640144k free, 1207804k cached But this isn't a good representation of production's normal status so here is a grab of it from today outside of executing the queries. top - 11:04:58 up 79 days, 7:33, 4 users, load average: 0.39, 0.58, 0.76 Tasks: 156 total, 1 running, 155 sleeping, 0 stopped, 0 zombie Cpu(s): 3.3%us, 2.8%sy, 0.0%ni, 93.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4037868k total, 3676136k used, 361732k free, 271480k buffers Swap: 3905528k total, 268736k used, 3636792k free, 1063432k cached Development: This one doesn't change during or after. top - 15:47:07 up 110 days, 22:11, 7 users, load average: 0.17, 0.07, 0.06 Tasks: 210 total, 2 running, 208 sleeping, 0 stopped, 0 zombie Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4111972k total, 1821100k used, 2290872k free, 238860k buffers Swap: 4183036k total, 66472k used, 4116564k free, 921072k cached

Read the article
Implementation of ZipCrypto / Zip 2.0 encryption in java

- by gomesla

I'm trying o implement the zipcrypto / zip 2.0 encryption algoritm to deal with encrypted zip files as discussed in http://www.pkware.com/documents/casestudies/APPNOTE.TXT I believe I've followed the specs but just can't seem to get it working. I'm fairly sure the issue has to do with my interpretation of the crc algorithm. The documentation states CRC-32: (4 bytes) The CRC-32 algorithm was generously contributed by David Schwaderer and can be found in his excellent book "C Programmers Guide to NetBIOS" published by Howard W. Sams & Co. Inc. The 'magic number' for the CRC is 0xdebb20e3. The proper CRC pre and post conditioning is used, meaning that the CRC register is pre-conditioned with all ones (a starting value of 0xffffffff) and the value is post-conditioned by taking the one's complement of the CRC residual. Here is the snippet that I'm using for the crc32 public class PKZIPCRC32 { private static final int CRC32_POLYNOMIAL = 0xdebb20e3; private int crc = 0xffffffff; private int CRCTable[]; public PKZIPCRC32() { buildCRCTable(); } private void buildCRCTable() { int i, j; CRCTable = new int[256]; for (i = 0; i <= 255; i++) { crc = i; for (j = 8; j > 0; j--) if ((crc & 1) == 1) crc = (crc >>> 1) ^ CRC32_POLYNOMIAL; else crc >>>= 1; CRCTable[i] = crc; } } private int crc32(byte buffer[], int start, int count, int lastcrc) { int temp1, temp2; int i = start; crc = lastcrc; while (count-- != 0) { temp1 = crc >>> 8; temp2 = CRCTable[(crc ^ buffer[i++]) & 0xFF]; crc = temp1 ^ temp2; } return crc; } public int crc32(int crc, byte buffer) { return crc32(new byte[] { buffer }, 0, 1, crc); } } Below is my complete code. Can anyone see what I'm doing wrong. package org.apache.commons.compress.archivers.zip; import java.io.IOException; import java.io.InputStream; public class ZipCryptoInputStream extends InputStream { public class PKZIPCRC32 { private static final int CRC32_POLYNOMIAL = 0xdebb20e3; private int crc = 0xffffffff; private int CRCTable[]; public PKZIPCRC32() { buildCRCTable(); } private void buildCRCTable() { int i, j; CRCTable = new int[256]; for (i = 0; i <= 255; i++) { crc = i; for (j = 8; j > 0; j--) if ((crc & 1) == 1) crc = (crc >>> 1) ^ CRC32_POLYNOMIAL; else crc >>>= 1; CRCTable[i] = crc; } } private int crc32(byte buffer[], int start, int count, int lastcrc) { int temp1, temp2; int i = start; crc = lastcrc; while (count-- != 0) { temp1 = crc >>> 8; temp2 = CRCTable[(crc ^ buffer[i++]) & 0xFF]; crc = temp1 ^ temp2; } return crc; } public int crc32(int crc, byte buffer) { return crc32(new byte[] { buffer }, 0, 1, crc); } } private static final long ENCRYPTION_KEY_1 = 0x12345678; private static final long ENCRYPTION_KEY_2 = 0x23456789; private static final long ENCRYPTION_KEY_3 = 0x34567890; private InputStream baseInputStream = null; private final PKZIPCRC32 checksumEngine = new PKZIPCRC32(); private long[] keys = null; public ZipCryptoInputStream(ZipArchiveEntry zipEntry, InputStream inputStream, String passwd) throws Exception { baseInputStream = inputStream; // Decryption // ---------- // PKZIP encrypts the compressed data stream. Encrypted files must // be decrypted before they can be extracted. // // Each encrypted file has an extra 12 bytes stored at the start of // the data area defining the encryption header for that file. The // encryption header is originally set to random values, and then // itself encrypted, using three, 32-bit keys. The key values are // initialized using the supplied encryption password. After each byte // is encrypted, the keys are then updated using pseudo-random number // generation techniques in combination with the same CRC-32 algorithm // used in PKZIP and described elsewhere in this document. // // The following is the basic steps required to decrypt a file: // // 1) Initialize the three 32-bit keys with the password. // 2) Read and decrypt the 12-byte encryption header, further // initializing the encryption keys. // 3) Read and decrypt the compressed data stream using the // encryption keys. // Step 1 - Initializing the encryption keys // ----------------------------------------- // // Key(0) <- 305419896 // Key(1) <- 591751049 // Key(2) <- 878082192 // // loop for i <- 0 to length(password)-1 // update_keys(password(i)) // end loop // // Where update_keys() is defined as: // // update_keys(char): // Key(0) <- crc32(key(0),char) // Key(1) <- Key(1) + (Key(0) & 000000ffH) // Key(1) <- Key(1) * 134775813 + 1 // Key(2) <- crc32(key(2),key(1) >> 24) // end update_keys // // Where crc32(old_crc,char) is a routine that given a CRC value and a // character, returns an updated CRC value after applying the CRC-32 // algorithm described elsewhere in this document. keys = new long[] { ENCRYPTION_KEY_1, ENCRYPTION_KEY_2, ENCRYPTION_KEY_3 }; for (int i = 0; i < passwd.length(); ++i) { update_keys((byte) passwd.charAt(i)); } // Step 2 - Decrypting the encryption header // ----------------------------------------- // // The purpose of this step is to further initialize the encryption // keys, based on random data, to render a plaintext attack on the // data ineffective. // // Read the 12-byte encryption header into Buffer, in locations // Buffer(0) thru Buffer(11). // // loop for i <- 0 to 11 // C <- buffer(i) ^ decrypt_byte() // update_keys(C) // buffer(i) <- C // end loop // // Where decrypt_byte() is defined as: // // unsigned char decrypt_byte() // local unsigned short temp // temp <- Key(2) | 2 // decrypt_byte <- (temp * (temp ^ 1)) >> 8 // end decrypt_byte // // After the header is decrypted, the last 1 or 2 bytes in Buffer // should be the high-order word/byte of the CRC for the file being // decrypted, stored in Intel low-byte/high-byte order. Versions of // PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is // used on versions after 2.0. This can be used to test if the password // supplied is correct or not. byte[] encryptionHeader = new byte[12]; baseInputStream.read(encryptionHeader); for (int i = 0; i < encryptionHeader.length; i++) { encryptionHeader[i] ^= decrypt_byte(); update_keys(encryptionHeader[i]); } } protected byte decrypt_byte() { byte temp = (byte) (keys[2] | 2); return (byte) ((temp * (temp ^ 1)) >> 8); } @Override public int read() throws IOException { // // Step 3 - Decrypting the compressed data stream // ---------------------------------------------- // // The compressed data stream can be decrypted as follows: // // loop until done // read a character into C // Temp <- C ^ decrypt_byte() // update_keys(temp) // output Temp // end loop int read = baseInputStream.read(); read ^= decrypt_byte(); update_keys((byte) read); return read; } private final void update_keys(byte ch) { keys[0] = checksumEngine.crc32((int) keys[0], ch); keys[1] = keys[1] + (byte) keys[0]; keys[1] = keys[1] * 134775813 + 1; keys[2] = checksumEngine.crc32((int) keys[2], (byte) (keys[1] >> 24)); } }

Read the article
How to stop an IOException error using whilst using a combination of jython, pyro and ant?

- by Kelso

So the wonderful low down on this doozie of a problem: short version: We are building a distribution system for this item of software we're using. Basically we take out build artifact, store it on an ftp server which passes it to multiple clients which execute scripts to patch their servers. Long version: 1 distribution server multiple client servers software: jython 2.5.1, ant 1.8.0, pyro 3.10 The distribution server has an FTP server and a PYRO client running on it. Each client server has a PRYO server running on it. When the PYRO client is told to start the patch procedure then it reads a machine list which contains a list of all the client servers. Then connects to each of the PYRO servers one by one and execute the patch procedure. The procedure is: getPatch (gets the latest patch for that server), StopServer (stops the software that may or maynot be accessing what needs to be patched), Apply patch, StartServer. Each of the processes calls an ANT script that passes with some folder names and other config passes around. The fun part happens when you go to apply the patch. See below for error log. I had to remove the folder names because of NDA reasons. This is where it gets interesting. Running each section of the procedure individually. i.e. running getPatch, StopServer, etc. one at a time manually. This bug doesn't happen. Physically goign to the machine and running the processes it doesn't happen. Only when we call all 4 of the processes one after the other. It occurs during the ApplyPatch phase when an ANT replace script is called on multiple files. We think it might have something to do with the JVM keeping hold of the file for a split second or 2. however this is meant to have been patched according to the bug notes on ant. so in short: distribution server == jython == pyro connection == client server == jython == ant script Error Log: <*snip>\ant\deploy.xml:12: IOException in <*snip>\bin\startGs.sh - java.io.IOException:Failed to delete <*snip>\bin\rep4698373081723114968.tmp while trying to rename it. at org.apache.tools.ant.taskdefs.Replace.processFile(Replace.java:709) at org.apache.tools.ant.taskdefs.Replace.execute(Replace.java:548) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.taskdefs.MacroInstance.execute(MacroInstance.java:398) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.Target.execute(Target.java:390) at org.apache.tools.ant.Target.performTasks(Target.java:411) at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1360) at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38) at org.apache.tools.ant.Project.executeTargets(Project.java:1212) at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:441) at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:302) at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:221) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.Target.execute(Target.java:390) at org.apache.tools.ant.Target.performTasks(Target.java:411) at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1360) at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38) at org.apache.tools.ant.Project.executeTargets(Project.java:1212) at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:441) at org.apache.tools.ant.Extaskdefs.SubAnt.execute(SubAnt.java:302) at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:221) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.taskdefs.MacroInstance.execute(MacroInstance.java:398) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68) at net.sf.antcontrib.logic.IfTask.execute(IfTask.java:197) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.TaskAdapter.execute(TaskAdapter.java:154) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.Target.execute(Target.java:390) at org.apache.tools.ant.Target.performTasks(Target.java:411) at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1360) at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38) at org.apache.tools.ant.Project.executeTargets(Project.java:1212) at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:441) at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:302) at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:221) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) it at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.taskdefs.MacroInstance.execute(MacroInstance.java:398) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.taskdefs.MacroInstance.execute(MacroInstance.java:398) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.taskdefs.Parallel$TaskRunnable.run(Parallel.java:433) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: Failed to delete <*snip\bin\rep4698373081723114968.tmp while trying to rename it. at org.apache.tools.ant.util.FileUtils.rename(FileUtils.java:1248) at org.apache.tools.ant.taskdefs.Replace.processFile(Replace.java:702) ... 125 more Any help would be appreciated.

Read the article
Storage model for various user setting and attributes in database?

- by dvd

I'm currently trying to upgrade a user management system for one web application. This web application is used to provide remote access to various networking equipment for educational purposes. All equipment is assigned to various pods, which users can book for periods of time. The current system is very simple - just 2 user types: administrators and students. All their security and other attributes are mostly hardcoded. I want to change this to something like the following model: user <-- (1..n)profile <--- (1..n) attributes. I.e. user can be assigned several profiles and each profile can have multiple attributes. At runtime all profiles and attributes are merged into single active profile. Some examples of attributes i'm planning to implement: EXPIRATION_DATE - single value, value type: date, specifies when user account will expire; ACCESS_POD - single value, value type: ref to object of Pod class, specifies which pod the user is allowed to book, user profile can have multiple such attributes with different values; TIME_QUOTA - single value, value type: integer, specifies maximum length of time for which student can reserve equipment. CREDIT_CHARGING - multi valued, specifies how much credits will be assigned to user over period of time. (Reservation of devices will cost credits, which will regenerate over time); Security permissions and user preferences can end up as profile or user attributes too: i.e CAN_CREATE_USERS, CAN_POST_NEWS, CAN_EDIT_DEVICES, FONT_SIZE, etc.. This way i could have, for example: students of course A will have profiles STUDENT (with basic attributes) and PROFILE A (wich grants acces to pod A). Students of course B will have profiles: STUDENT, PROFILE B(wich grants to pod B and have increased time quotas). I'm using Spring and Hibernate frameworks for this application and MySQL for database. For this web application i would like to stay within boundaries of these tools. The problem is, that i can't figure out how to best represent all these attributes in database. I also want to create some kind of unified way of retrieveing these attributes and their values. Here is the model i've come up with. Base classes. public abstract class Attribute{ private Long id; Attribute() {} abstract public String getName(); public Long getId() {return id; } void setId(Long id) {this.id = id;} } public abstract class SimpleAttribute extends Attribute{ public abstract Serializable getValue(); abstract void setValue(Serializable s); @Override public boolean equals(Object obj) { ... } @Override public int hashCode() { ... } } Simple attributes can have only one value of any type (including object and enum). Here are more specific attributes: public abstract class IntAttribute extends SimpleAttribute { private Integer value; public Integer getValue() { return value; } void setValue(Integer value) { this.value = value;} void setValue(Serializable s) { setValue((Integer)s); } } public class MaxOrdersAttribute extends IntAttribute { public String getName() { return "Maximum outstanding orders"; } } public final class CreditRateAttribute extends IntAttribute { public String getName() { return "Credit Regeneration Rate"; } } All attributes stored stored using Hibenate variant "table per class hierarchy". Mapping: <class name="ru.mirea.rea.model.abac.Attribute" table="ATTRIBUTES" abstract="true" > <id name="id" column="id"> <generator class="increment" /> </id> <discriminator column="attributeType" type="string"/> <subclass name="ru.mirea.rea.model.abac.SimpleAttribute" abstract="true"> <subclass name="ru.mirea.rea.model.abac.IntAttribute" abstract="true" > <property name="value" column="intVal" type="integer"/> <subclass name="ru.mirea.rea.model.abac.CreditRateAttribute" discriminator-value="CreditRate" /> <subclass name="ru.mirea.rea.model.abac.MaxOrdersAttribute" discriminator-value="MaxOrders" /> </subclass> <subclass name="ru.mirea.rea.model.abac.DateAttribute" abstract="true" > <property name="value" column="dateVal" type="timestamp"/> <subclass name="ru.mirea.rea.model.abac.ExpirationDateAttribute" discriminator-value="ExpirationDate" /> </subclass> <subclass name="ru.mirea.rea.model.abac.PodAttribute" abstract="true" > <many-to-one name="value" column="podVal" class="ru.mirea.rea.model.pods.Pod"/> <subclass name="ru.mirea.rea.model.abac.PodAccessAttribute" discriminator-value="PodAccess" lazy="false"/> </subclass> <subclass name="ru.mirea.rea.model.abac.SecurityPermissionAttribute" discriminator-value="SecurityPermission" lazy="false"> <property name="value" column="spVal" type="ru.mirea.rea.db.hibernate.customTypes.SecurityPermissionType"/> </subclass> </subclass> </class> SecurityPermissionAttribute uses enumeration of various permissions as it's value. Several types of attributes imlement GrantedAuthority interface and can be used with Spring Security for authentication and authorization. Attributes can be created like this: public final class AttributeManager { public <T extends SimpleAttribute> T createSimpleAttribute(Class<T> c, Serializable value) { Session session = HibernateUtil.getCurrentSession(); T att = null; ... att = c.newInstance(); att.setValue(value); session.save(att); session.flush(); ... return att; } public <T extends SimpleAttribute> List<T> findSimpleAttributes(Class<T> c) { List<T> result = new ArrayList<T>(); Session session = HibernateUtil.getCurrentSession(); List<T> temp = session.createCriteria(c).list(); result.addAll(temp); return result; } } And retrieved through User Profiles to which they are assigned. I do not expect that there would be very large amount of rows in the ATTRIBUTES table, but are there any serious drawbacks of such design?

Read the article
Out of memory exception during scrolling of listview

- by user1761316

I am using facebook data like postedpicture,profile picture,name,message in my listview.I am getting an OOM error while doing fast scrolling of listview. I am also having scrollviewlistener in my application that loads more data when the scrollbar reaches the bottom of the screen.I just want to know whether I need to change anything in this class. imageLoader.DisplayImage(postimage.get(position).replace(" ", "%20"), postimg) ; I am using the above line to call the method in this imageloader class to set the bitmap to imageview. Here is my imageloader class import java.io.File; import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.FileOutputStream; import java.io.InputStream; import java.io.OutputStream; import java.net.HttpURLConnection; import java.net.URL; import java.util.Collections; import java.util.Map; import java.util.Stack; import java.util.WeakHashMap; import com.stellent.beerbro.Wall; import android.app.Activity; import android.content.Context; import android.graphics.Bitmap; import android.graphics.Bitmap.Config; import android.graphics.BitmapFactory; import android.graphics.Canvas; import android.graphics.Paint; import android.graphics.PorterDuff.Mode; import android.graphics.PorterDuffXfermode; import android.graphics.Rect; import android.graphics.RectF; import android.graphics.drawable.BitmapDrawable; import android.util.Log; import android.widget.ImageView; public class ImageLoader { MemoryCache memoryCache=new MemoryCache(); FileCache fileCache; private Map<ImageView, String> imageViews=Collections.synchronizedMap(new WeakHashMap<ImageView, String>()); public ImageLoader(Context context){ //Make the background thead low priority. This way it will not affect the UI performance photoLoaderThread.setPriority(Thread.NORM_PRIORITY-1); fileCache=new FileCache(context); } // final int stub_id=R.drawable.stub; public void DisplayImage(String url,ImageView imageView) { imageViews.put(imageView, url); System.gc(); // Bitmap bitmap=createScaledBitmap(memoryCache.get(url), 100,100,0); Bitmap bitmap=memoryCache.get(url); // Bitmap bitmaps=bitmap.createScaledBitmap(bitmap, 0, 100, 100); if(bitmap!=null) { imageView.setBackgroundDrawable(new BitmapDrawable(bitmap)); // imageView.setImageBitmap(getRoundedCornerBitmap( bitmap, 10,70,70)); // imageView.setImageBitmap(bitmap); // Log.v("first", "first"); } else { queuePhoto(url, imageView); // Log.v("second", "second"); } } private Bitmap createScaledBitmap(Bitmap bitmap, int i, int j, int k) { // TODO Auto-generated method stub return null; } private void queuePhoto(String url, ImageView imageView) { //This ImageView may be used for other images before. So there may be some old tasks in the queue. We need to discard them. photosQueue.Clean(imageView); PhotoToLoad p=new PhotoToLoad(url, imageView); synchronized(photosQueue.photosToLoad){ photosQueue.photosToLoad.push(p); photosQueue.photosToLoad.notifyAll(); } //start thread if it's not started yet if(photoLoaderThread.getState()==Thread.State.NEW) photoLoaderThread.start(); } public Bitmap getBitmap(String url) { File f=fileCache.getFile(url); //from SD cache Bitmap b = decodeFile(f); if(b!=null) return b; //from web try { Bitmap bitmap=null; URL imageUrl = new URL(url); HttpURLConnection conn = (HttpURLConnection)imageUrl.openConnection(); conn.setConnectTimeout(30000); conn.setReadTimeout(30000); InputStream is=conn.getInputStream(); OutputStream os = new FileOutputStream(f); Utils.CopyStream(is, os); os.close(); bitmap = decodeFile(f); return bitmap; } catch (Exception ex){ ex.printStackTrace(); return null; } }//Lalit //decodes image and scales it to reduce memory consumption private Bitmap decodeFile(File f){ try { //decode image size BitmapFactory.Options o = new BitmapFactory.Options(); o.inJustDecodeBounds = true; BitmapFactory.decodeStream(new FileInputStream(f),null,o); //Find the correct scale value. It should be the power of 2. final int REQUIRED_SIZE=Wall.width; final int REQUIRED_SIZE1=Wall.height; // final int REQUIRED_SIZE=250; // int width_tmp=o.outWidth, height_tmp=o.outHeight; int scale=1; // while(o.outWidth/scale/2>=REQUIRED_SIZE && o.outHeight/scale/2>=REQUIRED_SIZE) //// scale*=2; while(true){ if(width_tmp/2<REQUIRED_SIZE && height_tmp/2<REQUIRED_SIZE1) break; width_tmp/=2; height_tmp/=2; scale*=2; } //decode with inSampleSize BitmapFactory.Options o2 = new BitmapFactory.Options(); o2.inSampleSize=scale; // o2.inSampleSize=2; return BitmapFactory.decodeStream(new FileInputStream(f), null, o2); } catch (FileNotFoundException e) {} return null; } //Task for the queue private class PhotoToLoad { public String url; public ImageView imageView; public PhotoToLoad(String u, ImageView i){ url=u; imageView=i; } } PhotosQueue photosQueue=new PhotosQueue(); public void stopThread() { photoLoaderThread.interrupt(); } //stores list of photos to download class PhotosQueue { private Stack<PhotoToLoad> photosToLoad=new Stack<PhotoToLoad>(); //removes all instances of this ImageView public void Clean(ImageView image) { for(int j=0 ;j<photosToLoad.size();){ if(photosToLoad.get(j).imageView==image) photosToLoad.remove(j); else ++j; } } } class PhotosLoader extends Thread { public void run() { try { while(true) { //thread waits until there are any images to load in the queue if(photosQueue.photosToLoad.size()==0) synchronized(photosQueue.photosToLoad){ photosQueue.photosToLoad.wait(); } if(photosQueue.photosToLoad.size()!=0) { PhotoToLoad photoToLoad; synchronized(photosQueue.photosToLoad){ photoToLoad=photosQueue.photosToLoad.pop(); } Bitmap bmp=getBitmap(photoToLoad.url); memoryCache.put(photoToLoad.url, bmp); String tag=imageViews.get(photoToLoad.imageView); if(tag!=null && tag.equals(photoToLoad.url)){ BitmapDisplayer bd=new BitmapDisplayer(bmp, photoToLoad.imageView); Activity a=(Activity)photoToLoad.imageView.getContext(); a.runOnUiThread(bd); } } if(Thread.interrupted()) break; } } catch (InterruptedException e) { //allow thread to exit } } } PhotosLoader photoLoaderThread=new PhotosLoader(); //Used to display bitmap in the UI thread class BitmapDisplayer implements Runnable { Bitmap bitmap; ImageView imageView; public BitmapDisplayer(Bitmap b, ImageView i){bitmap=b;imageView=i;} public void run() { if(bitmap!=null) imageView.setBackgroundDrawable(new BitmapDrawable(bitmap)); } } public void clearCache() { memoryCache.clear(); fileCache.clear(); } public static Bitmap getRoundedCornerBitmap(Bitmap bitmap, int pixels,int width,int height) { Bitmap output = Bitmap.createBitmap(width,height, Config.ARGB_8888); Canvas canvas = new Canvas(output); final int color = 0xff424242; final Paint paint = new Paint(); final Rect rect = new Rect(0, 0, bitmap.getWidth(), bitmap.getHeight()); final RectF rectF = new RectF(rect); final float roundPx = pixels; paint.setAntiAlias(true); canvas.drawARGB(0, 0, 0, 0); paint.setColor(color); canvas.drawRoundRect(rectF, roundPx, roundPx, paint); paint.setXfermode(new PorterDuffXfermode(Mode.SRC_IN)); canvas.drawBitmap(bitmap, rect, rect, paint); return output; } }

Read the article
WCF timeout exception detailed investigation

- by Jason Kealey

We have an application that has a WCF service (*.svc) running on IIS7 and various clients querying the service. The server is running Win 2008 Server. The clients are running either Windows 2008 Server or Windows 2003 server. I am getting the following exception, which I have seen can in fact be related to a large number of potential WCF issues. System.TimeoutException: The request channel timed out while waiting for a reply after 00:00:59.9320000. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout. ---> System.TimeoutException: The HTTP request to 'http://www.domain.com/WebServices/myservice.svc/gzip' has exceeded the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. I have increased the timeout to 30min and the error still occurred. This tells me that something else is at play, because the quantity of data could never take 30min to upload or download. The error comes and goes. At the moment, it is more frequent. It does not seem to matter if I have 3 clients running simultaneously or 100, it still occurs once in a while. Most of the time, there are no timeouts but I still get a few per hour. The error comes from any of the methods that are invoked. One of these methods does not have parameters and returns a bit of data. Another takes in lots of data as a parameter but executes asynchronously. The errors always originate from the client and never reference any code on the server in the stack trace. It always ends with: at System.Net.HttpWebRequest.GetResponse() at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout) On the server: I've tried (and currently have) the following binding settings: maxBufferSize="2147483647" maxReceivedMessageSize="2147483647" maxBufferPoolSize="2147483647" It does not seem to have an impact. I've tried (and currently have) the following throttling settings: <serviceThrottling maxConcurrentCalls="1500" maxConcurrentInstances="1500" maxConcurrentSessions="1500"/> It does not seem to have an impact. I currently have the following settings for the WCF service. [ServiceBehavior(InstanceContextMode = InstanceContextMode.Single, ConcurrencyMode = ConcurrencyMode.Single)] I ran with ConcurrencyMode.Multiple for a while, and the error still occurred. I've tried restarting IIS, restarting my underlying SQL Server, restarting the machine. All of these don't seem to have an impact. I've tried disabling the Windows firewall. It does not seem to have an impact. On the client, I have these settings: maxReceivedMessageSize="2147483647" <system.net> <connectionManagement> <add address="*" maxconnection="16"/> </connectionManagement> </system.net> My client closes its connections: var client = new MyClient(); try { return client.GetConfigurationOptions(); } finally { client.Close(); } I have changed the registry settings to allow more outgoing connections: MaxConnectionsPerServer=24, MaxConnectionsPer1_0Server=32. I have now just recently tried SvcTraceViewer.exe. I managed to catch one exception on the client end. I see that its duration is 1 minute. Looking at the server side trace, I can see that the server is not aware of this exception. The maximum duration I can see is 10 seconds. I have looked at active database connections using exec sp_who on the server. I only have a few (2-3). I have looked at TCP connections from one client using TCPview. It usually is around 2-3 and I have seen up to 5 or 6. Simply put, I am stumped. I have tried everything I could find, and must be missing something very simple that a WCF expert would be able to see. It is my gut feeling that something is blocking my clients at the low-level (TCP), before the server actually receives the message and/or that something is queuing the messages at the server level and never letting them process. If you have any performance counters I should look at, please let me know. (please indicate what values are bad, as some of these counters are hard to decypher). Also, how could I log the WCF message size? Finally, are there any tools our there that would allow me to test how many connections I can establish between my client and server (independently from my application) Thanks for your time! Extra information added June 20th: My WCF application does something similar to the following. while (true) { Step1GetConfigurationSettingsFromServerViaWCF(); // can change between calls Step2GetWorkUnitFromServerViaWCF(); DoWorkLocally(); // takes 5-15minutes. Step3SendBackResultsToServerViaWCF(); } Using WireShark, I did see that when the error occurs, I have a five TCP retransmissions followed by a TCP reset later on. My guess is the RST is coming from WCF killing the connection. The exception report I get is from Step3 timing out. I discovered this by looking at the tcp stream "tcp.stream eq 192". I then expanded my filter to "tcp.stream eq 192 and http and http.request.method eq POST" and saw 6 POSTs during this stream. This seemed odd, so I checked with another stream such as tcp.stream eq 100. I had three POSTs, which seems a bit more normal because I am doing three calls. However, I do close my connection after every WCF call, so I would have expected one call per stream (but I don't know much about TCP). Investigating a bit more, I dumped the http packet load to disk to look at what these six calls where. 1) Step3 2) Step1 3) Step2 4) Step3 - corrupted 5) Step1 6) Step2 My guess is two concurrent clients are using the same connection, that is why I saw duplicates. However, I still have a few more issues that I can't comprehend: a) Why is the packet corrupted? Random network fluke - maybe? The load is gzipped using this sample code: http://msdn.microsoft.com/en-us/library/ms751458.aspx - Could the code be buggy once in a while when used concurrently? I should test without the gzip library. b) Why would I see step 1 & step 2 running AFTER the corrupted operation timed out? It seems to me as if these operations should not have occurred. Maybe I am not looking at the right stream because my understanding of TCP is flawed. I have other streams that occur at the same time. I should investigate other streams - a quick glance at streams 190-194 show that the Step3 POST have proper payload data (not corrupted). Pushing me to look at the gzip library again.

Read the article
How do I do event handling in php with html?

- by TheAmazingKnight

I am constructing a simple quiz using html and php. The problem I'm having is that I'm not sure how to do event handlers since it's my first time doing php. Simply, the user click on Take Quiz and it brings up the quiz, then submit the quiz using the same page and show score results. The quiz was written in HTML as shown below: HTML CODE: <section> <h1 id="theme"><span class="initial">Q</span>uiz</h1> <div id="message">Why not go ahead and take the quiz to test your knowledge based on what you've learned in Smartphone Photography. There are only 5 questions surrounding the content of this site. Good Luck! :) <br/> <button type="button" name="name" onclick="takeQuiz()">Take Quiz!</button> </div> </section> </div> <form action="grade.php" method="post" id="quiz">  <h3>1. How many percent of modern camera phones use CMOS?</h3> <div> <input type="radio" name="question-1-answers" id="question-1-answers-A" value="A" /> <label for="question-1-answers-A">A) 20%</label> <br/> <input type="radio" name="question-1-answers" id="question-1-answers-B" value="B" /> <label for="question-1-answers-B">B) 80%</label> <br/> <input type="radio" name="question-1-answers" id="question-1-answers-C" value="C" /> <label for="question-1-answers-C">C) 50%</label> <br/> <input type="radio" name="question-1-answers" id="question-1-answers-D" value="D" /> <label for="question-1-answers-D">D) 90%</label> </div>  <h3>2. Which type of camera setting(s) is best for greater control and flexibility in terms of focusing on a subject?</h3> <div> <input type="radio" name="question-2-answers" id="question-2-answers-A" value="A" /> <label for="question-2-answers-A">A) Manual Focus</label> <br/> <input type="radio" name="question-2-answers" id="question-2-answers-B" value="B" /> <label for="question-2-answers-B">B) Auto Focus</label> <br/> <input type="radio" name="question-2-answers" id="question-2-answers-C" value="C" /> <label for="question-2-answers-C">C) Both A and B</label> <br/> <input type="radio" name="question-2-answers" id="question-2-answers-D" value="D" /> <label for="question-2-answers-D">D) Neither</label> </div>  <h3>3. What are the three properties included in an exposure triangle?</h3> <div> <input type="radio" name="question-3-answers" id="question-3-answers-A" value="A" /> <label for="question-3-answers-A">A) White Balance, ISO, Low Light</label> <br/> <input type="radio" name="question-3-answers" id="question-3-answers-B" value="B" /> <label for="question-3-answers-B">B) Shutter Speed, Exposure, ISO</label> <br/> <input type="radio" name="question-3-answers" id="question-3-answers-C" value="C" /> <label for="question-3-answers-C">C) Aperture, ISO, Exposure</label> <br/> <input type="radio" name="question-3-answers" id="question-3-answers-D" value="D" /> <label for="question-3-answers-D">D) ISO, Aperture, Shutter Speed</label> </div>  <h3>4. The higher the ISO, the more noise it produces in an image.</h3> <div> <input type="radio" name="question-4-answers" id="question-4-answers-A" value="A" /> <label for="question-4-answers-A">A) True</label> <br/> <input type="radio" name="question-4-answers" id="question-4-answers-B" value="B" /> <label for="question-4-answers-B">B) False</label> </div>  <h3>5. What is the name of the smartphone you've seen all over this site?</h3> <div> <input type="radio" name="question-5-answers" id="question-5-answers-A" value="A" /> <label for="question-5-answers-A">A) Nokia Pureview 808</label> <br/> <input type="radio" name="question-5-answers" id="question-5-answers-B" value="B" /> <label for="question-5-answers-B">B) Nokia Lumia 1020</label> <br/> <input type="radio" name="question-5-answers" id="question-5-answers-C" value="C" /> <label for="question-5-answers-C">C) Nokia Lumia 925</label> <br/> <input type="radio" name="question-5-answers" id="question-5-answers-D" value="D" /> <label for="question-5-answers-D">D) Nokia Lumia 920</label> </div> <input type="submit" value="Submit Quiz" /> </form> Then I have a php file named grade.php which will print the results of the quiz grade. PHP CODE: <?php // create variables linking the questions' answers from the form $answer1 = $_POST['question-1-answers']; $answer2 = $_POST['question-2-answers']; $answer3 = $_POST['question-3-answers']; $answer4 = $_POST['question-4-answers']; $answer5 = $_POST['question-5-answers']; $totalCorrect = 0; // Set up if-statements and determine the correct answers to be POSTed if($answer1 == "D") { $totalCorrect++; } if($answer2 == "A") { $totalCorrect++; } if($answer3 == "D") { $totalCorrect++; } if($answer4 == "A") { $totalCorrect++; } if($answer5 == "B") { $totalCorrect++; } //display the results echo "<div id='results'>$totalCorrect / 5 correct</div>"; ?>

Read the article

Search Results

Search found 7216 results on 289 pages for 'low cost'.

Page 287/289 | < Previous Page | 283 284 285 286 287 288 289 | Next Page >

- by Rick Strahl

- by Simon Thorpe

- by FransBouma

- by Ali Bahrami

- by A Student at a University

- by Rick Strahl

- by Michael B. McLaughlin

- by user9154181

- by Ben Rees

- by Ralf Westphal

- by user12620111

- by Chris R

- by SaveTheRbtz

- by John Stumbles

- by Lawrence

- by petermolnar

- by Timo Huovinen

- by mr12086

- by gomesla

- by Kelso

- by dvd

- by user1761316

- by Jason Kealey

- by TheAmazingKnight

< Previous Page | 283 284 285 286 287 288 289 | Next Page >