Search Results

Search found 2736 results on 110 pages for 'semantic meaning'.

Page 110/110 | < Previous Page | 106 107 108 109 110

JBoss https on port other than 8080 not working

- by MilindaD

We have a server with two JBoss instances where one runs on 8080, the other on 8081. We need to have HTTPS enabled for the 8081 server, firstly we tried enabling https on the 8080 port instance by generating the keystore and editing the server.xml and it successfully worked. However when we tried the same thing for 8081 it did not, note that we removed https for the 8080 server first before enabling it for 8081. This is what was used for both server.xml for 8080 and 8081. The only difference was that the port was changed from 8080 to 8081 when trying to enable https for 8081 port instance. What am I doing wrong and what needs to be changed? NOTE : When I meant enabled for 8080 I meant when you visit https:// URL:8484 you will actually be visiting the 8080 port instance. However when ssl is enabled for 8081 and I visit https:// URL:8484 I get that the web page is unavailable. COMMENTLESS VERSION <Server> <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" /> <Listener className="org.apache.catalina.core.JasperListener" /> <Service name="jboss.web">  <Connector port="8080" address="${jboss.bind.address}" maxThreads="350" maxHttpHeaderSize="8192" emptySessionPath="true" protocol="HTTP/1.1" enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" compression="on" ompressableMimeType="text/html,text/css,text/javascript,application/json,text/xml,text/plain,application/x-javascript,application/javascript"/> <Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true" maxThreads="150" scheme="https" secure="true" clientAuth="false" sslProtocol="TLS" address="${jboss.bind.address}" keystoreFile="${jboss.server.home.dir}/conf/supun1.keystore" keystorePass="aaaaaa" truststoreFile="${jboss.server.home.dir}/conf/supun1.keystore" truststorePass="aaaaaa" />  <Connector port="8009" address="${jboss.bind.address}" protocol="AJP/1.3" emptySessionPath="true" enableLookups="false" redirectPort="8443" /> <Engine name="jboss.web" defaultHost="localhost" jvmRoute="khms1"> <Realm className="org.jboss.web.tomcat.security.JBossSecurityMgrRealm" certificatePrincipal="org.jboss.security.auth.certs.SubjectDNMapping" allRolesMode="authOnly" /> <Host name="localhost" autoDeploy="false" deployOnStartup="false" deployXML="false" configClass="org.jboss.web.tomcat.security.config.JBossContextConfig" > <Valve className="org.jboss.web.tomcat.service.sso.ClusteredSingleSignOn" /> <Valve className="org.jboss.web.tomcat.service.jca.CachedConnectionValve" cachedConnectionManagerObjectName="jboss.jca:service=CachedConnectionManager" transactionManagerObjectName="jboss:service=TransactionManager" /> </Host> </Engine> </Service> </Server> WITH COMMENTS VERSION <Server>  <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />  <Listener className="org.apache.catalina.core.JasperListener" />  <Service name="jboss.web">  <Connector port="8080" address="${jboss.bind.address}" maxThreads="350" maxHttpHeaderSize="8192" emptySessionPath="true" protocol="HTTP/1.1" enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" compression="on" ompressableMimeType="text/html,text/css,text/javascript,application/json,text/xml,text/plain,application/x-javascript,application/javascript"/>   <Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true" maxThreads="150" scheme="https" secure="true" clientAuth="false" sslProtocol="TLS" address="${jboss.bind.address}" keystoreFile="${jboss.server.home.dir}/conf/supun1.keystore" keystorePass="aaaaaa" truststoreFile="${jboss.server.home.dir}/conf/supun1.keystore" truststorePass="aaaaaa" />  <Connector port="8009" address="${jboss.bind.address}" protocol="AJP/1.3" emptySessionPath="true" enableLookups="false" redirectPort="8443" /> <Engine name="jboss.web" defaultHost="localhost" jvmRoute="khms1">  <Realm className="org.jboss.web.tomcat.security.JBossSecurityMgrRealm" certificatePrincipal="org.jboss.security.auth.certs.SubjectDNMapping" allRolesMode="authOnly" />  <Host name="localhost" autoDeploy="false" deployOnStartup="false" deployXML="false" configClass="org.jboss.web.tomcat.security.config.JBossContextConfig" >        <Valve className="org.jboss.web.tomcat.service.sso.ClusteredSingleSignOn" />  <Valve className="org.jboss.web.tomcat.service.jca.CachedConnectionValve" cachedConnectionManagerObjectName="jboss.jca:service=CachedConnectionManager" transactionManagerObjectName="jboss:service=TransactionManager" /> </Host> </Engine> </Service> </Server>

Read the article
Tip/Trick: Fix Common SEO Problems Using the URL Rewrite Extension

- by ScottGu

Search engine optimization (SEO) is important for any publically facing web-site. A large % of traffic to sites now comes directly from search engines, and improving your site’s search relevancy will lead to more users visiting your site from search engine queries. This can directly or indirectly increase the money you make through your site. This blog post covers how you can use the free Microsoft URL Rewrite Extension to fix a bunch of common SEO problems that your site might have. It takes less than 15 minutes (and no code changes) to apply 4 simple URL Rewrite rules to your site, and in doing so cause search engines to drive more visitors and traffic to your site. The techniques below work equally well with both ASP.NET Web Forms and ASP.NET MVC based sites. They also works with all versions of ASP.NET (and even work with non-ASP.NET content). [In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu] Measuring the SEO of your website with the Microsoft SEO Toolkit A few months ago I blogged about the free SEO Toolkit that we’ve shipped. This useful tool enables you to automatically crawl/scan your site for SEO correctness, and it then flags any SEO issues it finds. I highly recommend downloading and using the tool against any public site you work on. It makes it easy to spot SEO issues you might have in your site, and pinpoint ways to optimize it further. Below is a simple example of a report I ran against one of my sites (www.scottgu.com) prior to applying the URL Rewrite rules I’ll cover later in this blog post: Search Relevancy and URL Splitting Two of the important things that search engines evaluate when assessing your site’s “search relevancy” are: How many other sites link to your content. Search engines assume that if a lot of people around the web are linking to your content, then it is likely useful and so weight it higher in relevancy. The uniqueness of the content it finds on your site. If search engines find that the content is duplicated in multiple places around the Internet (or on multiple URLs on your site) then it is likely to drop the relevancy of the content. One of the things you want to be very careful to avoid when building public facing sites is to not allow different URLs to retrieve the same content within your site. Doing so will hurt with both of the situations above. In particular, allowing external sites to link to the same content with multiple URLs will cause your link-count and page-ranking to be split up across those different URLs (and so give you a smaller page rank than what it would otherwise be if it was just one URL). Not allowing external sites to link to you in different ways sounds easy in theory – but you might wonder what exactly this means in practice and how you avoid it. 4 Really Common SEO Problems Your Sites Might Have Below are 4 really common scenarios that can cause your site to inadvertently expose multiple URLs for the same content. When this happens external sites linking to yours will end up splitting their page links across multiple URLs - and as a result cause you to have a lower page ranking with search engines than you deserve. SEO Problem #1: Default Document IIS (and other web servers) supports the concept of a “default document”. This allows you to avoid having to explicitly specify the page you want to serve at either the root of the web-site/application, or within a sub-directory. This is convenient – but means that by default this content is available via two different publically exposed URLs (which is bad). For example: http://scottgu.com/ http://scottgu.com/default.aspx SEO Problem #2: Different URL Casings Web developers often don’t realize URLs are case sensitive to search engines on the web. This means that search engines will treat the following links as two completely different URLs: http://scottgu.com/Albums.aspx http://scottgu.com/albums.aspx SEO Problem #3: Trailing Slashes Consider the below two URLs – they might look the same at first, but they are subtly different. The trailing slash creates yet another situation that causes search engines to treat the URLs as different and so split search rankings: http://scottgu.com http://scottgu.com/ SEO Problem #4: Canonical Host Names Sometimes sites support scenarios where they support a web-site with both a leading “www” hostname prefix as well as just the hostname itself. This causes search engines to treat the URLs as different and split search rankling: http://scottgu.com/albums.aspx/ http://www.scottgu.com/albums.aspx/ How to Easily Fix these SEO Problems in 10 minutes (or less) using IIS Rewrite If you haven’t been careful when coding your sites, chances are you are suffering from one (or more) of the above SEO problems. Addressing these issues will improve your search engine relevancy ranking and drive more traffic to your site. The “good news” is that fixing the above 4 issues is really easy using the URL Rewrite Extension. This is a completely free Microsoft extension available for IIS 7.x (on Windows Server 2008, Windows Server 2008 R2, Windows 7 and Windows Vista). The great thing about using the IIS Rewrite extension is that it allows you to fix the above problems *without* having to change any code within your applications. You can easily install the URL Rewrite Extension in under 3 minutes using the Microsoft Web Platform Installer (a free tool we ship that automates setting up web servers and development machines). Just click the green “Install Now” button on the URL Rewrite Spotlight page to install it on your Windows Server 2008, Windows 7 or Windows Vista machine: Once installed you’ll find that a new “URL Rewrite” icon is available within the IIS 7 Admin Tool: Double-clicking the icon will open up the URL Rewrite admin panel – which will display the list of URL Rewrite rules configured for a particular application or site: Notice that our rewrite rule list above is currently empty (which is the default when you first install the extension). We can click the “Add Rule…” link button in the top-right of the panel to add and enable new URL Rewriting logic for our site. Scenario 1: Handling Default Document Scenarios One of the SEO problems I discussed earlier in this post was the scenario where the “default document” feature of IIS causes you to inadvertently expose two URLs for the same content on your site. For example: http://scottgu.com/ http://scottgu.com/default.aspx We can fix this by adding a new IIS Rewrite rule that automatically redirects anyone who navigates to the second URL to instead go to the first one. We will setup the HTTP redirect to be a “permanent redirect” – which will indicate to search engines that they should follow the redirect and use the new URL they are redirected to as the identifier of the content they retrieve. Let’s look at how we can create such a rule. We’ll begin by clicking the “Add Rule” link in the screenshot above. This will cause the below dialog to display: We’ll select the “Blank Rule” template within the “Inbound rules” section to create a new custom URL Rewriting rule. This will display an empty pane like below: Don’t worry – setting up the above rule is easy. The following 4 steps explain how to do so: Step 1: Name the Rule Our first step will be to name the rule we are creating. Naming it with a descriptive name will make it easier to find and understand later. Let’s name this rule our “Default Document URL Rewrite” rule: Step 2: Setup the Regular Expression that Matches this Rule Our second step will be to specify a regular expression filter that will cause this rule to execute when an incoming URL matches the regex pattern. Don’t worry if you aren’t good with regular expressions - I suck at them too. The trick is to know someone who is good at them or copy/paste them from a web-site. Below we are going to specify the following regular expression as our pattern rule: (.*?)/?Default\.aspx$ This pattern will match any URL string that ends with Default.aspx. The "(.*?)" matches any preceding character zero or more times. The "/?" part says to match the slash symbol zero or one times. The "$" symbol at the end will ensure that the pattern will only match strings that end with Default.aspx. Combining all these regex elements allows this rule to work not only for the root of your web site (e.g. http://scottgu.com/default.aspx) but also for any application or subdirectory within the site (e.g. http://scottgu.com/photos/default.aspx. Because the “ignore case” checkbox is selected it will match both “Default.aspx” as well as “default.aspx” within the URL. One nice feature built-into the rule editor is a “Test pattern” button that you can click to bring up a dialog that allows you to test out a few URLs with the rule you are configuring: Above I've added a “products/default.aspx” URL and clicked the “Test” button. This will give me immediate feedback on whether the rule will execute for it. Step 3: Setup a Permanent Redirect Action We’ll then setup an action to occur when our regular expression pattern matches the incoming URL: In the dialog above I’ve changed the “Action Type” drop down to be a “Redirect” action. The “Redirect Type” will be a HTTP 301 Permanent redirect – which means search engines will follow it. I’ve also set the “Redirect URL” property to be: {R:1}/ This indicates that we want to redirect the web client requesting the original URL to a new URL that has the originally requested URL path - minus the "Default.aspx" in it. For example, requests for http://scottgu.com/default.aspx will be redirected to http://scottgu.com/, and requests for http://scottgu.com/photos/default.aspx will be redirected to http://scottgu.com/photos/ The "{R:N}" regex construct, where N >= 0, is called a back-reference and N is the back-reference index. In the case of our pattern "(.*?)/?Default\.aspx$", if the input URL is "products/Default.aspx" then {R:0} will contain "products/Default.aspx" and {R:1} will contain "products". We are going to use this {R:1}/ value to be the URL we redirect users to. Step 4: Apply and Save the Rule Our final step is to click the “Apply” button in the top right hand of the IIS admin tool – which will cause the tool to persist the URL Rewrite rule into our application’s root web.config file (under a <system.webServer/rewrite> configuration section): <configuration> <system.webServer> <rewrite> <rules> <rule name="Default Document" stopProcessing="true"> <match url="(.*?)/?Default\.aspx$" /> <action type="Redirect" url="{R:1}/" /> </rule> </rules> </rewrite> </system.webServer> </configuration> Because IIS 7.x and ASP.NET share the same web.config files, you can actually just copy/paste the above code into your web.config files using Visual Studio and skip the need to run the admin tool entirely. This also makes adding/deploying URL Rewrite rules with your ASP.NET applications really easy. Step 5: Try the Rule Out Now that we’ve saved the rule, let’s try it out on our site. Try the following two URLs on my site: http://scottgu.com/ http://scottgu.com/default.aspx Notice that the second URL automatically redirects to the first one. Because it is a permanent redirect, search engines will follow the URL and should update the page ranking of http://scottgu.com to include links to http://scottgu.com/default.aspx as well. Scenario 2: Different URL Casing Another common SEO problem I discussed earlier in this post is that URLs are case sensitive to search engines on the web. This means that search engines will treat the following links as two completely different URLs: http://scottgu.com/Albums.aspx http://scottgu.com/albums.aspx We can fix this by adding a new IIS Rewrite rule that automatically redirects anyone who navigates to the first URL to instead go to the second (all lower-case) one. Like before, we will setup the HTTP redirect to be a “permanent redirect” – which will indicate to search engines that they should follow the redirect and use the new URL they are redirected to as the identifier of the content they retrieve. To create such a rule we’ll click the “Add Rule” link in the URL Rewrite admin tool again. This will cause the “Add Rule” dialog to appear again: Unlike the previous scenario (where we created a “Blank Rule”), with this scenario we can take advantage of a built-in “Enforce lowercase URLs” rule template. When we click the “ok” button we’ll see the following dialog which asks us if we want to create a rule that enforces the use of lowercase letters in URLs: When we click the “Yes” button we’ll get a pre-written rule that automatically performs a permanent redirect if an incoming URL has upper-case characters in it – and automatically send users to a lower-case version of the URL: We can click the “Apply” button to use this rule “as-is” and have it apply to all incoming URLs to our site. Because my www.scottgu.com site uses ASP.NET Web Forms, I’m going to make one small change to the rule we generated above – which is to add a condition that will ensure that URLs to ASP.NET’s built-in “WebResource.axd” handler are excluded from our case-sensitivity URL Rewrite logic. URLs to the WebResource.axd handler will only come from server-controls emitted from my pages – and will never be linked to from external sites. While my site will continue to function fine if we redirect these URLs to automatically be lower-case – doing so isn’t necessary and will add an extra HTTP redirect to many of my pages. The good news is that adding a condition that prevents my URL Rewriting rule from happening with certain URLs is easy. We simply need to expand the “Conditions” section of the form above We can then click the “Add” button to add a condition clause. This will bring up the “Add Condition” dialog: Above I’ve entered {URL} as the Condition input – and said that this rule should only execute if the URL does not match a regex pattern which contains the string “WebResource.axd”. This will ensure that WebResource.axd URLs to my site will be allowed to execute just fine without having the URL be re-written to be all lower-case. Note: If you have static resources (like references to .jpg, .css, and .js files) within your site that currently use upper-case characters you’ll probably want to add additional condition filter clauses so that URLs to them also don’t get redirected to be lower-case (just add rules for patterns like .jpg, .gif, .js, etc). Your site will continue to work fine if these URLs get redirected to be lower case (meaning the site won’t break) – but it will cause an extra HTTP redirect to happen on your site for URLs that don’t need to be redirected for SEO reasons. So setting up a condition clause makes sense to add. When I click the “ok” button above and apply our lower-case rewriting rule the admin tool will save the following additional rule to our web.config file: <configuration> <system.webServer> <rewrite> <rules> <rule name="Default Document" stopProcessing="true"> <match url="(.*?)/?Default\.aspx$" /> <action type="Redirect" url="{R:1}/" /> </rule> <rule name="Lower Case URLs" stopProcessing="true"> <match url="[A-Z]" ignoreCase="false" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{URL}" pattern="WebResource.axd" negate="true" /> </conditions> <action type="Redirect" url="{ToLower:{URL}}" /> </rule> </rules> </rewrite> </system.webServer> </configuration> Try the Rule Out Now that we’ve saved the rule, let’s try it out on our site. Try the following two URLs on my site: http://scottgu.com/Albums.aspx http://scottgu.com/albums.aspx Notice that the first URL (which has a capital “A”) automatically does a redirect to a lower-case version of the URL. Scenario 3: Trailing Slashes Another common SEO problem I discussed earlier in this post is the scenario of trailing slashes within URLs. The trailing slash creates yet another situation that causes search engines to treat the URLs as different and so split search rankings: http://scottgu.com http://scottgu.com/ We can fix this by adding a new IIS Rewrite rule that automatically redirects anyone who navigates to the first URL (that does not have a trailing slash) to instead go to the second one that does. Like before, we will setup the HTTP redirect to be a “permanent redirect” – which will indicate to search engines that they should follow the redirect and use the new URL they are redirected to as the identifier of the content they retrieve. To create such a rule we’ll click the “Add Rule” link in the URL Rewrite admin tool again. This will cause the “Add Rule” dialog to appear again: The URL Rewrite admin tool has a built-in “Append or remove the trailing slash symbol” rule template. When we select it and click the “ok” button we’ll see the following dialog which asks us if we want to create a rule that automatically redirects users to a URL with a trailing slash if one isn’t present: Like within our previous lower-casing rewrite rule we’ll add one additional condition clause that will exclude WebResource.axd URLs from being processed by this rule. This will avoid an unnecessary redirect for happening for those URLs. When we click the “OK” button we’ll get a pre-written rule that automatically performs a permanent redirect if the URL doesn’t have a trailing slash – and if the URL is not processed by either a directory or a file. This will save the following additional rule to our web.config file: <configuration> <system.webServer> <rewrite> <rules> <rule name="Default Document" stopProcessing="true"> <match url="(.*?)/?Default\.aspx$" /> <action type="Redirect" url="{R:1}/" /> </rule> <rule name="Lower Case URLs" stopProcessing="true"> <match url="[A-Z]" ignoreCase="false" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{URL}" pattern="WebResource.axd" negate="true" /> </conditions> <action type="Redirect" url="{ToLower:{URL}}" /> </rule> <rule name="Trailing Slash" stopProcessing="true"> <match url="(.*[^/])$" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" /> <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" /> <add input="{URL}" pattern="WebResource.axd" negate="true" /> </conditions> <action type="Redirect" url="{R:1}/" /> </rule> </rules> </rewrite> </system.webServer> </configuration> Try the Rule Out Now that we’ve saved the rule, let’s try it out on our site. Try the following two URLs on my site: http://scottgu.com http://scottgu.com/ Notice that the first URL (which has no trailing slash) automatically does a redirect to a URL with the trailing slash. Because it is a permanent redirect, search engines will follow the URL and update the page ranking. Scenario 4: Canonical Host Names The final SEO problem I discussed earlier are scenarios where a site works with both a leading “www” hostname prefix as well as just the hostname itself. This causes search engines to treat the URLs as different and split search rankling: http://www.scottgu.com/albums.aspx http://scottgu.com/albums.aspx We can fix this by adding a new IIS Rewrite rule that automatically redirects anyone who navigates to the first URL (that has a www prefix) to instead go to the second URL. Like before, we will setup the HTTP redirect to be a “permanent redirect” – which will indicate to search engines that they should follow the redirect and use the new URL they are redirected to as the identifier of the content they retrieve. To create such a rule we’ll click the “Add Rule” link in the URL Rewrite admin tool again. This will cause the “Add Rule” dialog to appear again: The URL Rewrite admin tool has a built-in “Canonical domain name” rule template. When we select it and click the “ok” button we’ll see the following dialog which asks us if we want to create a redirect rule that automatically redirects users to a primary host name URL: Above I’m entering the primary URL address I want to expose to the web: scottgu.com. When we click the “OK” button we’ll get a pre-written rule that automatically performs a permanent redirect if the URL has another leading domain name prefix. This will save the following additional rule to our web.config file: <configuration> <system.webServer> <rewrite> <rules> <rule name="Cannonical Hostname"> <match url="(.*)" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{HTTP_HOST}" pattern="^scottgu\.com$" negate="true" /> </conditions> <action type="Redirect" url="http://scottgu.com/{R:1}" /> </rule> <rule name="Default Document" stopProcessing="true"> <match url="(.*?)/?Default\.aspx$" /> <action type="Redirect" url="{R:1}/" /> </rule> <rule name="Lower Case URLs" stopProcessing="true"> <match url="[A-Z]" ignoreCase="false" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{URL}" pattern="WebResource.axd" negate="true" /> </conditions> <action type="Redirect" url="{ToLower:{URL}}" /> </rule> <rule name="Trailing Slash" stopProcessing="true"> <match url="(.*[^/])$" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" /> <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" /> <add input="{URL}" pattern="WebResource.axd" negate="true" /> </conditions> <action type="Redirect" url="{R:1}/" /> </rule> </rules> </rewrite> </system.webServer> </configuration> Try the Rule Out Now that we’ve saved the rule, let’s try it out on our site. Try the following two URLs on my site: http://www.scottgu.com/albums.aspx http://scottgu.com/albums.aspx Notice that the first URL (which has the “www” prefix) now automatically does a redirect to the second URL which does not have the www prefix. Because it is a permanent redirect, search engines will follow the URL and update the page ranking. 4 Simple Rules for Improved SEO The above 4 rules are pretty easy to setup and should take less than 15 minutes to configure on existing sites you already have. The beauty of using a solution like the URL Rewrite Extension is that you can take advantage of it without having to change code within your web-site – and without having to break any existing links already pointing at your site. Users who follow existing links will be automatically redirected to the new URLs you wish to publish. And search engines will start to give your site a higher search relevancy ranking – which will list your site higher in search results and drive more traffic to it. Customizing your URL Rewriting rules further is easy to-do either by editing the web.config file directly, or alternatively, just double click the URL Rewrite icon within the IIS 7.x admin tool and it will list all the active rules for your web-site or application: Clicking any of the rules above will open the rules editor back up and allow you to tweak/customize/save them further. Summary Measuring and improving SEO is something every developer building a public-facing web-site needs to think about and focus on. If you haven’t already, download and use the SEO Toolkit to analyze the SEO of your sites today. New URL Routing features in ASP.NET MVC and ASP.NET Web Forms 4 make it much easier to build applications that have more control over the URLs that are published. Tools like the URL Rewrite Extension that I’ve talked about in this blog post make it much easier to improve the URLs that are published from sites you already have built today – without requiring you to change a lot of code. The URL Rewrite Extension provides a bunch of additional great capabilities – far beyond just SEO - as well. I’ll be covering these additional capabilities more in future blog posts. Hope this helps, Scott

Read the article
A way of doing real-world test-driven development (and some thoughts about it)

- by Thomas Weller

Lately, I exchanged some arguments with Derick Bailey about some details of the red-green-refactor cycle of the Test-driven development process. In short, the issue revolved around the fact that it’s not enough to have a test red or green, but it’s also important to have it red or green for the right reasons. While for me, it’s sufficient to initially have a NotImplementedException in place, Derick argues that this is not totally correct (see these two posts: Red/Green/Refactor, For The Right Reasons and Red For The Right Reason: Fail By Assertion, Not By Anything Else). And he’s right. But on the other hand, I had no idea how his insights could have any practical consequence for my own individual interpretation of the red-green-refactor cycle (which is not really red-green-refactor, at least not in its pure sense, see the rest of this article). This made me think deeply for some days now. In the end I found out that the ‘right reason’ changes in my understanding depending on what development phase I’m in. To make this clear (at least I hope it becomes clear…) I started to describe my way of working in some detail, and then something strange happened: The scope of the article slightly shifted from focusing ‘only’ on the ‘right reason’ issue to something more general, which you might describe as something like 'Doing real-world TDD in .NET , with massive use of third-party add-ins’. This is because I feel that there is a more general statement about Test-driven development to make: It’s high time to speak about the ‘How’ of TDD, not always only the ‘Why’. Much has been said about this, and me myself also contributed to that (see here: TDD is not about testing, it's about how we develop software). But always justifying what you do is very unsatisfying in the long run, it is inherently defensive, and it costs time and effort that could be used for better and more important things. And frankly: I’m somewhat sick and tired of repeating time and again that the test-driven way of software development is highly preferable for many reasons - I don’t want to spent my time exclusively on stating the obvious… So, again, let’s say it clearly: TDD is programming, and programming is TDD. Other ways of programming (code-first, sometimes called cowboy-coding) are exceptional and need justification. – I know that there are many people out there who will disagree with this radical statement, and I also know that it’s not a description of the real world but more of a mission statement or something. But nevertheless I’m absolutely sure that in some years this statement will be nothing but a platitude. Side note: Some parts of this post read as if I were paid by Jetbrains (the manufacturer of the ReSharper add-in – R#), but I swear I’m not. Rather I think that Visual Studio is just not production-complete without it, and I wouldn’t even consider to do professional work without having this add-in installed... The three parts of a software component Before I go into some details, I first should describe my understanding of what belongs to a software component (assembly, type, or method) during the production process (i.e. the coding phase). Roughly, I come up with the three parts shown below: First, we need to have some initial sort of requirement. This can be a multi-page formal document, a vague idea in some programmer’s brain of what might be needed, or anything in between. In either way, there has to be some sort of requirement, be it explicit or not. – At the C# micro-level, the best way that I found to formulate that is to define interfaces for just about everything, even for internal classes, and to provide them with exhaustive xml comments. The next step then is to re-formulate these requirements in an executable form. This is specific to the respective programming language. - For C#/.NET, the Gallio framework (which includes MbUnit) in conjunction with the ReSharper add-in for Visual Studio is my toolset of choice. The third part then finally is the production code itself. It’s development is entirely driven by the requirements and their executable formulation. This is the delivery, the two other parts are ‘only’ there to make its production possible, to give it a decent quality and reliability, and to significantly reduce related costs down the maintenance timeline. So while the first two parts are not really relevant for the customer, they are very important for the developer. The customer (or in Scrum terms: the Product Owner) is not interested at all in how the product is developed, he is only interested in the fact that it is developed as cost-effective as possible, and that it meets his functional and non-functional requirements. The rest is solely a matter of the developer’s craftsmanship, and this is what I want to talk about during the remainder of this article… An example To demonstrate my way of doing real-world TDD, I decided to show the development of a (very) simple Calculator component. The example is deliberately trivial and silly, as examples always are. I am totally aware of the fact that real life is never that simple, but I only want to show some development principles here… The requirement As already said above, I start with writing down some words on the initial requirement, and I normally use interfaces for that, even for internal classes - the typical question “intf or not” doesn’t even come to mind. I need them for my usual workflow and using them automatically produces high componentized and testable code anyway. To think about their usage in every single situation would slow down the production process unnecessarily. So this is what I begin with: namespace Calculator { /// <summary> /// Defines a very simple calculator component for demo purposes. /// </summary> public interface ICalculator { /// <summary> /// Gets the result of the last successful operation. /// </summary> /// <value>The last result.</value> /// <remarks> /// Will be <see langword="null" /> before the first successful operation. /// </remarks> double? LastResult { get; } } // interface ICalculator } // namespace Calculator So, I’m not beginning with a test, but with a sort of code declaration - and still I insist on being 100% test-driven. There are three important things here: Starting this way gives me a method signature, which allows to use IntelliSense and AutoCompletion and thus eliminates the danger of typos - one of the most regular, annoying, time-consuming, and therefore expensive sources of error in the development process. In my understanding, the interface definition as a whole is more of a readable requirement document and technical documentation than anything else. So this is at least as much about documentation than about coding. The documentation must completely describe the behavior of the documented element. I normally use an IoC container or some sort of self-written provider-like model in my architecture. In either case, I need my components defined via service interfaces anyway. - I will use the LinFu IoC framework here, for no other reason as that is is very simple to use. The ‘Red’ (pt. 1) First I create a folder for the project’s third-party libraries and put the LinFu.Core dll there. Then I set up a test project (via a Gallio project template), and add references to the Calculator project and the LinFu dll. Finally I’m ready to write the first test, which will look like the following: namespace Calculator.Test { [TestFixture] public class CalculatorTest { private readonly ServiceContainer container = new ServiceContainer(); [Test] public void CalculatorLastResultIsInitiallyNull() { ICalculator calculator = container.GetService<ICalculator>(); Assert.IsNull(calculator.LastResult); } } // class CalculatorTest } // namespace Calculator.Test This is basically the executable formulation of what the interface definition states (part of). Side note: There’s one principle of TDD that is just plain wrong in my eyes: I’m talking about the Red is 'does not compile' thing. How could a compiler error ever be interpreted as a valid test outcome? I never understood that, it just makes no sense to me. (Or, in Derick’s terms: this reason is as wrong as a reason ever could be…) A compiler error tells me: Your code is incorrect, but nothing more. Instead, the ‘Red’ part of the red-green-refactor cycle has a clearly defined meaning to me: It means that the test works as intended and fails only if its assumptions are not met for some reason. Back to our Calculator. When I execute the above test with R#, the Gallio plugin will give me this output: So this tells me that the test is red for the wrong reason: There’s no implementation that the IoC-container could load, of course. So let’s fix that. With R#, this is very easy: First, create an ICalculator - derived type: Next, implement the interface members: And finally, move the new class to its own file: So far my ‘work’ was six mouse clicks long, the only thing that’s left to do manually here, is to add the Ioc-specific wiring-declaration and also to make the respective class non-public, which I regularly do to force my components to communicate exclusively via interfaces: This is what my Calculator class looks like as of now: using System; using LinFu.IoC.Configuration; namespace Calculator { [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { public double? LastResult { get { throw new NotImplementedException(); } } } } Back to the test fixture, we have to put our IoC container to work: [TestFixture] public class CalculatorTest { #region Fields private readonly ServiceContainer container = new ServiceContainer(); #endregion // Fields #region Setup/TearDown [FixtureSetUp] public void FixtureSetUp() { container.LoadFrom(AppDomain.CurrentDomain.BaseDirectory, "Calculator.dll"); } ... Because I have a R# live template defined for the setup/teardown method skeleton as well, the only manual coding here again is the IoC-specific stuff: two lines, not more… The ‘Red’ (pt. 2) Now, the execution of the above test gives the following result: This time, the test outcome tells me that the method under test is called. And this is the point, where Derick and I seem to have somewhat different views on the subject: Of course, the test still is worthless regarding the red/green outcome (or: it’s still red for the wrong reasons, in that it gives a false negative). But as far as I am concerned, I’m not really interested in the test outcome at this point of the red-green-refactor cycle. Rather, I only want to assert that my test actually calls the right method. If that’s the case, I will happily go on to the ‘Green’ part… The ‘Green’ Making the test green is quite trivial. Just make LastResult an automatic property: [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { public double? LastResult { get; private set; } } One more round… Now on to something slightly more demanding (cough…). Let’s state that our Calculator exposes an Add() method: ... /// <summary> /// Adds the specified operands. /// </summary> /// <param name="operand1">The operand1.</param> /// <param name="operand2">The operand2.</param> /// <returns>The result of the additon.</returns> /// <exception cref="ArgumentException"> /// Argument <paramref name="operand1"/> is < 0.<br/> /// -- or --<br/> /// Argument <paramref name="operand2"/> is < 0. /// </exception> double Add(double operand1, double operand2); } // interface ICalculator A remark: I sometimes hear the complaint that xml comment stuff like the above is hard to read. That’s certainly true, but irrelevant to me, because I read xml code comments with the CR_Documentor tool window. And using that, it looks like this: Apart from that, I’m heavily using xml code comments (see e.g. here for a detailed guide) because there is the possibility of automating help generation with nightly CI builds (using MS Sandcastle and the Sandcastle Help File Builder), and then publishing the results to some intranet location. This way, a team always has first class, up-to-date technical documentation at hand about the current codebase. (And, also very important for speeding up things and avoiding typos: You have IntelliSense/AutoCompletion and R# support, and the comments are subject to compiler checking…). Back to our Calculator again: Two more R# – clicks implement the Add() skeleton: ... public double Add(double operand1, double operand2) { throw new NotImplementedException(); } } // class Calculator As we have stated in the interface definition (which actually serves as our requirement document!), the operands are not allowed to be negative. So let’s start implementing that. Here’s the test: [Test] [Row(-0.5, 2)] public void AddThrowsOnNegativeOperands(double operand1, double operand2) { ICalculator calculator = container.GetService<ICalculator>(); Assert.Throws<ArgumentException>(() => calculator.Add(operand1, operand2)); } As you can see, I’m using a data-driven unit test method here, mainly for these two reasons: Because I know that I will have to do the same test for the second operand in a few seconds, I save myself from implementing another test method for this purpose. Rather, I only will have to add another Row attribute to the existing one. From the test report below, you can see that the argument values are explicitly printed out. This can be a valuable documentation feature even when everything is green: One can quickly review what values were tested exactly - the complete Gallio HTML-report (as it will be produced by the Continuous Integration runs) shows these values in a quite clear format (see below for an example). Back to our Calculator development again, this is what the test result tells us at the moment: So we’re red again, because there is not yet an implementation… Next we go on and implement the necessary parameter verification to become green again, and then we do the same thing for the second operand. To make a long story short, here’s the test and the method implementation at the end of the second cycle: // in CalculatorTest: [Test] [Row(-0.5, 2)] [Row(295, -123)] public void AddThrowsOnNegativeOperands(double operand1, double operand2) { ICalculator calculator = container.GetService<ICalculator>(); Assert.Throws<ArgumentException>(() => calculator.Add(operand1, operand2)); } // in Calculator: public double Add(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } throw new NotImplementedException(); } So far, we have sheltered our method from unwanted input, and now we can safely operate on the parameters without further caring about their validity (this is my interpretation of the Fail Fast principle, which is regarded here in more detail). Now we can think about the method’s successful outcomes. First let’s write another test for that: [Test] [Row(1, 1, 2)] public void TestAdd(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Add(operand1, operand2); Assert.AreEqual(expectedResult, result); } Again, I’m regularly using row based test methods for these kinds of unit tests. The above shown pattern proved to be extremely helpful for my development work, I call it the Defined-Input/Expected-Output test idiom: You define your input arguments together with the expected method result. There are two major benefits from that way of testing: In the course of refining a method, it’s very likely to come up with additional test cases. In our case, we might add tests for some edge cases like ‘one of the operands is zero’ or ‘the sum of the two operands causes an overflow’, or maybe there’s an external test protocol that has to be fulfilled (e.g. an ISO norm for medical software), and this results in the need of testing against additional values. In all these scenarios we only have to add another Row attribute to the test. Remember that the argument values are written to the test report, so as a side-effect this produces valuable documentation. (This can become especially important if the fulfillment of some sort of external requirements has to be proven). So your test method might look something like that in the end: [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 2)] [Row(0, 999999999, 999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, double.MaxValue)] [Row(4, double.MaxValue - 2.5, double.MaxValue)] public void TestAdd(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Add(operand1, operand2); Assert.AreEqual(expectedResult, result); } And this will produce the following HTML report (with Gallio): Not bad for the amount of work we invested in it, huh? - There might be scenarios where reports like that can be useful for demonstration purposes during a Scrum sprint review… The last requirement to fulfill is that the LastResult property is expected to store the result of the last operation. I don’t show this here, it’s trivial enough and brings nothing new… And finally: Refactor (for the right reasons) To demonstrate my way of going through the refactoring portion of the red-green-refactor cycle, I added another method to our Calculator component, namely Subtract(). Here’s the code (tests and production): // CalculatorTest.cs: [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 0)] [Row(0, 999999999, -999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, -double.MaxValue)] [Row(4, double.MaxValue - 2.5, -double.MaxValue)] public void TestSubtract(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Subtract(operand1, operand2); Assert.AreEqual(expectedResult, result); } [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 0)] [Row(0, 999999999, -999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, -double.MaxValue)] [Row(4, double.MaxValue - 2.5, -double.MaxValue)] public void TestSubtractGivesExpectedLastResult(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); calculator.Subtract(operand1, operand2); Assert.AreEqual(expectedResult, calculator.LastResult); } ... // ICalculator.cs: /// <summary> /// Subtracts the specified operands. /// </summary> /// <param name="operand1">The operand1.</param> /// <param name="operand2">The operand2.</param> /// <returns>The result of the subtraction.</returns> /// <exception cref="ArgumentException"> /// Argument <paramref name="operand1"/> is < 0.<br/> /// -- or --<br/> /// Argument <paramref name="operand2"/> is < 0. /// </exception> double Subtract(double operand1, double operand2); ... // Calculator.cs: public double Subtract(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } return (this.LastResult = operand1 - operand2).Value; } Obviously, the argument validation stuff that was produced during the red-green part of our cycle duplicates the code from the previous Add() method. So, to avoid code duplication and minimize the number of code lines of the production code, we do an Extract Method refactoring. One more time, this is only a matter of a few mouse clicks (and giving the new method a name) with R#: Having done that, our production code finally looks like that: using System; using LinFu.IoC.Configuration; namespace Calculator { [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { #region ICalculator public double? LastResult { get; private set; } public double Add(double operand1, double operand2) { ThrowIfOneOperandIsInvalid(operand1, operand2); return (this.LastResult = operand1 + operand2).Value; } public double Subtract(double operand1, double operand2) { ThrowIfOneOperandIsInvalid(operand1, operand2); return (this.LastResult = operand1 - operand2).Value; } #endregion // ICalculator #region Implementation (Helper) private static void ThrowIfOneOperandIsInvalid(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } } #endregion // Implementation (Helper) } // class Calculator } // namespace Calculator But is the above worth the effort at all? It’s obviously trivial and not very impressive. All our tests were green (for the right reasons), and refactoring the code did not change anything. It’s not immediately clear how this refactoring work adds value to the project. Derick puts it like this: STOP! Hold on a second… before you go any further and before you even think about refactoring what you just wrote to make your test pass, you need to understand something: if your done with your requirements after making the test green, you are not required to refactor the code. I know… I’m speaking heresy, here. Toss me to the wolves, I’ve gone over to the dark side! Seriously, though… if your test is passing for the right reasons, and you do not need to write any test or any more code for you class at this point, what value does refactoring add? Derick immediately answers his own question: So why should you follow the refactor portion of red/green/refactor? When you have added code that makes the system less readable, less understandable, less expressive of the domain or concern’s intentions, less architecturally sound, less DRY, etc, then you should refactor it. I couldn’t state it more precise. From my personal perspective, I’d add the following: You have to keep in mind that real-world software systems are usually quite large and there are dozens or even hundreds of occasions where micro-refactorings like the above can be applied. It’s the sum of them all that counts. And to have a good overall quality of the system (e.g. in terms of the Code Duplication Percentage metric) you have to be pedantic on the individual, seemingly trivial cases. My job regularly requires the reading and understanding of ‘foreign’ code. So code quality/readability really makes a HUGE difference for me – sometimes it can be even the difference between project success and failure… Conclusions The above described development process emerged over the years, and there were mainly two things that guided its evolution (you might call it eternal principles, personal beliefs, or anything in between): Test-driven development is the normal, natural way of writing software, code-first is exceptional. So ‘doing TDD or not’ is not a question. And good, stable code can only reliably be produced by doing TDD (yes, I know: many will strongly disagree here again, but I’ve never seen high-quality code – and high-quality code is code that stood the test of time and causes low maintenance costs – that was produced code-first…) It’s the production code that pays our bills in the end. (Though I have seen customers these days who demand an acceptance test battery as part of the final delivery. Things seem to go into the right direction…). The test code serves ‘only’ to make the production code work. But it’s the number of delivered features which solely counts at the end of the day - no matter how much test code you wrote or how good it is. With these two things in mind, I tried to optimize my coding process for coding speed – or, in business terms: productivity - without sacrificing the principles of TDD (more than I’d do either way…). As a result, I consider a ratio of about 3-5/1 for test code vs. production code as normal and desirable. In other words: roughly 60-80% of my code is test code (This might sound heavy, but that is mainly due to the fact that software development standards only begin to evolve. The entire software development profession is very young, historically seen; only at the very beginning, and there are no viable standards yet. If you think about software development as a kind of casting process, where the test code is the mold and the resulting production code is the final product, then the above ratio sounds no longer extraordinary…) Although the above might look like very much unnecessary work at first sight, it’s not. With the aid of the mentioned add-ins, doing all the above is a matter of minutes, sometimes seconds (while writing this post took hours and days…). The most important thing is to have the right tools at hand. Slow developer machines or the lack of a tool or something like that - for ‘saving’ a few 100 bucks - is just not acceptable and a very bad decision in business terms (though I quite some times have seen and heard that…). Production of high-quality products needs the usage of high-quality tools. This is a platitude that every craftsman knows… The here described round-trip will take me about five to ten minutes in my real-world development practice. I guess it’s about 30% more time compared to developing the ‘traditional’ (code-first) way. But the so manufactured ‘product’ is of much higher quality and massively reduces maintenance costs, which is by far the single biggest cost factor, as I showed in this previous post: It's the maintenance, stupid! (or: Something is rotten in developerland.). In the end, this is a highly cost-effective way of software development… But on the other hand, there clearly is a trade-off here: coding speed vs. code quality/later maintenance costs. The here described development method might be a perfect fit for the overwhelming majority of software projects, but there certainly are some scenarios where it’s not - e.g. if time-to-market is crucial for a software project. So this is a business decision in the end. It’s just that you have to know what you’re doing and what consequences this might have… Some last words First, I’d like to thank Derick Bailey again. His two aforementioned posts (which I strongly recommend for reading) inspired me to think deeply about my own personal way of doing TDD and to clarify my thoughts about it. I wouldn’t have done that without this inspiration. I really enjoy that kind of discussions… I agree with him in all respects. But I don’t know (yet?) how to bring his insights into the described production process without slowing things down. The above described method proved to be very “good enough” in my practical experience. But of course, I’m open to suggestions here… My rationale for now is: If the test is initially red during the red-green-refactor cycle, the ‘right reason’ is: it actually calls the right method, but this method is not yet operational. Later on, when the cycle is finished and the tests become part of the regular, automated Continuous Integration process, ‘red’ certainly must occur for the ‘right reason’: in this phase, ‘red’ MUST mean nothing but an unfulfilled assertion - Fail By Assertion, Not By Anything Else!

Read the article
value types in the vm

- by john.rose

value types in the vm p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} p.p2 {margin: 0.0px 0.0px 14.0px 0.0px; font: 14.0px Times} p.p3 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times} p.p4 {margin: 0.0px 0.0px 15.0px 0.0px; font: 14.0px Times} p.p5 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier} p.p6 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px} p.p7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p8 {margin: 0.0px 0.0px 0.0px 36.0px; text-indent: -36.0px; font: 14.0px Times; min-height: 18.0px} p.p9 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p10 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; color: #000000} li.li1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} li.li7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} span.s1 {font: 14.0px Courier} span.s2 {color: #000000} span.s3 {font: 14.0px Courier; color: #000000} ol.ol1 {list-style-type: decimal} Or, enduring values for a changing world. Introduction A value type is a data type which, generally speaking, is designed for being passed by value in and out of methods, and stored by value in data structures. The only value types which the Java language directly supports are the eight primitive types. Java indirectly and approximately supports value types, if they are implemented in terms of classes. For example, both Integer and String may be viewed as value types, especially if their usage is restricted to avoid operations appropriate to Object. In this note, we propose a definition of value types in terms of a design pattern for Java classes, accompanied by a set of usage restrictions. We also sketch the relation of such value types to tuple types (which are a JVM-level notion), and point out JVM optimizations that can apply to value types. This note is a thought experiment to extend the JVM’s performance model in support of value types. The demonstration has two phases. Initially the extension can simply use design patterns, within the current bytecode architecture, and in today’s Java language. But if the performance model is to be realized in practice, it will probably require new JVM bytecode features, changes to the Java language, or both. We will look at a few possibilities for these new features. An Axiom of Value In the context of the JVM, a value type is a data type equipped with construction, assignment, and equality operations, and a set of typed components, such that, whenever two variables of the value type produce equal corresponding values for their components, the values of the two variables cannot be distinguished by any JVM operation. Here are some corollaries: A value type is immutable, since otherwise a copy could be constructed and the original could be modified in one of its components, allowing the copies to be distinguished. Changing the component of a value type requires construction of a new value. The equals and hashCode operations are strictly component-wise. If a value type is represented by a JVM reference, that reference cannot be successfully synchronized on, and cannot be usefully compared for reference equality. A value type can be viewed in terms of what it doesn’t do. We can say that a value type omits all value-unsafe operations, which could violate the constraints on value types. These operations, which are ordinarily allowed for Java object types, are pointer equality comparison (the acmp instruction), synchronization (the monitor instructions), all the wait and notify methods of class Object, and non-trivial finalize methods. The clone method is also value-unsafe, although for value types it could be treated as the identity function. Finally, and most importantly, any side effect on an object (however visible) also counts as an value-unsafe operation. A value type may have methods, but such methods must not change the components of the value. It is reasonable and useful to define methods like toString, equals, and hashCode on value types, and also methods which are specifically valuable to users of the value type. Representations of Value Value types have two natural representations in the JVM, unboxed and boxed. An unboxed value consists of the components, as simple variables. For example, the complex number x=(1+2i), in rectangular coordinate form, may be represented in unboxed form by the following pair of variables: /*Complex x = Complex.valueOf(1.0, 2.0):*/ double x_re = 1.0, x_im = 2.0; These variables might be locals, parameters, or fields. Their association as components of a single value is not defined to the JVM. Here is a sample computation which computes the norm of the difference between two complex numbers: double distance(/*Complex x:*/ double x_re, double x_im, /*Complex y:*/ double y_re, double y_im) { /*Complex z = x.minus(y):*/ double z_re = x_re - y_re, z_im = x_im - y_im; /*return z.abs():*/ return Math.sqrt(z_re*z_re + z_im*z_im); } A boxed representation groups component values under a single object reference. The reference is to a ‘wrapper class’ that carries the component values in its fields. (A primitive type can naturally be equated with a trivial value type with just one component of that type. In that view, the wrapper class Integer can serve as a boxed representation of value type int.) The unboxed representation of complex numbers is practical for many uses, but it fails to cover several major use cases: return values, array elements, and generic APIs. The two components of a complex number cannot be directly returned from a Java function, since Java does not support multiple return values. The same story applies to array elements: Java has no ’array of structs’ feature. (Double-length arrays are a possible workaround for complex numbers, but not for value types with heterogeneous components.) By generic APIs I mean both those which use generic types, like Arrays.asList and those which have special case support for primitive types, like String.valueOf and PrintStream.println. Those APIs do not support unboxed values, and offer some problems to boxed values. Any ’real’ JVM type should have a story for returns, arrays, and API interoperability. The basic problem here is that value types fall between primitive types and object types. Value types are clearly more complex than primitive types, and object types are slightly too complicated. Objects are a little bit dangerous to use as value carriers, since object references can be compared for pointer equality, and can be synchronized on. Also, as many Java programmers have observed, there is often a performance cost to using wrapper objects, even on modern JVMs. Even so, wrapper classes are a good starting point for talking about value types. If there were a set of structural rules and restrictions which would prevent value-unsafe operations on value types, wrapper classes would provide a good notation for defining value types. This note attempts to define such rules and restrictions. Let’s Start Coding Now it is time to look at some real code. Here is a definition, written in Java, of a complex number value type. @ValueSafe public final class Complex implements java.io.Serializable { // immutable component structure: public final double re, im; private Complex(double re, double im) { this.re = re; this.im = im; } // interoperability methods: public String toString() { return "Complex("+re+","+im+")"; } public List<Double> asList() { return Arrays.asList(re, im); } public boolean equals(Complex c) { return re == c.re && im == c.im; } public boolean equals(@ValueSafe Object x) { return x instanceof Complex && equals((Complex) x); } public int hashCode() { return 31*Double.valueOf(re).hashCode() + Double.valueOf(im).hashCode(); } // factory methods: public static Complex valueOf(double re, double im) { return new Complex(re, im); } public Complex changeRe(double re2) { return valueOf(re2, im); } public Complex changeIm(double im2) { return valueOf(re, im2); } public static Complex cast(@ValueSafe Object x) { return x == null ? ZERO : (Complex) x; } // utility methods and constants: public Complex plus(Complex c) { return new Complex(re+c.re, im+c.im); } public Complex minus(Complex c) { return new Complex(re-c.re, im-c.im); } public double abs() { return Math.sqrt(re*re + im*im); } public static final Complex PI = valueOf(Math.PI, 0.0); public static final Complex ZERO = valueOf(0.0, 0.0); } This is not a minimal definition, because it includes some utility methods and other optional parts. The essential elements are as follows: The class is marked as a value type with an annotation. The class is final, because it does not make sense to create subclasses of value types. The fields of the class are all non-private and final. (I.e., the type is immutable and structurally transparent.) From the supertype Object, all public non-final methods are overridden. The constructor is private. Beyond these bare essentials, we can observe the following features in this example, which are likely to be typical of all value types: One or more factory methods are responsible for value creation, including a component-wise valueOf method. There are utility methods for complex arithmetic and instance creation, such as plus and changeIm. There are static utility constants, such as PI. The type is serializable, using the default mechanisms. There are methods for converting to and from dynamically typed references, such as asList and cast. The Rules In order to use value types properly, the programmer must avoid value-unsafe operations. A helpful Java compiler should issue errors (or at least warnings) for code which provably applies value-unsafe operations, and should issue warnings for code which might be correct but does not provably avoid value-unsafe operations. No such compilers exist today, but to simplify our account here, we will pretend that they do exist. A value-safe type is any class, interface, or type parameter marked with the @ValueSafe annotation, or any subtype of a value-safe type. If a value-safe class is marked final, it is in fact a value type. All other value-safe classes must be abstract. The non-static fields of a value class must be non-public and final, and all its constructors must be private. Under the above rules, a standard interface could be helpful to define value types like Complex. Here is an example: @ValueSafe public interface ValueType extends java.io.Serializable { // All methods listed here must get redefined. // Definitions must be value-safe, which means // they may depend on component values only. List<? extends Object> asList(); int hashCode(); boolean equals(@ValueSafe Object c); String toString(); } //@ValueSafe inherited from supertype: public final class Complex implements ValueType { … The main advantage of such a conventional interface is that (unlike an annotation) it is reified in the runtime type system. It could appear as an element type or parameter bound, for facilities which are designed to work on value types only. More broadly, it might assist the JVM to perform dynamic enforcement of the rules for value types. Besides types, the annotation @ValueSafe can mark fields, parameters, local variables, and methods. (This is redundant when the type is also value-safe, but may be useful when the type is Object or another supertype of a value type.) Working forward from these annotations, an expression E is defined as value-safe if it satisfies one or more of the following: The type of E is a value-safe type. E names a field, parameter, or local variable whose declaration is marked @ValueSafe. E is a call to a method whose declaration is marked @ValueSafe. E is an assignment to a value-safe variable, field reference, or array reference. E is a cast to a value-safe type from a value-safe expression. E is a conditional expression E0 ? E1 : E2, and both E1 and E2 are value-safe. Assignments to value-safe expressions and initializations of value-safe names must take their values from value-safe expressions. A value-safe expression may not be the subject of a value-unsafe operation. In particular, it cannot be synchronized on, nor can it be compared with the “==” operator, not even with a null or with another value-safe type. In a program where all of these rules are followed, no value-type value will be subject to a value-unsafe operation. Thus, the prime axiom of value types will be satisfied, that no two value type will be distinguishable as long as their component values are equal. More Code To illustrate these rules, here are some usage examples for Complex: Complex pi = Complex.valueOf(Math.PI, 0); Complex zero = pi.changeRe(0); //zero = pi; zero.re = 0; ValueType vtype = pi; @SuppressWarnings("value-unsafe") Object obj = pi; @ValueSafe Object obj2 = pi; obj2 = new Object(); // ok List<Complex> clist = new ArrayList<Complex>(); clist.add(pi); // (ok assuming List.add param is @ValueSafe) List<ValueType> vlist = new ArrayList<ValueType>(); vlist.add(pi); // (ok) List<Object> olist = new ArrayList<Object>(); olist.add(pi); // warning: "value-unsafe" boolean z = pi.equals(zero); boolean z1 = (pi == zero); // error: reference comparison on value type boolean z2 = (pi == null); // error: reference comparison on value type boolean z3 = (pi == obj2); // error: reference comparison on value type synchronized (pi) { } // error: synch of value, unpredictable result synchronized (obj2) { } // unpredictable result Complex qq = pi; qq = null; // possible NPE; warning: “null-unsafe" qq = (Complex) obj; // warning: “null-unsafe" qq = Complex.cast(obj); // OK @SuppressWarnings("null-unsafe") Complex empty = null; // possible NPE qq = empty; // possible NPE (null pollution) The Payoffs It follows from this that either the JVM or the java compiler can replace boxed value-type values with unboxed ones, without affecting normal computations. Fields and variables of value types can be split into their unboxed components. Non-static methods on value types can be transformed into static methods which take the components as value parameters. Some common questions arise around this point in any discussion of value types. Why burden the programmer with all these extra rules? Why not detect programs automagically and perform unboxing transparently? The answer is that it is easy to break the rules accidently unless they are agreed to by the programmer and enforced. Automatic unboxing optimizations are tantalizing but (so far) unreachable ideal. In the current state of the art, it is possible exhibit benchmarks in which automatic unboxing provides the desired effects, but it is not possible to provide a JVM with a performance model that assures the programmer when unboxing will occur. This is why I’m writing this note, to enlist help from, and provide assurances to, the programmer. Basically, I’m shooting for a good set of user-supplied “pragmas” to frame the desired optimization. Again, the important thing is that the unboxing must be done reliably, or else programmers will have no reason to work with the extra complexity of the value-safety rules. There must be a reasonably stable performance model, wherein using a value type has approximately the same performance characteristics as writing the unboxed components as separate Java variables. There are some rough corners to the present scheme. Since Java fields and array elements are initialized to null, value-type computations which incorporate uninitialized variables can produce null pointer exceptions. One workaround for this is to require such variables to be null-tested, and the result replaced with a suitable all-zero value of the value type. That is what the “cast” method does above. Generically typed APIs like List<T> will continue to manipulate boxed values always, at least until we figure out how to do reification of generic type instances. Use of such APIs will elicit warnings until their type parameters (and/or relevant members) are annotated or typed as value-safe. Retrofitting List<T> is likely to expose flaws in the present scheme, which we will need to engineer around. Here are a couple of first approaches: public interface java.util.List<@ValueSafe T> extends Collection<T> { … public interface java.util.List<T extends Object|ValueType> extends Collection<T> { … (The second approach would require disjunctive types, in which value-safety is “contagious” from the constituent types.) With more transformations, the return value types of methods can also be unboxed. This may require significant bytecode-level transformations, and would work best in the presence of a bytecode representation for multiple value groups, which I have proposed elsewhere under the title “Tuples in the VM”. But for starters, the JVM can apply this transformation under the covers, to internally compiled methods. This would give a way to express multiple return values and structured return values, which is a significant pain-point for Java programmers, especially those who work with low-level structure types favored by modern vector and graphics processors. The lack of multiple return values has a strong distorting effect on many Java APIs. Even if the JVM fails to unbox a value, there is still potential benefit to the value type. Clustered computing systems something have copy operations (serialization or something similar) which apply implicitly to command operands. When copying JVM objects, it is extremely helpful to know when an object’s identity is important or not. If an object reference is a copied operand, the system may have to create a proxy handle which points back to the original object, so that side effects are visible. Proxies must be managed carefully, and this can be expensive. On the other hand, value types are exactly those types which a JVM can “copy and forget” with no downside. Array types are crucial to bulk data interfaces. (As data sizes and rates increase, bulk data becomes more important than scalar data, so arrays are definitely accompanying us into the future of computing.) Value types are very helpful for adding structure to bulk data, so a successful value type mechanism will make it easier for us to express richer forms of bulk data. Unboxing arrays (i.e., arrays containing unboxed values) will provide better cache and memory density, and more direct data movement within clustered or heterogeneous computing systems. They require the deepest transformations, relative to today’s JVM. There is an impedance mismatch between value-type arrays and Java’s covariant array typing, so compromises will need to be struck with existing Java semantics. It is probably worth the effort, since arrays of unboxed value types are inherently more memory-efficient than standard Java arrays, which rely on dependent pointer chains. It may be sufficient to extend the “value-safe” concept to array declarations, and allow low-level transformations to change value-safe array declarations from the standard boxed form into an unboxed tuple-based form. Such value-safe arrays would not be convertible to Object[] arrays. Certain connection points, such as Arrays.copyOf and System.arraycopy might need additional input/output combinations, to allow smooth conversion between arrays with boxed and unboxed elements. Alternatively, the correct solution may have to wait until we have enough reification of generic types, and enough operator overloading, to enable an overhaul of Java arrays. Implicit Method Definitions The example of class Complex above may be unattractively complex. I believe most or all of the elements of the example class are required by the logic of value types. If this is true, a programmer who writes a value type will have to write lots of error-prone boilerplate code. On the other hand, I think nearly all of the code (except for the domain-specific parts like plus and minus) can be implicitly generated. Java has a rule for implicitly defining a class’s constructor, if no it defines no constructors explicitly. Likewise, there are rules for providing default access modifiers for interface members. Because of the highly regular structure of value types, it might be reasonable to perform similar implicit transformations on value types. Here’s an example of a “highly implicit” definition of a complex number type: public class Complex implements ValueType { // implicitly final public double re, im; // implicitly public final //implicit methods are defined elementwise from te fields: // toString, asList, equals(2), hashCode, valueOf, cast //optionally, explicit methods (plus, abs, etc.) would go here } In other words, with the right defaults, a simple value type definition can be a one-liner. The observant reader will have noticed the similarities (and suitable differences) between the explicit methods above and the corresponding methods for List<T>. Another way to abbreviate such a class would be to make an annotation the primary trigger of the functionality, and to add the interface(s) implicitly: public @ValueType class Complex { … // implicitly final, implements ValueType (But to me it seems better to communicate the “magic” via an interface, even if it is rooted in an annotation.) Implicitly Defined Value Types So far we have been working with nominal value types, which is to say that the sequence of typed components is associated with a name and additional methods that convey the intention of the programmer. A simple ordered pair of floating point numbers can be variously interpreted as (to name a few possibilities) a rectangular or polar complex number or Cartesian point. The name and the methods convey the intended meaning. But what if we need a truly simple ordered pair of floating point numbers, without any further conceptual baggage? Perhaps we are writing a method (like “divideAndRemainder”) which naturally returns a pair of numbers instead of a single number. Wrapping the pair of numbers in a nominal type (like “QuotientAndRemainder”) makes as little sense as wrapping a single return value in a nominal type (like “Quotient”). What we need here are structural value types commonly known as tuples. For the present discussion, let us assign a conventional, JVM-friendly name to tuples, roughly as follows: public class java.lang.tuple.$DD extends java.lang.tuple.Tuple { double $1, $2; } Here the component names are fixed and all the required methods are defined implicitly. The supertype is an abstract class which has suitable shared declarations. The name itself mentions a JVM-style method parameter descriptor, which may be “cracked” to determine the number and types of the component fields. The odd thing about such a tuple type (and structural types in general) is it must be instantiated lazily, in response to linkage requests from one or more classes that need it. The JVM and/or its class loaders must be prepared to spin a tuple type on demand, given a simple name reference, $xyz, where the xyz is cracked into a series of component types. (Specifics of naming and name mangling need some tasteful engineering.) Tuples also seem to demand, even more than nominal types, some support from the language. (This is probably because notations for non-nominal types work best as combinations of punctuation and type names, rather than named constructors like Function3 or Tuple2.) At a minimum, languages with tuples usually (I think) have some sort of simple bracket notation for creating tuples, and a corresponding pattern-matching syntax (or “destructuring bind”) for taking tuples apart, at least when they are parameter lists. Designing such a syntax is no simple thing, because it ought to play well with nominal value types, and also with pre-existing Java features, such as method parameter lists, implicit conversions, generic types, and reflection. That is a task for another day. Other Use Cases Besides complex numbers and simple tuples there are many use cases for value types. Many tuple-like types have natural value-type representations. These include rational numbers, point locations and pixel colors, and various kinds of dates and addresses. Other types have a variable-length ‘tail’ of internal values. The most common example of this is String, which is (mathematically) a sequence of UTF-16 character values. Similarly, bit vectors, multiple-precision numbers, and polynomials are composed of sequences of values. Such types include, in their representation, a reference to a variable-sized data structure (often an array) which (somehow) represents the sequence of values. The value type may also include ’header’ information. Variable-sized values often have a length distribution which favors short lengths. In that case, the design of the value type can make the first few values in the sequence be direct ’header’ fields of the value type. In the common case where the header is enough to represent the whole value, the tail can be a shared null value, or even just a null reference. Note that the tail need not be an immutable object, as long as the header type encapsulates it well enough. This is the case with String, where the tail is a mutable (but never mutated) character array. Field types and their order must be a globally visible part of the API. The structure of the value type must be transparent enough to have a globally consistent unboxed representation, so that all callers and callees agree about the type and order of components that appear as parameters, return types, and array elements. This is a trade-off between efficiency and encapsulation, which is forced on us when we remove an indirection enjoyed by boxed representations. A JVM-only transformation would not care about such visibility, but a bytecode transformation would need to take care that (say) the components of complex numbers would not get swapped after a redefinition of Complex and a partial recompile. Perhaps constant pool references to value types need to declare the field order as assumed by each API user. This brings up the delicate status of private fields in a value type. It must always be possible to load, store, and copy value types as coordinated groups, and the JVM performs those movements by moving individual scalar values between locals and stack. If a component field is not public, what is to prevent hostile code from plucking it out of the tuple using a rogue aload or astore instruction? Nothing but the verifier, so we may need to give it more smarts, so that it treats value types as inseparable groups of stack slots or locals (something like long or double). My initial thought was to make the fields always public, which would make the security problem moot. But public is not always the right answer; consider the case of String, where the underlying mutable character array must be encapsulated to prevent security holes. I believe we can win back both sides of the tradeoff, by training the verifier never to split up the components in an unboxed value. Just as the verifier encapsulates the two halves of a 64-bit primitive, it can encapsulate the the header and body of an unboxed String, so that no code other than that of class String itself can take apart the values. Similar to String, we could build an efficient multi-precision decimal type along these lines: public final class DecimalValue extends ValueType { protected final long header; protected private final BigInteger digits; public DecimalValue valueOf(int value, int scale) { assert(scale >= 0); return new DecimalValue(((long)value << 32) + scale, null); } public DecimalValue valueOf(long value, int scale) { if (value == (int) value) return valueOf((int)value, scale); return new DecimalValue(-scale, new BigInteger(value)); } } Values of this type would be passed between methods as two machine words. Small values (those with a significand which fits into 32 bits) would be represented without any heap data at all, unless the DecimalValue itself were boxed. (Note the tension between encapsulation and unboxing in this case. It would be better if the header and digits fields were private, but depending on where the unboxing information must “leak”, it is probably safer to make a public revelation of the internal structure.) Note that, although an array of Complex can be faked with a double-length array of double, there is no easy way to fake an array of unboxed DecimalValues. (Either an array of boxed values or a transposed pair of homogeneous arrays would be reasonable fallbacks, in a current JVM.) Getting the full benefit of unboxing and arrays will require some new JVM magic. Although the JVM emphasizes portability, system dependent code will benefit from using machine-level types larger than 64 bits. For example, the back end of a linear algebra package might benefit from value types like Float4 which map to stock vector types. This is probably only worthwhile if the unboxing arrays can be packed with such values. More Daydreams A more finely-divided design for dynamic enforcement of value safety could feature separate marker interfaces for each invariant. An empty marker interface Unsynchronizable could cause suitable exceptions for monitor instructions on objects in marked classes. More radically, a Interchangeable marker interface could cause JVM primitives that are sensitive to object identity to raise exceptions; the strangest result would be that the acmp instruction would have to be specified as raising an exception. @ValueSafe public interface ValueType extends java.io.Serializable, Unsynchronizable, Interchangeable { … public class Complex implements ValueType { // inherits Serializable, Unsynchronizable, Interchangeable, @ValueSafe … It seems possible that Integer and the other wrapper types could be retro-fitted as value-safe types. This is a major change, since wrapper objects would be unsynchronizable and their references interchangeable. It is likely that code which violates value-safety for wrapper types exists but is uncommon. It is less plausible to retro-fit String, since the prominent operation String.intern is often used with value-unsafe code. We should also reconsider the distinction between boxed and unboxed values in code. The design presented above obscures that distinction. As another thought experiment, we could imagine making a first class distinction in the type system between boxed and unboxed representations. Since only primitive types are named with a lower-case initial letter, we could define that the capitalized version of a value type name always refers to the boxed representation, while the initial lower-case variant always refers to boxed. For example: complex pi = complex.valueOf(Math.PI, 0); Complex boxPi = pi; // convert to boxed myList.add(boxPi); complex z = myList.get(0); // unbox Such a convention could perhaps absorb the current difference between int and Integer, double and Double. It might also allow the programmer to express a helpful distinction among array types. As said above, array types are crucial to bulk data interfaces, but are limited in the JVM. Extending arrays beyond the present limitations is worth thinking about; for example, the Maxine JVM implementation has a hybrid object/array type. Something like this which can also accommodate value type components seems worthwhile. On the other hand, does it make sense for value types to contain short arrays? And why should random-access arrays be the end of our design process, when bulk data is often sequentially accessed, and it might make sense to have heterogeneous streams of data as the natural “jumbo” data structure. These considerations must wait for another day and another note. More Work It seems to me that a good sequence for introducing such value types would be as follows: Add the value-safety restrictions to an experimental version of javac. Code some sample applications with value types, including Complex and DecimalValue. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. A staggered roll-out like this would decouple language changes from bytecode changes, which is always a convenient thing. A similar investigation should be applied (concurrently) to array types. In this case, it seems to me that the starting point is in the JVM: Add an experimental unboxing array data structure to a production JVM, perhaps along the lines of Maxine hybrids. No bytecode or language support is required at first; everything can be done with encapsulated unsafe operations and/or method handles. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. That’s enough musing me for now. Back to work!

Read the article
iPhone SDK vs Windows Phone 7 Series SDK Challenge, Part 1: Hello World!

In this series, I will be taking sample applications from the iPhone SDK and implementing them on Windows Phone 7 Series. My goal is to do as much of an apples-to-apples comparison as I can. This series will be written to not only compare and contrast how easy or difficult it is to complete tasks on either platform, how many lines of code, etc., but Id also like it to be a way for iPhone developers to either get started on Windows Phone 7 Series development, or for developers in general to learn the platform. Heres my methodology: Run the iPhone SDK app in the iPhone Simulator to get a feel for what it does and how it works, without looking at the implementation Implement the equivalent functionality on Windows Phone 7 Series using Silverlight. Compare the two implementations based on complexity, functionality, lines of code, number of files, etc. Add some functionality to the Windows Phone 7 Series app that shows off a way to make the scenario more interesting or leverages an aspect of the platform, or uses a better design pattern to implement the functionality. You can download Microsoft Visual Studio 2010 Express for Windows Phone CTP here, and the Expression Blend 4 Beta here. Hello World! Of course no first post would be allowed if it didnt focus on the hello world scenario. The iPhone SDK follows that tradition with the Your First iPhone Application walkthrough. I will say that the developer documentation for iPhone is pretty good. There are plenty of walkthoughs and they break things down into nicely sized steps and do a good job of bringing the user along. As expected, this application is quite simple. It comprises of a text box, a label, and a button. When you push the button, the label changes to Hello plus the word you typed into the text box. Makes perfect sense for a starter application. Theres not much to this but it covers a few basic elements: Laying out basic UI Handling user input Hooking up events Formatting text So, lets get started building a similar app for Windows Phone 7 Series! Implementing the UI: UI in Silverlight (and therefore Windows Phone 7) is defined in XAML, which is a declarative XML language also used by WPF on the desktop. For anyone thats familiar with similar types of markup, its relatively straightforward to learn, but has a lot of power in it once you get it figured out. Well talk more about that. This UI is very simple. When I look at this, I note a couple of things: Elements are arranged vertically They are all centered So, lets create our Application and then start with the UI. Once you have the the VS 2010 Express for Windows Phone tool running, create a new Windows Phone Project, and call it Hello World: Once created, youll see the designer on one side and your XAML on the other: Now, we can create our UI in one of three ways: Use the designer in Visual Studio to drag and drop the components Use the designer in Expression Blend 4 to drag and drop the components Enter the XAML by hand in either of the above Well start with (1), then kind of move to (3) just for instructional value. To develop this UI in the designer: First, delete all of the markup between inside of the Grid element (LayoutRoot). You should be left with just this XAML for your MainPage.xaml (i shortened all the xmlns declarations below for brevity): 1: <phoneNavigation:PhoneApplicationPage 2: x:Class="HelloWorld.MainPage" 3: xmlns="...[snip]" 4: FontFamily="{StaticResource PhoneFontFamilyNormal}" 5: FontSize="{StaticResource PhoneFontSizeNormal}" 6: Foreground="{StaticResource PhoneForegroundBrush}"> 7: 8: <Grid x:Name="LayoutRoot" Background="{StaticResource PhoneBackgroundBrush}"> 9: 10: </Grid> 11: 12: </phoneNavigation:PhoneApplicationPage> .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Well be adding XAML at line 9, so thats the important part. Now, Click on the center area of the phone surface Open the Toolbox and double click StackPanel Double click TextBox Double click TextBlock Double click Button That will create the necessary UI elements but they wont be arranged quite right. Well fix it in a second. Heres the XAML that we end up with: 1: <StackPanel Height="100" HorizontalAlignment="Left" Margin="10,10,0,0" Name="stackPanel1" VerticalAlignment="Top" Width="200"> 2: <TextBox Height="32" Name="textBox1" Text="TextBox" Width="100" /> 3: <TextBlock Height="23" Name="textBlock1" Text="TextBlock" /> 4: <Button Content="Button" Height="70" Name="button1" Width="160" /> 5: </StackPanel> .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } The designer does its best at guessing what we want, but in this case we want things to be a bit simpler. So well just clean it up a bit. We want the items to be centered and we want them to have a little bit of a margin on either side, so heres what we end up with. Ive also made it match the values and style from the iPhone app: 1: <StackPanel Margin="10"> 2: <TextBox Name="textBox1" HorizontalAlignment="Stretch" Text="You" TextAlignment="Center"/> 3: <TextBlock Name="textBlock1" HorizontalAlignment="Center" Margin="0,100,0,0" Text="Hello You!" /> 4: <Button Name="button1" HorizontalAlignment="Center" Margin="0,150,0,0" Content="Hello"/> 5: </StackPanel> .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Now lets take a look at what weve done there. Line 1: We removed all of the formatting from the StackPanel, except for Margin, as thats all we need. Since our parent element is a Grid, by default the StackPanel will be sized to fit in that space. The Margin says that we want to reserve 10 pixels on each side of the StackPanel. Line 2: Weve set the HorizontalAlignment of the TextBox to Stretch, which says that it should fill its parents size horizontally. We want to do this so the TextBox is always full-width. We also set TextAlignment to Center, to center the text. Line 3: In contrast to the TextBox above, we dont care how wide the TextBlock is, just so long as it is big enough for its text. Thatll happen automatically, so we just set its Horizontal alignment to Center. We also set a Margin above the TextBlock of 100 pixels to bump it down a bit, per the iPhone UI. Line 4: We do the same things here as in Line 3. Heres how the UI looks in the designer: Believe it or not, were almost done! Implementing the App Logic Now, we want the TextBlock to change its text when the Button is clicked. In the designer, double click the Button to be taken to the Event Handler for the Buttons Click event. In that event handler, we take the Text property from the TextBox, and format it into a string, then set it into the TextBlock. Thats it! 1: private void button1_Click(object sender, RoutedEventArgs e) 2: { 3: string name = textBox1.Text; 4: 5: // if there isn't a name set, just use "World" 6: if (String.IsNullOrEmpty(name)) 7: { 8: name = "World"; 9: } 10: 11: // set the value into the TextBlock 12: textBlock1.Text = String.Format("Hello {0}!", name); 13: 14: } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } We use the String.Format() method to handle the formatting for us. Now all thats left is to test the app in the Windows Phone Emulator and verify it does what we think it does! And it does! Comparing against the iPhone Looking at the iPhone example, there are basically three things that you have to touch as the developer: 1) The UI in the Nib file 2) The app delegate 3) The view controller Counting lines is a bit tricky here, but to try to keep this even, Im going to only count lines of code that I could not have (or would not have) generated with the tooling. Meaning, Im not counting XAML and Im not counting operations that happen in the Nib file with the XCode designer tool. So in the case of the above, even though I modified the XAML, I could have done all of those operations using the visual designer tool. And normally I would have, but the XAML is more instructive (and less steps!). Im interested in things that I, as the developer have to figure out in code. Im also not counting lines that just have a curly brace on them, or lines that are generated for me (e.g. method names that are generated for me when I make a connection, etc.) So, by that count, heres what I get from the code listing for the iPhone app found here: HelloWorldAppDelegate.h: 6 HelloWorldAppDelegate.m: 12 MyViewController.h: 8 MyViewController.m: 18 Which gives me a grand total of about 44 lines of code on iPhone. I really do recommend looking at the iPhone code for a comparison to the above. Now, for the Windows Phone 7 Series application, the only code I typed was in the event handler above Main.Xaml.cs: 4 So a total of 4 lines of code on Windows Phone 7. And more importantly, the process is just A LOT simpler. For example, I was surprised that the User Interface Designer in XCode doesnt automatically create instance variables for me and wire them up to the corresponding elements. I assumed I wouldnt have to write this code myself (and risk getting it wrong!). I dont need to worry about view controllers or anything. I just write my code. This blog post up to this point has covered almost every aspect of this apps development in a few pages. The iPhone tutorial has 5 top level steps with 2-3 sub sections of each. Now, its worth pointing out that the iPhone development model uses the Model View Controller (MVC) pattern, which is a very flexible and powerful pattern that enforces proper separation of concerns. But its fairly complex and difficult to understand when you first walk up to it. Here at Microsoft weve dabbled in MVC a bit, with frameworks like MFC on Visual C++ and with the ASP.NET MVC framework now. Both are very powerful frameworks. But one of the reasons weve stayed away from MVC with client UI frameworks is that its difficult to tool. We havent seen the type of value that beats double click, write code! for the broad set of scenarios. Another thing to think about is how many of those lines of code were focused on my apps functionality?. Or, the converse of How many lines of code were boilerplate plumbing? In both examples, the actual number of functional code lines is similar. I count most of them in MyViewController.m, in the changeGreeting method. Its about 7 lines of code that do the work of taking the value from the TextBox and putting it into the label. Versus 4 on the Windows Phone 7 side. But, unfortunately, on iPhone I still have to write that other 37 lines of code, just to get there. 10% of the code, 1 file instead of 4, its just much simpler. Making Some Tweaks It turns out, I can actually do this application with ZERO lines of code, if Im willing to change the spec a bit. The data binding functionality in Silverlight is incredibly powerful. And what I can do is databind the TextBoxs value directly to the TextBlock. Take some time looking at this XAML below. Youll see that I have added another nested StackPanel and two more TextBlocks. Why? Because thats how I build that string, and the nested StackPanel will lay things out Horizontally for me, as specified by the Orientation property. 1: <StackPanel Margin="10"> 2: <TextBox Name="textBox1" HorizontalAlignment="Stretch" Text="You" TextAlignment="Center"/> 3: <StackPanel Orientation="Horizontal" HorizontalAlignment="Center" Margin="0,100,0,0" > 4: <TextBlock Text="Hello " /> 5: <TextBlock Name="textBlock1" Text="{Binding ElementName=textBox1, Path=Text}" /> 6: <TextBlock Text="!" /> 7: </StackPanel> 8: <Button Name="button1" HorizontalAlignment="Center" Margin="0,150,0,0" Content="Hello" Click="button1_Click" /> 9: </StackPanel> .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Now, the real action is there in the bolded TextBlock.Text property: Text="{Binding ElementName=textBox1, Path=Text}" .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } That does all the heavy lifting. It sets up a databinding between the TextBox.Text property on textBox1 and the TextBlock.Text property on textBlock1. As I change the text of the TextBox, the label updates automatically. In fact, I dont even need the button any more, so I could get rid of that altogether. And no button means no event handler. No event handler means no C# code at all. Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
FreeBSD performance tuning. Sysctls, loader.conf, kernel

- by SaveTheRbtz

I wanted to share knowledge of tuning FreeBSD via sysctl.conf/loader.conf/KENCONF. It was initially based on Igor Sysoev's (author of nginx) presentation about FreeBSD tuning up to 100,000-200,000 active connections. Tunings are for FreeBSD-CURRENT. Since 7.2 amd64 some of them are tuned well by default. Prior 7.0 some of them are boot only (set via /boot/loader.conf) or does not exist at all. sysctl.conf: # No zero mapping feature # May break wine # (There are also reports about broken samba3) #security.bsd.map_at_zero=0 # If you have really busy webserver with apache13 you may run out of processes #kern.maxproc=10000 # Same for servers with apache2 / Pound #kern.threads.max_threads_per_proc=4096 # Max. backlog size kern.ipc.somaxconn=4096 # Shared memory // 7.2+ can use shared memory > 2Gb kern.ipc.shmmax=2147483648 # Sockets kern.ipc.maxsockets=204800 # Can cause this on older kernels: # http://old.nabble.com/Significant-performance-regression-for-increased-maxsockbuf-on-8.0-RELEASE-tt26745981.html#a26745981 ) kern.ipc.maxsockbuf=10485760 # Mbuf 2k clusters (on amd64 7.2+ 25600 is default) # For such high value vm.kmem_size must be increased to 3G kern.ipc.nmbclusters=262144 # Jumbo pagesize(_SC_PAGESIZE) clusters # Used as general packet storage for jumbo frames # can be monitored via `netstat -m` #kern.ipc.nmbjumbop=262144 # Jumbo 9k/16k clusters # If you are using them #kern.ipc.nmbjumbo9=65536 #kern.ipc.nmbjumbo16=32768 # For lower latency you can decrease scheduler's maximum time slice # default: stathz/10 (~ 13) #kern.sched.slice=1 # Increase max command-line length showed in `ps` (e.g for Tomcat/Java) # Default is PAGE_SIZE / 16 or 256 on x86 # This avoids commands to be presented as [executable] in `ps` # For more info see: http://www.freebsd.org/cgi/query-pr.cgi?pr=120749 kern.ps_arg_cache_limit=4096 # Every socket is a file, so increase them kern.maxfiles=204800 kern.maxfilesperproc=200000 kern.maxvnodes=200000 # On some systems HPET is almost 2 times faster than default ACPI-fast # Useful on systems with lots of clock_gettime / gettimeofday calls # See http://old.nabble.com/ACPI-fast-default-timecounter,-but-HPET-83--faster-td23248172.html # After revision 222222 HPET became default: http://svnweb.freebsd.org/base?view=revision&revision=222222 kern.timecounter.hardware=HPET # Small receive space, only usable on http-server, on file server this # should be increased to 65535 or even more #net.inet.tcp.recvspace=8192 # This is useful on Fat-Long-Pipes #net.inet.tcp.recvbuf_max=10485760 #net.inet.tcp.recvbuf_inc=65535 # Small send space is useful for http servers that serve small files # Autotuned since 7.x net.inet.tcp.sendspace=16384 # This is useful on Fat-Long-Pipes #net.inet.tcp.sendbuf_max=10485760 #net.inet.tcp.sendbuf_inc=65535 # Turn off receive autotuning # You can play with it. #net.inet.tcp.recvbuf_auto=0 #net.inet.tcp.sendbuf_auto=0 # This should be enabled if you going to use big spaces (>64k) # Also timestamp field is useful when using syncookies net.inet.tcp.rfc1323=1 # Turn this off on high-speed, lossless connections (LAN 1Gbit+) # If you set it there is no need in TCP_NODELAY sockopt (see man tcp) net.inet.tcp.delayed_ack=0 # This feature is useful if you are serving data over modems, Gigabit Ethernet, # or even high speed WAN links (or any other link with a high bandwidth delay product), # especially if you are also using window scaling or have configured a large send window. # Automatically disables on small RTT ( http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_subr.c?#rev1.237 ) # This sysctl was removed in 10-CURRENT: # See: http://www.mail-archive.com/[email protected]/msg06178.html #net.inet.tcp.inflight.enable=0 # TCP slowstart algorithm tunings # We assuming we have very fast clients #net.inet.tcp.slowstart_flightsize=100 #net.inet.tcp.local_slowstart_flightsize=100 # Disable randomizing of ports to avoid false RST # Before usage check SA here www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf # (it's also says that port randomization auto-disables at some conn.rates, but I didn't checked it thou) #net.inet.ip.portrange.randomized=0 # Increase portrange # For outgoing connections only. Good for seed-boxes and ftp servers. net.inet.ip.portrange.first=1024 net.inet.ip.portrange.last=65535 # # stops route cache degregation during a high-bandwidth flood # http://www.freebsd.org/doc/en/books/handbook/securing-freebsd.html #net.inet.ip.rtexpire=2 net.inet.ip.rtminexpire=2 net.inet.ip.rtmaxcache=1024 # Security net.inet.ip.redirect=0 net.inet.ip.sourceroute=0 net.inet.ip.accept_sourceroute=0 net.inet.icmp.maskrepl=0 net.inet.icmp.log_redirect=0 net.inet.icmp.drop_redirect=1 net.inet.tcp.drop_synfin=1 # # There is also good example of sysctl.conf with comments: # http://www.thern.org/projects/sysctl.conf # # icmp may NOT rst, helpful for those pesky spoofed # icmp/udp floods that end up taking up your outgoing # bandwidth/ifqueue due to all that outgoing RST traffic. # #net.inet.tcp.icmp_may_rst=0 # Security net.inet.udp.blackhole=1 net.inet.tcp.blackhole=2 # IPv6 Security # For more info see http://www.fosslc.org/drupal/content/security-implications-ipv6 # Disable Node info replies # To see this vulnerability in action run `ping6 -a sglAac ::1` or `ping6 -w ::1` on unprotected node net.inet6.icmp6.nodeinfo=0 # Turn on IPv6 privacy extensions # For more info see proposal http://unix.derkeiler.com/Mailing-Lists/FreeBSD/net/2008-06/msg00103.html net.inet6.ip6.use_tempaddr=1 net.inet6.ip6.prefer_tempaddr=1 # Disable ICMP redirect net.inet6.icmp6.rediraccept=0 # Disable acceptation of RA and auto linklocal generation if you don't use them #net.inet6.ip6.accept_rtadv=0 #net.inet6.ip6.auto_linklocal=0 # Increases default TTL, sometimes useful # Default is 64 net.inet.ip.ttl=128 # Lessen max segment life to conserve resources # ACK waiting time in miliseconds # (default: 30000. RFC from 1979 recommends 120000) net.inet.tcp.msl=5000 # Max bumber of timewait sockets net.inet.tcp.maxtcptw=200000 # Don't use tw on local connections # As of 15 Apr 2009. Igor Sysoev says that nolocaltimewait has some buggy realization. # So disable it or now till get fixed #net.inet.tcp.nolocaltimewait=1 # FIN_WAIT_2 state fast recycle net.inet.tcp.fast_finwait2_recycle=1 # Time before tcp keepalive probe is sent # default is 2 hours (7200000) #net.inet.tcp.keepidle=60000 # Should be increased until net.inet.ip.intr_queue_drops is zero net.inet.ip.intr_queue_maxlen=4096 # Interrupt handling via multiple CPU, but with context switch. # You can play with it. Default is 1; #net.isr.direct=0 # This is for routers only #net.inet.ip.forwarding=1 #net.inet.ip.fastforwarding=1 # This speed ups dummynet when channel isn't saturated net.inet.ip.dummynet.io_fast=1 # Increase dummynet(4) hash #net.inet.ip.dummynet.hash_size=2048 #net.inet.ip.dummynet.max_chain_len # Should be increased when you have A LOT of files on server # (Increase until vfs.ufs.dirhash_mem becomes lower) vfs.ufs.dirhash_maxmem=67108864 # Note from commit http://svn.freebsd.org/base/head@211031 : # For systems with RAID volumes and/or virtualization envirnments, where # read performance is very important, increasing this sysctl tunable to 32 # or even more will demonstratively yield additional performance benefits. vfs.read_max=32 # Explicit Congestion Notification (see http://en.wikipedia.org/wiki/Explicit_Congestion_Notification) net.inet.tcp.ecn.enable=1 # Flowtable - flow caching mechanism # Useful for routers #net.inet.flowtable.enable=1 #net.inet.flowtable.nmbflows=65535 # Extreme polling tuning #kern.polling.burst_max=1000 #kern.polling.each_burst=1000 #kern.polling.reg_frac=100 #kern.polling.user_frac=1 #kern.polling.idle_poll=0 # IPFW dynamic rules and timeouts tuning # Increase dyn_buckets till net.inet.ip.fw.curr_dyn_buckets is lower net.inet.ip.fw.dyn_buckets=65536 net.inet.ip.fw.dyn_max=65536 net.inet.ip.fw.dyn_ack_lifetime=120 net.inet.ip.fw.dyn_syn_lifetime=10 net.inet.ip.fw.dyn_fin_lifetime=2 net.inet.ip.fw.dyn_short_lifetime=10 # Make packets pass firewall only once when using dummynet # i.e. packets going thru pipe are passing out from firewall with accept #net.inet.ip.fw.one_pass=1 # shm_use_phys Wires all shared pages, making them unswappable # Use this to lessen Virtual Memory Manager's work when using Shared Mem. # Useful for databases #kern.ipc.shm_use_phys=1 # ZFS # Enable prefetch. Useful for sequential load type i.e fileserver. # FreeBSD sets vfs.zfs.prefetch_disable to 1 on any i386 systems and # on any amd64 systems with less than 4GB of avaiable memory # For additional info check this nabble thread http://old.nabble.com/Samba-read-speed-performance-tuning-td27964534.html #vfs.zfs.prefetch_disable=0 # On highload servers you may notice following message in dmesg: # "Approaching the limit on PV entries, consider increasing either the # vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable" vm.pmap.shpgperproc=2048 loader.conf: # Accept filters for data, http and DNS requests # Useful when your software uses select() instead of kevent/kqueue or when you under DDoS # DNS accf available on 8.0+ accf_data_load="YES" accf_http_load="YES" accf_dns_load="YES" # Async IO system calls aio_load="YES" # Linux specific devices in /dev # As for 8.1 it only /dev/full #lindev_load="YES" # Adds NCQ support in FreeBSD # WARNING! all ad[0-9]+ devices will be renamed to ada[0-9]+ # 8.0+ only #ahci_load="YES" #siis_load="YES" # FreeBSD 8.2+ # New Congestion Control for FreeBSD # http://caia.swin.edu.au/urp/newtcp/tools/cc_chd-readme-0.1.txt # http://www.ietf.org/proceedings/78/slides/iccrg-5.pdf # Initial merge commit message http://www.mail-archive.com/[email protected]/msg31410.html #cc_chd_load="YES" # Increase kernel memory size to 3G. # # Use ONLY if you have KVA_PAGES in kernel configuration, and you have more than 3G RAM # Otherwise panic will happen on next reboot! # # It's required for high buffer sizes: kern.ipc.nmbjumbop, kern.ipc.nmbclusters, etc # Useful on highload stateful firewalls, proxies or ZFS fileservers # (FreeBSD 7.2+ amd64 users: Check that current value is lower!) #vm.kmem_size="3G" # If your server has lots of swap (>4Gb) you should increase following value # according to http://lists.freebsd.org/pipermail/freebsd-hackers/2009-October/029616.html # Otherwise you'll be getting errors # "kernel: swap zone exhausted, increase kern.maxswzone" # kern.maxswzone="256M" # Older versions of FreeBSD can't tune maxfiles on the fly #kern.maxfiles="200000" # Useful for databases # Sets maximum data size to 1G # (FreeBSD 7.2+ amd64 users: Check that current value is lower!) #kern.maxdsiz="1G" # Maximum buffer size(vfs.maxbufspace) # You can check current one via vfs.bufspace # Should be lowered/upped depending on server's load-type # Usually decreased to preserve kmem # (default is 10% of mem) #kern.maxbcache="512M" # Sendfile buffers # For i386 only #kern.ipc.nsfbufs=10240 # FreeBSD 9+ # HPET "legacy route" support. It should allow HPET to work per-CPU # See http://www.mail-archive.com/[email protected]/msg03603.html #hint.atrtc.0.clock=0 #hint.attimer.0.clock=0 #hint.hpet.0.legacy_route=1 # syncache Hash table tuning net.inet.tcp.syncache.hashsize=1024 net.inet.tcp.syncache.bucketlimit=512 net.inet.tcp.syncache.cachelimit=65536 # Increased hostcache # Later host cache can be viewed via net.inet.tcp.hostcache.list hidden sysctl # Very useful for it's RTT RTTVAR # Must be power of two net.inet.tcp.hostcache.hashsize=65536 # hashsize * bucketlimit (which is 30 by default) # It allocates 255Mb (1966080*136) of RAM net.inet.tcp.hostcache.cachelimit=1966080 # TCP control-block Hash table tuning net.inet.tcp.tcbhashsize=4096 # Disable ipfw deny all # Should be uncommented when there is a chance that # kernel and ipfw binary may be out-of sync on next reboot #net.inet.ip.fw.default_to_accept=1 # # SIFTR (Statistical Information For TCP Research) is a kernel module that # logs a range of statistics on active TCP connections to a log file. # See prerelease notes http://groups.google.com/group/mailing.freebsd.current/browse_thread/thread/b4c18be6cdce76e4 # and man 4 sitfr #siftr_load="YES" # Enable superpages, for 7.2+ only # Also read http://lists.freebsd.org/pipermail/freebsd-hackers/2009-November/030094.html vm.pmap.pg_ps_enabled=1 # Usefull if you are using Intel-Gigabit NIC #hw.em.rxd=4096 #hw.em.txd=4096 #hw.em.rx_process_limit="-1" # Also if you have ALOT interrupts on NIC - play with following parameters # NOTE: You should set them for every NIC #dev.em.0.rx_int_delay: 250 #dev.em.0.tx_int_delay: 250 #dev.em.0.rx_abs_int_delay: 250 #dev.em.0.tx_abs_int_delay: 250 # There is also multithreaded version of em/igb drivers can be found here: # http://people.yandex-team.ru/~wawa/ # # for additional em monitoring and statistics use # sysctl dev.em.0.stats=1 ; dmesg # sysctl dev.em.0.debug=1 ; dmesg # Also after r209242 (-CURRENT) there is a separate sysctl for each stat variable; # Same tunings for igb #hw.igb.rxd=4096 #hw.igb.txd=4096 #hw.igb.rx_process_limit=100 # Some useful netisr tunables. See sysctl net.isr #net.isr.maxthreads=4 #net.isr.defaultqlimit=4096 #net.isr.maxqlimit: 10240 # Bind netisr threads to CPUs #net.isr.bindthreads=1 # # FreeBSD 9.x+ # Increase interface send queue length # See commit message http://svn.freebsd.org/viewvc/base?view=revision&revision=207554 #net.link.ifqmaxlen=1024 # Nicer boot logo =) loader_logo="beastie" And finally here is KERNCONF: # Just some of them, see also # cat /sys/{i386,amd64,}/conf/NOTES # This one useful only on i386 #options KVA_PAGES=512 # You can play with HZ in environments with high interrupt rate (default is 1000) # 100 is for my notebook to prolong it's battery life #options HZ=100 # Polling is goot on network loads with high packet rates and low-end NICs # NB! Do not enable it if you want more than one netisr thread #options DEVICE_POLLING # Eliminate datacopy on socket read-write # To take advantage with zero copy sockets you should have an MTU >= 4k # This req. is only for receiving data. # Read more in man zero_copy_sockets # Also this epic thread on kernel trap: # http://kerneltrap.org/node/6506 # Here Linus says that "anybody that does it that way (FreeBSD) is totally incompetent" #options ZERO_COPY_SOCKETS # Support TCP sign. Used for IPSec options TCP_SIGNATURE # There was stackoverflow found in KAME IPSec stack: # See http://secunia.com/advisories/43995/ # For quick workaround you can use `ipfw add deny proto ipcomp` options IPSEC # This ones can be loaded as modules. They described in loader.conf section #options ACCEPT_FILTER_DATA #options ACCEPT_FILTER_HTTP # Adding ipfw, also can be loaded as modules options IPFIREWALL # On 8.1+ you can disable verbose to see blocked packets on ipfw0 interface. # Also there is no point in compiling verbose into the kernel, because # now there is net.inet.ip.fw.verbose tunable. #options IPFIREWALL_VERBOSE #options IPFIREWALL_VERBOSE_LIMIT=10 options IPFIREWALL_FORWARD # Adding kernel NAT options IPFIREWALL_NAT options LIBALIAS # Traffic shaping options DUMMYNET # Divert, i.e. for userspace NAT options IPDIVERT # This is for OpenBSD's pf firewall device pf device pflog # pf's QoS - ALTQ options ALTQ options ALTQ_CBQ # Class Bases Queuing (CBQ) options ALTQ_RED # Random Early Detection (RED) options ALTQ_RIO # RED In/Out options ALTQ_HFSC # Hierarchical Packet Scheduler (HFSC) options ALTQ_PRIQ # Priority Queuing (PRIQ) options ALTQ_NOPCC # Required for SMP build # Pretty console # Manual can be found here http://forums.freebsd.org/showthread.php?t=6134 #options VESA #options SC_PIXEL_MODE # Disable reboot on Ctrl Alt Del #options SC_DISABLE_REBOOT # Change normal|kernel messages color options SC_NORM_ATTR=(FG_GREEN|BG_BLACK) options SC_KERNEL_CONS_ATTR=(FG_YELLOW|BG_BLACK) # More scroll space options SC_HISTORY_SIZE=8192 # Adding hardware crypto device device crypto device cryptodev # Useful network interfaces device vlan device tap #Virtual Ethernet driver device gre #IP over IP tunneling device if_bridge #Bridge interface device pfsync #synchronization interface for PF device carp #Common Address Redundancy Protocol device enc #IPsec interface device lagg #Link aggregation interface device stf #IPv4-IPv6 port # Also for my notebook, but may be used with Opteron device amdtemp # Same for Intel processors device coretemp # man 4 cpuctl device cpuctl # CPU control pseudo-device # Support for ECMP. More than one route for destination # Works even with default route so one can use it as LB for two ISP # For now code is unstable and panics (panic: rtfree 2) on route deletions. #options RADIX_MPATH # Multicast routing #options MROUTING #options PIM # Debug & DTrace options KDB # Kernel debugger related code options KDB_TRACE # Print a stack trace for a panic options KDTRACE_FRAME # amd64-only(?) options KDTRACE_HOOKS # all architectures - enable general DTrace hooks #options DDB #options DDB_CTF # all architectures - kernel ELF linker loads CTF data # Adaptive spining in lockmgr (8.x+) # See http://www.mail-archive.com/[email protected]/msg10782.html options ADAPTIVE_LOCKMGRS # UTF-8 in console (8.x+) #options TEKEN_UTF8 # FreeBSD 8.1+ # Deadlock resolver thread # For additional information see http://www.mail-archive.com/[email protected]/msg18124.html # (FYI: "resolution" is panic so use with caution) #options DEADLKRES # Increase maximum size of Raw I/O and sendfile(2) readahead #options MAXPHYS=(1024*1024) #options MAXBSIZE=(1024*1024) # For scheduler debug enable following option. # Debug will be available via `kern.sched.stats` sysctl # For more information see http://svnweb.freebsd.org/base/head/sys/conf/NOTES?view=markup #options SCHED_STATS If you are tuning network for maximum performance you may wish to play with ifconfig options like: # You can list all capabilities via `ifconfig -m` ifconfig [-]rxcsum [-]txcsum [-]tso [-]lro mtu In case you've enabled DDB in kernel config, you should edit your /etc/ddb.conf and add something like this to enable automatic reboot (and textdump as bonus): script kdb.enter.panic=textdump set; capture on; show pcpu; bt; ps; alltrace; capture off; call doadump; reset script kdb.enter.default=textdump set; capture on; bt; ps; capture off; call doadump; reset And do not forget to add ddb_enable="YES" to /etc/rc.conf Since FreeBSD 9 you can select to enable/disable flowcontrol on your NIC: # See http://en.wikipedia.org/wiki/Ethernet_flow_control and # http://www.mail-archive.com/[email protected]/msg07927.html for additional info ifconfig bge0 media auto mediaopt flowcontrol PS. Also most of FreeBSD's limits can be monitored by # vmstat -z and # limits PPS. variety of network counters can be monitored via # netstat -s In FreeBSD-9 netstat's -Q option appeared, try following command to display netisr stats # netstat -Q PPPS. also see # man 7 tuning PPPPS. I wanted to thank FreeBSD community, especially author of nginx - Igor Sysoev, nginx-ru@ and FreeBSD-performance@ mailing lists for providing useful information about FreeBSD tuning. FreeBSD WIP * Whats cooking for FreeBSD 7? * Whats cooking for FreeBSD 8? * Whats cooking for FreeBSD 9? So here is the question: What tunings are you using on yours FreeBSD servers? You can also post your /etc/sysctl.conf, /boot/loader.conf, kernel options, etc with description of its' meaning (do not copy-paste from sysctl -d). Don't forget to specify server type (web, smb, gateway, etc) Let's share experience!

Read the article
TCP RST Reset Every 5 Minutes on Windows 2003 sp2

- by Dan

Hey, Recently I had a web developer come to me and ask why he was receiving connection errors in his app that was accessing a sql database. So, I went through my normal trouble shooting steps to isolate or reproduce the issue. I discovered that if I connected to the database using Query Analyzer and let the connection idle for 5 minutes it would disconnect. Meaning... I would no longer be able to refresh my tables or any other object/node within the object browser in Query Analyzer. I would have to right click on the instance and refresh for it to re-establish the connection. Next I went to wireshark and ran a capture on the client pc's nic card. Sure enough it was receiving a TCP RST reset every 5 min if the connection idled longer than 5 min. I also ran a capture on the SQL Server and noticed the TCP RST reset command as well. Attached below is the capture from the client Machine. If someone could please assist... That would be great. -I checked all settings within SQL Server 2000 against another server and they all seem to be the same. -Issue does not occur if I connect to any other SQL server 2000 server. -Issue does not occur if connecting to SQL on the server itself... so only over the network. -I consulted with network team and this is the response back: There are no firewalls or proxies in between SQL Server and your desktop. The traffic flows like this: Desktop-Access Switch-Distro Switch-Core Switch-Datacenter Switch-SQL Server None of the switches have security ACL’s configured on them. Also they stated that NAT was not turned on. -Issue does not occur with SQL server Enterprise Manager. -Ran SQL Profiler at the same time and did not see anything out of the ordinary during the RST I HAVE SEARCHED HIGH AND LOW ON GOOGLE FOR A RESOLUTION FOR THIS ISSUE. NO LUCK! My questions are: What could be causing this? Wrong Sequence number? setting in a router or switch the network team may have over looked? Setting within Windows? Setting within SQL Server 2000 that I have over looked? Better way to utilize Wireshark to find more answers? RST is about 10 from the bottom. No. Time Source Destination Protocol Info 258 24.390708 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [SYN] Seq=0 Len=0 MSS=1260 259 24.401679 x.x.x.10 x.x.x.99 TCP 2226 > 14488 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460 260 24.401729 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0 261 24.402212 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [PSH, ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=42 262 24.413335 x.x.x.10 x.x.x.99 TCP 2226 > 14488 [PSH, ACK] Seq=1 Ack=43 Win=64198 Len=37 285 24.466512 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [ACK] Seq=43 Ack=38 Win=65498 [TCP CHECKSUM INCORRECT] Len=1260 286 24.466536 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [PSH, ACK] Seq=1303 Ack=38 Win=65498 [TCP CHECKSUM INCORRECT] Len=437 289 24.478168 x.x.x.10 x.x.x.99 TCP 2226 > 14488 [ACK] Seq=38 Ack=1740 Win=64240 Len=0 290 24.480078 x.x.x.10 x.x.x.99 TCP 2226 > 14488 [PSH, ACK] Seq=38 Ack=1740 Win=64240 Len=385 293 24.493629 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [PSH, ACK] Seq=1740 Ack=423 Win=65113 [TCP CHECKSUM INCORRECT] Len=60 294 24.504637 x.x.x.10 x.x.x.99 TCP 2226 > 14488 [PSH, ACK] Seq=423 Ack=1800 Win=64180 Len=17 295 24.533197 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [PSH, ACK] Seq=1800 Ack=440 Win=65096 [TCP CHECKSUM INCORRECT] Len=44 296 24.544098 x.x.x.10 x.x.x.99 TCP 2226 > 14488 [PSH, ACK] Seq=440 Ack=1844 Win=64136 Len=17 297 24.544524 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [PSH, ACK] Seq=1844 Ack=457 Win=65079 [TCP CHECKSUM INCORRECT] Len=58 298 24.558033 x.x.x.10 x.x.x.99 TCP 2226 > 14488 [PSH, ACK] Seq=457 Ack=1902 Win=64078 Len=31 299 24.558493 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [PSH, ACK] Seq=1902 Ack=488 Win=65048 [TCP CHECKSUM INCORRECT] Len=92 300 24.569984 x.x.x.10 x.x.x.99 TCP 2226 > 14488 [PSH, ACK] Seq=488 Ack=1994 Win=63986 Len=70 301 24.577395 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [PSH, ACK] Seq=1994 Ack=558 Win=64978 [TCP CHECKSUM INCORRECT] Len=448 303 24.589834 x.x.x.10 x.x.x.99 TCP 2226 > 14488 [PSH, ACK] Seq=558 Ack=2442 Win=63538 Len=64 304 24.590122 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [FIN, ACK] Seq=2442 Ack=622 Win=64914 [TCP CHECKSUM INCORRECT] Len=0 305 24.601094 x.x.x.10 x.x.x.99 TCP 2226 > 14488 [ACK] Seq=622 Ack=2443 Win=63538 Len=0 306 24.601659 x.x.x.10 x.x.x.99 TCP 2226 > 14488 [FIN, ACK] Seq=622 Ack=2443 Win=63538 Len=0 307 24.601686 x.x.x.99 x.x.x.10 TCP 14488 > 2226 [ACK] Seq=2443 Ack=623 Win=64914 [TCP CHECKSUM INCORRECT] Len=0 321 25.839371 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [SYN] Seq=0 Len=0 MSS=1260 322 25.850291 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460 323 25.850321 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0 324 25.850660 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=42 325 25.861573 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=1 Ack=43 Win=64198 Len=37 326 25.863103 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [ACK] Seq=43 Ack=38 Win=65498 [TCP CHECKSUM INCORRECT] Len=1260 327 25.863130 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=1303 Ack=38 Win=65498 [TCP CHECKSUM INCORRECT] Len=463 328 25.874417 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [ACK] Seq=38 Ack=1766 Win=64240 Len=0 329 25.876315 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=38 Ack=1766 Win=64240 Len=385 330 25.876905 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=1766 Ack=423 Win=65113 [TCP CHECKSUM INCORRECT] Len=60 331 25.887773 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=423 Ack=1826 Win=64180 Len=17 332 25.888299 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=1826 Ack=440 Win=65096 [TCP CHECKSUM INCORRECT] Len=44 333 25.899169 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=440 Ack=1870 Win=64136 Len=17 334 25.899574 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=1870 Ack=457 Win=65079 [TCP CHECKSUM INCORRECT] Len=58 335 25.910618 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=457 Ack=1928 Win=64078 Len=31 336 25.911051 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=1928 Ack=488 Win=65048 [TCP CHECKSUM INCORRECT] Len=92 337 25.922068 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=488 Ack=2020 Win=63986 Len=70 338 25.922500 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=2020 Ack=558 Win=64978 [TCP CHECKSUM INCORRECT] Len=34 339 25.933621 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=558 Ack=2054 Win=63952 Len=29 340 25.941165 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=2054 Ack=587 Win=64949 [TCP CHECKSUM INCORRECT] Len=54 341 25.952164 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=587 Ack=2108 Win=63898 Len=17 342 25.952993 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=2108 Ack=604 Win=64932 [TCP CHECKSUM INCORRECT] Len=72 343 25.963889 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=604 Ack=2180 Win=63826 Len=17 344 25.964366 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=2180 Ack=621 Win=64915 [TCP CHECKSUM INCORRECT] Len=52 345 25.975253 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=621 Ack=2232 Win=63774 Len=17 346 25.975590 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=2232 Ack=638 Win=64898 [TCP CHECKSUM INCORRECT] Len=32 347 25.986588 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=638 Ack=2264 Win=63742 Len=167 348 25.987262 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=2264 Ack=805 Win=64731 [TCP CHECKSUM INCORRECT] Len=512 349 25.998464 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=805 Ack=2776 Win=63230 Len=89 350 25.998861 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=2776 Ack=894 Win=64642 [TCP CHECKSUM INCORRECT] Len=46 351 26.009849 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=894 Ack=2822 Win=63184 Len=17 352 26.010175 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=2822 Ack=911 Win=64625 [TCP CHECKSUM INCORRECT] Len=80 353 26.021220 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=911 Ack=2902 Win=63104 Len=33 354 26.022613 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [PSH, ACK] Seq=2902 Ack=944 Win=64592 [TCP CHECKSUM INCORRECT] Len=498 355 26.034018 x.x.x.10 x.x.x.99 TCP 2226 > 14492 [PSH, ACK] Seq=944 Ack=3400 Win=64240 Len=89 356 26.046501 x.x.x.99 x.x.x.10 TCP 14493 > 2226 [SYN] Seq=0 Len=0 MSS=1260 357 26.057323 x.x.x.10 x.x.x.99 TCP 2226 > 14493 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460 358 26.057355 x.x.x.99 x.x.x.10 TCP 14493 > 2226 [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0 359 26.057661 x.x.x.99 x.x.x.10 TCP 14493 > 2226 [PSH, ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=42 361 26.068606 x.x.x.10 x.x.x.99 TCP 2226 > 14493 [PSH, ACK] Seq=1 Ack=43 Win=64198 Len=37 362 26.070087 x.x.x.99 x.x.x.10 TCP 14493 > 2226 [ACK] Seq=43 Ack=38 Win=65498 [TCP CHECKSUM INCORRECT] Len=1260 363 26.070113 x.x.x.99 x.x.x.10 TCP 14493 > 2226 [PSH, ACK] Seq=1303 Ack=38 Win=65498 [TCP CHECKSUM INCORRECT] Len=485 364 26.081336 x.x.x.10 x.x.x.99 TCP 2226 > 14493 [ACK] Seq=38 Ack=1788 Win=64240 Len=0 365 26.083330 x.x.x.10 x.x.x.99 TCP 2226 > 14493 [PSH, ACK] Seq=38 Ack=1788 Win=64240 Len=385 366 26.083943 x.x.x.99 x.x.x.10 TCP 14493 > 2226 [PSH, ACK] Seq=1788 Ack=423 Win=65113 [TCP CHECKSUM INCORRECT] Len=46 368 26.094921 x.x.x.10 x.x.x.99 TCP 2226 > 14493 [PSH, ACK] Seq=423 Ack=1834 Win=64194 Len=17 369 26.095317 x.x.x.99 x.x.x.10 TCP 14493 > 2226 [PSH, ACK] Seq=1834 Ack=440 Win=65096 [TCP CHECKSUM INCORRECT] Len=48 370 26.107553 x.x.x.10 x.x.x.99 TCP 2226 > 14493 [PSH, ACK] Seq=440 Ack=1882 Win=64146 Len=877 371 26.241285 x.x.x.99 x.x.x.10 TCP 14492 > 2226 [ACK] Seq=3400 Ack=1033 Win=64503 [TCP CHECKSUM INCORRECT] Len=0 372 26.241307 x.x.x.99 x.x.x.10 TCP 14493 > 2226 [ACK] Seq=1882 Ack=1317 Win=65535 [TCP CHECKSUM INCORRECT] Len=0 653 55.913838 x.x.x.99 x.x.x.10 TCP [TCP Keep-Alive] 14492 > 2226 [ACK] Seq=3399 Ack=1033 Win=64503 Len=1 654 55.924547 x.x.x.10 x.x.x.99 TCP [TCP Keep-Alive ACK] 2226 > 14492 [ACK] Seq=1033 Ack=3400 Win=64240 Len=0 910 85.887176 x.x.x.99 x.x.x.10 TCP [TCP Keep-Alive] 14492 > 2226 [ACK] Seq=3399 Ack=1033 Win=64503 Len=1 911 85.898010 x.x.x.10 x.x.x.99 TCP [TCP Keep-Alive ACK] 2226 > 14492 [ACK] Seq=1033 Ack=3400 Win=64240 Len=0 1155 115.859520 x.x.x.99 x.x.x.10 TCP [TCP Keep-Alive] 14492 2226 [ACK] Seq=3399 Ack=1033 Win=64503 Len=1 1156 115.870285 x.x.x.10 x.x.x.99 TCP [TCP Keep-Alive ACK] 2226 14492 [ACK] Seq=1033 Ack=3400 Win=64240 Len=0 1395 145.934403 x.x.x.99 x.x.x.10 TCP [TCP Keep-Alive] 14492 2226 [ACK] Seq=3399 Ack=1033 Win=64503 Len=1 1396 145.945938 x.x.x.10 x.x.x.99 TCP [TCP Keep-Alive ACK] 2226 14492 [ACK] Seq=1033 Ack=3400 Win=64240 Len=0 1649 175.906767 x.x.x.99 x.x.x.10 TCP [TCP Keep-Alive] 14492 2226 [ACK] Seq=3399 Ack=1033 Win=64503 Len=1 1650 175.917741 x.x.x.10 x.x.x.99 TCP [TCP Keep-Alive ACK] 2226 14492 [ACK] Seq=1033 Ack=3400 Win=64240 Len=0 1887 205.881080 x.x.x.99 x.x.x.10 TCP [TCP Keep-Alive] 14492 2226 [ACK] Seq=3399 Ack=1033 Win=64503 Len=1 1888 205.891818 x.x.x.10 x.x.x.99 TCP [TCP Keep-Alive ACK] 2226 14492 [ACK] Seq=1033 Ack=3400 Win=64240 Len=0 2112 235.854408 x.x.x.99 x.x.x.10 TCP [TCP Keep-Alive] 14492 2226 [ACK] Seq=3399 Ack=1033 Win=64503 Len=1 2113 235.865482 x.x.x.10 x.x.x.99 TCP [TCP Keep-Alive ACK] 2226 14492 [ACK] Seq=1033 Ack=3400 Win=64240 Len=0 2398 265.928342 x.x.x.99 x.x.x.10 TCP [TCP Keep-Alive] 14492 2226 [ACK] Seq=3399 Ack=1033 Win=64503 Len=1 2399 265.939242 x.x.x.10 x.x.x.99 TCP [TCP Keep-Alive ACK] 2226 14492 [ACK] Seq=1033 Ack=3400 Win=64240 Len=0 2671 295.900714 x.x.x.99 x.x.x.10 TCP [TCP Keep-Alive] 14492 2226 [ACK] Seq=3399 Ack=1033 Win=64503 Len=1 2672 295.911590 x.x.x.10 x.x.x.99 TCP [TCP Keep-Alive ACK] 2226 14492 [ACK] Seq=1033 Ack=3400 Win=64240 Len=0 2880 315.705029 x.x.x.10 x.x.x.99 TCP 2226 14493 [RST] Seq=1317 Len=0 2973 325.975607 x.x.x.99 x.x.x.10 TCP [TCP Keep-Alive] 14492 2226 [ACK] Seq=3399 Ack=1033 Win=64503 Len=1 2974 325.986337 x.x.x.10 x.x.x.99 TCP [TCP Keep-Alive ACK] 2226 14492 [ACK] Seq=1033 Ack=3400 Win=64240 Len=0 2975 326.154327 x.x.x.10 x.x.x.99 TCP [TCP Keep-Alive] 2226 14492 [ACK] Seq=1032 Ack=3400 Win=64240 Len=1 2976 326.154350 x.x.x.99 x.x.x.10 TCP [TCP Keep-Alive ACK] 14492 2226 [ACK] Seq=3400 Ack=1033 Win=64503 [TCP CHECKSUM INCORRECT] Len=0

Read the article
TCP RST right after FIN/ACK

- by Nitzan Shaked

I am having the weirdest issue: I have a web server which sometimes, only on very specific requests, will send a RST to the client after having sent the FIN datagram. First, a description of the setup: The server runs on an Ubuntu 12.04.1 LTS, which itself is a VM guest inside a Win7 x64 host, in bridged mode. ufw is disabled on the host The client runs on a iOS simulator, which runs on OS X Mountain Lion, which is a VM guest (hackintosh) inside a Win7 x64 host, in bridged mode. Both client and server are on the same LAN, one is connected to the home router via an Ethernet cable, and then other thru WiFi. I happened to glimpse over the server's http logs and found that the client sometimes issuing multiple subsequent identical requests. Further investigation led me to discover that this happens when the server sends a RST, and that the client is simply re-trying. I am attaching several tcpdump's: Good1 is the server-side tcpdump of a good session ("good" meaning no RST was generated). Good3 is another sever-side tcpdump of a good session. (The difference between Good1 and Good3 is the order in which ACK's were sent from the server to the client, ACK'ing the client's request. The client's request arives in 2 segements (specifically: one for the http headers, and another for a body containing an empty json object, "{}"). In Good1, the server ACK's both request segments, using 2 ACK segments, after the second request has arrived. In Good3, the server ACK's each request segment with an ACK segment as soon as the request segment arrives. Not that it should make a difference.) Bad1 is a dump, both client- and server-side, of a bad session. Bad2 is another bad session, this time server-side only. Note that in all "bad" sessions, the server ACK's each request segments immediately after having received it. I've looked at a few other bad sessions, and the situation is the same in all of them. But this is also the behavior in "Good3", so I don't see how that observation helps me, of for that matter why it should matter. I can't find any difference between good and bad sessions, or at least one that I think should matter. My question is: why are those RST's being generated? Or at least: how do I go about debugging this, or providing more info here that'll help? Edit 2 new facts that I have learned: Section 4.2.2.13 of the RFC (1122) (and Wikipedia, in the article "TCP", under "Connection Termination") says that a TCP application on one host may close the connection before it has read all of the data in its socket buffer, and in such a case the TCP on the host will sent a RST to the other side, to let it know that not all the data it has sent has been read. I'm not sure I completely understand this, since closing my side of the connection still allows me to read, no? It also means that I can't write any more. I am not sure this is relevant, though, since I see a RST after FIN. There are multiple complaints of this happening with wsgiref (Python's dev-mode HTTP server), which is exactly what I'm using. I'll keep updating as I find out more. Thanks! ~~~~~~~~~~~~~~~~~~~~ Good1 -- Server Side ~~~~~~~~~~~~~~~~~~~~ 13:28:02.308319 IP 192.168.1.51.51479 > 192.168.1.132.5000: Flags [S], seq 94268074, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 943308864 ecr 0,sackOK,eol], length 0 13:28:02.308336 IP 192.168.1.132.5000 > 192.168.1.51.51479: Flags [S.], seq 1726304574, ack 94268075, win 14480, options [mss 1460,sackOK,TS val 326480982 ecr 943308864,nop,wscale 3], length 0 13:28:02.309750 IP 192.168.1.51.51479 > 192.168.1.132.5000: Flags [.], ack 1, win 8235, options [nop,nop,TS val 943308865 ecr 326480982], length 0 13:28:02.310744 IP 192.168.1.51.51479 > 192.168.1.132.5000: Flags [P.], seq 1:351, ack 1, win 8235, options [nop,nop,TS val 943308865 ecr 326480982], length 350 13:28:02.310766 IP 192.168.1.51.51479 > 192.168.1.132.5000: Flags [P.], seq 351:353, ack 1, win 8235, options [nop,nop,TS val 943308865 ecr 326480982], length 2 13:28:02.310841 IP 192.168.1.132.5000 > 192.168.1.51.51479: Flags [.], ack 351, win 1944, options [nop,nop,TS val 326480983 ecr 943308865], length 0 13:28:02.310918 IP 192.168.1.132.5000 > 192.168.1.51.51479: Flags [.], ack 353, win 1944, options [nop,nop,TS val 326480983 ecr 943308865], length 0 13:28:02.315931 IP 192.168.1.132.5000 > 192.168.1.51.51479: Flags [P.], seq 1:18, ack 353, win 1944, options [nop,nop,TS val 326480984 ecr 943308865], length 17 13:28:02.316107 IP 192.168.1.132.5000 > 192.168.1.51.51479: Flags [FP.], seq 18:684, ack 353, win 1944, options [nop,nop,TS val 326480984 ecr 943308865], length 666 13:28:02.317651 IP 192.168.1.51.51479 > 192.168.1.132.5000: Flags [.], ack 18, win 8234, options [nop,nop,TS val 943308872 ecr 326480984], length 0 13:28:02.318288 IP 192.168.1.51.51479 > 192.168.1.132.5000: Flags [.], ack 685, win 8192, options [nop,nop,TS val 943308872 ecr 326480984], length 0 13:28:02.318640 IP 192.168.1.51.51479 > 192.168.1.132.5000: Flags [F.], seq 353, ack 685, win 8192, options [nop,nop,TS val 943308872 ecr 326480984], length 0 13:28:02.318651 IP 192.168.1.132.5000 > 192.168.1.51.51479: Flags [.], ack 354, win 1944, options [nop,nop,TS val 326480985 ecr 943308872], length 0 ~~~~~~~~~~~~~~~~~~~~ Good3 -- Server Side ~~~~~~~~~~~~~~~~~~~~ 13:28:03.311143 IP 192.168.1.51.51486 > 192.168.1.132.5000: Flags [S], seq 1982901126, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 943309853 ecr 0,sackOK,eol], length 0 13:28:03.311155 IP 192.168.1.132.5000 > 192.168.1.51.51486: Flags [S.], seq 2245063571, ack 1982901127, win 14480, options [mss 1460,sackOK,TS val 326481233 ecr 943309853,nop,wscale 3], length 0 13:28:03.312671 IP 192.168.1.51.51486 > 192.168.1.132.5000: Flags [.], ack 1, win 8235, options [nop,nop,TS val 943309854 ecr 326481233], length 0 13:28:03.313330 IP 192.168.1.51.51486 > 192.168.1.132.5000: Flags [P.], seq 1:351, ack 1, win 8235, options [nop,nop,TS val 943309855 ecr 326481233], length 350 13:28:03.313337 IP 192.168.1.132.5000 > 192.168.1.51.51486: Flags [.], ack 351, win 1944, options [nop,nop,TS val 326481234 ecr 943309855], length 0 13:28:03.313342 IP 192.168.1.51.51486 > 192.168.1.132.5000: Flags [P.], seq 351:353, ack 1, win 8235, options [nop,nop,TS val 943309855 ecr 326481233], length 2 13:28:03.313346 IP 192.168.1.132.5000 > 192.168.1.51.51486: Flags [.], ack 353, win 1944, options [nop,nop,TS val 326481234 ecr 943309855], length 0 13:28:03.327942 IP 192.168.1.132.5000 > 192.168.1.51.51486: Flags [P.], seq 1:18, ack 353, win 1944, options [nop,nop,TS val 326481237 ecr 943309855], length 17 13:28:03.328253 IP 192.168.1.132.5000 > 192.168.1.51.51486: Flags [FP.], seq 18:684, ack 353, win 1944, options [nop,nop,TS val 326481237 ecr 943309855], length 666 13:28:03.329076 IP 192.168.1.51.51486 > 192.168.1.132.5000: Flags [.], ack 18, win 8234, options [nop,nop,TS val 943309868 ecr 326481237], length 0 13:28:03.329688 IP 192.168.1.51.51486 > 192.168.1.132.5000: Flags [.], ack 685, win 8192, options [nop,nop,TS val 943309868 ecr 326481237], length 0 13:28:03.330361 IP 192.168.1.51.51486 > 192.168.1.132.5000: Flags [F.], seq 353, ack 685, win 8192, options [nop,nop,TS val 943309869 ecr 326481237], length 0 13:28:03.330370 IP 192.168.1.132.5000 > 192.168.1.51.51486: Flags [.], ack 354, win 1944, options [nop,nop,TS val 326481238 ecr 943309869], length 0 ~~~~~~~~~~~~~~~~~~~~ Bad1 -- Server Side ~~~~~~~~~~~~~~~~~~~~ 13:28:01.311876 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [S], seq 920400580, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 943307883 ecr 0,sackOK,eol], length 0 13:28:01.311896 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [S.], seq 3103085782, ack 920400581, win 14480, options [mss 1460,sackOK,TS val 326480733 ecr 943307883,nop,wscale 3], length 0 13:28:01.313509 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [.], ack 1, win 8235, options [nop,nop,TS val 943307884 ecr 326480733], length 0 13:28:01.315614 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [P.], seq 1:351, ack 1, win 8235, options [nop,nop,TS val 943307886 ecr 326480733], length 350 13:28:01.315727 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [.], ack 351, win 1944, options [nop,nop,TS val 326480734 ecr 943307886], length 0 13:28:01.316229 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [P.], seq 351:353, ack 1, win 8235, options [nop,nop,TS val 943307886 ecr 326480733], length 2 13:28:01.316242 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [.], ack 353, win 1944, options [nop,nop,TS val 326480734 ecr 943307886], length 0 13:28:01.321019 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [P.], seq 1:18, ack 353, win 1944, options [nop,nop,TS val 326480735 ecr 943307886], length 17 13:28:01.321294 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [FP.], seq 18:684, ack 353, win 1944, options [nop,nop,TS val 326480736 ecr 943307886], length 666 13:28:01.321386 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [R.], seq 685, ack 353, win 1944, options [nop,nop,TS val 326480736 ecr 943307886], length 0 13:28:01.322727 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [.], ack 18, win 8234, options [nop,nop,TS val 943307891 ecr 326480735], length 0 13:28:01.322733 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [R], seq 3103085800, win 0, length 0 13:28:01.323221 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [.], ack 685, win 8192, options [nop,nop,TS val 943307892 ecr 326480736], length 0 13:28:01.323231 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [R], seq 3103086467, win 0, length 0 ~~~~~~~~~~~~~~~~~~~~ Bad1 -- Client Side ~~~~~~~~~~~~~~~~~~~~ 13:28:11.374654 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [S], seq 920400580, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 943307883 ecr 0,sackOK,eol], length 0 13:28:11.375764 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [S.], seq 3103085782, ack 920400581, win 14480, options [mss 1460,sackOK,TS val 326480733 ecr 943307883,nop,wscale 3], length 0 13:28:11.376352 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [.], ack 1, win 8235, options [nop,nop,TS val 943307884 ecr 326480733], length 0 13:28:11.378252 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [P.], seq 1:351, ack 1, win 8235, options [nop,nop,TS val 943307886 ecr 326480733], length 350 13:28:11.379027 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [P.], seq 351:353, ack 1, win 8235, options [nop,nop,TS val 943307886 ecr 326480733], length 2 13:28:11.379732 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [.], ack 351, win 1944, options [nop,nop,TS val 326480734 ecr 943307886], length 0 13:28:11.380592 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [.], ack 353, win 1944, options [nop,nop,TS val 326480734 ecr 943307886], length 0 13:28:11.384968 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [P.], seq 1:18, ack 353, win 1944, options [nop,nop,TS val 326480735 ecr 943307886], length 17 13:28:11.385044 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [.], ack 18, win 8234, options [nop,nop,TS val 943307891 ecr 326480735], length 0 13:28:11.385586 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [FP.], seq 18:684, ack 353, win 1944, options [nop,nop,TS val 326480736 ecr 943307886], length 666 13:28:11.385743 IP 192.168.1.51.51472 > 192.168.1.132.5000: Flags [.], ack 685, win 8192, options [nop,nop,TS val 943307892 ecr 326480736], length 0 13:28:11.385966 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [R.], seq 685, ack 353, win 1944, options [nop,nop,TS val 326480736 ecr 943307886], length 0 13:28:11.387343 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [R], seq 3103085800, win 0, length 0 13:28:11.387344 IP 192.168.1.132.5000 > 192.168.1.51.51472: Flags [R], seq 3103086467, win 0, length 0 ~~~~~~~~~~~~~~~~~~~~ Bad2 -- Server Side ~~~~~~~~~~~~~~~~~~~~ 13:28:01.319185 IP 192.168.1.51.51473 > 192.168.1.132.5000: Flags [S], seq 1631526992, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 943307889 ecr 0,sackOK,eol], length 0 13:28:01.319197 IP 192.168.1.132.5000 > 192.168.1.51.51473: Flags [S.], seq 2524685719, ack 1631526993, win 14480, options [mss 1460,sackOK,TS val 326480735 ecr 943307889,nop,wscale 3], length 0 13:28:01.320692 IP 192.168.1.51.51473 > 192.168.1.132.5000: Flags [.], ack 1, win 8235, options [nop,nop,TS val 943307890 ecr 326480735], length 0 13:28:01.322219 IP 192.168.1.51.51473 > 192.168.1.132.5000: Flags [P.], seq 1:351, ack 1, win 8235, options [nop,nop,TS val 943307890 ecr 326480735], length 350 13:28:01.322336 IP 192.168.1.132.5000 > 192.168.1.51.51473: Flags [.], ack 351, win 1944, options [nop,nop,TS val 326480736 ecr 943307890], length 0 13:28:01.322689 IP 192.168.1.51.51473 > 192.168.1.132.5000: Flags [P.], seq 351:353, ack 1, win 8235, options [nop,nop,TS val 943307890 ecr 326480735], length 2 13:28:01.322700 IP 192.168.1.132.5000 > 192.168.1.51.51473: Flags [.], ack 353, win 1944, options [nop,nop,TS val 326480736 ecr 943307890], length 0 13:28:01.326307 IP 192.168.1.132.5000 > 192.168.1.51.51473: Flags [P.], seq 1:18, ack 353, win 1944, options [nop,nop,TS val 326480737 ecr 943307890], length 17 13:28:01.326614 IP 192.168.1.132.5000 > 192.168.1.51.51473: Flags [FP.], seq 18:684, ack 353, win 1944, options [nop,nop,TS val 326480737 ecr 943307890], length 666 13:28:01.326710 IP 192.168.1.132.5000 > 192.168.1.51.51473: Flags [R.], seq 685, ack 353, win 1944, options [nop,nop,TS val 326480737 ecr 943307890], length 0 13:28:01.328499 IP 192.168.1.51.51473 > 192.168.1.132.5000: Flags [.], ack 18, win 8234, options [nop,nop,TS val 943307896 ecr 326480737], length 0 13:28:01.328509 IP 192.168.1.132.5000 > 192.168.1.51.51473: Flags [R], seq 2524685737, win 0, length 0 13:28:01.328514 IP 192.168.1.51.51473 > 192.168.1.132.5000: Flags [.], ack 685, win 8192, options [nop,nop,TS val 943307896 ecr 326480737], length 0 13:28:01.328517 IP 192.168.1.132.5000 > 192.168.1.51.51473: Flags [R], seq 2524686404, win 0, length 0

Read the article
Optimized OCR black/white pixel algorithm

- by eagle

I am writing a simple OCR solution for a finite set of characters. That is, I know the exact way all 26 letters in the alphabet will look like. I am using C# and am able to easily determine if a given pixel should be treated as black or white. I am generating a matrix of black/white pixels for every single character. So for example, the letter I (capital i), might look like the following: 01110 00100 00100 00100 01110 Note: all points, which I use later in this post, assume that the top left pixel is (0, 0), bottom right pixel is (4, 4). 1's represent black pixels, and 0's represent white pixels. I would create a corresponding matrix in C# like this: CreateLetter("I", new List<List<bool>>() { new List<bool>() { false, true, true, true, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, true, true, true, false } }); I know I could probably optimize this part by using a multi-dimensional array instead, but let's ignore that for now, this is for illustrative purposes. Every letter is exactly the same dimensions, 10px by 11px (10px by 11px is the actual dimensions of a character in my real program. I simplified this to 5px by 5px in this posting since it is much easier to "draw" the letters using 0's and 1's on a smaller image). Now when I give it a 10px by 11px part of an image to analyze with OCR, it would need to run on every single letter (26) on every single pixel (10 * 11 = 110) which would mean 2,860 (26 * 110) iterations (in the worst case) for every single character. I was thinking this could be optimized by defining the unique characteristics of every character. So, for example, let's assume that the set of characters only consists of 5 distinct letters: I, A, O, B, and L. These might look like the following: 01110 00100 00100 01100 01000 00100 01010 01010 01010 01000 00100 01110 01010 01100 01000 00100 01010 01010 01010 01000 01110 01010 00100 01100 01110 After analyzing the unique characteristics of every character, I can significantly reduce the number of tests that need to be performed to test for a character. For example, for the "I" character, I could define it's unique characteristics as having a black pixel in the coordinate (3, 0) since no other characters have that pixel as black. So instead of testing 110 pixels for a match on the "I" character, I reduced it to a 1 pixel test. This is what it might look like for all these characters: var LetterI = new OcrLetter() { Name = "I", BlackPixels = new List<Point>() { new Point (3, 0) } } var LetterA = new OcrLetter() { Name = "A", WhitePixels = new List<Point>() { new Point(2, 4) } } var LetterO = new OcrLetter() { Name = "O", BlackPixels = new List<Point>() { new Point(3, 2) }, WhitePixels = new List<Point>() { new Point(2, 2) } } var LetterB = new OcrLetter() { Name = "B", BlackPixels = new List<Point>() { new Point(3, 1) }, WhitePixels = new List<Point>() { new Point(3, 2) } } var LetterL = new OcrLetter() { Name = "L", BlackPixels = new List<Point>() { new Point(1, 1), new Point(3, 4) }, WhitePixels = new List<Point>() { new Point(2, 2) } } This is challenging to do manually for 5 characters and gets much harder the greater the amount of letters that are added. You also want to guarantee that you have the minimum set of unique characteristics of a letter since you want it to be optimized as much as possible. I want to create an algorithm that will identify the unique characteristics of all the letters and would generate similar code to that above. I would then use this optimized black/white matrix to identify characters. How do I take the 26 letters that have all their black/white pixels filled in (e.g. the CreateLetter code block) and convert them to an optimized set of unique characteristics that define a letter (e.g. the new OcrLetter() code block)? And how would I guarantee that it is the most efficient definition set of unique characteristics (e.g. instead of defining 6 points as the unique characteristics, there might be a way to do it with 1 or 2 points, as the letter "I" in my example was able to). An alternative solution I've come up with is using a hash table, which will reduce it from 2,860 iterations to 110 iterations, a 26 time reduction. This is how it might work: I would populate it with data similar to the following: Letters["01110 00100 00100 00100 01110"] = "I"; Letters["00100 01010 01110 01010 01010"] = "A"; Letters["00100 01010 01010 01010 00100"] = "O"; Letters["01100 01010 01100 01010 01100"] = "B"; Now when I reach a location in the image to process, I convert it to a string such as: "01110 00100 00100 00100 01110" and simply find it in the hash table. This solution seems very simple, however, this still requires 110 iterations to generate this string for each letter. In big O notation, the algorithm is the same since O(110N) = O(2860N) = O(N) for N letters to process on the page. However, it is still improved by a constant factor of 26, a significant improvement (e.g. instead of it taking 26 minutes, it would take 1 minute). Update: Most of the solutions provided so far have not addressed the issue of identifying the unique characteristics of a character and rather provide alternative solutions. I am still looking for this solution which, as far as I can tell, is the only way to achieve the fastest OCR processing. I just came up with a partial solution: For each pixel, in the grid, store the letters that have it as a black pixel. Using these letters: I A O B L 01110 00100 00100 01100 01000 00100 01010 01010 01010 01000 00100 01110 01010 01100 01000 00100 01010 01010 01010 01000 01110 01010 00100 01100 01110 You would have something like this: CreatePixel(new Point(0, 0), new List<Char>() { }); CreatePixel(new Point(1, 0), new List<Char>() { 'I', 'B', 'L' }); CreatePixel(new Point(2, 0), new List<Char>() { 'I', 'A', 'O', 'B' }); CreatePixel(new Point(3, 0), new List<Char>() { 'I' }); CreatePixel(new Point(4, 0), new List<Char>() { }); CreatePixel(new Point(0, 1), new List<Char>() { }); CreatePixel(new Point(1, 1), new List<Char>() { 'A', 'B', 'L' }); CreatePixel(new Point(2, 1), new List<Char>() { 'I' }); CreatePixel(new Point(3, 1), new List<Char>() { 'A', 'O', 'B' }); // ... CreatePixel(new Point(2, 2), new List<Char>() { 'I', 'A', 'B' }); CreatePixel(new Point(3, 2), new List<Char>() { 'A', 'O' }); // ... CreatePixel(new Point(2, 4), new List<Char>() { 'I', 'O', 'B', 'L' }); CreatePixel(new Point(3, 4), new List<Char>() { 'I', 'A', 'L' }); CreatePixel(new Point(4, 4), new List<Char>() { }); Now for every letter, in order to find the unique characteristics, you need to look at which buckets it belongs to, as well as the amount of other characters in the bucket. So let's take the example of "I". We go to all the buckets it belongs to (1,0; 2,0; 3,0; ...; 3,4) and see that the one with the least amount of other characters is (3,0). In fact, it only has 1 character, meaning it must be an "I" in this case, and we found our unique characteristic. You can also do the same for pixels that would be white. Notice that bucket (2,0) contains all the letters except for "L", this means that it could be used as a white pixel test. Similarly, (2,4) doesn't contain an 'A'. Buckets that either contain all the letters or none of the letters can be discarded immediately, since these pixels can't help define a unique characteristic (e.g. 1,1; 4,0; 0,1; 4,4). It gets trickier when you don't have a 1 pixel test for a letter, for example in the case of 'O' and 'B'. Let's walk through the test for 'O'... It's contained in the following buckets: // Bucket Count Letters // 2,0 4 I, A, O, B // 3,1 3 A, O, B // 3,2 2 A, O // 2,4 4 I, O, B, L Additionally, we also have a few white pixel tests that can help: (I only listed those that are missing at most 2). The Missing Count was calculated as (5 - Bucket.Count). // Bucket Missing Count Missing Letters // 1,0 2 A, O // 1,1 2 I, O // 2,2 2 O, L // 3,4 2 O, B So now we can take the shortest black pixel bucket (3,2) and see that when we test for (3,2) we know it is either an 'A' or an 'O'. So we need an easy way to tell the difference between an 'A' and an 'O'. We could either look for a black pixel bucket that contains 'O' but not 'A' (e.g. 2,4) or a white pixel bucket that contains an 'O' but not an 'A' (e.g. 1,1). Either of these could be used in combination with the (3,2) pixel to uniquely identify the letter 'O' with only 2 tests. This seems like a simple algorithm when there are 5 characters, but how would I do this when there are 26 letters and a lot more pixels overlapping? For example, let's say that after the (3,2) pixel test, it found 10 different characters that contain the pixel (and this was the least from all the buckets). Now I need to find differences from 9 other characters instead of only 1 other character. How would I achieve my goal of getting the least amount of checks as possible, and ensure that I am not running extraneous tests?

Read the article
Agile Development

- by James Oloo Onyango

Alot of literature has and is being written about agile developement and its surrounding philosophies. In my quest to find the best way to express the importance of agile methodologies, i have found Robert C. Martin's "A Satire Of Two Companies" to be both the most concise and thorough! Enjoy the read! Rufus Inc Project Kick Off Your name is Bob. The date is January 3, 2001, and your head still aches from the recent millennial revelry. You are sitting in a conference room with several managers and a group of your peers. You are a project team leader. Your boss is there, and he has brought along all of his team leaders. His boss called the meeting. "We have a new project to develop," says your boss's boss. Call him BB. The points in his hair are so long that they scrape the ceiling. Your boss's points are just starting to grow, but he eagerly awaits the day when he can leave Brylcream stains on the acoustic tiles. BB describes the essence of the new market they have identified and the product they want to develop to exploit this market. "We must have this new project up and working by fourth quarter October 1," BB demands. "Nothing is of higher priority, so we are cancelling your current project." The reaction in the room is stunned silence. Months of work are simply going to be thrown away. Slowly, a murmur of objection begins to circulate around the conference table. His points give off an evil green glow as BB meets the eyes of everyone in the room. One by one, that insidious stare reduces each attendee to quivering lumps of protoplasm. It is clear that he will brook no discussion on this matter. Once silence has been restored, BB says, "We need to begin immediately. How long will it take you to do the analysis?" You raise your hand. Your boss tries to stop you, but his spitwad misses you and you are unaware of his efforts. "Sir, we can't tell you how long the analysis will take until we have some requirements." "The requirements document won't be ready for 3 or 4 weeks," BB says, his points vibrating with frustration. "So, pretend that you have the requirements in front of you now. How long will you require for analysis?" No one breathes. Everyone looks around to see whether anyone has some idea. "If analysis goes beyond April 1, we have a problem. Can you finish the analysis by then?" Your boss visibly gathers his courage: "We'll find a way, sir!" His points grow 3 mm, and your headache increases by two Tylenol. "Good." BB smiles. "Now, how long will it take to do the design?" "Sir," you say. Your boss visibly pales. He is clearly worried that his 3 mms are at risk. "Without an analysis, it will not be possible to tell you how long design will take." BB's expression shifts beyond austere. "PRETEND you have the analysis already!" he says, while fixing you with his vacant, beady little eyes. "How long will it take you to do the design?" Two Tylenol are not going to cut it. Your boss, in a desperate attempt to save his new growth, babbles: "Well, sir, with only six months left to complete the project, design had better take no longer than 3 months." "I'm glad you agree, Smithers!" BB says, beaming. Your boss relaxes. He knows his points are secure. After a while, he starts lightly humming the Brylcream jingle. BB continues, "So, analysis will be complete by April 1, design will be complete by July 1, and that gives you 3 months to implement the project. This meeting is an example of how well our new consensus and empowerment policies are working. Now, get out there and start working. I'll expect to see TQM plans and QIT assignments on my desk by next week. Oh, and don't forget that your crossfunctional team meetings and reports will be needed for next month's quality audit." "Forget the Tylenol," you think to yourself as you return to your cubicle. "I need bourbon." Visibly excited, your boss comes over to you and says, "Gosh, what a great meeting. I think we're really going to do some world shaking with this project." You nod in agreement, too disgusted to do anything else. "Oh," your boss continues, "I almost forgot." He hands you a 30-page document. "Remember that the SEI is coming to do an evaluation next week. This is the evaluation guide. You need to read through it, memorize it, and then shred it. It tells you how to answer any questions that the SEI auditors ask you. It also tells you what parts of the building you are allowed to take them to and what parts to avoid. We are determined to be a CMM level 3 organization by June!" You and your peers start working on the analysis of the new project. This is difficult because you have no requirements. But from the 10-minute introduction given by BB on that fateful morning, you have some idea of what the product is supposed to do. Corporate process demands that you begin by creating a use case document. You and your team begin enumerating use cases and drawing oval and stick diagrams. Philosophical debates break out among the team members. There is disagreement as to whether certain use cases should be connected with <<extends>> or <<includes>> relationships. Competing models are created, but nobody knows how to evaluate them. The debate continues, effectively paralyzing progress. After a week, somebody finds the iceberg.com Web site, which recommends disposing entirely of <<extends>> and <<includes>> and replacing them with <<precedes>> and <<uses>>. The documents on this Web site, authored by Don Sengroiux, describes a method known as stalwart-analysis, which claims to be a step-by-step method for translating use cases into design diagrams. More competing use case models are created using this new scheme, but again, people can't agree on how to evaluate them. The thrashing continues. More and more, the use case meetings are driven by emotion rather than by reason. If it weren't for the fact that you don't have requirements, you'd be pretty upset by the lack of progress you are making. The requirements document arrives on February 15. And then again on February 20, 25, and every week thereafter. Each new version contradicts the previous one. Clearly, the marketing folks who are writing the requirements, empowered though they might be, are not finding consensus. At the same time, several new competing use case templates have been proposed by the various team members. Each template presents its own particularly creative way of delaying progress. The debates rage on. On March 1, Prudence Putrigence, the process proctor, succeeds in integrating all the competing use case forms and templates into a single, all-encompassing form. Just the blank form is 15 pages long. She has managed to include every field that appeared on all the competing templates. She also presents a 159- page document describing how to fill out the use case form. All current use cases must be rewritten according to the new standard. You marvel to yourself that it now requires 15 pages of fill-in-the-blank and essay questions to answer the question: What should the system do when the user presses Return? The corporate process (authored by L. E. Ott, famed author of "Holistic Analysis: A Progressive Dialectic for Software Engineers") insists that you discover all primary use cases, 87 percent of all secondary use cases, and 36.274 percent of all tertiary use cases before you can complete analysis and enter the design phase. You have no idea what a tertiary use case is. So in an attempt to meet this requirement, you try to get your use case document reviewed by the marketing department, which you hope will know what a tertiary use case is. Unfortunately, the marketing folks are too busy with sales support to talk to you. Indeed, since the project started, you have not been able to get a single meeting with marketing, which has provided a never-ending stream of changing and contradictory requirements documents. While one team has been spinning endlessly on the use case document, another team has been working out the domain model. Endless variations of UML documents are pouring out of this team. Every week, the model is reworked. The team members can't decide whether to use <<interfaces>> or <<types>> in the model. A huge disagreement has been raging on the proper syntax and application of OCL. Others on the team just got back from a 5-day class on catabolism, and have been producing incredibly detailed and arcane diagrams that nobody else can fathom. On March 27, with one week to go before analysis is to be complete, you have produced a sea of documents and diagrams but are no closer to a cogent analysis of the problem than you were on January 3. **** And then, a miracle happens. **** On Saturday, April 1, you check your e-mail from home. You see a memo from your boss to BB. It states unequivocally that you are done with the analysis! You phone your boss and complain. "How could you have told BB that we were done with the analysis?" "Have you looked at a calendar lately?" he responds. "It's April 1!" The irony of that date does not escape you. "But we have so much more to think about. So much more to analyze! We haven't even decided whether to use <<extends>> or <<precedes>>!" "Where is your evidence that you are not done?" inquires your boss, impatiently. "Whaaa . . . ." But he cuts you off. "Analysis can go on forever; it has to be stopped at some point. And since this is the date it was scheduled to stop, it has been stopped. Now, on Monday, I want you to gather up all existing analysis materials and put them into a public folder. Release that folder to Prudence so that she can log it in the CM system by Monday afternoon. Then get busy and start designing." As you hang up the phone, you begin to consider the benefits of keeping a bottle of bourbon in your bottom desk drawer. They threw a party to celebrate the on-time completion of the analysis phase. BB gave a colon-stirring speech on empowerment. And your boss, another 3 mm taller, congratulated his team on the incredible show of unity and teamwork. Finally, the CIO takes the stage to tell everyone that the SEI audit went very well and to thank everyone for studying and shredding the evaluation guides that were passed out. Level 3 now seems assured and will be awarded by June. (Scuttlebutt has it that managers at the level of BB and above are to receive significant bonuses once the SEI awards level 3.) As the weeks flow by, you and your team work on the design of the system. Of course, you find that the analysis that the design is supposedly based on is flawedno, useless; no, worse than useless. But when you tell your boss that you need to go back and work some more on the analysis to shore up its weaker sections, he simply states, "The analysis phase is over. The only allowable activity is design. Now get back to it." So, you and your team hack the design as best you can, unsure of whether the requirements have been properly analyzed. Of course, it really doesn't matter much, since the requirements document is still thrashing with weekly revisions, and the marketing department still refuses to meet with you. The design is a nightmare. Your boss recently misread a book named The Finish Line in which the author, Mark DeThomaso, blithely suggested that design documents should be taken down to code-level detail. "If we are going to be working at that level of detail," you ask, "why don't we simply write the code instead?" "Because then you wouldn't be designing, of course. And the only allowable activity in the design phase is design!" "Besides," he continues, "we have just purchased a companywide license for Dandelion! This tool enables 'Round the Horn Engineering!' You are to transfer all design diagrams into this tool. It will automatically generate our code for us! It will also keep the design diagrams in sync with the code!" Your boss hands you a brightly colored shrinkwrapped box containing the Dandelion distribution. You accept it numbly and shuffle off to your cubicle. Twelve hours, eight crashes, one disk reformatting, and eight shots of 151 later, you finally have the tool installed on your server. You consider the week your team will lose while attending Dandelion training. Then you smile and think, "Any week I'm not here is a good week." Design diagram after design diagram is created by your team. Dandelion makes it very difficult to draw these diagrams. There are dozens and dozens of deeply nested dialog boxes with funny text fields and check boxes that must all be filled in correctly. And then there's the problem of moving classes between packages. At first, these diagram are driven from the use cases. But the requirements are changing so often that the use cases rapidly become meaningless. Debates rage about whether VISITOR or DECORATOR design patterns should be used. One developer refuses to use VISITOR in any form, claiming that it's not a properly object-oriented construct. Someone refuses to use multiple inheritance, since it is the spawn of the devil. Review meetings rapidly degenerate into debates about the meaning of object orientation, the definition of analysis versus design, or when to use aggregation versus association. Midway through the design cycle, the marketing folks announce that they have rethought the focus of the system. Their new requirements document is completely restructured. They have eliminated several major feature areas and replaced them with feature areas that they anticipate customer surveys will show to be more appropriate. You tell your boss that these changes mean that you need to reanalyze and redesign much of the system. But he says, "The analysis phase is system. But he says, "The analysis phase is over. The only allowable activity is design. Now get back to it." You suggest that it might be better to create a simple prototype to show to the marketing folks and even some potential customers. But your boss says, "The analysis phase is over. The only allowable activity is design. Now get back to it." Hack, hack, hack, hack. You try to create some kind of a design document that might reflect the new requirements documents. However, the revolution of the requirements has not caused them to stop thrashing. Indeed, if anything, the wild oscillations of the requirements document have only increased in frequency and amplitude. You slog your way through them. On June 15, the Dandelion database gets corrupted. Apparently, the corruption has been progressive. Small errors in the DB accumulated over the months into bigger and bigger errors. Eventually, the CASE tool just stopped working. Of course, the slowly encroaching corruption is present on all the backups. Calls to the Dandelion technical support line go unanswered for several days. Finally, you receive a brief e-mail from Dandelion, informing you that this is a known problem and that the solution is to purchase the new version, which they promise will be ready some time next quarter, and then reenter all the diagrams by hand. **** Then, on July 1 another miracle happens! You are done with the design! Rather than go to your boss and complain, you stock your middle desk drawer with some vodka. **** They threw a party to celebrate the on-time completion of the design phase and their graduation to CMM level 3. This time, you find BB's speech so stirring that you have to use the restroom before it begins. New banners and plaques are all over your workplace. They show pictures of eagles and mountain climbers, and they talk about teamwork and empowerment. They read better after a few scotches. That reminds you that you need to clear out your file cabinet to make room for the brandy. You and your team begin to code. But you rapidly discover that the design is lacking in some significant areas. Actually, it's lacking any significance at all. You convene a design session in one of the conference rooms to try to work through some of the nastier problems. But your boss catches you at it and disbands the meeting, saying, "The design phase is over. The only allowable activity is coding. Now get back to it." **** The code generated by Dandelion is really hideous. It turns out that you and your team were using association and aggregation the wrong way, after all. All the generated code has to be edited to correct these flaws. Editing this code is extremely difficult because it has been instrumented with ugly comment blocks that have special syntax that Dandelion needs in order to keep the diagrams in sync with the code. If you accidentally alter one of these comments, the diagrams will be regenerated incorrectly. It turns out that "Round the Horn Engineering" requires an awful lot of effort. The more you try to keep the code compatible with Dandelion, the more errors Dandelion generates. In the end, you give up and decide to keep the diagrams up to date manually. A second later, you decide that there's no point in keeping the diagrams up to date at all. Besides, who has time? Your boss hires a consultant to build tools to count the number of lines of code that are being produced. He puts a big thermometer graph on the wall with the number 1,000,000 on the top. Every day, he extends the red line to show how many lines have been added. Three days after the thermometer appears on the wall, your boss stops you in the hall. "That graph isn't growing quickly enough. We need to have a million lines done by October 1." "We aren't even sh-sh-sure that the proshect will require a m-million linezh," you blather. "We have to have a million lines done by October 1," your boss reiterates. His points have grown again, and the Grecian formula he uses on them creates an aura of authority and competence. "Are you sure your comment blocks are big enough?" Then, in a flash of managerial insight, he says, "I have it! I want you to institute a new policy among the engineers. No line of code is to be longer than 20 characters. Any such line must be split into two or more preferably more. All existing code needs to be reworked to this standard. That'll get our line count up!" You decide not to tell him that this will require two unscheduled work months. You decide not to tell him anything at all. You decide that intravenous injections of pure ethanol are the only solution. You make the appropriate arrangements. Hack, hack, hack, and hack. You and your team madly code away. By August 1, your boss, frowning at the thermometer on the wall, institutes a mandatory 50-hour workweek. Hack, hack, hack, and hack. By September 1st, the thermometer is at 1.2 million lines and your boss asks you to write a report describing why you exceeded the coding budget by 20 percent. He institutes mandatory Saturdays and demands that the project be brought back down to a million lines. You start a campaign of remerging lines. Hack, hack, hack, and hack. Tempers are flaring; people are quitting; QA is raining trouble reports down on you. Customers are demanding installation and user manuals; salespeople are demanding advance demonstrations for special customers; the requirements document is still thrashing, the marketing folks are complaining that the product isn't anything like they specified, and the liquor store won't accept your credit card anymore. Something has to give. On September 15, BB calls a meeting. As he enters the room, his points are emitting clouds of steam. When he speaks, the bass overtones of his carefully manicured voice cause the pit of your stomach to roll over. "The QA manager has told me that this project has less than 50 percent of the required features implemented. He has also informed me that the system crashes all the time, yields wrong results, and is hideously slow. He has also complained that he cannot keep up with the continuous train of daily releases, each more buggy than the last!" He stops for a few seconds, visibly trying to compose himself. "The QA manager estimates that, at this rate of development, we won't be able to ship the product until December!" Actually, you think it's more like March, but you don't say anything. "December!" BB roars with such derision that people duck their heads as though he were pointing an assault rifle at them. "December is absolutely out of the question. Team leaders, I want new estimates on my desk in the morning. I am hereby mandating 65-hour work weeks until this project is complete. And it better be complete by November 1." As he leaves the conference room, he is heard to mutter: "Empowermentbah!" * * * Your boss is bald; his points are mounted on BB's wall. The fluorescent lights reflecting off his pate momentarily dazzle you. "Do you have anything to drink?" he asks. Having just finished your last bottle of Boone's Farm, you pull a bottle of Thunderbird from your bookshelf and pour it into his coffee mug. "What's it going to take to get this project done? " he asks. "We need to freeze the requirements, analyze them, design them, and then implement them," you say callously. "By November 1?" your boss exclaims incredulously. "No way! Just get back to coding the damned thing." He storms out, scratching his vacant head. A few days later, you find that your boss has been transferred to the corporate research division. Turnover has skyrocketed. Customers, informed at the last minute that their orders cannot be fulfilled on time, have begun to cancel their orders. Marketing is re-evaluating whether this product aligns with the overall goals of the company. Memos fly, heads roll, policies change, and things are, overall, pretty grim. Finally, by March, after far too many sixty-five hour weeks, a very shaky version of the software is ready. In the field, bug-discovery rates are high, and the technical support staff are at their wits' end, trying to cope with the complaints and demands of the irate customers. Nobody is happy. In April, BB decides to buy his way out of the problem by licensing a product produced by Rupert Industries and redistributing it. The customers are mollified, the marketing folks are smug, and you are laid off. Rupert Industries: Project Alpha Your name is Robert. The date is January 3, 2001. The quiet hours spent with your family this holiday have left you refreshed and ready for work. You are sitting in a conference room with your team of professionals. The manager of the division called the meeting. "We have some ideas for a new project," says the division manager. Call him Russ. He is a high-strung British chap with more energy than a fusion reactor. He is ambitious and driven but understands the value of a team. Russ describes the essence of the new market opportunity the company has identified and introduces you to Jane, the marketing manager, who is responsible for defining the products that will address it. Addressing you, Jane says, "We'd like to start defining our first product offering as soon as possible. When can you and your team meet with me?" You reply, "We'll be done with the current iteration of our project this Friday. We can spare a few hours for you between now and then. After that, we'll take a few people from the team and dedicate them to you. We'll begin hiring their replacements and the new people for your team immediately." "Great," says Russ, "but I want you to understand that it is critical that we have something to exhibit at the trade show coming up this July. If we can't be there with something significant, we'll lose the opportunity." "I understand," you reply. "I don't yet know what it is that you have in mind, but I'm sure we can have something by July. I just can't tell you what that something will be right now. In any case, you and Jane are going to have complete control over what we developers do, so you can rest assured that by July, you'll have the most important things that can be accomplished in that time ready to exhibit." Russ nods in satisfaction. He knows how this works. Your team has always kept him advised and allowed him to steer their development. He has the utmost confidence that your team will work on the most important things first and will produce a high-quality product. * * * "So, Robert," says Jane at their first meeting, "How does your team feel about being split up?" "We'll miss working with each other," you answer, "but some of us were getting pretty tired of that last project and are looking forward to a change. So, what are you people cooking up?" Jane beams. "You know how much trouble our customers currently have . . ." And she spends a half hour or so describing the problem and possible solution. "OK, wait a second" you respond. "I need to be clear about this." And so you and Jane talk about how this system might work. Some of her ideas aren't fully formed. You suggest possible solutions. She likes some of them. You continue discussing. During the discussion, as each new topic is addressed, Jane writes user story cards. Each card represents something that the new system has to do. The cards accumulate on the table and are spread out in front of you. Both you and Jane point at them, pick them up, and make notes on them as you discuss the stories. The cards are powerful mnemonic devices that you can use to represent complex ideas that are barely formed. At the end of the meeting, you say, "OK, I've got a general idea of what you want. I'm going to talk to the team about it. I imagine they'll want to run some experiments with various database structures and presentation formats. Next time we meet, it'll be as a group, and we'll start identifying the most important features of the system." A week later, your nascent team meets with Jane. They spread the existing user story cards out on the table and begin to get into some of the details of the system. The meeting is very dynamic. Jane presents the stories in the order of their importance. There is much discussion about each one. The developers are concerned about keeping the stories small enough to estimate and test. So they continually ask Jane to split one story into several smaller stories. Jane is concerned that each story have a clear business value and priority, so as she splits them, she makes sure that this stays true. The stories accumulate on the table. Jane writes them, but the developers make notes on them as needed. Nobody tries to capture everything that is said; the cards are not meant to capture everything but are simply reminders of the conversation. As the developers become more comfortable with the stories, they begin writing estimates on them. These estimates are crude and budgetary, but they give Jane an idea of what the story will cost. At the end of the meeting, it is clear that many more stories could be discussed. It is also clear that the most important stories have been addressed and that they represent several months worth of work. Jane closes the meeting by taking the cards with her and promising to have a proposal for the first release in the morning. * * * The next morning, you reconvene the meeting. Jane chooses five cards and places them on the table. "According to your estimates, these cards represent about one perfect team-week's worth of work. The last iteration of the previous project managed to get one perfect team-week done in 3 real weeks. If we can get these five stories done in 3 weeks, we'll be able to demonstrate them to Russ. That will make him feel very comfortable about our progress." Jane is pushing it. The sheepish look on her face lets you know that she knows it too. You reply, "Jane, this is a new team, working on a new project. It's a bit presumptuous to expect that our velocity will be the same as the previous team's. However, I met with the team yesterday afternoon, and we all agreed that our initial velocity should, in fact, be set to one perfectweek for every 3 real-weeks. So you've lucked out on this one." "Just remember," you continue, "that the story estimates and the story velocity are very tentative at this point. We'll learn more when we plan the iteration and even more when we implement it." Jane looks over her glasses at you as if to say "Who's the boss around here, anyway?" and then smiles and says, "Yeah, don't worry. I know the drill by now."Jane then puts 15 more cards on the table. She says, "If we can get all these cards done by the end of March, we can turn the system over to our beta test customers. And we'll get good feedback from them." You reply, "OK, so we've got our first iteration defined, and we have the stories for the next three iterations after that. These four iterations will make our first release." "So," says Jane, can you really do these five stories in the next 3 weeks?" "I don't know for sure, Jane," you reply. "Let's break them down into tasks and see what we get." So Jane, you, and your team spend the next several hours taking each of the five stories that Jane chose for the first iteration and breaking them down into small tasks. The developers quickly realize that some of the tasks can be shared between stories and that other tasks have commonalities that can probably be taken advantage of. It is clear that potential designs are popping into the developers' heads. From time to time, they form little discussion knots and scribble UML diagrams on some cards. Soon, the whiteboard is filled with the tasks that, once completed, will implement the five stories for this iteration. You start the sign-up process by saying, "OK, let's sign up for these tasks." "I'll take the initial database generation." Says Pete. "That's what I did on the last project, and this doesn't look very different. I estimate it at two of my perfect workdays." "OK, well, then, I'll take the login screen," says Joe. "Aw, darn," says Elaine, the junior member of the team, "I've never done a GUI, and kinda wanted to try that one." "Ah, the impatience of youth," Joe says sagely, with a wink in your direction. "You can assist me with it, young Jedi." To Jane: "I think it'll take me about three of my perfect workdays." One by one, the developers sign up for tasks and estimate them in terms of their own perfect workdays. Both you and Jane know that it is best to let the developers volunteer for tasks than to assign the tasks to them. You also know full well that you daren't challenge any of the developers' estimates. You know these people, and you trust them. You know that they are going to do the very best they can. The developers know that they can't sign up for more perfect workdays than they finished in the last iteration they worked on. Once each developer has filled his or her schedule for the iteration, they stop signing up for tasks. Eventually, all the developers have stopped signing up for tasks. But, of course, tasks are still left on the board. "I was worried that that might happen," you say, "OK, there's only one thing to do, Jane. We've got too much to do in this iteration. What stories or tasks can we remove?" Jane sighs. She knows that this is the only option. Working overtime at the beginning of a project is insane, and projects where she's tried it have not fared well. So Jane starts to remove the least-important functionality. "Well, we really don't need the login screen just yet. We can simply start the system in the logged-in state." "Rats!" cries Elaine. "I really wanted to do that." "Patience, grasshopper." says Joe. "Those who wait for the bees to leave the hive will not have lips too swollen to relish the honey." Elaine looks confused. Everyone looks confused. "So . . .," Jane continues, "I think we can also do away with . . ." And so, bit by bit, the list of tasks shrinks. Developers who lose a task sign up for one of the remaining ones. The negotiation is not painless. Several times, Jane exhibits obvious frustration and impatience. Once, when tensions are especially high, Elaine volunteers, "I'll work extra hard to make up some of the missing time." You are about to correct her when, fortunately, Joe looks her in the eye and says, "When once you proceed down the dark path, forever will it dominate your destiny." In the end, an iteration acceptable to Jane is reached. It's not what Jane wanted. Indeed, it is significantly less. But it's something the team feels that can be achieved in the next 3 weeks. And, after all, it still addresses the most important things that Jane wanted in the iteration. "So, Jane," you say when things had quieted down a bit, "when can we expect acceptance tests from you?" Jane sighs. This is the other side of the coin. For every story the development team implements, Jane must supply a suite of acceptance tests that prove that it works. And the team needs these long before the end of the iteration, since they will certainly point out differences in the way Jane and the developers imagine the system's behaviour. "I'll get you some example test scripts today," Jane promises. "I'll add to them every day after that. You'll have the entire suite by the middle of the iteration." * * * The iteration begins on Monday morning with a flurry of Class, Responsibilities, Collaborators sessions. By midmorning, all the developers have assembled into pairs and are rapidly coding away. "And now, my young apprentice," Joe says to Elaine, "you shall learn the mysteries of test-first design!" "Wow, that sounds pretty rad," Elaine replies. "How do you do it?" Joe beams. It's clear that he has been anticipating this moment. "OK, what does the code do right now?" "Huh?" replied Elaine, "It doesn't do anything at all; there is no code." "So, consider our task; can you think of something the code should do?" "Sure," Elaine said with youthful assurance, "First, it should connect to the database." "And thereupon, what must needs be required to connecteth the database?" "You sure talk weird," laughed Elaine. "I think we'd have to get the database object from some registry and call the Connect() method. "Ah, astute young wizard. Thou perceives correctly that we requireth an object within which we can cacheth the database object." "Is 'cacheth' really a word?" "It is when I say it! So, what test can we write that we know the database registry should pass?" Elaine sighs. She knows she'll just have to play along. "We should be able to create a database object and pass it to the registry in a Store() method. And then we should be able to pull it out of the registry with a Get() method and make sure it's the same object." "Oh, well said, my prepubescent sprite!" "Hay!" "So, now, let's write a test function that proves your case." "But shouldn't we write the database object and registry object first?" "Ah, you've much to learn, my young impatient one. Just write the test first." "But it won't even compile!" "Are you sure? What if it did?" "Uh . . ." "Just write the test, Elaine. Trust me." And so Joe, Elaine, and all the other developers began to code their tasks, one test case at a time. The room in which they worked was abuzz with the conversations between the pairs. The murmur was punctuated by an occasional high five when a pair managed to finish a task or a difficult test case. As development proceeded, the developers changed partners once or twice a day. Each developer got to see what all the others were doing, and so knowledge of the code spread generally throughout the team. Whenever a pair finished something significant whether a whole task or simply an important part of a task they integrated what they had with the rest of the system. Thus, the code base grew daily, and integration difficulties were minimized. The developers communicated with Jane on a daily basis. They'd go to her whenever they had a question about the functionality of the system or the interpretation of an acceptance test case. Jane, good as her word, supplied the team with a steady stream of acceptance test scripts. The team read these carefully and thereby gained a much better understanding of what Jane expected the system to do. By the beginning of the second week, there was enough functionality to demonstrate to Jane. She watched eagerly as the demonstration passed test case after test case. "This is really cool," Jane said as the demonstration finally ended. "But this doesn't seem like one-third of the tasks. Is your velocity slower than anticipated?" You grimace. You'd been waiting for a good time to mention this to Jane but now she was forcing the issue. "Yes, unfortunately, we are going more slowly than we had expected. The new application server we are using is turning out to be a pain to configure. Also, it takes forever to reboot, and we have to reboot it whenever we make even the slightest change to its configuration." Jane eyes you with suspicion. The stress of last Monday's negotiations had still not entirely dissipated. She says, "And what does this mean to our schedule? We can't slip it again, we just can't. Russ will have a fit! He'll haul us all into the woodshed and ream us some new ones." You look Jane right in the eyes. There's no pleasant way to give someone news like this. So you just blurt out, "Look, if things keep going like they're going, we're not going to be done with everything by next Friday. Now it's possible that we'll figure out a way to go faster. But, frankly, I wouldn't depend on that. You should start thinking about one or two tasks that could be removed from the iteration without ruining the demonstration for Russ. Come hell or high water, we are going to give that demonstration on Friday, and I don't think you want us to choose which tasks to omit." "Aw forchrisakes!" Jane barely manages to stifle yelling that last word as she stalks away, shaking her head. Not for the first time, you say to yourself, "Nobody ever promised me project management would be easy." You are pretty sure it won't be the last time, either. Actually, things went a bit better than you had hoped. The team did, in fact, have to drop one task from the iteration, but Jane had chosen wisely, and the demonstration for Russ went without a hitch. Russ was not impressed with the progress, but neither was he dismayed. He simply said, "This is pretty good. But remember, we have to be able to demonstrate this system at the trade show in July, and at this rate, it doesn't look like you'll have all that much to show." Jane, whose attitude had improved dramatically with the completion of the iteration, responded to Russ by saying, "Russ, this team is working hard, and well. When July comes around, I am confident that we'll have something significant to demonstrate. It won't be everything, and some of it may be smoke and mirrors, but we'll have something." Painful though the last iteration was, it had calibrated your velocity numbers. The next iteration went much better. Not because your team got more done than in the last iteration but simply because the team didn't have to remove any tasks or stories in the middle of the iteration. By the start of the fourth iteration, a natural rhythm has been established. Jane, you, and the team know exactly what to expect from one another. The team is running hard, but the pace is sustainable. You are confident that the team can keep up this pace for a year or more. The number of surprises in the schedule diminishes to near zero; however, the number of surprises in the requirements does not. Jane and Russ frequently look over the growing system and make recommendations or changes to the existing functionality. But all parties realize that these changes take time and must be scheduled. So the changes do not cause anyone's expectations to be violated. In March, there is a major demonstration of the system to the board of directors. The system is very limited and is not yet in a form good enough to take to the trade show, but progress is steady, and the board is reasonably impressed. The second release goes even more smoothly than the first. By now, the team has figured out a way to automate Jane's acceptance test scripts. The team has also refactored the design of the system to the point that it is really easy to add new features and change old ones. The second release was done by the end of June and was taken to the trade show. It had less in it than Jane and Russ would have liked, but it did demonstrate the most important features of the system. Although customers at the trade show noticed that certain features were missing, they were very impressed overall. You, Russ, and Jane all returned from the trade show with smiles on your faces. You all felt as though this project was a winner. Indeed, many months later, you are contacted by Rufus Inc. That company had been working on a system like this for its internal operations. Rufus has canceled the development of that system after a death-march project and is negotiating to license your technology for its environment. Indeed, things are looking up!

Read the article
What are good design practices when working with Entity Framework

- by AD

This will apply mostly for an asp.net application where the data is not accessed via soa. Meaning that you get access to the objects loaded from the framework, not Transfer Objects, although some recommendation still apply. This is a community post, so please add to it as you see fit. Applies to: Entity Framework 1.0 shipped with Visual Studio 2008 sp1. Why pick EF in the first place? Considering it is a young technology with plenty of problems (see below), it may be a hard sell to get on the EF bandwagon for your project. However, it is the technology Microsoft is pushing (at the expense of Linq2Sql, which is a subset of EF). In addition, you may not be satisfied with NHibernate or other solutions out there. Whatever the reasons, there are people out there (including me) working with EF and life is not bad.make you think. EF and inheritance The first big subject is inheritance. EF does support mapping for inherited classes that are persisted in 2 ways: table per class and table the hierarchy. The modeling is easy and there are no programming issues with that part. (The following applies to table per class model as I don't have experience with table per hierarchy, which is, anyway, limited.) The real problem comes when you are trying to run queries that include one or many objects that are part of an inheritance tree: the generated sql is incredibly awful, takes a long time to get parsed by the EF and takes a long time to execute as well. This is a real show stopper. Enough that EF should probably not be used with inheritance or as little as possible. Here is an example of how bad it was. My EF model had ~30 classes, ~10 of which were part of an inheritance tree. On running a query to get one item from the Base class, something as simple as Base.Get(id), the generated SQL was over 50,000 characters. Then when you are trying to return some Associations, it degenerates even more, going as far as throwing SQL exceptions about not being able to query more than 256 tables at once. Ok, this is bad, EF concept is to allow you to create your object structure without (or with as little as possible) consideration on the actual database implementation of your table. It completely fails at this. So, recommendations? Avoid inheritance if you can, the performance will be so much better. Use it sparingly where you have to. In my opinion, this makes EF a glorified sql-generation tool for querying, but there are still advantages to using it. And ways to implement mechanism that are similar to inheritance. Bypassing inheritance with Interfaces First thing to know with trying to get some kind of inheritance going with EF is that you cannot assign a non-EF-modeled class a base class. Don't even try it, it will get overwritten by the modeler. So what to do? You can use interfaces to enforce that classes implement some functionality. For example here is a IEntity interface that allow you to define Associations between EF entities where you don't know at design time what the type of the entity would be. public enum EntityTypes{ Unknown = -1, Dog = 0, Cat } public interface IEntity { int EntityID { get; } string Name { get; } Type EntityType { get; } } public partial class Dog : IEntity { // implement EntityID and Name which could actually be fields // from your EF model Type EntityType{ get{ return EntityTypes.Dog; } } } Using this IEntity, you can then work with undefined associations in other classes // lets take a class that you defined in your model. // that class has a mapping to the columns: PetID, PetType public partial class Person { public IEntity GetPet() { return IEntityController.Get(PetID,PetType); } } which makes use of some extension functions: public class IEntityController { static public IEntity Get(int id, EntityTypes type) { switch (type) { case EntityTypes.Dog: return Dog.Get(id); case EntityTypes.Cat: return Cat.Get(id); default: throw new Exception("Invalid EntityType"); } } } Not as neat as having plain inheritance, particularly considering you have to store the PetType in an extra database field, but considering the performance gains, I would not look back. It also cannot model one-to-many, many-to-many relationship, but with creative uses of 'Union' it could be made to work. Finally, it creates the side effet of loading data in a property/function of the object, which you need to be careful about. Using a clear naming convention like GetXYZ() helps in that regards. Compiled Queries Entity Framework performance is not as good as direct database access with ADO (obviously) or Linq2SQL. There are ways to improve it however, one of which is compiling your queries. The performance of a compiled query is similar to Linq2Sql. What is a compiled query? It is simply a query for which you tell the framework to keep the parsed tree in memory so it doesn't need to be regenerated the next time you run it. So the next run, you will save the time it takes to parse the tree. Do not discount that as it is a very costly operation that gets even worse with more complex queries. There are 2 ways to compile a query: creating an ObjectQuery with EntitySQL and using CompiledQuery.Compile() function. (Note that by using an EntityDataSource in your page, you will in fact be using ObjectQuery with EntitySQL, so that gets compiled and cached). An aside here in case you don't know what EntitySQL is. It is a string-based way of writing queries against the EF. Here is an example: "select value dog from Entities.DogSet as dog where dog.ID = @ID". The syntax is pretty similar to SQL syntax. You can also do pretty complex object manipulation, which is well explained [here][1]. Ok, so here is how to do it using ObjectQuery< string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); The first time you run this query, the framework will generate the expression tree and keep it in memory. So the next time it gets executed, you will save on that costly step. In that example EnablePlanCaching = true, which is unnecessary since that is the default option. The other way to compile a query for later use is the CompiledQuery.Compile method. This uses a delegate: static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => ctx.DogSet.FirstOrDefault(it => it.ID == id)); or using linq static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => (from dog in ctx.DogSet where dog.ID == id select dog).FirstOrDefault()); to call the query: query_GetDog.Invoke( YourContext, id ); The advantage of CompiledQuery is that the syntax of your query is checked at compile time, where as EntitySQL is not. However, there are other consideration... Includes Lets say you want to have the data for the dog owner to be returned by the query to avoid making 2 calls to the database. Easy to do, right? EntitySQL string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)).Include("Owner"); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); CompiledQuery static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => (from dog in ctx.DogSet.Include("Owner") where dog.ID == id select dog).FirstOrDefault()); Now, what if you want to have the Include parametrized? What I mean is that you want to have a single Get() function that is called from different pages that care about different relationships for the dog. One cares about the Owner, another about his FavoriteFood, another about his FavotireToy and so on. Basicly, you want to tell the query which associations to load. It is easy to do with EntitySQL public Dog Get(int id, string include) { string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)) .IncludeMany(include); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); } The include simply uses the passed string. Easy enough. Note that it is possible to improve on the Include(string) function (that accepts only a single path) with an IncludeMany(string) that will let you pass a string of comma-separated associations to load. Look further in the extension section for this function. If we try to do it with CompiledQuery however, we run into numerous problems: The obvious static readonly Func<Entities, int, string, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) => (from dog in ctx.DogSet.Include(include) where dog.ID == id select dog).FirstOrDefault()); will choke when called with: query_GetDog.Invoke( YourContext, id, "Owner,FavoriteFood" ); Because, as mentionned above, Include() only wants to see a single path in the string and here we are giving it 2: "Owner" and "FavoriteFood" (which is not to be confused with "Owner.FavoriteFood"!). Then, let's use IncludeMany(), which is an extension function static readonly Func<Entities, int, string, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) => (from dog in ctx.DogSet.IncludeMany(include) where dog.ID == id select dog).FirstOrDefault()); Wrong again, this time it is because the EF cannot parse IncludeMany because it is not part of the functions that is recognizes: it is an extension. Ok, so you want to pass an arbitrary number of paths to your function and Includes() only takes a single one. What to do? You could decide that you will never ever need more than, say 20 Includes, and pass each separated strings in a struct to CompiledQuery. But now the query looks like this: from dog in ctx.DogSet.Include(include1).Include(include2).Include(include3) .Include(include4).Include(include5).Include(include6) .[...].Include(include19).Include(include20) where dog.ID == id select dog which is awful as well. Ok, then, but wait a minute. Can't we return an ObjectQuery< with CompiledQuery? Then set the includes on that? Well, that what I would have thought so as well: static readonly Func<Entities, int, ObjectQuery<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, ObjectQuery<Dog>>((ctx, id) => (ObjectQuery<Dog>)(from dog in ctx.DogSet where dog.ID == id select dog)); public Dog GetDog( int id, string include ) { ObjectQuery<Dog> oQuery = query_GetDog(id); oQuery = oQuery.IncludeMany(include); return oQuery.FirstOrDefault; } That should have worked, except that when you call IncludeMany (or Include, Where, OrderBy...) you invalidate the cached compiled query because it is an entirely new one now! So, the expression tree needs to be reparsed and you get that performance hit again. So what is the solution? You simply cannot use CompiledQueries with parametrized Includes. Use EntitySQL instead. This doesn't mean that there aren't uses for CompiledQueries. It is great for localized queries that will always be called in the same context. Ideally CompiledQuery should always be used because the syntax is checked at compile time, but due to limitation, that's not possible. An example of use would be: you may want to have a page that queries which two dogs have the same favorite food, which is a bit narrow for a BusinessLayer function, so you put it in your page and know exactly what type of includes are required. Passing more than 3 parameters to a CompiledQuery Func is limited to 5 parameters, of which the last one is the return type and the first one is your Entities object from the model. So that leaves you with 3 parameters. A pitance, but it can be improved on very easily. public struct MyParams { public string param1; public int param2; public DateTime param3; } static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) => from dog in ctx.DogSet where dog.Age == myParams.param2 && dog.Name == myParams.param1 and dog.BirthDate > myParams.param3 select dog); public List<Dog> GetSomeDogs( int age, string Name, DateTime birthDate ) { MyParams myParams = new MyParams(); myParams.param1 = name; myParams.param2 = age; myParams.param3 = birthDate; return query_GetDog(YourContext,myParams).ToList(); } Return Types (this does not apply to EntitySQL queries as they aren't compiled at the same time during execution as the CompiledQuery method) Working with Linq, you usually don't force the execution of the query until the very last moment, in case some other functions downstream wants to change the query in some way: static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) => from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog); public IEnumerable<Dog> GetSomeDogs( int age, string name ) { return query_GetDog(YourContext,age,name); } public void DataBindStuff() { IEnumerable<Dog> dogs = GetSomeDogs(4,"Bud"); // but I want the dogs ordered by BirthDate gridView.DataSource = dogs.OrderBy( it => it.BirthDate ); } What is going to happen here? By still playing with the original ObjectQuery (that is the actual return type of the Linq statement, which implements IEnumerable), it will invalidate the compiled query and be force to re-parse. So, the rule of thumb is to return a List< of objects instead. static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) => from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog); public List<Dog> GetSomeDogs( int age, string name ) { return query_GetDog(YourContext,age,name).ToList(); //<== change here } public void DataBindStuff() { List<Dog> dogs = GetSomeDogs(4,"Bud"); // but I want the dogs ordered by BirthDate gridView.DataSource = dogs.OrderBy( it => it.BirthDate ); } When you call ToList(), the query gets executed as per the compiled query and then, later, the OrderBy is executed against the objects in memory. It may be a little bit slower, but I'm not even sure. One sure thing is that you have no worries about mis-handling the ObjectQuery and invalidating the compiled query plan. Once again, that is not a blanket statement. ToList() is a defensive programming trick, but if you have a valid reason not to use ToList(), go ahead. There are many cases in which you would want to refine the query before executing it. Performance What is the performance impact of compiling a query? It can actually be fairly large. A rule of thumb is that compiling and caching the query for reuse takes at least double the time of simply executing it without caching. For complex queries (read inherirante), I have seen upwards to 10 seconds. So, the first time a pre-compiled query gets called, you get a performance hit. After that first hit, performance is noticeably better than the same non-pre-compiled query. Practically the same as Linq2Sql When you load a page with pre-compiled queries the first time you will get a hit. It will load in maybe 5-15 seconds (obviously more than one pre-compiled queries will end up being called), while subsequent loads will take less than 300ms. Dramatic difference, and it is up to you to decide if it is ok for your first user to take a hit or you want a script to call your pages to force a compilation of the queries. Can this query be cached? { Dog dog = from dog in YourContext.DogSet where dog.ID == id select dog; } No, ad-hoc Linq queries are not cached and you will incur the cost of generating the tree every single time you call it. Parametrized Queries Most search capabilities involve heavily parametrized queries. There are even libraries available that will let you build a parametrized query out of lamba expressions. The problem is that you cannot use pre-compiled queries with those. One way around that is to map out all the possible criteria in the query and flag which one you want to use: public struct MyParams { public string name; public bool checkName; public int age; public bool checkAge; } static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) => from dog in ctx.DogSet where (myParams.checkAge == true && dog.Age == myParams.age) && (myParams.checkName == true && dog.Name == myParams.name ) select dog); protected List<Dog> GetSomeDogs() { MyParams myParams = new MyParams(); myParams.name = "Bud"; myParams.checkName = true; myParams.age = 0; myParams.checkAge = false; return query_GetDog(YourContext,myParams).ToList(); } The advantage here is that you get all the benifits of a pre-compiled quert. The disadvantages are that you most likely will end up with a where clause that is pretty difficult to maintain, that you will incur a bigger penalty for pre-compiling the query and that each query you run is not as efficient as it could be (particularly with joins thrown in). Another way is to build an EntitySQL query piece by piece, like we all did with SQL. protected List<Dod> GetSomeDogs( string name, int age) { string query = "select value dog from Entities.DogSet where 1 = 1 "; if( !String.IsNullOrEmpty(name) ) query = query + " and dog.Name == @Name "; if( age > 0 ) query = query + " and dog.Age == @Age "; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext ); if( !String.IsNullOrEmpty(name) ) oQuery.Parameters.Add( new ObjectParameter( "Name", name ) ); if( age > 0 ) oQuery.Parameters.Add( new ObjectParameter( "Age", age ) ); return oQuery.ToList(); } Here the problems are: - there is no syntax checking during compilation - each different combination of parameters generate a different query which will need to be pre-compiled when it is first run. In this case, there are only 4 different possible queries (no params, age-only, name-only and both params), but you can see that there can be way more with a normal world search. - Noone likes to concatenate strings! Another option is to query a large subset of the data and then narrow it down in memory. This is particularly useful if you are working with a definite subset of the data, like all the dogs in a city. You know there are a lot but you also know there aren't that many... so your CityDog search page can load all the dogs for the city in memory, which is a single pre-compiled query and then refine the results protected List<Dod> GetSomeDogs( string name, int age, string city) { string query = "select value dog from Entities.DogSet where dog.Owner.Address.City == @City "; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext ); oQuery.Parameters.Add( new ObjectParameter( "City", city ) ); List<Dog> dogs = oQuery.ToList(); if( !String.IsNullOrEmpty(name) ) dogs = dogs.Where( it => it.Name == name ); if( age > 0 ) dogs = dogs.Where( it => it.Age == age ); return dogs; } It is particularly useful when you start displaying all the data then allow for filtering. Problems: - Could lead to serious data transfer if you are not careful about your subset. - You can only filter on the data that you returned. It means that if you don't return the Dog.Owner association, you will not be able to filter on the Dog.Owner.Name So what is the best solution? There isn't any. You need to pick the solution that works best for you and your problem: - Use lambda-based query building when you don't care about pre-compiling your queries. - Use fully-defined pre-compiled Linq query when your object structure is not too complex. - Use EntitySQL/string concatenation when the structure could be complex and when the possible number of different resulting queries are small (which means fewer pre-compilation hits). - Use in-memory filtering when you are working with a smallish subset of the data or when you had to fetch all of the data on the data at first anyway (if the performance is fine with all the data, then filtering in memory will not cause any time to be spent in the db). Singleton access The best way to deal with your context and entities accross all your pages is to use the singleton pattern: public sealed class YourContext { private const string instanceKey = "On3GoModelKey"; YourContext(){} public static YourEntities Instance { get { HttpContext context = HttpContext.Current; if( context == null ) return Nested.instance; if (context.Items[instanceKey] == null) { On3GoEntities entity = new On3GoEntities(); context.Items[instanceKey] = entity; } return (YourEntities)context.Items[instanceKey]; } } class Nested { // Explicit static constructor to tell C# compiler // not to mark type as beforefieldinit static Nested() { } internal static readonly YourEntities instance = new YourEntities(); } } NoTracking, is it worth it? When executing a query, you can tell the framework to track the objects it will return or not. What does it mean? With tracking enabled (the default option), the framework will track what is going on with the object (has it been modified? Created? Deleted?) and will also link objects together, when further queries are made from the database, which is what is of interest here. For example, lets assume that Dog with ID == 2 has an owner which ID == 10. Dog dog = (from dog in YourContext.DogSet where dog.ID == 2 select dog).FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; Person owner = (from o in YourContext.PersonSet where o.ID == 10 select dog).FirstOrDefault(); //dog.OwnerReference.IsLoaded == true; If we were to do the same with no tracking, the result would be different. ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>) (from dog in YourContext.DogSet where dog.ID == 2 select dog); oDogQuery.MergeOption = MergeOption.NoTracking; Dog dog = oDogQuery.FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>) (from o in YourContext.PersonSet where o.ID == 10 select o); oPersonQuery.MergeOption = MergeOption.NoTracking; Owner owner = oPersonQuery.FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; Tracking is very useful and in a perfect world without performance issue, it would always be on. But in this world, there is a price for it, in terms of performance. So, should you use NoTracking to speed things up? It depends on what you are planning to use the data for. Is there any chance that the data your query with NoTracking can be used to make update/insert/delete in the database? If so, don't use NoTracking because associations are not tracked and will causes exceptions to be thrown. In a page where there are absolutly no updates to the database, you can use NoTracking. Mixing tracking and NoTracking is possible, but it requires you to be extra careful with updates/inserts/deletes. The problem is that if you mix then you risk having the framework trying to Attach() a NoTracking object to the context where another copy of the same object exist with tracking on. Basicly, what I am saying is that Dog dog1 = (from dog in YourContext.DogSet where dog.ID == 2).FirstOrDefault(); ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>) (from dog in YourContext.DogSet where dog.ID == 2 select dog); oDogQuery.MergeOption = MergeOption.NoTracking; Dog dog2 = oDogQuery.FirstOrDefault(); dog1 and dog2 are 2 different objects, one tracked and one not. Using the detached object in an update/insert will force an Attach() that will say "Wait a minute, I do already have an object here with the same database key. Fail". And when you Attach() one object, all of its hierarchy gets attached as well, causing problems everywhere. Be extra careful. How much faster is it with NoTracking It depends on the queries. Some are much more succeptible to tracking than other. I don't have a fast an easy rule for it, but it helps. So I should use NoTracking everywhere then? Not exactly. There are some advantages to tracking object. The first one is that the object is cached, so subsequent call for that object will not hit the database. That cache is only valid for the lifetime of the YourEntities object, which, if you use the singleton code above, is the same as the page lifetime. One page request == one YourEntity object. So for multiple calls for the same object, it will load only once per page request. (Other caching mechanism could extend that). What happens when you are using NoTracking and try to load the same object multiple times? The database will be queried each time, so there is an impact there. How often do/should you call for the same object during a single page request? As little as possible of course, but it does happens. Also remember the piece above about having the associations connected automatically for your? You don't have that with NoTracking, so if you load your data in multiple batches, you will not have a link to between them: ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)(from dog in YourContext.DogSet select dog); oDogQuery.MergeOption = MergeOption.NoTracking; List<Dog> dogs = oDogQuery.ToList(); ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)(from o in YourContext.PersonSet select o); oPersonQuery.MergeOption = MergeOption.NoTracking; List<Person> owners = oPersonQuery.ToList(); In this case, no dog will have its .Owner property set. Some things to keep in mind when you are trying to optimize the performance. No lazy loading, what am I to do? This can be seen as a blessing in disguise. Of course it is annoying to load everything manually. However, it decreases the number of calls to the db and forces you to think about when you should load data. The more you can load in one database call the better. That was always true, but it is enforced now with this 'feature' of EF. Of course, you can call if( !ObjectReference.IsLoaded ) ObjectReference.Load(); if you want to, but a better practice is to force the framework to load the objects you know you will need in one shot. This is where the discussion about parametrized Includes begins to make sense. Lets say you have you Dog object public class Dog { public Dog Get(int id) { return YourContext.DogSet.FirstOrDefault(it => it.ID == id ); } } This is the type of function you work with all the time. It gets called from all over the place and once you have that Dog object, you will do very different things to it in different functions. First, it should be pre-compiled, because you will call that very often. Second, each different pages will want to have access to a different subset of the Dog data. Some will want the Owner, some the FavoriteToy, etc. Of course, you could call Load() for each reference you need anytime you need one. But that will generate a call to the database each time. Bad idea. So instead, each page will ask for the data it wants to see when it first request for the Dog object: static public Dog Get(int id) { return GetDog(entity,"");} static public Dog Get(int id, string includePath) { string query = "select value o " + " from YourEntities.DogSet as o " +

Read the article

< Previous Page | 106 107 108 109 110