Search Results

Search found 1695 results on 68 pages for 'drew gain'.

Page 68/68 | < Previous Page | 64 65 66 67 68

C#/.NET Little Wonders: The Useful But Overlooked Sets

- by James Michael Hare

Once again we consider some of the lesser known classes and keywords of C#. Today we will be looking at two set implementations in the System.Collections.Generic namespace: HashSet<T> and SortedSet<T>. Even though most people think of sets as mathematical constructs, they are actually very useful classes that can be used to help make your application more performant if used appropriately. A Background From Math In mathematical terms, a set is an unordered collection of unique items. In other words, the set {2,3,5} is identical to the set {3,5,2}. In addition, the set {2, 2, 4, 1} would be invalid because it would have a duplicate item (2). In addition, you can perform set arithmetic on sets such as: Intersections: The intersection of two sets is the collection of elements common to both. Example: The intersection of {1,2,5} and {2,4,9} is the set {2}. Unions: The union of two sets is the collection of unique items present in either or both set. Example: The union of {1,2,5} and {2,4,9} is {1,2,4,5,9}. Differences: The difference of two sets is the removal of all items from the first set that are common between the sets. Example: The difference of {1,2,5} and {2,4,9} is {1,5}. Supersets: One set is a superset of a second set if it contains all elements that are in the second set. Example: The set {1,2,5} is a superset of {1,5}. Subsets: One set is a subset of a second set if all the elements of that set are contained in the first set. Example: The set {1,5} is a subset of {1,2,5}. If We’re Not Doing Math, Why Do We Care? Now, you may be thinking: why bother with the set classes in C# if you have no need for mathematical set manipulation? The answer is simple: they are extremely efficient ways to determine ownership in a collection. For example, let’s say you are designing an order system that tracks the price of a particular equity, and once it reaches a certain point will trigger an order. Now, since there’s tens of thousands of equities on the markets, you don’t want to track market data for every ticker as that would be a waste of time and processing power for symbols you don’t have orders for. Thus, we just want to subscribe to the stock symbol for an equity order only if it is a symbol we are not already subscribed to. Every time a new order comes in, we will check the list of subscriptions to see if the new order’s stock symbol is in that list. If it is, great, we already have that market data feed! If not, then and only then should we subscribe to the feed for that symbol. So far so good, we have a collection of symbols and we want to see if a symbol is present in that collection and if not, add it. This really is the essence of set processing, but for the sake of comparison, let’s say you do a list instead: 1: // class that handles are order processing service 2: public sealed class OrderProcessor 3: { 4: // contains list of all symbols we are currently subscribed to 5: private readonly List<string> _subscriptions = new List<string>(); 6: 7: ... 8: } Now whenever you are adding a new order, it would look something like: 1: public PlaceOrderResponse PlaceOrder(Order newOrder) 2: { 3: // do some validation, of course... 4: 5: // check to see if already subscribed, if not add a subscription 6: if (!_subscriptions.Contains(newOrder.Symbol)) 7: { 8: // add the symbol to the list 9: _subscriptions.Add(newOrder.Symbol); 10: 11: // do whatever magic is needed to start a subscription for the symbol 12: } 13: 14: // place the order logic! 15: } What’s wrong with this? In short: performance! Finding an item inside a List<T> is a linear - O(n) – operation, which is not a very performant way to find if an item exists in a collection. (I used to teach algorithms and data structures in my spare time at a local university, and when you began talking about big-O notation you could immediately begin to see eyes glossing over as if it was pure, useless theory that would not apply in the real world, but I did and still do believe it is something worth understanding well to make the best choices in computer science). Let’s think about this: a linear operation means that as the number of items increases, the time that it takes to perform the operation tends to increase in a linear fashion. Put crudely, this means if you double the collection size, you might expect the operation to take something like the order of twice as long. Linear operations tend to be bad for performance because they mean that to perform some operation on a collection, you must potentially “visit” every item in the collection. Consider finding an item in a List<T>: if you want to see if the list has an item, you must potentially check every item in the list before you find it or determine it’s not found. Now, we could of course sort our list and then perform a binary search on it, but sorting is typically a linear-logarithmic complexity – O(n * log n) - and could involve temporary storage. So performing a sort after each add would probably add more time. As an alternative, we could use a SortedList<TKey, TValue> which sorts the list on every Add(), but this has a similar level of complexity to move the items and also requires a key and value, and in our case the key is the value. This is why sets tend to be the best choice for this type of processing: they don’t rely on separate keys and values for ordering – so they save space – and they typically don’t care about ordering – so they tend to be extremely performant. The .NET BCL (Base Class Library) has had the HashSet<T> since .NET 3.5, but at that time it did not implement the ISet<T> interface. As of .NET 4.0, HashSet<T> implements ISet<T> and a new set, the SortedSet<T> was added that gives you a set with ordering. HashSet<T> – For Unordered Storage of Sets When used right, HashSet<T> is a beautiful collection, you can think of it as a simplified Dictionary<T,T>. That is, a Dictionary where the TKey and TValue refer to the same object. This is really an oversimplification, but logically it makes sense. I’ve actually seen people code a Dictionary<T,T> where they store the same thing in the key and the value, and that’s just inefficient because of the extra storage to hold both the key and the value. As it’s name implies, the HashSet<T> uses a hashing algorithm to find the items in the set, which means it does take up some additional space, but it has lightning fast lookups! Compare the times below between HashSet<T> and List<T>: Operation HashSet<T> List<T> Add() O(1) O(1) at end O(n) in middle Remove() O(1) O(n) Contains() O(1) O(n) Now, these times are amortized and represent the typical case. In the very worst case, the operations could be linear if they involve a resizing of the collection – but this is true for both the List and HashSet so that’s a less of an issue when comparing the two. The key thing to note is that in the general case, HashSet is constant time for adds, removes, and contains! This means that no matter how large the collection is, it takes roughly the exact same amount of time to find an item or determine if it’s not in the collection. Compare this to the List where almost any add or remove must rearrange potentially all the elements! And to find an item in the list (if unsorted) you must search every item in the List. So as you can see, if you want to create an unordered collection and have very fast lookup and manipulation, the HashSet is a great collection. And since HashSet<T> implements ICollection<T> and IEnumerable<T>, it supports nearly all the same basic operations as the List<T> and can use the System.Linq extension methods as well. All we have to do to switch from a List<T> to a HashSet<T> is change our declaration. Since List and HashSet support many of the same members, chances are we won’t need to change much else. 1: public sealed class OrderProcessor 2: { 3: private readonly HashSet<string> _subscriptions = new HashSet<string>(); 4: 5: // ... 6: 7: public PlaceOrderResponse PlaceOrder(Order newOrder) 8: { 9: // do some validation, of course... 10: 11: // check to see if already subscribed, if not add a subscription 12: if (!_subscriptions.Contains(newOrder.Symbol)) 13: { 14: // add the symbol to the list 15: _subscriptions.Add(newOrder.Symbol); 16: 17: // do whatever magic is needed to start a subscription for the symbol 18: } 19: 20: // place the order logic! 21: } 22: 23: // ... 24: } 25: Notice, we didn’t change any code other than the declaration for _subscriptions to be a HashSet<T>. Thus, we can pick up the performance improvements in this case with minimal code changes. SortedSet<T> – Ordered Storage of Sets Just like HashSet<T> is logically similar to Dictionary<T,T>, the SortedSet<T> is logically similar to the SortedDictionary<T,T>. The SortedSet can be used when you want to do set operations on a collection, but you want to maintain that collection in sorted order. Now, this is not necessarily mathematically relevant, but if your collection needs do include order, this is the set to use. So the SortedSet seems to be implemented as a binary tree (possibly a red-black tree) internally. Since binary trees are dynamic structures and non-contiguous (unlike List and SortedList) this means that inserts and deletes do not involve rearranging elements, or changing the linking of the nodes. There is some overhead in keeping the nodes in order, but it is much smaller than a contiguous storage collection like a List<T>. Let’s compare the three: Operation HashSet<T> SortedSet<T> List<T> Add() O(1) O(log n) O(1) at end O(n) in middle Remove() O(1) O(log n) O(n) Contains() O(1) O(log n) O(n) The MSDN documentation seems to indicate that operations on SortedSet are O(1), but this seems to be inconsistent with its implementation and seems to be a documentation error. There’s actually a separate MSDN document (here) on SortedSet that indicates that it is, in fact, logarithmic in complexity. Let’s put it in layman’s terms: logarithmic means you can double the collection size and typically you only add a single extra “visit” to an item in the collection. Take that in contrast to List<T>’s linear operation where if you double the size of the collection you double the “visits” to items in the collection. This is very good performance! It’s still not as performant as HashSet<T> where it always just visits one item (amortized), but for the addition of sorting this is a good thing. Consider the following table, now this is just illustrative data of the relative complexities, but it’s enough to get the point: Collection Size O(1) Visits O(log n) Visits O(n) Visits 1 1 1 1 10 1 4 10 100 1 7 100 1000 1 10 1000 Notice that the logarithmic – O(log n) – visit count goes up very slowly compare to the linear – O(n) – visit count. This is because since the list is sorted, it can do one check in the middle of the list, determine which half of the collection the data is in, and discard the other half (binary search). So, if you need your set to be sorted, you can use the SortedSet<T> just like the HashSet<T> and gain sorting for a small performance hit, but it’s still faster than a List<T>. Unique Set Operations Now, if you do want to perform more set-like operations, both implementations of ISet<T> support the following, which play back towards the mathematical set operations described before: IntersectWith() – Performs the set intersection of two sets. Modifies the current set so that it only contains elements also in the second set. UnionWith() – Performs a set union of two sets. Modifies the current set so it contains all elements present both in the current set and the second set. ExceptWith() – Performs a set difference of two sets. Modifies the current set so that it removes all elements present in the second set. IsSupersetOf() – Checks if the current set is a superset of the second set. IsSubsetOf() – Checks if the current set is a subset of the second set. For more information on the set operations themselves, see the MSDN description of ISet<T> (here). What Sets Don’t Do Don’t get me wrong, sets are not silver bullets. You don’t really want to use a set when you want separate key to value lookups, that’s what the IDictionary implementations are best for. Also sets don’t store temporal add-order. That is, if you are adding items to the end of a list all the time, your list is ordered in terms of when items were added to it. This is something the sets don’t do naturally (though you could use a SortedSet with an IComparer with a DateTime but that’s overkill) but List<T> can. Also, List<T> allows indexing which is a blazingly fast way to iterate through items in the collection. Iterating over all the items in a List<T> is generally much, much faster than iterating over a set. Summary Sets are an excellent tool for maintaining a lookup table where the item is both the key and the value. In addition, if you have need for the mathematical set operations, the C# sets support those as well. The HashSet<T> is the set of choice if you want the fastest possible lookups but don’t care about order. In contrast the SortedSet<T> will give you a sorted collection at a slight reduction in performance. Technorati Tags: C#,.Net,Little Wonders,BlackRabbitCoder,ISet,HashSet,SortedSet

Read the article
EM12c: Using the LIST verb in emcli

- by SubinDaniVarughese

Many of us who use EM CLI to write scripts and automate our daily tasks should not miss out on the new list verb released with Oracle Enterprise Manager 12.1.0.3.0. The combination of list and Jython based scripting support in EM CLI makes it easier to achieve automation for complex tasks with just a few lines of code. Before I jump into a script, let me highlight the key attributes of the list verb and why it’s simply excellent! 1. Multiple resources under a single verb:A resource can be set of users or targets, etc. Using the list verb, you can retrieve information about a resource from the repository database.Here is an example which retrieves the list of administrators within EM.Standard mode$ emcli list -resource="Administrators" Interactive modeemcli>list(resource="Administrators")The output will be the same as standard mode.Standard mode$ emcli @myAdmin.pyEnter password : ******The output will be the same as standard mode.Contents of myAdmin.py scriptlogin()print list(resource="Administrators",jsonout=False).out()To get a list of all available resources use$ emcli list -helpWith every release of EM, more resources are being added to the list verb. If you have a resource which you feel would be valuable then go ahead and contact Oracle Support to log an enhancement request with product development. Be sure to say how the resource is going to help improve your daily tasks. 2. Consistent Formatting:It is possible to format the output of any resource consistently using these options: –column This option is used to specify which columns should be shown in the output. Here is an example which shows the list of administrators and their account status$ emcli list -resource="Administrators" -columns="USER_NAME,REPOS_ACCOUNT_STATUS" To get a list of columns in a resource use:$ emcli list -resource="Administrators" -help You can also specify the width of the each column. For example, here the column width of user_type is set to 20 and department to 30. $ emcli list -resource=Administrators -columns="USER_NAME,USER_TYPE:20,COST_CENTER,CONTACT,DEPARTMENT:30"This is useful if your terminal is too small or you need to fine tune a list of specific columns for your quick use or improved readability. –colsize This option is used to resize column widths.Here is the same example as above, but using -colsize to define the width of user_type to 20 and department to 30.$ emcli list -resource=Administrators -columns="USER_NAME,USER_TYPE,COST_CENTER,CONTACT,DEPARTMENT" -colsize="USER_TYPE:20,DEPARTMENT:30" The existing standard EMCLI formatting options are also available in list verb. They are: -format="name:pretty" | -format="name:script” | -format="name:csv" | -noheader | -scriptThere are so many uses depending on your needs. Have a look at the resources and columns in each resource. Refer to the EMCLI book in EM documentation for more information.3. Search:Using the -search option in the list verb makes it is possible to search for a specific row in a specific column within a resource. This is similar to the sqlplus where clause. The following operators are supported: = != > < >= <= like is (Must be followed by null or not null)Here is an example which searches for all EM administrators in the marketing department located in the USA.$emcli list -resource="Administrators" -search="DEPARTMENT ='Marketing'" -search="LOCATION='USA'" Here is another example which shows all the named credentials created since a specific date. $emcli list -resource=NamedCredentials -search="CredCreatedDate > '11-Nov-2013 12:37:20 PM'"Note that the timestamp has to be in the format DD-MON-YYYY HH:MI:SS AM/PM Some resources need a bind variable to be passed to get output. A bind variable is created in the resource and then referenced in the command. For example, this command will list all the default preferred credentials for target type oracle_database.Here is an example$ emcli list -resource="PreferredCredentialsDefault" -bind="TargetType='oracle_database'" -colsize="SetName:15,TargetType:15" You can provide multiple bind variables. To verify if a column is searchable or requires a bind variable, use the –help option. Here is an example:$ emcli list -resource="PreferredCredentialsDefault" -help 4. Secure accessWhen list verb collects the data, it only displays content for which the administrator currently logged into emcli, has access. For example consider this usecase:AdminA has access only to TargetA. AdminA logs into EM CLIExecuting the list verb to get the list of all targets will only show TargetA.5. User defined SQLUsing the –sql option, user defined sql can be executed. The SQL provided in the -sql option is executed as the EM user MGMT_VIEW, which has read-only access to the EM published MGMT$ database views in the SYSMAN schema. To get the list of EM published MGMT$ database views, go to the Extensibility Programmer's Reference book in EM documentation. There is a chapter about Using Management Repository Views. It’s always recommended to reference the documentation for the supported MGMT$ database views. Consider you are using the MGMT$ABC view which is not in the chapter. During upgrade, it is possible, since the view was not in the book and not supported, it is likely the view might undergo a change in its structure or the data in it. Using a supported view ensures that your scripts using -sql will continue working after upgrade.Here’s an example $ emcli list -sql='select * from mgmt$target' 6. JSON output support JSON (JavaScript Object Notation) enables data to be displayed in a collection of name/value pairs. There is lot of reading material about JSON on line for more information.As an example, we had a requirement where an EM administrator had many 11.2 databases in their test environment and the developers had requested an Administrator to change the lifecycle status from Test to Production which meant the admin had to go to the EM “All targets” page and identify the set of 11.2 databases and then to go into each target database page and manually changes the property to Production. Sounds easy to say, but this Administrator had numerous targets and this task is repeated for every release cycle.We told him there is an easier way to do this with a script and he can reuse the script whenever anyone wanted to change a set of targets to a different Lifecycle status. Here is a jython script which uses list and JSON to change all 11.2 database target’s LifeCycle Property value.If you are new to scripting and Jython, I would suggest visiting the basic chapters in any Jython tutorials. Understanding Jython is important to write the logic depending on your usecase.If you are already writing scripts like perl or shell or know a programming language like java, then you can easily understand the logic.Disclaimer: The scripts in this post are subject to the Oracle Terms of Use located here. 1 from emcli import * 2 search_list = ['PROPERTY_NAME=\'DBVersion\'','TARGET_TYPE= \'oracle_database\'','PROPERTY_VALUE LIKE \'11.2%\''] 3 if len(sys.argv) == 2: 4 print login(username=sys.argv[0]) 5 l_prop_val_to_set = sys.argv[1] 6 l_targets = list(resource="TargetProperties", search=search_list, columns="TARGET_NAME,TARGET_TYPE,PROPERTY_NAME") 7 for target in l_targets.out()['data']: 8 t_pn = 'LifeCycle Status' 9 print "INFO: Setting Property name " + t_pn + " to value " + l_prop_val_to_set + " for " + target['TARGET_NAME'] 10 print set_target_property_value(property_records= target['TARGET_NAME']+":"+target['TARGET_TYPE']+":"+ t_pn+":"+l_prop_val_to_set) 11 else: 12 print "\n ERROR: Property value argument is missing" 13 print "\n INFO: Format to run this file is filename.py <username> <Database Target LifeCycle Status Property Value>" You can download the script from here. I could not upload the file with .py extension so you need to rename the file to myScript.py before executing it using emcli.A line by line explanation for beginners: Line 1 Imports the emcli verbs as functions 2 search_list is a variable to pass to the search option in list verb. I am using escape character for the single quotes. In list verb to pass more than one value for the same option, you should define as above comma separated values, surrounded by square brackets. 3 This is an “if” condition to ensure the user does provide two arguments with the script, else in line #15, it prints an error message. 4 Logging into EM. You can remove this if you have setup emcli with autologin. For more details about setup and autologin, please go the EM CLI book in EM documentation. 5 l_prop_val_to_set is another variable. This is the property value to be set. Remember we are changing the value from Test to Production. The benefit of this variable is you can reuse the script to change the property value from and to any other values. 6 Here the output of the list verb is stored in l_targets. In the list verb I am passing the resource as TargetProperties, search as the search_list variable and I only need these three columns – target_name, target_type and property_name. I don’t need the other columns for my task. 7 This is a for loop. The data in l_targets is available in JSON format. Using the for loop, each pair will now be available in the ‘target’ variable. 8 t_pn is the “LifeCycle Status” variable. If required, I can have this also as an input and then use my script to change any target property. In this example, I just wanted to change the “LifeCycle Status”. 9 This a message informing the user the script is setting the property value for dbxyz. 10 This line shows the set_target_property_value verb which sets the value using the property_records option. Once it is set for a target pair, it moves to the next one. In my example, I am just showing three dbs, but the real use is when you have 20 or 50 targets. The script is executed as:$ emcli @myScript.py subin Production The recommendation is to first test the scripts before running it on a production system. We tested on a small set of targets and optimizing the script for fewer lines of code and better messaging.For your quick reference, the resources available in Enterprise Manager 12.1.0.4.0 with list verb are:$ emcli list -helpWatch this space for more blog posts using the list verb and EM CLI Scripting use cases. I hope you enjoyed reading this blog post and it has helped you gain more information about the list verb. Happy Scripting!!Disclaimer: The scripts in this post are subject to the Oracle Terms of Use located here. Stay Connected: Twitter | Facebook | YouTube | Linkedin | Newsletter mt=8">Download the Oracle Enterprise Manager 12c Mobile app

Read the article
IIS Strategies for Accessing Secured Network Resources

- by ErikE

Problem: A user connects to a service on a machine, such as an IIS web site or a SQL Server database. The site or the database need to gain access to network resources such as file shares (the most common) or a database on a different server. Permission is denied. This is because the user the service is running under doesn't have network permissions in the first place, or if it does, it doesn't have rights to access the remote resource. I keep running into this problem over and over again and am tired of not having a really solid way of handling it. Here are some workarounds I'm aware of: Run IIS as a custom-created domain user who is granted high permissions If permissions are granted one file share at a time, then every time I want to read from a new share, I would have to ask a network admin to add it for me. Eventually, with many web sites reading from many shares, it is going to get really complicated. If permissions are just opened up wide for the user to access any file shares in our domain, then this seems like an unnecessary security surface area to present. This also applies to all the sites running on IIS, rather than just the selected site or virtual directory that needs the access, a further surface area problem. Still use the IUSR account but give it network permissions and set up the same user name on the remote resource (not a domain user, a local user) This also has its problems. For example, there's a file share I am using that I have full rights to for sharing, but I can't log in to the machine. So I have to find the right admin and ask him to do it for me. Any time something has to change, it's another request to an admin. Allow IIS users to connect as anonymous, but set the account used for anonymous access to a high-privilege one This is even worse than giving the IIS IUSR full privileges, because it means my web site can't use any kind of security in the first place. Connect using Kerberos, then delegate This sounds good in principle but has all sorts of problems. First of all, if you're using virtual web sites where the domain name you connect to the site with is not the base machine name (as we do frequently), then you have to set up a Service Principal Name on the webserver using Microsoft's SetSPN utility. It's complicated and apparently prone to errors. Also, you have to ask your network/domain admin to change security policy for both the web server and the domain account so they are "trusted for delegation." If you don't get everything perfectly right, suddenly your intended Kerberos authentication is NTLM instead, and you can only impersonate rather than delegate, and thus no reaching out over the network as the user. Also, this method can be problematic because sometimes you need the web site or database to have permissions that the connecting user doesn't have. Create a service or COM+ application that fetches the resource for the web site Services and COM+ packages are run with their own set of credentials. Running as a high-privilege user is okay since they can do their own security and deny requests that are not legitimate, putting control in the hands of the application developer instead of the network admin. Problems: I am using a COM+ package that does exactly this on Windows Server 2000 to deliver highly sensitive images to a secured web application. I tried moving the web site to Windows Server 2003 and was suddenly denied permission to instantiate the COM+ object, very likely registry permissions. I trolled around quite a bit and did not solve the problem, partly because I was reluctant to give the IUSR account full registry permissions. That seems like the same bad practice as just running IIS as a high-privilege user. Note: This is actually really simple. In a programming language of your choice, you create a class with a function that returns an instance of the object you want (an ADODB.Connection, for example), and build a dll, which you register as a COM+ object. In your web server-side code, you create an instance of the class and use the function, and since it is running under a different security context, calls to network resources work. Map drive letters to shares This could theoretically work, but in my mind it's not really a good long-term strategy. Even though mappings can be created with specific credentials, and this can be done by others than a network admin, this also is going to mean that there are either way too many shared drives (small granularity) or too much permission is granted to entire file servers (large granularity). Also, I haven't figured out how to map a drive so that the IUSR gets the drives. Mapping a drive is for the current user, I don't know the IUSR account password to log in as it and create the mappings. Move the resources local to the web server/database There are times when I've done this, especially with Access databases. Does the database have to live out on the file share? Sometimes, it was just easiest to move the database to the web server or to the SQL database server (so the linked server to it would work). But I don't think this is a great all-around solution, either. And it won't work when the resource is a service rather than a file. Move the service to the final web server/database I suppose I could run a web server on my SQL Server database, so the web site can connect to it using impersonation and make me happy. But do we really want random extra web servers on our database servers just so this is possible? No. Virtual directories in IIS I know that virtual directories can help make remote resources look as though they are local, and this supports using custom credentials for each virtual directory. I haven't been able to come up with, yet, how this would solve the problem for system calls. Users could reach file shares directly, but this won't help, say, classic ASP code access resources. I could use a URL instead of a file path to read remote data files in a web page, but this isn't going to help me make a connection to an Access database, a SQL server database, or any other resource that uses a connection library rather than being able to just read all the bytes and work with them. I wish there was some kind of "service tunnel" that I could create. Think about how a VPN makes remote resources look like they are local. With a richer aliasing mechanism, perhaps code-based, why couldn't even database connections occur under a defined security context? Why not a special Windows component that lets you specify, per user, what resources are available and what alternate credentials are used for the connection? File shares, databases, web sites, you name it. I guess I'm almost talking about a specialized local proxy server. Anyway, so there's my list. I may update it if I think of more. Does anyone have any ideas for me? My current problem today is, yet again, I need a web site to connect to an Access database on a file share. Here we go again...

Read the article
IIS Strategies for Accessing Secured Network Resources

- by Emtucifor

Problem: A user connects to a service on a machine, such as an IIS web site or a SQL Server database. The site or the database need to gain access to network resources such as file shares (the most common) or a database on a different server. Permission is denied. This is because the user the service is running as doesn't have network permissions in the first place, or if it does, it doesn't have rights to access the remote resource. I keep running into this problem over and over again and am tired of not having a really solid way of handling it. Here are some workarounds I'm aware of: Run IIS as a custom-created domain user who is granted high permissions If permissions are granted one file share at a time, then every time I want to read from a new share, I would have to ask a network admin to add it for me. Eventually, with many web sites reading from many shares, it is going to get really complicated. If permissions are just opened up wide for the user to access any file shares in our domain, then this seems like an unnecessary security surface area to present. This also applies to all the sites running on IIS, rather than just the selected site or virtual directory that needs the access, a further surface area problem. Still use the IUSR account but give it network permissions and set up the same user name on the remote resource (not a domain user, a local user) This also has its problems. For example, there's a file share I am using that I have full rights to for sharing, but I can't log in to the machine. So I have to find the right admin and ask him to do it for me. Any time something has to change, it's another request to an admin. Allow IIS users to connect as anonymous, but set the account used for anonymous access to a high-privilege one This is even worse than giving the IIS IUSR full privileges, because it means my web site can't use any kind of security in the first place. Connect using Kerberos, then delegate This sounds good in principle but has all sorts of problems. First of all, if you're using virtual web sites where the domain name you connect to the site with is not the base machine name (as we do frequently), then you have to set up a Service Principal Name on the webserver using Microsoft's SetSPN utility. It's complicated and apparently prone to errors. Also, you have to ask your network/domain admin to change security policy for the web server so it is "trusted for delegation." If you don't get everything perfectly right, suddenly your intended Kerberos authentication is NTLM instead, and you can only impersonate rather than delegate, and thus no reaching out over the network as the user. Also, this method can be problematic because sometimes you need the web site or database to have permissions that the connecting user doesn't have. Create a service or COM+ application that fetches the resource for the web site Services and COM+ packages are run with their own set of credentials. Running as a high-privilege user is okay since they can do their own security and deny requests that are not legitimate, putting control in the hands of the application developer instead of the network admin. Problems: I am using a COM+ package that does exactly this on Windows Server 2000 to deliver highly sensitive images to a secured web application. I tried moving the web site to Windows Server 2003 and was suddenly denied permission to instantiate the COM+ object, very likely registry permissions. I trolled around quite a bit and did not solve the problem, partly because I was reluctant to give the IUSR account full registry permissions. That seems like the same bad practice as just running IIS as a high-privilege user. Note: This is actually really simple. In a programming language of your choice, you create a class with a function that returns an instance of the object you want (an ADODB.Connection, for example), and build a dll, which you register as a COM+ object. In your web server-side code, you create an instance of the class and use the function, and since it is running under a different security context, calls to network resources work. Map drive letters to shares This could theoretically work, but in my mind it's not really a good long-term strategy. Even though mappings can be created with specific credentials, and this can be done by others than a network admin, this also is going to mean that there are either way too many shared drives (small granularity) or too much permission is granted to entire file servers (large granularity). Also, I haven't figured out how to map a drive so that the IUSR gets the drives. Mapping a drive is for the current user, I don't know the IUSR account password to log in as it and create the mappings. Move the resources local to the web server/database There are times when I've done this, especially with Access databases. Does the database have to live out on the file share? Sometimes, it was just easiest to move the database to the web server or to the SQL database server (so the linked server to it would work). But I don't think this is a great all-around solution, either. And it won't work when the resource is a service rather than a file. Move the service to the final web server/database I suppose I could run a web server on my SQL Server database, so the web site can connect to it using impersonation and make me happy. But do we really want random extra web servers on our database servers just so this is possible? No. Virtual directories in IIS I know that virtual directories can help make remote resources look as though they are local, and this supports using custom credentials for each virtual directory. I haven't been able to come up with, yet, how this would solve the problem for system calls. Users could reach file shares directly, but this won't help, say, classic ASP code access resources. I could use a URL instead of a file path to read remote data files in a web page, but this isn't going to help me make a connection to an Access database, a SQL server database, or any other resource that uses a connection library rather than being able to just read all the bytes and work with them. I wish there was some kind of "service tunnel" that I could create. Think about how a VPN makes remote resources look like they are local. With a richer aliasing mechanism, perhaps code-based, why couldn't even database connections occur under a defined security context? Why not a special Windows component that lets you specify, per user, what resources are available and what alternate credentials are used for the connection? File shares, databases, web sites, you name it. I guess I'm almost talking about a specialized local proxy server. Anyway, so there's my list. I may update it if I think of more. Does anyone have any ideas for me? My current problem today is, yet again, I need a web site to connect to an Access database on a file share. Here we go again...

Read the article
Scrum Master Stephen Forte Teaches Agile Development, Silverlight and BI at GIDS 2010

- by rajesh ahuja

Great Indian Developer Summit 2010 – Gold Standard for India's Software Developer Ecosystem Bangalore, March 25, 2010: The author of several books on application and database development including Programming SQL Server 2008 and certified Scrum Master Stephen Forte is coming this summer to India's biggest summit for the developer ecosystem - Great Indian Developer Summit. At the summit, Stephen will conduct a workshop guaranteed to give attendees a jump start in taking a certified scrum master exam. Scrum, one of the most popular Agile project management and development methods, which is starting to be adopted at major corporations and on very large projects. After an introduction to the basics of Scrum like project planning and estimation, the Scrum Master, team, product owner and burn down, and of course the daily Scrum, Stephen will show many real world applications of the methodology drawn from his own experience as a Scrum Master. Negotiating with the business, estimation and team dynamics are all discussed as well as how to use Scrum in small organizations, large enterprise environments and consulting environments. Stephen will also discuss using Scrum with virtual teams and an off-shoring environment. He will then take a look at the tools we will use for Agile development, including planning poker, unit testing, and much more. On 20th April at the GIDS.NET Conference, Stephen will also conduct a series of sessions on Microsoft computing technologies. He will teach how to build data driven, n-tier Rich Internet Applications (RIA) with Silverlight 4.0. Line of business applications (LOB) in Silverlight 4.0 are easy by tapping the power of WCF RIA Services, the Silverlight Toolkit, and elevated out of browser support. Stephen's demo centric session will walk you through an example of building a LOB application with Silverlight 4.0. See how Silverlight and WCF RIA Services support domain logic, services, data binding, validation, server based paging, authentication, authorization and much more. Silverlight 4.0 means business. Silverlight runs C# and Visual Basic code, and so it seems natural that a business application might share some code between the Silverlight client and its ASP.NET Web server. You may want to run some code client-side for interactivity, but re-run that code on the server for security or reliability. This is possible, and there are several techniques you can use to accomplish this goal. In Stephen's second talk learn about the various techniques and their pros and cons. Some techniques work better in C#, others in VB. Still others are simpler with a little extra tooling or code-generation. Any serious Silverlight business application will almost certainly face this issue, and this session gets you going fast. In the third talk, Stephen will explain how to properly architect and deploy a BI application using a mix of some exciting new tools and some old familiar ones. He will start with a traditional relational transaction centric database (OLTP) and explore ways to build a data warehouse (OLAP), looking at the star and snowflake schemas. Next he will look at the process of extraction, transformation, and loading (ETL) your OLTP data into your data warehouse. Different techniques for ETL will be described and the various tradeoffs will be discussed. Then he will look at using the warehouse for reporting, drill down, and data analysis in Microsoft Excel's PowerPivot 2010. The session will round off by showing how to properly build a cube and build a data analysis application on top of that cube, and conclude by looking at some tools to help with the data visualization process. Every year, GIDS is a game changer for several thousands of IT professionals, providing them with a competitive edge over their peers, enlightening them with bleeding-edge information most useful in their daily jobs, helping them network with world-class experts and visionaries, and providing them with a much needed thrust in their careers. Attend Great Indian Developer Summit to gain the information, education and solutions you seek. From post-conference workshops, breakout sessions by expert instructors, keynotes by industry heavyweights, enhanced networking opportunities, and more. About Great Indian Developer Summit Great Indian Developer Summit is the gold standard for India's software developer ecosystem for gaining exposure to and evaluating new projects, tools, services, platforms, languages, software and standards. Packed with premium knowledge, action plans and advise from been-there-done-it veterans, creators, and visionaries, the 2010 edition of Great Indian Developer Summit features focused sessions, case studies, workshops and power panels that will transform you into a force to reckon with. Featuring 3 co-located conferences: GIDS.NET, GIDS.Web, GIDS.Java and an exclusive day of in-depth tutorials - GIDS.Workshops, from 20 April to 24 April at the IISc campus in Bangalore. At GIDS you'll participate in hundreds of sessions encompassing the full range of Microsoft computing, Java, Agile, RIA, Rich Web, open source/standards, languages, frameworks and platforms, practical tutorials that deep dive into technical skill and best practices, inspirational keynote presentations, an Expo Hall featuring dozens of the latest projects and products activities, engaging networking events, and the interact with the best and brightest of speakers from around the world. For further information on GIDS 2010, please visit the summit on the web http://www.developersummit.com/ A Saltmarch Media Press Release E: [email protected] Ph: +91 80 4005 1000

Read the article
Help with force close occurrences in my app

- by Ken

This is the last issue with this app. Periodic force close situations. I think something should be on another thread but I'm not sure what. Anyway, I can always count on a freeze on first install. If I wait, eventually (maybe 10 seconds) the app comes around, maybe more. here is an excerpt from logcat--the three lines occur after full layout is displayed and I attempt to touch a [game] 'peg' which should spawn a sprite, but the freeze occurs there. Can anybody tell what the issue might be?: I/System.out( 279): TouchDown (17.0,106.0) I/System.out( 279): checking (17,106 I/System.out( 279): hit for bounds Rect(3, 98 - 32, 130) [FREEZE BEGINS] W/webcore ( 279): Can't get the viewWidth after the first layout W/WindowManager( 60): Key dispatching timed out sending to com.live.brainbuilderfree/com.live.brainbuilderfree.BrainBuilderFree W/WindowManager( 60): Previous dispatch state: null W/WindowManager( 60): Current dispatch state: {{null to Window{43fd87a0 com.live.brainbuilderfree/com.live.brainbuilderfree.BrainBuilderFree paused=false} @ 1295232880017 lw=Window{43fd87a0 com.live.brainbuilderfree/com.live.brainbuilderfree.BrainBuilderFree paused=false} lb=android.os.BinderProxy@440523b8 fin=false gfw=true ed=true tts=0 wf=false fp=false mcf=Window{43fd87a0 com.live.brainbuilderfree/com.live.brainbuilderfree.BrainBuilderFree paused=false}}} I/Process ( 60): Sending signal. PID: 279 SIG: 3 I/dalvikvm( 279): threadid=3: reacting to signal 3 D/dalvikvm( 124): GC_EXPLICIT freed 1754 objects / 106104 bytes in 7365ms I/Process ( 60): Sending signal. PID: 60 SIG: 3 I/dalvikvm( 60): threadid=3: reacting to signal 3 I/dalvikvm( 60): Wrote stack traces to '/data/anr/traces.txt' I/Process ( 60): Sending signal. PID: 263 SIG: 3 I/dalvikvm( 263): threadid=3: reacting to signal 3 I/dalvikvm( 279): Wrote stack traces to '/data/anr/traces.txt' I/Process ( 60): Sending signal. PID: 117 SIG: 3 I/dalvikvm( 117): threadid=3: reacting to signal 3 I/dalvikvm( 117): Wrote stack traces to '/data/anr/traces.txt' I/Process ( 60): Sending signal. PID: 254 SIG: 3 I/Process ( 60): Sending signal. PID: 121 SIG: 3 I/dalvikvm( 121): threadid=3: reacting to signal 3 D/AudioSink( 34): bufferCount (4) is too small and increased to 12 I/System.out( 279): making white sprite I/Process ( 60): Sending signal. PID: 186 SIG: 3 I/Process ( 60): Sending signal. PID: 232 SIG: 3 D/MillennialMediaAdSDK( 279): size: 1 D/MillennialMediaAdSDK( 279): num: 1 D/AdWhirl SDK( 279): Millennial success D/AdWhirl SDK( 279): Will call rotateAd() in 120 seconds I/dalvikvm( 232): threadid=3: reacting to signal 3 I/dalvikvm( 121): Wrote stack traces to '/data/anr/traces.txt' I/Process ( 60): Sending signal. PID: 222 SIG: 3 I/MillennialMediaAdSDK( 279): Millennial ad return success D/MillennialMediaAdSDK( 279): View height: 0 D/MillennialMediaAdSDK( 279): nextUrl: [deleted] I/Process ( 60): Sending signal. PID: 239 SIG: 3 I/Process ( 60): Sending signal. PID: 213 SIG: 3 D/AdWhirl SDK( 279): Added subview D/AdWhirl SDK( 279): Pinging URL: [deleted] I/Process ( 60): Sending signal. PID: 197 SIG: 3 I/dalvikvm( 197): threadid=3: reacting to signal 3 I/Process ( 60): Sending signal. PID: 164 SIG: 3 I/dalvikvm( 164): threadid=3: reacting to signal 3 D/dalvikvm( 279): GC_FOR_MALLOC freed 7735 objects / 639688 bytes in 217ms I/Process ( 60): Sending signal. PID: 124 SIG: 3 I/dalvikvm( 124): threadid=3: reacting to signal 3 I/Process ( 60): Sending signal. PID: 158 SIG: 3 I/dalvikvm( 158): threadid=3: reacting to signal 3 I/Process ( 60): Sending signal. PID: 127 SIG: 3 E/ActivityManager( 60): ANR in com.live.brainbuilderfree (com.live.brainbuilderfree/.BrainBuilderFree) E/ActivityManager( 60): Reason: keyDispatchingTimedOut E/ActivityManager( 60): Load: 3.46 / 1.69 / 0.65 E/ActivityManager( 60): CPU usage from 28095ms to 140ms ago: E/ActivityManager( 60): system_server: 30% = 25% user + 4% kernel / faults: 3119 minor 66 major E/ActivityManager( 60): mediaserver: 11% = 7% user + 4% kernel / faults: 746 minor 17 major E/ActivityManager( 60): com.svox.pico: 1% = 0% user + 1% kernel / faults: 2833 minor 8 major E/ActivityManager( 60): d.process.acore: 1% = 0% user + 0% kernel / faults: 1146 minor 36 major E/ActivityManager( 60): ndroid.launcher: 1% = 0% user + 0% kernel / faults: 852 minor 6 major E/ActivityManager( 60): m.android.phone: 0% = 0% user + 0% kernel / faults: 621 minor 7 major E/ActivityManager( 60): kswapd0: 0% = 0% user + 0% kernel E/ActivityManager( 60): ronsoft.openwnn: 0% = 0% user + 0% kernel / faults: 337 minor 2 major E/ActivityManager( 60): adbd: 0% = 0% user + 0% kernel / faults: 3 minor E/ActivityManager( 60): zygote: 0% = 0% user + 0% kernel / faults: 169 minor E/ActivityManager( 60): events/0: 0% = 0% user + 0% kernel E/ActivityManager( 60): rild: 0% = 0% user + 0% kernel / faults: 103 minor 3 major E/ActivityManager( 60): pdflush: 0% = 0% user + 0% kernel E/ActivityManager( 60): .quicksearchbox: 0% = 0% user + 0% kernel / faults: 61 minor E/ActivityManager( 60): id.defcontainer: 0% = 0% user + 0% kernel / faults: 12 minor E/ActivityManager( 60): +rainbuilderfree: 0% = 0% user + 0% kernel E/ActivityManager( 60): +sh: 0% = 0% user + 0% kernel E/ActivityManager( 60): +app_process: 0% = 0% user + 0% kernel E/ActivityManager( 60): TOTAL: 100% = 76% user + 21% kernel + 2% iowait + 0% irq + 0% softirq I/dalvikvm( 127): threadid=3: reacting to signal 3 I/dalvikvm( 186): threadid=3: reacting to signal 3 D/dalvikvm( 60): GC_FOR_MALLOC freed 3747 objects / 228920 bytes in 609ms I/dalvikvm-heap( 60): Grow heap (frag case) to 4.759MB for 36896-byte allocation I/dalvikvm( 239): threadid=3: reacting to signal 3 D/dalvikvm( 60): GC_FOR_MALLOC freed 226 objects / 9952 bytes in 546ms I/dalvikvm( 213): threadid=3: reacting to signal 3 D/dalvikvm( 60): GC_FOR_MALLOC freed 105 objects / 5816 bytes in 492ms I/dalvikvm-heap( 60): Grow heap (frag case) to 4.815MB for 49188-byte allocation I/dalvikvm( 222): threadid=3: reacting to signal 3 D/dalvikvm( 60): GC_FOR_MALLOC freed 77 objects / 5232 bytes in 546ms I/dalvikvm( 254): threadid=3: reacting to signal 3 D/dalvikvm( 60): GC_FOR_MALLOC freed 105 objects / 55856 bytes in 521ms I/dalvikvm-heap( 60): Grow heap (frag case) to 4.876MB for 98360-byte allocation D/dalvikvm( 60): GC_FOR_MALLOC freed 58 objects / 3632 bytes in 340ms D/dalvikvm( 60): GC_FOR_MALLOC freed 1093 objects / 185256 bytes in 572ms W/WindowManager( 60): Continuing to wait for key to be dispatched I/System.out( 279): TouchMove (117.0,124.0) I/System.out( 279): TouchUP (117.0,124.0) D/dalvikvm( 60): GC_FOR_MALLOC freed 141 objects / 108328 bytes in 564ms I/ARMAssembler( 60): generated scanline__00000077:03515104_00000000_00000000 [ 33 ipp] (47 ins) at [0x313d78:0x313e34] in 11621593 ns W/InputManagerService( 60): Window already focused, ignoring focus gain of: com.android.internal.view.IInputMethodClient$Stub$Proxy@43f66a10 I/dalvikvm( 239): Wrote stack traces to '/data/anr/traces.txt' I/dalvikvm( 263): Wrote stack traces to '/data/anr/traces.txt' etc...

Read the article
Matlab Image watermarking question , using both SVD and DWT

- by Georgek

Hello all . here is a code that i got over the net ,and it is supposed to embed a watermark of size(50*20) called _copyright.bmp in the Code below . the size of the cover object is (512*512), it is called _lena_std_bw.bmp.What we did here is we did DWT2 2 times for the image , when we reached our second dwt2 cA2 size is 128*128. You should notice that the blocksize and it equals 4, it is used to determine the max msg size based on cA2 according to the following code:max_message=RcA2*CcA2/(blocksize^2). in our current case max_message would equal 128*128/(4^2)=1024. i want to embed a bigger watermark in the 2nd dwt2 and lets say the size of that watermark is 400*10(i can change the dimension using MS PAINT), what i have to do is change the size of the blocksize to 2. so max_message=4096.Matlab gives me 3 errors and they are : ??? Error using == plus Matrix dimensions must agree. Error in == idwt2 at 93 x = upsconv2(a,{Lo_R,Lo_R},sx,dwtEXTM,shift)+ ... % Approximation. Error in == two_dwt_svd_low_low at 88 CAA1 = idwt2(cA22,cH2,cV2,cD2,'haar',[RcA1,CcA1]); The origional Code is (the origional code where blocksize =4): %This algorithm makes DWT for the whole image and after that make DWT for %cH1 and make SVD for cH2 and embed the watermark in every level after SVD %(1) -------------- Embed Watermark ------------------------------------ %Add the watermar W to original image I and give the watermarked image in J %-------------------------------------------------------------------------- % set the gain factor for embeding and threshold for evaluation clc; clear all; close all; % save start time start_time=cputime; % set the value of threshold and alpha thresh=.5; alpha =0.01; % read in the cover object file_name='_lena_std_bw.bmp'; cover_object=double(imread(file_name)); % determine size of watermarked image Mc=size(cover_object,1); %Height Nc=size(cover_object,2); %Width % read in the message image and reshape it into a vector file_name='_copyright.bmp'; message=double(imread(file_name)); T=message; Mm=size(message,1); %Height Nm=size(message,2); %Width % perform 1-level DWT for the whole cover image [cA1,cH1,cV1,cD1] = dwt2(cover_object,'haar'); % determine the size of cA1 [RcA1 CcA1]=size(cA1) % perform 2-level DWT for cA1 [cA2,cH2,cV2,cD2] = dwt2(cA1,'haar'); % determine the size of cA2 [RcA2 CcA2]=size(cA2) % set the value of blocksize blocksize=4 % reshape the watermark to a vector message_vector=round(reshape(message,Mm*Nm,1)./256); W=message_vector; % determine maximum message size based on cA2, and blocksize max_message=RcA2*CcA2/(blocksize^2) % check that the message isn't too large for cover if (length(message) max_message) error('Message too large to fit in Cover Object') end %----------------------- process the image in blocks ---------------------- x=1; y=1; for (kk = 1:length(message_vector)) [cA2u cA2s cA2v]=svd(cA2(y:y+blocksize-1,x:x+blocksize-1)); % if message bit contains zero, modify S of the original image if (message_vector(kk) == 0) cA2s = cA2s*(1 + alpha); % otherwise mask is filled with zeros else cA2s=cA2s; end cA22(y:y+blocksize-1,x:x+blocksize-1)=cA2u*cA2s*cA2v; % move to next block of mask along x; If at end of row, move to next row if (x+blocksize) >= CcA2 x=1; y=y+blocksize; else x=x+blocksize; end end % perform IDWT CAA1 = idwt2(cA22,cH2,cV2,cD2,'haar',[RcA1,CcA1]); watermarked_image= idwt2(CAA1,cH1,cV1,cD1,'haar',[Mc,Nc]); % convert back to uint8 watermarked_image_uint8=uint8(watermarked_image); % write watermarked Image to file imwrite(watermarked_image_uint8,'dwt_watermarked.bmp','bmp'); % display watermarked image figure(1) imshow(watermarked_image_uint8,[]) title('Watermarked Image') %(2) ---------------------------------------------------------------------- %---------- Extract Watermark from attacked watermarked image ------------- %-------------------------------------------------------------------------- % read in the watermarked object file_name='dwt_watermarked.bmp'; watermarked_image=double(imread(file_name)); % determine size of watermarked image Mw=size(watermarked_image,1); %Height Nw=size(watermarked_image,2); %Width % perform 1-level DWT for the whole watermarked image [ca1,ch1,cv1,cd1] = dwt2(watermarked_image,'haar'); % determine the size of ca1 [Rca1 Cca1]=size(ca1); % perform 2-level DWT for ca1 [ca2,ch2,cv2,cd2] = dwt2(ca1,'haar'); % determine the size of ca2 [Rca2 Cca2]=size(ca2); % process the image in blocks % for each block get a bit for message x=1; y=1; for (kk = 1:length(message_vector)) % sets correlation to 1 when patterns are identical to avoid /0 errors % otherwise calcluate difference between the cover image and the % watermarked image [cA2u cA2s cA2v]=svd(cA2(y:y+blocksize-1,x:x+blocksize-1)); [ca2u1 ca2s1 ca2v1]=svd(ca2(y:y+blocksize-1,x:x+blocksize-1)); correlation(kk)=diag(ca2s1-cA2s)'*diag(ca2s1-cA2s)/(alpha*alpha)/(diag(cA2s)*diag(cA2s)); % move on to next block. At and of row move to next row if (x+blocksize) >= Cca2 x=1; y=y+blocksize; else x=x+blocksize; end end % if correlation exceeds average correlation correlation(kk)=correlation(kk)+mean(correlation(1:Mm*Nm)); for kk = 1:length(correlation) if (correlation(kk) > thresh*alpha);%thresh*mean(correlation(1:Mo*No))) message_vector(kk)=0; end end % reshape the message vector and display recovered watermark. figure(2) message=reshape(message_vector(1:Mm*Nm),Mm,Nm); imshow(message,[]) title('Recovered Watermark') % display processing time elapsed_time=cputime-start_time, please do help,its my graduation project and i have been trying this code for along time but failed miserable. Thanks in advance

Read the article
The Tab1.java from API Demo has exception.

- by Kooper

I don't know why.All my Tab programs have exception.Even from API Demo. Here is the code: package com.example.android.apis.view; import android.app.TabActivity; import android.os.Bundle; import android.widget.TabHost; import android.widget.TabHost.TabSpec; import android.view.LayoutInflater; import android.view.View; public class Tab1 extends TabActivity { @Override protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); TabHost tabHost = getTabHost(); LayoutInflater.from(this).inflate(R.layout.main,tabHost.getTabContentView(), true); tabHost.addTab(tabHost.newTabSpec("tab1") .setIndicator("tab1") .setContent(R.id.view1)); tabHost.addTab(tabHost.newTabSpec("tab2") .setIndicator("tab2") .setContent(R.id.view2)); tabHost.addTab(tabHost.newTabSpec("tab3") .setIndicator("tab3") .setContent(R.id.view3)); } } Here is the log: 06-13 17:24:38.336: WARN/jdwp(262): Debugger is telling the VM to exit with code=1 06-13 17:24:38.336: INFO/dalvikvm(262): GC lifetime allocation: 2511 bytes 06-13 17:24:38.416: DEBUG/Zygote(30): Process 262 exited cleanly (1) 06-13 17:24:38.456: INFO/ActivityManager(54): Process com.example.android.apis.view (pid 262) has died. 06-13 17:24:38.696: INFO/UsageStats(54): Unexpected resume of com.android.launcher while already resumed in com.example.android.apis.view 06-13 17:24:38.736: WARN/InputManagerService(54): Window already focused, ignoring focus gain of: com.android.internal.view.IInputMethodClient$Stub$Proxy@44dc4b38 06-13 17:24:48.337: DEBUG/AndroidRuntime(269): AndroidRuntime START <<<<<<<<<<<<<< 06-13 17:24:48.346: DEBUG/AndroidRuntime(269): CheckJNI is ON 06-13 17:24:48.856: DEBUG/AndroidRuntime(269): --- registering native functions --- 06-13 17:24:49.596: DEBUG/ddm-heap(269): Got feature list request 06-13 17:24:50.576: DEBUG/AndroidRuntime(269): Shutting down VM 06-13 17:24:50.576: DEBUG/dalvikvm(269): DestroyJavaVM waiting for non-daemon threads to exit 06-13 17:24:50.576: DEBUG/dalvikvm(269): DestroyJavaVM shutting VM down 06-13 17:24:50.576: DEBUG/dalvikvm(269): HeapWorker thread shutting down 06-13 17:24:50.586: DEBUG/dalvikvm(269): HeapWorker thread has shut down 06-13 17:24:50.586: DEBUG/jdwp(269): JDWP shutting down net... 06-13 17:24:50.586: INFO/dalvikvm(269): Debugger has detached; object registry had 1 entries 06-13 17:24:50.596: ERROR/AndroidRuntime(269): ERROR: thread attach failed 06-13 17:24:50.606: DEBUG/dalvikvm(269): VM cleaning up 06-13 17:24:50.676: DEBUG/dalvikvm(269): LinearAlloc 0x0 used 628628 of 5242880 (11%) 06-13 17:24:51.476: DEBUG/AndroidRuntime(278): AndroidRuntime START <<<<<<<<<<<<<< 06-13 17:24:51.486: DEBUG/AndroidRuntime(278): CheckJNI is ON 06-13 17:24:51.986: DEBUG/AndroidRuntime(278): --- registering native functions --- 06-13 17:24:52.746: DEBUG/ddm-heap(278): Got feature list request 06-13 17:24:53.716: DEBUG/ActivityManager(54): Uninstalling process com.example.android.apis.view 06-13 17:24:53.726: INFO/ActivityManager(54): Starting activity: Intent { act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10000000 cmp=com.example.android.apis.view/.Tab1 } 06-13 17:24:53.876: DEBUG/AndroidRuntime(278): Shutting down VM 06-13 17:24:53.886: DEBUG/dalvikvm(278): DestroyJavaVM waiting for non-daemon threads to exit 06-13 17:24:53.916: DEBUG/dalvikvm(278): DestroyJavaVM shutting VM down 06-13 17:24:53.926: DEBUG/dalvikvm(278): HeapWorker thread shutting down 06-13 17:24:53.936: DEBUG/dalvikvm(278): HeapWorker thread has shut down 06-13 17:24:53.936: DEBUG/jdwp(278): JDWP shutting down net... 06-13 17:24:53.936: INFO/dalvikvm(278): Debugger has detached; object registry had 1 entries 06-13 17:24:53.957: DEBUG/dalvikvm(278): VM cleaning up 06-13 17:24:54.026: ERROR/AndroidRuntime(278): ERROR: thread attach failed 06-13 17:24:54.146: DEBUG/dalvikvm(278): LinearAlloc 0x0 used 638596 of 5242880 (12%) 06-13 17:24:54.286: INFO/ActivityManager(54): Start proc com.example.android.apis.view for activity com.example.android.apis.view/.Tab1: pid=285 uid=10054 gids={1015} 06-13 17:24:54.676: DEBUG/ddm-heap(285): Got feature list request 06-13 17:24:55.006: WARN/ActivityThread(285): Application com.example.android.apis.view is waiting for the debugger on port 8100... 06-13 17:24:55.126: INFO/System.out(285): Sending WAIT chunk 06-13 17:24:55.186: INFO/dalvikvm(285): Debugger is active 06-13 17:24:55.378: INFO/System.out(285): Debugger has connected 06-13 17:24:55.386: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:55.586: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:55.796: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:55.996: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:56.196: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:56.406: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:56.606: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:56.806: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:57.016: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:57.216: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:57.416: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:57.626: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:57.836: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:58.039: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:58.246: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:58.451: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:58.656: INFO/System.out(285): waiting for debugger to settle... 06-13 17:24:58.866: INFO/System.out(285): debugger has settled (1367) 06-13 17:24:59.126: ERROR/gralloc(54): [unregister] handle 0x129980 still locked (state=40000001) 06-13 17:25:03.816: WARN/ActivityManager(54): Launch timeout has expired, giving up wake lock! 06-13 17:25:04.906: WARN/ActivityManager(54): Activity idle timeout for HistoryRecord{44d60e10 com.example.android.apis.view/.Tab1}

Read the article
Microsoft and jQuery

- by Rick Strahl

The jQuery JavaScript library has been steadily getting more popular and with recent developments from Microsoft, jQuery is also getting ever more exposure on the ASP.NET platform including now directly from Microsoft. jQuery is a light weight, open source DOM manipulation library for JavaScript that has changed how many developers think about JavaScript. You can download it and find more information on jQuery on www.jquery.com. For me jQuery has had a huge impact on how I develop Web applications and was probably the main reason I went from dreading to do JavaScript development to actually looking forward to implementing client side JavaScript functionality. It has also had a profound impact on my JavaScript skill level for me by seeing how the library accomplishes things (and often reviewing the terse but excellent source code). jQuery made an uncomfortable development platform (JavaScript + DOM) a joy to work on. Although jQuery is by no means the only JavaScript library out there, its ease of use, small size, huge community of plug-ins and pure usefulness has made it easily the most popular JavaScript library available today. As a long time jQuery user, I’ve been excited to see the developments from Microsoft that are bringing jQuery to more ASP.NET developers and providing more integration with jQuery for ASP.NET’s core features rather than relying on the ASP.NET AJAX library. Microsoft and jQuery – making Friends jQuery is an open source project but in the last couple of years Microsoft has really thrown its weight behind supporting this open source library as a supported component on the Microsoft platform. When I say supported I literally mean supported: Microsoft now offers actual tech support for jQuery as part of their Product Support Services (PSS) as jQuery integration has become part of several of the ASP.NET toolkits and ships in several of the default Web project templates in Visual Studio 2010. The ASP.NET MVC 3 framework (still in Beta) also uses jQuery for a variety of client side support features including client side validation and we can look forward toward more integration of client side functionality via jQuery in both MVC and WebForms in the future. In other words jQuery is becoming an optional but included component of the ASP.NET platform. PSS support means that support staff will answer jQuery related support questions as part of any support incidents related to ASP.NET which provides some piece of mind to some corporate development shops that require end to end support from Microsoft. In addition to including jQuery and supporting it, Microsoft has also been getting involved in providing development resources for extending jQuery’s functionality via plug-ins. Microsoft’s last version of the Microsoft Ajax Library – which is the successor to the native ASP.NET AJAX Library – included some really cool functionality for client templates, databinding and localization. As it turns out Microsoft has rebuilt most of that functionality using jQuery as the base API and provided jQuery plug-ins of these components. Very recently these three plug-ins were submitted and have been approved for inclusion in the official jQuery plug-in repository and been taken over by the jQuery team for further improvements and maintenance. Even more surprising: The jQuery-templates component has actually been approved for inclusion in the next major update of the jQuery core in jQuery V1.5, which means it will become a native feature that doesn’t require additional script files to be loaded. Imagine this – an open source contribution from Microsoft that has been accepted into a major open source project for a core feature improvement. Microsoft has come a long way indeed! What the Microsoft Involvement with jQuery means to you For Microsoft jQuery support is a strategic decision that affects their direction in client side development, but nothing stopped you from using jQuery in your applications prior to Microsoft’s official backing and in fact a large chunk of developers did so readily prior to Microsoft’s announcement. Official support from Microsoft brings a few benefits to developers however. jQuery support in Visual Studio 2010 means built-in support for jQuery IntelliSense, automatically added jQuery scripts in many projects types and a common base for client side functionality that actually uses what most developers are already using. If you have already been using jQuery and were worried about straying from the Microsoft line and their internal Microsoft Ajax Library – worry no more. With official support and the change in direction towards jQuery Microsoft is now following along what most in the ASP.NET community had already been doing by using jQuery, which is likely the reason for Microsoft’s shift in direction in the first place. ASP.NET AJAX and the Microsoft AJAX Library weren’t bad technology – there was tons of useful functionality buried in these libraries. However, these libraries never got off the ground, mainly because early incarnations were squarely aimed at control/component developers rather than application developers. For all the functionality that these controls provided for control developers they lacked in useful and easily usable application developer functionality that was easily accessible in day to day client side development. The result was that even though Microsoft shipped support for these tools in the box (in .NET 3.5 and 4.0), other than for the internal support in ASP.NET for things like the UpdatePanel and the ASP.NET AJAX Control Toolkit as well as some third party vendors, the Microsoft client libraries were largely ignored by the developer community opening the door for other client side solutions. Microsoft seems to be acknowledging developer choice in this case: Many more developers were going down the jQuery path rather than using the Microsoft built libraries and there seems to be little sense in continuing development of a technology that largely goes unused by the majority of developers. Kudos for Microsoft for recognizing this and gracefully changing directions. Note that even though there will be no further development in the Microsoft client libraries they will continue to be supported so if you’re using them in your applications there’s no reason to start running for the exit in a panic and start re-writing everything with jQuery. Although that might be a reasonable choice in some cases, jQuery and the Microsoft libraries work well side by side so that you can leave existing solutions untouched even as you enhance them with jQuery. The Microsoft jQuery Plug-ins – Solid Core Features One of the most interesting developments in Microsoft’s embracing of jQuery is that Microsoft has started contributing to jQuery via standard mechanism set for jQuery developers: By submitting plug-ins. Microsoft took some of the nicest new features of the unpublished Microsoft Ajax Client Library and re-wrote these components for jQuery and then submitted them as plug-ins to the jQuery plug-in repository. Accepted plug-ins get taken over by the jQuery team and that’s exactly what happened with the three plug-ins submitted by Microsoft with the templating plug-in even getting slated to be published as part of the jQuery core in the next major release (1.5). The following plug-ins are provided by Microsoft: jQuery Templates – a client side template rendering engine jQuery Data Link – a client side databinder that can synchronize changes without code jQuery Globalization – provides formatting and conversion features for dates and numbers The first two are ports of functionality that was slated for the Microsoft Ajax Library while functionality for the globalization library provides functionality that was already found in the original ASP.NET AJAX library. To me all three plug-ins address a pressing need in client side applications and provide functionality I’ve previously used in other incarnations, but with more complete implementations. Let’s take a close look at these plug-ins. jQuery Templates http://api.jquery.com/category/plugins/templates/ Client side templating is a key component for building rich JavaScript applications in the browser. Templating on the client lets you avoid from manually creating markup by creating DOM nodes and injecting them individually into the document via code. Rather you can create markup templates – similar to the way you create classic ASP server markup – and merge data into these templates to render HTML which you can then inject into the document or replace existing content with. Output from templates are rendered as a jQuery matched set and can then be easily inserted into the document as needed. Templating is key to minimize client side code and reduce repeated code for rendering logic. Instead a single template can be used in many places for updating and adding content to existing pages. Further if you build pure AJAX interfaces that rely entirely on client rendering of the initial page content, templates allow you to a use a single markup template to handle all rendering of each specific HTML section/element. I’ve used a number of different client rendering template engines with jQuery in the past including jTemplates (a PHP style templating engine) and a modified version of John Resig’s MicroTemplating engine which I built into my own set of libraries because it’s such a commonly used feature in my client side applications. jQuery templates adds a much richer templating model that allows for sub-templates and access to the data items. Like John Resig’s original Micro Template engine, the core basics of the templating engine create JavaScript code which means that templates can include JavaScript code. To give you a basic idea of how templates work imagine I have an application that downloads a set of stock quotes based on a symbol list then displays them in the document. To do this you can create an ‘item’ template that describes how each of the quotes is renderd as a template inside of the document: <script id="stockTemplate" type="text/x-jquery-tmpl"> <div id="divStockQuote" class="errordisplay" style="width: 500px;"> <div class="label">Company:</div><div><b>${Company}(${Symbol})</b></div> <div class="label">Last Price:</div><div>${LastPrice}</div> <div class="label">Net Change:</div><div> {{if NetChange > 0}} <b style="color:green" >${NetChange}</b> {{else}} <b style="color:red" >${NetChange}</b> {{/if}} </div> <div class="label">Last Update:</div><div>${LastQuoteTimeString}</div> </div> </script> The ‘template’ is little more than HTML with some markup expressions inside of it that define the template language. Notice the embedded ${} expressions which reference data from the quote objects returned from an AJAX call on the server. You can embed any JavaScript or value expression in these template expressions. There are also a number of structural commands like {{if}} and {{each}} that provide for rudimentary logic inside of your templates as well as commands ({{tmpl}} and {{wrap}}) for nesting templates. You can find more about the full set of markup expressions available in the documentation. To load up this data you can use code like the following: <script type="text/javascript"> //var Proxy = new ServiceProxy("../PageMethods/PageMethodsService.asmx/"); $(document).ready(function () { $("#btnGetQuotes").click(GetQuotes); }); function GetQuotes() { var symbols = $("#txtSymbols").val().split(","); $.ajax({ url: "../PageMethods/PageMethodsService.asmx/GetStockQuotes", data: JSON.stringify({ symbols: symbols }), // parameter map type: "POST", // data has to be POSTed contentType: "application/json", timeout: 10000, dataType: "json", success: function (result) { var quotes = result.d; var jEl = $("#stockTemplate").tmpl(quotes); $("#quoteDisplay").empty().append(jEl); }, error: function (xhr, status) { alert(status + "\r\n" + xhr.responseText); } }); }; </script> In this case an ASMX AJAX service is called to retrieve the stock quotes. The service returns an array of quote objects. The result is returned as an object with the .d property (in Microsoft service style) that returns the actual array of quotes. The template is applied with: var jEl = $("#stockTemplate").tmpl(quotes); which selects the template script tag and uses the .tmpl() function to apply the data to it. The result is a jQuery matched set of elements that can then be appended to the quote display element in the page. The template is merged against an array in this example. When the result is an array the template is automatically applied to each each array item. If you pass a single data item – like say a stock quote – the template works exactly the same way but is applied only once. Templates also have access to a $data item which provides the current data item and information about the tempalte that is currently executing. This makes it possible to keep context within the context of the template itself and also to pass context from a parent template to a child template which is very powerful. Templates can be evaluated by using the template selector and calling the .tmpl() function on the jQuery matched set as shown above or you can use the static $.tmpl() function to provide a template as a string. This allows you to dynamically create templates in code or – more likely – to load templates from the server via AJAX calls. In short there are options The above shows off some of the basics, but there’s much for functionality available in the template engine. Check the documentation link for more information and links to additional examples. The plug-in download also comes with a number of examples that demonstrate functionality. jQuery templates will become a native component in jQuery Core 1.5, so it’s definitely worthwhile checking out the engine today and get familiar with this interface. As much as I’m stoked about templating becoming part of the jQuery core because it’s such an integral part of many applications, there are also a couple shortcomings in the current incarnation: Lack of Error Handling Currently if you embed an expression that is invalid it’s simply not rendered. There’s no error rendered into the template nor do the various template functions throw errors which leaves finding of bugs as a runtime exercise. I would like some mechanism – optional if possible – to be able to get error info of what is failing in a template when it’s rendered. No String Output Templates are always rendered into a jQuery matched set and there’s no way that I can see to directly render to a string. String output can be useful for debugging as well as opening up templating for creating non-HTML string output. Limited JavaScript Access Unlike John Resig’s original MicroTemplating Engine which was entirely based on JavaScript code generation these templates are limited to a few structured commands that can ‘execute’. There’s no code execution inside of script code which means you’re limited to calling expressions available in global objects or the data item passed in. This may or may not be a big deal depending on the complexity of your template logic. Error handling has been discussed quite a bit and it’s likely there will be some solution to that particualar issue by the time jQuery templates ship. The others are relatively minor issues but something to think about anyway. jQuery Data Link http://api.jquery.com/category/plugins/data-link/ jQuery Data Link provides the ability to do two-way data binding between input controls and an underlying object’s properties. The typical scenario is linking a textbox to a property of an object and have the object updated when the text in the textbox is changed and have the textbox change when the value in the object or the entire object changes. The plug-in also supports converter functions that can be applied to provide the conversion logic from string to some other value typically necessary for mapping things like textbox string input to say a number property and potentially applying additional formatting and calculations. In theory this sounds great, however in reality this plug-in has some serious usability issues. Using the plug-in you can do things like the following to bind data: person = { firstName: "rick", lastName: "strahl"}; $(document).ready( function() { // provide for two-way linking of inputs $("form").link(person); // bind to non-input elements explicitly $("#objFirst").link(person, { firstName: { name: "objFirst", convertBack: function (value, source, target) { $(target).text(value); } } }); $("#objLast").link(person, { lastName: { name: "objLast", convertBack: function (value, source, target) { $(target).text(value); } } }); }); This code hooks up two-way linking between a couple of textboxes on the page and the person object. The first line in the .ready() handler provides mapping of object to form field with the same field names as properties on the object. Note that .link() does NOT bind items into the textboxes when you call .link() – changes are mapped only when values change and you move out of the field. Strike one. The two following commands allow manual binding of values to specific DOM elements which is effectively a one-way bind. You specify the object and a then an explicit mapping where name is an ID in the document. The converter is required to explicitly assign the value to the element. Strike two. You can also detect changes to the underlying object and cause updates to the input elements bound. Unfortunately the syntax to do this is not very natural as you have to rely on the jQuery data object. To update an object’s properties and get change notification looks like this: function updateFirstName() { $(person).data("firstName", person.firstName + " (code updated)"); } This works fine in causing any linked fields to be updated. In the bindings above both the firstName input field and objFirst DOM element gets updated. But the syntax requires you to use a jQuery .data() call for each property change to ensure that the changes are tracked properly. Really? Sure you’re binding through multiple layers of abstraction now but how is that better than just manually assigning values? The code savings (if any) are going to be minimal. As much as I would like to have a WPF/Silverlight/Observable-like binding mechanism in client script, this plug-in doesn’t help much towards that goal in its current incarnation. While you can bind values, the ‘binder’ is too limited to be really useful. If initial values can’t be assigned from the mappings you’re going to end up duplicating work loading the data using some other mechanism. There’s no easy way to re-bind data with a different object altogether since updates trigger only through the .data members. Finally, any non-input elements have to be bound via code that’s fairly verbose and frankly may be more voluminous than what you might write by hand for manual binding and unbinding. Two way binding can be very useful but it has to be easy and most importantly natural. If it’s more work to hook up a binding than writing a couple of lines to do binding/unbinding this sort of thing helps very little in most scenarios. In talking to some of the developers the feature set for Data Link is not complete and they are still soliciting input for features and functionality. If you have ideas on how you want this feature to be more useful get involved and post your recommendations. As it stands, it looks to me like this component needs a lot of love to become useful. For this component to really provide value, bindings need to be able to be refreshed easily and work at the object level, not just the property level. It seems to me we would be much better served by a model binder object that can perform these binding/unbinding tasks in bulk rather than a tool where each link has to be mapped first. I also find the choice of creating a jQuery plug-in questionable – it seems a standalone object – albeit one that relies on the jQuery library – would provide a more intuitive interface than the current forcing of options onto a plug-in style interface. Out of the three Microsoft created components this is by far the least useful and least polished implementation at this point. jQuery Globalization http://github.com/jquery/jquery-global Globalization in JavaScript applications often gets short shrift and part of the reason for this is that natively in JavaScript there’s little support for formatting and parsing of numbers and dates. There are a number of JavaScript libraries out there that provide some support for globalization, but most are limited to a particular portion of globalization. As .NET developers we’re fairly spoiled by the richness of APIs provided in the framework and when dealing with client development one really notices the lack of these features. While you may not necessarily need to localize your application the globalization plug-in also helps with some basic tasks for non-localized applications: Dealing with formatting and parsing of dates and time values. Dates in particular are problematic in JavaScript as there are no formatters whatsoever except the .toString() method which outputs a verbose and next to useless long string. With the globalization plug-in you get a good chunk of the formatting and parsing functionality that the .NET framework provides on the server. You can write code like the following for example to format numbers and dates: var date = new Date(); var output = $.format(date, "MMM. dd, yy") + "\r\n" + $.format(date, "d") + "\r\n" + // 10/25/2010 $.format(1222.32213, "N2") + "\r\n" + $.format(1222.33, "c") + "\r\n"; alert(output); This becomes even more useful if you combine it with templates which can also include any JavaScript expressions. Assuming the globalization plug-in is loaded you can create template expressions that use the $.format function. Here’s the template I used earlier for the stock quote again with a couple of formats applied: <script id="stockTemplate" type="text/x-jquery-tmpl"> <div id="divStockQuote" class="errordisplay" style="width: 500px;"> <div class="label">Company:</div><div><b>${Company}(${Symbol})</b></div> <div class="label">Last Price:</div> <div>${$.format(LastPrice,"N2")}</div> <div class="label">Net Change:</div><div> {{if NetChange > 0}} <b style="color:green" >${NetChange}</b> {{else}} <b style="color:red" >${NetChange}</b> {{/if}} </div> <div class="label">Last Update:</div> <div>${$.format(LastQuoteTime,"MMM dd, yyyy")}</div> </div> </script> There are also parsing methods that can parse dates and numbers from strings into numbers easily: alert($.parseDate("25.10.2010")); alert($.parseInt("12.222")); // de-DE uses . for thousands separators As you can see culture specific options are taken into account when parsing. The globalization plugin provides rich support for a variety of locales: Get a list of all available cultures Query cultures for culture items (like currency symbol, separators etc.) Localized string names for all calendar related items (days of week, months) Generated off of .NET’s supported locales In short you get much of the same functionality that you already might be using in .NET on the server side. The plugin includes a huge number of locales and an Globalization.all.min.js file that contains the text defaults for each of these locales as well as small locale specific script files that define each of the locale specific settings. It’s highly recommended that you NOT use the huge globalization file that includes all locales, but rather add script references to only those languages you explicitly care about. Overall this plug-in is a welcome helper. Even if you use it with a single locale (like en-US) and do no other localization, you’ll gain solid support for number and date formatting which is a vital feature of many applications. Changes for Microsoft It’s good to see Microsoft coming out of its shell and away from the ‘not-built-here’ mentality that has been so pervasive in the past. It’s especially good to see it applied to jQuery – a technology that has stood in drastic contrast to Microsoft’s own internal efforts in terms of design, usage model and… popularity. It’s great to see that Microsoft is paying attention to what customers prefer to use and supporting the customer sentiment – even if it meant drastically changing course of policy and moving into a more open and sharing environment in the process. The additional jQuery support that has been introduced in the last two years certainly has made lives easier for many developers on the ASP.NET platform. It’s also nice to see Microsoft submitting proposals through the standard jQuery process of plug-ins and getting accepted for various very useful projects. Certainly the jQuery Templates plug-in is going to be very useful to many especially since it will be baked into the jQuery core in jQuery 1.5. I hope we see more of this type of involvement from Microsoft in the future. Kudos!© Rick Strahl, West Wind Technologies, 2005-2010Posted in jQuery ASP.NET

Read the article
Rendering ASP.NET Script References into the Html Header

- by Rick Strahl

One thing that I’ve come to appreciate in control development in ASP.NET that use JavaScript is the ability to have more control over script and script include placement than ASP.NET provides natively. Specifically in ASP.NET you can use either the ClientScriptManager or ScriptManager to embed scripts and script references into pages via code. This works reasonably well, but the script references that get generated are generated into the HTML body and there’s very little operational control for placement of scripts. If you have multiple controls or several of the same control that need to place the same scripts onto the page it’s not difficult to end up with scripts that render in the wrong order and stop working correctly. This is especially critical if you load script libraries with dependencies either via resources or even if you are rendering referenced to CDN resources. Natively ASP.NET provides a host of methods that help embedding scripts into the page via either Page.ClientScript or the ASP.NET ScriptManager control (both with slightly different syntax): RegisterClientScriptBlock Renders a script block at the top of the HTML body and should be used for embedding callable functions/classes. RegisterStartupScript Renders a script block just prior to the </form> tag and should be used to for embedding code that should execute when the page is first loaded. Not recommended – use jQuery.ready() or equivalent load time routines. RegisterClientScriptInclude Embeds a reference to a script from a url into the page. RegisterClientScriptResource Embeds a reference to a Script from a resource file generating a long resource file string All 4 of these methods render their <script> tags into the HTML body. The script blocks give you a little bit of control by having a ‘top’ and ‘bottom’ of the document location which gives you some flexibility over script placement and precedence. Script includes and resource url unfortunately do not even get that much control – references are simply rendered into the page in the order of declaration. The ASP.NET ScriptManager control facilitates this task a little bit with the abililty to specify scripts in code and the ability to programmatically check what scripts have already been registered, but it doesn’t provide any more control over the script rendering process itself. Further the ScriptManager is a bear to deal with generically because generic code has to always check and see if it is actually present. Some time ago I posted a ClientScriptProxy class that helps with managing the latter process of sending script references either to ClientScript or ScriptManager if it’s available. Since I last posted about this there have been a number of improvements in this API, one of which is the ability to control placement of scripts and script includes in the page which I think is rather important and a missing feature in the ASP.NET native functionality. Handling ScriptRenderModes One of the big enhancements that I’ve come to rely on is the ability of the various script rendering functions described above to support rendering in multiple locations: /// <summary> /// Determines how scripts are included into the page /// </summary> public enum ScriptRenderModes { /// <summary> /// Inherits the setting from the control or from the ClientScript.DefaultScriptRenderMode /// </summary> Inherit, /// Renders the script include at the location of the control /// </summary> Inline, /// <summary> /// Renders the script include into the bottom of the header of the page /// </summary> Header, /// <summary> /// Renders the script include into the top of the header of the page /// </summary> HeaderTop, /// <summary> /// Uses ClientScript or ScriptManager to embed the script include to /// provide standard ASP.NET style rendering in the HTML body. /// </summary> Script, /// <summary> /// Renders script at the bottom of the page before the last Page.Controls /// literal control. Note this may result in unexpected behavior /// if /body and /html are not the last thing in the markup page. /// </summary> BottomOfPage } This enum is then applied to the various Register functions to allow more control over where scripts actually show up. Why is this useful? For me I often render scripts out of control resources and these scripts often include things like a JavaScript Library (jquery) and a few plug-ins. The order in which these can be loaded is critical so that jQuery.js always loads before any plug-in for example. Typically I end up with a general script layout like this: Core Libraries- HeaderTop Plug-ins: Header ScriptBlocks: Header or Script depending on other dependencies There’s also an option to render scripts and CSS at the very bottom of the page before the last Page control on the page which can be useful for speeding up page load when lots of scripts are loaded. The API syntax of the ClientScriptProxy methods is closely compatible with ScriptManager’s using static methods and control references to gain access to the page and embedding scripts. For example, to render some script into the current page in the header: // Create script block in header ClientScriptProxy.Current.RegisterClientScriptBlock(this, typeof(ControlResources), "hello_function", "function helloWorld() { alert('hello'); }", true, ScriptRenderModes.Header); // Same again - shouldn't be rendered because it's the same id ClientScriptProxy.Current.RegisterClientScriptBlock(this, typeof(ControlResources), "hello_function", "function helloWorld() { alert('hello'); }", true, ScriptRenderModes.Header); // Create a second script block in header ClientScriptProxy.Current.RegisterClientScriptBlock(this, typeof(ControlResources), "hello_function2", "function helloWorld2() { alert('hello2'); }", true, ScriptRenderModes.Header); // This just calls ClientScript and renders into bottom of document ClientScriptProxy.Current.RegisterStartupScript(this,typeof(ControlResources), "call_hello", "helloWorld();helloWorld2();", true); which generates: <html xmlns="http://www.w3.org/1999/xhtml" > <head><title> </title> <script type="text/javascript"> function helloWorld() { alert('hello'); } </script> <script type="text/javascript"> function helloWorld2() { alert('hello2'); } </script> </head> <body> … <script type="text/javascript"> //<![CDATA[ helloWorld();helloWorld2();//]]> </script> </form> </body> </html> Note that the scripts are generated into the header rather than the body except for the last script block which is the call to RegisterStartupScript. In general I wouldn’t recommend using RegisterStartupScript – ever. It’s a much better practice to use a script base load event to handle ‘startup’ code that should fire when the page first loads. So instead of the code above I’d actually recommend doing: ClientScriptProxy.Current.RegisterClientScriptBlock(this, typeof(ControlResources), "call_hello", "$().ready( function() { alert('hello2'); });", true, ScriptRenderModes.Header); assuming you’re using jQuery on the page. For script includes from a Url the following demonstrates how to embed scripts into the header. This example injects a jQuery and jQuery.UI script reference from the Google CDN then checks each with a script block to ensure that it has loaded and if not loads it from a server local location: // load jquery from CDN ClientScriptProxy.Current.RegisterClientScriptInclude(this, typeof(ControlResources), "http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js", ScriptRenderModes.HeaderTop); // check if jquery loaded - if it didn't we're not online string scriptCheck = @"if (typeof jQuery != 'object') document.write(unescape(""%3Cscript src='{0}' type='text/javascript'%3E%3C/script%3E""));"; string jQueryUrl = ClientScriptProxy.Current.GetWebResourceUrl(this, typeof(ControlResources), ControlResources.JQUERY_SCRIPT_RESOURCE); ClientScriptProxy.Current.RegisterClientScriptBlock(this, typeof(ControlResources), "jquery_register", string.Format(scriptCheck,jQueryUrl),true, ScriptRenderModes.HeaderTop); // Load jquery-ui from cdn ClientScriptProxy.Current.RegisterClientScriptInclude(this, typeof(ControlResources), "http://ajax.googleapis.com/ajax/libs/jqueryui/1.7.2/jquery-ui.min.js", ScriptRenderModes.Header); // check if we need to load from local string jQueryUiUrl = ResolveUrl("~/scripts/jquery-ui-custom.min.js"); ClientScriptProxy.Current.RegisterClientScriptBlock(this, typeof(ControlResources), "jqueryui_register", string.Format(scriptCheck, jQueryUiUrl), true, ScriptRenderModes.Header); // Create script block in header ClientScriptProxy.Current.RegisterClientScriptBlock(this, typeof(ControlResources), "hello_function", "$().ready( function() { alert('hello'); });", true, ScriptRenderModes.Header); which in turn generates this HTML: <html xmlns="http://www.w3.org/1999/xhtml" > <head> <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js" type="text/javascript"></script> <script type="text/javascript"> if (typeof jQuery != 'object') document.write(unescape("%3Cscript src='/WestWindWebToolkitWeb/WebResource.axd?d=DIykvYhJ_oXCr-TA_dr35i4AayJoV1mgnQAQGPaZsoPM2LCdvoD3cIsRRitHKlKJfV5K_jQvylK7tsqO3lQIFw2&t=633979863959332352' type='text/javascript'%3E%3C/script%3E")); </script> <title> </title> <script src="http://ajax.googleapis.com/ajax/libs/jqueryui/1.7.2/jquery-ui.min.js" type="text/javascript"></script> <script type="text/javascript"> if (typeof jQuery != 'object') document.write(unescape("%3Cscript src='/WestWindWebToolkitWeb/scripts/jquery-ui-custom.min.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/javascript"> $().ready(function() { alert('hello'); }); </script> </head> <body> …</body> </html> As you can see there’s a bit more control in this process as you can inject both script includes and script blocks into the document at the top or bottom of the header, plus if necessary at the usual body locations. This is quite useful especially if you create custom server controls that interoperate with script and have certain dependencies. The above is a good example of a useful switchable routine where you can switch where scripts load from by default – the above pulls from Google CDN but a configuration switch may automatically switch to pull from the local development copies if your doing development for example. How does it work? As mentioned the ClientScriptProxy object mimicks many of the ScriptManager script related methods and so provides close API compatibility with it although it contains many additional overloads that enhance functionality. It does however work against ScriptManager if it’s available on the page, or Page.ClientScript if it’s not so it provides a single unified frontend to script access. There are however many overloads of the original SM methods like the above to provide additional functionality. The implementation of script header rendering is pretty straight forward – as long as a server header (ie. it has to have runat=”server” set) is available. Otherwise these routines fall back to using the default document level insertions of ScriptManager/ClientScript. Given that there is a server header it’s relatively easy to generate the script tags and code and append them to the header either at the top or bottom. I suspect Microsoft didn’t provide header rendering functionality precisely because a runat=”server” header is not required by ASP.NET so behavior would be slightly unpredictable. That’s not really a problem for a custom implementation however. Here’s the RegisterClientScriptBlock implementation that takes a ScriptRenderModes parameter to allow header rendering: /// <summary> /// Renders client script block with the option of rendering the script block in /// the Html header /// /// For this to work Header must be defined as runat="server" /// </summary> /// <param name="control">any control that instance typically page</param> /// <param name="type">Type that identifies this rendering</param> /// <param name="key">unique script block id</param> /// <param name="script">The script code to render</param> /// <param name="addScriptTags">Ignored for header rendering used for all other insertions</param> /// <param name="renderMode">Where the block is rendered</param> public void RegisterClientScriptBlock(Control control, Type type, string key, string script, bool addScriptTags, ScriptRenderModes renderMode) { if (renderMode == ScriptRenderModes.Inherit) renderMode = DefaultScriptRenderMode; if (control.Page.Header == null || renderMode != ScriptRenderModes.HeaderTop && renderMode != ScriptRenderModes.Header && renderMode != ScriptRenderModes.BottomOfPage) { RegisterClientScriptBlock(control, type, key, script, addScriptTags); return; } // No dupes - ref script include only once const string identifier = "scriptblock_"; if (HttpContext.Current.Items.Contains(identifier + key)) return; HttpContext.Current.Items.Add(identifier + key, string.Empty); StringBuilder sb = new StringBuilder(); // Embed in header sb.AppendLine("\r\n<script type=\"text/javascript\">"); sb.AppendLine(script); sb.AppendLine("</script>"); int? index = HttpContext.Current.Items["__ScriptResourceIndex"] as int?; if (index == null) index = 0; if (renderMode == ScriptRenderModes.HeaderTop) { control.Page.Header.Controls.AddAt(index.Value, new LiteralControl(sb.ToString())); index++; } else if(renderMode == ScriptRenderModes.Header) control.Page.Header.Controls.Add(new LiteralControl(sb.ToString())); else if (renderMode == ScriptRenderModes.BottomOfPage) control.Page.Controls.AddAt(control.Page.Controls.Count-1,new LiteralControl(sb.ToString())); HttpContext.Current.Items["__ScriptResourceIndex"] = index; } Note that the routine has to keep track of items inserted by id so that if the same item is added again with the same key it won’t generate two script entries. Additionally the code has to keep track of how many insertions have been made at the top of the document so that entries are added in the proper order. The RegisterScriptInclude method is similar but there’s some additional logic in here to deal with script file references and ClientScriptProxy’s (optional) custom resource handler that provides script compression /// <summary> /// Registers a client script reference into the page with the option to specify /// the script location in the page /// </summary> /// <param name="control">Any control instance - typically page</param> /// <param name="type">Type that acts as qualifier (uniqueness)</param> /// <param name="url">the Url to the script resource</param> /// <param name="ScriptRenderModes">Determines where the script is rendered</param> public void RegisterClientScriptInclude(Control control, Type type, string url, ScriptRenderModes renderMode) { const string STR_ScriptResourceIndex = "__ScriptResourceIndex"; if (string.IsNullOrEmpty(url)) return; if (renderMode == ScriptRenderModes.Inherit) renderMode = DefaultScriptRenderMode; // Extract just the script filename string fileId = null; // Check resource IDs and try to match to mapped file resources // Used to allow scripts not to be loaded more than once whether // embedded manually (script tag) or via resources with ClientScriptProxy if (url.Contains(".axd?r=")) { string res = HttpUtility.UrlDecode( StringUtils.ExtractString(url, "?r=", "&", false, true) ); foreach (ScriptResourceAlias item in ScriptResourceAliases) { if (item.Resource == res) { fileId = item.Alias + ".js"; break; } } if (fileId == null) fileId = url.ToLower(); } else fileId = Path.GetFileName(url).ToLower(); // No dupes - ref script include only once const string identifier = "script_"; if (HttpContext.Current.Items.Contains( identifier + fileId ) ) return; HttpContext.Current.Items.Add(identifier + fileId, string.Empty); // just use script manager or ClientScriptManager if (control.Page.Header == null || renderMode == ScriptRenderModes.Script || renderMode == ScriptRenderModes.Inline) { RegisterClientScriptInclude(control, type,url, url); return; } // Retrieve script index in header int? index = HttpContext.Current.Items[STR_ScriptResourceIndex] as int?; if (index == null) index = 0; StringBuilder sb = new StringBuilder(256); url = WebUtils.ResolveUrl(url); // Embed in header sb.AppendLine("\r\n<script src=\"" + url + "\" type=\"text/javascript\"></script>"); if (renderMode == ScriptRenderModes.HeaderTop) { control.Page.Header.Controls.AddAt(index.Value, new LiteralControl(sb.ToString())); index++; } else if (renderMode == ScriptRenderModes.Header) control.Page.Header.Controls.Add(new LiteralControl(sb.ToString())); else if (renderMode == ScriptRenderModes.BottomOfPage) control.Page.Controls.AddAt(control.Page.Controls.Count-1, new LiteralControl(sb.ToString())); HttpContext.Current.Items[STR_ScriptResourceIndex] = index; } There’s a little more code here that deals with cleaning up the passed in Url and also some custom handling of script resources that run through the ScriptCompressionModule – any script resources loaded in this fashion are automatically cached based on the resource id. Raw urls extract just the filename from the URL and cache based on that. All of this to avoid doubling up of scripts if called multiple times by multiple instances of the same control for example or several controls that all load the same resources/includes. Finally RegisterClientScriptResource utilizes the previous method to wrap the WebResourceUrl as well as some custom functionality for the resource compression module: /// <summary> /// Returns a WebResource or ScriptResource URL for script resources that are to be /// embedded as script includes. /// </summary> /// <param name="control">Any control</param> /// <param name="type">A type in assembly where resources are located</param> /// <param name="resourceName">Name of the resource to load</param> /// <param name="renderMode">Determines where in the document the link is rendered</param> public void RegisterClientScriptResource(Control control, Type type, string resourceName, ScriptRenderModes renderMode) { string resourceUrl = GetClientScriptResourceUrl(control, type, resourceName); RegisterClientScriptInclude(control, type, resourceUrl, renderMode); } /// <summary> /// Works like GetWebResourceUrl but can be used with javascript resources /// to allow using of resource compression (if the module is loaded). /// </summary> /// <param name="control"></param> /// <param name="type"></param> /// <param name="resourceName"></param> /// <returns></returns> public string GetClientScriptResourceUrl(Control control, Type type, string resourceName) { #if IncludeScriptCompressionModuleSupport // If wwScriptCompression Module through Web.config is loaded use it to compress // script resources by using wcSC.axd Url the module intercepts if (ScriptCompressionModule.ScriptCompressionModuleActive) { string url = "~/wwSC.axd?r=" + HttpUtility.UrlEncode(resourceName); if (type.Assembly != GetType().Assembly) url += "&t=" + HttpUtility.UrlEncode(type.FullName); return WebUtils.ResolveUrl(url); } #endif return control.Page.ClientScript.GetWebResourceUrl(type, resourceName); } This code merely retrieves the resource URL and then simply calls back to RegisterClientScriptInclude with the URL to be embedded which means there’s nothing specific to deal with other than the custom compression module logic which is nice and easy. What else is there in ClientScriptProxy? ClientscriptProxy also provides a few other useful services beyond what I’ve already covered here: Transparent ScriptManager and ClientScript calls ClientScriptProxy includes a host of routines that help figure out whether a script manager is available or not and all functions in this class call the appropriate object – ScriptManager or ClientScript – that is available in the current page to ensure that scripts get embedded into pages properly. This is especially useful for control development where controls have no control over the scripting environment in place on the page. RegisterCssLink and RegisterCssResource Much like the script embedding functions these two methods allow embedding of CSS links. CSS links are appended to the header or to a form declared with runat=”server”. LoadControlScript Is a high level resource loading routine that can be used to easily switch between different script linking modes. It supports loading from a WebResource, a url or not loading anything at all. This is very useful if you build controls that deal with specification of resource urls/ids in a standard way. Check out the full Code You can check out the full code to the ClientScriptProxyClass here: ClientScriptProxy.cs ClientScriptProxy Documentation (class reference) Note that the ClientScriptProxy has a few dependencies in the West Wind Web Toolkit of which it is part of. ControlResources holds a few standard constants and script resource links and the ScriptCompressionModule which is referenced in a few of the script inclusion methods. There’s also another useful ScriptContainer companion control to the ClientScriptProxy that allows scripts to be placed onto the page’s markup including the ability to specify the script location and script minification options. You can find all the dependencies in the West Wind Web Toolkit repository: West Wind Web Toolkit Repository West Wind Web Toolkit Home Page© Rick Strahl, West Wind Technologies, 2005-2010Posted in ASP.NET JavaScript

Read the article
How to find and fix performance problems in ORM powered applications

- by FransBouma

Once in a while we get requests about how to fix performance problems with our framework. As it comes down to following the same steps and looking into the same things every single time, I decided to write a blogpost about it instead, so more people can learn from this and solve performance problems in their O/R mapper powered applications. In some parts it's focused on LLBLGen Pro but it's also usable for other O/R mapping frameworks, as the vast majority of performance problems in O/R mapper powered applications are not specific for a certain O/R mapper framework. Too often, the developer looks at the wrong part of the application, trying to fix what isn't a problem in that part, and getting frustrated that 'things are so slow with <insert your favorite framework X here>'. I'm in the O/R mapper business for a long time now (almost 10 years, full time) and as it's a small world, we O/R mapper developers know almost all tricks to pull off by now: we all know what to do to make task ABC faster and what compromises (because there are almost always compromises) to deal with if we decide to make ABC faster that way. Some O/R mapper frameworks are faster in X, others in Y, but you can be sure the difference is mainly a result of a compromise some developers are willing to deal with and others aren't. That's why the O/R mapper frameworks on the market today are different in many ways, even though they all fetch and save entities from and to a database. I'm not suggesting there's no room for improvement in today's O/R mapper frameworks, there always is, but it's not a matter of 'the slowness of the application is caused by the O/R mapper' anymore. Perhaps query generation can be optimized a bit here, row materialization can be optimized a bit there, but it's mainly coming down to milliseconds. Still worth it if you're a framework developer, but it's not much compared to the time spend inside databases and in user code: if a complete fetch takes 40ms or 50ms (from call to entity object collection), it won't make a difference for your application as that 10ms difference won't be noticed. That's why it's very important to find the real locations of the problems so developers can fix them properly and don't get frustrated because their quest to get a fast, performing application failed. Performance tuning basics and rules Finding and fixing performance problems in any application is a strict procedure with four prescribed steps: isolate, analyze, interpret and fix, in that order. It's key that you don't skip a step nor make assumptions: these steps help you find the reason of a problem which seems to be there, and how to fix it or leave it as-is. Skipping a step, or when you assume things will be bad/slow without doing analysis will lead to the path of premature optimization and won't actually solve your problems, only create new ones. The most important rule of finding and fixing performance problems in software is that you have to understand what 'performance problem' actually means. Most developers will say "when a piece of software / code is slow, you have a performance problem". But is that actually the case? If I write a Linq query which will aggregate, group and sort 5 million rows from several tables to produce a resultset of 10 rows, it might take more than a couple of milliseconds before that resultset is ready to be consumed by other logic. If I solely look at the Linq query, the code consuming the resultset of the 10 rows and then look at the time it takes to complete the whole procedure, it will appear to me to be slow: all that time taken to produce and consume 10 rows? But if you look closer, if you analyze and interpret the situation, you'll see it does a tremendous amount of work, and in that light it might even be extremely fast. With every performance problem you encounter, always do realize that what you're trying to solve is perhaps not a technical problem at all, but a perception problem. The second most important rule you have to understand is based on the old saying "Penny wise, Pound Foolish": the part which takes e.g. 5% of the total time T for a given task isn't worth optimizing if you have another part which takes a much larger part of the total time T for that same given task. Optimizing parts which are relatively insignificant for the total time taken is not going to bring you better results overall, even if you totally optimize that part away. This is the core reason why analysis of the complete set of application parts which participate in a given task is key to being successful in solving performance problems: No analysis -> no problem -> no solution. One warning up front: hunting for performance will always include making compromises. Fast software can be made maintainable, but if you want to squeeze as much performance out of your software, you will inevitably be faced with the dilemma of compromising one or more from the group {readability, maintainability, features} for the extra performance you think you'll gain. It's then up to you to decide whether it's worth it. In almost all cases it's not. The reason for this is simple: the vast majority of performance problems can be solved by implementing the proper algorithms, the ones with proven Big O-characteristics so you know the performance you'll get plus you know the algorithm will work. The time taken by the algorithm implementing code is inevitable: you already implemented the best algorithm. You might find some optimizations on the technical level but in general these are minor. Let's look at the four steps to see how they guide us through the quest to find and fix performance problems. Isolate The first thing you need to do is to isolate the areas in your application which are assumed to be slow. For example, if your application is a web application and a given page is taking several seconds or even minutes to load, it's a good candidate to check out. It's important to start with the isolate step because it allows you to focus on a single code path per area with a clear begin and end and ignore the rest. The rest of the steps are taken per identified problematic area. Keep in mind that isolation focuses on tasks in an application, not code snippets. A task is something that's started in your application by either another task or the user, or another program, and has a beginning and an end. You can see a task as a piece of functionality offered by your application. Analyze Once you've determined the problem areas, you have to perform analysis on the code paths of each area, to see where the performance problems occur and which areas are not the problem. This is a multi-layered effort: an application which uses an O/R mapper typically consists of multiple parts: there's likely some kind of interface (web, webservice, windows etc.), a part which controls the interface and business logic, the O/R mapper part and the RDBMS, all connected with either a network or inter-process connections provided by the OS or other means. Each of these parts, including the connectivity plumbing, eat up a part of the total time it takes to complete a task, e.g. load a webpage with all orders of a given customer X. To understand which parts participate in the task / area we're investigating and how much they contribute to the total time taken to complete the task, analysis of each participating task is essential. Start with the code you wrote which starts the task, analyze the code and track the path it follows through your application. What does the code do along the way, verify whether it's correct or not. Analyze whether you have implemented the right algorithms in your code for this particular area. Remember we're looking at one area at a time, which means we're ignoring all other code paths, just the code path of the current problematic area, from begin to end and back. Don't dig in and start optimizing at the code level just yet. We're just analyzing. If your analysis reveals big architectural stupidity, it's perhaps a good idea to rethink the architecture at this point. For the rest, we're analyzing which means we collect data about what could be wrong, for each participating part of the complete application. Reviewing the code you wrote is a good tool to get deeper understanding of what is going on for a given task but ultimately it lacks precision and overview what really happens: humans aren't good code interpreters, computers are. We therefore need to utilize tools to get deeper understanding about which parts contribute how much time to the total task, triggered by which other parts and for example how many times are they called. There are two different kind of tools which are necessary: .NET profilers and O/R mapper / RDBMS profilers. .NET profiling .NET profilers (e.g. dotTrace by JetBrains or Ants by Red Gate software) show exactly which pieces of code are called, how many times they're called, and the time it took to run that piece of code, at the method level and sometimes even at the line level. The .NET profilers are essential tools for understanding whether the time taken to complete a given task / area in your application is consumed by .NET code, where exactly in your code, the path to that code, how many times that code was called by other code and thus reveals where hotspots are located: the areas where a solution can be found. Importantly, they also reveal which areas can be left alone: remember our penny wise pound foolish saying: if a profiler reveals that a group of methods are fast, or don't contribute much to the total time taken for a given task, ignore them. Even if the code in them is perhaps complex and looks like a candidate for optimization: you can work all day on that, it won't matter. As we're focusing on a single area of the application, it's best to start profiling right before you actually activate the task/area. Most .NET profilers support this by starting the application without starting the profiling procedure just yet. You navigate to the particular part which is slow, start profiling in the profiler, in your application you perform the actions which are considered slow, and afterwards you get a snapshot in the profiler. The snapshot contains the data collected by the profiler during the slow action, so most data is produced by code in the area to investigate. This is important, because it allows you to stay focused on a single area. O/R mapper and RDBMS profiling .NET profilers give you a good insight in the .NET side of things, but not in the RDBMS side of the application. As this article is about O/R mapper powered applications, we're also looking at databases, and the software making it possible to consume the database in your application: the O/R mapper. To understand which parts of the O/R mapper and database participate how much to the total time taken for task T, we need different tools. There are two kind of tools focusing on O/R mappers and database performance profiling: O/R mapper profilers and RDBMS profilers. For O/R mapper profilers, you can look at LLBLGen Prof by hibernating rhinos or the Linq to Sql/LLBLGen Pro profiler by Huagati. Hibernating rhinos also have profilers for other O/R mappers like NHibernate (NHProf) and Entity Framework (EFProf) and work the same as LLBLGen Prof. For RDBMS profilers, you have to look whether the RDBMS vendor has a profiler. For example for SQL Server, the profiler is shipped with SQL Server, for Oracle it's build into the RDBMS, however there are also 3rd party tools. Which tool you're using isn't really important, what's important is that you get insight in which queries are executed during the task / area we're currently focused on and how long they took. Here, the O/R mapper profilers have an advantage as they collect the time it took to execute the query from the application's perspective so they also collect the time it took to transport data across the network. This is important because a query which returns a massive resultset or a resultset with large blob/clob/ntext/image fields takes more time to get transported across the network than a small resultset and a database profiler doesn't take this into account most of the time. Another tool to use in this case, which is more low level and not all O/R mappers support it (though LLBLGen Pro and NHibernate as well do) is tracing: most O/R mappers offer some form of tracing or logging system which you can use to collect the SQL generated and executed and often also other activity behind the scenes. While tracing can produce a tremendous amount of data in some cases, it also gives insight in what's going on. Interpret After we've completed the analysis step it's time to look at the data we've collected. We've done code reviews to see whether we've done anything stupid and which parts actually take place and if the proper algorithms have been implemented. We've done .NET profiling to see which parts are choke points and how much time they contribute to the total time taken to complete the task we're investigating. We've performed O/R mapper profiling and RDBMS profiling to see which queries were executed during the task, how many queries were generated and executed and how long they took to complete, including network transportation. All this data reveals two things: which parts are big contributors to the total time taken and which parts are irrelevant. Both aspects are very important. The parts which are irrelevant (i.e. don't contribute significantly to the total time taken) can be ignored from now on, we won't look at them. The parts which contribute a lot to the total time taken are important to look at. We now have to first look at the .NET profiler results, to see whether the time taken is consumed in our own code, in .NET framework code, in the O/R mapper itself or somewhere else. For example if most of the time is consumed by DbCommand.ExecuteReader, the time it took to complete the task is depending on the time the data is fetched from the database. If there was just 1 query executed, according to tracing or O/R mapper profilers / RDBMS profilers, check whether that query is optimal, uses indexes or has to deal with a lot of data. Interpret means that you follow the path from begin to end through the data collected and determine where, along the path, the most time is contributed. It also means that you have to check whether this was expected or is totally unexpected. My previous example of the 10 row resultset of a query which groups millions of rows will likely reveal that a long time is spend inside the database and almost no time is spend in the .NET code, meaning the RDBMS part contributes the most to the total time taken, the rest is compared to that time, irrelevant. Considering the vastness of the source data set, it's expected this will take some time. However, does it need tweaking? Perhaps all possible tweaks are already in place. In the interpret step you then have to decide that further action in this area is necessary or not, based on what the analysis results show: if the analysis results were unexpected and in the area where the most time is contributed to the total time taken is room for improvement, action should be taken. If not, you can only accept the situation and move on. In all cases, document your decision together with the analysis you've done. If you decide that the perceived performance problem is actually expected due to the nature of the task performed, it's essential that in the future when someone else looks at the application and starts asking questions you can answer them properly and new analysis is only necessary if situations changed. Fix After interpreting the analysis results you've concluded that some areas need adjustment. This is the fix step: you're actively correcting the performance problem with proper action targeted at the real cause. In many cases related to O/R mapper powered applications it means you'll use different features of the O/R mapper to achieve the same goal, or apply optimizations at the RDBMS level. It could also mean you apply caching inside your application (compromise memory consumption over performance) to avoid unnecessary re-querying data and re-consuming the results. After applying a change, it's key you re-do the analysis and interpretation steps: compare the results and expectations with what you had before, to see whether your actions had any effect or whether it moved the problem to a different part of the application. Don't fall into the trap to do partly analysis: do the full analysis again: .NET profiling and O/R mapper / RDBMS profiling. It might very well be that the changes you've made make one part faster but another part significantly slower, in such a way that the overall problem hasn't changed at all. Performance tuning is dealing with compromises and making choices: to use one feature over the other, to accept a higher memory footprint, to go away from the strict-OO path and execute queries directly onto the RDBMS, these are choices and compromises which will cross your path if you want to fix performance problems with respect to O/R mappers or data-access and databases in general. In most cases it's not a big issue: alternatives are often good choices too and the compromises aren't that hard to deal with. What is important is that you document why you made a choice, a compromise: which analysis data, which interpretation led you to the choice made. This is key for good maintainability in the years to come. Most common performance problems with O/R mappers Below is an incomplete list of common performance problems related to data-access / O/R mappers / RDBMS code. It will help you with fixing the hotspots you found in the interpretation step. SELECT N+1: (Lazy-loading specific). Lazy loading triggered performance bottlenecks. Consider a list of Orders bound to a grid. You have a Field mapped onto a related field in Order, Customer.CompanyName. Showing this column in the grid will make the grid fetch (indirectly) for each row the Customer row. This means you'll get for the single list not 1 query (for the orders) but 1+(the number of orders shown) queries. To solve this: use eager loading using a prefetch path to fetch the customers with the orders. SELECT N+1 is easy to spot with an O/R mapper profiler or RDBMS profiler: if you see a lot of identical queries executed at once, you have this problem. Prefetch paths using many path nodes or sorting, or limiting. Eager loading problem. Prefetch paths can help with performance, but as 1 query is fetched per node, it can be the number of data fetched in a child node is bigger than you think. Also consider that data in every node is merged on the client within the parent. This is fast, but it also can take some time if you fetch massive amounts of entities. If you keep fetches small, you can use tuning parameters like the ParameterizedPrefetchPathThreshold setting to get more optimal queries. Deep inheritance hierarchies of type Target Per Entity/Type. If you use inheritance of type Target per Entity / Type (each type in the inheritance hierarchy is mapped onto its own table/view), fetches will join subtype- and supertype tables in many cases, which can lead to a lot of performance problems if the hierarchy has many types. With this problem, keep inheritance to a minimum if possible, or switch to a hierarchy of type Target Per Hierarchy, which means all entities in the inheritance hierarchy are mapped onto the same table/view. Of course this has its own set of drawbacks, but it's a compromise you might want to take. Fetching massive amounts of data by fetching large lists of entities. LLBLGen Pro supports paging (and limiting the # of rows returned), which is often key to process through large sets of data. Use paging on the RDBMS if possible (so a query is executed which returns only the rows in the page requested). When using paging in a web application, be sure that you switch server-side paging on on the datasourcecontrol used. In this case, paging on the grid alone is not enough: this can lead to fetching a lot of data which is then loaded into the grid and paged there. Keep note that analyzing queries for paging could lead to the false assumption that paging doesn't occur, e.g. when the query contains a field of type ntext/image/clob/blob and DISTINCT can't be applied while it should have (e.g. due to a join): the datareader will do DISTINCT filtering on the client. this is a little slower but it does perform paging functionality on the data-reader so it won't fetch all rows even if the query suggests it does. Fetch massive amounts of data because blob/clob/ntext/image fields aren't excluded. LLBLGen Pro supports field exclusion for queries. You can exclude fields (also in prefetch paths) per query to avoid fetching all fields of an entity, e.g. when you don't need them for the logic consuming the resultset. Excluding fields can greatly reduce the amount of time spend on data-transport across the network. Use this optimization if you see that there's a big difference between query execution time on the RDBMS and the time reported by the .NET profiler for the ExecuteReader method call. Doing client-side aggregates/scalar calculations by consuming a lot of data. If possible, try to formulate a scalar query or group by query using the projection system or GetScalar functionality of LLBLGen Pro to do data consumption on the RDBMS server. It's far more efficient to process data on the RDBMS server than to first load it all in memory, then traverse the data in-memory to calculate a value. Using .ToList() constructs inside linq queries. It might be you use .ToList() somewhere in a Linq query which makes the query be run partially in-memory. Example: var q = from c in metaData.Customers.ToList() where c.Country=="Norway" select c; This will actually fetch all customers in-memory and do an in-memory filtering, as the linq query is defined on an IEnumerable<T>, and not on the IQueryable<T>. Linq is nice, but it can often be a bit unclear where some parts of a Linq query might run. Fetching all entities to delete into memory first. To delete a set of entities it's rather inefficient to first fetch them all into memory and then delete them one by one. It's more efficient to execute a DELETE FROM ... WHERE query on the database directly to delete the entities in one go. LLBLGen Pro supports this feature, and so do some other O/R mappers. It's not always possible to do this operation in the context of an O/R mapper however: if an O/R mapper relies on a cache, these kind of operations are likely not supported because they make it impossible to track whether an entity is actually removed from the DB and thus can be removed from the cache. Fetching all entities to update with an expression into memory first. Similar to the previous point: it is more efficient to update a set of entities directly with a single UPDATE query using an expression instead of fetching the entities into memory first and then updating the entities in a loop, and afterwards saving them. It might however be a compromise you don't want to take as it is working around the idea of having an object graph in memory which is manipulated and instead makes the code fully aware there's a RDBMS somewhere. Conclusion Performance tuning is almost always about compromises and making choices. It's also about knowing where to look and how the systems in play behave and should behave. The four steps I provided should help you stay focused on the real problem and lead you towards the solution. Knowing how to optimally use the systems participating in your own code (.NET framework, O/R mapper, RDBMS, network/services) is key for success as well as knowing what's going on inside the application you built. I hope you'll find this guide useful in tracking down performance problems and dealing with them in a useful way.

Read the article
Red Gate Coder interviews: Robin Hellen

- by Michael Williamson

Robin Hellen is a test engineer here at Red Gate, and is also the latest coder I’ve interviewed. We chatted about debugging code, the roles of software engineers and testers, and why Vala is currently his favourite programming language. How did you get started with programming?It started when I was about six. My dad’s a professional programmer, and he gave me and my sister one of his old computers and taught us a bit about programming. It was an old Amiga 500 with a variant of BASIC. I don’t think I ever successfully completed anything! It was just faffing around. I didn’t really get anywhere with it.But then presumably you did get somewhere with it at some point.At some point. The PC emerged as the dominant platform, and I learnt a bit of Visual Basic. I didn’t really do much, just a couple of quick hacky things. A bit of demo animation. Took me a long time to get anywhere with programming, really.When did you feel like you did start to get somewhere?I think it was when I started doing things for someone else, which was my sister’s final year of university project. She called up my dad two days before she was due to submit, saying “We need something to display a graph!”. Dad says, “I’m too busy, go talk to your brother”. So I hacked up this ugly piece of code, sent it off and they won a prize for that project. Apparently, the graph, the bit that I wrote, was the reason they won a prize! That was when I first felt that I’d actually done something that was worthwhile. That was my first real bit of code, and the ugliest code I’ve ever written. It’s basically an array of pre-drawn line elements that I shifted round the screen to draw a very spikey graph.When did you decide that programming might actually be something that you wanted to do as a career?It’s not really a decision I took, I always wanted to do something with computers. And I had to take a gap year for uni, so I was looking for twelve month internships. I applied to Red Gate, and they gave me a job as a tester. And that’s where I really started having to write code well. To a better standard that I had been up to that point.How did you find coming to Red Gate and working with other coders?I thought it was really nice. I learnt so much just from other people around. I think one of the things that’s really great is that people are just willing to help you learn. Instead of “Don’t you know that, you’re so stupid”, it’s “You can just do it this way”.If you could go back to the very start of that internship, is there something that you would tell yourself?Write shorter code. I have a tendency to write massive, many-thousand line files that I break out of right at the end. And then half-way through a project I’m doing something, I think “Where did I write that bit that does that thing?”, and it’s almost impossible to find. I wrote some horrendous code when I started. Just that principle, just keep things short. Even if looks a bit crazy to be jumping around all over the place all of the time, it’s actually a lot more understandable.And how do you hold yourself to that?Generally, if a function’s going off my screen, it’s probably too long. That’s what I tell myself, and within the team here we have code reviews, so the guys I’m with at the moment are pretty good at pulling me up on, “Doesn’t that look like it’s getting a bit long?”. It’s more just the subjective standard of readability than anything.So you’re an advocate of code review?Yes, definitely. Both to spot errors that you might have made, and to improve your knowledge. The person you’re reviewing will say “Oh, you could have done it that way”. That’s how we learn, by talking to others, and also just sharing knowledge of how your project works around the team, or even outside the team. Definitely a very firm advocate of code reviews.Do you think there’s more we could do with them?I don’t know. We’re struggling with how to add them as part of the process without it becoming too cumbersome. We’ve experimented with a few different ways, and we’ve not found anything that just works.To get more into the nitty gritty: how do you like to debug code?The first thing is to do it in my head. I’ll actually think what piece of code is likely to have caused that error, and take a quick look at it, just to see if there’s anything glaringly obvious there. The next thing I’ll probably do is throw in print statements, or throw some exceptions from various points, just to check: is it going through the code path I expect it to? A last resort is to actually debug code using a debugger.Why is the debugger the last resort?Probably because of the environments I learnt programming in. VB and early BASIC didn’t have much of a debugger, the only way to find out what your program was doing was to add print statements. Also, because a lot of the stuff I tend to work with is non-interactive, if it’s something that takes a long time to run, I can throw in the print statements, set a run off, go and do something else, and look at it again later, rather than trying to remember what happened at that point when I was debugging through it. So it also gives me the record of what happens. I hate just sitting there pressing F5, F5, continually. If you’re having to find out what your code is doing at each line, you’ve probably got a very wrong mental model of what your code’s doing, and you can find that out just as easily by inspecting a couple of values through the print statements.If I were on some codebase that you were also working on, what should I do to make it as easy as possible to understand?I’d say short and well-named methods. The one thing I like to do when I’m looking at code is to find out where a value comes from, and the more layers of indirection there are, particularly DI [dependency injection] frameworks, the harder it is to find out where something’s come from. I really hate that. I want to know if the value come from the user here or is a constant here, and if I can’t find that out, that makes code very hard to understand for me.As a tester, where do you think the split should lie between software engineers and testers?I think the split is less on areas of the code you write and more what you’re designing and creating. The developers put a structure on the code, while my major role is to say which tests we should have, whether we should test that, or it’s not worth testing that because it’s a tiny function in code that nobody’s ever actually going to see. So it’s not a split in the code, it’s a split in what you’re thinking about. Saying what code we should write, but alternatively what code we should take out.In your experience, do the software engineers tend to do much testing themselves?They tend to control the lowest layer of tests. And, depending on how the balance of people is in the team, they might write some of the higher levels of test. Or that might go to the testers. I’m the only tester on my team with three other developers, so they’ll be writing quite a lot of the actual test code, with input from me as to whether we should test that functionality, whereas on other teams, where it’s been more equal numbers, the testers have written pretty much all of the high level tests, just because that’s the best use of resource.If you could shuffle resources around however you liked, do you think that the developers should be writing those high-level tests?I think they should be writing them occasionally. It helps when they have an understanding of how testing code works and possibly what assumptions we’ve made in tests, and they can say “actually, it doesn’t work like that under the hood so you’ve missed this whole area”. It’s one of those agile things that everyone on the team should be at least comfortable doing the various jobs. So if the developers can write test code then I think that’s a very good thing.So you think testers should be able to write production code?Yes, although given most testers skills at coding, I wouldn’t advise it too much! I have written a few things, and I did make a few changes that have actually gone into our production code base. They’re not necessarily running every time but they are there. I think having that mix of skill sets is really useful. In some ways we’re using our own product to test itself, so being able to make those changes where it’s not working saves me a round-trip through the developers. It can be really annoying if the developers have no time to make a change, and I can’t touch the code.If the software engineers are consistently writing tests at all levels, what role do you think the role of a tester is?I think on a team like that, those distinctions aren’t quite so useful. There’ll be two cases. There’s either the case where the developers think they’ve written good tests, but you still need someone with a test engineer mind-set to go through the tests and validate that it’s a useful set, or the correct set for that code. Or they won’t actually be pure developers, they’ll have that mix of test ability in there.I think having slightly more distinct roles is useful. When it starts to blur, then you lose that view of the tests as a whole. The tester job is not to create tests, it’s to validate the quality of the product, and you don’t do that just by writing tests. There’s more things you’ve got to keep in your mind. And I think when you blur the roles, you start to lose that end of the tester.So because you’re working on those features, you lose that holistic view of the whole system?Yeah, and anyone who’s worked on the feature shouldn’t be testing it. You always need to have it tested it by someone who didn’t write it. Otherwise you’re a bit too close and you assume “yes, people will only use it that way”, but the tester will come along and go “how do people use this? How would our most idiotic user use this?”. I might not test that because it might be completely irrelevant. But it’s coming in and trying to have a different set of assumptions.Are you a believer that it should all be automated if possible?Not entirely. So an automated test is always better than a manual test for the long-term, but there’s still nothing that beats a human sitting in front of the application and thinking “What could I do at this point?”. The automated test is very good but they follow that strict path, and they never check anything off the path. The human tester will look at things that they weren’t expecting, whereas the automated test can only ever go “Is that value correct?” in many respects, and it won’t notice that on the other side of the screen you’re showing something completely wrong. And that value might have been checked independently, but you always find a few odd interactions when you’re going through something manually, and you always need to go through something manually to start with anyway, otherwise you won’t know where the important bits to write your automation are.When you’re doing that manual testing, do you think it’s important to do that across the entire product, or just the bits that you’ve touched recently?I think it’s important to do it mostly on the bits you’ve touched, but you can’t ignore the rest of the product. Unless you’re dealing with a very, very self-contained bit, you’re almost always encounter other bits of the product along the way. Most testers I know, even if they are looking at just one path, they’ll keep open and move around a bit anyway, just because they want to find something that’s broken. If we find that your path is right, we’ll go out and hunt something else.How do you think this fits into the idea of continuously deploying, so long as the tests pass?With deploying a website it’s a bit different because you can always pull it back. If you’re deploying an application to customers, when you’ve released it, it’s out there, you can’t pull it back. Someone’s going to keep it, no matter how hard you try there will be a few installations that stay around. So I’d always have at least a human element on that path. With websites, you could probably automate straight out, or at least straight out to an internal environment or a single server in a cloud of fifty that will serve some people. But I don’t think you should release to everyone just on automated tests passing.You’ve already mentioned using BASIC and C# — are there any other languages that you’ve used?I’ve used a few. That’s something that has changed more recently, I’ve become familiar with more languages. Before I started at Red Gate I learnt a bit of C. Then last year, I taught myself Python which I actually really enjoyed using. I’ve also come across another language called Vala, which is sort of a C#-like language. It’s basically a pre-processor for C, but it has very nice syntax. I think that’s currently my favourite language.Any particular reason for trying Vala?I have a completely Linux environment at home, and I’ve been looking for a nice language, and C# just doesn’t cut it because I won’t touch Mono. So, I was looking for something like C# but that was useable in an open source environment, and Vala’s what I found. C#’s got a few features that Vala doesn’t, and Vala’s got a few features where I think “It would be awesome if C# had that”.What are some of the features that it’s missing?Extension methods. And I think that’s the only one that really bugs me. I like to use them when I’m writing C# because it makes some things really easy, especially with libraries that you can’t touch the internals of. It doesn’t have method overloading, which is sometimes annoying.Where it does win over C#?Everything is non-nullable by default, you never have to check that something’s unexpectedly null.Also, Vala has code contracts. This is starting to come in C# 4, but the way it works in Vala is that you specify requirements in short phrases as part of your function signature and they stick to the signature, so that when you inherit it, it has exactly the same code contract as the base one, or when you inherit from an interface, you have to match the signature exactly. Just using those makes you think a bit more about how you’re writing your method, it’s not an afterthought when you’ve got contracts from base classes given to you, you can’t change it. Which I think is a lot nicer than the way C# handles it. When are those actually checked?They’re checked both at compile and run-time. The compile-time checking isn’t very strong yet, it’s quite a new feature in the compiler, and because it compiles down to C, you can write C code and interface with your methods, so you can bypass that compile-time check anyway. So there’s an extra runtime check, and if you violate one of the contracts at runtime, it’s game over for your program, there’s no exception to catch, it’s just goodbye!One thing I dislike about C# is the exceptions. You write a bit of code and fifty exceptions could come from any point in your ten lines, and you can’t mentally model how those exceptions are going to come out, and you can’t even predict them based on the functions you’re calling, because if you’ve accidentally got a derived class there instead of a base class, that can throw a completely different set of exceptions. So I’ve got no way of mentally modelling those, whereas in Vala they’re checked like Java, so you know only these exceptions can come out. You know in advance the error conditions.I think Raymond Chen on Old New Thing says “the only thing you know when you throw an exception is that you’re in an invalid state somewhere in your program, so just kill it and be done with it!”You said you’ve also learnt bits of Python. How did you find that compared to Vala and C#?Very different because of the dynamic typing. I’ve been writing a website for my own use. I’m quite into photography, so I take photos off my camera, post-process them, dump them in a file, and I get a webpage with all my thumbnails. So sort of like Picassa, but written by myself because I wanted something to learn Python with. There are some things that are really nice, I just found it really difficult to cope with the fact that I’m not quite sure what this object type that I’m passed is, I might not ever be sure, so it can randomly blow up on me. But once I train myself to ignore that and just say “well, I’m fairly sure it’s going to be something that looks like this, so I’ll use it like this”, then it’s quite nice.Any particular features that you’ve appreciated?I don’t like any particular feature, it’s just very straightforward to work with. It’s very quick to write something in, particularly as you don’t have to worry that you’ve changed something that affects a different part of the program. If you have, then that part blows up, but I can get this part working right now.If you were doing a big project, would you be willing to do it in Python rather than C# or Vala?I think I might be willing to try something bigger or long term with Python. We’re currently doing an ASP.NET MVC project on C#, and I don’t like the amount of reflection. There’s a lot of magic that pulls values out, and it’s all done under the scenes. It’s almost managed to put a dynamic type system on top of C#, which in many ways destroys the language to me, whereas if you’re already in a dynamic language, having things done dynamically is much more natural. In many ways, you get the worst of both worlds. I think for web projects, I would go with Python again, whereas for anything desktop, command-line or GUI-based, I’d probably go for C# or Vala, depending on what environment I’m in.It’s the fact that you can gain from the strong typing in ways that you can’t so much on the web app. Or, in a web app, you have to use dynamic typing at some point, or you have to write a hell of a lot of boilerplate, and I’d rather use the dynamic typing than write the boilerplate.What do you think separates great programmers from everyone else?Probably design choices. Choosing to write it a piece of code one way or another. For any given program you ask me to write, I could probably do it five thousand ways. A programmer who is capable will see four or five of them, and choose one of the better ones. The excellent programmer will see the largest proportion and manage to pick the best one very quickly without having to think too much about it. I think that’s probably what separates, is the speed at which they can see what’s the best path to write the program in. More Red Gater Coder interviews

Read the article
Advanced TSQL Tuning: Why Internals Knowledge Matters

- by Paul White

There is much more to query tuning than reducing logical reads and adding covering nonclustered indexes. Query tuning is not complete as soon as the query returns results quickly in the development or test environments. In production, your query will compete for memory, CPU, locks, I/O and other resources on the server. Today’s entry looks at some tuning considerations that are often overlooked, and shows how deep internals knowledge can help you write better TSQL. As always, we’ll need some example data. In fact, we are going to use three tables today, each of which is structured like this: Each table has 50,000 rows made up of an INTEGER id column and a padding column containing 3,999 characters in every row. The only difference between the three tables is in the type of the padding column: the first table uses CHAR(3999), the second uses VARCHAR(MAX), and the third uses the deprecated TEXT type. A script to create a database with the three tables and load the sample data follows: USE master; GO IF DB_ID('SortTest') IS NOT NULL DROP DATABASE SortTest; GO CREATE DATABASE SortTest COLLATE LATIN1_GENERAL_BIN; GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest', SIZE = 3GB, MAXSIZE = 3GB ); GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest_log', SIZE = 256MB, MAXSIZE = 1GB, FILEGROWTH = 128MB ); GO ALTER DATABASE SortTest SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE SortTest SET AUTO_CLOSE OFF ; ALTER DATABASE SortTest SET AUTO_CREATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_SHRINK OFF ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS_ASYNC ON ; ALTER DATABASE SortTest SET PARAMETERIZATION SIMPLE ; ALTER DATABASE SortTest SET READ_COMMITTED_SNAPSHOT OFF ; ALTER DATABASE SortTest SET MULTI_USER ; ALTER DATABASE SortTest SET RECOVERY SIMPLE ; USE SortTest; GO CREATE TABLE dbo.TestCHAR ( id INTEGER IDENTITY (1,1) NOT NULL, padding CHAR(3999) NOT NULL, CONSTRAINT [PK dbo.TestCHAR (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestMAX ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL, CONSTRAINT [PK dbo.TestMAX (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestTEXT ( id INTEGER IDENTITY (1,1) NOT NULL, padding TEXT NOT NULL, CONSTRAINT [PK dbo.TestTEXT (id)] PRIMARY KEY CLUSTERED (id), ) ; -- ============= -- Load TestCHAR (about 3s) -- ============= INSERT INTO dbo.TestCHAR WITH (TABLOCKX) ( padding ) SELECT padding = REPLICATE(CHAR(65 + (Data.n % 26)), 3999) FROM ( SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1 FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) AS Data ORDER BY Data.n ASC ; -- ============ -- Load TestMAX (about 3s) -- ============ INSERT INTO dbo.TestMAX WITH (TABLOCKX) ( padding ) SELECT CONVERT(VARCHAR(MAX), padding) FROM dbo.TestCHAR ORDER BY id ; -- ============= -- Load TestTEXT (about 5s) -- ============= INSERT INTO dbo.TestTEXT WITH (TABLOCKX) ( padding ) SELECT CONVERT(TEXT, padding) FROM dbo.TestCHAR ORDER BY id ; -- ========== -- Space used -- ========== -- EXECUTE sys.sp_spaceused @objname = 'dbo.TestCHAR'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAX'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestTEXT'; ; CHECKPOINT ; That takes around 15 seconds to run, and shows the space allocated to each table in its output: To illustrate the points I want to make today, the example task we are going to set ourselves is to return a random set of 150 rows from each table. The basic shape of the test query is the same for each of the three test tables: SELECT TOP (150) T.id, T.padding FROM dbo.Test AS T ORDER BY NEWID() OPTION (MAXDOP 1) ; Test 1 – CHAR(3999) Running the template query shown above using the TestCHAR table as the target, we find that the query takes around 5 seconds to return its results. This seems slow, considering that the table only has 50,000 rows. Working on the assumption that generating a GUID for each row is a CPU-intensive operation, we might try enabling parallelism to see if that speeds up the response time. Running the query again (but without the MAXDOP 1 hint) on a machine with eight logical processors, the query now takes 10 seconds to execute – twice as long as when run serially. Rather than attempting further guesses at the cause of the slowness, let’s go back to serial execution and add some monitoring. The script below monitors STATISTICS IO output and the amount of tempdb used by the test query. We will also run a Profiler trace to capture any warnings generated during query execution. DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TC.id, TC.padding FROM dbo.TestCHAR AS TC ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; Let’s take a closer look at the statistics and query plan generated from this: Following the flow of the data from right to left, we see the expected 50,000 rows emerging from the Clustered Index Scan, with a total estimated size of around 191MB. The Compute Scalar adds a column containing a random GUID (generated from the NEWID() function call) for each row. With this extra column in place, the size of the data arriving at the Sort operator is estimated to be 192MB. Sort is a blocking operator – it has to examine all of the rows on its input before it can produce its first row of output (the last row received might sort first). This characteristic means that Sort requires a memory grant – memory allocated for the query’s use by SQL Server just before execution starts. In this case, the Sort is the only memory-consuming operator in the plan, so it has access to the full 243MB (248,696KB) of memory reserved by SQL Server for this query execution. Notice that the memory grant is significantly larger than the expected size of the data to be sorted. SQL Server uses a number of techniques to speed up sorting, some of which sacrifice size for comparison speed. Sorts typically require a very large number of comparisons, so this is usually a very effective optimization. One of the drawbacks is that it is not possible to exactly predict the sort space needed, as it depends on the data itself. SQL Server takes an educated guess based on data types, sizes, and the number of rows expected, but the algorithm is not perfect. In spite of the large memory grant, the Profiler trace shows a Sort Warning event (indicating that the sort ran out of memory), and the tempdb usage monitor shows that 195MB of tempdb space was used – all of that for system use. The 195MB represents physical write activity on tempdb, because SQL Server strictly enforces memory grants – a query cannot ‘cheat’ and effectively gain extra memory by spilling to tempdb pages that reside in memory. Anyway, the key point here is that it takes a while to write 195MB to disk, and this is the main reason that the query takes 5 seconds overall. If you are wondering why using parallelism made the problem worse, consider that eight threads of execution result in eight concurrent partial sorts, each receiving one eighth of the memory grant. The eight sorts all spilled to tempdb, resulting in inefficiencies as the spilled sorts competed for disk resources. More importantly, there are specific problems at the point where the eight partial results are combined, but I’ll cover that in a future post. CHAR(3999) Performance Summary: 5 seconds elapsed time 243MB memory grant 195MB tempdb usage 192MB estimated sort set 25,043 logical reads Sort Warning Test 2 – VARCHAR(MAX) We’ll now run exactly the same test (with the additional monitoring) on the table using a VARCHAR(MAX) padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TM.id, TM.padding FROM dbo.TestMAX AS TM ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query takes around 8 seconds to complete (3 seconds longer than Test 1). Notice that the estimated row and data sizes are very slightly larger, and the overall memory grant has also increased very slightly to 245MB. The most marked difference is in the amount of tempdb space used – this query wrote almost 391MB of sort run data to the physical tempdb file. Don’t draw any general conclusions about VARCHAR(MAX) versus CHAR from this – I chose the length of the data specifically to expose this edge case. In most cases, VARCHAR(MAX) performs very similarly to CHAR – I just wanted to make test 2 a bit more exciting. MAX Performance Summary: 8 seconds elapsed time 245MB memory grant 391MB tempdb usage 193MB estimated sort set 25,043 logical reads Sort warning Test 3 – TEXT The same test again, but using the deprecated TEXT data type for the padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TT.id, TT.padding FROM dbo.TestTEXT AS TT ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query runs in 500ms. If you look at the metrics we have been checking so far, it’s not hard to understand why: TEXT Performance Summary: 0.5 seconds elapsed time 9MB memory grant 5MB tempdb usage 5MB estimated sort set 207 logical reads 596 LOB logical reads Sort warning SQL Server’s memory grant algorithm still underestimates the memory needed to perform the sorting operation, but the size of the data to sort is so much smaller (5MB versus 193MB previously) that the spilled sort doesn’t matter very much. Why is the data size so much smaller? The query still produces the correct results – including the large amount of data held in the padding column – so what magic is being performed here? TEXT versus MAX Storage The answer lies in how columns of the TEXT data type are stored. By default, TEXT data is stored off-row in separate LOB pages – which explains why this is the first query we have seen that records LOB logical reads in its STATISTICS IO output. You may recall from my last post that LOB data leaves an in-row pointer to the separate storage structure holding the LOB data. SQL Server can see that the full LOB value is not required by the query plan until results are returned, so instead of passing the full LOB value down the plan from the Clustered Index Scan, it passes the small in-row structure instead. SQL Server estimates that each row coming from the scan will be 79 bytes long – 11 bytes for row overhead, 4 bytes for the integer id column, and 64 bytes for the LOB pointer (in fact the pointer is rather smaller – usually 16 bytes – but the details of that don’t really matter right now). OK, so this query is much more efficient because it is sorting a very much smaller data set – SQL Server delays retrieving the LOB data itself until after the Sort starts producing its 150 rows. The question that normally arises at this point is: Why doesn’t SQL Server use the same trick when the padding column is defined as VARCHAR(MAX)? The answer is connected with the fact that if the actual size of the VARCHAR(MAX) data is 8000 bytes or less, it is usually stored in-row in exactly the same way as for a VARCHAR(8000) column – MAX data only moves off-row into LOB storage when it exceeds 8000 bytes. The default behaviour of the TEXT type is to be stored off-row by default, unless the ‘text in row’ table option is set suitably and there is room on the page. There is an analogous (but opposite) setting to control the storage of MAX data – the ‘large value types out of row’ table option. By enabling this option for a table, MAX data will be stored off-row (in a LOB structure) instead of in-row. SQL Server Books Online has good coverage of both options in the topic In Row Data. The MAXOOR Table The essential difference, then, is that MAX defaults to in-row storage, and TEXT defaults to off-row (LOB) storage. You might be thinking that we could get the same benefits seen for the TEXT data type by storing the VARCHAR(MAX) values off row – so let’s look at that option now. This script creates a fourth table, with the VARCHAR(MAX) data stored off-row in LOB pages: CREATE TABLE dbo.TestMAXOOR ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL, CONSTRAINT [PK dbo.TestMAXOOR (id)] PRIMARY KEY CLUSTERED (id), ) ; EXECUTE sys.sp_tableoption @TableNamePattern = N'dbo.TestMAXOOR', @OptionName = 'large value types out of row', @OptionValue = 'true' ; SELECT large_value_types_out_of_row FROM sys.tables WHERE [schema_id] = SCHEMA_ID(N'dbo') AND name = N'TestMAXOOR' ; INSERT INTO dbo.TestMAXOOR WITH (TABLOCKX) ( padding ) SELECT SPACE(0) FROM dbo.TestCHAR ORDER BY id ; UPDATE TM WITH (TABLOCK) SET padding.WRITE (TC.padding, NULL, NULL) FROM dbo.TestMAXOOR AS TM JOIN dbo.TestCHAR AS TC ON TC.id = TM.id ; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAXOOR' ; CHECKPOINT ; Test 4 – MAXOOR We can now re-run our test on the MAXOOR (MAX out of row) table: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) MO.id, MO.padding FROM dbo.TestMAXOOR AS MO ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; TEXT Performance Summary: 0.3 seconds elapsed time 245MB memory grant 0MB tempdb usage 193MB estimated sort set 207 logical reads 446 LOB logical reads No sort warning The query runs very quickly – slightly faster than Test 3, and without spilling the sort to tempdb (there is no sort warning in the trace, and the monitoring query shows zero tempdb usage by this query). SQL Server is passing the in-row pointer structure down the plan and only looking up the LOB value on the output side of the sort. The Hidden Problem There is still a huge problem with this query though – it requires a 245MB memory grant. No wonder the sort doesn’t spill to tempdb now – 245MB is about 20 times more memory than this query actually requires to sort 50,000 records containing LOB data pointers. Notice that the estimated row and data sizes in the plan are the same as in test 2 (where the MAX data was stored in-row). The optimizer assumes that MAX data is stored in-row, regardless of the sp_tableoption setting ‘large value types out of row’. Why? Because this option is dynamic – changing it does not immediately force all MAX data in the table in-row or off-row, only when data is added or actually changed. SQL Server does not keep statistics to show how much MAX or TEXT data is currently in-row, and how much is stored in LOB pages. This is an annoying limitation, and one which I hope will be addressed in a future version of the product. So why should we worry about this? Excessive memory grants reduce concurrency and may result in queries waiting on the RESOURCE_SEMAPHORE wait type while they wait for memory they do not need. 245MB is an awful lot of memory, especially on 32-bit versions where memory grants cannot use AWE-mapped memory. Even on a 64-bit server with plenty of memory, do you really want a single query to consume 0.25GB of memory unnecessarily? That’s 32,000 8KB pages that might be put to much better use. The Solution The answer is not to use the TEXT data type for the padding column. That solution happens to have better performance characteristics for this specific query, but it still results in a spilled sort, and it is hard to recommend the use of a data type which is scheduled for removal. I hope it is clear to you that the fundamental problem here is that SQL Server sorts the whole set arriving at a Sort operator. Clearly, it is not efficient to sort the whole table in memory just to return 150 rows in a random order. The TEXT example was more efficient because it dramatically reduced the size of the set that needed to be sorted. We can do the same thing by selecting 150 unique keys from the table at random (sorting by NEWID() for example) and only then retrieving the large padding column values for just the 150 rows we need. The following script implements that idea for all four tables: SET STATISTICS IO ON ; WITH TestTable AS ( SELECT * FROM dbo.TestCHAR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id = ANY (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAX ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestTEXT ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAXOOR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; All four queries now return results in much less than a second, with memory grants between 6 and 12MB, and without spilling to tempdb. The small remaining inefficiency is in reading the id column values from the clustered primary key index. As a clustered index, it contains all the in-row data at its leaf. The CHAR and VARCHAR(MAX) tables store the padding column in-row, so id values are separated by a 3999-character column, plus row overhead. The TEXT and MAXOOR tables store the padding values off-row, so id values in the clustered index leaf are separated by the much-smaller off-row pointer structure. This difference is reflected in the number of logical page reads performed by the four queries: Table 'TestCHAR' logical reads 25511 lob logical reads 000 Table 'TestMAX'. logical reads 25511 lob logical reads 000 Table 'TestTEXT' logical reads 00412 lob logical reads 597 Table 'TestMAXOOR' logical reads 00413 lob logical reads 446 We can increase the density of the id values by creating a separate nonclustered index on the id column only. This is the same key as the clustered index, of course, but the nonclustered index will not include the rest of the in-row column data. CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestCHAR (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAX (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestTEXT (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAXOOR (id); The four queries can now use the very dense nonclustered index to quickly scan the id values, sort them by NEWID(), select the 150 ids we want, and then look up the padding data. The logical reads with the new indexes in place are: Table 'TestCHAR' logical reads 835 lob logical reads 0 Table 'TestMAX' logical reads 835 lob logical reads 0 Table 'TestTEXT' logical reads 686 lob logical reads 597 Table 'TestMAXOOR' logical reads 686 lob logical reads 448 With the new index, all four queries use the same query plan (click to enlarge): Performance Summary: 0.3 seconds elapsed time 6MB memory grant 0MB tempdb usage 1MB sort set 835 logical reads (CHAR, MAX) 686 logical reads (TEXT, MAXOOR) 597 LOB logical reads (TEXT) 448 LOB logical reads (MAXOOR) No sort warning I’ll leave it as an exercise for the reader to work out why trying to eliminate the Key Lookup by adding the padding column to the new nonclustered indexes would be a daft idea Conclusion This post is not about tuning queries that access columns containing big strings. It isn’t about the internal differences between TEXT and MAX data types either. It isn’t even about the cool use of UPDATE .WRITE used in the MAXOOR table load. No, this post is about something else: Many developers might not have tuned our starting example query at all – 5 seconds isn’t that bad, and the original query plan looks reasonable at first glance. Perhaps the NEWID() function would have been blamed for ‘just being slow’ – who knows. 5 seconds isn’t awful – unless your users expect sub-second responses – but using 250MB of memory and writing 200MB to tempdb certainly is! If ten sessions ran that query at the same time in production that’s 2.5GB of memory usage and 2GB hitting tempdb. Of course, not all queries can be rewritten to avoid large memory grants and sort spills using the key-lookup technique in this post, but that’s not the point either. The point of this post is that a basic understanding of execution plans is not enough. Tuning for logical reads and adding covering indexes is not enough. If you want to produce high-quality, scalable TSQL that won’t get you paged as soon as it hits production, you need a deep understanding of execution plans, and as much accurate, deep knowledge about SQL Server as you can lay your hands on. The advanced database developer has a wide range of tools to use in writing queries that perform well in a range of circumstances. By the way, the examples in this post were written for SQL Server 2008. They will run on 2005 and demonstrate the same principles, but you won’t get the same figures I did because 2005 had a rather nasty bug in the Top N Sort operator. Fair warning: if you do decide to run the scripts on a 2005 instance (particularly the parallel query) do it before you head out for lunch… This post is dedicated to the people of Christchurch, New Zealand. © 2011 Paul White email: @[email protected] twitter: @SQL_Kiwi

Read the article
The Incremental Architect´s Napkin – #3 – Make Evolvability inevitable

- by Ralf Westphal

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/06/04/the-incremental-architectacutes-napkin-ndash-3-ndash-make-evolvability-inevitable.aspxThe easier something to measure the more likely it will be produced. Deviations between what is and what should be can be readily detected. That´s what automated acceptance tests are for. That´s what sprint reviews in Scrum are for. It´s no small wonder our software looks like it looks. It has all the traits whose conformance with requirements can easily be measured. And it´s lacking traits which cannot easily be measured. Evolvability (or Changeability) is such a trait. If an operation is correct, if an operation if fast enough, that can be checked very easily. But whether Evolvability is high or low, that cannot be checked by taking a measure or two. Evolvability might correlate with certain traits, e.g. number of lines of code (LOC) per function or Cyclomatic Complexity or test coverage. But there is no threshold value signalling “evolvability too low”; also Evolvability is hardly tangible for the customer. Nevertheless Evolvability is of great importance - at least in the long run. You can get away without much of it for a short time. Eventually, though, it´s needed like any other requirement. Or even more. Because without Evolvability no other requirement can be implemented. Evolvability is the foundation on which all else is build. Such fundamental importance is in stark contrast with its immeasurability. To compensate this, Evolvability must be put at the very center of software development. It must become the hub around everything else revolves. Since we cannot measure Evolvability, though, we cannot start watching it more. Instead we need to establish practices to keep it high (enough) at all times. Chefs have known that for long. That´s why everybody in a restaurant kitchen is constantly seeing after cleanliness. Hygiene is important as is to have clean tools at standardized locations. Only then the health of the patrons can be guaranteed and production efficiency is constantly high. Still a kitchen´s level of cleanliness is easier to measure than software Evolvability. That´s why important practices like reviews, pair programming, or TDD are not enough, I guess. What we need to keep Evolvability in focus and high is… to continually evolve. Change must not be something to avoid but too embrace. To me that means the whole change cycle from requirement analysis to delivery needs to be gone through more often. Scrum´s sprints of 4, 2 even 1 week are too long. Kanban´s flow of user stories across is too unreliable; it takes as long as it takes. Instead we should fix the cycle time at 2 days max. I call that Spinning. No increment must take longer than from this morning until tomorrow evening to finish. Then it should be acceptance checked by the customer (or his/her representative, e.g. a Product Owner). For me there are several resasons for such a fixed and short cycle time for each increment: Clear expectations Absolute estimates (“This will take X days to complete.”) are near impossible in software development as explained previously. Too much unplanned research and engineering work lurk in every feature. And then pervasive interruptions of work by peers and management. However, the smaller the scope the better our absolute estimates become. That´s because we understand better what really are the requirements and what the solution should look like. But maybe more importantly the shorter the timespan the more we can control how we use our time. So much can happen over the course of a week and longer timespans. But if push comes to shove I can block out all distractions and interruptions for a day or possibly two. That´s why I believe we can give rough absolute estimates on 3 levels: Noon Tonight Tomorrow Think of a meeting with a Product Owner at 8:30 in the morning. If she asks you, how long it will take you to implement a user story or bug fix, you can say, “It´ll be fixed by noon.”, or you can say, “I can manage to implement it until tonight before I leave.”, or you can say, “You´ll get it by tomorrow night at latest.” Yes, I believe all else would be naive. If you´re not confident to get something done by tomorrow night (some 34h from now) you just cannot reliably commit to any timeframe. That means you should not promise anything, you should not even start working on the issue. So when estimating use these four categories: Noon, Tonight, Tomorrow, NoClue - with NoClue meaning the requirement needs to be broken down further so each aspect can be assigned to one of the first three categories. If you like absolute estimates, here you go. But don´t do deep estimates. Don´t estimate dozens of issues; don´t think ahead (“Issue A is a Tonight, then B will be a Tomorrow, after that it´s C as a Noon, finally D is a Tonight - that´s what I´ll do this week.”). Just estimate so Work-in-Progress (WIP) is 1 for everybody - plus a small number of buffer issues. To be blunt: Yes, this makes promises impossible as to what a team will deliver in terms of scope at a certain date in the future. But it will give a Product Owner a clear picture of what to pull for acceptance feedback tonight and tomorrow. Trust through reliability Our trade is lacking trust. Customers don´t trust software companies/departments much. Managers don´t trust developers much. I find that perfectly understandable in the light of what we´re trying to accomplish: delivering software in the face of uncertainty by means of material good production. Customers as well as managers still expect software development to be close to production of houses or cars. But that´s a fundamental misunderstanding. Software development ist development. It´s basically research. As software developers we´re constantly executing experiments to find out what really provides value to users. We don´t know what they need, we just have mediated hypothesises. That´s why we cannot reliably deliver on preposterous demands. So trust is out of the window in no time. If we switch to delivering in short cycles, though, we can regain trust. Because estimates - explicit or implicit - up to 32 hours at most can be satisfied. I´d say: reliability over scope. It´s more important to reliably deliver what was promised then to cover a lot of requirement area. So when in doubt promise less - but deliver without delay. Deliver on scope (Functionality and Quality); but also deliver on Evolvability, i.e. on inner quality according to accepted principles. Always. Trust will be the reward. Less complexity of communication will follow. More goodwill buffer will follow. So don´t wait for some Kanban board to show you, that flow can be improved by scheduling smaller stories. You don´t need to learn that the hard way. Just start with small batch sizes of three different sizes. Fast feedback What has been finished can be checked for acceptance. Why wait for a sprint of several weeks to end? Why let the mental model of the issue and its solution dissipate? If you get final feedback after one or two weeks, you hardly remember what you did and why you did it. Resoning becomes hard. But more importantly youo probably are not in the mood anymore to go back to something you deemed done a long time ago. It´s boring, it´s frustrating to open up that mental box again. Learning is harder the longer it takes from event to feedback. Effort can be wasted between event (finishing an issue) and feedback, because other work might go in the wrong direction based on false premises. Checking finished issues for acceptance is the most important task of a Product Owner. It´s even more important than planning new issues. Because as long as work started is not released (accepted) it´s potential waste. So before starting new work better make sure work already done has value. By putting the emphasis on acceptance rather than planning true pull is established. As long as planning and starting work is more important, it´s a push process. Accept a Noon issue on the same day before leaving. Accept a Tonight issue before leaving today or first thing tomorrow morning. Accept a Tomorrow issue tomorrow night before leaving or early the day after tomorrow. After acceptance the developer(s) can start working on the next issue. Flexibility As if reliability/trust and fast feedback for less waste weren´t enough economic incentive, there is flexibility. After each issue the Product Owner can change course. If on Monday morning feature slices A, B, C, D, E were important and A, B, C were scheduled for acceptance by Monday evening and Tuesday evening, the Product Owner can change her mind at any time. Maybe after A got accepted she asks for continuation with D. But maybe, just maybe, she has gotten a completely different idea by then. Maybe she wants work to continue on F. And after B it´s neither D nor E, but G. And after G it´s D. With Spinning every 32 hours at latest priorities can be changed. And nothing is lost. Because what got accepted is of value. It provides an incremental value to the customer/user. Or it provides internal value to the Product Owner as increased knowledge/decreased uncertainty. I find such reactivity over commitment economically very benefical. Why commit a team to some workload for several weeks? It´s unnecessary at beast, and inflexible and wasteful at worst. If we cannot promise delivery of a certain scope on a certain date - which is what customers/management usually want -, we can at least provide them with unpredecented flexibility in the face of high uncertainty. Where the path is not clear, cannot be clear, make small steps so you´re able to change your course at any time. Premature completion Customers/management are used to premeditating budgets. They want to know exactly how much to pay for a certain amount of requirements. That´s understandable. But it does not match with the nature of software development. We should know that by now. Maybe there´s somewhere in the world some team who can consistently deliver on scope, quality, and time, and budget. Great! Congratulations! I, however, haven´t seen such a team yet. Which does not mean it´s impossible, but I think it´s nothing I can recommend to strive for. Rather I´d say: Don´t try this at home. It might hurt you one way or the other. However, what we can do, is allow customers/management stop work on features at any moment. With spinning every 32 hours a feature can be declared as finished - even though it might not be completed according to initial definition. I think, progress over completion is an important offer software development can make. Why think in terms of completion beyond a promise for the next 32 hours? Isn´t it more important to constantly move forward? Step by step. We´re not running sprints, we´re not running marathons, not even ultra-marathons. We´re in the sport of running forever. That makes it futile to stare at the finishing line. The very concept of a burn-down chart is misleading (in most cases). Whoever can only think in terms of completed requirements shuts out the chance for saving money. The requirements for a features mostly are uncertain. So how does a Product Owner know in the first place, how much is needed. Maybe more than specified is needed - which gets uncovered step by step with each finished increment. Maybe less than specified is needed. After each 4–32 hour increment the Product Owner can do an experient (or invite users to an experiment) if a particular trait of the software system is already good enough. And if so, she can switch the attention to a different aspect. In the end, requirements A, B, C then could be finished just 70%, 80%, and 50%. What the heck? It´s good enough - for now. 33% money saved. Wouldn´t that be splendid? Isn´t that a stunning argument for any budget-sensitive customer? You can save money and still get what you need? Pull on practices So far, in addition to more trust, more flexibility, less money spent, Spinning led to “doing less” which also means less code which of course means higher Evolvability per se. Last but not least, though, I think Spinning´s short acceptance cycles have one more effect. They excert pull-power on all sorts of practices known for increasing Evolvability. If, for example, you believe high automated test coverage helps Evolvability by lowering the fear of inadverted damage to a code base, why isn´t 90% of the developer community practicing automated tests consistently? I think, the answer is simple: Because they can do without. Somehow they manage to do enough manual checks before their rare releases/acceptance checks to ensure good enough correctness - at least in the short term. The same goes for other practices like component orientation, continuous build/integration, code reviews etc. None of that is compelling, urgent, imperative. Something else always seems more important. So Evolvability principles and practices fall through the cracks most of the time - until a project hits a wall. Then everybody becomes desperate; but by then (re)gaining Evolvability has become as very, very difficult and tedious undertaking. Sometimes up to the point where the existence of a project/company is in danger. With Spinning that´s different. If you´re practicing Spinning you cannot avoid all those practices. With Spinning you very quickly realize you cannot deliver reliably even on your 32 hour promises. Spinning thus is pulling on developers to adopt principles and practices for Evolvability. They will start actively looking for ways to keep their delivery rate high. And if not, management will soon tell them to do that. Because first the Product Owner then management will notice an increasing difficulty to deliver value within 32 hours. There, finally there emerges a way to measure Evolvability: The more frequent developers tell the Product Owner there is no way to deliver anything worth of feedback until tomorrow night, the poorer Evolvability is. Don´t count the “WTF!”, count the “No way!” utterances. In closing For sustainable software development we need to put Evolvability first. Functionality and Quality must not rule software development but be implemented within a framework ensuring (enough) Evolvability. Since Evolvability cannot be measured easily, I think we need to put software development “under pressure”. Software needs to be changed more often, in smaller increments. Each increment being relevant to the customer/user in some way. That does not mean each increment is worthy of shipment. It´s sufficient to gain further insight from it. Increments primarily serve the reduction of uncertainty, not sales. Sales even needs to be decoupled from this incremental progress. No more promises to sales. No more delivery au point. Rather sales should look at a stream of accepted increments (or incremental releases) and scoup from that whatever they find valuable. Sales and marketing need to realize they should work on what´s there, not what might be possible in the future. But I digress… In my view a Spinning cycle - which is not easy to reach, which requires practice - is the core practice to compensate the immeasurability of Evolvability. From start to finish of each issue in 32 hours max - that´s the challenge we need to accept if we´re serious increasing Evolvability. Fortunately higher Evolvability is not the only outcome of Spinning. Customer/management will like the increased flexibility and “getting more bang for the buck”.

Read the article
What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

- by Alex Dunlop

Which Java synchronization construct is likely to provide the best performance for a concurrent, iterative processing scenario with a fixed number of threads like the one outlined below? After experimenting on my own for a while (using ExecutorService and CyclicBarrier) and being somewhat surprised by the results, I would be grateful for some expert advice and maybe some new ideas. Existing questions here do not seem to focus primarily on performance, hence this new one. Thanks in advance! The core of the app is a simple iterative data processing algorithm, parallelized to the spread the computational load across 8 cores on a Mac Pro, running OS X 10.6 and Java 1.6.0_07. The data to be processed is split into 8 blocks and each block is fed to a Runnable to be executed by one of a fixed number of threads. Parallelizing the algorithm was fairly straightforward, and it functionally works as desired, but its performance is not yet what I think it could be. The app seems to spend a lot of time in system calls synchronizing, so after some profiling I wonder whether I selected the most appropriate synchronization mechanism(s). A key requirement of the algorithm is that it needs to proceed in stages, so the threads need to sync up at the end of each stage. The main thread prepares the work (very low overhead), passes it to the threads, lets them work on it, then proceeds when all threads are done, rearranges the work (again very low overhead) and repeats the cycle. The machine is dedicated to this task, Garbage Collection is minimized by using per-thread pools of pre-allocated items, and the number of threads can be fixed (no incoming requests or the like, just one thread per CPU core). V1 - ExecutorService My first implementation used an ExecutorService with 8 worker threads. The program creates 8 tasks holding the work and then lets them work on it, roughly like this: // create one thread per CPU executorService = Executors.newFixedThreadPool( 8 ); ... // now process data in cycles while( ...) { // package data into 8 work items ... // create one Callable task per work item ... // submit the Callables to the worker threads executorService.invokeAll( taskList ); } This works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as much as the processing algorithm would be expected to allow (some work items will finish faster than others, then idle). However, as the work items become smaller (and this is not really under the program's control), the user CPU load shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.8% 85% 1.30 64k 2.5% 77% 5.6 16k 4% 64% 22.5 4096 8% 56% 86 1024 13% 38% 227 256 17% 19% 420 64 19% 17% 948 16 19% 13% 1626 Legend: - block size = size of the work item (= computational steps) - system = system load, as shown in OS X Activity Monitor (red bar) - user = user load, as shown in OS X Activity Monitor (green bar) - cycles/sec = iterations through the main while loop, more is better The primary area of concern here is the high percentage of time spent in the system, which appears to be driven by thread synchronization calls. As expected, for smaller work items, ExecutorService.invokeAll() will require relatively more effort to sync up the threads versus the amount of work being performed in each thread. But since ExecutorService is more generic than it would need to be for this use case (it can queue tasks for threads if there are more tasks than cores), I though maybe there would be a leaner synchronization construct. V2 - CyclicBarrier The next implementation used a CyclicBarrier to sync up the threads before receiving work and after completing it, roughly as follows: main() { // create the barrier barrier = new CyclicBarrier( 8 + 1 ); // create Runable for thread, tell it about the barrier Runnable task = new WorkerThreadRunnable( barrier ); // start the threads for( int i = 0; i < 8; i++ ) { // create one thread per core new Thread( task ).start(); } while( ... ) { // tell threads about the work ... // N threads + this will call await(), then system proceeds barrier.await(); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; } public void run() { while( true ) { // wait for work barrier.await(); // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as before. However, as the work items become smaller, the load still shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.7% 78% 6.1 16k 5.5% 52% 25 4096 9% 29% 64 1024 11% 15% 117 256 12% 8% 169 64 12% 6.5% 285 16 12% 6% 377 For large work items, synchronization is negligible and the performance is identical to V1. But unexpectedly, the results of the (highly specialized) CyclicBarrier seem MUCH WORSE than those for the (generic) ExecutorService: throughput (cycles/sec) is only about 1/4th of V1. A preliminary conclusion would be that even though this seems to be the advertised ideal use case for CyclicBarrier, it performs much worse than the generic ExecutorService. V3 - Wait/Notify + CyclicBarrier It seemed worth a try to replace the first cyclic barrier await() with a simple wait/notify mechanism: main() { // create the barrier // create Runable for thread, tell it about the barrier // start the threads while( ... ) { // tell threads about the work // for each: workerThreadRunnable.setWorkItem( ... ); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; @NotNull volatile private Callable<Integer> workItem; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; this.workItem = NO_WORK; } final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { synchronized( this ) { workItem = callable; notify(); } } public void run() { while( true ) { // wait for work while( true ) { synchronized( this ) { if( workItem != NO_WORK ) break; try { wait(); } catch( InterruptedException e ) { e.printStackTrace(); } } } // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.4% 80% 6.3 16k 4.6% 60% 30.1 4096 8.6% 41% 98.5 1024 12% 23% 202 256 14% 11.6% 299 64 14% 10.0% 518 16 14.8% 8.7% 679 The throughput for small work items is still much worse than that of the ExecutorService, but about 2x that of the CyclicBarrier. Eliminating one CyclicBarrier eliminates half of the gap. V4 - Busy wait instead of wait/notify Since this app is the primary one running on the system and the cores idle anyway if they're not busy with a work item, why not try a busy wait for work items in each thread, even if that spins the CPU needlessly. The worker thread code changes as follows: class WorkerThreadRunnable implements Runnable { // as before final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { workItem = callable; } public void run() { while( true ) { // busy-wait for work while( true ) { if( workItem != NO_WORK ) break; } // do the work ... // wait for everyone else to finish barrier.await(); } } } Also works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.2% 81% 6.3 16k 4.2% 62% 33 4096 7.5% 40% 107 1024 10.4% 23% 210 256 12.0% 12.0% 310 64 11.9% 10.2% 550 16 12.2% 8.6% 741 For small work items, this increases throughput by a further 10% over the CyclicBarrier + wait/notify variant, which is not insignificant. But it is still much lower-throughput than V1 with the ExecutorService. V5 - ? So what is the best synchronization mechanism for such a (presumably not uncommon) problem? I am weary of writing my own sync mechanism to completely replace ExecutorService (assuming that it is too generic and there has to be something that can still be taken out to make it more efficient). It is not my area of expertise and I'm concerned that I'd spend a lot of time debugging it (since I'm not even sure my wait/notify and busy wait variants are correct) for uncertain gain. Any advice would be greatly appreciated.

Read the article
Array sorting efficiency... Beginner need advice

- by SoleSoft

I'll start by saying I am very much a beginner programmer, this is essentially my first real project outside of using learning material. I've been making a 'Simon Says' style game (the game where you repeat the pattern generated by the computer) using C# and XNA, the actual game is complete and working fine but while creating it, I wanted to also create a 'top 10' scoreboard. The scoreboard would record player name, level (how many 'rounds' they've completed) and combo (how many buttons presses they got correct), the scoreboard would then be sorted by combo score. This led me to XML, the first time using it, and I eventually got to the point of having an XML file that recorded the top 10 scores. The XML file is managed within a scoreboard class, which is also responsible for adding new scores and sorting scores. Which gets me to the point... I'd like some feedback on the way I've gone about sorting the score list and how I could have done it better, I have no other way to gain feedback =(. I know .NET features Array.Sort() but I wasn't too sure of how to use it as it's not just a single array that needs to be sorted. When a new score needs to be entered into the scoreboard, the player name and level also have to be added. These are stored within an 'array of arrays' (10 = for 'top 10' scores) scoreboardComboData = new int[10]; // Combo scoreboardTextData = new string[2][]; scoreboardTextData[0] = new string[10]; // Name scoreboardTextData[1] = new string[10]; // Level as string The scoreboard class works as follows: - Checks to see if 'scoreboard.xml' exists, if not it creates it - Initialises above arrays and adds any player data from scoreboard.xml, from previous run - when AddScore(name, level, combo) is called the sort begins - Another method can also be called that populates the XML file with above array data The sort checks to see if the new score (combo) is less than or equal to any recorded scores within the scoreboardComboData array (if it's greater than a score, it moves onto the next element). If so, it moves all scores below the score it is less than or equal to down one element, essentially removing the last score and then places the new score within the element below the score it is less than or equal to. If the score is greater than all recorded scores, it moves all scores down one and inserts the new score within the first element. If it's the only score, it simply adds it to the first element. When a new score is added, the Name and Level data is also added to their relevant arrays, in the same way. What a tongue twister. Below is the AddScore method, I've added comments in the hope that it makes things clearer O_o. You can get the actual source file HERE. Below the method is an example of the quickest way to add a score to follow through with a debugger. public static void AddScore(string name, string level, int combo) { // If the scoreboard has not yet been filled, this adds another 'active' // array element each time a new score is added. The actual array size is // defined within PopulateScoreBoard() (set to 10 - for 'top 10' if (totalScores < scoreboardComboData.Length) totalScores++; // Does the scoreboard even need sorting? if (totalScores > 1) { for (int i = totalScores - 1; i > - 1; i--) { // Check to see if score (combo) is greater than score stored in // array if (combo > scoreboardComboData[i] && i != 0) { // If so continue to next element continue; } // Check to see if score (combo) is less or equal to element 'i' // score && that the element is not the last in the // array, if so the score does not need to be added to the scoreboard else if (combo <= scoreboardComboData[i] && i != scoreboardComboData.Length - 1) { // If the score is lower than element 'i' and greater than the last // element within the array, it needs to be added to the scoreboard. This is achieved // by moving each element under element 'i' down an element. The new score is then inserted // into the array under element 'i' for (int j = totalScores - 1; j > i; j--) { // Name and level data are moved down in their relevant arrays scoreboardTextData[0][j] = scoreboardTextData[0][j - 1]; scoreboardTextData[1][j] = scoreboardTextData[1][j - 1]; // Score (combo) data is moved down in relevant array scoreboardComboData[j] = scoreboardComboData[j - 1]; } // The new Name, level and score (combo) data is inserted into the relevant array under element 'i' scoreboardTextData[0][i + 1] = name; scoreboardTextData[1][i + 1] = level; scoreboardComboData[i + 1] = combo; break; } // If the method gets the this point, it means that the score is greater than all scores within // the array and therefore cannot be added in the above way. As it is not less than any score within // the array. else if (i == 0) { // All Names, levels and scores are moved down within their relevant arrays for (int j = totalScores - 1; j != 0; j--) { scoreboardTextData[0][j] = scoreboardTextData[0][j - 1]; scoreboardTextData[1][j] = scoreboardTextData[1][j - 1]; scoreboardComboData[j] = scoreboardComboData[j - 1]; } // The new number 1 top name, level and score, are added into the first element // within each of their relevant arrays. scoreboardTextData[0][0] = name; scoreboardTextData[1][0] = level; scoreboardComboData[0] = combo; break; } // If the methods get to this point, the combo score is not high enough // to be on the top10 score list and therefore needs to break break; } } // As totalScores < 1, the current score is the first to be added. Therefore no checks need to be made // and the Name, Level and combo data can be entered directly into the first element of their relevant // array. else { scoreboardTextData[0][0] = name; scoreboardTextData[1][0] = level; scoreboardComboData[0] = combo; } } } Example for adding score: private static void Initialize() { scoreboardDoc = new XmlDocument(); if (!File.Exists("Scoreboard.xml")) GenerateXML("Scoreboard.xml"); PopulateScoreBoard("Scoreboard.xml"); // ADD TEST SCORES HERE! AddScore("EXAMPLE", "10", 100); AddScore("EXAMPLE2", "24", 999); PopulateXML("Scoreboard.xml"); } In it's current state the source file is just used for testing, initialize is called within main and PopulateScoreBoard handles the majority of other initialising, so nothing else is needed, except to add a test score. I thank you for your time!

Read the article
The Incremental Architect’s Napkin - #5 - Design functions for extensibility and readability

- by Ralf Westphal

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/08/24/the-incremental-architectrsquos-napkin---5---design-functions-for.aspx The functionality of programs is entered via Entry Points. So what we´re talking about when designing software is a bunch of functions handling the requests represented by and flowing in through those Entry Points. Designing software thus consists of at least three phases: Analyzing the requirements to find the Entry Points and their signatures Designing the functionality to be executed when those Entry Points get triggered Implementing the functionality according to the design aka coding I presume, you´re familiar with phase 1 in some way. And I guess you´re proficient in implementing functionality in some programming language. But in my experience developers in general are not experienced in going through an explicit phase 2. “Designing functionality? What´s that supposed to mean?” you might already have thought. Here´s my definition: To design functionality (or functional design for short) means thinking about… well, functions. You find a solution for what´s supposed to happen when an Entry Point gets triggered in terms of functions. A conceptual solution that is, because those functions only exist in your head (or on paper) during this phase. But you may have guess that, because it´s “design” not “coding”. And here is, what functional design is not: It´s not about logic. Logic is expressions (e.g. +, -, && etc.) and control statements (e.g. if, switch, for, while etc.). Also I consider calling external APIs as logic. It´s equally basic. It´s what code needs to do in order to deliver some functionality or quality. Logic is what´s doing that needs to be done by software. Transformations are either done through expressions or API-calls. And then there is alternative control flow depending on the result of some expression. Basically it´s just jumps in Assembler, sometimes to go forward (if, switch), sometimes to go backward (for, while, do). But calling your own function is not logic. It´s not necessary to produce any outcome. Functionality is not enhanced by adding functions (subroutine calls) to your code. Nor is quality increased by adding functions. No performance gain, no higher scalability etc. through functions. Functions are not relevant to functionality. Strange, isn´t it. What they are important for is security of investment. By introducing functions into our code we can become more productive (re-use) and can increase evolvability (higher unterstandability, easier to keep code consistent). That´s no small feat, however. Evolvable code can hardly be overestimated. That´s why to me functional design is so important. It´s at the core of software development. To sum this up: Functional design is on a level of abstraction above (!) logical design or algorithmic design. Functional design is only done until you get to a point where each function is so simple you are very confident you can easily code it. Functional design an logical design (which mostly is coding, but can also be done using pseudo code or flow charts) are complementary. Software needs both. If you start coding right away you end up in a tangled mess very quickly. Then you need back out through refactoring. Functional design on the other hand is bloodless without actual code. It´s just a theory with no experiments to prove it. But how to do functional design? An example of functional design Let´s assume a program to de-duplicate strings. The user enters a number of strings separated by commas, e.g. a, b, a, c, d, b, e, c, a. And the program is supposed to clear this list of all doubles, e.g. a, b, c, d, e. There is only one Entry Point to this program: the user triggers the de-duplication by starting the program with the string list on the command line C:\>deduplicate "a, b, a, c, d, b, e, c, a" a, b, c, d, e …or by clicking on a GUI button. This leads to the Entry Point function to get called. It´s the program´s main function in case of the batch version or a button click event handler in the GUI version. That´s the physical Entry Point so to speak. It´s inevitable. What then happens is a three step process: Transform the input data from the user into a request. Call the request handler. Transform the output of the request handler into a tangible result for the user. Or to phrase it a bit more generally: Accept input. Transform input into output. Present output. This does not mean any of these steps requires a lot of effort. Maybe it´s just one line of code to accomplish it. Nevertheless it´s a distinct step in doing the processing behind an Entry Point. Call it an aspect or a responsibility - and you will realize it most likely deserves a function of its own to satisfy the Single Responsibility Principle (SRP). Interestingly the above list of steps is already functional design. There is no logic, but nevertheless the solution is described - albeit on a higher level of abstraction than you might have done yourself. But it´s still on a meta-level. The application to the domain at hand is easy, though: Accept string list from command line De-duplicate Present de-duplicated strings on standard output And this concrete list of processing steps can easily be transformed into code:static void Main(string[] args) { var input = Accept_string_list(args); var output = Deduplicate(input); Present_deduplicated_string_list(output); } Instead of a big problem there are three much smaller problems now. If you think each of those is trivial to implement, then go for it. You can stop the functional design at this point. But maybe, just maybe, you´re not so sure how to go about with the de-duplication for example. Then just implement what´s easy right now, e.g.private static string Accept_string_list(string[] args) { return args[0]; } private static void Present_deduplicated_string_list( string[] output) { var line = string.Join(", ", output); Console.WriteLine(line); } Accept_string_list() contains logic in the form of an API-call. Present_deduplicated_string_list() contains logic in the form of an expression and an API-call. And then repeat the functional design for the remaining processing step. What´s left is the domain logic: de-duplicating a list of strings. How should that be done? Without any logic at our disposal during functional design you´re left with just functions. So which functions could make up the de-duplication? Here´s a suggestion: De-duplicate Parse the input string into a true list of strings. Register each string in a dictionary/map/set. That way duplicates get cast away. Transform the data structure into a list of unique strings. Processing step 2 obviously was the core of the solution. That´s where real creativity was needed. That´s the core of the domain. But now after this refinement the implementation of each step is easy again:private static string[] Parse_string_list(string input) { return input.Split(',') .Select(s => s.Trim()) .ToArray(); } private static Dictionary<string,object> Compile_unique_strings(string[] strings) { return strings.Aggregate( new Dictionary<string, object>(), (agg, s) => { agg[s] = null; return agg; }); } private static string[] Serialize_unique_strings( Dictionary<string,object> dict) { return dict.Keys.ToArray(); } With these three additional functions Main() now looks like this:static void Main(string[] args) { var input = Accept_string_list(args); var strings = Parse_string_list(input); var dict = Compile_unique_strings(strings); var output = Serialize_unique_strings(dict); Present_deduplicated_string_list(output); } I think that´s very understandable code: just read it from top to bottom and you know how the solution to the problem works. It´s a mirror image of the initial design: Accept string list from command line Parse the input string into a true list of strings. Register each string in a dictionary/map/set. That way duplicates get cast away. Transform the data structure into a list of unique strings. Present de-duplicated strings on standard output You can even re-generate the design by just looking at the code. Code and functional design thus are always in sync - if you follow some simple rules. But about that later. And as a bonus: all the functions making up the process are small - which means easy to understand, too. So much for an initial concrete example. Now it´s time for some theory. Because there is method to this madness ;-) The above has only scratched the surface. Introducing Flow Design Functional design starts with a given function, the Entry Point. Its goal is to describe the behavior of the program when the Entry Point is triggered using a process, not an algorithm. An algorithm consists of logic, a process on the other hand consists just of steps or stages. Each processing step transforms input into output or a side effect. Also it might access resources, e.g. a printer, a database, or just memory. Processing steps thus can rely on state of some sort. This is different from Functional Programming, where functions are supposed to not be stateful and not cause side effects.[1] In its simplest form a process can be written as a bullet point list of steps, e.g. Get data from user Output result to user Transform data Parse data Map result for output Such a compilation of steps - possibly on different levels of abstraction - often is the first artifact of functional design. It can be generated by a team in an initial design brainstorming. Next comes ordering the steps. What should happen first, what next etc.? Get data from user Parse data Transform data Map result for output Output result to user That´s great for a start into functional design. It´s better than starting to code right away on a given function using TDD. Please get me right: TDD is a valuable practice. But it can be unnecessarily hard if the scope of a functionn is too large. But how do you know beforehand without investing some thinking? And how to do this thinking in a systematic fashion? My recommendation: For any given function you´re supposed to implement first do a functional design. Then, once you´re confident you know the processing steps - which are pretty small - refine and code them using TDD. You´ll see that´s much, much easier - and leads to cleaner code right away. For more information on this approach I call “Informed TDD” read my book of the same title. Thinking before coding is smart. And writing down the solution as a bunch of functions possibly is the simplest thing you can do, I´d say. It´s more according to the KISS (Keep It Simple, Stupid) principle than returning constants or other trivial stuff TDD development often is started with. So far so good. A simple ordered list of processing steps will do to start with functional design. As shown in the above example such steps can easily be translated into functions. Moving from design to coding thus is simple. However, such a list does not scale. Processing is not always that simple to be captured in a list. And then the list is just text. Again. Like code. That means the design is lacking visuality. Textual representations need more parsing by your brain than visual representations. Plus they are limited in their “dimensionality”: text just has one dimension, it´s sequential. Alternatives and parallelism are hard to encode in text. In addition the functional design using numbered lists lacks data. It´s not visible what´s the input, output, and state of the processing steps. That´s why functional design should be done using a lightweight visual notation. No tool is necessary to draw such designs. Use pen and paper; a flipchart, a whiteboard, or even a napkin is sufficient. Visualizing processes The building block of the functional design notation is a functional unit. I mostly draw it like this: Something is done, it´s clear what goes in, it´s clear what comes out, and it´s clear what the processing step requires in terms of state or hardware. Whenever input flows into a functional unit it gets processed and output is produced and/or a side effect occurs. Flowing data is the driver of something happening. That´s why I call this approach to functional design Flow Design. It´s about data flow instead of control flow. Control flow like in algorithms is of no concern to functional design. Thinking about control flow simply is too low level. Once you start with control flow you easily get bogged down by tons of details. That´s what you want to avoid during design. Design is supposed to be quick, broad brush, abstract. It should give overview. But what about all the details? As Robert C. Martin rightly said: “Programming is abot detail”. Detail is a matter of code. Once you start coding the processing steps you designed you can worry about all the detail you want. Functional design does not eliminate all the nitty gritty. It just postpones tackling them. To me that´s also an example of the SRP. Function design has the responsibility to come up with a solution to a problem posed by a single function (Entry Point). And later coding has the responsibility to implement the solution down to the last detail (i.e. statement, API-call). TDD unfortunately mixes both responsibilities. It´s just coding - and thereby trying to find detailed implementations (green phase) plus getting the design right (refactoring). To me that´s one reason why TDD has failed to deliver on its promise for many developers. Using functional units as building blocks of functional design processes can be depicted very easily. Here´s the initial process for the example problem: For each processing step draw a functional unit and label it. Choose a verb or an “action phrase” as a label, not a noun. Functional design is about activities, not state or structure. Then make the output of an upstream step the input of a downstream step. Finally think about the data that should flow between the functional units. Write the data above the arrows connecting the functional units in the direction of the data flow. Enclose the data description in brackets. That way you can clearly see if all flows have already been specified. Empty brackets mean “no data is flowing”, but nevertheless a signal is sent. A name like “list” or “strings” in brackets describes the data content. Use lower case labels for that purpose. A name starting with an upper case letter like “String” or “Customer” on the other hand signifies a data type. If you like, you also can combine descriptions with data types by separating them with a colon, e.g. (list:string) or (strings:string[]). But these are just suggestions from my practice with Flow Design. You can do it differently, if you like. Just be sure to be consistent. Flows wired-up in this manner I call one-dimensional (1D). Each functional unit just has one input and/or one output. A functional unit without an output is possible. It´s like a black hole sucking up input without producing any output. Instead it produces side effects. A functional unit without an input, though, does make much sense. When should it start to work? What´s the trigger? That´s why in the above process even the first processing step has an input. If you like, view such 1D-flows as pipelines. Data is flowing through them from left to right. But as you can see, it´s not always the same data. It get´s transformed along its passage: (args) becomes a (list) which is turned into (strings). The Principle of Mutual Oblivion A very characteristic trait of flows put together from function units is: no functional units knows another one. They are all completely independent of each other. Functional units don´t know where their input is coming from (or even when it´s gonna arrive). They just specify a range of values they can process. And they promise a certain behavior upon input arriving. Also they don´t know where their output is going. They just produce it in their own time independent of other functional units. That means at least conceptually all functional units work in parallel. Functional units don´t know their “deployment context”. They now nothing about the overall flow they are place in. They are just consuming input from some upstream, and producing output for some downstream. That makes functional units very easy to test. At least as long as they don´t depend on state or resources. I call this the Principle of Mutual Oblivion (PoMO). Functional units are oblivious of others as well as an overall context/purpose. They are just parts of a whole focused on a single responsibility. How the whole is built, how a larger goal is achieved, is of no concern to the single functional units. By building software in such a manner, functional design interestingly follows nature. Nature´s building blocks for organisms also follow the PoMO. The cells forming your body do not know each other. Take a nerve cell “controlling” a muscle cell for example:[2] The nerve cell does not know anything about muscle cells, let alone the specific muscel cell it is “attached to”. Likewise the muscle cell does not know anything about nerve cells, let a lone a specific nerve cell “attached to” it. Saying “the nerve cell is controlling the muscle cell” thus only makes sense when viewing both from the outside. “Control” is a concept of the whole, not of its parts. Control is created by wiring-up parts in a certain way. Both cells are mutually oblivious. Both just follow a contract. One produces Acetylcholine (ACh) as output, the other consumes ACh as input. Where the ACh is going, where it´s coming from neither cell cares about. Million years of evolution have led to this kind of division of labor. And million years of evolution have produced organism designs (DNA) which lead to the production of these different cell types (and many others) and also to their co-location. The result: the overall behavior of an organism. How and why this happened in nature is a mystery. For our software, though, it´s clear: functional and quality requirements needs to be fulfilled. So we as developers have to become “intelligent designers” of “software cells” which we put together to form a “software organism” which responds in satisfying ways to triggers from it´s environment. My bet is: If nature gets complex organisms working by following the PoMO, who are we to not apply this recipe for success to our much simpler “machines”? So my rule is: Wherever there is functionality to be delivered, because there is a clear Entry Point into software, design the functionality like nature would do it. Build it from mutually oblivious functional units. That´s what Flow Design is about. In that way it´s even universal, I´d say. Its notation can also be applied to biology: Never mind labeling the functional units with nouns. That´s ok in Flow Design. You´ll do that occassionally for functional units on a higher level of abstraction or when their purpose is close to hardware. Getting a cockroach to roam your bedroom takes 1,000,000 nerve cells (neurons). Getting the de-duplication program to do its job just takes 5 “software cells” (functional units). Both, though, follow the same basic principle. Translating functional units into code Moving from functional design to code is no rocket science. In fact it´s straightforward. There are two simple rules: Translate an input port to a function. Translate an output port either to a return statement in that function or to a function pointer visible to that function. The simplest translation of a functional unit is a function. That´s what you saw in the above example. Functions are mutually oblivious. That why Functional Programming likes them so much. It makes them composable. Which is the reason, nature works according to the PoMO. Let´s be clear about one thing: There is no dependency injection in nature. For all of an organism´s complexity no DI container is used. Behavior is the result of smooth cooperation between mutually oblivious building blocks. Functions will often be the adequate translation for the functional units in your designs. But not always. Take for example the case, where a processing step should not always produce an output. Maybe the purpose is to filter input. Here the functional unit consumes words and produces words. But it does not pass along every word flowing in. Some words are swallowed. Think of a spell checker. It probably should not check acronyms for correctness. There are too many of them. Or words with no more than two letters. Such words are called “stop words”. In the above picture the optionality of the output is signified by the astrisk outside the brackets. It means: Any number of (word) data items can flow from the functional unit for each input data item. It might be none or one or even more. This I call a stream of data. Such behavior cannot be translated into a function where output is generated with return. Because a function always needs to return a value. So the output port is translated into a function pointer or continuation which gets passed to the subroutine when called:[3]void filter_stop_words( string word, Action<string> onNoStopWord) { if (...check if not a stop word...) onNoStopWord(word); } If you want to be nitpicky you might call such a function pointer parameter an injection. And technically you´re right. Conceptually, though, it´s not an injection. Because the subroutine is not functionally dependent on the continuation. Firstly continuations are procedures, i.e. subroutines without a return type. Remember: Flow Design is about unidirectional data flow. Secondly the name of the formal parameter is chosen in a way as to not assume anything about downstream processing steps. onNoStopWord describes a situation (or event) within the functional unit only. Translating output ports into function pointers helps keeping functional units mutually oblivious in cases where output is optional or produced asynchronically. Either pass the function pointer to the function upon call. Or make it global by putting it on the encompassing class. Then it´s called an event. In C# that´s even an explicit feature.class Filter { public void filter_stop_words( string word) { if (...check if not a stop word...) onNoStopWord(word); } public event Action<string> onNoStopWord; } When to use a continuation and when to use an event dependens on how a functional unit is used in flows and how it´s packed together with others into classes. You´ll see examples further down the Flow Design road. Another example of 1D functional design Let´s see Flow Design once more in action using the visual notation. How about the famous word wrap kata? Robert C. Martin has posted a much cited solution including an extensive reasoning behind his TDD approach. So maybe you want to compare it to Flow Design. The function signature given is:string WordWrap(string text, int maxLineLength) {...} That´s not an Entry Point since we don´t see an application with an environment and users. Nevertheless it´s a function which is supposed to provide a certain functionality. The text passed in has to be reformatted. The input is a single line of arbitrary length consisting of words separated by spaces. The output should consist of one or more lines of a maximum length specified. If a word is longer than a the maximum line length it can be split in multiple parts each fitting in a line. Flow Design Let´s start by brainstorming the process to accomplish the feat of reformatting the text. What´s needed? Words need to be assembled into lines Words need to be extracted from the input text The resulting lines need to be assembled into the output text Words too long to fit in a line need to be split Does sound about right? I guess so. And it shows a kind of priority. Long words are a special case. So maybe there is a hint for an incremental design here. First let´s tackle “average words” (words not longer than a line). Here´s the Flow Design for this increment: The the first three bullet points turned into functional units with explicit data added. As the signature requires a text is transformed into another text. See the input of the first functional unit and the output of the last functional unit. In between no text flows, but words and lines. That´s good to see because thereby the domain is clearly represented in the design. The requirements are talking about words and lines and here they are. But note the asterisk! It´s not outside the brackets but inside. That means it´s not a stream of words or lines, but lists or sequences. For each text a sequence of words is output. For each sequence of words a sequence of lines is produced. The asterisk is used to abstract from the concrete implementation. Like with streams. Whether the list of words gets implemented as an array or an IEnumerable is not important during design. It´s an implementation detail. Does any processing step require further refinement? I don´t think so. They all look pretty “atomic” to me. And if not… I can always backtrack and refine a process step using functional design later once I´ve gained more insight into a sub-problem. Implementation The implementation is straightforward as you can imagine. The processing steps can all be translated into functions. Each can be tested easily and separately. Each has a focused responsibility. And the process flow becomes just a sequence of function calls: Easy to understand. It clearly states how word wrapping works - on a high level of abstraction. And it´s easy to evolve as you´ll see. Flow Design - Increment 2 So far only texts consisting of “average words” are wrapped correctly. Words not fitting in a line will result in lines too long. Wrapping long words is a feature of the requested functionality. Whether it´s there or not makes a difference to the user. To quickly get feedback I decided to first implement a solution without this feature. But now it´s time to add it to deliver the full scope. Fortunately Flow Design automatically leads to code following the Open Closed Principle (OCP). It´s easy to extend it - instead of changing well tested code. How´s that possible? Flow Design allows for extension of functionality by inserting functional units into the flow. That way existing functional units need not be changed. The data flow arrow between functional units is a natural extension point. No need to resort to the Strategy Pattern. No need to think ahead where extions might need to be made in the future. I just “phase in” the remaining processing step: Since neither Extract words nor Reformat know of their environment neither needs to be touched due to the “detour”. The new processing step accepts the output of the existing upstream step and produces data compatible with the existing downstream step. Implementation - Increment 2 A trivial implementation checking the assumption if this works does not do anything to split long words. The input is just passed on: Note how clean WordWrap() stays. The solution is easy to understand. A developer looking at this code sometime in the future, when a new feature needs to be build in, quickly sees how long words are dealt with. Compare this to Robert C. Martin´s solution:[4] How does this solution handle long words? Long words are not even part of the domain language present in the code. At least I need considerable time to understand the approach. Admittedly the Flow Design solution with the full implementation of long word splitting is longer than Robert C. Martin´s. At least it seems. Because his solution does not cover all the “word wrap situations” the Flow Design solution handles. Some lines would need to be added to be on par, I guess. But even then… Is a difference in LOC that important as long as it´s in the same ball park? I value understandability and openness for extension higher than saving on the last line of code. Simplicity is not just less code, it´s also clarity in design. But don´t take my word for it. Try Flow Design on larger problems and compare for yourself. What´s the easier, more straightforward way to clean code? And keep in mind: You ain´t seen all yet ;-) There´s more to Flow Design than described in this chapter. In closing I hope I was able to give you a impression of functional design that makes you hungry for more. To me it´s an inevitable step in software development. Jumping from requirements to code does not scale. And it leads to dirty code all to quickly. Some thought should be invested first. Where there is a clear Entry Point visible, it´s functionality should be designed using data flows. Because with data flows abstraction is possible. For more background on why that´s necessary read my blog article here. For now let me point out to you - if you haven´t already noticed - that Flow Design is a general purpose declarative language. It´s “programming by intention” (Shalloway et al.). Just write down how you think the solution should work on a high level of abstraction. This breaks down a large problem in smaller problems. And by following the PoMO the solutions to those smaller problems are independent of each other. So they are easy to test. Or you could even think about getting them implemented in parallel by different team members. Flow Design not only increases evolvability, but also helps becoming more productive. All team members can participate in functional design. This goes beyon collective code ownership. We´re talking collective design/architecture ownership. Because with Flow Design there is a common visual language to talk about functional design - which is the foundation for all other design activities. PS: If you like what you read, consider getting my ebook “The Incremental Architekt´s Napkin”. It´s where I compile all the articles in this series for easier reading. I like the strictness of Function Programming - but I also find it quite hard to live by. And it certainly is not what millions of programmers are used to. Also to me it seems, the real world is full of state and side effects. So why give them such a bad image? That´s why functional design takes a more pragmatic approach. State and side effects are ok for processing steps - but be sure to follow the SRP. Don´t put too much of it into a single processing step. ? Image taken from www.physioweb.org ? My code samples are written in C#. C# sports typed function pointers called delegates. Action is such a function pointer type matching functions with signature void someName(T t). Other languages provide similar ways to work with functions as first class citizens - even Java now in version 8. I trust you find a way to map this detail of my translation to your favorite programming language. I know it works for Java, C++, Ruby, JavaScript, Python, Go. And if you´re using a Functional Programming language it´s of course a no brainer. ? Taken from his blog post “The Craftsman 62, The Dark Path”. ?

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
Android app crashes on emulator - logCat shows no errors

- by David Miler

I have just added the SherlockActionBar library to my android project. After some small changes (FragmentActivity - SherlockFragmentActivity, getActionBar() - getSupportActionBar(), imports) it all compiled nicely. After I run the app, however, the debugger stops, as though it had encountered an exception. However, there are no errors shown in the LogCat output. I just can't wrap my head around what's going on. Here is the logCat output after I terminate the app. 10-02 14:11:19.227: I/SystemUpdateService(174): UpdateTask at time 1349187079227 10-02 14:11:19.237: I/ActivityThread(328): Pub com.android.email.attachmentprovider: com.android.email.provider.AttachmentProvider 10-02 14:11:19.687: I/dalvikvm(81): Jit: resizing JitTable from 512 to 1024 10-02 14:11:19.809: D/MediaScannerService(150): start scanning volume internal: [/system/media] 10-02 14:11:20.047: V/AlarmClock(239): AlarmInitReceiver finished 10-02 14:11:20.087: I/ActivityManager(81): Start proc com.android.quicksearchbox for broadcast com.android.quicksearchbox/.SearchWidgetProvider: pid=346 uid=10012 gids={3003} 10-02 14:11:20.127: D/ExchangeService(320): !!! EAS ExchangeService, onStartCommand, startingUp = false, running = false 10-02 14:11:20.427: I/ActivityThread(346): Pub com.android.quicksearchbox.google: com.android.quicksearchbox.google.GoogleSuggestionProvider 10-02 14:11:20.497: I/ActivityThread(346): Pub com.android.quicksearchbox.shortcuts: com.android.quicksearchbox.ShortcutsProvider 10-02 14:11:20.657: I/ActivityManager(81): Start proc com.android.music for broadcast com.android.music/.MediaAppWidgetProvider: pid=358 uid=10028 gids={3003, 1015} 10-02 14:11:20.927: D/ExchangeService(320): !!! EAS ExchangeService, onCreate 10-02 14:11:20.967: D/dalvikvm(260): GC_CONCURRENT freed 213K, 6% free 6409K/6791K, paused 5ms+101ms 10-02 14:11:21.077: D/ExchangeService(320): !!! EAS ExchangeService, onStartCommand, startingUp = true, running = false 10-02 14:11:21.567: D/GTalkService(174): [ReonnectMgr] ### report Inet condition: status=false, networkType=0 10-02 14:11:21.587: D/ConnectivityService(81): reportNetworkCondition(0, 0) 10-02 14:11:21.597: D/ConnectivityService(81): Inet connectivity change, net=0, condition=0,mActiveDefaultNetwork=0 10-02 14:11:21.597: D/ConnectivityService(81): starting a change hold 10-02 14:11:21.697: D/GTalkService(174): [RawStanzaProvidersMgr] ##### searchProvidersFromIntent 10-02 14:11:21.697: D/GTalkService(174): [RawStanzaProvidersMgr] no intent receivers found 10-02 14:11:21.847: I/SystemUpdateService(174): cancelUpdate (empty URL) 10-02 14:11:21.847: E/TelephonyManager(174): Hidden constructor called more than once per process! 10-02 14:11:21.867: D/dalvikvm(174): GC_CONCURRENT freed 337K, 7% free 6561K/7047K, paused 5ms+4ms 10-02 14:11:21.917: D/GTalkService(174): [ReonnectMgr] ### report Inet condition: status=false, networkType=0 10-02 14:11:21.917: D/ConnectivityService(81): reportNetworkCondition(0, 0) 10-02 14:11:21.917: D/ConnectivityService(81): Inet connectivity change, net=0, condition=0,mActiveDefaultNetwork=0 10-02 14:11:21.917: D/ConnectivityService(81): currently in hold - not setting new end evt 10-02 14:11:21.990: E/TelephonyManager(174): Original: com.google.android.location, new: com.google.android.gsf 10-02 14:11:22.027: I/SystemUpdateService(174): removeAllDownloads (cancelUpdate) 10-02 14:11:22.127: D/dalvikvm(328): GC_CONCURRENT freed 205K, 6% free 6506K/6855K, paused 660ms+3ms 10-02 14:11:22.197: D/Eas Debug(320): Logging: 10-02 14:11:22.319: D/dalvikvm(81): GREF has increased to 401 10-02 14:11:22.947: D/ExchangeService(320): !!! EAS ExchangeService, onStartCommand, startingUp = true, running = false 10-02 14:11:23.130: D/Eas Debug(320): Logging: 10-02 14:11:23.307: I//system/bin/fsck_msdos(29): Attempting to allocate 2044 KB for FAT 10-02 14:11:23.560: I/ActivityManager(81): Starting: Intent { flg=0x10000000 cmp=com.google.android.gsf/.update.SystemUpdateInstallDialog } from pid 174 10-02 14:11:23.587: I/ActivityManager(81): Starting: Intent { flg=0x10000000 cmp=com.google.android.gsf/.update.SystemUpdateDownloadDialog } from pid 174 10-02 14:11:24.087: W/ActivityManager(81): Activity pause timeout for ActivityRecord{407c7320 com.android.launcher/com.android.launcher2.Launcher} 10-02 14:11:24.237: E/TelephonyManager(174): Hidden constructor called more than once per process! 10-02 14:11:24.237: E/TelephonyManager(174): Original: com.google.android.location, new: com.google.android.gsf 10-02 14:11:24.507: D/dalvikvm(174): GC_EXPLICIT freed 231K, 7% free 6596K/7047K, paused 4ms+6ms 10-02 14:11:24.607: D/ConnectivityService(81): Inet hold end, net=0, condition =0, published condition =0 10-02 14:11:24.607: D/ConnectivityService(81): no change in condition - aborting 10-02 14:11:24.707: D/dalvikvm(174): GC_EXPLICIT freed 17K, 7% free 6579K/7047K, paused 4ms+4ms 10-02 14:11:24.947: I//system/bin/fsck_msdos(29): ** Phase 2 - Check Cluster Chains 10-02 14:11:25.117: I//system/bin/fsck_msdos(29): ** Phase 3 - Checking Directories 10-02 14:11:25.128: I//system/bin/fsck_msdos(29): ** Phase 4 - Checking for Lost Files 10-02 14:11:25.167: I//system/bin/fsck_msdos(29): 12 files, 1044448 free (522224 clusters) 10-02 14:11:25.227: I/Vold(29): Filesystem check completed OK 10-02 14:11:25.227: I/Vold(29): Device /dev/block/vold/179:0, target /mnt/sdcard mounted @ /mnt/secure/staging 10-02 14:11:25.237: D/Vold(29): Volume sdcard state changing 3 (Checking) -> 4 (Mounted) 10-02 14:11:25.257: I/PackageManager(81): Updating external media status from unmounted to mounted 10-02 14:11:25.457: D/dalvikvm(303): GC_EXPLICIT freed 35K, 6% free 6242K/6595K, paused 3ms+312ms 10-02 14:11:25.987: D/ExchangeService(320): !!! EAS ExchangeService, onStartCommand, startingUp = true, running = false 10-02 14:11:26.157: D/MediaScanner(150): prescan time: 2905ms 10-02 14:11:26.167: D/MediaScanner(150): scan time: 148ms 10-02 14:11:26.167: D/MediaScanner(150): postscan time: 2ms 10-02 14:11:26.167: D/MediaScanner(150): total time: 3055ms 10-02 14:11:26.197: D/MediaScannerService(150): done scanning volume internal 10-02 14:11:26.237: D/MediaScannerService(150): start scanning volume external: [/mnt/sdcard] 10-02 14:11:26.497: D/dalvikvm(143): GC_EXPLICIT freed 234K, 8% free 7735K/8327K, paused 3ms+5ms 10-02 14:11:27.180: D/dalvikvm(143): GC_CONCURRENT freed 150K, 4% free 8004K/8327K, paused 7ms+3ms 10-02 14:11:27.397: D/dalvikvm(143): GC_FOR_ALLOC freed 96K, 6% free 8310K/8775K, paused 76ms 10-02 14:11:27.580: D/dalvikvm(143): GC_FOR_ALLOC freed 515K, 11% free 8135K/9095K, paused 79ms 10-02 14:11:27.829: D/dalvikvm(143): GC_CONCURRENT freed 3K, 5% free 8694K/9095K, paused 7ms+6ms 10-02 14:11:28.137: V/TLINE(143): new: android.text.TextLine@4065b280 10-02 14:11:28.527: D/dalvikvm(143): GC_CONCURRENT freed 729K, 10% free 8764K/9671K, paused 5ms+13ms 10-02 14:11:28.677: D/dalvikvm(143): GC_FOR_ALLOC freed 152K, 11% free 8683K/9671K, paused 99ms 10-02 14:11:28.717: I/dalvikvm-heap(143): Grow heap (frag case) to 11.434MB for 2975968-byte allocation 10-02 14:11:28.807: D/dalvikvm(143): GC_FOR_ALLOC freed 0K, 9% free 11589K/12615K, paused 84ms 10-02 14:11:29.159: D/dalvikvm(143): GC_CONCURRENT freed 197K, 7% free 12195K/12999K, paused 8ms+6ms 10-02 14:11:29.647: D/dalvikvm(143): GC_EXPLICIT freed 351K, 6% free 12790K/13511K, paused 8ms+17ms 10-02 14:11:29.717: I/SurfaceFlinger(32): Boot is finished (70768 ms) 10-02 14:11:29.877: I/ARMAssembler(32): generated scanline__00000177:03010104_00000002_00000000 [ 44 ipp] (66 ins) at [0x407c7290:0x407c7398] in 990662 ns 10-02 14:11:29.907: I/ARMAssembler(32): generated scanline__00000177:03515104_00000001_00000000 [ 73 ipp] (95 ins) at [0x407c73a0:0x407c751c] in 989381 ns 10-02 14:11:30.287: D/dalvikvm(174): GC_EXPLICIT freed 25K, 8% free 6554K/7047K, paused 4ms+32ms 10-02 14:11:30.380: D/dalvikvm(143): GC_EXPLICIT freed 349K, 6% free 13124K/13895K, paused 5ms+25ms 10-02 14:11:30.957: D/dalvikvm(143): GC_FOR_ALLOC freed 1069K, 10% free 13860K/15239K, paused 81ms 10-02 14:11:32.177: D/dalvikvm(150): GC_CONCURRENT freed 183K, 6% free 6438K/6791K, paused 5ms+4ms 10-02 14:11:32.187: W/ActivityManager(81): No content provider found for: 10-02 14:11:32.607: V/MediaScanner(150): pruneDeadThumbnailFiles... android.database.sqlite.SQLiteCursor@406724a8 10-02 14:11:32.617: V/MediaScanner(150): /pruneDeadThumbnailFiles... android.database.sqlite.SQLiteCursor@406724a8 10-02 14:11:32.640: W/ActivityManager(81): No content provider found for: 10-02 14:11:32.640: D/VoldCmdListener(29): asec list 10-02 14:11:32.647: I/PackageManager(81): No secure containers on sdcard 10-02 14:11:32.667: D/MediaScanner(150): prescan time: 107ms 10-02 14:11:32.667: D/MediaScanner(150): scan time: 89ms 10-02 14:11:32.667: D/MediaScanner(150): postscan time: 61ms 10-02 14:11:32.667: D/MediaScanner(150): total time: 257ms 10-02 14:11:32.697: W/PackageManager(81): Unknown permission android.permission.ADD_SYSTEM_SERVICE in package com.android.phone 10-02 14:11:32.707: W/PackageManager(81): Unknown permission com.android.smspush.WAPPUSH_MANAGER_BIND in package com.android.phone 10-02 14:11:32.737: W/PackageManager(81): Not granting permission android.permission.SEND_DOWNLOAD_COMPLETED_INTENTS to package com.android.browser (protectionLevel=2 flags=0x9be45) 10-02 14:11:32.737: W/PackageManager(81): Not granting permission android.permission.BIND_APPWIDGET to package com.android.widgetpreview (protectionLevel=3 flags=0x28be44) 10-02 14:11:32.767: W/PackageManager(81): Unknown permission android.permission.READ_OWNER_DATA in package com.android.exchange 10-02 14:11:32.778: W/PackageManager(81): Unknown permission android.permission.READ_OWNER_DATA in package com.android.email 10-02 14:11:32.788: W/PackageManager(81): Unknown permission com.android.providers.im.permission.READ_ONLY in package com.google.android.apps.maps 10-02 14:11:32.797: W/PackageManager(81): Not granting permission android.permission.DEVICE_POWER to package com.android.deskclock (protectionLevel=2 flags=0x8be45) 10-02 14:11:33.137: D/MediaScannerService(150): done scanning volume external 10-02 14:11:33.197: D/PackageParser(81): Scanning package: /data/app/vmdl257911298.tmp 10-02 14:11:33.837: I/InputReader(81): Device reconfigured: id=0, name='qwerty2', surface size is now 1024x800 10-02 14:11:34.097: D/dalvikvm(81): GC_CONCURRENT freed 12185K, 47% free 13966K/26311K, paused 8ms+23ms 10-02 14:11:36.798: I/TabletStatusBar(124): DISABLE_CLOCK: no 10-02 14:11:36.798: I/TabletStatusBar(124): DISABLE_NAVIGATION: no 10-02 14:11:37.348: I/ARMAssembler(32): generated scanline__00000177:03515104_00001001_00000000 [ 91 ipp] (114 ins) at [0x407c7520:0x407c76e8] in 919320 ns 10-02 14:11:37.598: I/TabletStatusBar(124): DISABLE_BACK: no 10-02 14:11:37.710: I/ActivityManager(81): Displayed com.android.launcher/com.android.launcher2.Launcher: +46s212ms 10-02 14:11:38.817: D/dalvikvm(143): GC_CONCURRENT freed 969K, 8% free 14867K/16007K, paused 4ms+10ms 10-02 14:11:39.437: I/dalvikvm(81): Jit: resizing JitTable from 1024 to 2048 10-02 14:11:40.267: D/dalvikvm(143): GC_FOR_ALLOC freed 2357K, 16% free 14395K/17031K, paused 80ms 10-02 14:11:40.717: D/dalvikvm(143): GC_EXPLICIT freed 742K, 16% free 14358K/17031K, paused 8ms+4ms 10-02 14:11:41.617: D/dalvikvm(81): GC_CONCURRENT freed 1955K, 48% free 13869K/26311K, paused 9ms+10ms 10-02 14:11:42.559: D/dalvikvm(81): GC_CONCURRENT freed 1830K, 48% free 13881K/26311K, paused 9ms+9ms 10-02 14:11:42.758: I/PackageManager(81): Removing non-system package:cz.trilimi.sfaui 10-02 14:11:42.758: I/ActivityManager(81): Force stopping package cz.trilimi.sfaui uid=10036 10-02 14:11:42.967: D/PackageManager(81): Scanning package cz.trilimi.sfaui 10-02 14:11:42.967: I/PackageManager(81): Package cz.trilimi.sfaui codePath changed from /data/app/cz.trilimi.sfaui-1.apk to /data/app/cz.trilimi.sfaui-2.apk; Retaining data and using new 10-02 14:11:42.967: I/PackageManager(81): Unpacking native libraries for /data/app/cz.trilimi.sfaui-2.apk 10-02 14:11:43.097: D/installd(35): DexInv: --- BEGIN '/data/app/cz.trilimi.sfaui-2.apk' --- 10-02 14:11:45.317: D/dalvikvm(391): DexOpt: load 434ms, verify+opt 1260ms 10-02 14:11:45.407: D/installd(35): DexInv: --- END '/data/app/cz.trilimi.sfaui-2.apk' (success) --- 10-02 14:11:45.407: W/PackageManager(81): Code path for pkg : cz.trilimi.sfaui changing from /data/app/cz.trilimi.sfaui-1.apk to /data/app/cz.trilimi.sfaui-2.apk 10-02 14:11:45.407: W/PackageManager(81): Resource path for pkg : cz.trilimi.sfaui changing from /data/app/cz.trilimi.sfaui-1.apk to /data/app/cz.trilimi.sfaui-2.apk 10-02 14:11:45.407: D/PackageManager(81): Activities: cz.trilimi.sfaui.ItemListActivity cz.trilimi.sfaui.ItemDetailActivity 10-02 14:11:45.427: I/ActivityManager(81): Force stopping package cz.trilimi.sfaui uid=10036 10-02 14:11:45.657: I/installd(35): move /data/dalvik-cache/data@[email protected]@classes.dex -> /data/dalvik-cache/data@[email protected]@classes.dex 10-02 14:11:45.657: D/PackageManager(81): New package installed in /data/app/cz.trilimi.sfaui-2.apk 10-02 14:11:45.997: I/ActivityManager(81): Force stopping package cz.trilimi.sfaui uid=10036 10-02 14:11:46.147: D/dalvikvm(143): GC_EXPLICIT freed 3K, 16% free 14356K/17031K, paused 10ms+9ms 10-02 14:11:46.237: D/PackageManager(81): generateServicesMap(android.accounts.AccountAuthenticator): 3 services unchanged 10-02 14:11:46.277: D/PackageManager(81): generateServicesMap(android.content.SyncAdapter): 5 services unchanged 10-02 14:11:46.337: D/PackageManager(81): generateServicesMap(android.accounts.AccountAuthenticator): 3 services unchanged 10-02 14:11:46.347: D/PackageManager(81): generateServicesMap(android.content.SyncAdapter): 5 services unchanged 10-02 14:11:46.437: D/dalvikvm(208): GC_EXPLICIT freed 258K, 7% free 6488K/6919K, paused 3ms+5ms 10-02 14:11:46.477: W/RecognitionManagerService(81): no available voice recognition services found 10-02 14:11:46.897: I/ActivityManager(81): Start proc com.svox.pico for broadcast com.svox.pico/.VoiceDataInstallerReceiver: pid=398 uid=10006 gids={} 10-02 14:11:47.087: I/ActivityThread(398): Pub com.svox.pico.providers.SettingsProvider: com.svox.pico.providers.SettingsProvider 10-02 14:11:47.138: D/GTalkService(174): [GTalkService.1] handlePackageInstalled: re-initialize providers 10-02 14:11:47.147: D/GTalkService(174): [RawStanzaProvidersMgr] ##### searchProvidersFromIntent 10-02 14:11:47.147: D/GTalkService(174): [RawStanzaProvidersMgr] no intent receivers found 10-02 14:11:47.718: I/AccountTypeManager(208): Loaded meta-data for 1 account types, 0 accounts in 186ms 10-02 14:11:48.377: D/dalvikvm(143): GC_CONCURRENT freed 1865K, 15% free 14513K/17031K, paused 7ms+4ms 10-02 14:11:48.917: D/dalvikvm(208): GC_CONCURRENT freed 219K, 6% free 6788K/7175K, paused 7ms+73ms 10-02 14:11:49.207: D/dalvikvm(143): GC_FOR_ALLOC freed 4558K, 31% free 11866K/17031K, paused 89ms 10-02 14:11:49.587: D/dalvikvm(143): GC_CONCURRENT freed 713K, 24% free 13010K/17031K, paused 5ms+4ms 10-02 14:11:49.967: D/dalvikvm(143): GC_CONCURRENT freed 1046K, 19% free 13922K/17031K, paused 5ms+4ms 10-02 14:11:50.437: D/dalvikvm(81): GC_EXPLICIT freed 898K, 47% free 13955K/26311K, paused 6ms+39ms 10-02 14:11:50.467: I/installd(35): unlink /data/dalvik-cache/data@[email protected]@classes.dex 10-02 14:11:50.477: D/AndroidRuntime(227): Shutting down VM 10-02 14:11:50.507: D/dalvikvm(227): GC_CONCURRENT freed 97K, 84% free 331K/2048K, paused 1ms+2ms 10-02 14:11:50.507: I/AndroidRuntime(227): NOTE: attach of thread 'Binder Thread #3' failed 10-02 14:11:50.517: D/jdwp(227): adbd disconnected 10-02 14:11:51.177: D/AndroidRuntime(410): >>>>>> AndroidRuntime START com.android.internal.os.RuntimeInit <<<<<< 10-02 14:11:51.177: D/AndroidRuntime(410): CheckJNI is ON 10-02 14:11:51.897: D/AndroidRuntime(410): Calling main entry com.android.commands.am.Am 10-02 14:11:51.937: I/ActivityManager(81): Force stopping package cz.trilimi.sfaui uid=10036 10-02 14:11:51.937: I/ActivityManager(81): Starting: Intent { act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10000000 cmp=cz.trilimi.sfaui/.ItemListActivity } from pid 410 10-02 14:11:51.968: W/WindowManager(81): Failure taking screenshot for (230x179) to layer 21005 10-02 14:11:51.997: I/ActivityManager(81): Start proc cz.trilimi.sfaui for activity cz.trilimi.sfaui/.ItemListActivity: pid=418 uid=10036 gids={} 10-02 14:11:52.007: D/AndroidRuntime(410): Shutting down VM 10-02 14:11:52.057: I/AndroidRuntime(410): NOTE: attach of thread 'Binder Thread #3' failed 10-02 14:11:52.097: D/dalvikvm(410): GC_CONCURRENT freed 98K, 83% free 360K/2048K, paused 1ms+0ms 10-02 14:11:52.097: D/jdwp(410): adbd disconnected 10-02 14:11:53.147: W/ActivityThread(418): Application cz.trilimi.sfaui is waiting for the debugger on port 8100... 10-02 14:11:53.207: I/System.out(418): Sending WAIT chunk 10-02 14:11:53.217: I/dalvikvm(418): Debugger is active 10-02 14:11:53.447: I/System.out(418): Debugger has connected 10-02 14:11:53.457: I/System.out(418): waiting for debugger to settle... 10-02 14:11:53.637: I/ARMAssembler(32): generated scanline__00000177:03515104_00001002_00000000 [ 87 ipp] (110 ins) at [0x407c76f0:0x407c78a8] in 598498 ns 10-02 14:11:53.660: I/System.out(418): waiting for debugger to settle... 10-02 14:11:53.857: I/System.out(418): waiting for debugger to settle... 10-02 14:11:54.057: I/System.out(418): waiting for debugger to settle... 10-02 14:11:54.257: I/System.out(418): waiting for debugger to settle... 10-02 14:11:54.317: V/TLINE(81): new: android.text.TextLine@4155dde8 10-02 14:11:54.467: I/System.out(418): waiting for debugger to settle... 10-02 14:11:54.667: I/System.out(418): waiting for debugger to settle... 10-02 14:11:54.870: I/System.out(418): waiting for debugger to settle... 10-02 14:11:55.027: D/dalvikvm(143): GC_EXPLICIT freed 900K, 16% free 14420K/17031K, paused 7ms+4ms 10-02 14:11:55.067: I/System.out(418): waiting for debugger to settle... 10-02 14:11:55.292: I/System.out(418): debugger has settled (1315) 10-02 14:12:02.008: W/ActivityManager(81): Launch timeout has expired, giving up wake lock! 10-02 14:12:02.971: W/ActivityManager(81): Activity idle timeout for ActivityRecord{4078c6b0 cz.trilimi.sfaui/.ItemListActivity} 10-02 14:12:08.359: D/ExchangeService(320): Received deviceId from Email app: androidc259148960 10-02 14:12:08.507: D/ExchangeService(320): Reconciling accounts... 10-02 14:16:11.437: D/SntpClient(81): request time failed: java.net.SocketException: Address family not supported by protocol 10-02 14:17:21.573: W/jdwp(418): Debugger is telling the VM to exit with code=1 10-02 14:17:21.573: I/dalvikvm(418): GC lifetime allocation: 8642 bytes 10-02 14:17:21.637: D/Zygote(33): Process 418 exited cleanly (1) 10-02 14:17:21.651: I/ActivityManager(81): Process cz.trilimi.sfaui (pid 418) has died. 10-02 14:17:21.847: D/dalvikvm(143): GC_EXPLICIT freed <1K, 16% free 14420K/17031K, paused 7ms+7ms 10-02 14:17:21.917: W/InputManagerService(81): Window already focused, ignoring focus gain of: com.android.internal.view.IInputMethodClient$Stub$Proxy@40bfbf28

Read the article
socket operation on nonsocket or bad file descriptor

- by Magn3s1um

I'm writing a pthread server which takes requests from clients and sends them back a bunch of .ppm files. Everything seems to go well, but sometimes when I have just 1 client connected, when trying to read from the file descriptor (for the file), it says Bad file Descriptor. This doesn't make sense, since my int fd isn't -1, and the file most certainly exists. Other times, I get this "Socket operation on nonsocket" error. This is weird because other times, it doesn't give me this error and everything works fine. When trying to connect multiple clients, for some reason, it will only send correctly to one, and then the other client gets the bad file descriptor or "nonsocket" error, even though both threads are processing the same messages and do the same routines. Anyone have an idea why? Here's the code that is giving me that error: while(mqueue.head != mqueue.tail && count < dis_m){ printf("Sending to client %s: %s\n", pointer->id, pointer->message); int fd; fd = open(pointer->message, O_RDONLY); char buf[58368]; int bytesRead; printf("This is fd %d\n", fd); bytesRead=read(fd,buf,58368); send(pointer->socket,buf,bytesRead,0); perror("Error:\n"); fflush(stdout); close(fd); mqueue.mcount--; mqueue.head = mqueue.head->next; free(pointer->message); free(pointer); pointer = mqueue.head; count++; } printf("Sending %s\n", pointer->message); int fd; fd = open(pointer->message, O_RDONLY); printf("This is fd %d\n", fd); printf("I am hhere2\n"); char buf[58368]; int bytesRead; bytesRead=read(fd,buf,58368); send(pointer->socket,buf,bytesRead,0); perror("Error:\n"); close(fd); mqueue.mcount--; if(mqueue.head != mqueue.tail){ mqueue.head = mqueue.head->next; } else{ mqueue.head->next = malloc(sizeof(struct message)); mqueue.head = mqueue.head->next; mqueue.head->next = malloc(sizeof(struct message)); mqueue.tail = mqueue.head->next; mqueue.head->message = NULL; } free(pointer->message); free(pointer); pthread_mutex_unlock(&numm); pthread_mutex_unlock(&circ); pthread_mutex_unlock(&slots); The messages for both threads are the same, being of the form ./path/imageXX.ppm where XX is the number that should go to the client. The file size of each image is 58368 bytes. Sometimes, this code hangs on the read, and stops execution. I don't know this would be either, because the file descriptor comes back as valid. Thanks in advanced. Edit: Here's some sample output: Sending to client a: ./support/images/sw90.ppm This is fd 4 Error: : Socket operation on non-socket Sending to client a: ./support/images/sw91.ppm This is fd 4 Error: : Socket operation on non-socket Sending ./support/images/sw92.ppm This is fd 4 I am hhere2 Error: : Socket operation on non-socket My dispatcher has defeated evil Sample with 2 clients (client b was serviced first) Sending to client b: ./support/images/sw87.ppm This is fd 6 Error: : Success Sending to client b: ./support/images/sw88.ppm This is fd 6 Error: : Success Sending to client b: ./support/images/sw89.ppm This is fd 6 Error: : Success This is fd 6 Error: : Bad file descriptor Sending to client a: ./support/images/sw85.ppm This is fd 6 Error: As you can see, who ever is serviced first in this instance can open the files, but not the 2nd person. Edit2: Full code. Sorry, its pretty long and terribly formatted. #include <netinet/in.h> #include <netinet/in.h> #include <netdb.h> #include <arpa/inet.h> #include <sys/types.h> #include <sys/socket.h> #include <errno.h> #include <stdio.h> #include <unistd.h> #include <pthread.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include "ring.h" /* Version 1 Here is what is implemented so far: The threads are created from the arguments specified (number of threads that is) The server will lock and update variables based on how many clients are in the system and such. The socket that is opened when a new client connects, must be passed to the threads. To do this, we need some sort of global array. I did this by specifying an int client and main_pool_busy, and two pointers poolsockets and nonpoolsockets. My thinking on this was that when a new client enters the system, the server thread increments the variable client. When a thread is finished with this client (after it sends it the data), the thread will decrement client and close the socket. HTTP servers act this way sometimes (they terminate the socket as soon as one transmission is sent). *Note down at bottom After the server portion increments the client counter, we must open up a new socket (denoted by new_sd) and get this value to the appropriate thread. To do this, I created global array poolsockets, which will hold all the socket descriptors for our pooled threads. The server portion gets the new socket descriptor, and places the value in the first spot of the array that has a 0. We only place a value in this array IF: 1. The variable main_pool_busy < worknum (If we have more clients in the system than in our pool, it doesn't mean we should always create a new thread. At the end of this, the server signals on the condition variable clientin that a new client has arrived. In our pooled thread, we then must walk this array and check the array until we hit our first non-zero value. This is the socket we will give to that thread. The thread then changes the array to have a zero here. What if our all threads in our pool our busy? If this is the case, then we will know it because our threads in this pool will increment main_pool_busy by one when they are working on a request and decrement it when they are done. If main_pool_busy >= worknum, then we must dynamically create a new thread. Then, we must realloc the size of our nonpoolsockets array by 1 int. We then add the new socket descriptor to our pool. Here's what we need to figure out: NOTE* Each worker should generate 100 messages which specify the worker thread ID, client socket descriptor and a copy of the client message. Additionally, each message should include a message number, starting from 0 and incrementing for each subsequent message sent to the same client. I don't know how to keep track of how many messages were to the same client. Maybe we shouldn't close the socket descriptor, but rather keep an array of structs for each socket that includes how many messages they have been sent. Then, the server adds the struct, the threads remove it, then the threads add it back once they've serviced one request (unless the count is 100). ------------------------------------------------------------- CHANGES Version 1 ---------- NONE: this is the first version. */ #define MAXSLOTS 30 #define dis_m 15 //problems with dis_m ==1 //Function prototypes void inc_clients(); void init_mutex_stuff(pthread_t*, pthread_t*); void *threadpool(void *); void server(int); void add_to_socket_pool(int); void inc_busy(); void dec_busy(); void *dispatcher(); void create_message(long, int, int, char *, char *); void init_ring(); void add_to_ring(char *, char *, int, int, int); int socket_from_string(char *); void add_to_head(char *); void add_to_tail(char *); struct message * reorder(struct message *, struct message *, int); int get_threadid(char *); void delete_socket_messages(int); struct message * merge(struct message *, struct message *, int); int get_request(char *, char *, char*); ///////////////////// //Global mutexes and condition variables pthread_mutex_t startservice; pthread_mutex_t numclients; pthread_mutex_t pool_sockets; pthread_mutex_t nonpool_sockets; pthread_mutex_t m_pool_busy; pthread_mutex_t slots; pthread_mutex_t numm; pthread_mutex_t circ; pthread_cond_t clientin; pthread_cond_t m; /////////////////////////////////////// //Global variables int clients; int main_pool_busy; int * poolsockets, nonpoolsockets; int worknum; struct ring mqueue; /////////////////////////////////////// int main(int argc, char ** argv){ //error handling if not enough arguments to program if(argc != 3){ printf("Not enough arguments to server: ./server portnum NumThreadsinPool\n"); _exit(-1); } //Convert arguments from strings to integer values int port = atoi(argv[1]); worknum = atoi(argv[2]); //Start server portion server(port); } /////////////////////////////////////////////////////////////////////////////////////////////// //The listen server thread///////////////////////////////////////////////////////////////////// /////////////////////////////////////////////////////////////////////////////////////////////// void server(int port){ int sd, new_sd; struct sockaddr_in name, cli_name; int sock_opt_val = 1; int cli_len; pthread_t threads[worknum]; //create our pthread id array pthread_t dis[1]; //create our dispatcher array (necessary to create thread) init_mutex_stuff(threads, dis); //initialize mutexes and stuff //Server setup /////////////////////////////////////////////////////// if ((sd = socket (AF_INET, SOCK_STREAM, 0)) < 0) { perror("(servConn): socket() error"); _exit (-1); } if (setsockopt (sd, SOL_SOCKET, SO_REUSEADDR, (char *) &sock_opt_val, sizeof(sock_opt_val)) < 0) { perror ("(servConn): Failed to set SO_REUSEADDR on INET socket"); _exit (-1); } name.sin_family = AF_INET; name.sin_port = htons (port); name.sin_addr.s_addr = htonl(INADDR_ANY); if (bind (sd, (struct sockaddr *)&name, sizeof(name)) < 0) { perror ("(servConn): bind() error"); _exit (-1); } listen (sd, 5); //End of server Setup ////////////////////////////////////////////////// for (;;) { cli_len = sizeof (cli_name); new_sd = accept (sd, (struct sockaddr *) &cli_name, &cli_len); printf ("Assigning new socket descriptor: %d\n", new_sd); inc_clients(); //New client has come in, increment clients add_to_socket_pool(new_sd); //Add client to the pool of sockets if (new_sd < 0) { perror ("(servConn): accept() error"); _exit (-1); } } pthread_exit(NULL); //Quit } //Adds the new socket to the array designated for pthreads in the pool void add_to_socket_pool(int socket){ pthread_mutex_lock(&m_pool_busy); //Lock so that we can check main_pool_busy int i; //If not all our main pool is busy, then allocate to one of them if(main_pool_busy < worknum){ pthread_mutex_unlock(&m_pool_busy); //unlock busy, we no longer need to hold it pthread_mutex_lock(&pool_sockets); //Lock the socket pool array so that we can edit it without worry for(i = 0; i < worknum; i++){ //Find a poolsocket that is -1; then we should put the real socket there. This value will be changed back to -1 when the thread grabs the sockfd if(poolsockets[i] == -1){ poolsockets[i] = socket; pthread_mutex_unlock(&pool_sockets); //unlock our pool array, we don't need it anymore inc_busy(); //Incrememnt busy (locks the mutex itself) pthread_cond_signal(&clientin); //Signal first thread waiting on a client that a client needs to be serviced break; } } } else{ //Dynamic thread creation goes here pthread_mutex_unlock(&m_pool_busy); } } //Increments the client number. If client number goes over worknum, we must dynamically create new pthreads void inc_clients(){ pthread_mutex_lock(&numclients); clients++; pthread_mutex_unlock(&numclients); } //Increments busy void inc_busy(){ pthread_mutex_lock(&m_pool_busy); main_pool_busy++; pthread_mutex_unlock(&m_pool_busy); } //Initialize all of our mutexes at the beginning and create our pthreads void init_mutex_stuff(pthread_t * threads, pthread_t * dis){ pthread_mutex_init(&startservice, NULL); pthread_mutex_init(&numclients, NULL); pthread_mutex_init(&pool_sockets, NULL); pthread_mutex_init(&nonpool_sockets, NULL); pthread_mutex_init(&m_pool_busy, NULL); pthread_mutex_init(&circ, NULL); pthread_cond_init (&clientin, NULL); main_pool_busy = 0; poolsockets = malloc(sizeof(int)*worknum); int threadreturn; //error checking variables long i = 0; //Loop and create pthreads for(i; i < worknum; i++){ threadreturn = pthread_create(&threads[i], NULL, threadpool, (void *) i); poolsockets[i] = -1; if(threadreturn){ perror("Thread pool created unsuccessfully"); _exit(-1); } } pthread_create(&dis[0], NULL, dispatcher, NULL); } ////////////////////////////////////////////////////////////////////////////////////////// /////////Main pool routines ///////////////////////////////////////////////////////////////////////////////////////// void dec_busy(){ pthread_mutex_lock(&m_pool_busy); main_pool_busy--; pthread_mutex_unlock(&m_pool_busy); } void dec_clients(){ pthread_mutex_lock(&numclients); clients--; pthread_mutex_unlock(&numclients); } //This is what our threadpool pthreads will be running. void *threadpool(void * threadid){ long id = (long) threadid; //Id of this thread int i; int socket; int counter = 0; //Try and gain access to the next client that comes in and wait until server signals that a client as arrived while(1){ pthread_mutex_lock(&startservice); //lock start service (required for cond wait) pthread_cond_wait(&clientin, &startservice); //wait for signal from server that client exists pthread_mutex_unlock(&startservice); //unlock mutex. pthread_mutex_lock(&pool_sockets); //Lock the pool socket so we can get the socket fd unhindered/interrupted for(i = 0; i < worknum; i++){ if(poolsockets[i] != -1){ socket = poolsockets[i]; poolsockets[i] = -1; pthread_mutex_unlock(&pool_sockets); } } printf("Thread #%d is past getting the socket\n", id); int incoming = 1; while(counter < 100 && incoming != 0){ char buffer[512]; bzero(buffer,512); int startcounter = 0; incoming = read(socket, buffer, 512); if(buffer[0] != 0){ //client ID:priority:request:arguments char id[100]; long prior; char request[100]; char arg1[100]; char message[100]; char arg2[100]; char * point; point = strtok(buffer, ":"); strcpy(id, point); point = strtok(NULL, ":"); prior = atoi(point); point = strtok(NULL, ":"); strcpy(request, point); point = strtok(NULL, ":"); strcpy(arg1, point); point = strtok(NULL, ":"); if(point != NULL){ strcpy(arg2, point); } int fd; if(strcmp(request, "start_movie") == 0){ int count = 1; while(count <= 100){ char temp[10]; snprintf(temp, 50, "%d\0", count); strcpy(message, "./support/images/"); strcat(message, arg1); strcat(message, temp); strcat(message, ".ppm"); printf("This is message %s to %s\n", message, id); count++; add_to_ring(message, id, prior, counter, socket); //Adds our created message to the ring counter++; } printf("I'm out of the loop\n"); } else if(strcmp(request, "seek_movie") == 0){ int count = atoi(arg2); while(count <= 100){ char temp[10]; snprintf(temp, 10, "%d\0", count); strcpy(message, "./support/images/"); strcat(message, arg1); strcat(message, temp); strcat(message, ".ppm"); printf("This is message %s\n", message); count++; } } //create_message(id, socket, counter, buffer, message); //Creates our message from the input from the client. Stores it in buffer } else{ delete_socket_messages(socket); break; } } counter = 0; close(socket);//Zero out counter again } dec_clients(); //client serviced, decrement clients dec_busy(); //thread finished, decrement busy } //Creates a message void create_message(long threadid, int socket, int counter, char * buffer, char * message){ snprintf(message, strlen(buffer)+15, "%d:%d:%d:%s", threadid, socket, counter, buffer); } //Gets the socket from the message string (maybe I should just pass in the socket to another method) int socket_from_string(char * message){ char * substr1 = strstr(message, ":"); char * substr2 = substr1; substr2++; int occurance = strcspn(substr2, ":"); char sock[10]; strncpy(sock, substr2, occurance); return atoi(sock); } //Adds message to our ring buffer's head void add_to_head(char * message){ printf("Adding to head of ring\n"); mqueue.head->message = malloc(strlen(message)+1); //Allocate space for message strcpy(mqueue.head->message, message); //copy bytes into allocated space } //Adds our message to our ring buffer's tail void add_to_tail(char * message){ printf("Adding to tail of ring\n"); mqueue.tail->message = malloc(strlen(message)+1); //allocate space for message strcpy(mqueue.tail->message, message); //copy bytes into allocated space mqueue.tail->next = malloc(sizeof(struct message)); //allocate space for the next message struct } //Adds a message to our ring void add_to_ring(char * message, char * id, int prior, int mnum, int socket){ //printf("This is message %s:" , message); pthread_mutex_lock(&circ); //Lock the ring buffer pthread_mutex_lock(&numm); //Lock the message count (will need this to make sure we can't fill the buffer over the max slots) if(mqueue.head->message == NULL){ add_to_head(message); //Adds it to head mqueue.head->socket = socket; //Set message socket mqueue.head->priority = prior; //Set its priority (thread id) mqueue.head->mnum = mnum; //Set its message number (used for sorting) mqueue.head->id = malloc(sizeof(id)); strcpy(mqueue.head->id, id); } else if(mqueue.tail->message == NULL){ //This is the problem for dis_m 1 I'm pretty sure add_to_tail(message); mqueue.tail->socket = socket; mqueue.tail->priority = prior; mqueue.tail->mnum = mnum; mqueue.tail->id = malloc(sizeof(id)); strcpy(mqueue.tail->id, id); } else{ mqueue.tail->next = malloc(sizeof(struct message)); mqueue.tail = mqueue.tail->next; add_to_tail(message); mqueue.tail->socket = socket; mqueue.tail->priority = prior; mqueue.tail->mnum = mnum; mqueue.tail->id = malloc(sizeof(id)); strcpy(mqueue.tail->id, id); } mqueue.mcount++; pthread_mutex_unlock(&circ); if(mqueue.mcount >= dis_m){ pthread_mutex_unlock(&numm); pthread_cond_signal(&m); } else{ pthread_mutex_unlock(&numm); } printf("out of add to ring\n"); fflush(stdout); } ////////////////////////////////// //Dispatcher routines ///////////////////////////////// void *dispatcher(){ init_ring(); while(1){ pthread_mutex_lock(&slots); pthread_cond_wait(&m, &slots); pthread_mutex_lock(&numm); pthread_mutex_lock(&circ); printf("Dispatcher to the rescue!\n"); mqueue.head = reorder(mqueue.head, mqueue.tail, mqueue.mcount); //printf("This is the head %s\n", mqueue.head->message); //printf("This is the tail %s\n", mqueue.head->message); fflush(stdout); struct message * pointer = mqueue.head; int count = 0; while(mqueue.head != mqueue.tail && count < dis_m){ printf("Sending to client %s: %s\n", pointer->id, pointer->message); int fd; fd = open(pointer->message, O_RDONLY); char buf[58368]; int bytesRead; printf("This is fd %d\n", fd); bytesRead=read(fd,buf,58368); send(pointer->socket,buf,bytesRead,0); perror("Error:\n"); fflush(stdout); close(fd); mqueue.mcount--; mqueue.head = mqueue.head->next; free(pointer->message); free(pointer); pointer = mqueue.head; count++; } printf("Sending %s\n", pointer->message); int fd; fd = open(pointer->message, O_RDONLY); printf("This is fd %d\n", fd); printf("I am hhere2\n"); char buf[58368]; int bytesRead; bytesRead=read(fd,buf,58368); send(pointer->socket,buf,bytesRead,0); perror("Error:\n"); close(fd); mqueue.mcount--; if(mqueue.head != mqueue.tail){ mqueue.head = mqueue.head->next; } else{ mqueue.head->next = malloc(sizeof(struct message)); mqueue.head = mqueue.head->next; mqueue.head->next = malloc(sizeof(struct message)); mqueue.tail = mqueue.head->next; mqueue.head->message = NULL; } free(pointer->message); free(pointer); pthread_mutex_unlock(&numm); pthread_mutex_unlock(&circ); pthread_mutex_unlock(&slots); printf("My dispatcher has defeated evil\n"); } } void init_ring(){ mqueue.head = malloc(sizeof(struct message)); mqueue.head->next = malloc(sizeof(struct message)); mqueue.tail = mqueue.head->next; mqueue.mcount = 0; } struct message * reorder(struct message * begin, struct message * end, int num){ //printf("I am reordering for size %d\n", num); fflush(stdout); int i; if(num == 1){ //printf("Begin: %s\n", begin->message); begin->next = NULL; return begin; } else{ struct message * left = begin; struct message * right; int middle = num/2; for(i = 1; i < middle; i++){ left = left->next; } right = left -> next; left -> next = NULL; //printf("Begin: %s\nLeft: %s\nright: %s\nend:%s\n", begin->message, left->message, right->message, end->message); left = reorder(begin, left, middle); if(num%2 != 0){ right = reorder(right, end, middle+1); } else{ right = reorder(right, end, middle); } return merge(left, right, num); } } struct message * merge(struct message * left, struct message * right, int num){ //printf("I am merginging! left: %s %d, right: %s %dnum: %d\n", left->message,left->priority, right->message, right->priority, num); struct message * start, * point; int lenL= 0; int lenR = 0; int flagL = 0; int flagR = 0; int count = 0; int middle1 = num/2; int middle2; if(num%2 != 0){ middle2 = middle1+1; } else{ middle2 = middle1; } while(lenL < middle1 && lenR < middle2){ count++; //printf("In here for count %d\n", count); if(lenL == 0 && lenR == 0){ if(left->priority < right->priority){ start = left; //Set the start point point = left; //set our enum; left = left->next; //move the left pointer point->next = NULL; //Set the next node to NULL lenL++; } else if(left->priority > right->priority){ start = right; point = right; right = right->next; point->next = NULL; lenR++; } else{ if(left->mnum < right->mnum){ ////printf("This is where we are\n"); start = left; //Set the start point point = left; //set our enum; left = left->next; //move the left pointer point->next = NULL; //Set the next node to NULL lenL++; } else{ start = right; point = right; right = right->next; point->next = NULL; lenR++; } } } else{ if(left->priority < right->priority){ point->next = left; left = left->next; //move the left pointer point = point->next; point->next = NULL; //Set the next node to NULL lenL++; } else if(left->priority > right->priority){ point->next = right; right = right->next; point = point->next; point->next = NULL; lenR++; } else{ if(left->mnum < right->mnum){ point->next = left; //set our enum; left = left->next; point = point->next;//move the left pointer point->next = NULL; //Set the next node to NULL lenL++; } else{ point->next = right; right = right->next; point = point->next; point->next = NULL; lenR++; } } } if(lenL == middle1){ flagL = 1; break; } if(lenR == middle2){ flagR = 1; break; } } if(flagL == 1){ point->next = right; point = point->next; for(lenR; lenR< middle2-1; lenR++){ point = point->next; } point->next = NULL; mqueue.tail = point; } else{ point->next = left; point = point->next; for(lenL; lenL< middle1-1; lenL++){ point = point->next; } point->next = NULL; mqueue.tail = point; } //printf("This is the start %s\n", start->message); //printf("This is mqueue.tail %s\n", mqueue.tail->message); return start; } void delete_socket_messages(int a){ }

Read the article

Search Results

Search found 1695 results on 68 pages for 'drew gain'.

Page 68/68 | < Previous Page | 64 65 66 67 68

- by James Michael Hare

- by SubinDaniVarughese

- by ErikE

- by Emtucifor

- by rajesh ahuja

- by Ken

- by Georgek

- by Kooper

- by Rick Strahl

- by Rick Strahl

- by FransBouma

- by Michael Williamson

- by Paul White

- by Ralf Westphal

- by Alex Dunlop

- by SoleSoft

- by Ralf Westphal

- by rchrd

- by David Miler

- by Magn3s1um

< Previous Page | 64 65 66 67 68