Search Results

Search found 890 results on 36 pages for 'technorati'.

Page 36/36 | < Previous Page | 32 33 34 35 36 

  • C#/.NET Little Wonders: Constraining Generics with Where Clause

    - by James Michael Hare
    Back when I was primarily a C++ developer, I loved C++ templates.  The power of writing very reusable generic classes brought the art of programming to a brand new level.  Unfortunately, when .NET 1.0 came about, they didn’t have a template equivalent.  With .NET 2.0 however, we finally got generics, which once again let us spread our wings and program more generically in the world of .NET However, C# generics behave in some ways very differently from their C++ template cousins.  There is a handy clause, however, that helps you navigate these waters to make your generics more powerful. The Problem – C# Assumes Lowest Common Denominator In C++, you can create a template and do nearly anything syntactically possible on the template parameter, and C++ will not check if the method/fields/operations invoked are valid until you declare a realization of the type.  Let me illustrate with a C++ example: 1: // compiles fine, C++ makes no assumptions as to T 2: template <typename T> 3: class ReverseComparer 4: { 5: public: 6: int Compare(const T& lhs, const T& rhs) 7: { 8: return rhs.CompareTo(lhs); 9: } 10: }; Notice that we are invoking a method CompareTo() off of template type T.  Because we don’t know at this point what type T is, C++ makes no assumptions and there are no errors. C++ tends to take the path of not checking the template type usage until the method is actually invoked with a specific type, which differs from the behavior of C#: 1: // this will NOT compile! C# assumes lowest common denominator. 2: public class ReverseComparer<T> 3: { 4: public int Compare(T lhs, T rhs) 5: { 6: return lhs.CompareTo(rhs); 7: } 8: } So why does C# give us a compiler error even when we don’t yet know what type T is?  This is because C# took a different path in how they made generics.  Unless you specify otherwise, for the purposes of the code inside the generic method, T is basically treated like an object (notice I didn’t say T is an object). That means that any operations, fields, methods, properties, etc that you attempt to use of type T must be available at the lowest common denominator type: object.  Now, while object has the broadest applicability, it also has the fewest specific.  So how do we allow our generic type placeholder to do things more than just what object can do? Solution: Constraint the Type With Where Clause So how do we get around this in C#?  The answer is to constrain the generic type placeholder with the where clause.  Basically, the where clause allows you to specify additional constraints on what the actual type used to fill the generic type placeholder must support. You might think that narrowing the scope of a generic means a weaker generic.  In reality, though it limits the number of types that can be used with the generic, it also gives the generic more power to deal with those types.  In effect these constraints says that if the type meets the given constraint, you can perform the activities that pertain to that constraint with the generic placeholders. Constraining Generic Type to Interface or Superclass One of the handiest where clause constraints is the ability to specify the type generic type must implement a certain interface or be inherited from a certain base class. For example, you can’t call CompareTo() in our first C# generic without constraints, but if we constrain T to IComparable<T>, we can: 1: public class ReverseComparer<T> 2: where T : IComparable<T> 3: { 4: public int Compare(T lhs, T rhs) 5: { 6: return lhs.CompareTo(rhs); 7: } 8: } Now that we’ve constrained T to an implementation of IComparable<T>, this means that our variables of generic type T may now call any members specified in IComparable<T> as well.  This means that the call to CompareTo() is now legal. If you constrain your type, also, you will get compiler warnings if you attempt to use a type that doesn’t meet the constraint.  This is much better than the syntax error you would get within C++ template code itself when you used a type not supported by a C++ template. Constraining Generic Type to Only Reference Types Sometimes, you want to assign an instance of a generic type to null, but you can’t do this without constraints, because you have no guarantee that the type used to realize the generic is not a value type, where null is meaningless. Well, we can fix this by specifying the class constraint in the where clause.  By declaring that a generic type must be a class, we are saying that it is a reference type, and this allows us to assign null to instances of that type: 1: public static class ObjectExtensions 2: { 3: public static TOut Maybe<TIn, TOut>(this TIn value, Func<TIn, TOut> accessor) 4: where TOut : class 5: where TIn : class 6: { 7: return (value != null) ? accessor(value) : null; 8: } 9: } In the example above, we want to be able to access a property off of a reference, and if that reference is null, pass the null on down the line.  To do this, both the input type and the output type must be reference types (yes, nullable value types could also be considered applicable at a logical level, but there’s not a direct constraint for those). Constraining Generic Type to only Value Types Similarly to constraining a generic type to be a reference type, you can also constrain a generic type to be a value type.  To do this you use the struct constraint which specifies that the generic type must be a value type (primitive, struct, enum, etc). Consider the following method, that will convert anything that is IConvertible (int, double, string, etc) to the value type you specify, or null if the instance is null. 1: public static T? ConvertToNullable<T>(IConvertible value) 2: where T : struct 3: { 4: T? result = null; 5:  6: if (value != null) 7: { 8: result = (T)Convert.ChangeType(value, typeof(T)); 9: } 10:  11: return result; 12: } Because T was constrained to be a value type, we can use T? (System.Nullable<T>) where we could not do this if T was a reference type. Constraining Generic Type to Require Default Constructor You can also constrain a type to require existence of a default constructor.  Because by default C# doesn’t know what constructors a generic type placeholder does or does not have available, it can’t typically allow you to call one.  That said, if you give it the new() constraint, it will mean that the type used to realize the generic type must have a default (no argument) constructor. Let’s assume you have a generic adapter class that, given some mappings, will adapt an item from type TFrom to type TTo.  Because it must create a new instance of type TTo in the process, we need to specify that TTo has a default constructor: 1: // Given a set of Action<TFrom,TTo> mappings will map TFrom to TTo 2: public class Adapter<TFrom, TTo> : IEnumerable<Action<TFrom, TTo>> 3: where TTo : class, new() 4: { 5: // The list of translations from TFrom to TTo 6: public List<Action<TFrom, TTo>> Translations { get; private set; } 7:  8: // Construct with empty translation and reverse translation sets. 9: public Adapter() 10: { 11: // did this instead of auto-properties to allow simple use of initializers 12: Translations = new List<Action<TFrom, TTo>>(); 13: } 14:  15: // Add a translator to the collection, useful for initializer list 16: public void Add(Action<TFrom, TTo> translation) 17: { 18: Translations.Add(translation); 19: } 20:  21: // Add a translator that first checks a predicate to determine if the translation 22: // should be performed, then translates if the predicate returns true 23: public void Add(Predicate<TFrom> conditional, Action<TFrom, TTo> translation) 24: { 25: Translations.Add((from, to) => 26: { 27: if (conditional(from)) 28: { 29: translation(from, to); 30: } 31: }); 32: } 33:  34: // Translates an object forward from TFrom object to TTo object. 35: public TTo Adapt(TFrom sourceObject) 36: { 37: var resultObject = new TTo(); 38:  39: // Process each translation 40: Translations.ForEach(t => t(sourceObject, resultObject)); 41:  42: return resultObject; 43: } 44:  45: // Returns an enumerator that iterates through the collection. 46: public IEnumerator<Action<TFrom, TTo>> GetEnumerator() 47: { 48: return Translations.GetEnumerator(); 49: } 50:  51: // Returns an enumerator that iterates through a collection. 52: IEnumerator IEnumerable.GetEnumerator() 53: { 54: return GetEnumerator(); 55: } 56: } Notice, however, you can’t specify any other constructor, you can only specify that the type has a default (no argument) constructor. Summary The where clause is an excellent tool that gives your .NET generics even more power to perform tasks higher than just the base "object level" behavior.  There are a few things you cannot specify with constraints (currently) though: Cannot specify the generic type must be an enum. Cannot specify the generic type must have a certain property or method without specifying a base class or interface – that is, you can’t say that the generic must have a Start() method. Cannot specify that the generic type allows arithmetic operations. Cannot specify that the generic type requires a specific non-default constructor. In addition, you cannot overload a template definition with different, opposing constraints.  For example you can’t define a Adapter<T> where T : struct and Adapter<T> where T : class.  Hopefully, in the future we will get some of these things to make the where clause even more useful, but until then what we have is extremely valuable in making our generics more user friendly and more powerful!   Technorati Tags: C#,.NET,Little Wonders,BlackRabbitCoder,where,generics

    Read the article

  • Xml Serialization and the [Obsolete] Attribute

    - by PSteele
    I learned something new today: Starting with .NET 3.5, the XmlSerializer no longer serializes properties that are marked with the Obsolete attribute.  I can’t say that I really agree with this.  Marking something Obsolete is supposed to be something for a developer to deal with in source code.  Once an object is serialized to XML, it becomes data.  I think using the Obsolete attribute as both a compiler flag as well as controlling XML serialization is a bad idea. In this post, I’ll show you how I ran into this and how I got around it. The Setup Let’s start with some make-believe code to demonstrate the issue.  We have a simple data class for storing some information.  We use XML serialization to read and write the data: public class MyData { public int Age { get; set; } public string FirstName { get; set; } public string LastName { get; set; } public List<String> Hobbies { get; set; }   public MyData() { this.Hobbies = new List<string>(); } } Now a few simple lines of code to serialize it to XML: static void Main(string[] args) { var data = new MyData {    FirstName = "Zachary", LastName = "Smith", Age = 50, Hobbies = {"Mischief", "Sabotage"}, }; var serializer = new XmlSerializer(typeof (MyData)); serializer.Serialize(Console.Out, data); Console.ReadKey(); } And this is what we see on the console: <?xml version="1.0" encoding="IBM437"?> <MyData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <Age>50</Age> <FirstName>Zachary</FirstName> <LastName>Smith</LastName> <Hobbies> <string>Mischief</string> <string>Sabotage</string> </Hobbies> </MyData>   The Change So we decided to track the hobbies as a list of strings.  As always, things change and we have more information we need to store per-hobby.  We create a custom “Hobby” object, add a List<Hobby> to our MyData class and we obsolete the old “Hobbies” list to let developers know they shouldn’t use it going forward: public class Hobby { public string Name { get; set; } public int Frequency { get; set; } public int TimesCaught { get; set; }   public override string ToString() { return this.Name; } } public class MyData { public int Age { get; set; } public string FirstName { get; set; } public string LastName { get; set; } [Obsolete("Use HobbyData collection instead.")] public List<String> Hobbies { get; set; } public List<Hobby> HobbyData { get; set; }   public MyData() { this.Hobbies = new List<string>(); this.HobbyData = new List<Hobby>(); } } Here’s the kicker: This serialization is done in another application.  The consumers of the XML will be older clients (clients that expect only a “Hobbies” collection) as well as newer clients (that support the new “HobbyData” collection).  This really shouldn’t be a problem – the obsolete attribute is metadata for .NET compilers.  Unfortunately, the XmlSerializer also looks at the compiler attribute to determine what items to serialize/deserialize.  Here’s an example of our problem: static void Main(string[] args) { var xml = @"<?xml version=""1.0"" encoding=""IBM437""?> <MyData xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema""> <Age>50</Age> <FirstName>Zachary</FirstName> <LastName>Smith</LastName> <Hobbies> <string>Mischief</string> <string>Sabotage</string> </Hobbies> </MyData>"; var serializer = new XmlSerializer(typeof(MyData)); var stream = new StringReader(xml); var data = (MyData) serializer.Deserialize(stream);   if( data.Hobbies.Count != 2) { throw new ApplicationException("Hobbies did not deserialize properly"); } } If you run the code above, you’ll hit the exception.  Even though the XML contains a “<Hobbies>” node, the obsolete attribute prevents the node from being processed.  This will break old clients that use the new library, but don’t yet access the HobbyData collection. The Fix This fix (in this case), isn’t too painful.  The XmlSerializer exposes events for times when it runs into items (Elements, Attributes, Nodes, etc…) it doesn’t know what to do with.  We can hook in to those events and check and see if we’re getting something that we want to support (like our “Hobbies” node). Here’s a way to read in the old XML data with full support of the new data structure (and keeping the Hobbies collection marked as obsolete): static void Main(string[] args) { var xml = @"<?xml version=""1.0"" encoding=""IBM437""?> <MyData xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema""> <Age>50</Age> <FirstName>Zachary</FirstName> <LastName>Smith</LastName> <Hobbies> <string>Mischief</string> <string>Sabotage</string> </Hobbies> </MyData>"; var serializer = new XmlSerializer(typeof(MyData)); serializer.UnknownElement += serializer_UnknownElement; var stream = new StringReader(xml); var data = (MyData)serializer.Deserialize(stream);   if (data.Hobbies.Count != 2) { throw new ApplicationException("Hobbies did not deserialize properly"); } }   static void serializer_UnknownElement(object sender, XmlElementEventArgs e) { if( e.Element.Name != "Hobbies") { return; }   var target = (MyData) e.ObjectBeingDeserialized; foreach(XmlElement hobby in e.Element.ChildNodes) { target.Hobbies.Add(hobby.InnerText); target.HobbyData.Add(new Hobby{Name = hobby.InnerText}); } } As you can see, we hook in to the “UnknownElement” event.  Once we determine it’s our “Hobbies” node, we deserialize it ourselves – as well as populating the new HobbyData collection.  In this case, we have a fairly simple solution to a small change in XML layout.  If you make more extensive changes, it would probably be easier to do some custom serialization to support older data. A sample project with all of this code is available from my repository on bitbucket. Technorati Tags: XmlSerializer,Obsolete,.NET

    Read the article

  • The Product Owner

    - by Robert May
    In a previous post, I outlined the rules of Scrum.  This post details one of those rules. Picking a most important part of Scrum is difficult.  All of the rules are required, but if there were one rule that is “more” required that every other rule, its having a good Product Owner.  Simply put, the Product Owner can make or break the project. Duties of the Product Owner A Product Owner has many duties and responsibilities.  I’ll talk about each of these duties in detail below. A Product Owner: Discovers and records stories for the backlog. Prioritizes stories in the Product Backlog, Release Backlog and Iteration Backlog. Determines Release dates and Iteration Dates. Develops story details and helps the team understand those details. Helps QA to develop acceptance tests. Interact with the Customer to make sure that the product is meeting the customer’s needs. Discovers and Records Stories for the Backlog When I do Scrum, I always use User Stories as the means for capturing functionality that’s required in the system.  Some people will use Use Cases, but the same rule applies.  The Product Owner has the ultimate responsibility for figuring out what functionality will be in the system.  Many different mechanisms for capturing this input can be used.  User interviews are great, but all sources should be considered, including talking with Customer Support types.  Often, they hear what users are struggling with the most and are a great source for stories that can make the application easier to use. Care should be taken when soliciting user stories from technical types such as programmers and the people that manage them.  They will almost always give stories that are very technical in nature and may not have a direct benefit for the end user.  Stories are about adding value to the company.  If the stories don’t have direct benefit to the end user, the Product Owner should question whether or not the story should be implemented.  In general, technical stories should be included as tasks in User Stories.  Technical stories are often needed, but the ultimate value to the user is in user based functionality, so technical stories should be considered nothing more than overhead in providing that user functionality. Until the iteration prior to development, stories should be nothing more than short, one line placeholders. An exercise called Story Planning can be used to brainstorm and come up with stories.  I’ll save the description of this activity for another blog post. For more information on User Stories, please read the book User Stories Applied by Mike Cohn. Prioritizes Stories in the Product Backlog, Release Backlog and Iteration Backlog Prioritization of stories is one of the most difficult tasks that a Product Owner must do.  A key concept of Scrum done right is the need to have the team working from a single set of prioritized stories.  If the team does not have a single set of prioritized stories, Scrum will likely fail at your organization.  The Product Owner is the ONLY person who has the responsibility to prioritize that list.  The Product Owner must be very diplomatic and sincerely listen to the people around him so that he can get the priorities correct. Just listening will still not yield the proper priorities.  Care must also be taken to ensure that Return on Investment is also considered.  Ultimately, determining which stories give the most value to the company for the least cost is the most important factor in determining priorities.  Product Owners should be willing to look at cold, hard numbers to determine the order for stories.  Even when many people want a feature, if that features is costly to develop, it may not have as high of a return on investment as features that are cheaper, but not as popular. The act of prioritization often causes conflict in an environment.  Customer Service thinks that feature X is the most important, because it will stop people from calling.  Operations thinks that feature Y is the most important, because it will stop servers from crashing.  Developers think that feature Z is most important because it will make writing software much easier for them.  All of these are useful goals, but the team can have only one list of items, and each item must have a priority that is different from all other stories.  The Product Owner will determine which feature gives the best return on investment and the other features will have to wait their turn, which means that someone will not have their top priority feature implemented first. A weak Product Owner will refuse to do prioritization.  I’ve heard from multiple Product Owners the following phrase, “Well, it’s all got to be done, so what does it matter what order we do it in?”  If your product owner is using this phrase, you need a new Product Owner.  Order is VERY important.  In Scrum, every release is potentially shippable.  If the wrong priority items are developed, then the value added in each release isn’t what it should be.  Additionally, the Product Owner with this mindset doesn’t understand Agile.  A product is NEVER finished, until the company has decided that it is no longer a going concern and they are no longer going to sell the product.  Therefore, prioritization isn’t an event, its something that continues every day.  The logical extension of the phrase “It’s all got to be done” is that you will never ship your product, since a product is never “done.”  Once stories have been prioritized, assigning them to the Release Backlog and the Iteration Backlog becomes relatively simple.  The top priority items are copied into the respective backlogs in order and the task is complete.  The team does have the right to shuffle things around a little in the iteration backlog.  For example, they may determine that working on story C with story A is appropriate because they’re related, even though story B is technically a higher priority than story C.  Or they may decide that story B is too big to complete in the time available after Story A has tasks created, so they’ll work on Story C since it’s smaller.  They can’t, however, go deep into the backlog to pick stories to implement.  The team and the Product Owner should work together to determine what’s best for the company. Prioritization is time consuming, but its one of the most important things a Product Owner does. Determines Release Dates and Iteration Dates Product owners are responsible for determining release dates for a product.  A common misconception that Product Owners have is that every “release” needs to correspond with an actual release to customers.  This is not the case.  In general, releases should be no more than 3 months long.  You  may decide to release the product to the customers, and many companies do release the product to customers, but it may also be an internal release. If a release date is too far away, developers will fall into the trap of not feeling a sense of urgency.  The date is far enough away that they don’t need to give the release their full attention.  Additionally, important tasks, such as performance tuning, regression testing, user documentation, and release preparation, will not happen regularly, making them much more difficult and time consuming to do.  The more frequently you do these tasks, the easier they are to accomplish. The Product Owner will be a key participant in determining whether or not a release should be sent out to the customers.  The determination should be made on whether or not the features contained in the release are valuable enough  and complete enough that the customers will see real value in the release.  Often, some features will take more than three months to get them to a state where they qualify for a release or need additional supporting features to be released.  The product owner has the right to make this determination. In addition to release dates, the Product Owner also will help determine iteration dates.  In general, an iteration length should be chosen and the team should follow that iteration length for an extended period of time.  If the iteration length is changed every iteration, you’re not doing Scrum.  Iteration lengths help the team and company get into a rhythm of developing quality software.  Iterations should be somewhere between 2 and 4 weeks in length.  Any shorter, and significant software will likely not be developed.  Any longer, and the team won’t feel urgency and planning will become very difficult. Iterations may not be extended during the iteration.  Companies where Scrum isn’t really followed will often use this as a strategy to complete all stories.  They don’t want to face the harsh reality of what their true performance is, and looking good is more important than seeking visibility and improving the process and team.  Companies like this typically don’t allow failure.  This is unhealthy.  Failure is part of life and unless we learn from it, we can’t improve.  I would much rather see a team push out stories to the next iteration and then have healthy discussions about why they failed rather than extend the iteration and not deal with the core problems. If iteration length varies, retrospectives become more difficult.  For example, evaluating the performance of the team’s estimation efforts becomes much more difficult if the iteration length varies.  Also, the team must have a velocity measurement.  If the iteration length varies, measuring velocity becomes impossible and upper management no longer will have the ability to evaluate the teams performance.  People external to the team will no longer have the ability to determine when key features are likely to be developed.  Variable iterations cause the entire company to fail and likely cause Scrum to fail at an organization. Develops Story Details and Helps the Team Understand Those Details A key concept in Scrum is that the stories are nothing more than a placeholder for a conversation.  Stories should be nothing more than short, one line statements about the functionality.  The team will then converse with the Product Owner about the details about that story.  The product owner needs to have a very good idea about what the details of the story are and needs to be able to help the team understand those details. Too often, we see this requirement as being translated into the need for comprehensive documentation about the story, including old fashioned requirements documentation.  The team should only develop the documentation that is required and should not develop documentation that is only created because their is a process to do so. In general, what we see that works best is the iteration before a team starts development work on a story, the Product Owner, with other appropriate business analysts, will develop the details of that story.  They’ll figure out what business rules are required, potentially make paper prototypes or other light weight mock-ups, and they seek to understand the story and what is implied.  Note that the time allowed for this task is deliberately short.  The Product Owner only has a single iteration to develop all of the stories for the next iteration. If more than one iteration is used, I’ve found that teams will end up with Big Design Up Front and traditional requirements documents.  This is a waste of time, since the team will need to then have discussions with the Product Owner to figure out what the requirements document says.  Instead of this, skip making the pretty pictures and detailing the nuances of the requirements and build only what is minimally needed by the team to do development.  If something comes up during development, you can address it at that time and figure out what you want to do.  The goal is to keep things as light weight as possible so that everyone can move as quickly as possible. Helps QA to Develop Acceptance Tests In Scrum, no story can be counted until it is accepted by QA.  Because of this, acceptance tests are very important to the team.  In general, acceptance tests need to be developed prior to the iteration or at the very beginning of the iteration so that the team can make sure that the tasks that they develop will fulfill the acceptance criteria. The Product Owner will help the team, including QA, understand what will make the story acceptable.  Note that the Product Owner needs to be careful about specifying that the feature will work “Perfectly” at the end of the iteration.  In general, features are developed a little bit at a time, so only the bit that is being developed should be considered as necessary for acceptance. A weak Product Owner will make statements like “Do it right the first time.”  Not only are these statements damaging to the team (like they would try to do it WRONG the first time . . .), they’re also ignoring the iterative nature of Scrum.  Additionally, a weak product owner will seek to add scope in the acceptance testing.  For example, they will refuse to determine acceptance at the beginning of the iteration, and then, after the team has planned and committed to the iteration, they will expand scope by defining acceptance.  This often causes the team to miss the iteration because scope that wasn’t planned on is included.  There are ways that the team can mitigate this problem.  For example, include extra “Product Owner” time to deal with the uncertainty that you know will be introduced by the Product Owner.  This will slow the perceived velocity of the team and is not ideal, since they’ll be doing more work than they get credit for. Interact with the Customer to Make Sure that the Product is Meeting the Customer’s Needs Once development is complete, what the team has worked on should be put in front of real live people to see if it meets the needs of the customer.  One of the great things about Agile is that if something doesn’t work, we can revisit it in a future iteration!  This frees up the team to make the best decision now and know that if that decision proves to be incorrect, the team can revisit it and change that decision. Features are about adding value to the customer, so if the customer doesn’t find them useful, then having the team make tweaks is valuable.  In general, most software will be 80 to 90 percent “right” after the initial round and only minor tweaks are required.  If proper coding standards are followed, these tweaks are usually minor and easy to accomplish.  Product Owners that are doing a good job will encourage real users to see and use the software, since they know that they are trying to add value to the customer. Poor product owners will think that they know the answers already, that their customers are silly and do stupid things and that they don’t need customer input.  If you have a product owner that is afraid to show the team’s work to real customers, you probably need a different product owner. Up Next, “Who Makes a Good Product Owner.” Followed by, “Messing with the Team.” Technorati Tags: Scrum,Product Owner

    Read the article

  • C#/.NET Little Wonders: The Joy of Anonymous Types

    - by James Michael Hare
    Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. In the .NET 3 Framework, Microsoft introduced the concept of anonymous types, which provide a way to create a quick, compiler-generated types at the point of instantiation.  These may seem trivial, but are very handy for concisely creating lightweight, strongly-typed objects containing only read-only properties that can be used within a given scope. Creating an Anonymous Type In short, an anonymous type is a reference type that derives directly from object and is defined by its set of properties base on their names, number, types, and order given at initialization.  In addition to just holding these properties, it is also given appropriate overridden implementations for Equals() and GetHashCode() that take into account all of the properties to correctly perform property comparisons and hashing.  Also overridden is an implementation of ToString() which makes it easy to display the contents of an anonymous type instance in a fairly concise manner. To construct an anonymous type instance, you use basically the same initialization syntax as with a regular type.  So, for example, if we wanted to create an anonymous type to represent a particular point, we could do this: 1: var point = new { X = 13, Y = 7 }; Note the similarity between anonymous type initialization and regular initialization.  The main difference is that the compiler generates the type name and the properties (as readonly) based on the names and order provided, and inferring their types from the expressions they are assigned to. It is key to remember that all of those factors (number, names, types, order of properties) determine the anonymous type.  This is important, because while these two instances share the same anonymous type: 1: // same names, types, and order 2: var point1 = new { X = 13, Y = 7 }; 3: var point2 = new { X = 5, Y = 0 }; These similar ones do not: 1: var point3 = new { Y = 3, X = 5 }; // different order 2: var point4 = new { X = 3, Y = 5.0 }; // different type for Y 3: var point5 = new {MyX = 3, MyY = 5 }; // different names 4: var point6 = new { X = 1, Y = 2, Z = 3 }; // different count Limitations on Property Initialization Expressions The expression for a property in an anonymous type initialization cannot be null (though it can evaluate to null) or an anonymous function.  For example, the following are illegal: 1: // Null can't be used directly. Null reference of what type? 2: var cantUseNull = new { Value = null }; 3:  4: // Anonymous methods cannot be used. 5: var cantUseAnonymousFxn = new { Value = () => Console.WriteLine(“Can’t.”) }; Note that the restriction on null is just that you can’t use it directly as the expression, because otherwise how would it be able to determine the type?  You can, however, use it indirectly assigning a null expression such as a typed variable with the value null, or by casting null to a specific type: 1: string str = null; 2: var fineIndirectly = new { Value = str }; 3: var fineCast = new { Value = (string)null }; All of the examples above name the properties explicitly, but you can also implicitly name properties if they are being set from a property, field, or variable.  In these cases, when a field, property, or variable is used alone, and you don’t specify a property name assigned to it, the new property will have the same name.  For example: 1: int variable = 42; 2:  3: // creates two properties named varriable and Now 4: var implicitProperties = new { variable, DateTime.Now }; Is the same type as: 1: var explicitProperties = new { variable = variable, Now = DateTime.Now }; But this only works if you are using an existing field, variable, or property directly as the expression.  If you use a more complex expression then the name cannot be inferred: 1: // can't infer the name variable from variable * 2, must name explicitly 2: var wontWork = new { variable * 2, DateTime.Now }; In the example above, since we typed variable * 2, it is no longer just a variable and thus we would have to assign the property a name explicitly. ToString() on Anonymous Types One of the more trivial overrides that an anonymous type provides you is a ToString() method that prints the value of the anonymous type instance in much the same format as it was initialized (except actual values instead of expressions as appropriate of course). For example, if you had: 1: var point = new { X = 13, Y = 42 }; And then print it out: 1: Console.WriteLine(point.ToString()); You will get: 1: { X = 13, Y = 42 } While this isn’t necessarily the most stunning feature of anonymous types, it can be handy for debugging or logging values in a fairly easy to read format. Comparing Anonymous Type Instances Because anonymous types automatically create appropriate overrides of Equals() and GetHashCode() based on the underlying properties, we can reliably compare two instances or get hash codes.  For example, if we had the following 3 points: 1: var point1 = new { X = 1, Y = 2 }; 2: var point2 = new { X = 1, Y = 2 }; 3: var point3 = new { Y = 2, X = 1 }; If we compare point1 and point2 we’ll see that Equals() returns true because they overridden version of Equals() sees that the types are the same (same number, names, types, and order of properties) and that the values are the same.   In addition, because all equal objects should have the same hash code, we’ll see that the hash codes evaluate to the same as well: 1: // true, same type, same values 2: Console.WriteLine(point1.Equals(point2)); 3:  4: // true, equal anonymous type instances always have same hash code 5: Console.WriteLine(point1.GetHashCode() == point2.GetHashCode()); However, if we compare point2 and point3 we get false.  Even though the names, types, and values of the properties are the same, the order is not, thus they are two different types and cannot be compared (and thus return false).  And, since they are not equal objects (even though they have the same value) there is a good chance their hash codes are different as well (though not guaranteed): 1: // false, different types 2: Console.WriteLine(point2.Equals(point3)); 3:  4: // quite possibly false (was false on my machine) 5: Console.WriteLine(point2.GetHashCode() == point3.GetHashCode()); Using Anonymous Types Now that we’ve created instances of anonymous types, let’s actually use them.  The property names (whether implicit or explicit) are used to access the individual properties of the anonymous type.  The main thing, once again, to keep in mind is that the properties are readonly, so you cannot assign the properties a new value (note: this does not mean that instances referred to by a property are immutable – for more information check out C#/.NET Fundamentals: Returning Data Immutably in a Mutable World). Thus, if we have the following anonymous type instance: 1: var point = new { X = 13, Y = 42 }; We can get the properties as you’d expect: 1: Console.WriteLine(“The point is: ({0},{1})”, point.X, point.Y); But we cannot alter the property values: 1: // compiler error, properties are readonly 2: point.X = 99; Further, since the anonymous type name is only known by the compiler, there is no easy way to pass anonymous type instances outside of a given scope.  The only real choices are to pass them as object or dynamic.  But really that is not the intention of using anonymous types.  If you find yourself needing to pass an anonymous type outside of a given scope, you should really consider making a POCO (Plain Old CLR Type – i.e. a class that contains just properties to hold data with little/no business logic) instead. Given that, why use them at all?  Couldn’t you always just create a POCO to represent every anonymous type you needed?  Sure you could, but then you might litter your solution with many small POCO classes that have very localized uses. It turns out this is the key to when to use anonymous types to your advantage: when you just need a lightweight type in a local context to store intermediate results, consider an anonymous type – but when that result is more long-lived and used outside of the current scope, consider a POCO instead. So what do we mean by intermediate results in a local context?  Well, a classic example would be filtering down results from a LINQ expression.  For example, let’s say we had a List<Transaction>, where Transaction is defined something like: 1: public class Transaction 2: { 3: public string UserId { get; set; } 4: public DateTime At { get; set; } 5: public decimal Amount { get; set; } 6: // … 7: } And let’s say we had this data in our List<Transaction>: 1: var transactions = new List<Transaction> 2: { 3: new Transaction { UserId = "Jim", At = DateTime.Now, Amount = 2200.00m }, 4: new Transaction { UserId = "Jim", At = DateTime.Now, Amount = -1100.00m }, 5: new Transaction { UserId = "Jim", At = DateTime.Now.AddDays(-1), Amount = 900.00m }, 6: new Transaction { UserId = "John", At = DateTime.Now.AddDays(-2), Amount = 300.00m }, 7: new Transaction { UserId = "John", At = DateTime.Now, Amount = -10.00m }, 8: new Transaction { UserId = "Jane", At = DateTime.Now, Amount = 200.00m }, 9: new Transaction { UserId = "Jane", At = DateTime.Now, Amount = -50.00m }, 10: new Transaction { UserId = "Jaime", At = DateTime.Now.AddDays(-3), Amount = -100.00m }, 11: new Transaction { UserId = "Jaime", At = DateTime.Now.AddDays(-3), Amount = 300.00m }, 12: }; So let’s say we wanted to get the transactions for each day for each user.  That is, for each day we’d want to see the transactions each user performed.  We could do this very simply with a nice LINQ expression, without the need of creating any POCOs: 1: // group the transactions based on an anonymous type with properties UserId and Date: 2: byUserAndDay = transactions 3: .GroupBy(tx => new { tx.UserId, tx.At.Date }) 4: .OrderBy(grp => grp.Key.Date) 5: .ThenBy(grp => grp.Key.UserId); Now, those of you who have attempted to use custom classes as a grouping type before (such as GroupBy(), Distinct(), etc.) may have discovered the hard way that LINQ gets a lot of its speed by utilizing not on Equals(), but also GetHashCode() on the type you are grouping by.  Thus, when you use custom types for these purposes, you generally end up having to write custom Equals() and GetHashCode() implementations or you won’t get the results you were expecting (the default implementations of Equals() and GetHashCode() are reference equality and reference identity based respectively). As we said before, it turns out that anonymous types already do these critical overrides for you.  This makes them even more convenient to use!  Instead of creating a small POCO to handle this grouping, and then having to implement a custom Equals() and GetHashCode() every time, we can just take advantage of the fact that anonymous types automatically override these methods with appropriate implementations that take into account the values of all of the properties. Now, we can look at our results: 1: foreach (var group in byUserAndDay) 2: { 3: // the group’s Key is an instance of our anonymous type 4: Console.WriteLine("{0} on {1:MM/dd/yyyy} did:", group.Key.UserId, group.Key.Date); 5:  6: // each grouping contains a sequence of the items. 7: foreach (var tx in group) 8: { 9: Console.WriteLine("\t{0}", tx.Amount); 10: } 11: } And see: 1: Jaime on 06/18/2012 did: 2: -100.00 3: 300.00 4:  5: John on 06/19/2012 did: 6: 300.00 7:  8: Jim on 06/20/2012 did: 9: 900.00 10:  11: Jane on 06/21/2012 did: 12: 200.00 13: -50.00 14:  15: Jim on 06/21/2012 did: 16: 2200.00 17: -1100.00 18:  19: John on 06/21/2012 did: 20: -10.00 Again, sure we could have just built a POCO to do this, given it an appropriate Equals() and GetHashCode() method, but that would have bloated our code with so many extra lines and been more difficult to maintain if the properties change.  Summary Anonymous types are one of those Little Wonders of the .NET language that are perfect at exactly that time when you need a temporary type to hold a set of properties together for an intermediate result.  While they are not very useful beyond the scope in which they are defined, they are excellent in LINQ expressions as a way to create and us intermediary values for further expressions and analysis. Anonymous types are defined by the compiler based on the number, type, names, and order of properties created, and they automatically implement appropriate Equals() and GetHashCode() overrides (as well as ToString()) which makes them ideal for LINQ expressions where you need to create a set of properties to group, evaluate, etc. Technorati Tags: C#,CSharp,.NET,Little Wonders,Anonymous Types,LINQ

    Read the article

  • C#/.NET Fundamentals: Choosing the Right Collection Class

    - by James Michael Hare
    The .NET Base Class Library (BCL) has a wide array of collection classes at your disposal which make it easy to manage collections of objects. While it's great to have so many classes available, it can be daunting to choose the right collection to use for any given situation. As hard as it may be, choosing the right collection can be absolutely key to the performance and maintainability of your application! This post will look at breaking down any confusion between each collection and the situations in which they excel. We will be spending most of our time looking at the System.Collections.Generic namespace, which is the recommended set of collections. The Generic Collections: System.Collections.Generic namespace The generic collections were introduced in .NET 2.0 in the System.Collections.Generic namespace. This is the main body of collections you should tend to focus on first, as they will tend to suit 99% of your needs right up front. It is important to note that the generic collections are unsynchronized. This decision was made for performance reasons because depending on how you are using the collections its completely possible that synchronization may not be required or may be needed on a higher level than simple method-level synchronization. Furthermore, concurrent read access (all writes done at beginning and never again) is always safe, but for concurrent mixed access you should either synchronize the collection or use one of the concurrent collections. So let's look at each of the collections in turn and its various pros and cons, at the end we'll summarize with a table to help make it easier to compare and contrast the different collections. The Associative Collection Classes Associative collections store a value in the collection by providing a key that is used to add/remove/lookup the item. Hence, the container associates the value with the key. These collections are most useful when you need to lookup/manipulate a collection using a key value. For example, if you wanted to look up an order in a collection of orders by an order id, you might have an associative collection where they key is the order id and the value is the order. The Dictionary<TKey,TVale> is probably the most used associative container class. The Dictionary<TKey,TValue> is the fastest class for associative lookups/inserts/deletes because it uses a hash table under the covers. Because the keys are hashed, the key type should correctly implement GetHashCode() and Equals() appropriately or you should provide an external IEqualityComparer to the dictionary on construction. The insert/delete/lookup time of items in the dictionary is amortized constant time - O(1) - which means no matter how big the dictionary gets, the time it takes to find something remains relatively constant. This is highly desirable for high-speed lookups. The only downside is that the dictionary, by nature of using a hash table, is unordered, so you cannot easily traverse the items in a Dictionary in order. The SortedDictionary<TKey,TValue> is similar to the Dictionary<TKey,TValue> in usage but very different in implementation. The SortedDictionary<TKey,TValye> uses a binary tree under the covers to maintain the items in order by the key. As a consequence of sorting, the type used for the key must correctly implement IComparable<TKey> so that the keys can be correctly sorted. The sorted dictionary trades a little bit of lookup time for the ability to maintain the items in order, thus insert/delete/lookup times in a sorted dictionary are logarithmic - O(log n). Generally speaking, with logarithmic time, you can double the size of the collection and it only has to perform one extra comparison to find the item. Use the SortedDictionary<TKey,TValue> when you want fast lookups but also want to be able to maintain the collection in order by the key. The SortedList<TKey,TValue> is the other ordered associative container class in the generic containers. Once again SortedList<TKey,TValue>, like SortedDictionary<TKey,TValue>, uses a key to sort key-value pairs. Unlike SortedDictionary, however, items in a SortedList are stored as an ordered array of items. This means that insertions and deletions are linear - O(n) - because deleting or adding an item may involve shifting all items up or down in the list. Lookup time, however is O(log n) because the SortedList can use a binary search to find any item in the list by its key. So why would you ever want to do this? Well, the answer is that if you are going to load the SortedList up-front, the insertions will be slower, but because array indexing is faster than following object links, lookups are marginally faster than a SortedDictionary. Once again I'd use this in situations where you want fast lookups and want to maintain the collection in order by the key, and where insertions and deletions are rare. The Non-Associative Containers The other container classes are non-associative. They don't use keys to manipulate the collection but rely on the object itself being stored or some other means (such as index) to manipulate the collection. The List<T> is a basic contiguous storage container. Some people may call this a vector or dynamic array. Essentially it is an array of items that grow once its current capacity is exceeded. Because the items are stored contiguously as an array, you can access items in the List<T> by index very quickly. However inserting and removing in the beginning or middle of the List<T> are very costly because you must shift all the items up or down as you delete or insert respectively. However, adding and removing at the end of a List<T> is an amortized constant operation - O(1). Typically List<T> is the standard go-to collection when you don't have any other constraints, and typically we favor a List<T> even over arrays unless we are sure the size will remain absolutely fixed. The LinkedList<T> is a basic implementation of a doubly-linked list. This means that you can add or remove items in the middle of a linked list very quickly (because there's no items to move up or down in contiguous memory), but you also lose the ability to index items by position quickly. Most of the time we tend to favor List<T> over LinkedList<T> unless you are doing a lot of adding and removing from the collection, in which case a LinkedList<T> may make more sense. The HashSet<T> is an unordered collection of unique items. This means that the collection cannot have duplicates and no order is maintained. Logically, this is very similar to having a Dictionary<TKey,TValue> where the TKey and TValue both refer to the same object. This collection is very useful for maintaining a collection of items you wish to check membership against. For example, if you receive an order for a given vendor code, you may want to check to make sure the vendor code belongs to the set of vendor codes you handle. In these cases a HashSet<T> is useful for super-quick lookups where order is not important. Once again, like in Dictionary, the type T should have a valid implementation of GetHashCode() and Equals(), or you should provide an appropriate IEqualityComparer<T> to the HashSet<T> on construction. The SortedSet<T> is to HashSet<T> what the SortedDictionary<TKey,TValue> is to Dictionary<TKey,TValue>. That is, the SortedSet<T> is a binary tree where the key and value are the same object. This once again means that adding/removing/lookups are logarithmic - O(log n) - but you gain the ability to iterate over the items in order. For this collection to be effective, type T must implement IComparable<T> or you need to supply an external IComparer<T>. Finally, the Stack<T> and Queue<T> are two very specific collections that allow you to handle a sequential collection of objects in very specific ways. The Stack<T> is a last-in-first-out (LIFO) container where items are added and removed from the top of the stack. Typically this is useful in situations where you want to stack actions and then be able to undo those actions in reverse order as needed. The Queue<T> on the other hand is a first-in-first-out container which adds items at the end of the queue and removes items from the front. This is useful for situations where you need to process items in the order in which they came, such as a print spooler or waiting lines. So that's the basic collections. Let's summarize what we've learned in a quick reference table.  Collection Ordered? Contiguous Storage? Direct Access? Lookup Efficiency Manipulate Efficiency Notes Dictionary No Yes Via Key Key: O(1) O(1) Best for high performance lookups. SortedDictionary Yes No Via Key Key: O(log n) O(log n) Compromise of Dictionary speed and ordering, uses binary search tree. SortedList Yes Yes Via Key Key: O(log n) O(n) Very similar to SortedDictionary, except tree is implemented in an array, so has faster lookup on preloaded data, but slower loads. List No Yes Via Index Index: O(1) Value: O(n) O(n) Best for smaller lists where direct access required and no ordering. LinkedList No No No Value: O(n) O(1) Best for lists where inserting/deleting in middle is common and no direct access required. HashSet No Yes Via Key Key: O(1) O(1) Unique unordered collection, like a Dictionary except key and value are same object. SortedSet Yes No Via Key Key: O(log n) O(log n) Unique ordered collection, like SortedDictionary except key and value are same object. Stack No Yes Only Top Top: O(1) O(1)* Essentially same as List<T> except only process as LIFO Queue No Yes Only Front Front: O(1) O(1) Essentially same as List<T> except only process as FIFO   The Original Collections: System.Collections namespace The original collection classes are largely considered deprecated by developers and by Microsoft itself. In fact they indicate that for the most part you should always favor the generic or concurrent collections, and only use the original collections when you are dealing with legacy .NET code. Because these collections are out of vogue, let's just briefly mention the original collection and their generic equivalents: ArrayList A dynamic, contiguous collection of objects. Favor the generic collection List<T> instead. Hashtable Associative, unordered collection of key-value pairs of objects. Favor the generic collection Dictionary<TKey,TValue> instead. Queue First-in-first-out (FIFO) collection of objects. Favor the generic collection Queue<T> instead. SortedList Associative, ordered collection of key-value pairs of objects. Favor the generic collection SortedList<T> instead. Stack Last-in-first-out (LIFO) collection of objects. Favor the generic collection Stack<T> instead. In general, the older collections are non-type-safe and in some cases less performant than their generic counterparts. Once again, the only reason you should fall back on these older collections is for backward compatibility with legacy code and libraries only. The Concurrent Collections: System.Collections.Concurrent namespace The concurrent collections are new as of .NET 4.0 and are included in the System.Collections.Concurrent namespace. These collections are optimized for use in situations where multi-threaded read and write access of a collection is desired. The concurrent queue, stack, and dictionary work much as you'd expect. The bag and blocking collection are more unique. Below is the summary of each with a link to a blog post I did on each of them. ConcurrentQueue Thread-safe version of a queue (FIFO). For more information see: C#/.NET Little Wonders: The ConcurrentStack and ConcurrentQueue ConcurrentStack Thread-safe version of a stack (LIFO). For more information see: C#/.NET Little Wonders: The ConcurrentStack and ConcurrentQueue ConcurrentBag Thread-safe unordered collection of objects. Optimized for situations where a thread may be bother reader and writer. For more information see: C#/.NET Little Wonders: The ConcurrentBag and BlockingCollection ConcurrentDictionary Thread-safe version of a dictionary. Optimized for multiple readers (allows multiple readers under same lock). For more information see C#/.NET Little Wonders: The ConcurrentDictionary BlockingCollection Wrapper collection that implement producers & consumers paradigm. Readers can block until items are available to read. Writers can block until space is available to write (if bounded). For more information see C#/.NET Little Wonders: The ConcurrentBag and BlockingCollection Summary The .NET BCL has lots of collections built in to help you store and manipulate collections of data. Understanding how these collections work and knowing in which situations each container is best is one of the key skills necessary to build more performant code. Choosing the wrong collection for the job can make your code much slower or even harder to maintain if you choose one that doesn’t perform as well or otherwise doesn’t exactly fit the situation. Remember to avoid the original collections and stick with the generic collections.  If you need concurrent access, you can use the generic collections if the data is read-only, or consider the concurrent collections for mixed-access if you are running on .NET 4.0 or higher.   Tweet Technorati Tags: C#,.NET,Collecitons,Generic,Concurrent,Dictionary,List,Stack,Queue,SortedList,SortedDictionary,HashSet,SortedSet

    Read the article

  • Designing for the future

    - by Dennis Vroegop
    User interfaces and user experience design is a fast moving field. It’s something that changes pretty quick: what feels fresh today will look outdated tomorrow. I remember the day I first got a beta version of Windows 95 and I felt swept away by the user interface of the OS. It felt so modern! If I look back now, it feels old. Well, it should: the design is 17 years old which is an eternity in our field. Of course, this is not limited to UI. Same goes for many industries. I want you to think back of the cars that amazed you when you were in your teens (if you are in your teens then this may not apply to you). Didn’t they feel like part of the future? Didn’t you think that this was the ultimate in designs? And aren’t those designs hopelessly outdated today (again, depending on your age, it may just be me)? Let’s review the Win95 design: And let’s compare that to Windows 7: There are so many differences here, I wouldn’t even know where to start explaining them. The general feeling however is one of more usability: studies have shown Windows 7 is much easier to understand for new users than the older versions of Windows did. Of course, experienced Windows users didn’t like it: people are usually afraid of changes and like to stick to what they know. But for new users this was a huge improvement. And that is what UX design is all about: make a product easier to use, with less training required and make users feel more productive. Still, there are areas where this doesn’t hold up. There are plenty examples of designs from the past that are still fresh today. But if you look closely at them, you’ll notice some subtle differences. This differences are what keep the designs fresh. A good example is the signs you’ll find on the road. They haven’t changed much over the years (otherwise people wouldn’t recognize them anymore) but they have been changing gradually to reflect changes in traffic. The same goes for computer interfaces. With each new product or version of a product, the UI and UX is changed gradually. Every now and then however, a bigger change is needed. Just think about the introduction of the Ribbon in Microsoft Office 2007: the whole UI was redesigned. A lot of old users (not in age, but in times of using older versions) didn’t like it a bit, but new users or casual users seem to be more efficient using the product. Which, of course, is exactly the reason behind the changes. I believe that a big engine behind the changes in User Experience design has been the web. In the old days (i.e. before the explosion of the internet) user interface design in Windows applications was limited to choosing the margins between your battleship gray buttons. When the web came along, and especially the web 2.0 where the browsers started to act more and more as application platforms, designers stepped in and made a huge impact. In the browser, they could do whatever they wanted. In the beginning this was limited to the darn blink tag but gradually people really started to think about UX. Even more so: the design of the UI and the whole experience was taken away from the developers and put into the hands of people who knew what they were doing: UX designers. This caused some problems. Everyone who has done a web project in the early 2000’s must have had the same experience: the designers give you a set of Photoshop files and tell you to translate it to HTML. Which, of course, is very hard to do. However, with new tooling and new standards this became much easier. The latest version of HTML and CSS has taken the responsibility for the design away from the developers and placed them in the capable hands of the designers. And that’s where that responsibility belongs, after all, I don’t want a designer to muck around in my c# code just as much as he or she doesn’t want me to poke in the sites style definitions. This change in responsibilities resulted in good looking but more important: better thought out user interfaces in websites. And when websites became more and more interactive, people started to expect the same sort of look and feel from their desktop applications. But that didn’t really happen. Most business applications still have that battleship gray look and feel. Ok, they may use a different color but we’re not talking colors here but usability. Now, you may not be able to read the Dutch captions, but even if you did you wouldn’t understand what was going on. At least, not when you first see it. You have to scan the screen, read all the labels, see how they are related to the other elements on the screen and then figure out what they do. If you’re an experienced user of this application however, this might be a good thing: you know what to do and you get all the information you need in one single screen. But for most applications this isn’t the case. A lot of people only use their computer for a limited time a day (a weird concept for me, but it happens) and need it to get something done and then get on with their lives. For them, a user interface experience like the above isn’t working. (disclaimer: I just picked a screenshot, I am not saying this is bad software but it is an example of about 95% of the Windows applications out there). For the knowledge worker, this isn’t a problem. They use one or two systems and they know exactly what they need to do to achieve their goal. They don’t want any clutter on their screen that distracts them from their task, they just want to be as efficient as possible. When they know the systems they are very productive. The point is, how long does it take to become productive? And: could they be even more productive if the UX was better? Are there things missing that they don’t know about? Are there better ways to achieve what they want to achieve? Also: could a system be designed in such a way that it is not only much more easy to work with but also less tiring? in the example above you need to switch between the keyboard and mouse a lot, something that we now know can be very tiring. The goal of most applications (being client apps or websites on any kind of device) is to provide information. Information is data that when given to the right people, on the right time, in the right place and when it is correct adds value for that person (please, remember that definition: I still hear the statement “the information was wrong” which doesn’t make sense: data can be wrong, information cannot be). So if a system provides data, how can we make sure the chances of becoming information is as high as possible? A good example of a well thought-out system that attempts this is the Zune client. It is a very good application, and I think the UX is much better than it’s main competitor iTunes. Have a look at both: On the left you see the iTunes screenshot, on the right the Zune. As you notice, the Zune screen has more images but less chrome (chrome being visuals not part of the data you want to show, i.e. edges around buttons). The whole thing is text oriented or image oriented, where that text or image is part of the information you need. What is important is big, what’s less important is smaller. Yet, everything you need to know at that point is present and your attention is drawn immediately to what you’re trying to achieve: information about music. You can easily switch between the content on your machine and content on your Zune player but clicking on the image of the player. But if you didn’t know that, you’d find out soon enough: the whole UX is designed in such a way that it invites you to play around. So sooner or later (probably sooner) you’d click on that image and you would see what it does. In the iTunes version it’s harder to find: the discoverability is a lot lower. For inexperienced people the Zune player feels much more natural than the iTunes player, and they get up to speed a lot faster. How does this all work? Why is this UX better? The answer lies in a project from Microsoft with the codename (it seems to be becoming the official name though) “Metro”. Metro is a design language, based on certain principles. When they thought about UX they took a good long look around them and went out in search of metaphors. And they found them. The team noticed that signage in streets, airports, roads, buildings and so on are usually very clear and very precise. These signs give you the information you need and nothing more. It’s simple, clearly understood and fast to understand. A good example are airport signs. Airports can be intimidating places, especially for the non-experienced traveler. In the early 1990’s Amsterdam Airport Schiphol decided to redesign all the signage to make the traveller feel less disoriented. They developed a set of guidelines for signs and implemented those. Soon, most airports around the world adopted these ideas and you see variations of the Dutch signs everywhere on the globe. The signs are text-oriented. Yes, there are icons explaining what it all means for the people who can’t read or don’t understand the language, but the basic sign language is text. It’s clear, it’s high-contrast and it’s easy to understand. One look at the sign and you know where to go. The only thing I don’t like is the green sign pointing to the emergency exit, but since this is the default style for emergency exits I understand why they did this. If you look at the Zune UI again, you’ll notice the similarities. Text oriented, little or no icons, clear usage of fonts and all the information you need. This design language has a set of principles: Clean, light, open and fast Content, not chrome Soulful and alive These are just a couple of the principles, you can read the whole philosophy behind Metro for Windows Phone 7 here. These ideas seem to work. I love my Windows Phone 7. It’s easy to use, it’s clear, there’s no clutter that I do not need. It works for me. And I noticed it works for a lot of other people as well, especially people who aren’t as proficient with computers as I am. You see these ideas in a lot other places. Corning, a manufacturer of glass, has made a video of possible usages of their products. It’s their glimpse into the future. You’ll notice that a lot of the UI in the screens look a lot like what Microsoft is doing with Metro (not coincidentally Corning is the supplier for the Gorilla glass display surface on the new SUR40 device (or Surface v2.0 as a lot of people call it)). The idea behind this vision is that data should be available everywhere where you it. Systems should be available at all times and data is presented in a clear and light manner so that you can turn that data into information. You don’t need a lot of fancy animations that only distract from the data. You want the data and you want it fast. Have a look at this truly inspiring video that made: This is what I believe the future will look like. Of course, not everything is possible, or even desirable. But it is a nice way to think about the future . I feel very strongly about designing applications in such a way that they add value to the user. Designing applications that turn data into information. Applications that make the user feel happy to use them. So… when are you going to drop the battleship-gray designs? Tags van Technorati: surface,design,windows phone 7,wp7,metro

    Read the article

  • C#/.NET Little Wonders: ConcurrentBag and BlockingCollection

    - by James Michael Hare
    In the first week of concurrent collections, began with a general introduction and discussed the ConcurrentStack<T> and ConcurrentQueue<T>.  The last post discussed the ConcurrentDictionary<T> .  Finally this week, we shall close with a discussion of the ConcurrentBag<T> and BlockingCollection<T>. For more of the "Little Wonders" posts, see C#/.NET Little Wonders: A Redux. Recap As you'll recall from the previous posts, the original collections were object-based containers that accomplished synchronization through a Synchronized member.  With the advent of .NET 2.0, the original collections were succeeded by the generic collections which are fully type-safe, but eschew automatic synchronization.  With .NET 4.0, a new breed of collections was born in the System.Collections.Concurrent namespace.  Of these, the final concurrent collection we will examine is the ConcurrentBag and a very useful wrapper class called the BlockingCollection. For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this informative whitepaper by the Microsoft Parallel Computing Platform team here. ConcurrentBag<T> – Thread-safe unordered collection. Unlike the other concurrent collections, the ConcurrentBag<T> has no non-concurrent counterpart in the .NET collections libraries.  Items can be added and removed from a bag just like any other collection, but unlike the other collections, the items are not maintained in any order.  This makes the bag handy for those cases when all you care about is that the data be consumed eventually, without regard for order of consumption or even fairness – that is, it’s possible new items could be consumed before older items given the right circumstances for a period of time. So why would you ever want a container that can be unfair?  Well, to look at it another way, you can use a ConcurrentQueue and get the fairness, but it comes at a cost in that the ordering rules and synchronization required to maintain that ordering can affect scalability a bit.  Thus sometimes the bag is great when you want the fastest way to get the next item to process, and don’t care what item it is or how long its been waiting. The way that the ConcurrentBag works is to take advantage of the new ThreadLocal<T> type (new in System.Threading for .NET 4.0) so that each thread using the bag has a list local to just that thread.  This means that adding or removing to a thread-local list requires very low synchronization.  The problem comes in where a thread goes to consume an item but it’s local list is empty.  In this case the bag performs “work-stealing” where it will rob an item from another thread that has items in its list.  This requires a higher level of synchronization which adds a bit of overhead to the take operation. So, as you can imagine, this makes the ConcurrentBag good for situations where each thread both produces and consumes items from the bag, but it would be less-than-idea in situations where some threads are dedicated producers and the other threads are dedicated consumers because the work-stealing synchronization would outweigh the thread-local optimization for a thread taking its own items. Like the other concurrent collections, there are some curiosities to keep in mind: IsEmpty(), Count, ToArray(), and GetEnumerator() lock collection Each of these needs to take a snapshot of whole bag to determine if empty, thus they tend to be more expensive and cause Add() and Take() operations to block. ToArray() and GetEnumerator() are static snapshots Because it is based on a snapshot, will not show subsequent updates after snapshot. Add() is lightweight Since adding to the thread-local list, there is very little overhead on Add. TryTake() is lightweight if items in thread-local list As long as items are in the thread-local list, TryTake() is very lightweight, much more so than ConcurrentStack() and ConcurrentQueue(), however if the local thread list is empty, it must steal work from another thread, which is more expensive. Remember, a bag is not ideal for all situations, it is mainly ideal for situations where a process consumes an item and either decomposes it into more items to be processed, or handles the item partially and places it back to be processed again until some point when it will complete.  The main point is that the bag works best when each thread both takes and adds items. For example, we could create a totally contrived example where perhaps we want to see the largest power of a number before it crosses a certain threshold.  Yes, obviously we could easily do this with a log function, but bare with me while I use this contrived example for simplicity. So let’s say we have a work function that will take a Tuple out of a bag, this Tuple will contain two ints.  The first int is the original number, and the second int is the last multiple of that number.  So we could load our bag with the initial values (let’s say we want to know the last multiple of each of 2, 3, 5, and 7 under 100. 1: var bag = new ConcurrentBag<Tuple<int, int>> 2: { 3: Tuple.Create(2, 1), 4: Tuple.Create(3, 1), 5: Tuple.Create(5, 1), 6: Tuple.Create(7, 1) 7: }; Then we can create a method that given the bag, will take out an item, apply the multiplier again, 1: public static void FindHighestPowerUnder(ConcurrentBag<Tuple<int,int>> bag, int threshold) 2: { 3: Tuple<int,int> pair; 4:  5: // while there are items to take, this will prefer local first, then steal if no local 6: while (bag.TryTake(out pair)) 7: { 8: // look at next power 9: var result = Math.Pow(pair.Item1, pair.Item2 + 1); 10:  11: if (result < threshold) 12: { 13: // if smaller than threshold bump power by 1 14: bag.Add(Tuple.Create(pair.Item1, pair.Item2 + 1)); 15: } 16: else 17: { 18: // otherwise, we're done 19: Console.WriteLine("Highest power of {0} under {3} is {0}^{1} = {2}.", 20: pair.Item1, pair.Item2, Math.Pow(pair.Item1, pair.Item2), threshold); 21: } 22: } 23: } Now that we have this, we can load up this method as an Action into our Tasks and run it: 1: // create array of tasks, start all, wait for all 2: var tasks = new[] 3: { 4: new Task(() => FindHighestPowerUnder(bag, 100)), 5: new Task(() => FindHighestPowerUnder(bag, 100)), 6: }; 7:  8: Array.ForEach(tasks, t => t.Start()); 9:  10: Task.WaitAll(tasks); Totally contrived, I know, but keep in mind the main point!  When you have a thread or task that operates on an item, and then puts it back for further consumption – or decomposes an item into further sub-items to be processed – you should consider a ConcurrentBag as the thread-local lists will allow for quick processing.  However, if you need ordering or if your processes are dedicated producers or consumers, this collection is not ideal.  As with anything, you should performance test as your mileage will vary depending on your situation! BlockingCollection<T> – A producers & consumers pattern collection The BlockingCollection<T> can be treated like a collection in its own right, but in reality it adds a producers and consumers paradigm to any collection that implements the interface IProducerConsumerCollection<T>.  If you don’t specify one at the time of construction, it will use a ConcurrentQueue<T> as its underlying store. If you don’t want to use the ConcurrentQueue, the ConcurrentStack and ConcurrentBag also implement the interface (though ConcurrentDictionary does not).  In addition, you are of course free to create your own implementation of the interface. So, for those who don’t remember the producers and consumers classical computer-science problem, the gist of it is that you have one (or more) processes that are creating items (producers) and one (or more) processes that are consuming these items (consumers).  Now, the crux of the problem is that there is a bin (queue) where the produced items are placed, and typically that bin has a limited size.  Thus if a producer creates an item, but there is no space to store it, it must wait until an item is consumed.  Also if a consumer goes to consume an item and none exists, it must wait until an item is produced. The BlockingCollection makes it trivial to implement any standard producers/consumers process set by providing that “bin” where the items can be produced into and consumed from with the appropriate blocking operations.  In addition, you can specify whether the bin should have a limited size or can be (theoretically) unbounded, and you can specify timeouts on the blocking operations. As far as your choice of “bin”, for the most part the ConcurrentQueue is the right choice because it is fairly light and maximizes fairness by ordering items so that they are consumed in the same order they are produced.  You can use the concurrent bag or stack, of course, but your ordering would be random-ish in the case of the former and LIFO in the case of the latter. So let’s look at some of the methods of note in BlockingCollection: BoundedCapacity returns capacity of the “bin” If the bin is unbounded, the capacity is int.MaxValue. Count returns an internally-kept count of items This makes it O(1), but if you modify underlying collection directly (not recommended) it is unreliable. CompleteAdding() is used to cut off further adds. This sets IsAddingCompleted and begins to wind down consumers once empty. IsAddingCompleted is true when producers are “done”. Once you are done producing, should complete the add process to alert consumers. IsCompleted is true when producers are “done” and “bin” is empty. Once you mark the producers done, and all items removed, this will be true. Add() is a blocking add to collection. If bin is full, will wait till space frees up Take() is a blocking remove from collection. If bin is empty, will wait until item is produced or adding is completed. GetConsumingEnumerable() is used to iterate and consume items. Unlike the standard enumerator, this one consumes the items instead of iteration. TryAdd() attempts add but does not block completely If adding would block, returns false instead, can specify TimeSpan to wait before stopping. TryTake() attempts to take but does not block completely Like TryAdd(), if taking would block, returns false instead, can specify TimeSpan to wait. Note the use of CompleteAdding() to signal the BlockingCollection that nothing else should be added.  This means that any attempts to TryAdd() or Add() after marked completed will throw an InvalidOperationException.  In addition, once adding is complete you can still continue to TryTake() and Take() until the bin is empty, and then Take() will throw the InvalidOperationException and TryTake() will return false. So let’s create a simple program to try this out.  Let’s say that you have one process that will be producing items, but a slower consumer process that handles them.  This gives us a chance to peek inside what happens when the bin is bounded (by default, the bin is NOT bounded). 1: var bin = new BlockingCollection<int>(5); Now, we create a method to produce items: 1: public static void ProduceItems(BlockingCollection<int> bin, int numToProduce) 2: { 3: for (int i = 0; i < numToProduce; i++) 4: { 5: // try for 10 ms to add an item 6: while (!bin.TryAdd(i, TimeSpan.FromMilliseconds(10))) 7: { 8: Console.WriteLine("Bin is full, retrying..."); 9: } 10: } 11:  12: // once done producing, call CompleteAdding() 13: Console.WriteLine("Adding is completed."); 14: bin.CompleteAdding(); 15: } And one to consume them: 1: public static void ConsumeItems(BlockingCollection<int> bin) 2: { 3: // This will only be true if CompleteAdding() was called AND the bin is empty. 4: while (!bin.IsCompleted) 5: { 6: int item; 7:  8: if (!bin.TryTake(out item, TimeSpan.FromMilliseconds(10))) 9: { 10: Console.WriteLine("Bin is empty, retrying..."); 11: } 12: else 13: { 14: Console.WriteLine("Consuming item {0}.", item); 15: Thread.Sleep(TimeSpan.FromMilliseconds(20)); 16: } 17: } 18: } Then we can fire them off: 1: // create one producer and two consumers 2: var tasks = new[] 3: { 4: new Task(() => ProduceItems(bin, 20)), 5: new Task(() => ConsumeItems(bin)), 6: new Task(() => ConsumeItems(bin)), 7: }; 8:  9: Array.ForEach(tasks, t => t.Start()); 10:  11: Task.WaitAll(tasks); Notice that the producer is faster than the consumer, thus it should be hitting a full bin often and displaying the message after it times out on TryAdd(). 1: Consuming item 0. 2: Consuming item 1. 3: Bin is full, retrying... 4: Bin is full, retrying... 5: Consuming item 3. 6: Consuming item 2. 7: Bin is full, retrying... 8: Consuming item 4. 9: Consuming item 5. 10: Bin is full, retrying... 11: Consuming item 6. 12: Consuming item 7. 13: Bin is full, retrying... 14: Consuming item 8. 15: Consuming item 9. 16: Bin is full, retrying... 17: Consuming item 10. 18: Consuming item 11. 19: Bin is full, retrying... 20: Consuming item 12. 21: Consuming item 13. 22: Bin is full, retrying... 23: Bin is full, retrying... 24: Consuming item 14. 25: Adding is completed. 26: Consuming item 15. 27: Consuming item 16. 28: Consuming item 17. 29: Consuming item 19. 30: Consuming item 18. Also notice that once CompleteAdding() is called and the bin is empty, the IsCompleted property returns true, and the consumers will exit. Summary The ConcurrentBag is an interesting collection that can be used to optimize concurrency scenarios where tasks or threads both produce and consume items.  In this way, it will choose to consume its own work if available, and then steal if not.  However, in situations where you want fair consumption or ordering, or in situations where the producers and consumers are distinct processes, the bag is not optimal. The BlockingCollection is a great wrapper around all of the concurrent queue, stack, and bag that allows you to add producer and consumer semantics easily including waiting when the bin is full or empty. That’s the end of my dive into the concurrent collections.  I’d also strongly recommend, once again, you read this excellent Microsoft white paper that goes into much greater detail on the efficiencies you can gain using these collections judiciously (here). Tweet Technorati Tags: C#,.NET,Concurrent Collections,Little Wonders

    Read the article

  • C#/.NET Little Wonders: The Useful But Overlooked Sets

    - by James Michael Hare
    Once again we consider some of the lesser known classes and keywords of C#.  Today we will be looking at two set implementations in the System.Collections.Generic namespace: HashSet<T> and SortedSet<T>.  Even though most people think of sets as mathematical constructs, they are actually very useful classes that can be used to help make your application more performant if used appropriately. A Background From Math In mathematical terms, a set is an unordered collection of unique items.  In other words, the set {2,3,5} is identical to the set {3,5,2}.  In addition, the set {2, 2, 4, 1} would be invalid because it would have a duplicate item (2).  In addition, you can perform set arithmetic on sets such as: Intersections: The intersection of two sets is the collection of elements common to both.  Example: The intersection of {1,2,5} and {2,4,9} is the set {2}. Unions: The union of two sets is the collection of unique items present in either or both set.  Example: The union of {1,2,5} and {2,4,9} is {1,2,4,5,9}. Differences: The difference of two sets is the removal of all items from the first set that are common between the sets.  Example: The difference of {1,2,5} and {2,4,9} is {1,5}. Supersets: One set is a superset of a second set if it contains all elements that are in the second set. Example: The set {1,2,5} is a superset of {1,5}. Subsets: One set is a subset of a second set if all the elements of that set are contained in the first set. Example: The set {1,5} is a subset of {1,2,5}. If We’re Not Doing Math, Why Do We Care? Now, you may be thinking: why bother with the set classes in C# if you have no need for mathematical set manipulation?  The answer is simple: they are extremely efficient ways to determine ownership in a collection. For example, let’s say you are designing an order system that tracks the price of a particular equity, and once it reaches a certain point will trigger an order.  Now, since there’s tens of thousands of equities on the markets, you don’t want to track market data for every ticker as that would be a waste of time and processing power for symbols you don’t have orders for.  Thus, we just want to subscribe to the stock symbol for an equity order only if it is a symbol we are not already subscribed to. Every time a new order comes in, we will check the list of subscriptions to see if the new order’s stock symbol is in that list.  If it is, great, we already have that market data feed!  If not, then and only then should we subscribe to the feed for that symbol. So far so good, we have a collection of symbols and we want to see if a symbol is present in that collection and if not, add it.  This really is the essence of set processing, but for the sake of comparison, let’s say you do a list instead: 1: // class that handles are order processing service 2: public sealed class OrderProcessor 3: { 4: // contains list of all symbols we are currently subscribed to 5: private readonly List<string> _subscriptions = new List<string>(); 6:  7: ... 8: } Now whenever you are adding a new order, it would look something like: 1: public PlaceOrderResponse PlaceOrder(Order newOrder) 2: { 3: // do some validation, of course... 4:  5: // check to see if already subscribed, if not add a subscription 6: if (!_subscriptions.Contains(newOrder.Symbol)) 7: { 8: // add the symbol to the list 9: _subscriptions.Add(newOrder.Symbol); 10: 11: // do whatever magic is needed to start a subscription for the symbol 12: } 13:  14: // place the order logic! 15: } What’s wrong with this?  In short: performance!  Finding an item inside a List<T> is a linear - O(n) – operation, which is not a very performant way to find if an item exists in a collection. (I used to teach algorithms and data structures in my spare time at a local university, and when you began talking about big-O notation you could immediately begin to see eyes glossing over as if it was pure, useless theory that would not apply in the real world, but I did and still do believe it is something worth understanding well to make the best choices in computer science). Let’s think about this: a linear operation means that as the number of items increases, the time that it takes to perform the operation tends to increase in a linear fashion.  Put crudely, this means if you double the collection size, you might expect the operation to take something like the order of twice as long.  Linear operations tend to be bad for performance because they mean that to perform some operation on a collection, you must potentially “visit” every item in the collection.  Consider finding an item in a List<T>: if you want to see if the list has an item, you must potentially check every item in the list before you find it or determine it’s not found. Now, we could of course sort our list and then perform a binary search on it, but sorting is typically a linear-logarithmic complexity – O(n * log n) - and could involve temporary storage.  So performing a sort after each add would probably add more time.  As an alternative, we could use a SortedList<TKey, TValue> which sorts the list on every Add(), but this has a similar level of complexity to move the items and also requires a key and value, and in our case the key is the value. This is why sets tend to be the best choice for this type of processing: they don’t rely on separate keys and values for ordering – so they save space – and they typically don’t care about ordering – so they tend to be extremely performant.  The .NET BCL (Base Class Library) has had the HashSet<T> since .NET 3.5, but at that time it did not implement the ISet<T> interface.  As of .NET 4.0, HashSet<T> implements ISet<T> and a new set, the SortedSet<T> was added that gives you a set with ordering. HashSet<T> – For Unordered Storage of Sets When used right, HashSet<T> is a beautiful collection, you can think of it as a simplified Dictionary<T,T>.  That is, a Dictionary where the TKey and TValue refer to the same object.  This is really an oversimplification, but logically it makes sense.  I’ve actually seen people code a Dictionary<T,T> where they store the same thing in the key and the value, and that’s just inefficient because of the extra storage to hold both the key and the value. As it’s name implies, the HashSet<T> uses a hashing algorithm to find the items in the set, which means it does take up some additional space, but it has lightning fast lookups!  Compare the times below between HashSet<T> and List<T>: Operation HashSet<T> List<T> Add() O(1) O(1) at end O(n) in middle Remove() O(1) O(n) Contains() O(1) O(n)   Now, these times are amortized and represent the typical case.  In the very worst case, the operations could be linear if they involve a resizing of the collection – but this is true for both the List and HashSet so that’s a less of an issue when comparing the two. The key thing to note is that in the general case, HashSet is constant time for adds, removes, and contains!  This means that no matter how large the collection is, it takes roughly the exact same amount of time to find an item or determine if it’s not in the collection.  Compare this to the List where almost any add or remove must rearrange potentially all the elements!  And to find an item in the list (if unsorted) you must search every item in the List. So as you can see, if you want to create an unordered collection and have very fast lookup and manipulation, the HashSet is a great collection. And since HashSet<T> implements ICollection<T> and IEnumerable<T>, it supports nearly all the same basic operations as the List<T> and can use the System.Linq extension methods as well. All we have to do to switch from a List<T> to a HashSet<T>  is change our declaration.  Since List and HashSet support many of the same members, chances are we won’t need to change much else. 1: public sealed class OrderProcessor 2: { 3: private readonly HashSet<string> _subscriptions = new HashSet<string>(); 4:  5: // ... 6:  7: public PlaceOrderResponse PlaceOrder(Order newOrder) 8: { 9: // do some validation, of course... 10: 11: // check to see if already subscribed, if not add a subscription 12: if (!_subscriptions.Contains(newOrder.Symbol)) 13: { 14: // add the symbol to the list 15: _subscriptions.Add(newOrder.Symbol); 16: 17: // do whatever magic is needed to start a subscription for the symbol 18: } 19: 20: // place the order logic! 21: } 22:  23: // ... 24: } 25: Notice, we didn’t change any code other than the declaration for _subscriptions to be a HashSet<T>.  Thus, we can pick up the performance improvements in this case with minimal code changes. SortedSet<T> – Ordered Storage of Sets Just like HashSet<T> is logically similar to Dictionary<T,T>, the SortedSet<T> is logically similar to the SortedDictionary<T,T>. The SortedSet can be used when you want to do set operations on a collection, but you want to maintain that collection in sorted order.  Now, this is not necessarily mathematically relevant, but if your collection needs do include order, this is the set to use. So the SortedSet seems to be implemented as a binary tree (possibly a red-black tree) internally.  Since binary trees are dynamic structures and non-contiguous (unlike List and SortedList) this means that inserts and deletes do not involve rearranging elements, or changing the linking of the nodes.  There is some overhead in keeping the nodes in order, but it is much smaller than a contiguous storage collection like a List<T>.  Let’s compare the three: Operation HashSet<T> SortedSet<T> List<T> Add() O(1) O(log n) O(1) at end O(n) in middle Remove() O(1) O(log n) O(n) Contains() O(1) O(log n) O(n)   The MSDN documentation seems to indicate that operations on SortedSet are O(1), but this seems to be inconsistent with its implementation and seems to be a documentation error.  There’s actually a separate MSDN document (here) on SortedSet that indicates that it is, in fact, logarithmic in complexity.  Let’s put it in layman’s terms: logarithmic means you can double the collection size and typically you only add a single extra “visit” to an item in the collection.  Take that in contrast to List<T>’s linear operation where if you double the size of the collection you double the “visits” to items in the collection.  This is very good performance!  It’s still not as performant as HashSet<T> where it always just visits one item (amortized), but for the addition of sorting this is a good thing. Consider the following table, now this is just illustrative data of the relative complexities, but it’s enough to get the point: Collection Size O(1) Visits O(log n) Visits O(n) Visits 1 1 1 1 10 1 4 10 100 1 7 100 1000 1 10 1000   Notice that the logarithmic – O(log n) – visit count goes up very slowly compare to the linear – O(n) – visit count.  This is because since the list is sorted, it can do one check in the middle of the list, determine which half of the collection the data is in, and discard the other half (binary search).  So, if you need your set to be sorted, you can use the SortedSet<T> just like the HashSet<T> and gain sorting for a small performance hit, but it’s still faster than a List<T>. Unique Set Operations Now, if you do want to perform more set-like operations, both implementations of ISet<T> support the following, which play back towards the mathematical set operations described before: IntersectWith() – Performs the set intersection of two sets.  Modifies the current set so that it only contains elements also in the second set. UnionWith() – Performs a set union of two sets.  Modifies the current set so it contains all elements present both in the current set and the second set. ExceptWith() – Performs a set difference of two sets.  Modifies the current set so that it removes all elements present in the second set. IsSupersetOf() – Checks if the current set is a superset of the second set. IsSubsetOf() – Checks if the current set is a subset of the second set. For more information on the set operations themselves, see the MSDN description of ISet<T> (here). What Sets Don’t Do Don’t get me wrong, sets are not silver bullets.  You don’t really want to use a set when you want separate key to value lookups, that’s what the IDictionary implementations are best for. Also sets don’t store temporal add-order.  That is, if you are adding items to the end of a list all the time, your list is ordered in terms of when items were added to it.  This is something the sets don’t do naturally (though you could use a SortedSet with an IComparer with a DateTime but that’s overkill) but List<T> can. Also, List<T> allows indexing which is a blazingly fast way to iterate through items in the collection.  Iterating over all the items in a List<T> is generally much, much faster than iterating over a set. Summary Sets are an excellent tool for maintaining a lookup table where the item is both the key and the value.  In addition, if you have need for the mathematical set operations, the C# sets support those as well.  The HashSet<T> is the set of choice if you want the fastest possible lookups but don’t care about order.  In contrast the SortedSet<T> will give you a sorted collection at a slight reduction in performance.   Technorati Tags: C#,.Net,Little Wonders,BlackRabbitCoder,ISet,HashSet,SortedSet

    Read the article

  • Creating Item Templates as Visual Studio 2010 Extensions

    - by maziar
    Technorati Tags: Visual Studio 2010 Extension,T4 Template,VSIX,Item Template Wizard This blog post briefly introduces creation of an item template as a Visual studio 2010 extension. Problem specification Assume you are writing a Framework for data-oriented applications and you decide to include all your application messages in a SQL server database table. After creating the table, your create a class in your framework for getting messages with a string key specified.   var message = FrameworkMessages.Get("ChangesSavedSuccess");   Everyone would say this code is so error prone, because message keys are not strong-typed, and might create errors in application that are not caught in tests. So we think of a way to make it strong-typed, i.e. create a class to use it like this:   var message = Messages.ChangesSavedSuccess; in Messages class the code looks like this: public string ChangesSavedSuccess {     get { return FrameworkMessages.Get("ChangesSavedSuccess"); } }   And clearly, we are not going to create the Messages class manually; we need a class generator for it.   Again assume that the application(s) that intend to use our framework, contain multiple sub-systems. So each sub-system need to have it’s own strong-typed message class that call FrameworkMessages.Get method. So we would like to make our code generator an Item Template so that each developer would easily add the item to his project and no other works would be necessary.   Solution We create a T4 Text Template to generate our strong typed class from database. Then create a Visual Studio Item Template with this generator and publish it.   What Are T4 Templates You might be already familiar with T4 templates. If it’s so, you can skip this section. T4 Text Template is a fine Visual Studio file type (.tt) that generates output text. This file is a mixture of text blocks and code logic (in C# or VB). For example, you can generate HTML files, C# classes, Resource files and etc with use of a T4 template.   Syntax highlighting In Visual Studio 2010 a T4 Template by default will no be syntax highlighted and no auto-complete is supported for it. But there is a fine visual studio extension named ‘Visual T4’ which can be downloaded free from VisualStudioGallery. This tool offers IntelliSense, syntax coloring, validation, transformation preview and more for T4 templates.     How Item Templates work in Visual Studio Visual studio extensions allow us to add some functionalities to visual studio. In our case we need to create a .vsix file that adds a template to visual studio item templates. Item templates are zip files containing the template file and a meta-data file with .vstemplate extension. This .vstemplate file is an XML file that provides some information about the template. A .vsix file also is a zip file (renamed to .vsix) that are open with visual studio extension installer. (Re-installing a vsix file requires that old one to be uninstalled from VS: Tools > Extension Manager.) Installing a vsix will need Visual Studio to be closed and re-opened to take effect. Visual studio extension installer will easily find the item template’s zip file and copy it to visual studio’s template items folder. You can find other visual studio templates in [<VS Install Path>\Common7\IDE\ItemTemplates] and you can edit them; but be very careful with VS default templates.   How Can I Create a VSIX file 1. Visual Studio SDK depending on your Visual Studio’s version, you need to download Microsoft Visual Studio SDK. Note that if you have VS 2010 SP1, you will need to download and install VS 2010 SP1 SDK; VS 2010 SDK will not be installed (unless you change registry value that indicated your service pack number). Here is the link for VS 2010 SP1 SDK. After downloading, Run it and follow the wizard to complete the installation.   2. Create the file you want to make it an Item Template Create a project (or use an existing one) and add you file, edit it to make it work fine.   Back to our own problem, we need to create a T4 (.tt) template. VS-Prok: Add > New Item > General > Text Template Type a file name, ex. Message.tt, and press Add. Create the T4 template file (in this blog I do not intend to include T4 syntaxes so I just write down the code which is clear enough for you to understand)   <#@ template debug="false" hostspecific="true" language="C#" #> <#@ output extension=".cs" #> <#@ Assembly Name="System.Data" #> <#@ Import Namespace="System.Data.SqlClient" #> <#@ Import Namespace="System.Text" #> <#@ Import Namespace="System.IO" #> <#     var connectionString = "server=Maziar-PC; Database=MyDatabase; Integrated Security=True";     var systemName = "Sys1";     var builder = new StringBuilder();     using (var connection = new SqlConnection(connectionString))     {         connection.Open();         var command = connection.CreateCommand();         command.CommandText = string.Format("SELECT [Key] FROM [Message] WHERE System = '{0}'", systemName);         var reader = command.ExecuteReader();         while (reader.Read())         {             builder.AppendFormat("        public static string {0} {{ get {{ return FrameworkMessages.Get(\"{0}\"); }} }}\r\n", reader[0]);         }     } #> namespace <#= System.Runtime.Remoting.Messaging.CallContext.LogicalGetData("NamespaceHint") #> {     public static class <#= Path.GetFileNameWithoutExtension(Host.TemplateFile) #>     { <#= builder.ToString() #>     } } As you can see the T4 template connects to a database, reads message keys and generates a class. Here is the output: namespace MyProject.MyFolder {     public static class Messages     {         public static string ChangesSavedSuccess { get { return FrameworkMessages.Get("ChangesSavedSuccess"); } }         public static string ErrorSavingChanges { get { return FrameworkMessages.Get("ErrorSavingChanges"); } }     } }   The output looks fine but there is one problem. The connectionString and systemName are hard coded. so how can I create an flexible item template? One of features of item templates in visual studio is that you can create a designer wizard for your item template, so I can get connection information and system name there. now lets go on creating the vsix file.   3. Create Template In visual studio click on File > Export Template a wizard will show up. if first step click on Item Template on in the combo box select the project that contains Messages.tt. click next. Select Messages.tt from list in second step. click next. In the third step, you should choose References. For this template, System and System.Data are needed so choose them. click next. write down template name, description, if you like add a .ico file as the icon file and also preview image. Uncheck automatically add the templare … . Copy the output location in clip board. click finish.     4. Create VSIX Project In VS, Click File > New > Project > Extensibility > VSIX Project Type a name, ex. FrameworkMessages, Location, etc. The project will include a .vsixmanifest file. Fill in fields like Author, Product Name, Description, etc.   In Content section, click on Add Content. choose content type as Item Template. choose source as file. remember you have the template file address in clipboard? now paste it in front of file. click OK.     5. Build VSIX Project That’s it, build the project and go to output directory. You see a .vsix file. you can run it now. After restarting VS, if you click on a project > Add > New Item, you will see your item in list and you can add it. When you add this item to a project, if it has not references to System and System.Data they will be added. but the problem mentioned in step 2 is seen now.     6. Create Design Wizard for your Item Template Create a project i.e. Windows Application named ‘Framework.Messages.Design’, but better change its output type to Class Library. Add References to Microsoft.VisualStudio.TemplateWizardInterface and envdte Add a class Named MessagesDesigner in your project and Implement IWizard interface in it. This is what you should write: using System; using System.Collections.Generic; using System.Linq; using System.Text; using Microsoft.VisualStudio.TemplateWizard; using EnvDTE; namespace Framework.Messages.Design {     class MessageDesigner : IWizard     {         private bool CanAddProjectItem;         public void RunStarted(object automationObject, Dictionary<string, string> replacementsDictionary, WizardRunKind runKind, object[] customParams)         {             // Prompt user for Connection String and System Name in a Windows form (ShowDialog) here             // (try to provide good interface)             // if user clicks on cancel of your windows form return;             string connectionString = "connection;string"; // Set value from the form             string systemName = "system;name"; // Set value from the form             CanAddProjectItem = true;             replacementsDictionary.Add("$connectionString$", connectionString);             replacementsDictionary.Add("$systemName$", systemName);         }         public bool ShouldAddProjectItem(string filePath)         {             return CanAddProjectItem;         }         public void BeforeOpeningFile(ProjectItem projectItem)         {         }         public void ProjectFinishedGenerating(Project project)         {         }         public void ProjectItemFinishedGenerating(ProjectItem projectItem)         {         }         public void RunFinished()         {         }     } }   before your code runs  replacementsDictionary contains list of default template parameters. After that, two other parameters are added. Now build this project and copy the output assembly to [<VS Install Path>\Common7\IDE] folder.   your designer is ready.     The template that you had created is now added to your VSIX project. In windows explorer open your template zip file (extract it somewhere). open the .vstemplate file. first of all remove <ProjectItem SubType="Code" TargetFileName="$fileinputname$.cs" ReplaceParameters="true">Messages.cs</ProjectItem> because the .cs file is not to be intended to be a part of template and it will be generated. change value of ReplaceParameters for your .tt file to true to enable parameter replacement in this file. now right after </TemplateContent> end element, write this: <WizardExtension>   <Assembly>Framework.Messages.Design</Assembly>   <FullClassName>Framework.Messages.Design.MessageDesigner</FullClassName> </WizardExtension>   one other thing that you should do is to edit your .tt file and remove your .cs file. Lines 8 and 9 of your .tt file should be:     var connectionString = "$connectionString$";     var systemName = "$systemName$"; this parameters will be replaced when the item is added to a project. Save the contents to a zip file with same file name and replace the original file.   now again build your VSIX project, uninstall your extension. close VS. now run .vsix file. open vs, add an item of type messages to your project, bingo, your wizard form will show up. fill the fields and click ok, values are replaced in .tt file added.     that’s it. tried so hard to make this post brief, hope it was not so long…   Cheers Maziar

    Read the article

  • Down Tools Week Cometh: Kissing Goodbye to CVs/Resumes and Cover Letters

    - by Bart Read
    I haven't blogged about what I'm doing in my (not so new) temporary role as Red Gate's technical recruiter, mostly because it's been routine, business as usual stuff, and because I've been trying to understand the role by doing it. I think now though the time has come to get a little more radical, so I'm going to tell you why I want to largely eliminate CVs/resumes and cover letters from the application process for some of our technical roles, and why I think that might be a good thing for candidates (and for us). I have a terrible confession to make, or at least it's a terrible confession for a recruiter: I don't really like CV sifting, or reading cover letters, and, unless I've misread the mood around here, neither does anybody else. It's dull, it's time-consuming, and it's somewhat soul destroying because, when all is said and done, you're being paid to be incredibly judgemental about people based on relatively little information. I feel like I've dirtied myself by saying that - I mean, after all, it's a core part of my job - but it sucks, it really does. (And, of course, the truth is I'm still a software engineer at heart, and I'm always looking for ways to do things better.) On the flip side, I've never met anyone who likes writing their CV. It takes hours and hours of faffing around and massaging it into shape, and the whole process is beset by a gnawing anxiety, frustration, and insecurity. All you really want is a chance to demonstrate your skills - not just talk about them - and how do you do that in a CV or cover letter? Often the best candidates will include samples of their work (a portfolio, screenshots, links to websites, product downloads, etc.), but sometimes this isn't possible, or may not be appropriate, or you just don't think you're allowed because of what your school/university careers service has told you (more commonly an issue with grads, obviously). And what are we actually trying to find out about people with all of this? I think the common criteria are actually pretty basic: Smart Gets things done (thanks for these two Joel) Not an a55hole* (sorry, have to get around Simple Talk's swear filter - and thanks to Professor Robert I. Sutton for this one) *Of course, everyone has off days, and I don't honestly think we're too worried about somebody being a bit grumpy every now and again. We can do a bit better than this in the context of the roles I'm talking about: we can be more specific about what "gets things done" means, at least in part. For software engineers and interns, the non-exhaustive meaning of "gets things done" is: Excellent coder For test engineers, the non-exhaustive meaning of "gets things done" is: Good at finding problems in software Competent coder Team player, etc., to me, are covered by "not an a55hole". I don't expect people to be the life and soul of the party, or a wild extrovert - that's not what team player means, and it's not what "not an a55hole" means. Some of our best technical staff are quiet, introverted types, but they're still pleasant to work with. My problem is that I don't think the initial sift really helps us find out whether people are smart and get things done with any great efficacy. It's better than nothing, for sure, but it's not as good as it could be. It's also contentious, and potentially unfair/inequitable - if you want to get an idea of what I mean by this, check out the background information section at the bottom. Before I go any further, let's look at the Red Gate recruitment process for technical staff* as it stands now: (LOTS of) People apply for jobs. All these applications go through a brutal process of manual sifting, which eliminates between 75 and 90% of them, depending upon the role, and the time of year**. Depending upon the role, those who pass the sift will be sent an assessment or telescreened. For the purposes of this blog post I'm only interested in those that are sent some sort of programming assessment, or bug hunt. This means software engineers, test engineers, and software interns, which are the roles for which I receive the most applications. The telescreen tends to be reserved for project or product managers. Those that pass the assessment are invited in for first interview. This interview is mostly about assessing their technical skills***, although we're obviously on the look out for cultural fit red flags as well. If the first interview goes well we'll invite candidates back for a second interview. This is where team/cultural fit is really scoped out. We also use this interview to dive more deeply into certain areas of their skillset, and explore any concerns that may have come out of the first interview (these obviously won't have been serious or obvious enough to cause a rejection at that point, but are things we do need to look into before we'd consider making an offer). We might subsequently invite them in for lunch before we make them an offer. This tends to happen when we're recruiting somebody for a specific team and we'd like them to meet all the people they'll be working with directly. It's not an interview per se, but can prove pivotal if they don't gel with the team. Anyone who's made it this far will receive an offer from us. *We have a slightly quirky definition of "technical staff" as it relates to the technical recruiter role here. It includes software engineers, test engineers, software interns, user experience specialists, technical authors, project managers, product managers, and development managers, but does not include product support or information systems roles. **For example, the quality of graduate applicants overall noticeably drops as the academic year wears on, which is not to say that by now there aren't still stars in there, just that they're fewer and further between. ***Some organisations prefer to assess for team fit first, but I think assessing technical skills is a more effective initial filter - if they're the nicest person in the world, but can't cut a line of code they're not going to work out. Now, as I suggested in the title, Red Gate's Down Tools Week is upon us once again - next week in fact - and I had proposed as a project that we refactor and automate the first stage of marking our programming assessments. Marking assessments, and in fact organising the marking of them, is a somewhat time-consuming process, and we receive many assessment solutions that just don't make the cut, for whatever reason. Whilst I don't think it's possible to fully automate marking, I do think it ought to be possible to run a suite of automated tests over each candidate's solution to see whether or not it behaves correctly and, if it does, move on to a manual stage where we examine the code for structure, decomposition, style, readability, maintainability, etc. Obviously it's possible to use tools to generate potentially helpful metrics for some of these indices as well. This would obviously reduce the marking workload, and would provide candidates with quicker feedback about whether they've been successful - though I do wonder if waiting a tactful interval before sending a (nicely written) rejection might be wise. I duly scrawled out a picture of my ideal process, which looked like this: The problem is, as soon as I'd roughed it out, I realised that fundamentally it wasn't an ideal process at all, which explained the gnawing feeling of cognitive dissonance I'd been wrestling with all week, whilst I'd been trying to find time to do this. Here's what I mean. Automated assessment marking, and the associated infrastructure around that, makes it much easier for us to deal with large numbers of assessments. This means we can be much more permissive about who we send assessments out to or, in other words, we can give more candidates the opportunity to really demonstrate their skills to us. And this leads to a question: why not give everyone the opportunity to demonstrate their skills, to show that they're smart and can get things done? (Two or three of us even discussed this in the down tools week hustings earlier this week.) And isn't this a lot simpler than the alternative we'd been considering? (FYI, this was automated CV/cover letter sifting by some form of textual analysis to ideally eliminate the worst 50% or so of applications based on an analysis of the 20,000 or so historical applications we've received since 2007 - definitely not the basic keyword analysis beloved of recruitment agencies, since this would eliminate hardly anyone who was awful, but definitely would eliminate stellar Oxbridge candidates - #fail - or some nightmarishly complex Google-like system where we profile all our currently employees, only to realise that we're never going to get representative results because we don't have a statistically significant sample size in any given role - also #fail.) No, I think the new way is better. We let people self-select. We make them the masters (or mistresses) of their own destiny. We give applicants the power - we put their fate in their hands - by giving them the chance to demonstrate their skills, which is what they really want anyway, instead of requiring that they spend hours and hours creating a CV and cover letter that I'm going to evaluate for suitability, and make a value judgement about, in approximately 1 minute (give or take). It doesn't matter what university you attended, it doesn't matter if you had a bad year when you took your A-levels - here's your chance to shine, so take it and run with it. (As a side benefit, we cut the number of applications we have to sift by something like two thirds.) WIN! OK, yeah, sounds good, but will it actually work? That's an excellent question. My gut feeling is yes, and I'll justify why below (and hopefully have gone some way towards doing that above as well), but what I'm proposing here is really that we run an experiment for a period of time - probably a couple of months or so - and measure the outcomes we see: How many people apply? (Wouldn't be surprised or alarmed to see this cut by a factor of ten.) How many of them submit a good assessment? (More/less than at present?) How much overhead is there for us in dealing with these assessments compared to now? What are the success and failure rates at each interview stage compared to now? How many people are we hiring at the end of it compared to now? I think it'll work because I hypothesize that, amongst other things: It self-selects for people who really want to work at Red Gate which, at the moment, is something I have to try and assess based on their CV and cover letter - but if you're not that bothered about working here, why would you complete the assessment? Candidates who would submit a shoddy application probably won't feel motivated to do the assessment. Candidates who would demonstrate good attention to detail in their CV/cover letter will demonstrate good attention to detail in the assessment. In general, only the better candidates will complete and submit the assessment. Marking assessments is much less work so we'll be able to deal with any increase that we see (hopefully we will see). There are obviously other questions as well: Is plagiarism going to be a problem? Is there any way we can detect/discourage potential plagiarism? How do we assess candidates' education and experience? What about their ability to communicate in writing? Do we still want them to submit a CV afterwards if they pass assessment? Do we want to offer them the opportunity to tell us a bit about why they'd like the job when they submit their assessment? How does this affect our relationship with recruitment agencies we might use to hire for these roles? So, what's the objective for next week's Down Tools Week? Pretty simple really - we want to implement this process for the Graduate Software Engineer and Software Engineer positions that you can find on our website. I will be joined by a crack team of our best developers (Kevin Boyle, and new Red-Gater, Sam Blackburn), and recruiting hostess with the mostest Laura McQuillen, and hopefully a couple of others as well - if I can successfully twist more arms before Monday.* Hopefully by next Friday our experiment will be up and running, and we may have changed the way Red Gate recruits software engineers for good! Stay tuned and we'll let you know how it goes! *I'm going to play dirty by offering them beer and chocolate during meetings. Some background information: how agonising over the initial CV/cover letter sift helped lead us to bin it off entirely The other day I was agonising about the new university/good degree grade versus poor A-level results issue, and decided to canvas for other opinions to see if there was something I could do that was fairer than my current approach, which is almost always to reject. This generated quite an involved discussion on our Yammer site: I'm sure you can glean a pretty good impression of my own educational prejudices from that discussion as well, although I'm very open to changing my opinion - hopefully you've already figured that out from reading the rest of this post. Hopefully you can also trace a logical path from agonising about sifting to, "Uh, hang on, why on earth are we doing this anyway?!?" Technorati Tags: recruitment,hr,developers,testers,red gate,cv,resume,cover letter,assessment,sea change

    Read the article

  • C#/.NET Little Wonders: Interlocked CompareExchange()

    - by James Michael Hare
    Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. Two posts ago, I discussed the Interlocked Add(), Increment(), and Decrement() methods (here) for adding and subtracting values in a thread-safe, lightweight manner.  Then, last post I talked about the Interlocked Read() and Exchange() methods (here) for safely and efficiently reading and setting 32 or 64 bit values (or references).  This week, we’ll round out the discussion by talking about the Interlocked CompareExchange() method and how it can be put to use to exchange a value if the current value is what you expected it to be. Dirty reads can lead to bad results Many of the uses of Interlocked that we’ve explored so far have centered around either reading, setting, or adding values.  But what happens if you want to do something more complex such as setting a value based on the previous value in some manner? Perhaps you were creating an application that reads a current balance, applies a deposit, and then saves the new modified balance, where of course you’d want that to happen atomically.  If you read the balance, then go to save the new balance and between that time the previous balance has already changed, you’ll have an issue!  Think about it, if we read the current balance as $400, and we are applying a new deposit of $50.75, but meanwhile someone else deposits $200 and sets the total to $600, but then we write a total of $450.75 we’ve lost $200! Now, certainly for int and long values we can use Interlocked.Add() to handles these cases, and it works well for that.  But what if we want to work with doubles, for example?  Let’s say we wanted to add the numbers from 0 to 99,999 in parallel.  We could do this by spawning several parallel tasks to continuously add to a total: 1: double total = 0; 2:  3: Parallel.For(0, 10000, next => 4: { 5: total += next; 6: }); Were this run on one thread using a standard for loop, we’d expect an answer of 4,999,950,000 (the sum of all numbers from 0 to 99,999).  But when we run this in parallel as written above, we’ll likely get something far off.  The result of one of my runs, for example, was 1,281,880,740.  That is way off!  If this were banking software we’d be in big trouble with our clients.  So what happened?  The += operator is not atomic, it will read in the current value, add the result, then store it back into the total.  At any point in all of this another thread could read a “dirty” current total and accidentally “skip” our add.   So, to clean this up, we could use a lock to guarantee concurrency: 1: double total = 0.0; 2: object locker = new object(); 3:  4: Parallel.For(0, count, next => 5: { 6: lock (locker) 7: { 8: total += next; 9: } 10: }); Which will give us the correct result of 4,999,950,000.  One thing to note is that locking can be heavy, especially if the operation being locked over is trivial, or the life of the lock is a high percentage of the work being performed concurrently.  In the case above, the lock consumes pretty much all of the time of each parallel task – and the task being locked on is relatively trivial. Now, let me put in a disclaimer here before we go further: For most uses, lock is more than sufficient for your needs, and is often the simplest solution!    So, if lock is sufficient for most needs, why would we ever consider another solution?  The problem with locking is that it can suspend execution of your thread while it waits for the signal that the lock is free.  Moreover, if the operation being locked over is trivial, the lock can add a very high level of overhead.  This is why things like Interlocked.Increment() perform so well, instead of locking just to perform an increment, we perform the increment with an atomic, lockless method. As with all things performance related, it’s important to profile before jumping to the conclusion that you should optimize everything in your path.  If your profiling shows that locking is causing a high level of waiting in your application, then it’s time to consider lighter alternatives such as Interlocked. CompareExchange() – Exchange existing value if equal some value So let’s look at how we could use CompareExchange() to solve our problem above.  The general syntax of CompareExchange() is: T CompareExchange<T>(ref T location, T newValue, T expectedValue) If the value in location == expectedValue, then newValue is exchanged.  Either way, the value in location (before exchange) is returned. Actually, CompareExchange() is not one method, but a family of overloaded methods that can take int, long, float, double, pointers, or references.  It cannot take other value types (that is, can’t CompareExchange() two DateTime instances directly).  Also keep in mind that the version that takes any reference type (the generic overload) only checks for reference equality, it does not call any overridden Equals(). So how does this help us?  Well, we can grab the current total, and exchange the new value if total hasn’t changed.  This would look like this: 1: // grab the snapshot 2: double current = total; 3:  4: // if the total hasn’t changed since I grabbed the snapshot, then 5: // set it to the new total 6: Interlocked.CompareExchange(ref total, current + next, current); So what the code above says is: if the amount in total (1st arg) is the same as the amount in current (3rd arg), then set total to current + next (2nd arg).  This check and exchange pair is atomic (and thus thread-safe). This works if total is the same as our snapshot in current, but the problem, is what happens if they aren’t the same?  Well, we know that in either case we will get the previous value of total (before the exchange), back as a result.  Thus, we can test this against our snapshot to see if it was the value we expected: 1: // if the value returned is != current, then our snapshot must be out of date 2: // which means we didn't (and shouldn't) apply current + next 3: if (Interlocked.CompareExchange(ref total, current + next, current) != current) 4: { 5: // ooops, total was not equal to our snapshot in current, what should we do??? 6: } So what do we do if we fail?  That’s up to you and the problem you are trying to solve.  It’s possible you would decide to abort the whole transaction, or perhaps do a lightweight spin and try again.  Let’s try that: 1: double current = total; 2:  3: // make first attempt... 4: if (Interlocked.CompareExchange(ref total, current + i, current) != current) 5: { 6: // if we fail, go into a spin wait, spin, and try again until succeed 7: var spinner = new SpinWait(); 8:  9: do 10: { 11: spinner.SpinOnce(); 12: current = total; 13: } 14: while (Interlocked.CompareExchange(ref total, current + i, current) != current); 15: } 16:  This is not trivial code, but it illustrates a possible use of CompareExchange().  What we are doing is first checking to see if we succeed on the first try, and if so great!  If not, we create a SpinWait and then repeat the process of SpinOnce(), grab a fresh snapshot, and repeat until CompareExchnage() succeeds.  You may wonder why not a simple do-while here, and the reason it’s more efficient to only create the SpinWait until we absolutely know we need one, for optimal efficiency. Though not as simple (or maintainable) as a simple lock, this will perform better in many situations.  Comparing an unlocked (and wrong) version, a version using lock, and the Interlocked of the code, we get the following average times for multiple iterations of adding the sum of 100,000 numbers: 1: Unlocked money average time: 2.1 ms 2: Locked money average time: 5.1 ms 3: Interlocked money average time: 3 ms So the Interlocked.CompareExchange(), while heavier to code, came in lighter than the lock, offering a good compromise of safety and performance when we need to reduce contention. CompareExchange() - it’s not just for adding stuff… So that was one simple use of CompareExchange() in the context of adding double values -- which meant we couldn’t have used the simpler Interlocked.Add() -- but it has other uses as well. If you think about it, this really works anytime you want to create something new based on a current value without using a full lock.  For example, you could use it to create a simple lazy instantiation implementation.  In this case, we want to set the lazy instance only if the previous value was null: 1: public static class Lazy<T> where T : class, new() 2: { 3: private static T _instance; 4:  5: public static T Instance 6: { 7: get 8: { 9: // if current is null, we need to create new instance 10: if (_instance == null) 11: { 12: // attempt create, it will only set if previous was null 13: Interlocked.CompareExchange(ref _instance, new T(), (T)null); 14: } 15:  16: return _instance; 17: } 18: } 19: } So, if _instance == null, this will create a new T() and attempt to exchange it with _instance.  If _instance is not null, then it does nothing and we discard the new T() we created. This is a way to create lazy instances of a type where we are more concerned about locking overhead than creating an accidental duplicate which is not used.  In fact, the BCL implementation of Lazy<T> offers a similar thread-safety choice for Publication thread safety, where it will not guarantee only one instance was created, but it will guarantee that all readers get the same instance.  Another possible use would be in concurrent collections.  Let’s say, for example, that you are creating your own brand new super stack that uses a linked list paradigm and is “lock free”.  We could use Interlocked.CompareExchange() to be able to do a lockless Push() which could be more efficient in multi-threaded applications where several threads are pushing and popping on the stack concurrently. Yes, there are already concurrent collections in the BCL (in .NET 4.0 as part of the TPL), but it’s a fun exercise!  So let’s assume we have a node like this: 1: public sealed class Node<T> 2: { 3: // the data for this node 4: public T Data { get; set; } 5:  6: // the link to the next instance 7: internal Node<T> Next { get; set; } 8: } Then, perhaps, our stack’s Push() operation might look something like: 1: public sealed class SuperStack<T> 2: { 3: private volatile T _head; 4:  5: public void Push(T value) 6: { 7: var newNode = new Node<int> { Data = value, Next = _head }; 8:  9: if (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next) 10: { 11: var spinner = new SpinWait(); 12:  13: do 14: { 15: spinner.SpinOnce(); 16: newNode.Next = _head; 17: } 18: while (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next); 19: } 20: } 21:  22: // ... 23: } Notice a similar paradigm here as with adding our doubles before.  What we are doing is creating the new Node with the data to push, and with a Next value being the original node referenced by _head.  This will create our stack behavior (LIFO – Last In, First Out).  Now, we have to set _head to now refer to the newNode, but we must first make sure it hasn’t changed! So we check to see if _head has the same value we saved in our snapshot as newNode.Next, and if so, we set _head to newNode.  This is all done atomically, and the result is _head’s original value, as long as the original value was what we assumed it was with newNode.Next, then we are good and we set it without a lock!  If not, we SpinWait and try again. Once again, this is much lighter than locking in highly parallelized code with lots of contention.  If I compare the method above with a similar class using lock, I get the following results for pushing 100,000 items: 1: Locked SuperStack average time: 6 ms 2: Interlocked SuperStack average time: 4.5 ms So, once again, we can get more efficient than a lock, though there is the cost of added code complexity.  Fortunately for you, most of the concurrent collection you’d ever need are already created for you in the System.Collections.Concurrent (here) namespace – for more information, see my Little Wonders – The Concurent Collections Part 1 (here), Part 2 (here), and Part 3 (here). Summary We’ve seen before how the Interlocked class can be used to safely and efficiently add, increment, decrement, read, and exchange values in a multi-threaded environment.  In addition to these, Interlocked CompareExchange() can be used to perform more complex logic without the need of a lock when lock contention is a concern. The added efficiency, though, comes at the cost of more complex code.  As such, the standard lock is often sufficient for most thread-safety needs.  But if profiling indicates you spend a lot of time waiting for locks, or if you just need a lock for something simple such as an increment, decrement, read, exchange, etc., then consider using the Interlocked class’s methods to reduce wait. Technorati Tags: C#,CSharp,.NET,Little Wonders,Interlocked,CompareExchange,threading,concurrency

    Read the article

  • Guidance: A Branching strategy for Scrum Teams

    - by Martin Hinshelwood
    Having a good branching strategy will save your bacon, or at least your code. Be careful when deviating from your branching strategy because if you do, you may be worse off than when you started! This is one possible branching strategy for Scrum teams and I will not be going in depth with Scrum but you can find out more about Scrum by reading the Scrum Guide and you can even assess your Scrum knowledge by having a go at the Scrum Open Assessment. You can also read SSW’s Rules to Better Scrum using TFS which have been developed during our own Scrum implementations. Acknowledgements Bill Heys – Bill offered some good feedback on this post and helped soften the language. Note: Bill is a VS ALM Ranger and co-wrote the Branching Guidance for TFS 2010 Willy-Peter Schaub – Willy-Peter is an ex Visual Studio ALM MVP turned blue badge and has been involved in most of the guidance including the Branching Guidance for TFS 2010 Chris Birmele – Chris wrote some of the early TFS Branching and Merging Guidance. Dr Paul Neumeyer, Ph.D Parallel Processes, ScrumMaster and SSW Solution Architect – Paul wanted to have feature branches coming from the release branch as well. We agreed that this is really a spin-off that needs own project, backlog, budget and Team. Scenario: A product is developed RTM 1.0 is released and gets great sales.  Extra features are demanded but the new version will have double to price to pay to recover costs, work is approved by the guys with budget and a few sprints later RTM 2.0 is released.  Sales a very low due to the pricing strategy. There are lots of clients on RTM 1.0 calling out for patches. As I keep getting Reverse Integration and Forward Integration mixed up and Bill keeps slapping my wrists I thought I should have a reminder: You still seemed to use reverse and/or forward integration in the wrong context. I would recommend reviewing your document at the end to ensure that it agrees with the common understanding of these terms merge (forward integration) from parent to child (same direction as the branch), and merge  (reverse integration) from child to parent (the reverse direction of the branch). - one of my many slaps on the wrist from Bill Heys.   As I mentioned previously we are using a single feature branching strategy in our current project. The single biggest mistake developers make is developing against the “Main” or “Trunk” line. This ultimately leads to messy code as things are added and never finished. Your only alternative is to NEVER check in unless your code is 100%, but this does not work in practice, even with a single developer. Your ADD will kick in and your half-finished code will be finished enough to pass the build and the tests. You do use builds don’t you? Sadly, this is a very common scenario and I have had people argue that branching merely adds complexity. Then again I have seen the other side of the universe ... branching  structures from he... We should somehow convince everyone that there is a happy between no-branching and too-much-branching. - Willy-Peter Schaub, VS ALM Ranger, Microsoft   A key benefit of branching for development is to isolate changes from the stable Main branch. Branching adds sanity more than it adds complexity. We do try to stress in our guidance that it is important to justify a branch, by doing a cost benefit analysis. The primary cost is the effort to do merges and resolve conflicts. A key benefit is that you have a stable code base in Main and accept changes into Main only after they pass quality gates, etc. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft The second biggest mistake developers make is branching anything other than the WHOLE “Main” line. If you branch parts of your code and not others it gets out of sync and can make integration a nightmare. You should have your Source, Assets, Build scripts deployment scripts and dependencies inside the “Main” folder and branch the whole thing. Some departments within MSFT even go as far as to add the environments used to develop the product in there as well; although I would not recommend that unless you have a massive SQL cluster to house your source code. We tried the “add environment” back in South-Africa and while it was “phenomenal”, especially when having to switch between environments, the disk storage and processing requirements killed us. We opted for virtualization to skin this cat of keeping a ready-to-go environment handy. - Willy-Peter Schaub, VS ALM Ranger, Microsoft   I think people often think that you should have separate branches for separate environments (e.g. Dev, Test, Integration Test, QA, etc.). I prefer to think of deploying to environments (such as from Main to QA) rather than branching for QA). - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft   You can read about SSW’s Rules to better Source Control for some additional information on what Source Control to use and how to use it. There are also a number of branching Anti-Patterns that should be avoided at all costs: You know you are on the wrong track if you experience one or more of the following symptoms in your development environment: Merge Paranoia—avoiding merging at all cost, usually because of a fear of the consequences. Merge Mania—spending too much time merging software assets instead of developing them. Big Bang Merge—deferring branch merging to the end of the development effort and attempting to merge all branches simultaneously. Never-Ending Merge—continuous merging activity because there is always more to merge. Wrong-Way Merge—merging a software asset version with an earlier version. Branch Mania—creating many branches for no apparent reason. Cascading Branches—branching but never merging back to the main line. Mysterious Branches—branching for no apparent reason. Temporary Branches—branching for changing reasons, so the branch becomes a permanent temporary workspace. Volatile Branches—branching with unstable software assets shared by other branches or merged into another branch. Note   Branches are volatile most of the time while they exist as independent branches. That is the point of having them. The difference is that you should not share or merge branches while they are in an unstable state. Development Freeze—stopping all development activities while branching, merging, and building new base lines. Berlin Wall—using branches to divide the development team members, instead of dividing the work they are performing. -Branching and Merging Primer by Chris Birmele - Developer Tools Technical Specialist at Microsoft Pty Ltd in Australia   In fact, this can result in a merge exercise no-one wants to be involved in, merging hundreds of thousands of change sets and trying to get a consolidated build. Again, we need to find a happy medium. - Willy-Peter Schaub on Merge Paranoia Merge conflicts are generally the result of making changes to the same file in both the target and source branch. If you create merge conflicts, you will eventually need to resolve them. Often the resolution is manual. Merging more frequently allows you to resolve these conflicts close to when they happen, making the resolution clearer. Waiting weeks or months to resolve them, the Big Bang approach, means you are more likely to resolve conflicts incorrectly. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft   Figure: Main line, this is where your stable code lives and where any build has known entities, always passes and has a happy test that passes as well? Many development projects consist of, a single “Main” line of source and artifacts. This is good; at least there is source control . There are however a couple of issues that need to be considered. What happens if: you and your team are working on a new set of features and the customer wants a change to his current version? you are working on two features and the customer decides to abandon one of them? you have two teams working on different feature sets and their changes start interfering with each other? I just use labels instead of branches? That's a lot of “what if’s”, but there is a simple way of preventing this. Branching… In TFS, labels are not immutable. This does not mean they are not useful. But labels do not provide a very good development isolation mechanism. Branching allows separate code sets to evolve separately (e.g. Current with hotfixes, and vNext with new development). I don’t see how labels work here. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft   Figure: Creating a single feature branch means you can isolate the development work on that branch.   Its standard practice for large projects with lots of developers to use Feature branching and you can check the Branching Guidance for the latest recommendations from the Visual Studio ALM Rangers for other methods. In the diagram above you can see my recommendation for branching when using Scrum development with TFS 2010. It consists of a single Sprint branch to contain all the changes for the current sprint. The main branch has the permissions changes so contributors to the project can only Branch and Merge with “Main”. This will prevent accidental check-ins or checkouts of the “Main” line that would contaminate the code. The developers continue to develop on sprint one until the completion of the sprint. Note: In the real world, starting a new Greenfield project, this process starts at Sprint 2 as at the start of Sprint 1 you would have artifacts in version control and no need for isolation.   Figure: Once the sprint is complete the Sprint 1 code can then be merged back into the Main line. There are always good practices to follow, and one is to always do a Forward Integration from Main into Sprint 1 before you do a Reverse Integration from Sprint 1 back into Main. In this case it may seem superfluous, but this builds good muscle memory into your developer’s work ethic and means that no bad habits are learned that would interfere with additional Scrum Teams being added to the Product. The process of completing your sprint development: The Team completes their work according to their definition of done. Merge from “Main” into “Sprint1” (Forward Integration) Stabilize your code with any changes coming from other Scrum Teams working on the same product. If you have one Scrum Team this should be quick, but there may have been bug fixes in the Release branches. (we will talk about release branches later) Merge from “Sprint1” into “Main” to commit your changes. (Reverse Integration) Check-in Delete the Sprint1 branch Note: The Sprint 1 branch is no longer required as its useful life has been concluded. Check-in Done But you are not yet done with the Sprint. The goal in Scrum is to have a “potentially shippable product” at the end of every Sprint, and we do not have that yet, we only have finished code.   Figure: With Sprint 1 merged you can create a Release branch and run your final packaging and testing In 99% of all projects I have been involved in or watched, a “shippable product” only happens towards the end of the overall lifecycle, especially when sprints are short. The in-between releases are great demonstration releases, but not shippable. Perhaps it comes from my 80’s brain washing that we only ship when we reach the agreed quality and business feature bar. - Willy-Peter Schaub, VS ALM Ranger, Microsoft Although you should have been testing and packaging your code all the way through your Sprint 1 development, preferably using an automated process, you still need to test and package with stable unchanging code. This is where you do what at SSW we call a “Test Please”. This is first an internal test of the product to make sure it meets the needs of the customer and you generally use a resource external to your Team. Then a “Test Please” is conducted with the Product Owner to make sure he is happy with the output. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client?   Figure: If you find a deviation from the expected result you fix it on the Release branch. If during your final testing or your “Test Please” you find there are issues or bugs then you should fix them on the release branch. If you can’t fix them within the time box of your Sprint, then you will need to create a Bug and put it onto the backlog for prioritization by the Product owner. Make sure you leave plenty of time between your merge from the development branch to find and fix any problems that are uncovered. This process is commonly called Stabilization and should always be conducted once you have completed all of your User Stories and integrated all of your branches. Even once you have stabilized and released, you should not delete the release branch as you would with the Sprint branch. It has a usefulness for servicing that may extend well beyond the limited life you expect of it. Note: Don't get forced by the business into adding features into a Release branch instead that indicates the unspoken requirement is that they are asking for a product spin-off. In this case you can create a new Team Project and branch from the required Release branch to create a new Main branch for that product. And you create a whole new backlog to work from.   Figure: When the Team decides it is happy with the product you can create a RTM branch. Once you have fixed all the bugs you can, and added any you can’t to the Product Backlog, and you Team is happy with the result you can create a Release. This would consist of doing the final Build and Packaging it up ready for your Sprint Review meeting. You would then create a read-only branch that represents the code you “shipped”. This is really an Audit trail branch that is optional, but is good practice. You could use a Label, but Labels are not Auditable and if a dispute was raised by the customer you can produce a verifiable version of the source code for an independent party to check. Rare I know, but you do not want to be at the wrong end of a legal battle. Like the Release branch the RTM branch should never be deleted, or only deleted according to your companies legal policy, which in the UK is usually 7 years.   Figure: If you have made any changes in the Release you will need to merge back up to Main in order to finalise the changes. Nothing is really ever done until it is in Main. The same rules apply when merging any fixes in the Release branch back into Main and you should do a reverse merge before a forward merge, again for the muscle memory more than necessity at this stage. Your Sprint is now nearly complete, and you can have a Sprint Review meeting knowing that you have made every effort and taken every precaution to protect your customer’s investment. Note: In order to really achieve protection for both you and your client you would add Automated Builds, Automated Tests, Automated Acceptance tests, Acceptance test tracking, Unit Tests, Load tests, Web test and all the other good engineering practices that help produce reliable software.     Figure: After the Sprint Planning meeting the process begins again. Where the Sprint Review and Retrospective meetings mark the end of the Sprint, the Sprint Planning meeting marks the beginning. After you have completed your Sprint Planning and you know what you are trying to achieve in Sprint 2 you can create your new Branch to develop in. How do we handle a bug(s) in production that can’t wait? Although in Scrum the only work done should be on the backlog there should be a little buffer added to the Sprint Planning for contingencies. One of these contingencies is a bug in the current release that can’t wait for the Sprint to finish. But how do you handle that? Willy-Peter Schaub asked an excellent question on the release activities: In reality Sprint 2 starts when sprint 1 ends + weekend. Should we not cater for a possible parallelism between Sprint 2 and the release activities of sprint 1? It would introduce FI’s from main to sprint 2, I guess. Your “Figure: Merging print 2 back into Main.” covers, what I tend to believe to be reality in most cases. - Willy-Peter Schaub, VS ALM Ranger, Microsoft I agree, and if you have a single Scrum team then your resources are limited. The Scrum Team is responsible for packaging and release, so at least one run at stabilization, package and release should be included in the Sprint time box. If more are needed on the current production release during the Sprint 2 time box then resource needs to be pulled from Sprint 2. The Product Owner and the Team have four choices (in order of disruption/cost): Backlog: Add the bug to the backlog and fix it in the next Sprint Buffer Time: Use any buffer time included in the current Sprint to fix the bug quickly Make time: Remove a Story from the current Sprint that is of equal value to the time lost fixing the bug(s) and releasing. Note: The Team must agree that it can still meet the Sprint Goal. Cancel Sprint: Cancel the sprint and concentrate all resource on fixing the bug(s) Note: This can be a very costly if the current sprint has already had a lot of work completed as it will be lost. The choice will depend on the complexity and severity of the bug(s) and both the Product Owner and the Team need to agree. In this case we will go with option #2 or #3 as they are uncomplicated but severe bugs. Figure: Real world issue where a bug needs fixed in the current release. If the bug(s) is urgent enough then then your only option is to fix it in place. You can edit the release branch to find and fix the bug, hopefully creating a test so it can’t happen again. Follow the prior process and conduct an internal and customer “Test Please” before releasing. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client?   Figure: After you have fixed the bug you need to ship again. You then need to again create an RTM branch to hold the version of the code you released in escrow.   Figure: Main is now out of sync with your Release. We now need to get these new changes back up into the Main branch. Do a reverse and then forward merge again to get the new code into Main. But what about the branch, are developers not working on Sprint 2? Does Sprint 2 now have changes that are not in Main and Main now have changes that are not in Sprint 2? Well, yes… and this is part of the hit you take doing branching. But would this scenario even have been possible without branching?   Figure: Getting the changes in Main into Sprint 2 is very important. The Team now needs to do a Forward Integration merge into their Sprint and resolve any conflicts that occur. Maybe the bug has already been fixed in Sprint 2, maybe the bug no longer exists! This needs to be identified and resolved by the developers before they continue to get further out of Sync with Main. Note: Avoid the “Big bang merge” at all costs.   Figure: Merging Sprint 2 back into Main, the Forward Integration, and R0 terminates. Sprint 2 now merges (Reverse Integration) back into Main following the procedures we have already established.   Figure: The logical conclusion. This then allows the creation of the next release. By now you should be getting the big picture and hopefully you learned something useful from this post. I know I have enjoyed writing it as I find these exploratory posts coupled with real world experience really help harden my understanding.  Branching is a tool; it is not a silver bullet. Don’t over use it, and avoid “Anti-Patterns” where possible. Although the diagram above looks complicated I hope showing you how it is formed simplifies it as much as possible.   Technorati Tags: Branching,Scrum,VS ALM,TFS 2010,VS2010

    Read the article

  • C#/.NET Little Wonders: The Concurrent Collections (1 of 3)

    - by James Michael Hare
    Once again we consider some of the lesser known classes and keywords of C#.  In the next few weeks, we will discuss the concurrent collections and how they have changed the face of concurrent programming. This week’s post will begin with a general introduction and discuss the ConcurrentStack<T> and ConcurrentQueue<T>.  Then in the following post we’ll discuss the ConcurrentDictionary<T> and ConcurrentBag<T>.  Finally, we shall close on the third post with a discussion of the BlockingCollection<T>. For more of the "Little Wonders" posts, see the index here. A brief history of collections In the beginning was the .NET 1.0 Framework.  And out of this framework emerged the System.Collections namespace, and it was good.  It contained all the basic things a growing programming language needs like the ArrayList and Hashtable collections.  The main problem, of course, with these original collections is that they held items of type object which means you had to be disciplined enough to use them correctly or you could end up with runtime errors if you got an object of a type you weren't expecting. Then came .NET 2.0 and generics and our world changed forever!  With generics the C# language finally got an equivalent of the very powerful C++ templates.  As such, the System.Collections.Generic was born and we got type-safe versions of all are favorite collections.  The List<T> succeeded the ArrayList and the Dictionary<TKey,TValue> succeeded the Hashtable and so on.  The new versions of the library were not only safer because they checked types at compile-time, in many cases they were more performant as well.  So much so that it's Microsoft's recommendation that the System.Collections original collections only be used for backwards compatibility. So we as developers came to know and love the generic collections and took them into our hearts and embraced them.  The problem is, thread safety in both the original collections and the generic collections can be problematic, for very different reasons. Now, if you are only doing single-threaded development you may not care – after all, no locking is required.  Even if you do have multiple threads, if a collection is “load-once, read-many” you don’t need to do anything to protect that container from multi-threaded access, as illustrated below: 1: public static class OrderTypeTranslator 2: { 3: // because this dictionary is loaded once before it is ever accessed, we don't need to synchronize 4: // multi-threaded read access 5: private static readonly Dictionary<string, char> _translator = new Dictionary<string, char> 6: { 7: {"New", 'N'}, 8: {"Update", 'U'}, 9: {"Cancel", 'X'} 10: }; 11:  12: // the only public interface into the dictionary is for reading, so inherently thread-safe 13: public static char? Translate(string orderType) 14: { 15: char charValue; 16: if (_translator.TryGetValue(orderType, out charValue)) 17: { 18: return charValue; 19: } 20:  21: return null; 22: } 23: } Unfortunately, most of our computer science problems cannot get by with just single-threaded applications or with multi-threading in a load-once manner.  Looking at  today's trends, it's clear to see that computers are not so much getting faster because of faster processor speeds -- we've nearly reached the limits we can push through with today's technologies -- but more because we're adding more cores to the boxes.  With this new hardware paradigm, it is even more important to use multi-threaded applications to take full advantage of parallel processing to achieve higher application speeds. So let's look at how to use collections in a thread-safe manner. Using historical collections in a concurrent fashion The early .NET collections (System.Collections) had a Synchronized() static method that could be used to wrap the early collections to make them completely thread-safe.  This paradigm was dropped in the generic collections (System.Collections.Generic) because having a synchronized wrapper resulted in atomic locks for all operations, which could prove overkill in many multithreading situations.  Thus the paradigm shifted to having the user of the collection specify their own locking, usually with an external object: 1: public class OrderAggregator 2: { 3: private static readonly Dictionary<string, List<Order>> _orders = new Dictionary<string, List<Order>>(); 4: private static readonly _orderLock = new object(); 5:  6: public void Add(string accountNumber, Order newOrder) 7: { 8: List<Order> ordersForAccount; 9:  10: // a complex operation like this should all be protected 11: lock (_orderLock) 12: { 13: if (!_orders.TryGetValue(accountNumber, out ordersForAccount)) 14: { 15: _orders.Add(accountNumber, ordersForAccount = new List<Order>()); 16: } 17:  18: ordersForAccount.Add(newOrder); 19: } 20: } 21: } Notice how we’re performing several operations on the dictionary under one lock.  With the Synchronized() static methods of the early collections, you wouldn’t be able to specify this level of locking (a more macro-level).  So in the generic collections, it was decided that if a user needed synchronization, they could implement their own locking scheme instead so that they could provide synchronization as needed. The need for better concurrent access to collections Here’s the problem: it’s relatively easy to write a collection that locks itself down completely for access, but anything more complex than that can be difficult and error-prone to write, and much less to make it perform efficiently!  For example, what if you have a Dictionary that has frequent reads but in-frequent updates?  Do you want to lock down the entire Dictionary for every access?  This would be overkill and would prevent concurrent reads.  In such cases you could use something like a ReaderWriterLockSlim which allows for multiple readers in a lock, and then once a writer grabs the lock it blocks all further readers until the writer is done (in a nutshell).  This is all very complex stuff to consider. Fortunately, this is where the Concurrent Collections come in.  The Parallel Computing Platform team at Microsoft went through great pains to determine how to make a set of concurrent collections that would have the best performance characteristics for general case multi-threaded use. Now, as in all things involving threading, you should always make sure you evaluate all your container options based on the particular usage scenario and the degree of parallelism you wish to acheive. This article should not be taken to understand that these collections are always supperior to the generic collections. Each fills a particular need for a particular situation. Understanding what each container is optimized for is key to the success of your application whether it be single-threaded or multi-threaded. General points to consider with the concurrent collections The MSDN points out that the concurrent collections all support the ICollection interface. However, since the collections are already synchronized, the IsSynchronized property always returns false, and SyncRoot always returns null.  Thus you should not attempt to use these properties for synchronization purposes. Note that since the concurrent collections also may have different operations than the traditional data structures you may be used to.  Now you may ask why they did this, but it was done out of necessity to keep operations safe and atomic.  For example, in order to do a Pop() on a stack you have to know the stack is non-empty, but between the time you check the stack’s IsEmpty property and then do the Pop() another thread may have come in and made the stack empty!  This is why some of the traditional operations have been changed to make them safe for concurrent use. In addition, some properties and methods in the concurrent collections achieve concurrency by creating a snapshot of the collection, which means that some operations that were traditionally O(1) may now be O(n) in the concurrent models.  I’ll try to point these out as we talk about each collection so you can be aware of any potential performance impacts.  Finally, all the concurrent containers are safe for enumeration even while being modified, but some of the containers support this in different ways (snapshot vs. dirty iteration).  Once again I’ll highlight how thread-safe enumeration works for each collection. ConcurrentStack<T>: The thread-safe LIFO container The ConcurrentStack<T> is the thread-safe counterpart to the System.Collections.Generic.Stack<T>, which as you may remember is your standard last-in-first-out container.  If you think of algorithms that favor stack usage (for example, depth-first searches of graphs and trees) then you can see how using a thread-safe stack would be of benefit. The ConcurrentStack<T> achieves thread-safe access by using System.Threading.Interlocked operations.  This means that the multi-threaded access to the stack requires no traditional locking and is very, very fast! For the most part, the ConcurrentStack<T> behaves like it’s Stack<T> counterpart with a few differences: Pop() was removed in favor of TryPop() Returns true if an item existed and was popped and false if empty. PushRange() and TryPopRange() were added Allows you to push multiple items and pop multiple items atomically. Count takes a snapshot of the stack and then counts the items. This means it is a O(n) operation, if you just want to check for an empty stack, call IsEmpty instead which is O(1). ToArray() and GetEnumerator() both also take snapshots. This means that iteration over a stack will give you a static view at the time of the call and will not reflect updates. Pushing on a ConcurrentStack<T> works just like you’d expect except for the aforementioned PushRange() method that was added to allow you to push a range of items concurrently. 1: var stack = new ConcurrentStack<string>(); 2:  3: // adding to stack is much the same as before 4: stack.Push("First"); 5:  6: // but you can also push multiple items in one atomic operation (no interleaves) 7: stack.PushRange(new [] { "Second", "Third", "Fourth" }); For looking at the top item of the stack (without removing it) the Peek() method has been removed in favor of a TryPeek().  This is because in order to do a peek the stack must be non-empty, but between the time you check for empty and the time you execute the peek the stack contents may have changed.  Thus the TryPeek() was created to be an atomic check for empty, and then peek if not empty: 1: // to look at top item of stack without removing it, can use TryPeek. 2: // Note that there is no Peek(), this is because you need to check for empty first. TryPeek does. 3: string item; 4: if (stack.TryPeek(out item)) 5: { 6: Console.WriteLine("Top item was " + item); 7: } 8: else 9: { 10: Console.WriteLine("Stack was empty."); 11: } Finally, to remove items from the stack, we have the TryPop() for single, and TryPopRange() for multiple items.  Just like the TryPeek(), these operations replace Pop() since we need to ensure atomically that the stack is non-empty before we pop from it: 1: // to remove items, use TryPop or TryPopRange to get multiple items atomically (no interleaves) 2: if (stack.TryPop(out item)) 3: { 4: Console.WriteLine("Popped " + item); 5: } 6:  7: // TryPopRange will only pop up to the number of spaces in the array, the actual number popped is returned. 8: var poppedItems = new string[2]; 9: int numPopped = stack.TryPopRange(poppedItems); 10:  11: foreach (var theItem in poppedItems.Take(numPopped)) 12: { 13: Console.WriteLine("Popped " + theItem); 14: } Finally, note that as stated before, GetEnumerator() and ToArray() gets a snapshot of the data at the time of the call.  That means if you are enumerating the stack you will get a snapshot of the stack at the time of the call.  This is illustrated below: 1: var stack = new ConcurrentStack<string>(); 2:  3: // adding to stack is much the same as before 4: stack.Push("First"); 5:  6: var results = stack.GetEnumerator(); 7:  8: // but you can also push multiple items in one atomic operation (no interleaves) 9: stack.PushRange(new [] { "Second", "Third", "Fourth" }); 10:  11: while(results.MoveNext()) 12: { 13: Console.WriteLine("Stack only has: " + results.Current); 14: } The only item that will be printed out in the above code is "First" because the snapshot was taken before the other items were added. This may sound like an issue, but it’s really for safety and is more correct.  You don’t want to enumerate a stack and have half a view of the stack before an update and half a view of the stack after an update, after all.  In addition, note that this is still thread-safe, whereas iterating through a non-concurrent collection while updating it in the old collections would cause an exception. ConcurrentQueue<T>: The thread-safe FIFO container The ConcurrentQueue<T> is the thread-safe counterpart of the System.Collections.Generic.Queue<T> class.  The concurrent queue uses an underlying list of small arrays and lock-free System.Threading.Interlocked operations on the head and tail arrays.  Once again, this allows us to do thread-safe operations without the need for heavy locks! The ConcurrentQueue<T> (like the ConcurrentStack<T>) has some departures from the non-concurrent counterpart.  Most notably: Dequeue() was removed in favor of TryDequeue(). Returns true if an item existed and was dequeued and false if empty. Count does not take a snapshot It subtracts the head and tail index to get the count.  This results overall in a O(1) complexity which is quite good.  It’s still recommended, however, that for empty checks you call IsEmpty instead of comparing Count to zero. ToArray() and GetEnumerator() both take snapshots. This means that iteration over a queue will give you a static view at the time of the call and will not reflect updates. The Enqueue() method on the ConcurrentQueue<T> works much the same as the generic Queue<T>: 1: var queue = new ConcurrentQueue<string>(); 2:  3: // adding to queue is much the same as before 4: queue.Enqueue("First"); 5: queue.Enqueue("Second"); 6: queue.Enqueue("Third"); For front item access, the TryPeek() method must be used to attempt to see the first item if the queue.  There is no Peek() method since, as you’ll remember, we can only peek on a non-empty queue, so we must have an atomic TryPeek() that checks for empty and then returns the first item if the queue is non-empty. 1: // to look at first item in queue without removing it, can use TryPeek. 2: // Note that there is no Peek(), this is because you need to check for empty first. TryPeek does. 3: string item; 4: if (queue.TryPeek(out item)) 5: { 6: Console.WriteLine("First item was " + item); 7: } 8: else 9: { 10: Console.WriteLine("Queue was empty."); 11: } Then, to remove items you use TryDequeue().  Once again this is for the same reason we have TryPeek() and not Peek(): 1: // to remove items, use TryDequeue. If queue is empty returns false. 2: if (queue.TryDequeue(out item)) 3: { 4: Console.WriteLine("Dequeued first item " + item); 5: } Just like the concurrent stack, the ConcurrentQueue<T> takes a snapshot when you call ToArray() or GetEnumerator() which means that subsequent updates to the queue will not be seen when you iterate over the results.  Thus once again the code below will only show the first item, since the other items were added after the snapshot. 1: var queue = new ConcurrentQueue<string>(); 2:  3: // adding to queue is much the same as before 4: queue.Enqueue("First"); 5:  6: var iterator = queue.GetEnumerator(); 7:  8: queue.Enqueue("Second"); 9: queue.Enqueue("Third"); 10:  11: // only shows First 12: while (iterator.MoveNext()) 13: { 14: Console.WriteLine("Dequeued item " + iterator.Current); 15: } Using collections concurrently You’ll notice in the examples above I stuck to using single-threaded examples so as to make them deterministic and the results obvious.  Of course, if we used these collections in a truly multi-threaded way the results would be less deterministic, but would still be thread-safe and with no locking on your part required! For example, say you have an order processor that takes an IEnumerable<Order> and handles each other in a multi-threaded fashion, then groups the responses together in a concurrent collection for aggregation.  This can be done easily with the TPL’s Parallel.ForEach(): 1: public static IEnumerable<OrderResult> ProcessOrders(IEnumerable<Order> orderList) 2: { 3: var proxy = new OrderProxy(); 4: var results = new ConcurrentQueue<OrderResult>(); 5:  6: // notice that we can process all these in parallel and put the results 7: // into our concurrent collection without needing any external locking! 8: Parallel.ForEach(orderList, 9: order => 10: { 11: var result = proxy.PlaceOrder(order); 12:  13: results.Enqueue(result); 14: }); 15:  16: return results; 17: } Summary Obviously, if you do not need multi-threaded safety, you don’t need to use these collections, but when you do need multi-threaded collections these are just the ticket! The plethora of features (I always think of the movie The Three Amigos when I say plethora) built into these containers and the amazing way they acheive thread-safe access in an efficient manner is wonderful to behold. Stay tuned next week where we’ll continue our discussion with the ConcurrentBag<T> and the ConcurrentDictionary<TKey,TValue>. For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this wonderful whitepaper by the Microsoft Parallel Computing Platform team here.   Tweet Technorati Tags: C#,.NET,Concurrent Collections,Collections,Multi-Threading,Little Wonders,BlackRabbitCoder,James Michael Hare

    Read the article

  • C#/.NET Little Wonders: The ConcurrentDictionary

    - by James Michael Hare
    Once again we consider some of the lesser known classes and keywords of C#.  In this series of posts, we will discuss how the concurrent collections have been developed to help alleviate these multi-threading concerns.  Last week’s post began with a general introduction and discussed the ConcurrentStack<T> and ConcurrentQueue<T>.  Today's post discusses the ConcurrentDictionary<T> (originally I had intended to discuss ConcurrentBag this week as well, but ConcurrentDictionary had enough information to create a very full post on its own!).  Finally next week, we shall close with a discussion of the ConcurrentBag<T> and BlockingCollection<T>. For more of the "Little Wonders" posts, see the index here. Recap As you'll recall from the previous post, the original collections were object-based containers that accomplished synchronization through a Synchronized member.  While these were convenient because you didn't have to worry about writing your own synchronization logic, they were a bit too finely grained and if you needed to perform multiple operations under one lock, the automatic synchronization didn't buy much. With the advent of .NET 2.0, the original collections were succeeded by the generic collections which are fully type-safe, but eschew automatic synchronization.  This cuts both ways in that you have a lot more control as a developer over when and how fine-grained you want to synchronize, but on the other hand if you just want simple synchronization it creates more work. With .NET 4.0, we get the best of both worlds in generic collections.  A new breed of collections was born called the concurrent collections in the System.Collections.Concurrent namespace.  These amazing collections are fine-tuned to have best overall performance for situations requiring concurrent access.  They are not meant to replace the generic collections, but to simply be an alternative to creating your own locking mechanisms. Among those concurrent collections were the ConcurrentStack<T> and ConcurrentQueue<T> which provide classic LIFO and FIFO collections with a concurrent twist.  As we saw, some of the traditional methods that required calls to be made in a certain order (like checking for not IsEmpty before calling Pop()) were replaced in favor of an umbrella operation that combined both under one lock (like TryPop()). Now, let's take a look at the next in our series of concurrent collections!For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this wonderful whitepaper by the Microsoft Parallel Computing Platform team here. ConcurrentDictionary – the fully thread-safe dictionary The ConcurrentDictionary<TKey,TValue> is the thread-safe counterpart to the generic Dictionary<TKey, TValue> collection.  Obviously, both are designed for quick – O(1) – lookups of data based on a key.  If you think of algorithms where you need lightning fast lookups of data and don’t care whether the data is maintained in any particular ordering or not, the unsorted dictionaries are generally the best way to go. Note: as a side note, there are sorted implementations of IDictionary, namely SortedDictionary and SortedList which are stored as an ordered tree and a ordered list respectively.  While these are not as fast as the non-sorted dictionaries – they are O(log2 n) – they are a great combination of both speed and ordering -- and still greatly outperform a linear search. Now, once again keep in mind that if all you need to do is load a collection once and then allow multi-threaded reading you do not need any locking.  Examples of this tend to be situations where you load a lookup or translation table once at program start, then keep it in memory for read-only reference.  In such cases locking is completely non-productive. However, most of the time when we need a concurrent dictionary we are interleaving both reads and updates.  This is where the ConcurrentDictionary really shines!  It achieves its thread-safety with no common lock to improve efficiency.  It actually uses a series of locks to provide concurrent updates, and has lockless reads!  This means that the ConcurrentDictionary gets even more efficient the higher the ratio of reads-to-writes you have. ConcurrentDictionary and Dictionary differences For the most part, the ConcurrentDictionary<TKey,TValue> behaves like it’s Dictionary<TKey,TValue> counterpart with a few differences.  Some notable examples of which are: Add() does not exist in the concurrent dictionary. This means you must use TryAdd(), AddOrUpdate(), or GetOrAdd().  It also means that you can’t use a collection initializer with the concurrent dictionary. TryAdd() replaced Add() to attempt atomic, safe adds. Because Add() only succeeds if the item doesn’t already exist, we need an atomic operation to check if the item exists, and if not add it while still under an atomic lock. TryUpdate() was added to attempt atomic, safe updates. If we want to update an item, we must make sure it exists first and that the original value is what we expected it to be.  If all these are true, we can update the item under one atomic step. TryRemove() was added to attempt atomic, safe removes. To safely attempt to remove a value we need to see if the key exists first, this checks for existence and removes under an atomic lock. AddOrUpdate() was added to attempt an thread-safe “upsert”. There are many times where you want to insert into a dictionary if the key doesn’t exist, or update the value if it does.  This allows you to make a thread-safe add-or-update. GetOrAdd() was added to attempt an thread-safe query/insert. Sometimes, you want to query for whether an item exists in the cache, and if it doesn’t insert a starting value for it.  This allows you to get the value if it exists and insert if not. Count, Keys, Values properties take a snapshot of the dictionary. Accessing these properties may interfere with add and update performance and should be used with caution. ToArray() returns a static snapshot of the dictionary. That is, the dictionary is locked, and then copied to an array as a O(n) operation.  GetEnumerator() is thread-safe and efficient, but allows dirty reads. Because reads require no locking, you can safely iterate over the contents of the dictionary.  The only downside is that, depending on timing, you may get dirty reads. Dirty reads during iteration The last point on GetEnumerator() bears some explanation.  Picture a scenario in which you call GetEnumerator() (or iterate using a foreach, etc.) and then, during that iteration the dictionary gets updated.  This may not sound like a big deal, but it can lead to inconsistent results if used incorrectly.  The problem is that items you already iterated over that are updated a split second after don’t show the update, but items that you iterate over that were updated a split second before do show the update.  Thus you may get a combination of items that are “stale” because you iterated before the update, and “fresh” because they were updated after GetEnumerator() but before the iteration reached them. Let’s illustrate with an example, let’s say you load up a concurrent dictionary like this: 1: // load up a dictionary. 2: var dictionary = new ConcurrentDictionary<string, int>(); 3:  4: dictionary["A"] = 1; 5: dictionary["B"] = 2; 6: dictionary["C"] = 3; 7: dictionary["D"] = 4; 8: dictionary["E"] = 5; 9: dictionary["F"] = 6; Then you have one task (using the wonderful TPL!) to iterate using dirty reads: 1: // attempt iteration in a separate thread 2: var iterationTask = new Task(() => 3: { 4: // iterates using a dirty read 5: foreach (var pair in dictionary) 6: { 7: Console.WriteLine(pair.Key + ":" + pair.Value); 8: } 9: }); And one task to attempt updates in a separate thread (probably): 1: // attempt updates in a separate thread 2: var updateTask = new Task(() => 3: { 4: // iterates, and updates the value by one 5: foreach (var pair in dictionary) 6: { 7: dictionary[pair.Key] = pair.Value + 1; 8: } 9: }); Now that we’ve done this, we can fire up both tasks and wait for them to complete: 1: // start both tasks 2: updateTask.Start(); 3: iterationTask.Start(); 4:  5: // wait for both to complete. 6: Task.WaitAll(updateTask, iterationTask); Now, if I you didn’t know about the dirty reads, you may have expected to see the iteration before the updates (such as A:1, B:2, C:3, D:4, E:5, F:6).  However, because the reads are dirty, we will quite possibly get a combination of some updated, some original.  My own run netted this result: 1: F:6 2: E:6 3: D:5 4: C:4 5: B:3 6: A:2 Note that, of course, iteration is not in order because ConcurrentDictionary, like Dictionary, is unordered.  Also note that both E and F show the value 6.  This is because the output task reached F before the update, but the updates for the rest of the items occurred before their output (probably because console output is very slow, comparatively). If we want to always guarantee that we will get a consistent snapshot to iterate over (that is, at the point we ask for it we see precisely what is in the dictionary and no subsequent updates during iteration), we should iterate over a call to ToArray() instead: 1: // attempt iteration in a separate thread 2: var iterationTask = new Task(() => 3: { 4: // iterates using a dirty read 5: foreach (var pair in dictionary.ToArray()) 6: { 7: Console.WriteLine(pair.Key + ":" + pair.Value); 8: } 9: }); The atomic Try…() methods As you can imagine TryAdd() and TryRemove() have few surprises.  Both first check the existence of the item to determine if it can be added or removed based on whether or not the key currently exists in the dictionary: 1: // try add attempts an add and returns false if it already exists 2: if (dictionary.TryAdd("G", 7)) 3: Console.WriteLine("G did not exist, now inserted with 7"); 4: else 5: Console.WriteLine("G already existed, insert failed."); TryRemove() also has the virtue of returning the value portion of the removed entry matching the given key: 1: // attempt to remove the value, if it exists it is removed and the original is returned 2: int removedValue; 3: if (dictionary.TryRemove("C", out removedValue)) 4: Console.WriteLine("Removed C and its value was " + removedValue); 5: else 6: Console.WriteLine("C did not exist, remove failed."); Now TryUpdate() is an interesting creature.  You might think from it’s name that TryUpdate() first checks for an item’s existence, and then updates if the item exists, otherwise it returns false.  Well, note quite... It turns out when you call TryUpdate() on a concurrent dictionary, you pass it not only the new value you want it to have, but also the value you expected it to have before the update.  If the item exists in the dictionary, and it has the value you expected, it will update it to the new value atomically and return true.  If the item is not in the dictionary or does not have the value you expected, it is not modified and false is returned. 1: // attempt to update the value, if it exists and if it has the expected original value 2: if (dictionary.TryUpdate("G", 42, 7)) 3: Console.WriteLine("G existed and was 7, now it's 42."); 4: else 5: Console.WriteLine("G either didn't exist, or wasn't 7."); The composite Add methods The ConcurrentDictionary also has composite add methods that can be used to perform updates and gets, with an add if the item is not existing at the time of the update or get. The first of these, AddOrUpdate(), allows you to add a new item to the dictionary if it doesn’t exist, or update the existing item if it does.  For example, let’s say you are creating a dictionary of counts of stock ticker symbols you’ve subscribed to from a market data feed: 1: public sealed class SubscriptionManager 2: { 3: private readonly ConcurrentDictionary<string, int> _subscriptions = new ConcurrentDictionary<string, int>(); 4:  5: // adds a new subscription, or increments the count of the existing one. 6: public void AddSubscription(string tickerKey) 7: { 8: // add a new subscription with count of 1, or update existing count by 1 if exists 9: var resultCount = _subscriptions.AddOrUpdate(tickerKey, 1, (symbol, count) => count + 1); 10:  11: // now check the result to see if we just incremented the count, or inserted first count 12: if (resultCount == 1) 13: { 14: // subscribe to symbol... 15: } 16: } 17: } Notice the update value factory Func delegate.  If the key does not exist in the dictionary, the add value is used (in this case 1 representing the first subscription for this symbol), but if the key already exists, it passes the key and current value to the update delegate which computes the new value to be stored in the dictionary.  The return result of this operation is the value used (in our case: 1 if added, existing value + 1 if updated). Likewise, the GetOrAdd() allows you to attempt to retrieve a value from the dictionary, and if the value does not currently exist in the dictionary it will insert a value.  This can be handy in cases where perhaps you wish to cache data, and thus you would query the cache to see if the item exists, and if it doesn’t you would put the item into the cache for the first time: 1: public sealed class PriceCache 2: { 3: private readonly ConcurrentDictionary<string, double> _cache = new ConcurrentDictionary<string, double>(); 4:  5: // adds a new subscription, or increments the count of the existing one. 6: public double QueryPrice(string tickerKey) 7: { 8: // check for the price in the cache, if it doesn't exist it will call the delegate to create value. 9: return _cache.GetOrAdd(tickerKey, symbol => GetCurrentPrice(symbol)); 10: } 11:  12: private double GetCurrentPrice(string tickerKey) 13: { 14: // do code to calculate actual true price. 15: } 16: } There are other variations of these two methods which vary whether a value is provided or a factory delegate, but otherwise they work much the same. Oddities with the composite Add methods The AddOrUpdate() and GetOrAdd() methods are totally thread-safe, on this you may rely, but they are not atomic.  It is important to note that the methods that use delegates execute those delegates outside of the lock.  This was done intentionally so that a user delegate (of which the ConcurrentDictionary has no control of course) does not take too long and lock out other threads. This is not necessarily an issue, per se, but it is something you must consider in your design.  The main thing to consider is that your delegate may get called to generate an item, but that item may not be the one returned!  Consider this scenario: A calls GetOrAdd and sees that the key does not currently exist, so it calls the delegate.  Now thread B also calls GetOrAdd and also sees that the key does not currently exist, and for whatever reason in this race condition it’s delegate completes first and it adds its new value to the dictionary.  Now A is done and goes to get the lock, and now sees that the item now exists.  In this case even though it called the delegate to create the item, it will pitch it because an item arrived between the time it attempted to create one and it attempted to add it. Let’s illustrate, assume this totally contrived example program which has a dictionary of char to int.  And in this dictionary we want to store a char and it’s ordinal (that is, A = 1, B = 2, etc).  So for our value generator, we will simply increment the previous value in a thread-safe way (perhaps using Interlocked): 1: public static class Program 2: { 3: private static int _nextNumber = 0; 4:  5: // the holder of the char to ordinal 6: private static ConcurrentDictionary<char, int> _dictionary 7: = new ConcurrentDictionary<char, int>(); 8:  9: // get the next id value 10: public static int NextId 11: { 12: get { return Interlocked.Increment(ref _nextNumber); } 13: } Then, we add a method that will perform our insert: 1: public static void Inserter() 2: { 3: for (int i = 0; i < 26; i++) 4: { 5: _dictionary.GetOrAdd((char)('A' + i), key => NextId); 6: } 7: } Finally, we run our test by starting two tasks to do this work and get the results… 1: public static void Main() 2: { 3: // 3 tasks attempting to get/insert 4: var tasks = new List<Task> 5: { 6: new Task(Inserter), 7: new Task(Inserter) 8: }; 9:  10: tasks.ForEach(t => t.Start()); 11: Task.WaitAll(tasks.ToArray()); 12:  13: foreach (var pair in _dictionary.OrderBy(p => p.Key)) 14: { 15: Console.WriteLine(pair.Key + ":" + pair.Value); 16: } 17: } If you run this with only one task, you get the expected A:1, B:2, ..., Z:26.  But running this in parallel you will get something a bit more complex.  My run netted these results: 1: A:1 2: B:3 3: C:4 4: D:5 5: E:6 6: F:7 7: G:8 8: H:9 9: I:10 10: J:11 11: K:12 12: L:13 13: M:14 14: N:15 15: O:16 16: P:17 17: Q:18 18: R:19 19: S:20 20: T:21 21: U:22 22: V:23 23: W:24 24: X:25 25: Y:26 26: Z:27 Notice that B is 3?  This is most likely because both threads attempted to call GetOrAdd() at roughly the same time and both saw that B did not exist, thus they both called the generator and one thread got back 2 and the other got back 3.  However, only one of those threads can get the lock at a time for the actual insert, and thus the one that generated the 3 won and the 3 was inserted and the 2 got discarded.  This is why on these methods your factory delegates should be careful not to have any logic that would be unsafe if the value they generate will be pitched in favor of another item generated at roughly the same time.  As such, it is probably a good idea to keep those generators as stateless as possible. Summary The ConcurrentDictionary is a very efficient and thread-safe version of the Dictionary generic collection.  It has all the benefits of type-safety that it’s generic collection counterpart does, and in addition is extremely efficient especially when there are more reads than writes concurrently. Tweet Technorati Tags: C#, .NET, Concurrent Collections, Collections, Little Wonders, Black Rabbit Coder,James Michael Hare

    Read the article

  • C#/.NET Little Wonders: The Generic Func Delegates

    - by James Michael Hare
    Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. Back in one of my three original “Little Wonders” Trilogy of posts, I had listed generic delegates as one of the Little Wonders of .NET.  Later, someone posted a comment saying said that they would love more detail on the generic delegates and their uses, since my original entry just scratched the surface of them. Last week, I began our look at some of the handy generic delegates built into .NET with a description of delegates in general, and the Action family of delegates.  For this week, I’ll launch into a look at the Func family of generic delegates and how they can be used to support generic, reusable algorithms and classes. Quick Delegate Recap Delegates are similar to function pointers in C++ in that they allow you to store a reference to a method.  They can store references to either static or instance methods, and can actually be used to chain several methods together in one delegate. Delegates are very type-safe and can be satisfied with any standard method, anonymous method, or a lambda expression.  They can also be null as well (refers to no method), so care should be taken to make sure that the delegate is not null before you invoke it. Delegates are defined using the keyword delegate, where the delegate’s type name is placed where you would typically place the method name: 1: // This delegate matches any method that takes string, returns nothing 2: public delegate void Log(string message); This delegate defines a delegate type named Log that can be used to store references to any method(s) that satisfies its signature (whether instance, static, lambda expression, etc.). Delegate instances then can be assigned zero (null) or more methods using the operator = which replaces the existing delegate chain, or by using the operator += which adds a method to the end of a delegate chain: 1: // creates a delegate instance named currentLogger defaulted to Console.WriteLine (static method) 2: Log currentLogger = Console.Out.WriteLine; 3:  4: // invokes the delegate, which writes to the console out 5: currentLogger("Hi Standard Out!"); 6:  7: // append a delegate to Console.Error.WriteLine to go to std error 8: currentLogger += Console.Error.WriteLine; 9:  10: // invokes the delegate chain and writes message to std out and std err 11: currentLogger("Hi Standard Out and Error!"); While delegates give us a lot of power, it can be cumbersome to re-create fairly standard delegate definitions repeatedly, for this purpose the generic delegates were introduced in various stages in .NET.  These support various method types with particular signatures. Note: a caveat with generic delegates is that while they can support multiple parameters, they do not match methods that contains ref or out parameters. If you want to a delegate to represent methods that takes ref or out parameters, you will need to create a custom delegate. We’ve got the Func… delegates Just like it’s cousin, the Action delegate family, the Func delegate family gives us a lot of power to use generic delegates to make classes and algorithms more generic.  Using them keeps us from having to define a new delegate type when need to make a class or algorithm generic. Remember that the point of the Action delegate family was to be able to perform an “action” on an item, with no return results.  Thus Action delegates can be used to represent most methods that take 0 to 16 arguments but return void.  You can assign a method The Func delegate family was introduced in .NET 3.5 with the advent of LINQ, and gives us the power to define a function that can be called on 0 to 16 arguments and returns a result.  Thus, the main difference between Action and Func, from a delegate perspective, is that Actions return nothing, but Funcs return a result. The Func family of delegates have signatures as follows: Func<TResult> – matches a method that takes no arguments, and returns value of type TResult. Func<T, TResult> – matches a method that takes an argument of type T, and returns value of type TResult. Func<T1, T2, TResult> – matches a method that takes arguments of type T1 and T2, and returns value of type TResult. Func<T1, T2, …, TResult> – and so on up to 16 arguments, and returns value of type TResult. These are handy because they quickly allow you to be able to specify that a method or class you design will perform a function to produce a result as long as the method you specify meets the signature. For example, let’s say you were designing a generic aggregator, and you wanted to allow the user to define how the values will be aggregated into the result (i.e. Sum, Min, Max, etc…).  To do this, we would ask the user of our class to pass in a method that would take the current total, the next value, and produce a new total.  A class like this could look like: 1: public sealed class Aggregator<TValue, TResult> 2: { 3: // holds method that takes previous result, combines with next value, creates new result 4: private Func<TResult, TValue, TResult> _aggregationMethod; 5:  6: // gets or sets the current result of aggregation 7: public TResult Result { get; private set; } 8:  9: // construct the aggregator given the method to use to aggregate values 10: public Aggregator(Func<TResult, TValue, TResult> aggregationMethod = null) 11: { 12: if (aggregationMethod == null) throw new ArgumentNullException("aggregationMethod"); 13:  14: _aggregationMethod = aggregationMethod; 15: } 16:  17: // method to add next value 18: public void Aggregate(TValue nextValue) 19: { 20: // performs the aggregation method function on the current result and next and sets to current result 21: Result = _aggregationMethod(Result, nextValue); 22: } 23: } Of course, LINQ already has an Aggregate extension method, but that works on a sequence of IEnumerable<T>, whereas this is designed to work more with aggregating single results over time (such as keeping track of a max response time for a service). We could then use this generic aggregator to find the sum of a series of values over time, or the max of a series of values over time (among other things): 1: // creates an aggregator that adds the next to the total to sum the values 2: var sumAggregator = new Aggregator<int, int>((total, next) => total + next); 3:  4: // creates an aggregator (using static method) that returns the max of previous result and next 5: var maxAggregator = new Aggregator<int, int>(Math.Max); So, if we were timing the response time of a web method every time it was called, we could pass that response time to both of these aggregators to get an idea of the total time spent in that web method, and the max time spent in any one call to the web method: 1: // total will be 13 and max 13 2: int responseTime = 13; 3: sumAggregator.Aggregate(responseTime); 4: maxAggregator.Aggregate(responseTime); 5:  6: // total will be 20 and max still 13 7: responseTime = 7; 8: sumAggregator.Aggregate(responseTime); 9: maxAggregator.Aggregate(responseTime); 10:  11: // total will be 40 and max now 20 12: responseTime = 20; 13: sumAggregator.Aggregate(responseTime); 14: maxAggregator.Aggregate(responseTime); The Func delegate family is useful for making generic algorithms and classes, and in particular allows the caller of the method or user of the class to specify a function to be performed in order to generate a result. What is the result of a Func delegate chain? If you remember, we said earlier that you can assign multiple methods to a delegate by using the += operator to chain them.  So how does this affect delegates such as Func that return a value, when applied to something like the code below? 1: Func<int, int, int> combo = null; 2:  3: // What if we wanted to aggregate the sum and max together? 4: combo += (total, next) => total + next; 5: combo += Math.Max; 6:  7: // what is the result? 8: var comboAggregator = new Aggregator<int, int>(combo); Well, in .NET if you chain multiple methods in a delegate, they will all get invoked, but the result of the delegate is the result of the last method invoked in the chain.  Thus, this aggregator would always result in the Math.Max() result.  The other chained method (the sum) gets executed first, but it’s result is thrown away: 1: // result is 13 2: int responseTime = 13; 3: comboAggregator.Aggregate(responseTime); 4:  5: // result is still 13 6: responseTime = 7; 7: comboAggregator.Aggregate(responseTime); 8:  9: // result is now 20 10: responseTime = 20; 11: comboAggregator.Aggregate(responseTime); So remember, you can chain multiple Func (or other delegates that return values) together, but if you do so you will only get the last executed result. Func delegates and co-variance/contra-variance in .NET 4.0 Just like the Action delegate, as of .NET 4.0, the Func delegate family is contra-variant on its arguments.  In addition, it is co-variant on its return type.  To support this, in .NET 4.0 the signatures of the Func delegates changed to: Func<out TResult> – matches a method that takes no arguments, and returns value of type TResult (or a more derived type). Func<in T, out TResult> – matches a method that takes an argument of type T (or a less derived type), and returns value of type TResult(or a more derived type). Func<in T1, in T2, out TResult> – matches a method that takes arguments of type T1 and T2 (or less derived types), and returns value of type TResult (or a more derived type). Func<in T1, in T2, …, out TResult> – and so on up to 16 arguments, and returns value of type TResult (or a more derived type). Notice the addition of the in and out keywords before each of the generic type placeholders.  As we saw last week, the in keyword is used to specify that a generic type can be contra-variant -- it can match the given type or a type that is less derived.  However, the out keyword, is used to specify that a generic type can be co-variant -- it can match the given type or a type that is more derived. On contra-variance, if you are saying you need an function that will accept a string, you can just as easily give it an function that accepts an object.  In other words, if you say “give me an function that will process dogs”, I could pass you a method that will process any animal, because all dogs are animals.  On the co-variance side, if you are saying you need a function that returns an object, you can just as easily pass it a function that returns a string because any string returned from the given method can be accepted by a delegate expecting an object result, since string is more derived.  Once again, in other words, if you say “give me a method that creates an animal”, I can pass you a method that will create a dog, because all dogs are animals. It really all makes sense, you can pass a more specific thing to a less specific parameter, and you can return a more specific thing as a less specific result.  In other words, pay attention to the direction the item travels (parameters go in, results come out).  Keeping that in mind, you can always pass more specific things in and return more specific things out. For example, in the code below, we have a method that takes a Func<object> to generate an object, but we can pass it a Func<string> because the return type of object can obviously accept a return value of string as well: 1: // since Func<object> is co-variant, this will access Func<string>, etc... 2: public static string Sequence(int count, Func<object> generator) 3: { 4: var builder = new StringBuilder(); 5:  6: for (int i=0; i<count; i++) 7: { 8: object value = generator(); 9: builder.Append(value); 10: } 11:  12: return builder.ToString(); 13: } Even though the method above takes a Func<object>, we can pass a Func<string> because the TResult type placeholder is co-variant and accepts types that are more derived as well: 1: // delegate that's typed to return string. 2: Func<string> stringGenerator = () => DateTime.Now.ToString(); 3:  4: // This will work in .NET 4.0, but not in previous versions 5: Sequence(100, stringGenerator); Previous versions of .NET implemented some forms of co-variance and contra-variance before, but .NET 4.0 goes one step further and allows you to pass or assign an Func<A, BResult> to a Func<Y, ZResult> as long as A is less derived (or same) as Y, and BResult is more derived (or same) as ZResult. Sidebar: The Func and the Predicate A method that takes one argument and returns a bool is generally thought of as a predicate.  Predicates are used to examine an item and determine whether that item satisfies a particular condition.  Predicates are typically unary, but you may also have binary and other predicates as well. Predicates are often used to filter results, such as in the LINQ Where() extension method: 1: var numbers = new[] { 1, 2, 4, 13, 8, 10, 27 }; 2:  3: // call Where() using a predicate which determines if the number is even 4: var evens = numbers.Where(num => num % 2 == 0); As of .NET 3.5, predicates are typically represented as Func<T, bool> where T is the type of the item to examine.  Previous to .NET 3.5, there was a Predicate<T> type that tended to be used (which we’ll discuss next week) and is still supported, but most developers recommend using Func<T, bool> now, as it prevents confusion with overloads that accept unary predicates and binary predicates, etc.: 1: // this seems more confusing as an overload set, because of Predicate vs Func 2: public static SomeMethod(Predicate<int> unaryPredicate) { } 3: public static SomeMethod(Func<int, int, bool> binaryPredicate) { } 4:  5: // this seems more consistent as an overload set, since just uses Func 6: public static SomeMethod(Func<int, bool> unaryPredicate) { } 7: public static SomeMethod(Func<int, int, bool> binaryPredicate) { } Also, even though Predicate<T> and Func<T, bool> match the same signatures, they are separate types!  Thus you cannot assign a Predicate<T> instance to a Func<T, bool> instance and vice versa: 1: // the same method, lambda expression, etc can be assigned to both 2: Predicate<int> isEven = i => (i % 2) == 0; 3: Func<int, bool> alsoIsEven = i => (i % 2) == 0; 4:  5: // but the delegate instances cannot be directly assigned, strongly typed! 6: // ERROR: cannot convert type... 7: isEven = alsoIsEven; 8:  9: // however, you can assign by wrapping in a new instance: 10: isEven = new Predicate<int>(alsoIsEven); 11: alsoIsEven = new Func<int, bool>(isEven); So, the general advice that seems to come from most developers is that Predicate<T> is still supported, but we should use Func<T, bool> for consistency in .NET 3.5 and above. Sidebar: Func as a Generator for Unit Testing One area of difficulty in unit testing can be unit testing code that is based on time of day.  We’d still want to unit test our code to make sure the logic is accurate, but we don’t want the results of our unit tests to be dependent on the time they are run. One way (of many) around this is to create an internal generator that will produce the “current” time of day.  This would default to returning result from DateTime.Now (or some other method), but we could inject specific times for our unit testing.  Generators are typically methods that return (generate) a value for use in a class/method. For example, say we are creating a CacheItem<T> class that represents an item in the cache, and we want to make sure the item shows as expired if the age is more than 30 seconds.  Such a class could look like: 1: // responsible for maintaining an item of type T in the cache 2: public sealed class CacheItem<T> 3: { 4: // helper method that returns the current time 5: private static Func<DateTime> _timeGenerator = () => DateTime.Now; 6:  7: // allows internal access to the time generator 8: internal static Func<DateTime> TimeGenerator 9: { 10: get { return _timeGenerator; } 11: set { _timeGenerator = value; } 12: } 13:  14: // time the item was cached 15: public DateTime CachedTime { get; private set; } 16:  17: // the item cached 18: public T Value { get; private set; } 19:  20: // item is expired if older than 30 seconds 21: public bool IsExpired 22: { 23: get { return _timeGenerator() - CachedTime > TimeSpan.FromSeconds(30.0); } 24: } 25:  26: // creates the new cached item, setting cached time to "current" time 27: public CacheItem(T value) 28: { 29: Value = value; 30: CachedTime = _timeGenerator(); 31: } 32: } Then, we can use this construct to unit test our CacheItem<T> without any time dependencies: 1: var baseTime = DateTime.Now; 2:  3: // start with current time stored above (so doesn't drift) 4: CacheItem<int>.TimeGenerator = () => baseTime; 5:  6: var target = new CacheItem<int>(13); 7:  8: // now add 15 seconds, should still be non-expired 9: CacheItem<int>.TimeGenerator = () => baseTime.AddSeconds(15); 10:  11: Assert.IsFalse(target.IsExpired); 12:  13: // now add 31 seconds, should now be expired 14: CacheItem<int>.TimeGenerator = () => baseTime.AddSeconds(31); 15:  16: Assert.IsTrue(target.IsExpired); Now we can unit test for 1 second before, 1 second after, 1 millisecond before, 1 day after, etc.  Func delegates can be a handy tool for this type of value generation to support more testable code.  Summary Generic delegates give us a lot of power to make truly generic algorithms and classes.  The Func family of delegates is a great way to be able to specify functions to calculate a result based on 0-16 arguments.  Stay tuned in the weeks that follow for other generic delegates in the .NET Framework!   Tweet Technorati Tags: .NET, C#, CSharp, Little Wonders, Generics, Func, Delegates

    Read the article

< Previous Page | 32 33 34 35 36