Of C# Iterators and Performance

Posted by James Michael Hare on Geeks with Blogs See other posts from Geeks with Blogs or by James Michael Hare
Published on Mon, 19 Apr 2010 20:12:55 GMT Indexed on 2010/04/20 3:23 UTC
Read the original article Hit count: 608

Filed under:

Some of you reading this will be wondering, "what is an iterator" and think I'm locked in the world of C++. Nope, I'm talking C# iterators. No, not enumerators, iterators.

So, for those of you who do not know what iterators are in C#, I will explain it in summary, and for those of you who know what iterators are but are curious of the performance impacts, I will explore that as well.

Iterators have been around for a bit now, and there are still a bunch of people who don't know what they are or what they do. I don't know how many times at work I've had a code review on my code and have someone ask me, "what's that yield word do?"

Basically, this post came to me as I was writing some extension methods to extend IEnumerable<T> -- I'll post some of the fun ones in a later post. Since I was filtering the resulting list down, I was using the standard C# iterator concept; but that got me wondering: what are the performance implications of using an iterator versus returning a new enumeration?

So, to begin, let's look at a couple of methods. This is a new (albeit contrived) method called Every(...). The goal of this method is to access and enumeration and return every nth item in the enumeration (including the first). So Every(2) would return items 0, 2, 4, 6, etc.

Now, if you wanted to write this in the traditional way, you may come up with something like this:

public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval)

{

List<T> newList = new List<T>();

int count = 0;

foreach (var i in list)

{

if ((count++ % interval) == 0)

{

newList.Add(i);

}

return newList;

}

So basically this method takes any IEnumerable<T> and returns a new IEnumerable<T> that contains every nth item. Pretty straight forward.

The problem? Well, Every<T>(...) will construct a list containing every nth item whether or not you care. What happens if you were searching this result for a certain item and find that item after five tries? You would have generated the rest of the list for nothing.

Enter iterators. This C# construct uses the yield keyword to effectively defer evaluation of the next item until it is asked for. This can be very handy if the evaluation itself is expensive or if there's a fair chance you'll never want to fully evaluate a list.

We see this all the time in Linq, where many expressions are chained together to do complex processing on a list. This would be very expensive if each of these expressions evaluated their entire possible result set on call.

Let's look at the same example function, this time using an iterator:

public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval)

{

int count = 0;

foreach (var i in list)

{

if ((count++ % interval) == 0)

{

yield return i;

}

Notice it does not create a new return value explicitly, the only evidence of a return is the "yield return" statement. What this means is that when an item is requested from the enumeration, it will enter this method and evaluate until it either hits a yield return (in which case that item is returned) or until it exits the method or hits a yield break (in which case the iteration ends.

Behind the scenes, this is all done with a class that the CLR creates behind the scenes that keeps track of the state of the iteration, so that every time the next item is asked for, it finds that item and then updates the current position so it knows where to start at next time.

It doesn't seem like a big deal, does it? But keep in mind the key point here: it only returns items as they are requested. Thus if there's a good chance you will only process a portion of the return list and/or if the evaluation of each item is expensive, an iterator may be of benefit.

This is especially true if you intend your methods to be chainable similar to the way Linq methods can be chained.

For example, perhaps you have a List<int> and you want to take every tenth one until you find one greater than 10. We could write that as:

List<int> someList = new List<int>();

// fill list here

someList.Every(10).TakeWhile(i => i <= 10);

Now is the difference more apparent? If we use the first form of Every that makes a copy of the list. It's going to copy the entire list whether we will need those items or not, that can be costly!

With the iterator version, however, it will only take items from the list until it finds one that is > 10, at which point no further items in the list are evaluated.

So, sounds neat eh? But what's the cost is what you're probably wondering. So I ran some tests using the two forms of Every above on lists varying from 5 to 500,000 integers and tried various things.

Now, iteration isn't free. If you are more likely than not to iterate the entire collection every time, iterator has some very slight overhead:

Copy vs Iterator on 100% of Collection (10,000 iterations)
Collection Size	Num Iterated	Type	Total ms
5	5	Copy	5
5	5	Iterator	5
50	50	Copy	28
50	50	Iterator	27
500	500	Copy	227
500	500	Iterator	247
5000	5000	Copy	2266
5000	5000	Iterator	2444
50,000	50,000	Copy	24,443
50,000	50,000	Iterator	24,719
500,000	500,000	Copy	250,024
500,000	500,000	Iterator	251,521

Notice that when iterating over the entire produced list, the times for the iterator are a little better for smaller lists, then getting just a slight bit worse for larger lists. In reality, given the number of items and iterations, the result is near negligible, but just to show that iterators come at a price. However, it should also be noted that the form of Every that returns a copy will have a left-over collection to garbage collect.

However, if we only partially evaluate less and less through the list, the savings start to show and make it well worth the overhead. Let's look at what happens if you stop looking after 80% of the list:

Copy vs Iterator on 80% of Collection (10,000 iterations)
Collection Size	Num Iterated	Type	Total ms
5	4	Copy	5
5	4	Iterator	5
50	40	Copy	27
50	40	Iterator	23
500	400	Copy	215
500	400	Iterator	200
5000	4000	Copy	2099
5000	4000	Iterator	1962
50,000	40,000	Copy	22,385
50,000	40,000	Iterator	19,599
500,000	400,000	Copy	236,427
500,000	400,000	Iterator	196,010

Notice that the iterator form is now operating quite a bit faster. But the savings really add up if you stop on average at 50% (which most searches would typically do):

Copy vs Iterator on 50% of Collection (10,000 iterations)
Collection Size	Num Iterated	Type	Total ms
5	2	Copy	5
5	2	Iterator	4
50	25	Copy	25
50	25	Iterator	16
500	250	Copy	188
500	250	Iterator	126
5000	2500	Copy	1854
5000	2500	Iterator	1226
50,000	25,000	Copy	19,839
50,000	25,000	Iterator	12,233
500,000	250,000	Copy	208,667
500,000	250,000	Iterator	122,336

Now we see that if we only expect to go on average 50% into the results, we tend to shave off around 40% of the time. And this is only for one level deep. If we are using this in a chain of query expressions it only adds to the savings.

So my recommendation? If you have a resonable expectation that someone may only want to partially consume your enumerable result, I would always tend to favor an iterator. The cost if they iterate the whole thing does not add much at all -- and if they consume only partially, you reap some really good performance gains.

Next time I'll discuss some of my favorite extensions I've created to make development life a little easier and maintainability a little better.

Developer IT

Of C# Iterators and Performance - Developer IT

Of C# Iterators and Performance

Categories cloud