More Fun with C# Iterators and Generators
- by James Michael Hare
In my last post, I talked quite a bit about iterators and how they can be really powerful tools for filtering a list of items down to a subset of items. This had both pros and cons over returning a full collection, which, in summary, were:
Pros:
If traversal is only partial, does not have to visit rest of collection.
If evaluation method is costly, only incurs that cost on elements visited.
Adds little to no garbage collection pressure.
Cons:
Very slight performance impact if you know caller will always consume all items in collection.
And as we saw in the last post, that con for the cost was very, very small and only really became evident on very tight loops consuming very large lists completely.
One of the key items to note, though, is the garbage! In the traditional (return a new collection) method, if you have a 1,000,000 element collection, and wish to transform or filter it in some way, you have to allocate space for that copy of the collection. That is, say you have a collection of 1,000,000 items and you want to double every item in the collection. Well, that means you have to allocate a collection to hold those 1,000,000 items to return, which is a lot especially if you are just going to use it once and toss it.
Iterators, though, don't have this problem. Each time you visit the node, it would return the doubled value of the node (in this example) and not allocate a second collection of 1,000,000 doubled items. Do you see the distinction? In both cases, we're consuming 1,000,000 items. But in one case we pass back each doubled item which is just an int (for example's sake) on the stack and in the other case, we allocate a list containing 1,000,000 items which then must be garbage collected.
So iterators in C# are pretty cool, eh? Well, here's one more thing a C# iterator can do that a traditional "return a new collection" transformation can't! It can return **unbounded** collections!
I know, I know, that smells a lot like an infinite loop, eh? Yes and no. Basically, you're relying on the caller to put the bounds on the list, and as long as the caller doesn't you keep going. Consider this example:
public static class Fibonacci
{
// returns the infinite fibonacci sequence
public static IEnumerable<int> Sequence()
{
int iteration = 0;
int first = 1;
int second = 1;
int current = 0;
while (true)
{
if (iteration++ < 2)
{
current = 1;
}
else
{
current = first + second;
second = first;
first = current;
}
yield return current;
}
}
}
Whoa, you say! Yes, that's an infinite loop! What the heck is going on there? Yes, that was intentional. Would it be better to have a fibonacci sequence that returns only a specific number of items? Perhaps, but that wouldn't give you the power to defer
the execution to the caller.
The beauty of this function is it is as infinite as the sequence itself! The fibonacci sequence is unbounded, and so is this method. It will continue to return fibonacci numbers for as long as you ask for them. Now that's not something you can do with a traditional method that would return a collection of ints representing each number. In that case you would eventually run out of memory as you got to higher and higher numbers. This method, though, never runs out of memory.
Now, that said, you do have to know when you use it that it is an infinite collection and bound it appropriately. Fortunately, Linq provides a lot of these extension methods for you!
Let's say you only want the first 10 fibonacci numbers:
foreach(var fib in Fibonacci.Sequence().Take(10))
{
Console.WriteLine(fib);
}
Or let's say you only want the fibonacci numbers that are less than 100:
foreach(var fib in Fibonacci.Sequence().TakeWhile(f => f < 100))
{
Console.WriteLine(fib);
}
So, you see, one of the nice things about iterators is their power to work with virtually any size (even infinite) collections without adding the garbage collection overhead of making new collections.
You can also do fun things like this to make a more "fluent" interface for for loops:
// A set of integer generator extension methods
public static class IntExtensions
{
// Begins counting to inifity, use To() to range this.
public static IEnumerable<int> Every(this int start)
{
// deliberately avoiding condition because keeps going
// to infinity for as long as values are pulled.
for (var i = start; ; ++i)
{
yield return i;
}
}
// Begins counting to infinity by the given step value, use To() to
public static IEnumerable<int> Every(this int start, int byEvery)
{
// deliberately avoiding condition because keeps going
// to infinity for as long as values are pulled.
for (var i = start; ; i += byEvery)
{
yield return i;
}
}
// Begins counting to inifity, use To() to range this.
public static IEnumerable<int> To(this int start, int end)
{
for (var i = start; i <= end; ++i)
{
yield return i;
}
}
// Ranges the count by specifying the upper range of the count.
public static IEnumerable<int> To(this IEnumerable<int> collection, int end)
{
return collection.TakeWhile(item => item <= end);
}
}
Note that there are two versions of each method. One that starts with an int and one that starts with an IEnumerable<int>. This
is to allow more power in chaining from either an existing collection or from an int. This lets you do things like:
// count from 1 to 30
foreach(var i in 1.To(30))
{
Console.WriteLine(i);
}
// count from 1 to 10 by 2s
foreach(var i in 0.Every(2).To(10))
{
Console.WriteLine(i);
}
// or, if you want an infinite sequence counting by 5s until something inside breaks you out...
foreach(var i in 0.Every(5))
{
if (someCondition)
{
break;
}
...
}
Yes, those are kinda play functions and not particularly useful, but they show some of the power of generators and
extension methods to form a fluid interface.
So what do you think? What are some of your favorite generators and iterators?