yield: just another sexy C# keyword?
- by George Mamaladze
The yield keyword (see the MSDN C# reference) was introduced with C# 2.0 (.NET 2.0), and my feeling is that it is not as widely used as it could (or should) be.
I am not going to talk here about the necessity and advantages of using the iterator pattern when accessing custom sequences (just google it).
Let’s look at it from a clean-code point of view and see whether it really helps us keep our code understandable, reusable and testable.
Let’s say we want to iterate over a tree and do something with its nodes, for instance calculate the sum of their values. The most elegant way would seem to be a recursive method that performs a classic depth-first traversal and returns the sum.
private int CalculateTreeSum(Node top)
{
    int sumOfChildNodes = 0;
    foreach (Node childNode in top.ChildNodes)
    {
        sumOfChildNodes += CalculateTreeSum(childNode);
    }
    return top.Value + sumOfChildNodes;
}
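The post never shows the Node type. To make the snippets runnable, here is a minimal sketch that assumes Node exposes an int Value and a list of ChildNodes; that shape, the constructor and the wrapper class TreeSum are all assumptions, and the method is made public static only so it can be called from outside.

```csharp
using System.Collections.Generic;

// Hypothetical Node type: the original article does not define it,
// so we assume an int Value plus a list of child nodes.
public class Node
{
    public int Value { get; }
    public List<Node> ChildNodes { get; } = new List<Node>();

    public Node(int value, params Node[] children)
    {
        Value = value;
        ChildNodes.AddRange(children);
    }
}

public static class TreeSum
{
    // Mixed-concern version: traversal and summation in one method.
    public static int CalculateTreeSum(Node top)
    {
        int sumOfChildNodes = 0;
        foreach (Node childNode in top.ChildNodes)
        {
            sumOfChildNodes += CalculateTreeSum(childNode);
        }
        return top.Value + sumOfChildNodes;
    }
}
```

For a tree built as new Node(1, new Node(2, new Node(4)), new Node(3)), CalculateTreeSum returns 10.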
“Do One Thing”
Nevertheless, it violates one of the most important rules: “Do One Thing”. Our method CalculateTreeSum does two things at once. It walks through the tree and performs a computation, in this case calculating the sum. Doing two things in one method is a bad idea for several reasons:
· Understandability: the method is harder to read and harder to refactor.
· Reusability: when overriding, there is no way to override the computation without copying the iteration code, and vice versa.
· Testability: you cannot test the computation without constructing a tree, and you cannot test the correctness of the tree iteration on its own.
I want to spend a few more words on this last issue. How do you test CalculateTreeSum when it contains two in one: computation and iteration? The only option is to construct a test tree and assert the result of the method call, in our case the sum, against our expectation. And if the test fails, you do not know whether the computation or the iteration was wrong. To top it all off: according to Murphy’s Law, the iteration will have a bug as well as the calculation, and the two bugs in combination will make the sum come out exactly as you expect, so the test will PASS. :)
OK, let’s use yield!
That is why it is generally a very good idea to isolate “things” rather than mix them. So let’s use yield!
private int CalculateTreeSumClean(Node top)
{
    IEnumerable<Node> treeNodes = GetTreeNodes(top);
    return CalculateSum(treeNodes);
}

private int CalculateSum(IEnumerable<Node> nodes)
{
    int sumOfNodes = 0;
    foreach (Node node in nodes)
    {
        sumOfNodes += node.Value;
    }
    return sumOfNodes;
}

private IEnumerable<Node> GetTreeNodes(Node top)
{
    yield return top;
    foreach (Node childNode in top.ChildNodes)
    {
        foreach (Node currentNode in GetTreeNodes(childNode))
        {
            yield return currentNode;
        }
    }
}
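With the two concerns separated, the testability problem described earlier goes away. The sketch below reuses CalculateSum and GetTreeNodes unchanged apart from being public static, and assumes the same hypothetical Node shape (int Value, list of ChildNodes). The two Test... methods are hypothetical helpers, not part of the original: one checks CalculateSum against a flat array with no tree at all, the other checks GetTreeNodes only for the order in which nodes are visited, with no arithmetic involved.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Node type (assumed: an int Value and a list of children).
public class Node
{
    public int Value { get; }
    public List<Node> ChildNodes { get; } = new List<Node>();

    public Node(int value, params Node[] children)
    {
        Value = value;
        ChildNodes.AddRange(children);
    }
}

public static class SeparatedTreeSum
{
    // Computation only: works on any sequence of nodes.
    public static int CalculateSum(IEnumerable<Node> nodes)
    {
        int sumOfNodes = 0;
        foreach (Node node in nodes)
        {
            sumOfNodes += node.Value;
        }
        return sumOfNodes;
    }

    // Iteration only: pre-order depth-first traversal using yield.
    public static IEnumerable<Node> GetTreeNodes(Node top)
    {
        yield return top;
        foreach (Node childNode in top.ChildNodes)
        {
            foreach (Node currentNode in GetTreeNodes(childNode))
            {
                yield return currentNode;
            }
        }
    }

    // Hypothetical test helper: no tree needed, a flat array is enough.
    public static bool TestCalculateSum()
    {
        var nodes = new[] { new Node(5), new Node(7) };
        return CalculateSum(nodes) == 12;
    }

    // Hypothetical test helper: checks visiting order only, no arithmetic.
    public static bool TestGetTreeNodesOrder()
    {
        var tree = new Node(1, new Node(2, new Node(4)), new Node(3));
        int[] visited = GetTreeNodes(tree).Select(n => n.Value).ToArray();
        return visited.SequenceEqual(new[] { 1, 2, 4, 3 });
    }
}
```

If one of these tests fails, you know which half is broken, and a bug in one half can no longer mask a bug in the other.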
The two methods know nothing about each other. One contains only the calculation logic, the other only the iteration logic. You can switch the tree iteration algorithm from depth-first to breadth-first traversal, or use a stack or the visitor pattern instead of recursion, and this will not affect your calculation logic. Conversely, you can replace the sum with a product or do whatever you want with the node values; the calculation algorithm is not aware that it is working on a tree or a graph.
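To illustrate both directions, here is a sketch of a breadth-first traversal built on a Queue&lt;Node&gt; and a product computation. The names GetTreeNodesBreadthFirst and CalculateProduct are hypothetical, and the Node shape is the same assumption as before. Neither method needs any change on the other side: the traversal still just yields nodes, and the product still just consumes an IEnumerable&lt;Node&gt;.

```csharp
using System.Collections.Generic;

// Hypothetical Node type (assumed: an int Value and a list of children).
public class Node
{
    public int Value { get; }
    public List<Node> ChildNodes { get; } = new List<Node>();

    public Node(int value, params Node[] children)
    {
        Value = value;
        ChildNodes.AddRange(children);
    }
}

public static class Alternatives
{
    // Breadth-first traversal: a drop-in replacement for GetTreeNodes.
    public static IEnumerable<Node> GetTreeNodesBreadthFirst(Node top)
    {
        var queue = new Queue<Node>();
        queue.Enqueue(top);
        while (queue.Count > 0)
        {
            Node current = queue.Dequeue();
            yield return current;
            foreach (Node child in current.ChildNodes)
            {
                queue.Enqueue(child);
            }
        }
    }

    // A different computation over the same kind of sequence.
    public static long CalculateProduct(IEnumerable<Node> nodes)
    {
        long product = 1;
        foreach (Node node in nodes)
        {
            product *= node.Value;
        }
        return product;
    }
}
```

For the tree new Node(1, new Node(2, new Node(4)), new Node(3)), the breadth-first order is 1, 2, 3, 4 and the product is 24.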
How about not using yield?
Now let’s ask the question: what if we did not have the yield keyword?
A brief look at the generated code gives us the answer: the compiler generates a 150-line class to implement the iteration logic.
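To get a feeling for what yield saves us from, here is a hand-written, non-yield pre-order iterator. It is a simplified sketch that uses an explicit stack; it is not a reproduction of the state machine the compiler actually emits, and the class names TreeNodeEnumerable and Enumerator are made up. The Node shape is the same assumption as before.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical Node type (assumed: an int Value and a list of children).
public class Node
{
    public int Value { get; }
    public List<Node> ChildNodes { get; } = new List<Node>();

    public Node(int value, params Node[] children)
    {
        Value = value;
        ChildNodes.AddRange(children);
    }
}

// Hand-written pre-order sequence: roughly the kind of class yield writes for us.
public sealed class TreeNodeEnumerable : IEnumerable<Node>
{
    private readonly Node _top;

    public TreeNodeEnumerable(Node top) { _top = top; }

    public IEnumerator<Node> GetEnumerator() => new Enumerator(_top);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<Node>
    {
        private readonly Node _top;
        private readonly Stack<Node> _pending = new Stack<Node>();
        private Node _current;

        public Enumerator(Node top)
        {
            _top = top;
            Reset();
        }

        public Node Current =>
            _current ?? throw new InvalidOperationException("Enumeration not started or already finished.");
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            if (_pending.Count == 0)
            {
                _current = null;
                return false;
            }
            _current = _pending.Pop();
            // Push children in reverse so the first child is visited next (pre-order).
            for (int i = _current.ChildNodes.Count - 1; i >= 0; i--)
            {
                _pending.Push(_current.ChildNodes[i]);
            }
            return true;
        }

        public void Reset()
        {
            _pending.Clear();
            _pending.Push(_top);
            _current = null;
        }

        public void Dispose() { }
    }
}
```

Even this trimmed-down version needs two classes, explicit state and four interface members, compared with the few lines of GetTreeNodes.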
[CompilerGenerated]
private sealed class <GetTreeNodes>d__0 : IEnumerable<Node>, IEnumerable, IEnumerator<Node>, IEnumerator, IDisposable
{
...
150 Lines of generated code
...
}
We often compromise code readability, cleanliness, testability, etc. to reduce the number of classes, lines of code, keystrokes and mouse clicks. That is human nature: we are lazy. Knowing and using a construct as sexy as yield lets us be lazy, write very few lines of code, and at the same time stay clean and do one thing per method. That's why I generally welcome using stuff like that.
Note: The recursive depth-first traversal used above is probably the most compact one, but not the best one from a performance and memory-usage point of view. I chose it to emphasize the primary aspects of this post.
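One common alternative, sketched here under the same Node assumptions: with nested recursive iterators, every node is handed up through one iterator per tree level, so the cost of enumeration grows with the depth of the tree, and each level allocates its own iterator object. An explicit Stack&lt;Node&gt; keeps a single iterator and still yields the same pre-order sequence. The method name GetTreeNodesWithStack is hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical Node type (assumed: an int Value and a list of children).
public class Node
{
    public int Value { get; }
    public List<Node> ChildNodes { get; } = new List<Node>();

    public Node(int value, params Node[] children)
    {
        Value = value;
        ChildNodes.AddRange(children);
    }
}

public static class StackTraversal
{
    // Pre-order depth-first traversal with one iterator and an explicit stack.
    public static IEnumerable<Node> GetTreeNodesWithStack(Node top)
    {
        var pending = new Stack<Node>();
        pending.Push(top);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            yield return current;
            // Push children in reverse so the first child comes out first.
            for (int i = current.ChildNodes.Count - 1; i >= 0; i--)
            {
                pending.Push(current.ChildNodes[i]);
            }
        }
    }
}
```

Because it returns IEnumerable&lt;Node&gt; just like GetTreeNodes, it can be passed to CalculateSum without changing a single line of the calculation code.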