Why enumerator structs are a really bad idea
Posted
by Simon Cooper
on Simple Talk
See other posts from Simple Talk
or by Simon Cooper
Published on Wed, 19 May 2010 18:25:00 GMT
Indexed on
2010/05/19
18:33 UTC
Read the original article
Hit count: 386
If you've ever poked around the .NET class libraries in Reflector, I'm sure you would have noticed that the generic collection classes all have implementations of their IEnumerator
as a struct rather than a class. As you will see, this design decision has some rather unfortunate side effects...
As is generally known in the .NET world, mutable structs are a Very Bad Idea; and there are several other blogs around explaining this (Eric Lippert's blog post explains the problem quite well). In the BCL, the generic collection enumerators are all mutable structs, as they need to keep track of where they are in the collection. This bit me quite hard when I was coding a wrapper around a LinkedList<int>.Enumerator
. It boils down to this code:
sealed class EnumeratorWrapper : IEnumerator<int> { private readonly LinkedList<int>.Enumerator m_Enumerator; public EnumeratorWrapper(LinkedList<int> linkedList) { m_Enumerator = linkedList.GetEnumerator(); } public int Current { get { return m_Enumerator.Current; } } object System.Collections.IEnumerator.Current { get { return Current; } } public bool MoveNext() { return m_Enumerator.MoveNext(); } public void Reset() { ((System.Collections.IEnumerator)m_Enumerator).Reset(); } public void Dispose() { m_Enumerator.Dispose(); } }
The key line here is the MoveNext
method. When I initially coded this, I thought that the call to m_Enumerator.MoveNext()
would alter the enumerator state in the m_Enumerator
class variable and so the enumeration would proceed in an orderly fashion through the collection. However, when I ran this code it went into an infinite loop - the m_Enumerator.MoveNext()
call wasn't actually changing the state in the m_Enumerator
variable at all, and my code was looping forever on the first collection element. It was only after disassembling that method that I found out what was going on
The MoveNext
method above results in the following IL:
.method public hidebysig newslot virtual final instance bool MoveNext() cil managed { .maxstack 1 .locals init ( [0] bool CS$1$0000, [1] valuetype [System]System.Collections.Generic.LinkedList`1/Enumerator CS$0$0001) L_0000: nop L_0001: ldarg.0 L_0002: ldfld valuetype [System]System.Collections.Generic.LinkedList`1/Enumerator EnumeratorWrapper::m_Enumerator L_0007: stloc.1 L_0008: ldloca.s CS$0$0001 L_000a: call instance bool [System]System.Collections.Generic.LinkedList`1/Enumerator::MoveNext() L_000f: stloc.0 L_0010: br.s L_0012 L_0012: ldloc.0 L_0013: ret }
Here, the important line is 0002 - m_Enumerator
is accessed using the ldfld
operator, which does the following:
Finds the value of a field in the object whose reference is currently on the evaluation stack.
So, what the MoveNext
method is doing is the following:
public bool MoveNext() { LinkedList<int>.Enumerator CS$0$0001 = this.m_Enumerator; bool CS$1$0000 = CS$0$0001.MoveNext(); return CS$1$0000; }
The enumerator instance being modified by the call to MoveNext
is the one stored in the CS$0$0001
variable on the stack, and not the one in the EnumeratorWrapper
class instance. Hence why the state of m_Enumerator
wasn't getting updated.
Hmm, ok. Well, why is it doing this? If you have a read of Eric Lippert's blog post about this issue, you'll notice he quotes a few sections of the C# spec. In particular, 7.5.4:
...if the field is readonly and the reference occurs outside an instance constructor of the class in which the field is declared, then the result is a value, namely the value of the field I in the object referenced by E.
And my m_Enumerator
field is readonly! Indeed, if I remove the readonly
from the class variable then the problem goes away, and the code works as expected. The IL confirms this:
.method public hidebysig newslot virtual final instance bool MoveNext() cil managed { .maxstack 1 .locals init ( [0] bool CS$1$0000) L_0000: nop L_0001: ldarg.0 L_0002: ldflda valuetype [System]System.Collections.Generic.LinkedList`1/Enumerator EnumeratorWrapper::m_Enumerator L_0007: call instance bool [System]System.Collections.Generic.LinkedList`1/Enumerator::MoveNext() L_000c: stloc.0 L_000d: br.s L_000f L_000f: ldloc.0 L_0010: ret }
Notice on line 0002, instead of the ldfld
we had before, we've got a ldflda
, which does this:
Finds the address of a field in the object whose reference is currently on the evaluation stack.
Instead of loading the value, we're loading the address of the m_Enumerator
field. So now the call to MoveNext
modifies the enumerator stored in the class rather than on the stack, and everything works as expected.
Previously, I had thought enumerator structs were an odd but interesting feature of the BCL that I had used in the past to do linked list slices. However, effects like this only underline how dangerous mutable structs are, and I'm at a loss to explain why the enumerators were implemented as structs in the first place. (interestingly, the SortedList<TKey, TValue>
enumerator is a struct but is private, which makes it even more odd - the only way it can be accessed is as a boxed IEnumerator
!).
I would love to hear people's theories as to why the enumerators are implemented in such a fashion. And bonus points if you can explain why LinkedList<int>.Enumerator.Reset
is an explicit implementation but Dispose
is implicit...
Note to self: never ever ever code a mutable struct.
© Simple Talk or respective owner