Why enumerator structs are a really bad idea

Posted by Simon Cooper on Simple Talk See other posts from Simple Talk or by Simon Cooper
Published on Wed, 19 May 2010 18:25:00 GMT Indexed on 2010/05/19 18:33 UTC
Read the original article Hit count: 386

Filed under:

If you've ever poked around the .NET class libraries in Reflector, I'm sure you would have noticed that the generic collection classes all have implementations of their IEnumerator as a struct rather than a class. As you will see, this design decision has some rather unfortunate side effects...

As is generally known in the .NET world, mutable structs are a Very Bad Idea; and there are several other blogs around explaining this (Eric Lippert's blog post explains the problem quite well). In the BCL, the generic collection enumerators are all mutable structs, as they need to keep track of where they are in the collection. This bit me quite hard when I was coding a wrapper around a LinkedList<int>.Enumerator. It boils down to this code:

sealed class EnumeratorWrapper : IEnumerator<int> {

    private readonly LinkedList<int>.Enumerator m_Enumerator;

    public EnumeratorWrapper(LinkedList<int> linkedList) {
        m_Enumerator = linkedList.GetEnumerator();
    }

    public int Current {
        get { return m_Enumerator.Current; }
    }

    object System.Collections.IEnumerator.Current {
        get { return Current; }
    }

    public bool MoveNext() {
        return m_Enumerator.MoveNext();
    }

    public void Reset() {
        ((System.Collections.IEnumerator)m_Enumerator).Reset();
    }

    public void Dispose() {
        m_Enumerator.Dispose();
    }
}

The key line here is the MoveNext method. When I initially coded this, I thought that the call to m_Enumerator.MoveNext() would alter the enumerator state in the m_Enumerator class variable and so the enumeration would proceed in an orderly fashion through the collection. However, when I ran this code it went into an infinite loop - the m_Enumerator.MoveNext() call wasn't actually changing the state in the m_Enumerator variable at all, and my code was looping forever on the first collection element. It was only after disassembling that method that I found out what was going on

The MoveNext method above results in the following IL:

.method public hidebysig newslot virtual final instance bool MoveNext() cil managed
{
    .maxstack 1
    .locals init (
        [0] bool CS$1$0000,
        [1] valuetype  
            [System]System.Collections.Generic.LinkedList`1/Enumerator CS$0$0001)
    L_0000: nop 
    L_0001: ldarg.0 
    L_0002: ldfld valuetype
        [System]System.Collections.Generic.LinkedList`1/Enumerator 
            EnumeratorWrapper::m_Enumerator
    L_0007: stloc.1 
    L_0008: ldloca.s CS$0$0001
    L_000a: call instance bool 
        [System]System.Collections.Generic.LinkedList`1/Enumerator::MoveNext()
    L_000f: stloc.0 
    L_0010: br.s L_0012
    L_0012: ldloc.0 
    L_0013: ret 
}

Here, the important line is 0002 - m_Enumerator is accessed using the ldfld operator, which does the following:

Finds the value of a field in the object whose reference is currently on the evaluation stack.

So, what the MoveNext method is doing is the following:

public bool MoveNext() {
    LinkedList<int>.Enumerator CS$0$0001 = this.m_Enumerator;
    bool CS$1$0000 = CS$0$0001.MoveNext();
    return CS$1$0000;
}

The enumerator instance being modified by the call to MoveNext is the one stored in the CS$0$0001 variable on the stack, and not the one in the EnumeratorWrapper class instance. Hence why the state of m_Enumerator wasn't getting updated.

Hmm, ok. Well, why is it doing this? If you have a read of Eric Lippert's blog post about this issue, you'll notice he quotes a few sections of the C# spec. In particular, 7.5.4:

...if the field is readonly and the reference occurs outside an instance constructor of the class in which the field is declared, then the result is a value, namely the value of the field I in the object referenced by E.

And my m_Enumerator field is readonly! Indeed, if I remove the readonly from the class variable then the problem goes away, and the code works as expected. The IL confirms this:

.method public hidebysig newslot virtual final instance bool MoveNext() cil managed
{
    .maxstack 1
    .locals init (
        [0] bool CS$1$0000)
    L_0000: nop 
    L_0001: ldarg.0 
    L_0002: ldflda valuetype 
        [System]System.Collections.Generic.LinkedList`1/Enumerator 
            EnumeratorWrapper::m_Enumerator
    L_0007: call instance bool 
        [System]System.Collections.Generic.LinkedList`1/Enumerator::MoveNext()
    L_000c: stloc.0 
    L_000d: br.s L_000f
    L_000f: ldloc.0 
    L_0010: ret 
}

Notice on line 0002, instead of the ldfld we had before, we've got a ldflda, which does this:

Finds the address of a field in the object whose reference is currently on the evaluation stack.

Instead of loading the value, we're loading the address of the m_Enumerator field. So now the call to MoveNext modifies the enumerator stored in the class rather than on the stack, and everything works as expected.

Previously, I had thought enumerator structs were an odd but interesting feature of the BCL that I had used in the past to do linked list slices. However, effects like this only underline how dangerous mutable structs are, and I'm at a loss to explain why the enumerators were implemented as structs in the first place. (interestingly, the SortedList<TKey, TValue> enumerator is a struct but is private, which makes it even more odd - the only way it can be accessed is as a boxed IEnumerator!).

I would love to hear people's theories as to why the enumerators are implemented in such a fashion. And bonus points if you can explain why LinkedList<int>.Enumerator.Reset is an explicit implementation but Dispose is implicit...

Note to self: never ever ever code a mutable struct.

© Simple Talk or respective owner