Subterranean IL: Generics and array covariance
- by Simon Cooper
Arrays in .NET are curious beasts. They are the only built-in collection types in the CLR, and SZ-arrays (single-dimensional, zero-indexed) have their own instructions and IL syntax. One of their stranger properties is that they have had a kind of built-in covariance since long before generic variance was added in .NET 4. However, this causes a subtle but important problem with generics. First of all, we need to briefly recap array covariance.
SZ-array covariance
To demonstrate, I'll tweak the classes I introduced in my previous posts:
public class IncrementableClass {
    public int Value;
    public virtual void Increment(int incrementBy) {
        Value += incrementBy;
    }
}
public class IncrementableClassx2 : IncrementableClass {
    public override void Increment(int incrementBy) {
        base.Increment(incrementBy);
        base.Increment(incrementBy);
    }
}
In the CLR, SZ-arrays of reference types are implicitly convertible to arrays of the element's supertypes, all the way up to object (note that this does not apply to value types). That is, an instance of IncrementableClassx2[] can be used wherever an IncrementableClass[] or object[] is required. Because an SZ-array can be used in this fashion, a run-time type check is performed whenever you try to store an object into the array, to make sure you're not trying to insert an instance of IncrementableClass into an IncrementableClassx2[].
This check means that the following code will compile fine but will fail at run-time:
IncrementableClass[] array = new IncrementableClassx2[1];
array[0] = new IncrementableClass(); // throws ArrayTypeMismatchException
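The same check can be wrapped in a small program to see both outcomes side by side. This is only a minimal sketch: `CovarianceDemo` and `TryStore` are hypothetical names introduced for illustration, and the two classes are repeated so that the block compiles on its own.

```csharp
using System;

public class IncrementableClass {
    public int Value;
    public virtual void Increment(int incrementBy) {
        Value += incrementBy;
    }
}

public class IncrementableClassx2 : IncrementableClass {
    public override void Increment(int incrementBy) {
        base.Increment(incrementBy);
        base.Increment(incrementBy);
    }
}

public static class CovarianceDemo {
    // Hypothetical helper: returns true if the store passes the run-time check.
    public static bool TryStore(IncrementableClass[] array, IncrementableClass item) {
        try {
            array[0] = item;   // the element store is type-checked at run time
            return true;
        }
        catch (ArrayTypeMismatchException) {
            return false;
        }
    }

    public static void Main() {
        // Covariant conversion: allowed because both element types are reference types.
        IncrementableClass[] array = new IncrementableClassx2[1];

        Console.WriteLine(TryStore(array, new IncrementableClassx2())); // True
        Console.WriteLine(TryStore(array, new IncrementableClass()));   // False

        // Arrays of value types are not covariant; this line would not compile:
        // object[] boxed = new int[1];
    }
}
```

Storing the derived instance succeeds because it matches the array's actual element type; storing the base instance is rejected even though the static type of `array` would allow it.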
These checks are enforced by the various stelem* and ldelem* IL instructions in such a way as to ensure you can't insert an IncrementableClass into an IncrementableClassx2[]. For the rest of this post, however, I'm going to concentrate on the ldelema instruction.
ldelema
This instruction pops the array index (int32) and array reference (O) off the stack, and pushes a pointer (&) to the corresponding array element. However, unlike the ldelem instruction, the instruction's type argument must match the run-time array type exactly. This is because, once you've got a managed pointer, you can use that pointer to both load and store values in that array element using the ldind* and stind* (load/store indirect) instructions. As the same pointer can be used for both input and output to the array, the type argument to ldelema must be invariant. At the time, this was a perfectly reasonable restriction, and maintained array type-safety within managed code.
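The exact-type requirement is visible from C# too, since passing an array element by `ref` takes its address with ldelema. The following is a rough sketch rather than anything definitive; `LdelemaDemo`, `Touch` and `TryTakeAddress` are hypothetical names, and the classes are repeated so the block stands alone.

```csharp
using System;

public class IncrementableClass {
    public int Value;
    public virtual void Increment(int incrementBy) {
        Value += incrementBy;
    }
}

public class IncrementableClassx2 : IncrementableClass {
    public override void Increment(int incrementBy) {
        base.Increment(incrementBy);
        base.Increment(incrementBy);
    }
}

public static class LdelemaDemo {
    // Hypothetical helper taking a managed pointer to an array element.
    static void Touch(ref IncrementableClass item) {
        item.Increment(1);
    }

    // Returns true if the address of array[0] could be taken.
    public static bool TryTakeAddress(IncrementableClass[] array) {
        try {
            Touch(ref array[0]);   // address-of on an array element
            return true;
        }
        catch (ArrayTypeMismatchException) {
            return false;
        }
    }

    public static void Main() {
        IncrementableClass[] exact = new IncrementableClass[] { new IncrementableClass() };
        IncrementableClass[] covariant = new IncrementableClassx2[] { new IncrementableClassx2() };

        Console.WriteLine(TryTakeAddress(exact));     // True: run-time type matches exactly
        Console.WriteLine(TryTakeAddress(covariant)); // False: array is really an IncrementableClassx2[]
    }
}
```

Note that the failure happens even though `Touch` never writes through the pointer: the check is made when the address is taken, not when it is used.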
However, along came generics, and with them the constrained callvirt instruction. So, what happens when we combine array covariance and constrained callvirt?
.method public static void CallIncrementArrayValue() {
    // IncrementableClassx2[] arr = new IncrementableClassx2[1]
    ldc.i4.1
    newarr IncrementableClassx2
    // arr[0] = new IncrementableClassx2();
    // stelem.ref expects array, index, value on the stack, in that order
    dup
    ldc.i4.0
    newobj instance void IncrementableClassx2::.ctor()
    stelem.ref
    // IncrementArrayValue<IncrementableClass>(arr, 0)
    // here, we're treating an IncrementableClassx2[] as IncrementableClass[]
    dup
    ldc.i4.0
    call void IncrementArrayValue<class IncrementableClass>(!!0[],int32)
    // ...
    ret
}
.method public static void IncrementArrayValue<(IncrementableClass) T>(
        !!T[] arr, int32 index) {
    // arr[index].Increment(1)
    ldarg.0
    ldarg.1
    ldelema !!T
    ldc.i4.1
    constrained. !!T
    callvirt instance void IncrementableClass::Increment(int32)
    ret
}
And the result:
Unhandled Exception: System.ArrayTypeMismatchException:
Attempted to access an element as a type incompatible with the array.
at IncrementArrayValue[T](T[] arr, Int32 index)
at CallIncrementArrayValue()
Hmm. We're instantiating the generic method as IncrementArrayValue<IncrementableClass>, but passing in an IncrementableClassx2[], hence the ldelema instruction is failing as it's expecting an IncrementableClass[].
On features and feature conflicts
What we've got here is a conflict between existing behaviour (ldelema ensuring type safety on covariant arrays) and new behaviour (managed pointers to object references used for every constrained callvirt on generic type instances). And, although this is an edge case, there is no general workaround. The generic method could be hidden behind several layers of assemblies, wrappers and interfaces that make it a requirement to use array covariance when calling the generic method. Furthermore, this will only fail at runtime, whereas compile-time safety is what generics were designed for!
The solution is the readonly. prefix instruction. This modifies the ldelema instruction to skip the exact type check for arrays of reference types, and so it lets us take the address of an array element using a type argument that is a supertype of the array's actual run-time element type:
.method public static void IncrementArrayValue<(IncrementableClass) T>(
        !!T[] arr, int32 index) {
    // arr[index].Increment(1)
    ldarg.0
    ldarg.1
    readonly.
    ldelema !!T
    ldc.i4.1
    constrained. !!T
    callvirt instance void IncrementableClass::Increment(int32)
    ret
}
But what about type safety? In return for ignoring the type check, the resulting controlled-mutability managed pointer can only be used in the following situations:
- As the object parameter to ldfld, ldflda, stfld, call and constrained callvirt instructions
- As the pointer parameter to ldobj or ldind*
- As the source parameter to cpobj
In other words, the only operations allowed are those that read through the pointer or access members of the element it points to; stind* and similar instructions that overwrite the element itself are banned. This ensures that the array element we're pointing to can't be replaced with anything untoward, and so type safety within the array is maintained.
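Seen from C#, the upshot is that a generic method can call through an element of a covariant array without tripping the exact-type check. The sketch below is the C# counterpart of the IL method above, under the assumption that the compiler emits a safe sequence for the element access; exactly which instructions it picks may vary between compiler versions, so the example only demonstrates the observable behaviour. `GenericDemo` is a hypothetical name.

```csharp
using System;

public class IncrementableClass {
    public int Value;
    public virtual void Increment(int incrementBy) {
        Value += incrementBy;
    }
}

public class IncrementableClassx2 : IncrementableClass {
    public override void Increment(int incrementBy) {
        base.Increment(incrementBy);
        base.Increment(incrementBy);
    }
}

public static class GenericDemo {
    // C# counterpart of the generic IL method: arr[index].Increment(1)
    public static void IncrementArrayValue<T>(T[] arr, int index)
            where T : IncrementableClass {
        arr[index].Increment(1);
    }

    public static void Main() {
        IncrementableClassx2[] arr = { new IncrementableClassx2() };

        // T is IncrementableClass, but the array is really an IncrementableClassx2[].
        IncrementArrayValue<IncrementableClass>(arr, 0);

        // The override runs, so the value is incremented twice.
        Console.WriteLine(arr[0].Value); // 2
    }
}
```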
This is a typical example of the maxim that whenever you add a feature to a program, you have to consider how that feature interacts with every single one of the existing features. Although this is an edge case, the readonly. prefix instruction ensures that generics and array covariance work together and that compile-time type safety is maintained.
Tune in next time for a look at the .ctor generic type constraint, and what it means.