Anatomy of a .NET Assembly - Custom attribute encoding

Posted by Simon Cooper on Simple Talk
Published on Fri, 03 Jun 2011 15:07:00 GMT

In my previous post, I covered how field, method, and other types of signatures are encoded in a .NET assembly. Custom attribute signatures differ quite a bit from these, which consequently affects attribute specifications in C#.

Custom attribute specifications

In C#, you can apply a custom attribute to a type or type member, specifying a constructor as well as the values of fields or properties on the attribute type:

public class ExampleAttribute : Attribute {
    
    public ExampleAttribute(int ctorArg1, string ctorArg2) { ... }
    
    public Type ExampleType { get; set; }
}

[Example(5, "6", ExampleType = typeof(string))]
public class C { ... }

How does this specification actually get encoded and stored in an assembly?

Specification blob values

Custom attribute specification signatures use the same building blocks as other types of signatures: the ELEMENT_TYPE structure. However, they differ significantly from other types of signatures, in that the actual parameter values need to be stored along with type information.

There are two types of specification arguments in a signature blob: fixed args and named args. Fixed args are the arguments to the attribute type constructor. Named args are specified after the constructor arguments to provide a value to a field or property on the constructed attribute type (PropertyName = propValue).

Values in an attribute blob are limited to one of the basic types (one of the number types, character, or boolean), a string, a reference to a type, an enum (which, in .NET, has to use one of the integer types as a base representation), or arrays of any of those.

Enums and the basic types are easy to store in a blob - you simply store the binary representation. Strings are stored as a compressed integer giving the length of the string in bytes, followed by the UTF8-encoded characters. Array values start with a 4-byte integer giving the number of elements in the array, followed by the item values concatenated together.

Rather than using a coded token, Type values are stored using a string representing the type name and fully qualified assembly name (for example, MyNs.MyType, MyAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0123456789abcdef). If the type is in the current assembly or mscorlib then just the type name can be used. This is probably done to prevent direct references between assemblies solely because of attribute specification arguments; assemblies can be loaded in the reflection-only context and attribute arguments still processed, without loading the entire assembly.
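
The value encodings described above can be sketched roughly as follows. This is only an illustration, not code from the CLR or any compiler: the class and method names (BlobValueSketch, CompressedLength, SerString, StringArray) are made up for this example, and it assumes that multi-byte integers are written little-endian and that the compressed length uses the usual 1, 2 or 4-byte metadata form.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; none of these names come from the CLR.
public static class BlobValueSketch
{
    // Compressed unsigned integer: 1 byte up to 0x7F, 2 bytes up to 0x3FFF,
    // otherwise 4 bytes, with the high bits of the first byte marking the width.
    public static byte[] CompressedLength(int value)
    {
        if (value < 0 || value > 0x1FFFFFFF)
            throw new ArgumentOutOfRangeException(nameof(value));
        if (value <= 0x7F)
            return new[] { (byte)value };
        if (value <= 0x3FFF)
            return new[] { (byte)(0x80 | (value >> 8)), (byte)(value & 0xFF) };
        return new[]
        {
            (byte)(0xC0 | (value >> 24)),
            (byte)((value >> 16) & 0xFF),
            (byte)((value >> 8) & 0xFF),
            (byte)(value & 0xFF),
        };
    }

    // String value: compressed byte length, then the UTF8 bytes.
    // A null string is the single byte 0xFF; the empty string is the single byte 0x00.
    public static byte[] SerString(string s)
    {
        if (s == null) return new byte[] { 0xFF };
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        var bytes = new List<byte>(CompressedLength(utf8.Length));
        bytes.AddRange(utf8);
        return bytes.ToArray();
    }

    // Array of strings (or of type names, which are stored as strings):
    // 4-byte little-endian element count, then each element in turn.
    // A null array is stored as a count of -1 (0xFFFFFFFF).
    public static byte[] StringArray(string[] items)
    {
        if (items == null) return new byte[] { 0xFF, 0xFF, 0xFF, 0xFF };
        var bytes = new List<byte>();
        for (int shift = 0; shift < 32; shift += 8)
            bytes.Add((byte)((items.Length >> shift) & 0xFF));
        foreach (string item in items)
            bytes.AddRange(SerString(item));
        return bytes.ToArray();
    }
}
```

For instance, SerString("MyString") gives the length byte 0x08 followed by the eight UTF8 bytes of the string. A Type value would go through the same string path, using the type name (plus assembly name where required) as the string.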

Fixed and named arguments

Each entry in the CustomAttribute metadata table contains a reference to the object the attribute is applied to, the attribute constructor, and the specification blob. The number and type of arguments to the constructor (the fixed args) can be worked out from the method signature referenced by the attribute constructor, and so the fixed args can simply be concatenated together in the blob without any extra type information.

Named args are different. These specify the value to assign to a field or property once the attribute type has been constructed. In the CLR, fields and properties can be overloaded just on their type; different fields and properties can have the same name. Therefore, to uniquely identify a field or property you need:

  1. Whether it's a field or property (indicated using byte values 0x53 and 0x54, respectively)
  2. The field or property type
  3. The field or property name

After the fixed arg values is a 2-byte number specifying the number of named args in the blob. Each named argument has the above information concatenated together, mostly using the basic ELEMENT_TYPE values, in the same way as a method or field signature. A Type argument is represented using the byte 0x50, and an enum argument is represented using the byte 0x55 followed by a string specifying the name and assembly of the enum type. The named argument property information is followed by the argument value, using the same encoding as fixed args.
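
To make the layout of a single named argument concrete, here is a minimal sketch that writes an int-typed field or property assignment. The names (NamedArgSketch, NamedInt32, ShortSerString) are hypothetical, the sketch assumes little-endian integers, and it deliberately only handles names shorter than 128 bytes so that the compressed length fits in one byte.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only.
public static class NamedArgSketch
{
    public const byte Field = 0x53;          // named arg targets a field
    public const byte Property = 0x54;       // named arg targets a property
    public const byte ElementTypeI4 = 0x08;  // ELEMENT_TYPE value for a 32-bit int

    // Length-prefixed UTF8 name. Simplification: one-byte length only.
    static byte[] ShortSerString(string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        if (utf8.Length > 0x7F)
            throw new ArgumentException("name too long for this sketch", nameof(s));
        var bytes = new List<byte> { (byte)utf8.Length };
        bytes.AddRange(utf8);
        return bytes.ToArray();
    }

    // Layout: field/property byte, type, name, then the value itself.
    public static byte[] NamedInt32(byte kind, string name, int value)
    {
        var bytes = new List<byte> { kind, ElementTypeI4 };
        bytes.AddRange(ShortSerString(name));
        for (int shift = 0; shift < 32; shift += 8)
            bytes.Add((byte)((value >> shift) & 0xFF)); // little-endian int
        return bytes.ToArray();
    }
}
```

Calling NamedInt32(NamedArgSketch.Property, "Prop1", 12) produces the property marker, the I4 type byte, the length-prefixed name and finally the four value bytes. In a complete blob this would be preceded by the 2-byte named argument count.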

Boxed objects

This would be all very well, were it not for object and object[]. Arguments and properties of type object allow a value of any allowed argument type to be specified. As a result, more information needs to be specified in the blob to interpret the argument bytes as the correct type.

So, the argument value is simply prefixed with the type of the value, by specifying the ELEMENT_TYPE or the name of the enum the value represents. For named arguments, a field or property of type object is represented using the byte 0x51, with the actual type specified in the argument value.
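
As a rough sketch of that type-prefixing, the following writes a boxed value as a type tag followed by the value bytes. BoxedValueSketch and Boxed are made-up names, only a handful of types are covered, integers are assumed to be little-endian, and strings are limited to fewer than 128 UTF8 bytes to keep the length prefix to a single byte.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; covers just a few value types.
public static class BoxedValueSketch
{
    // ELEMENT_TYPE tags used by this sketch.
    const byte U1 = 0x05, U2 = 0x07, I4 = 0x08, StringTag = 0x0E;

    // Encodes a value passed to an object-typed argument: type tag, then value.
    public static byte[] Boxed(object value)
    {
        var bytes = new List<byte>();
        switch (value)
        {
            case null:
                // A null object is written as a null string.
                bytes.Add(StringTag);
                bytes.Add(0xFF);
                break;
            case byte b:
                bytes.Add(U1);
                bytes.Add(b);
                break;
            case ushort u:
                bytes.Add(U2);
                bytes.Add((byte)(u & 0xFF));
                bytes.Add((byte)(u >> 8));
                break;
            case int i:
                bytes.Add(I4);
                for (int shift = 0; shift < 32; shift += 8)
                    bytes.Add((byte)((i >> shift) & 0xFF));
                break;
            case string s:
                byte[] utf8 = Encoding.UTF8.GetBytes(s);
                if (utf8.Length > 0x7F)
                    throw new NotSupportedException("string too long for this sketch");
                bytes.Add(StringTag);
                bytes.Add((byte)utf8.Length);
                bytes.AddRange(utf8);
                break;
            default:
                throw new NotSupportedException("type not covered by this sketch");
        }
        return bytes.ToArray();
    }
}
```

Type and enum values would follow the same pattern, using the 0x50 and 0x55 markers and a type name string in place of a plain ELEMENT_TYPE tag.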

Some examples...

All specification blobs start with the 2-byte prolog value 0x0001. Similar to my previous post in the series, names in capitals correspond to a particular byte value in the ELEMENT_TYPE structure. For strings, I'll simply give the string value, rather than the length and UTF8 encoding in the actual blob.

I'll be using the following enum and attribute types to demonstrate specification encodings:

class AttrAttribute : Attribute {
    public AttrAttribute() {}
    public AttrAttribute(Type[] tArray) {}
    public AttrAttribute(object o) {}
    public AttrAttribute(MyEnum e) {}
    public AttrAttribute(ushort x, int y) {}
    public AttrAttribute(string str, Type type1, Type type2) {}
    
    public int Prop1 { get; set; }
    public object Prop2 { get; set; }
    public object[] ObjectArray;
}

enum MyEnum : int { Val1 = 1, Val2 = 2 }

Now, some examples:
  • Here, the specification binds to the (ushort, int) attribute constructor, with fixed args only. The specification blob starts off with a prolog, followed by the two constructor arguments, then the number of named arguments (zero):
    [Attr(42, 84)]
    
    0x0001
        0x002a
        0x00000054
    0x0000
  • An example of string and type encoding:
    [Attr("MyString", typeof(Array), typeof(System.Windows.Forms.Form))]
    
    0x0001
        "MyString"
        "System.Array"
        "System.Windows.Forms.Form,
            System.Windows.Forms,
            Version=4.0.0.0,
            Culture=neutral,
            PublicKeyToken=b77a5c561934e089"
    0x0000
    As you can see, the full assembly specification of a type is only needed if the type isn't in the current assembly or mscorlib. Note, however, that the C# compiler currently chooses to fully-qualify mscorlib types anyway.
  • An object argument (this binds to the object attribute constructor), and two named arguments (a null string is represented by 0xff and the empty string by 0x00):
    [Attr((ushort)40, Prop1 = 12, Prop2 = "")]
    
    0x0001
        U2
        0x0028
    0x0002
        0x54 I4 "Prop1" 0x0000000c
        0x54 0x51 "Prop2"
            STRING 0x00
  • Right, more complicated now. A type array as a fixed argument:
    [Attr(new[] { typeof(string), typeof(object) })]
    
    0x0001
        0x00000002  // the number of elements
        "System.String"
        "System.Object"
    0x0000
  • An enum value, which is simply represented using the underlying value. The CLR works out that it's an enum using information in the attribute constructor signature:
    [Attr(MyEnum.Val1)]
    
    0x0001
        0x00000001
    0x0000
  • And finally, a null array, and an object array as a named argument:
    [Attr((Type[])null,
        ObjectArray = new object[] {
           (byte)2,
           typeof(decimal),
           null,
           MyEnum.Val2 })]
    
    0x0001
        0xffffffff
    0x0001
        0x53 SZARRAY 0x51 "ObjectArray"
            0x00000004
                U1 0x02
                0x50 "System.Decimal"
                STRING 0xff
                0x55 "MyEnum" 0x00000002
    As you'll notice, a null object is encoded as a null string value, and a null array is represented using a length of -1 (0xffffffff).
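
The dumps above show each value as a number; in the blob itself the multi-byte values are stored little-endian. As a small sketch of that, assuming the (ushort, int) constructor with no named arguments, the following builds and re-reads the first example's blob. FixedArgsBlobSketch, Build and Parse are hypothetical names used only for this illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; handles just the (ushort, int) case.
public static class FixedArgsBlobSketch
{
    // Writes prolog, the two fixed args, and a named argument count of zero.
    // BinaryWriter always writes integers in little-endian order.
    public static byte[] Build(ushort x, int y)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write((ushort)0x0001); // prolog
            writer.Write(x);              // first fixed arg, 2 bytes
            writer.Write(y);              // second fixed arg, 4 bytes
            writer.Write((ushort)0);      // number of named args
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the same layout back. The argument types are not in the blob;
    // they are known here only because the constructor signature is known.
    public static (ushort X, int Y, ushort NamedCount) Parse(byte[] blob)
    {
        using (var reader = new BinaryReader(new MemoryStream(blob)))
        {
            if (reader.ReadUInt16() != 0x0001)
                throw new InvalidDataException("missing prolog");
            ushort x = reader.ReadUInt16();
            int y = reader.ReadInt32();
            ushort namedCount = reader.ReadUInt16();
            return (x, y, namedCount);
        }
    }
}
```

Build(42, 84) yields ten bytes: 01 00, 2A 00, 54 00 00 00, 00 00. Parse can only make sense of them because it already knows the constructor takes a ushort and an int, which mirrors how the fixed args rely on the constructor's method signature.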

How does this affect C#?

So, we can now explain why the limits on attribute arguments are so strict in C#. Attribute specification blobs are limited to basic numbers, strings, enums, types, and arrays of those. As you can see, this is because the raw CLR encoding can only accommodate those types. Special byte patterns have to be used to indicate object, string, Type, or enum values in named arguments; you can't specify an arbitrary object type, as there isn't a generalised way of encoding the resulting value in the specification blob.

In particular, decimal values can't be encoded, as decimal isn't a 'built-in' CLR type that has a native representation (you'll notice that decimal constants in C# programs are compiled as several integer arguments to DecimalConstantAttribute). Jagged arrays also aren't natively supported, although you can get around this by using an array as a value to an object argument:

[Attr(new object[] { new object[] { new Type[] { typeof(string) } }, 42 })]

Finally...

Phew! That was a bit longer than I thought it would be. Custom attribute encodings are complicated! Hopefully this series has been an informative look at what exactly goes on inside a .NET assembly. In the next blog posts, I'll be carrying on with the 'Inside Red Gate' series.

© Simple Talk or respective owner
