Anatomy of a .NET Assembly - Signature encodings

Posted by Simon Cooper on Simple Talk
Published on Fri, 27 May 2011 11:31:00 GMT

If you've just joined this series, I highly recommend you read the previous posts, starting from the beginning, or at least the posts covering the CLR metadata tables.

Before we look at custom attribute encoding, we first need to have a brief look at how signatures are encoded in an assembly in general.

Signature types

There are several types of signatures in an assembly, all of which share a common base representation, and are all stored as binary blobs in the #Blob heap, referenced by an offset from various metadata tables.
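As a rough illustration of what "referenced by an offset" means in practice: per the CLI specification, each entry in the #Blob heap is prefixed with its own length, stored as a compressed unsigned integer. The Python sketch below reads one blob from raw heap bytes. The function names and the tiny hand-built heap are made up for this example, not taken from any real tool.

```python
def read_compressed_uint(data, pos):
    """Decode an ECMA-335 compressed unsigned integer; return (value, next_pos)."""
    first = data[pos]
    if first & 0x80 == 0:        # 0xxxxxxx: the value fits in one byte
        return first, pos + 1
    if first & 0xC0 == 0x80:     # 10xxxxxx: two bytes, big-endian
        return ((first & 0x3F) << 8) | data[pos + 1], pos + 2
    if first & 0xE0 == 0xC0:     # 110xxxxx: four bytes, big-endian
        value = ((first & 0x1F) << 24) | (data[pos + 1] << 16) \
                | (data[pos + 2] << 8) | data[pos + 3]
        return value, pos + 4
    raise ValueError("invalid compressed integer")


def read_blob(heap, offset):
    """Return the blob stored at `offset` in the raw #Blob heap bytes."""
    length, start = read_compressed_uint(heap, offset)
    return heap[start:start + length]


# Hypothetical heap: the empty blob at offset 0, then a two-byte blob at offset 1.
sample_heap = bytes([0x00, 0x02, 0x06, 0x08])
print(read_blob(sample_heap, 1).hex(" "))  # 06 08
```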

The types of signatures are:

  • Method definition and method reference signatures.
  • Field signatures.
  • Property signatures.
  • Method local variables. These are referenced from the StandAloneSig table, which is then referenced by method body headers.
  • Generic type specifications. These represent a particular instantiation of a generic type.
  • Generic method specifications. Similarly, these represent a particular instantiation of a generic method.

All these signatures share the same underlying mechanism to represent a type.

Representing a type

All metadata signatures are based around the ELEMENT_TYPE structure. This assigns a number to each 'built-in' type in the framework; for example, UInt16 is 0x07, String is 0x0e, and Object is 0x1c. Byte values are also used to indicate SzArrays (single-dimensional, zero-based arrays), multi-dimensional arrays, custom types, and generic type and method variables. However, these require some further information.
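For reference, some of these byte values can be written down as a Python enumeration. The values are the ELEMENT_TYPE constants from the CLI specification (ECMA-335); the ElementType class name is just a label chosen for this sketch, and the list is deliberately incomplete.

```python
from enum import IntEnum


class ElementType(IntEnum):
    """A subset of the ELEMENT_TYPE byte values from ECMA-335."""
    VOID = 0x01
    BOOLEAN = 0x02
    CHAR = 0x03
    U2 = 0x07          # System.UInt16
    I4 = 0x08          # System.Int32
    STRING = 0x0E
    BYREF = 0x10
    VALUETYPE = 0x11
    CLASS = 0x12
    VAR = 0x13         # generic parameter of a type
    ARRAY = 0x14       # multi-dimensional array
    GENERICINST = 0x15
    OBJECT = 0x1C
    SZARRAY = 0x1D     # single-dimensional, zero-based array
    MVAR = 0x1E        # generic parameter of a method


print(ElementType(0x0E).name)      # STRING
print(hex(ElementType.SZARRAY))    # 0x1d
```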

Firstly, custom types (i.e. anything that is not one of the built-in types) require you to specify the 4-byte TypeDefOrRef coded token after the CLASS (0x12) or VALUETYPE (0x11) element type. This 4-byte value is stored in a compressed format before being written out to disk (for more excruciating details, you can refer to the CLI specification).
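To make the compression a little less abstract, here is a minimal Python sketch of the two steps as I read them in ECMA-335: the metadata token is first turned into a coded value (row index shifted left by two bits, with a two-bit tag for the table), and that value is then written as a compressed unsigned integer. The function names are invented for this example.

```python
# Table ids (the top byte of a metadata token) mapped to their 2-bit tags.
TABLE_TAGS = {0x02: 0,   # TypeDef
              0x01: 1,   # TypeRef
              0x1B: 2}   # TypeSpec


def compress_uint(value):
    """Encode an unsigned integer in 1, 2 or 4 bytes (big-endian)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value <= 0x7F:
        return value.to_bytes(1, "big")
    if value <= 0x3FFF:
        return (value | 0x8000).to_bytes(2, "big")
    if value <= 0x1FFFFFFF:
        return (value | 0xC0000000).to_bytes(4, "big")
    raise ValueError("value too large to compress")


def encode_type_def_or_ref(token):
    """Turn a 4-byte metadata token into its compressed coded form."""
    table = token >> 24
    row = token & 0x00FFFFFF
    return compress_uint((row << 2) | TABLE_TAGS[table])


print(encode_type_def_or_ref(0x01000012).hex(" "))  # 49
print(encode_type_def_or_ref(0x01000123).hex(" "))  # 84 8d
```

The practical upshot is that a reference to one of the first few dozen types in a table costs a single byte in the signature rather than four.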

SzArrays simply have the array item type after the SZARRAY byte (0x1d). Multi-dimensional arrays follow the ARRAY byte (0x14) with the array item type, then a series of compressed integers indicating the number of dimensions, and the sizes and lower bounds of those dimensions.
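The two array forms might be sketched in Python as follows. This assumes the layout given in ECMA-335 for an array shape (rank, number of sizes, the sizes, number of lower bounds, the lower bounds), where lower bounds use the signed variant of the compressed integer encoding. All function names are illustrative only.

```python
SZARRAY, ARRAY = 0x1D, 0x14
I4, BOOLEAN = bytes([0x08]), bytes([0x02])


def compress_uint(value):
    """Compressed unsigned integer: 1, 2 or 4 bytes, big-endian."""
    for limit, size, prefix in ((0x7F, 1, 0), (0x3FFF, 2, 0x8000),
                                (0x1FFFFFFF, 4, 0xC0000000)):
        if 0 <= value <= limit:
            return (value | prefix).to_bytes(size, "big")
    raise ValueError("value out of range")


def compress_int(value):
    """Compressed signed integer: two's complement rotated left by one bit."""
    for bits, size, prefix in ((7, 1, 0), (14, 2, 0x8000), (29, 4, 0xC0000000)):
        half = 1 << (bits - 1)
        if -half <= value < half:
            mask = (1 << bits) - 1
            n = value & mask                                  # two's complement
            rotated = ((n << 1) | (n >> (bits - 1))) & mask   # sign bit moves to bit 0
            return (rotated | prefix).to_bytes(size, "big")
    raise ValueError("value out of range")


def encode_szarray(item_type):
    return bytes([SZARRAY]) + item_type


def encode_array(item_type, rank, sizes=(), lower_bounds=()):
    out = bytes([ARRAY]) + item_type + compress_uint(rank)
    out += compress_uint(len(sizes)) + b"".join(compress_uint(s) for s in sizes)
    out += compress_uint(len(lower_bounds))
    out += b"".join(compress_int(b) for b in lower_bounds)
    return out


print(encode_szarray(encode_szarray(BOOLEAN)).hex(" "))   # bool[][]   -> 1d 1d 02
print(encode_array(I4, 1, [3], [0]).hex(" "))             # int[0...2] -> 14 08 01 01 03 01 00
```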

Generic variables (VAR for a generic parameter of a type, MVAR for a generic parameter of a method) are simply followed by the index of the generic variable they refer to.

There are other additions as well; for example, a specific byte value indicates a method parameter passed by reference (BYREF), and other values indicate custom modifiers.
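A short Python sketch of these remaining building blocks, using the byte values from ECMA-335 (VAR 0x13, MVAR 0x1e, BYREF 0x10, CMOD_REQD 0x1f, CMOD_OPT 0x20). The helper names are invented, and to keep things short the sketch only handles indexes and coded tokens below 0x80, which compress to a single byte equal to the value itself. The coded token used for the modifier type is a made-up value.

```python
VAR, MVAR, BYREF = 0x13, 0x1E, 0x10
CMOD_REQD, CMOD_OPT = 0x1F, 0x20
I4 = bytes([0x08])


def small_uint(value):
    """Compressed form of an integer below 0x80: a single byte holding the value."""
    if not 0 <= value < 0x80:
        raise ValueError("this sketch only handles one-byte compressed integers")
    return bytes([value])


def type_variable(index):
    """Generic parameter of the containing type, e.g. T in MyType<T>."""
    return bytes([VAR]) + small_uint(index)


def method_variable(index):
    """Generic parameter of the method itself, e.g. T in MyMethod<T>()."""
    return bytes([MVAR]) + small_uint(index)


def by_ref(type_bytes):
    return bytes([BYREF]) + type_bytes


def required_modifier(coded_token, type_bytes):
    """A required custom modifier: CMOD_REQD, the modifier's coded token, then the type."""
    return bytes([CMOD_REQD]) + small_uint(coded_token) + type_bytes


print(type_variable(0).hex(" "))             # 13 00
print(method_variable(1).hex(" "))           # 1e 01
print(by_ref(I4).hex(" "))                   # int& -> 10 08
print(required_modifier(0x09, I4).hex(" "))  # 1f 09 08 (0x09 is a hypothetical coded token)
```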

Some examples...

To demonstrate, here are a few examples and what the resulting blobs in the #Blob heap will look like. Each name in capitals corresponds to a particular byte value in the ELEMENT_TYPE or CALLCONV structure, and coded tokens for custom types are represented by the type name in curly brackets.

  • A simple field:
    int intField;
    
    FIELD I4
  • A field whose type is an array of a generic type parameter (assuming T is the first generic parameter of the containing type):
    T[] genArrayField;
    
    FIELD SZARRAY VAR 0
  • An instance method signature (note how the number of parameters does not include the return type):
    instance string MyMethod(MyType, int&, bool[][]);
    
    HASTHIS DEFAULT 3
        STRING
        CLASS {MyType}
        BYREF I4
        SZARRAY SZARRAY BOOLEAN
  • A generic type instantiation:
    MyGenericType<MyType, MyStruct>
    
    GENERICINST CLASS {MyGenericType} 2
        CLASS {MyType}
        VALUETYPE {MyStruct}
  • For a more complicated example, given the following C# type declaration:
    class GenericType<T> : GenericBaseType<object[], T, GenericType<T>> { ... }
    the Extends field of the TypeDef for GenericType will point to a TypeSpec with the following blob:
    GENERICINST CLASS {GenericBaseType} 3
        SZARRAY OBJECT
        VAR 0
        GENERICINST CLASS {GenericType} 1
            VAR 0
  • And a static generic method signature (generic parameters on types are referenced using VAR, generic parameters on methods using MVAR):
    TResult[] GenericMethod<TInput, TResult>(
        TInput,
        System.Converter<TInput, TResult>);
    
    GENERIC 2 2
        SZARRAY MVAR 1
        MVAR 0
        GENERICINST CLASS {System.Converter} 2
            MVAR 0
            MVAR 1
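
To see the actual bytes behind this notation, the examples above can be fed through a tiny "assembler". This is only a Python sketch: the byte values come from ECMA-335, but the assemble function and the coded token values assigned to MyType, MyStruct, MyGenericType and System.Converter are hypothetical, since real values depend on where each type sits in its metadata table. Calling-convention names at the start of a signature are OR'd together into a single byte.

```python
ELEMENT_TYPE = {"BOOLEAN": 0x02, "I4": 0x08, "STRING": 0x0E, "BYREF": 0x10,
                "VALUETYPE": 0x11, "CLASS": 0x12, "VAR": 0x13,
                "GENERICINST": 0x15, "OBJECT": 0x1C, "SZARRAY": 0x1D, "MVAR": 0x1E}
CALLCONV = {"DEFAULT": 0x00, "FIELD": 0x06, "GENERIC": 0x10, "HASTHIS": 0x20}

# Hypothetical coded tokens ((row << 2) | tag), all small enough for one byte.
CODED_TOKENS = {"MyType": 0x08, "MyStruct": 0x0C,
                "MyGenericType": 0x10, "System.Converter": 0x19}


def small_uint(value):
    """Integers below 0x80 compress to a single byte holding the value."""
    if not 0 <= value < 0x80:
        raise ValueError("this sketch only handles one-byte compressed integers")
    return bytes([value])


def assemble(text):
    """Convert the capitalised notation used in this post into raw bytes."""
    words = text.split()
    out = bytearray()
    i, first = 0, 0
    while i < len(words) and words[i] in CALLCONV:   # leading flags share one byte
        first |= CALLCONV[words[i]]
        i += 1
    if i:
        out.append(first)
    for word in words[i:]:
        if word.startswith("{"):
            out += small_uint(CODED_TOKENS[word.strip("{}")])
        elif word.isdigit():
            out += small_uint(int(word))
        else:
            out.append(ELEMENT_TYPE[word])
    return bytes(out)


print(assemble("FIELD I4").hex(" "))             # 06 08
print(assemble("FIELD SZARRAY VAR 0").hex(" "))  # 06 1d 13 00
print(assemble("HASTHIS DEFAULT 3 STRING CLASS {MyType} BYREF I4 "
               "SZARRAY SZARRAY BOOLEAN").hex(" "))
# 20 03 0e 12 08 10 08 1d 1d 02
```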

As you can see, complicated signatures are recursively built up out of quite simple building blocks to represent all the possible variations in a .NET assembly.
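
That recursive structure maps naturally onto a recursive decoder. The following Python sketch covers only the element types used in this post, and prints coded tokens as a table name and row number rather than resolving them to type names. The function names and the sample blob (with its made-up coded tokens) are assumptions for illustration.

```python
PRIMITIVES = {0x01: "VOID", 0x02: "BOOLEAN", 0x08: "I4",
              0x0E: "STRING", 0x1C: "OBJECT"}
WRAPPERS = {0x10: "BYREF", 0x1D: "SZARRAY"}
TOKEN_TABLES = ("TypeDef", "TypeRef", "TypeSpec", "Invalid")


def read_uint(data, pos):
    """Decode a compressed unsigned integer; return (value, next_pos)."""
    first = data[pos]
    if first & 0x80 == 0:
        return first, pos + 1
    if first & 0xC0 == 0x80:
        return ((first & 0x3F) << 8) | data[pos + 1], pos + 2
    value = ((first & 0x1F) << 24) | (data[pos + 1] << 16) \
            | (data[pos + 2] << 8) | data[pos + 3]
    return value, pos + 4


def decode_type(data, pos=0):
    """Decode one type starting at `pos`; return (description, next_pos)."""
    code = data[pos]
    pos += 1
    if code in PRIMITIVES:
        return PRIMITIVES[code], pos
    if code in WRAPPERS:                      # followed by exactly one inner type
        inner, pos = decode_type(data, pos)
        return WRAPPERS[code] + " " + inner, pos
    if code in (0x11, 0x12):                  # VALUETYPE / CLASS + coded token
        coded, pos = read_uint(data, pos)
        kind = "VALUETYPE" if code == 0x11 else "CLASS"
        return "%s {%s #%d}" % (kind, TOKEN_TABLES[coded & 3], coded >> 2), pos
    if code in (0x13, 0x1E):                  # VAR / MVAR + index
        index, pos = read_uint(data, pos)
        return "%s %d" % ("VAR" if code == 0x13 else "MVAR", index), pos
    if code == 0x15:                          # GENERICINST type count args...
        generic, pos = decode_type(data, pos)
        count, pos = read_uint(data, pos)
        args = []
        for _ in range(count):
            arg, pos = decode_type(data, pos)
            args.append(arg)
        return "GENERICINST %s %d [%s]" % (generic, count, ", ".join(args)), pos
    raise ValueError("element type 0x%02x not handled by this sketch" % code)


# GenericBaseType<object[], T, GenericType<T>> with hypothetical coded tokens.
blob = bytes([0x15, 0x12, 0x08, 0x03, 0x1D, 0x1C, 0x13, 0x00,
              0x15, 0x12, 0x0C, 0x01, 0x13, 0x00])
print(decode_type(blob)[0])
# GENERICINST CLASS {TypeDef #2} 3 [SZARRAY OBJECT, VAR 0, GENERICINST CLASS {TypeDef #3} 1 [VAR 0]]
```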

Now that we've looked at the basics of normal method signatures, in my next post I'll look at custom attribute application signatures, and how they are different to normal signatures.

© Simple Talk or respective owner
