Search Results

Search found 62 results on 3 pages for 'indirection'.

Page 3/3 | < Previous Page | 1 2 3

How do I recover from an unchecked exception?

- by erickson

Unchecked exceptions are alright if you want to handle every failure the same way, for example by logging it and skipping to the next request, displaying a message to the user and handling the next event, etc. If this is my use case, all I have to do is catch some general exception type at a high level in my system, and handle everything the same way. But I want to recover from specific problems, and I'm not sure the best way to approach it with unchecked exceptions. Here is a concrete example. Suppose I have a web application, built using Struts2 and Hibernate. If an exception bubbles up to my "action", I log it, and display a pretty apology to the user. But one of the functions of my web application is creating new user accounts, that require a unique user name. If a user picks a name that already exists, Hibernate throws an org.hibernate.exception.ConstraintViolationException (an unchecked exception) down in the guts of my system. I'd really like to recover from this particular problem by asking the user to choose another user name, rather than giving them the same "we logged your problem but for now you're hosed" message. Here are a few points to consider: There a lot of people creating accounts simultaneously. I don't want to lock the whole user table between a "SELECT" to see if the name exists and an "INSERT" if it doesn't. In the case of relational databases, there might be some tricks to work around this, but what I'm really interested in is the general case where pre-checking for an exception won't work because of a fundamental race condition. Same thing could apply to looking for a file on the file system, etc. Given my CTO's propensity for drive-by management induced by reading technology columns in "Inc.", I need a layer of indirection around the persistence mechanism so that I can throw out Hibernate and use Kodo, or whatever, without changing anything except the lowest layer of persistence code. As a matter of fact, there are several such layers of abstraction in my system. How can I prevent them from leaking in spite of unchecked exceptions? One of the declaimed weaknesses of checked exceptions is having to "handle" them in every call on the stack—either by declaring that a calling method throws them, or by catching them and handling them. Handling them often means wrapping them in another checked exception of a type appropriate to the level of abstraction. So, for example, in checked-exception land, a file-system–based implementation of my UserRegistry might catch IOException, while a database implementation would catch SQLException, but both would throw a UserNotFoundException that hides the underlying implementation. How do I take advantage of unchecked exceptions, sparing myself of the burden of this wrapping at each layer, without leaking implementation details?

Read the article
Java template classes using generator or similar?

- by Hugh Perkins

Is there some library or generator that I can use to generate multiple templated java classes from a single template? Obviously Java does have a generics implementation itself, but since it uses type-erasure, there are lots of situations where it is less than adequate. For example, if I want to make a self-growing array like this: class EasyArray { T[] backingarray; } (where T is a primitive type), then this isn't possible. This is true for anything which needs an array, for example high-performance templated matrix and vector classes. It should probably be possible to write a code generator which takes a templated class and generates multiple instantiations, for different types, eg for 'double' and 'float' and 'int' and 'String'. Is there something that already exists that does this? Edit: note that using an array of Object is not what I'm looking for, since it's no longer an array of primitives. An array of primitives is very fast, and uses only as much space a sizeof(primitive) * length-of-array. An array of object is an array of pointers/references, that points to Double objects, or similar, which could be scattered all over the place in memory, require garbage collection, allocation, and imply a double-indirection for access. Edit2: good god, voted down for asking for something that probably doesn't currently exist, but is technically possible and feasible? Does that mean that people looking for ways to improve things have already left the java community? Edit3: Here is code to show the difference in performance between primitive and boxed arrays: int N = 10*1000*1000; double[]primArray = new double[N]; for( int i = 0; i < N; i++ ) { primArray[i] = 123.0; } Object[] objArray = new Double[N]; for( int i = 0; i < N; i++ ) { objArray[i] = 123.0; } tic(); primArray = new double[N]; for( int i = 0; i < N; i++ ) { primArray[i] = 123.0; } toc(); tic(); objArray = new Double[N]; for( int i = 0; i < N; i++ ) { objArray[i] = 123.0; } toc(); Results: double[] array: 148 ms Double[] array: 4614 ms Not even close!

Read the article
Is this a problem typically solved with IOC?

- by Dirk

My current application allows users to define custom web forms through a set of admin screens. it's essentially an EAV type application. As such, I can't hard code HTML or ASP.NET markup to render a given page. Instead, the UI requests an instance of a Form object from the service layer, which in turn constructs one using a several RDMBS tables. Form contains the kind of classes you would expect to see in such a context: Form= IEnumerable<FormSections>=IEnumerable<FormFields> Here's what the service layer looks like: public class MyFormService: IFormService{ public Form OpenForm(int formId){ //construct and return a concrete implementation of Form } } Everything works splendidly (for a while). The UI is none the wiser about what sections/fields exist in a given form: It happily renders the Form object it receives into a functional ASP.NET page. A few weeks later, I get a new requirement from the business: When viewing a non-editable (i.e. read-only) versions of a form, certain field values should be merged together and other contrived/calculated fields should are added. No problem I say. Simply amend my service class so that its methods are more explicit: public class MyFormService: IFormService{ public Form OpenFormForEditing(int formId){ //construct and return a concrete implementation of Form } public Form OpenFormForViewing(int formId){ //construct and a concrete implementation of Form //apply additional transformations to the form } } Again everything works great and balance has been restored to the force. The UI continues to be agnostic as to what is in the Form, and our separation of concerns is achieved. Only a few short weeks later, however, the business puts out a new requirement: in certain scenarios, we should apply only some of the form transformations I referenced above. At this point, it feels like the "explicit method" approach has reached a dead end, unless I want to end up with an explosion of methods (OpenFormViewingScenario1, OpenFormViewingScenario2, etc). Instead, I introduce another level of indirection: public interface IFormViewCreator{ void CreateView(Form form); } public class MyFormService: IFormService{ public Form OpenFormForEditing(int formId){ //construct and return a concrete implementation of Form } public Form OpenFormForViewing(int formId, IFormViewCreator formViewCreator){ //construct a concrete implementation of Form //apply transformations to the dynamic field list return formViewCreator.CreateView(form); } } On the surface, this seems like acceptable approach and yet there is a certain smell. Namely, the UI, which had been living in ignorant bliss about the implementation details of OpenFormForViewing, must possess knowledge of and create an instance of IFormViewCreator. My questions are twofold: Is there a better way to achieve the composability I'm after? (perhaps by using an IoC container or a home rolled factory to create the concrete IFormViewCreator)? Did I fundamentally screw up the abstraction here?

Read the article
PostSharp, Obfuscation, and IL

- by Simon Cooper

Aspect-oriented programming (AOP) is a relatively new programming paradigm. Originating at Xerox PARC in 1994, the paradigm was first made available for general-purpose development as an extension to Java in 2001. From there, it has quickly been adapted for use in all the common languages used today. In the .NET world, one of the primary AOP toolkits is PostSharp. Attributes and AOP Normally, attributes in .NET are entirely a metadata construct. Apart from a few special attributes in the .NET framework, they have no effect whatsoever on how a class or method executes within the CLR. Only by using reflection at runtime can you access any attributes declared on a type or type member. PostSharp changes this. By declaring a custom attribute that derives from PostSharp.Aspects.Aspect, applying it to types and type members, and running the resulting assembly through the PostSharp postprocessor, you can essentially declare 'clever' attributes that change the behaviour of whatever the aspect has been applied to at runtime. A simple example of this is logging. By declaring a TraceAttribute that derives from OnMethodBoundaryAspect, you can automatically log when a method has been executed: public class TraceAttribute : PostSharp.Aspects.OnMethodBoundaryAspect { public override void OnEntry(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Entering {0}.{1}.", method.DeclaringType.FullName, method.Name)); } public override void OnExit(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Leaving {0}.{1}.", method.DeclaringType.FullName, method.Name)); } } [Trace] public void MethodToLog() { ... } Now, whenever MethodToLog is executed, the aspect will automatically log entry and exit, without having to add the logging code to MethodToLog itself. PostSharp Performance Now this does introduce a performance overhead - as you can see, the aspect allows access to the MethodBase of the method the aspect has been applied to. If you were limited to C#, you would be forced to retrieve each MethodBase instance using Type.GetMethod(), matching on the method name and signature. This is slow. Fortunately, PostSharp is not limited to C#. It can use any instruction available in IL. And in IL, you can do some very neat things. Ldtoken C# allows you to get the Type object corresponding to a specific type name using the typeof operator: Type t = typeof(Random); The C# compiler compiles this operator to the following IL: ldtoken [mscorlib]System.Random call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle( valuetype [mscorlib]System.RuntimeTypeHandle) The ldtoken instruction obtains a special handle to a type called a RuntimeTypeHandle, and from that, the Type object can be obtained using GetTypeFromHandle. These are both relatively fast operations - no string lookup is required, only direct assembly and CLR constructs are used. However, a little-known feature is that ldtoken is not just limited to types; it can also get information on methods and fields, encapsulated in a RuntimeMethodHandle or RuntimeFieldHandle: // get a MethodBase for String.EndsWith(string) ldtoken method instance bool [mscorlib]System.String::EndsWith(string) call class [mscorlib]System.Reflection.MethodBase [mscorlib]System.Reflection.MethodBase::GetMethodFromHandle( valuetype [mscorlib]System.RuntimeMethodHandle) // get a FieldInfo for the String.Empty field ldtoken field string [mscorlib]System.String::Empty call class [mscorlib]System.Reflection.FieldInfo [mscorlib]System.Reflection.FieldInfo::GetFieldFromHandle( valuetype [mscorlib]System.RuntimeFieldHandle) These usages of ldtoken aren't usable from C# or VB, and aren't likely to be added anytime soon (Eric Lippert's done a blog post on the possibility of adding infoof, methodof or fieldof operators to C#). However, PostSharp deals directly with IL, and so can use ldtoken to get MethodBase objects quickly and cheaply, without having to resort to string lookups. The kicker However, there are problems. Because ldtoken for methods or fields isn't accessible from C# or VB, it hasn't been as well-tested as ldtoken for types. This has resulted in various obscure bugs in most versions of the CLR when dealing with ldtoken and methods, and specifically, generic methods and methods of generic types. This means that PostSharp was behaving incorrectly, or just plain crashing, when aspects were applied to methods that were generic in some way. So, PostSharp has to work around this. Without using the metadata tokens directly, the only way to get the MethodBase of generic methods is to use reflection: Type.GetMethod(), passing in the method name as a string along with information on the signature. Now, this works fine. It's slower than using ldtoken directly, but it works, and this only has to be done for generic methods. Unfortunately, this poses problems when the assembly is obfuscated. PostSharp and Obfuscation When using ldtoken, obfuscators don't affect how PostSharp operates. Because the ldtoken instruction directly references the type, method or field within the assembly, it is unaffected if the name of the object is changed by an obfuscator. However, the indirect loading used for generic methods was breaking, because that uses the name of the method when the assembly is put through the PostSharp postprocessor to lookup the MethodBase at runtime. If the name then changes, PostSharp can't find it anymore, and the assembly breaks. So, PostSharp needs to know about any changes an obfuscator does to an assembly. The way PostSharp does this is by adding another layer of indirection. When PostSharp obfuscation support is enabled, it includes an extra 'name table' resource in the assembly, consisting of a series of method & type names. When PostSharp needs to lookup a method using reflection, instead of encoding the method name directly, it looks up the method name at a fixed offset inside that name table: MethodBase genericMethod = typeof(ContainingClass).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: get_Prop1 21: set_Prop1 22: DoFoo 23: GetWibble When the assembly is later processed by an obfuscator, the obfuscator can replace all the method and type names within the name table with their new name. That way, the reflection lookups performed by PostSharp will now use the new names, and everything will work as expected: MethodBase genericMethod = typeof(#kGy).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: #kkA 21: #zAb 22: #EF5a 23: #2tg As you can see, this requires direct support by an obfuscator in order to perform these rewrites. Dotfuscator supports it, and now, starting with SmartAssembly 6.6.4, SmartAssembly does too. So, a relatively simple solution to a tricky problem, with some CLR bugs thrown in for good measure. You don't see those every day!

Read the article
PostSharp, Obfuscation, and IL

- by Simon Cooper

Aspect-oriented programming (AOP) is a relatively new programming paradigm. Originating at Xerox PARC in 1994, the paradigm was first made available for general-purpose development as an extension to Java in 2001. From there, it has quickly been adapted for use in all the common languages used today. In the .NET world, one of the primary AOP toolkits is PostSharp. Attributes and AOP Normally, attributes in .NET are entirely a metadata construct. Apart from a few special attributes in the .NET framework, they have no effect whatsoever on how a class or method executes within the CLR. Only by using reflection at runtime can you access any attributes declared on a type or type member. PostSharp changes this. By declaring a custom attribute that derives from PostSharp.Aspects.Aspect, applying it to types and type members, and running the resulting assembly through the PostSharp postprocessor, you can essentially declare 'clever' attributes that change the behaviour of whatever the aspect has been applied to at runtime. A simple example of this is logging. By declaring a TraceAttribute that derives from OnMethodBoundaryAspect, you can automatically log when a method has been executed: public class TraceAttribute : PostSharp.Aspects.OnMethodBoundaryAspect { public override void OnEntry(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Entering {0}.{1}.", method.DeclaringType.FullName, method.Name)); } public override void OnExit(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Leaving {0}.{1}.", method.DeclaringType.FullName, method.Name)); } } [Trace] public void MethodToLog() { ... } Now, whenever MethodToLog is executed, the aspect will automatically log entry and exit, without having to add the logging code to MethodToLog itself. PostSharp Performance Now this does introduce a performance overhead - as you can see, the aspect allows access to the MethodBase of the method the aspect has been applied to. If you were limited to C#, you would be forced to retrieve each MethodBase instance using Type.GetMethod(), matching on the method name and signature. This is slow. Fortunately, PostSharp is not limited to C#. It can use any instruction available in IL. And in IL, you can do some very neat things. Ldtoken C# allows you to get the Type object corresponding to a specific type name using the typeof operator: Type t = typeof(Random); The C# compiler compiles this operator to the following IL: ldtoken [mscorlib]System.Random call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle( valuetype [mscorlib]System.RuntimeTypeHandle) The ldtoken instruction obtains a special handle to a type called a RuntimeTypeHandle, and from that, the Type object can be obtained using GetTypeFromHandle. These are both relatively fast operations - no string lookup is required, only direct assembly and CLR constructs are used. However, a little-known feature is that ldtoken is not just limited to types; it can also get information on methods and fields, encapsulated in a RuntimeMethodHandle or RuntimeFieldHandle: // get a MethodBase for String.EndsWith(string) ldtoken method instance bool [mscorlib]System.String::EndsWith(string) call class [mscorlib]System.Reflection.MethodBase [mscorlib]System.Reflection.MethodBase::GetMethodFromHandle( valuetype [mscorlib]System.RuntimeMethodHandle) // get a FieldInfo for the String.Empty field ldtoken field string [mscorlib]System.String::Empty call class [mscorlib]System.Reflection.FieldInfo [mscorlib]System.Reflection.FieldInfo::GetFieldFromHandle( valuetype [mscorlib]System.RuntimeFieldHandle) These usages of ldtoken aren't usable from C# or VB, and aren't likely to be added anytime soon (Eric Lippert's done a blog post on the possibility of adding infoof, methodof or fieldof operators to C#). However, PostSharp deals directly with IL, and so can use ldtoken to get MethodBase objects quickly and cheaply, without having to resort to string lookups. The kicker However, there are problems. Because ldtoken for methods or fields isn't accessible from C# or VB, it hasn't been as well-tested as ldtoken for types. This has resulted in various obscure bugs in most versions of the CLR when dealing with ldtoken and methods, and specifically, generic methods and methods of generic types. This means that PostSharp was behaving incorrectly, or just plain crashing, when aspects were applied to methods that were generic in some way. So, PostSharp has to work around this. Without using the metadata tokens directly, the only way to get the MethodBase of generic methods is to use reflection: Type.GetMethod(), passing in the method name as a string along with information on the signature. Now, this works fine. It's slower than using ldtoken directly, but it works, and this only has to be done for generic methods. Unfortunately, this poses problems when the assembly is obfuscated. PostSharp and Obfuscation When using ldtoken, obfuscators don't affect how PostSharp operates. Because the ldtoken instruction directly references the type, method or field within the assembly, it is unaffected if the name of the object is changed by an obfuscator. However, the indirect loading used for generic methods was breaking, because that uses the name of the method when the assembly is put through the PostSharp postprocessor to lookup the MethodBase at runtime. If the name then changes, PostSharp can't find it anymore, and the assembly breaks. So, PostSharp needs to know about any changes an obfuscator does to an assembly. The way PostSharp does this is by adding another layer of indirection. When PostSharp obfuscation support is enabled, it includes an extra 'name table' resource in the assembly, consisting of a series of method & type names. When PostSharp needs to lookup a method using reflection, instead of encoding the method name directly, it looks up the method name at a fixed offset inside that name table: MethodBase genericMethod = typeof(ContainingClass).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: get_Prop1 21: set_Prop1 22: DoFoo 23: GetWibble When the assembly is later processed by an obfuscator, the obfuscator can replace all the method and type names within the name table with their new name. That way, the reflection lookups performed by PostSharp will now use the new names, and everything will work as expected: MethodBase genericMethod = typeof(#kGy).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: #kkA 21: #zAb 22: #EF5a 23: #2tg As you can see, this requires direct support by an obfuscator in order to perform these rewrites. Dotfuscator supports it, and now, starting with SmartAssembly 6.6.4, SmartAssembly does too. So, a relatively simple solution to a tricky problem, with some CLR bugs thrown in for good measure. You don't see those every day!

Read the article
PostSharp, Obfuscation, and IL

- by simonc

Aspect-oriented programming (AOP) is a relatively new programming paradigm. Originating at Xerox PARC in 1994, the paradigm was first made available for general-purpose development as an extension to Java in 2001. From there, it has quickly been adapted for use in all the common languages used today. In the .NET world, one of the primary AOP toolkits is PostSharp. Attributes and AOP Normally, attributes in .NET are entirely a metadata construct. Apart from a few special attributes in the .NET framework, they have no effect whatsoever on how a class or method executes within the CLR. Only by using reflection at runtime can you access any attributes declared on a type or type member. PostSharp changes this. By declaring a custom attribute that derives from PostSharp.Aspects.Aspect, applying it to types and type members, and running the resulting assembly through the PostSharp postprocessor, you can essentially declare 'clever' attributes that change the behaviour of whatever the aspect has been applied to at runtime. A simple example of this is logging. By declaring a TraceAttribute that derives from OnMethodBoundaryAspect, you can automatically log when a method has been executed: public class TraceAttribute : PostSharp.Aspects.OnMethodBoundaryAspect { public override void OnEntry(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Entering {0}.{1}.", method.DeclaringType.FullName, method.Name)); } public override void OnExit(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Leaving {0}.{1}.", method.DeclaringType.FullName, method.Name)); } } [Trace] public void MethodToLog() { ... } Now, whenever MethodToLog is executed, the aspect will automatically log entry and exit, without having to add the logging code to MethodToLog itself. PostSharp Performance Now this does introduce a performance overhead - as you can see, the aspect allows access to the MethodBase of the method the aspect has been applied to. If you were limited to C#, you would be forced to retrieve each MethodBase instance using Type.GetMethod(), matching on the method name and signature. This is slow. Fortunately, PostSharp is not limited to C#. It can use any instruction available in IL. And in IL, you can do some very neat things. Ldtoken C# allows you to get the Type object corresponding to a specific type name using the typeof operator: Type t = typeof(Random); The C# compiler compiles this operator to the following IL: ldtoken [mscorlib]System.Random call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle( valuetype [mscorlib]System.RuntimeTypeHandle) The ldtoken instruction obtains a special handle to a type called a RuntimeTypeHandle, and from that, the Type object can be obtained using GetTypeFromHandle. These are both relatively fast operations - no string lookup is required, only direct assembly and CLR constructs are used. However, a little-known feature is that ldtoken is not just limited to types; it can also get information on methods and fields, encapsulated in a RuntimeMethodHandle or RuntimeFieldHandle: // get a MethodBase for String.EndsWith(string) ldtoken method instance bool [mscorlib]System.String::EndsWith(string) call class [mscorlib]System.Reflection.MethodBase [mscorlib]System.Reflection.MethodBase::GetMethodFromHandle( valuetype [mscorlib]System.RuntimeMethodHandle) // get a FieldInfo for the String.Empty field ldtoken field string [mscorlib]System.String::Empty call class [mscorlib]System.Reflection.FieldInfo [mscorlib]System.Reflection.FieldInfo::GetFieldFromHandle( valuetype [mscorlib]System.RuntimeFieldHandle) These usages of ldtoken aren't usable from C# or VB, and aren't likely to be added anytime soon (Eric Lippert's done a blog post on the possibility of adding infoof, methodof or fieldof operators to C#). However, PostSharp deals directly with IL, and so can use ldtoken to get MethodBase objects quickly and cheaply, without having to resort to string lookups. The kicker However, there are problems. Because ldtoken for methods or fields isn't accessible from C# or VB, it hasn't been as well-tested as ldtoken for types. This has resulted in various obscure bugs in most versions of the CLR when dealing with ldtoken and methods, and specifically, generic methods and methods of generic types. This means that PostSharp was behaving incorrectly, or just plain crashing, when aspects were applied to methods that were generic in some way. So, PostSharp has to work around this. Without using the metadata tokens directly, the only way to get the MethodBase of generic methods is to use reflection: Type.GetMethod(), passing in the method name as a string along with information on the signature. Now, this works fine. It's slower than using ldtoken directly, but it works, and this only has to be done for generic methods. Unfortunately, this poses problems when the assembly is obfuscated. PostSharp and Obfuscation When using ldtoken, obfuscators don't affect how PostSharp operates. Because the ldtoken instruction directly references the type, method or field within the assembly, it is unaffected if the name of the object is changed by an obfuscator. However, the indirect loading used for generic methods was breaking, because that uses the name of the method when the assembly is put through the PostSharp postprocessor to lookup the MethodBase at runtime. If the name then changes, PostSharp can't find it anymore, and the assembly breaks. So, PostSharp needs to know about any changes an obfuscator does to an assembly. The way PostSharp does this is by adding another layer of indirection. When PostSharp obfuscation support is enabled, it includes an extra 'name table' resource in the assembly, consisting of a series of method & type names. When PostSharp needs to lookup a method using reflection, instead of encoding the method name directly, it looks up the method name at a fixed offset inside that name table: MethodBase genericMethod = typeof(ContainingClass).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: get_Prop1 21: set_Prop1 22: DoFoo 23: GetWibble When the assembly is later processed by an obfuscator, the obfuscator can replace all the method and type names within the name table with their new name. That way, the reflection lookups performed by PostSharp will now use the new names, and everything will work as expected: MethodBase genericMethod = typeof(#kGy).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: #kkA 21: #zAb 22: #EF5a 23: #2tg As you can see, this requires direct support by an obfuscator in order to perform these rewrites. Dotfuscator supports it, and now, starting with SmartAssembly 6.6.4, SmartAssembly does too. So, a relatively simple solution to a tricky problem, with some CLR bugs thrown in for good measure. You don't see those every day! Cross posted from Simple Talk.

Read the article
Need some help deciphering a line of assembler code, from .NET JITted code

- by Lasse V. Karlsen

In a C# constructor, that ends up with a call to this(...), the actual call gets translated to this: 0000003d call dword ptr ds:[199B88E8h] What is the DS register contents here? I know it's the data-segment, but is this call through a VMT-table or similar? I doubt it though, since this(...) wouldn't be a call to a virtual method, just another constructor. I ask because the value at that location seems to be bad in some way, if I hit F11, trace into (Visual Studio 2008), on that call-instruction, the program crashes with an access violation. The code is deep inside a 3rd party control library, where, though I have the source code, I don't have the assemblies compiled with enough debug information that I can trace it through C# code, only through the disassembler, and then I have to match that back to the actual code. The C# code in question is this: public AxisRangeData(AxisRange range) : this(range, range.Axis) { } Reflector shows me this IL code: .maxstack 8 L_0000: ldarg.0 L_0001: ldarg.1 L_0002: ldarg.1 L_0003: callvirt instance class DevExpress.XtraCharts.AxisBase DevExpress.XtraCharts.AxisRange::get_Axis() L_0008: call instance void DevExpress.XtraCharts.Native.AxisRangeData::.ctor(class DevExpress.XtraCharts.ChartElement, class DevExpress.XtraCharts.AxisBase) L_000d: ret It's that last call there, to the other constructor of the same class, that fails. The debugger never surfaces inside the other method, it just crashes. The disassembly for the method after JITting is this: 00000000 push ebp 00000001 mov ebp,esp 00000003 sub esp,14h 00000006 mov dword ptr [ebp-4],ecx 00000009 mov dword ptr [ebp-8],edx 0000000c cmp dword ptr ds:[18890E24h],0 00000013 je 0000001A 00000015 call 61843511 0000001a mov eax,dword ptr [ebp-4] 0000001d mov dword ptr [ebp-0Ch],eax 00000020 mov eax,dword ptr [ebp-8] 00000023 mov dword ptr [ebp-10h],eax 00000026 mov ecx,dword ptr [ebp-8] 00000029 cmp dword ptr [ecx],ecx 0000002b call dword ptr ds:[1889D0DCh] // range.Axis 00000031 mov dword ptr [ebp-14h],eax 00000034 push dword ptr [ebp-14h] 00000037 mov edx,dword ptr [ebp-10h] 0000003a mov ecx,dword ptr [ebp-0Ch] 0000003d call dword ptr ds:[199B88E8h] // this(range, range.Axis)? 00000043 nop 00000044 mov esp,ebp 00000046 pop ebp 00000047 ret Basically what I'm asking is this: What the purpose of the ds:[ADDR] indirection here? VMT-table is only for virtual isn't it? and this is constructor Could the constructor have yet to be JITted, which could mean that the call would actually call through a JIT shim? I'm afraid I'm in deep water here, so anything might and could help. Edit: Well, the problem just got worse, or better, or whatever. We are developing the .NET feature in a C# project in a Visual Studio 2008 solution, and debugging and developing through Visual Studio. However, in the end, this code will be loaded into a .NET runtime hosted by a Win32 Delphi application. In order to facilitate easy experimentation of such features, we can also configure the Visual Studio project/solution/debugger to copy the produced dll's to the Delphi app's directory, and then execute the Delphi app, through the Visual Studio debugger. Turns out, the problem goes away if I run the program outside of the debugger, but during debugging, it crops up, every time. Not sure that helps, but since the code isn't slated for production release for another 6 months or so, then it takes some of the pressure off of it for the test release that we have soon. I'll dive into the memory parts later, but probably not until over the weekend, and post a followup.

Read the article
C++ pimpl idiom wastes an instruction vs. C style?

- by Rob

(Yes, I know that one machine instruction usually doesn't matter. I'm asking this question because I want to understand the pimpl idiom, and use it in the best possible way; and because sometimes I do care about one machine instruction.) In the sample code below, there are two classes, Thing and OtherThing. Users would include "thing.hh". Thing uses the pimpl idiom to hide it's implementation. OtherThing uses a C style – non-member functions that return and take pointers. This style produces slightly better machine code. I'm wondering: is there a way to use C++ style – ie, make the functions into member functions – and yet still save the machine instruction. I like this style because it doesn't pollute the namespace outside the class. Note: I'm only looking at calling member functions (in this case, calc). I'm not looking at object allocation. Below are the files, commands, and the machine code, on my Mac. thing.hh: class ThingImpl; class Thing { ThingImpl *impl; public: Thing(); int calc(); }; class OtherThing; OtherThing *make_other(); int calc(OtherThing *); thing.cc: #include "thing.hh" struct ThingImpl { int x; }; Thing::Thing() { impl = new ThingImpl; impl->x = 5; } int Thing::calc() { return impl->x + 1; } struct OtherThing { int x; }; OtherThing *make_other() { OtherThing *t = new OtherThing; t->x = 5; } int calc(OtherThing *t) { return t->x + 1; } main.cc (just to test the code actually works...) #include "thing.hh" #include <cstdio> int main() { Thing *t = new Thing; printf("calc: %d\n", t->calc()); OtherThing *t2 = make_other(); printf("calc: %d\n", calc(t2)); } Makefile: all: main thing.o : thing.cc thing.hh g++ -fomit-frame-pointer -O2 -c thing.cc main.o : main.cc thing.hh g++ -fomit-frame-pointer -O2 -c main.cc main: main.o thing.o g++ -O2 -o $@ $^ clean: rm *.o rm main Run make and then look at the machine code. On the mac I use otool -tv thing.o | c++filt. On linux I think it's objdump -d thing.o. Here is the relevant output: Thing::calc(): 0000000000000000 movq (%rdi),%rax 0000000000000003 movl (%rax),%eax 0000000000000005 incl %eax 0000000000000007 ret calc(OtherThing*): 0000000000000010 movl (%rdi),%eax 0000000000000012 incl %eax 0000000000000014 ret Notice the extra instruction because of the pointer indirection. The first function looks up two fields (impl, then x), while the second only needs to get x. What can be done?

Read the article
Elapsed time of running a C program

- by yCalleecharan

Hi, I would like to know what lines of C code to add to a program so that it tells me the total time that the program takes to run. I guess there should be counter initialization near the beginning of main and one after the main function ends. Is the right header clock.h? Thanks a lot... Update I have a Win Xp machine. Is it just adding clock() at the beginning and another clock() at the end of the program? Then I can estimate the time difference. Yes, you're right it's time.h. Here's my code: #include <stdio.h> #include <stdlib.h> #include <math.h> #include <share.h> #include <time.h> void f(long double fb[], long double fA, long double fB); int main() { clock_t start, end; start = clock(); const int ARRAY_SIZE = 11; long double* z = (long double*) malloc(sizeof (long double) * ARRAY_SIZE); int i; long double A, B; if (z == NULL) { printf("Out of memory\n"); exit(-1); } A = 0.5; B = 2; for (i = 0; i < ARRAY_SIZE; i++) { z[i] = 0; } z[1] = 5; f(z, A, B); for (i = 0; i < ARRAY_SIZE; i++) printf("z is %.16Le\n", z[i]); free(z); z = NULL; end = clock(); printf("Took %ld ticks\n", end-start); printf("Took %f seconds\n", (double)(end-start)/CLOCKS_PER_SEC); return 0; } void f(long double fb[], long double fA, long double fB) { fb[0] = fb[1]* fA; fb[1] = fb[1] - 1; return; } Some errors with MVS2008: testim.c(16) : error C2143: syntax error : missing ';' before 'const' testim.c(18) :error C2143: syntax error : missing ';' before 'type' testim.c(20) :error C2143: syntax error : missing ';' before 'type' testim.c(21) :error C2143: syntax error : missing ';' before 'type' testim.c(23) :error C2065: 'z' : undeclared identifier testim.c(23) :warning C4047: '==' : 'int' differs in levels of indirection from 'void *' testim.c(28) : error C2065: 'A' : undeclared identifier testim.c(28) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data and it goes to 28 errors. Note that I don't have any errors/warnings without your clock codes. LATEST NEWS: I unfortunately didn't get a good reply here. But after a search on Google, the code is working. Here it is: #include <stdio.h> #include <stdlib.h> #include <math.h> #include <share.h> #include <time.h> void f(long double fb[], long double fA, long double fB); int main() { clock_t start = clock(); const int ARRAY_SIZE = 11; long double* z = (long double*) malloc(sizeof (long double) * ARRAY_SIZE); int i; long double A, B; if (z == NULL) { printf("Out of memory\n"); exit(-1); } A = 0.5; B = 2; for (i = 0; i < ARRAY_SIZE; i++) { z[i] = 0; } z[1] = 5; f(z, A, B); for (i = 0; i < ARRAY_SIZE; i++) printf("z is %.16Le\n", z[i]); free(z); z = NULL; printf("Took %f seconds\n", ((double)clock()-start)/CLOCKS_PER_SEC); return 0; } void f(long double fb[], long double fA, long double fB) { fb[0] = fb[1]* fA; fb[1] = fb[1] - 1; return; } Cheers

Read the article
Red Gate Coder interviews: Robin Hellen

- by Michael Williamson

Robin Hellen is a test engineer here at Red Gate, and is also the latest coder I’ve interviewed. We chatted about debugging code, the roles of software engineers and testers, and why Vala is currently his favourite programming language. How did you get started with programming?It started when I was about six. My dad’s a professional programmer, and he gave me and my sister one of his old computers and taught us a bit about programming. It was an old Amiga 500 with a variant of BASIC. I don’t think I ever successfully completed anything! It was just faffing around. I didn’t really get anywhere with it.But then presumably you did get somewhere with it at some point.At some point. The PC emerged as the dominant platform, and I learnt a bit of Visual Basic. I didn’t really do much, just a couple of quick hacky things. A bit of demo animation. Took me a long time to get anywhere with programming, really.When did you feel like you did start to get somewhere?I think it was when I started doing things for someone else, which was my sister’s final year of university project. She called up my dad two days before she was due to submit, saying “We need something to display a graph!”. Dad says, “I’m too busy, go talk to your brother”. So I hacked up this ugly piece of code, sent it off and they won a prize for that project. Apparently, the graph, the bit that I wrote, was the reason they won a prize! That was when I first felt that I’d actually done something that was worthwhile. That was my first real bit of code, and the ugliest code I’ve ever written. It’s basically an array of pre-drawn line elements that I shifted round the screen to draw a very spikey graph.When did you decide that programming might actually be something that you wanted to do as a career?It’s not really a decision I took, I always wanted to do something with computers. And I had to take a gap year for uni, so I was looking for twelve month internships. I applied to Red Gate, and they gave me a job as a tester. And that’s where I really started having to write code well. To a better standard that I had been up to that point.How did you find coming to Red Gate and working with other coders?I thought it was really nice. I learnt so much just from other people around. I think one of the things that’s really great is that people are just willing to help you learn. Instead of “Don’t you know that, you’re so stupid”, it’s “You can just do it this way”.If you could go back to the very start of that internship, is there something that you would tell yourself?Write shorter code. I have a tendency to write massive, many-thousand line files that I break out of right at the end. And then half-way through a project I’m doing something, I think “Where did I write that bit that does that thing?”, and it’s almost impossible to find. I wrote some horrendous code when I started. Just that principle, just keep things short. Even if looks a bit crazy to be jumping around all over the place all of the time, it’s actually a lot more understandable.And how do you hold yourself to that?Generally, if a function’s going off my screen, it’s probably too long. That’s what I tell myself, and within the team here we have code reviews, so the guys I’m with at the moment are pretty good at pulling me up on, “Doesn’t that look like it’s getting a bit long?”. It’s more just the subjective standard of readability than anything.So you’re an advocate of code review?Yes, definitely. Both to spot errors that you might have made, and to improve your knowledge. The person you’re reviewing will say “Oh, you could have done it that way”. That’s how we learn, by talking to others, and also just sharing knowledge of how your project works around the team, or even outside the team. Definitely a very firm advocate of code reviews.Do you think there’s more we could do with them?I don’t know. We’re struggling with how to add them as part of the process without it becoming too cumbersome. We’ve experimented with a few different ways, and we’ve not found anything that just works.To get more into the nitty gritty: how do you like to debug code?The first thing is to do it in my head. I’ll actually think what piece of code is likely to have caused that error, and take a quick look at it, just to see if there’s anything glaringly obvious there. The next thing I’ll probably do is throw in print statements, or throw some exceptions from various points, just to check: is it going through the code path I expect it to? A last resort is to actually debug code using a debugger.Why is the debugger the last resort?Probably because of the environments I learnt programming in. VB and early BASIC didn’t have much of a debugger, the only way to find out what your program was doing was to add print statements. Also, because a lot of the stuff I tend to work with is non-interactive, if it’s something that takes a long time to run, I can throw in the print statements, set a run off, go and do something else, and look at it again later, rather than trying to remember what happened at that point when I was debugging through it. So it also gives me the record of what happens. I hate just sitting there pressing F5, F5, continually. If you’re having to find out what your code is doing at each line, you’ve probably got a very wrong mental model of what your code’s doing, and you can find that out just as easily by inspecting a couple of values through the print statements.If I were on some codebase that you were also working on, what should I do to make it as easy as possible to understand?I’d say short and well-named methods. The one thing I like to do when I’m looking at code is to find out where a value comes from, and the more layers of indirection there are, particularly DI [dependency injection] frameworks, the harder it is to find out where something’s come from. I really hate that. I want to know if the value come from the user here or is a constant here, and if I can’t find that out, that makes code very hard to understand for me.As a tester, where do you think the split should lie between software engineers and testers?I think the split is less on areas of the code you write and more what you’re designing and creating. The developers put a structure on the code, while my major role is to say which tests we should have, whether we should test that, or it’s not worth testing that because it’s a tiny function in code that nobody’s ever actually going to see. So it’s not a split in the code, it’s a split in what you’re thinking about. Saying what code we should write, but alternatively what code we should take out.In your experience, do the software engineers tend to do much testing themselves?They tend to control the lowest layer of tests. And, depending on how the balance of people is in the team, they might write some of the higher levels of test. Or that might go to the testers. I’m the only tester on my team with three other developers, so they’ll be writing quite a lot of the actual test code, with input from me as to whether we should test that functionality, whereas on other teams, where it’s been more equal numbers, the testers have written pretty much all of the high level tests, just because that’s the best use of resource.If you could shuffle resources around however you liked, do you think that the developers should be writing those high-level tests?I think they should be writing them occasionally. It helps when they have an understanding of how testing code works and possibly what assumptions we’ve made in tests, and they can say “actually, it doesn’t work like that under the hood so you’ve missed this whole area”. It’s one of those agile things that everyone on the team should be at least comfortable doing the various jobs. So if the developers can write test code then I think that’s a very good thing.So you think testers should be able to write production code?Yes, although given most testers skills at coding, I wouldn’t advise it too much! I have written a few things, and I did make a few changes that have actually gone into our production code base. They’re not necessarily running every time but they are there. I think having that mix of skill sets is really useful. In some ways we’re using our own product to test itself, so being able to make those changes where it’s not working saves me a round-trip through the developers. It can be really annoying if the developers have no time to make a change, and I can’t touch the code.If the software engineers are consistently writing tests at all levels, what role do you think the role of a tester is?I think on a team like that, those distinctions aren’t quite so useful. There’ll be two cases. There’s either the case where the developers think they’ve written good tests, but you still need someone with a test engineer mind-set to go through the tests and validate that it’s a useful set, or the correct set for that code. Or they won’t actually be pure developers, they’ll have that mix of test ability in there.I think having slightly more distinct roles is useful. When it starts to blur, then you lose that view of the tests as a whole. The tester job is not to create tests, it’s to validate the quality of the product, and you don’t do that just by writing tests. There’s more things you’ve got to keep in your mind. And I think when you blur the roles, you start to lose that end of the tester.So because you’re working on those features, you lose that holistic view of the whole system?Yeah, and anyone who’s worked on the feature shouldn’t be testing it. You always need to have it tested it by someone who didn’t write it. Otherwise you’re a bit too close and you assume “yes, people will only use it that way”, but the tester will come along and go “how do people use this? How would our most idiotic user use this?”. I might not test that because it might be completely irrelevant. But it’s coming in and trying to have a different set of assumptions.Are you a believer that it should all be automated if possible?Not entirely. So an automated test is always better than a manual test for the long-term, but there’s still nothing that beats a human sitting in front of the application and thinking “What could I do at this point?”. The automated test is very good but they follow that strict path, and they never check anything off the path. The human tester will look at things that they weren’t expecting, whereas the automated test can only ever go “Is that value correct?” in many respects, and it won’t notice that on the other side of the screen you’re showing something completely wrong. And that value might have been checked independently, but you always find a few odd interactions when you’re going through something manually, and you always need to go through something manually to start with anyway, otherwise you won’t know where the important bits to write your automation are.When you’re doing that manual testing, do you think it’s important to do that across the entire product, or just the bits that you’ve touched recently?I think it’s important to do it mostly on the bits you’ve touched, but you can’t ignore the rest of the product. Unless you’re dealing with a very, very self-contained bit, you’re almost always encounter other bits of the product along the way. Most testers I know, even if they are looking at just one path, they’ll keep open and move around a bit anyway, just because they want to find something that’s broken. If we find that your path is right, we’ll go out and hunt something else.How do you think this fits into the idea of continuously deploying, so long as the tests pass?With deploying a website it’s a bit different because you can always pull it back. If you’re deploying an application to customers, when you’ve released it, it’s out there, you can’t pull it back. Someone’s going to keep it, no matter how hard you try there will be a few installations that stay around. So I’d always have at least a human element on that path. With websites, you could probably automate straight out, or at least straight out to an internal environment or a single server in a cloud of fifty that will serve some people. But I don’t think you should release to everyone just on automated tests passing.You’ve already mentioned using BASIC and C# — are there any other languages that you’ve used?I’ve used a few. That’s something that has changed more recently, I’ve become familiar with more languages. Before I started at Red Gate I learnt a bit of C. Then last year, I taught myself Python which I actually really enjoyed using. I’ve also come across another language called Vala, which is sort of a C#-like language. It’s basically a pre-processor for C, but it has very nice syntax. I think that’s currently my favourite language.Any particular reason for trying Vala?I have a completely Linux environment at home, and I’ve been looking for a nice language, and C# just doesn’t cut it because I won’t touch Mono. So, I was looking for something like C# but that was useable in an open source environment, and Vala’s what I found. C#’s got a few features that Vala doesn’t, and Vala’s got a few features where I think “It would be awesome if C# had that”.What are some of the features that it’s missing?Extension methods. And I think that’s the only one that really bugs me. I like to use them when I’m writing C# because it makes some things really easy, especially with libraries that you can’t touch the internals of. It doesn’t have method overloading, which is sometimes annoying.Where it does win over C#?Everything is non-nullable by default, you never have to check that something’s unexpectedly null.Also, Vala has code contracts. This is starting to come in C# 4, but the way it works in Vala is that you specify requirements in short phrases as part of your function signature and they stick to the signature, so that when you inherit it, it has exactly the same code contract as the base one, or when you inherit from an interface, you have to match the signature exactly. Just using those makes you think a bit more about how you’re writing your method, it’s not an afterthought when you’ve got contracts from base classes given to you, you can’t change it. Which I think is a lot nicer than the way C# handles it. When are those actually checked?They’re checked both at compile and run-time. The compile-time checking isn’t very strong yet, it’s quite a new feature in the compiler, and because it compiles down to C, you can write C code and interface with your methods, so you can bypass that compile-time check anyway. So there’s an extra runtime check, and if you violate one of the contracts at runtime, it’s game over for your program, there’s no exception to catch, it’s just goodbye!One thing I dislike about C# is the exceptions. You write a bit of code and fifty exceptions could come from any point in your ten lines, and you can’t mentally model how those exceptions are going to come out, and you can’t even predict them based on the functions you’re calling, because if you’ve accidentally got a derived class there instead of a base class, that can throw a completely different set of exceptions. So I’ve got no way of mentally modelling those, whereas in Vala they’re checked like Java, so you know only these exceptions can come out. You know in advance the error conditions.I think Raymond Chen on Old New Thing says “the only thing you know when you throw an exception is that you’re in an invalid state somewhere in your program, so just kill it and be done with it!”You said you’ve also learnt bits of Python. How did you find that compared to Vala and C#?Very different because of the dynamic typing. I’ve been writing a website for my own use. I’m quite into photography, so I take photos off my camera, post-process them, dump them in a file, and I get a webpage with all my thumbnails. So sort of like Picassa, but written by myself because I wanted something to learn Python with. There are some things that are really nice, I just found it really difficult to cope with the fact that I’m not quite sure what this object type that I’m passed is, I might not ever be sure, so it can randomly blow up on me. But once I train myself to ignore that and just say “well, I’m fairly sure it’s going to be something that looks like this, so I’ll use it like this”, then it’s quite nice.Any particular features that you’ve appreciated?I don’t like any particular feature, it’s just very straightforward to work with. It’s very quick to write something in, particularly as you don’t have to worry that you’ve changed something that affects a different part of the program. If you have, then that part blows up, but I can get this part working right now.If you were doing a big project, would you be willing to do it in Python rather than C# or Vala?I think I might be willing to try something bigger or long term with Python. We’re currently doing an ASP.NET MVC project on C#, and I don’t like the amount of reflection. There’s a lot of magic that pulls values out, and it’s all done under the scenes. It’s almost managed to put a dynamic type system on top of C#, which in many ways destroys the language to me, whereas if you’re already in a dynamic language, having things done dynamically is much more natural. In many ways, you get the worst of both worlds. I think for web projects, I would go with Python again, whereas for anything desktop, command-line or GUI-based, I’d probably go for C# or Vala, depending on what environment I’m in.It’s the fact that you can gain from the strong typing in ways that you can’t so much on the web app. Or, in a web app, you have to use dynamic typing at some point, or you have to write a hell of a lot of boilerplate, and I’d rather use the dynamic typing than write the boilerplate.What do you think separates great programmers from everyone else?Probably design choices. Choosing to write it a piece of code one way or another. For any given program you ask me to write, I could probably do it five thousand ways. A programmer who is capable will see four or five of them, and choose one of the better ones. The excellent programmer will see the largest proportion and manage to pick the best one very quickly without having to think too much about it. I think that’s probably what separates, is the speed at which they can see what’s the best path to write the program in. More Red Gater Coder interviews

Read the article
Capturing and Transforming ASP.NET Output with Response.Filter

- by Rick Strahl

During one of my Handlers and Modules session at DevConnections this week one of the attendees asked a question that I didn’t have an immediate answer for. Basically he wanted to capture response output completely and then apply some filtering to the output – effectively injecting some additional content into the page AFTER the page had completely rendered. Specifically the output should be captured from anywhere – not just a page and have this code injected into the page. Some time ago I posted some code that allows you to capture ASP.NET Page output by overriding the Render() method, capturing the HtmlTextWriter() and reading its content, modifying the rendered data as text then writing it back out. I’ve actually used this approach on a few occasions and it works fine for ASP.NET pages. But this obviously won’t work outside of the Page class environment and it’s not really generic – you have to create a custom page class in order to handle the output capture. [updated 11/16/2009 – updated ResponseFilterStream implementation and a few additional notes based on comments] Enter Response.Filter However, ASP.NET includes a Response.Filter which can be used – well to filter output. Basically Response.Filter is a stream through which the OutputStream is piped back to the Web Server (indirectly). As content is written into the Response object, the filter stream receives the appropriate Stream commands like Write, Flush and Close as well as read operations although for a Response.Filter that’s uncommon to be hit. The Response.Filter can be programmatically replaced at runtime which allows you to effectively intercept all output generation that runs through ASP.NET. A common Example: Dynamic GZip Encoding A rather common use of Response.Filter hooking up code based, dynamic GZip compression for requests which is dead simple by applying a GZipStream (or DeflateStream) to Response.Filter. The following generic routines can be used very easily to detect GZip capability of the client and compress response output with a single line of code and a couple of library helper routines: WebUtils.GZipEncodePage(); which is handled with a few lines of reusable code and a couple of static helper methods: /// <summary> ///Sets up the current page or handler to use GZip through a Response.Filter ///IMPORTANT: ///You have to call this method before any output is generated! /// </summary> public static void GZipEncodePage() { HttpResponse Response = HttpContext.Current.Response; if(IsGZipSupported()) { stringAcceptEncoding = HttpContext.Current.Request.Headers["Accept-Encoding"]; if(AcceptEncoding.Contains("deflate")) { Response.Filter = newSystem.IO.Compression.DeflateStream(Response.Filter, System.IO.Compression.CompressionMode.Compress); Response.AppendHeader("Content-Encoding", "deflate"); } else { Response.Filter = newSystem.IO.Compression.GZipStream(Response.Filter, System.IO.Compression.CompressionMode.Compress); Response.AppendHeader("Content-Encoding", "gzip"); } } // Allow proxy servers to cache encoded and unencoded versions separately Response.AppendHeader("Vary", "Content-Encoding"); } /// <summary> /// Determines if GZip is supported /// </summary> /// <returns></returns> public static bool IsGZipSupported() { string AcceptEncoding = HttpContext.Current.Request.Headers["Accept-Encoding"]; if (!string.IsNullOrEmpty(AcceptEncoding) && (AcceptEncoding.Contains("gzip") || AcceptEncoding.Contains("deflate"))) return true; return false; } GZipStream and DeflateStream are streams that are assigned to Response.Filter and by doing so apply the appropriate compression on the active Response. Response.Filter content is chunked So to implement a Response.Filter effectively requires only that you implement a custom stream and handle the Write() method to capture Response output as it’s written. At first blush this seems very simple – you capture the output in Write, transform it and write out the transformed content in one pass. And that indeed works for small amounts of content. But you see, the problem is that output is written in small buffer chunks (a little less than 16k it appears) rather than just a single Write() statement into the stream, which makes perfect sense for ASP.NET to stream data back to IIS in smaller chunks to minimize memory usage en route. Unfortunately this also makes it a more difficult to implement any filtering routines since you don’t directly get access to all of the response content which is problematic especially if those filtering routines require you to look at the ENTIRE response in order to transform or capture the output as is needed for the solution the gentleman in my session asked for. So in order to address this a slightly different approach is required that basically captures all the Write() buffers passed into a cached stream and then making the stream available only when it’s complete and ready to be flushed. As I was thinking about the implementation I also started thinking about the few instances when I’ve used Response.Filter implementations. Each time I had to create a new Stream subclass and create my custom functionality but in the end each implementation did the same thing – capturing output and transforming it. I thought there should be an easier way to do this by creating a re-usable Stream class that can handle stream transformations that are common to Response.Filter implementations. Creating a semi-generic Response Filter Stream Class What I ended up with is a ResponseFilterStream class that provides a handful of Events that allow you to capture and/or transform Response content. The class implements a subclass of Stream and then overrides Write() and Flush() to handle capturing and transformation operations. By exposing events it’s easy to hook up capture or transformation operations via single focused methods. ResponseFilterStream exposes the following events: CaptureStream, CaptureString Captures the output only and provides either a MemoryStream or String with the final page output. Capture is hooked to the Flush() operation of the stream. TransformStream, TransformString Allows you to transform the complete response output with events that receive a MemoryStream or String respectively and can you modify the output then return it back as a return value. The transformed output is then written back out in a single chunk to the response output stream. These events capture all output internally first then write the entire buffer into the response. TransformWrite, TransformWriteString Allows you to transform the Response data as it is written in its original chunk size in the Stream’s Write() method. Unlike TransformStream/TransformString which operate on the complete output, these events only see the current chunk of data written. This is more efficient as there’s no caching involved, but can cause problems due to searched content splitting over multiple chunks. Using this implementation, creating a custom Response.Filter transformation becomes as simple as the following code. To hook up the Response.Filter using the MemoryStream version event: ResponseFilterStream filter = new ResponseFilterStream(Response.Filter); filter.TransformStream += filter_TransformStream; Response.Filter = filter; and the event handler to do the transformation: MemoryStream filter_TransformStream(MemoryStream ms) { Encoding encoding = HttpContext.Current.Response.ContentEncoding; string output = encoding.GetString(ms.ToArray()); output = FixPaths(output); ms = new MemoryStream(output.Length); byte[] buffer = encoding.GetBytes(output); ms.Write(buffer,0,buffer.Length); return ms; } private string FixPaths(string output) { string path = HttpContext.Current.Request.ApplicationPath; // override root path wonkiness if (path == "/") path = ""; output = output.Replace("\"~/", "\"" + path + "/").Replace("'~/", "'" + path + "/"); return output; } The idea of the event handler is that you can do whatever you want to the stream and return back a stream – either the same one that’s been modified or a brand new one – which is then sent back to as the final response. The above code can be simplified even more by using the string version events which handle the stream to string conversions for you: ResponseFilterStream filter = new ResponseFilterStream(Response.Filter); filter.TransformString += filter_TransformString; Response.Filter = filter; and the event handler to do the transformation calling the same FixPaths method shown above: string filter_TransformString(string output) { return FixPaths(output); } The events for capturing output and capturing and transforming chunks work in a very similar way. By using events to handle the transformations ResponseFilterStream becomes a reusable component and we don’t have to create a new stream class or subclass an existing Stream based classed. By the way, the example used here is kind of a cool trick which transforms “~/” expressions inside of the final generated HTML output – even in plain HTML controls not HTML controls – and transforms them into the appropriate application relative path in the same way that ResolveUrl would do. So you can write plain old HTML like this: <a href=”~/default.aspx”>Home</a> and have it turned into: <a href=”/myVirtual/default.aspx”>Home</a> without having to use an ASP.NET control like Hyperlink or Image or having to constantly use: <img src=”<%= ResolveUrl(“~/images/home.gif”) %>” /> in MVC applications (which frankly is one of the most annoying things about MVC especially given the path hell that extension-less and endpoint-less URLs impose). I can’t take credit for this idea. While discussing the Response.Filter issues on Twitter a hint from Dylan Beattie who pointed me at one of his examples which does something similar. I thought the idea was cool enough to use an example for future demos of Response.Filter functionality in ASP.NET next I time I do the Modules and Handlers talk (which was great fun BTW). How practical this is is debatable however since there’s definitely some overhead to using a Response.Filter in general and especially on one that caches the output and the re-writes it later. Make sure to test for performance anytime you use Response.Filter hookup and make sure it' doesn’t end up killing perf on you. You’ve been warned :-}. How does ResponseFilterStream work? The big win of this implementation IMHO is that it’s a reusable component – so for implementation there’s no new class, no subclassing – you simply attach to an event to implement an event handler method with a straight forward signature to retrieve the stream or string you’re interested in. The implementation is based on a subclass of Stream as is required in order to handle the Response.Filter requirements. What’s different than other implementations I’ve seen in various places is that it supports capturing output as a whole to allow retrieving the full response output for capture or modification. The exception are the TransformWrite and TransformWrite events which operate only active chunk of data written by the Response. For captured output, the Write() method captures output into an internal MemoryStream that is cached until writing is complete. So Write() is called when ASP.NET writes to the Response stream, but the filter doesn’t pass on the Write immediately to the filter’s internal stream. The data is cached and only when the Flush() method is called to finalize the Stream’s output do we actually send the cached stream off for transformation (if the events are hooked up) and THEN finally write out the returned content in one big chunk. Here’s the implementation of ResponseFilterStream: /// <summary> /// A semi-generic Stream implementation for Response.Filter with /// an event interface for handling Content transformations via /// Stream or String. /// <remarks> /// Use with care for large output as this implementation copies /// the output into a memory stream and so increases memory usage. /// </remarks> /// </summary> public class ResponseFilterStream : Stream { /// <summary> /// The original stream /// </summary> Stream _stream; /// <summary> /// Current position in the original stream /// </summary> long _position; /// <summary> /// Stream that original content is read into /// and then passed to TransformStream function /// </summary> MemoryStream _cacheStream = new MemoryStream(5000); /// <summary> /// Internal pointer that that keeps track of the size /// of the cacheStream /// </summary> int _cachePointer = 0; /// <summary> /// /// </summary> /// <param name="responseStream"></param> public ResponseFilterStream(Stream responseStream) { _stream = responseStream; } /// <summary> /// Determines whether the stream is captured /// </summary> private bool IsCaptured { get { if (CaptureStream != null || CaptureString != null || TransformStream != null || TransformString != null) return true; return false; } } /// <summary> /// Determines whether the Write method is outputting data immediately /// or delaying output until Flush() is fired. /// </summary> private bool IsOutputDelayed { get { if (TransformStream != null || TransformString != null) return true; return false; } } /// <summary> /// Event that captures Response output and makes it available /// as a MemoryStream instance. Output is captured but won't /// affect Response output. /// </summary> public event Action<MemoryStream> CaptureStream; /// <summary> /// Event that captures Response output and makes it available /// as a string. Output is captured but won't affect Response output. /// </summary> public event Action<string> CaptureString; /// <summary> /// Event that allows you transform the stream as each chunk of /// the output is written in the Write() operation of the stream. /// This means that that it's possible/likely that the input /// buffer will not contain the full response output but only /// one of potentially many chunks. /// /// This event is called as part of the filter stream's Write() /// operation. /// </summary> public event Func<byte[], byte[]> TransformWrite; /// <summary> /// Event that allows you to transform the response stream as /// each chunk of bytep[] output is written during the stream's write /// operation. This means it's possibly/likely that the string /// passed to the handler only contains a portion of the full /// output. Typical buffer chunks are around 16k a piece. /// /// This event is called as part of the stream's Write operation. /// </summary> public event Func<string, string> TransformWriteString; /// <summary> /// This event allows capturing and transformation of the entire /// output stream by caching all write operations and delaying final /// response output until Flush() is called on the stream. /// </summary> public event Func<MemoryStream, MemoryStream> TransformStream; /// <summary> /// Event that can be hooked up to handle Response.Filter /// Transformation. Passed a string that you can modify and /// return back as a return value. The modified content /// will become the final output. /// </summary> public event Func<string, string> TransformString; protected virtual void OnCaptureStream(MemoryStream ms) { if (CaptureStream != null) CaptureStream(ms); } private void OnCaptureStringInternal(MemoryStream ms) { if (CaptureString != null) { string content = HttpContext.Current.Response.ContentEncoding.GetString(ms.ToArray()); OnCaptureString(content); } } protected virtual void OnCaptureString(string output) { if (CaptureString != null) CaptureString(output); } protected virtual byte[] OnTransformWrite(byte[] buffer) { if (TransformWrite != null) return TransformWrite(buffer); return buffer; } private byte[] OnTransformWriteStringInternal(byte[] buffer) { Encoding encoding = HttpContext.Current.Response.ContentEncoding; string output = OnTransformWriteString(encoding.GetString(buffer)); return encoding.GetBytes(output); } private string OnTransformWriteString(string value) { if (TransformWriteString != null) return TransformWriteString(value); return value; } protected virtual MemoryStream OnTransformCompleteStream(MemoryStream ms) { if (TransformStream != null) return TransformStream(ms); return ms; } /// <summary> /// Allows transforming of strings /// /// Note this handler is internal and not meant to be overridden /// as the TransformString Event has to be hooked up in order /// for this handler to even fire to avoid the overhead of string /// conversion on every pass through. /// </summary> /// <param name="responseText"></param> /// <returns></returns> private string OnTransformCompleteString(string responseText) { if (TransformString != null) TransformString(responseText); return responseText; } /// <summary> /// Wrapper method form OnTransformString that handles /// stream to string and vice versa conversions /// </summary> /// <param name="ms"></param> /// <returns></returns> internal MemoryStream OnTransformCompleteStringInternal(MemoryStream ms) { if (TransformString == null) return ms; //string content = ms.GetAsString(); string content = HttpContext.Current.Response.ContentEncoding.GetString(ms.ToArray()); content = TransformString(content); byte[] buffer = HttpContext.Current.Response.ContentEncoding.GetBytes(content); ms = new MemoryStream(); ms.Write(buffer, 0, buffer.Length); //ms.WriteString(content); return ms; } /// <summary> /// /// </summary> public override bool CanRead { get { return true; } } public override bool CanSeek { get { return true; } } /// <summary> /// /// </summary> public override bool CanWrite { get { return true; } } /// <summary> /// /// </summary> public override long Length { get { return 0; } } /// <summary> /// /// </summary> public override long Position { get { return _position; } set { _position = value; } } /// <summary> /// /// </summary> /// <param name="offset"></param> /// <param name="direction"></param> /// <returns></returns> public override long Seek(long offset, System.IO.SeekOrigin direction) { return _stream.Seek(offset, direction); } /// <summary> /// /// </summary> /// <param name="length"></param> public override void SetLength(long length) { _stream.SetLength(length); } /// <summary> /// /// </summary> public override void Close() { _stream.Close(); } /// <summary> /// Override flush by writing out the cached stream data /// </summary> public override void Flush() { if (IsCaptured && _cacheStream.Length > 0) { // Check for transform implementations _cacheStream = OnTransformCompleteStream(_cacheStream); _cacheStream = OnTransformCompleteStringInternal(_cacheStream); OnCaptureStream(_cacheStream); OnCaptureStringInternal(_cacheStream); // write the stream back out if output was delayed if (IsOutputDelayed) _stream.Write(_cacheStream.ToArray(), 0, (int)_cacheStream.Length); // Clear the cache once we've written it out _cacheStream.SetLength(0); } // default flush behavior _stream.Flush(); } /// <summary> /// /// </summary> /// <param name="buffer"></param> /// <param name="offset"></param> /// <param name="count"></param> /// <returns></returns> public override int Read(byte[] buffer, int offset, int count) { return _stream.Read(buffer, offset, count); } /// <summary> /// Overriden to capture output written by ASP.NET and captured /// into a cached stream that is written out later when Flush() /// is called. /// </summary> /// <param name="buffer"></param> /// <param name="offset"></param> /// <param name="count"></param> public override void Write(byte[] buffer, int offset, int count) { if ( IsCaptured ) { // copy to holding buffer only - we'll write out later _cacheStream.Write(buffer, 0, count); _cachePointer += count; } // just transform this buffer if (TransformWrite != null) buffer = OnTransformWrite(buffer); if (TransformWriteString != null) buffer = OnTransformWriteStringInternal(buffer); if (!IsOutputDelayed) _stream.Write(buffer, offset, buffer.Length); } } The key features are the events and corresponding OnXXX methods that handle the event hookups, and the Write() and Flush() methods of the stream implementation. All the rest of the members tend to be plain jane passthrough stream implementation code without much consequence. I do love the way Action<t> and Func<T> make it so easy to create the event signatures for the various events – sweet. A few Things to consider Performance Response.Filter is not great for performance in general as it adds another layer of indirection to the ASP.NET output pipeline, and this implementation in particular adds a memory hit as it basically duplicates the response output into the cached memory stream which is necessary since you may have to look at the entire response. If you have large pages in particular this can cause potentially serious memory pressure in your server application. So be careful of wholesale adoption of this (or other) Response.Filters. Make sure to do some performance testing to ensure it’s not killing your app’s performance. Response.Filter works everywhere A few questions came up in comments and discussion as to capturing ALL output hitting the site and – yes you can definitely do that by assigning a Response.Filter inside of a module. If you do this however you’ll want to be very careful and decide which content you actually want to capture especially in IIS 7 which passes ALL content – including static images/CSS etc. through the ASP.NET pipeline. So it is important to filter only on what you’re looking for – like the page extension or maybe more effectively the Response.ContentType. Response.Filter Chaining Originally I thought that filter chaining doesn’t work at all due to a bug in the stream implementation code. But it’s quite possible to assign multiple filters to the Response.Filter property. So the following actually works to both compress the output and apply the transformed content: WebUtils.GZipEncodePage(); ResponseFilterStream filter = new ResponseFilterStream(Response.Filter); filter.TransformString += filter_TransformString; Response.Filter = filter; However the following does not work resulting in invalid content encoding errors: ResponseFilterStream filter = new ResponseFilterStream(Response.Filter); filter.TransformString += filter_TransformString; Response.Filter = filter; WebUtils.GZipEncodePage(); In other words multiple Response filters can work together but it depends entirely on the implementation whether they can be chained or in which order they can be chained. In this case running the GZip/Deflate stream filters apparently relies on the original content length of the output and chokes when the content is modified. But if attaching the compression first it works fine as unintuitive as that may seem. Resources Download example code Capture Output from ASP.NET Pages © Rick Strahl, West Wind Technologies, 2005-2010Posted in ASP.NET

Read the article
value types in the vm

- by john.rose

value types in the vm p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} p.p2 {margin: 0.0px 0.0px 14.0px 0.0px; font: 14.0px Times} p.p3 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times} p.p4 {margin: 0.0px 0.0px 15.0px 0.0px; font: 14.0px Times} p.p5 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier} p.p6 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px} p.p7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p8 {margin: 0.0px 0.0px 0.0px 36.0px; text-indent: -36.0px; font: 14.0px Times; min-height: 18.0px} p.p9 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p10 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; color: #000000} li.li1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} li.li7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} span.s1 {font: 14.0px Courier} span.s2 {color: #000000} span.s3 {font: 14.0px Courier; color: #000000} ol.ol1 {list-style-type: decimal} Or, enduring values for a changing world. Introduction A value type is a data type which, generally speaking, is designed for being passed by value in and out of methods, and stored by value in data structures. The only value types which the Java language directly supports are the eight primitive types. Java indirectly and approximately supports value types, if they are implemented in terms of classes. For example, both Integer and String may be viewed as value types, especially if their usage is restricted to avoid operations appropriate to Object. In this note, we propose a definition of value types in terms of a design pattern for Java classes, accompanied by a set of usage restrictions. We also sketch the relation of such value types to tuple types (which are a JVM-level notion), and point out JVM optimizations that can apply to value types. This note is a thought experiment to extend the JVM’s performance model in support of value types. The demonstration has two phases. Initially the extension can simply use design patterns, within the current bytecode architecture, and in today’s Java language. But if the performance model is to be realized in practice, it will probably require new JVM bytecode features, changes to the Java language, or both. We will look at a few possibilities for these new features. An Axiom of Value In the context of the JVM, a value type is a data type equipped with construction, assignment, and equality operations, and a set of typed components, such that, whenever two variables of the value type produce equal corresponding values for their components, the values of the two variables cannot be distinguished by any JVM operation. Here are some corollaries: A value type is immutable, since otherwise a copy could be constructed and the original could be modified in one of its components, allowing the copies to be distinguished. Changing the component of a value type requires construction of a new value. The equals and hashCode operations are strictly component-wise. If a value type is represented by a JVM reference, that reference cannot be successfully synchronized on, and cannot be usefully compared for reference equality. A value type can be viewed in terms of what it doesn’t do. We can say that a value type omits all value-unsafe operations, which could violate the constraints on value types. These operations, which are ordinarily allowed for Java object types, are pointer equality comparison (the acmp instruction), synchronization (the monitor instructions), all the wait and notify methods of class Object, and non-trivial finalize methods. The clone method is also value-unsafe, although for value types it could be treated as the identity function. Finally, and most importantly, any side effect on an object (however visible) also counts as an value-unsafe operation. A value type may have methods, but such methods must not change the components of the value. It is reasonable and useful to define methods like toString, equals, and hashCode on value types, and also methods which are specifically valuable to users of the value type. Representations of Value Value types have two natural representations in the JVM, unboxed and boxed. An unboxed value consists of the components, as simple variables. For example, the complex number x=(1+2i), in rectangular coordinate form, may be represented in unboxed form by the following pair of variables: /*Complex x = Complex.valueOf(1.0, 2.0):*/ double x_re = 1.0, x_im = 2.0; These variables might be locals, parameters, or fields. Their association as components of a single value is not defined to the JVM. Here is a sample computation which computes the norm of the difference between two complex numbers: double distance(/*Complex x:*/ double x_re, double x_im, /*Complex y:*/ double y_re, double y_im) { /*Complex z = x.minus(y):*/ double z_re = x_re - y_re, z_im = x_im - y_im; /*return z.abs():*/ return Math.sqrt(z_re*z_re + z_im*z_im); } A boxed representation groups component values under a single object reference. The reference is to a ‘wrapper class’ that carries the component values in its fields. (A primitive type can naturally be equated with a trivial value type with just one component of that type. In that view, the wrapper class Integer can serve as a boxed representation of value type int.) The unboxed representation of complex numbers is practical for many uses, but it fails to cover several major use cases: return values, array elements, and generic APIs. The two components of a complex number cannot be directly returned from a Java function, since Java does not support multiple return values. The same story applies to array elements: Java has no ’array of structs’ feature. (Double-length arrays are a possible workaround for complex numbers, but not for value types with heterogeneous components.) By generic APIs I mean both those which use generic types, like Arrays.asList and those which have special case support for primitive types, like String.valueOf and PrintStream.println. Those APIs do not support unboxed values, and offer some problems to boxed values. Any ’real’ JVM type should have a story for returns, arrays, and API interoperability. The basic problem here is that value types fall between primitive types and object types. Value types are clearly more complex than primitive types, and object types are slightly too complicated. Objects are a little bit dangerous to use as value carriers, since object references can be compared for pointer equality, and can be synchronized on. Also, as many Java programmers have observed, there is often a performance cost to using wrapper objects, even on modern JVMs. Even so, wrapper classes are a good starting point for talking about value types. If there were a set of structural rules and restrictions which would prevent value-unsafe operations on value types, wrapper classes would provide a good notation for defining value types. This note attempts to define such rules and restrictions. Let’s Start Coding Now it is time to look at some real code. Here is a definition, written in Java, of a complex number value type. @ValueSafe public final class Complex implements java.io.Serializable { // immutable component structure: public final double re, im; private Complex(double re, double im) { this.re = re; this.im = im; } // interoperability methods: public String toString() { return "Complex("+re+","+im+")"; } public List<Double> asList() { return Arrays.asList(re, im); } public boolean equals(Complex c) { return re == c.re && im == c.im; } public boolean equals(@ValueSafe Object x) { return x instanceof Complex && equals((Complex) x); } public int hashCode() { return 31*Double.valueOf(re).hashCode() + Double.valueOf(im).hashCode(); } // factory methods: public static Complex valueOf(double re, double im) { return new Complex(re, im); } public Complex changeRe(double re2) { return valueOf(re2, im); } public Complex changeIm(double im2) { return valueOf(re, im2); } public static Complex cast(@ValueSafe Object x) { return x == null ? ZERO : (Complex) x; } // utility methods and constants: public Complex plus(Complex c) { return new Complex(re+c.re, im+c.im); } public Complex minus(Complex c) { return new Complex(re-c.re, im-c.im); } public double abs() { return Math.sqrt(re*re + im*im); } public static final Complex PI = valueOf(Math.PI, 0.0); public static final Complex ZERO = valueOf(0.0, 0.0); } This is not a minimal definition, because it includes some utility methods and other optional parts. The essential elements are as follows: The class is marked as a value type with an annotation. The class is final, because it does not make sense to create subclasses of value types. The fields of the class are all non-private and final. (I.e., the type is immutable and structurally transparent.) From the supertype Object, all public non-final methods are overridden. The constructor is private. Beyond these bare essentials, we can observe the following features in this example, which are likely to be typical of all value types: One or more factory methods are responsible for value creation, including a component-wise valueOf method. There are utility methods for complex arithmetic and instance creation, such as plus and changeIm. There are static utility constants, such as PI. The type is serializable, using the default mechanisms. There are methods for converting to and from dynamically typed references, such as asList and cast. The Rules In order to use value types properly, the programmer must avoid value-unsafe operations. A helpful Java compiler should issue errors (or at least warnings) for code which provably applies value-unsafe operations, and should issue warnings for code which might be correct but does not provably avoid value-unsafe operations. No such compilers exist today, but to simplify our account here, we will pretend that they do exist. A value-safe type is any class, interface, or type parameter marked with the @ValueSafe annotation, or any subtype of a value-safe type. If a value-safe class is marked final, it is in fact a value type. All other value-safe classes must be abstract. The non-static fields of a value class must be non-public and final, and all its constructors must be private. Under the above rules, a standard interface could be helpful to define value types like Complex. Here is an example: @ValueSafe public interface ValueType extends java.io.Serializable { // All methods listed here must get redefined. // Definitions must be value-safe, which means // they may depend on component values only. List<? extends Object> asList(); int hashCode(); boolean equals(@ValueSafe Object c); String toString(); } //@ValueSafe inherited from supertype: public final class Complex implements ValueType { … The main advantage of such a conventional interface is that (unlike an annotation) it is reified in the runtime type system. It could appear as an element type or parameter bound, for facilities which are designed to work on value types only. More broadly, it might assist the JVM to perform dynamic enforcement of the rules for value types. Besides types, the annotation @ValueSafe can mark fields, parameters, local variables, and methods. (This is redundant when the type is also value-safe, but may be useful when the type is Object or another supertype of a value type.) Working forward from these annotations, an expression E is defined as value-safe if it satisfies one or more of the following: The type of E is a value-safe type. E names a field, parameter, or local variable whose declaration is marked @ValueSafe. E is a call to a method whose declaration is marked @ValueSafe. E is an assignment to a value-safe variable, field reference, or array reference. E is a cast to a value-safe type from a value-safe expression. E is a conditional expression E0 ? E1 : E2, and both E1 and E2 are value-safe. Assignments to value-safe expressions and initializations of value-safe names must take their values from value-safe expressions. A value-safe expression may not be the subject of a value-unsafe operation. In particular, it cannot be synchronized on, nor can it be compared with the “==” operator, not even with a null or with another value-safe type. In a program where all of these rules are followed, no value-type value will be subject to a value-unsafe operation. Thus, the prime axiom of value types will be satisfied, that no two value type will be distinguishable as long as their component values are equal. More Code To illustrate these rules, here are some usage examples for Complex: Complex pi = Complex.valueOf(Math.PI, 0); Complex zero = pi.changeRe(0); //zero = pi; zero.re = 0; ValueType vtype = pi; @SuppressWarnings("value-unsafe") Object obj = pi; @ValueSafe Object obj2 = pi; obj2 = new Object(); // ok List<Complex> clist = new ArrayList<Complex>(); clist.add(pi); // (ok assuming List.add param is @ValueSafe) List<ValueType> vlist = new ArrayList<ValueType>(); vlist.add(pi); // (ok) List<Object> olist = new ArrayList<Object>(); olist.add(pi); // warning: "value-unsafe" boolean z = pi.equals(zero); boolean z1 = (pi == zero); // error: reference comparison on value type boolean z2 = (pi == null); // error: reference comparison on value type boolean z3 = (pi == obj2); // error: reference comparison on value type synchronized (pi) { } // error: synch of value, unpredictable result synchronized (obj2) { } // unpredictable result Complex qq = pi; qq = null; // possible NPE; warning: “null-unsafe" qq = (Complex) obj; // warning: “null-unsafe" qq = Complex.cast(obj); // OK @SuppressWarnings("null-unsafe") Complex empty = null; // possible NPE qq = empty; // possible NPE (null pollution) The Payoffs It follows from this that either the JVM or the java compiler can replace boxed value-type values with unboxed ones, without affecting normal computations. Fields and variables of value types can be split into their unboxed components. Non-static methods on value types can be transformed into static methods which take the components as value parameters. Some common questions arise around this point in any discussion of value types. Why burden the programmer with all these extra rules? Why not detect programs automagically and perform unboxing transparently? The answer is that it is easy to break the rules accidently unless they are agreed to by the programmer and enforced. Automatic unboxing optimizations are tantalizing but (so far) unreachable ideal. In the current state of the art, it is possible exhibit benchmarks in which automatic unboxing provides the desired effects, but it is not possible to provide a JVM with a performance model that assures the programmer when unboxing will occur. This is why I’m writing this note, to enlist help from, and provide assurances to, the programmer. Basically, I’m shooting for a good set of user-supplied “pragmas” to frame the desired optimization. Again, the important thing is that the unboxing must be done reliably, or else programmers will have no reason to work with the extra complexity of the value-safety rules. There must be a reasonably stable performance model, wherein using a value type has approximately the same performance characteristics as writing the unboxed components as separate Java variables. There are some rough corners to the present scheme. Since Java fields and array elements are initialized to null, value-type computations which incorporate uninitialized variables can produce null pointer exceptions. One workaround for this is to require such variables to be null-tested, and the result replaced with a suitable all-zero value of the value type. That is what the “cast” method does above. Generically typed APIs like List<T> will continue to manipulate boxed values always, at least until we figure out how to do reification of generic type instances. Use of such APIs will elicit warnings until their type parameters (and/or relevant members) are annotated or typed as value-safe. Retrofitting List<T> is likely to expose flaws in the present scheme, which we will need to engineer around. Here are a couple of first approaches: public interface java.util.List<@ValueSafe T> extends Collection<T> { … public interface java.util.List<T extends Object|ValueType> extends Collection<T> { … (The second approach would require disjunctive types, in which value-safety is “contagious” from the constituent types.) With more transformations, the return value types of methods can also be unboxed. This may require significant bytecode-level transformations, and would work best in the presence of a bytecode representation for multiple value groups, which I have proposed elsewhere under the title “Tuples in the VM”. But for starters, the JVM can apply this transformation under the covers, to internally compiled methods. This would give a way to express multiple return values and structured return values, which is a significant pain-point for Java programmers, especially those who work with low-level structure types favored by modern vector and graphics processors. The lack of multiple return values has a strong distorting effect on many Java APIs. Even if the JVM fails to unbox a value, there is still potential benefit to the value type. Clustered computing systems something have copy operations (serialization or something similar) which apply implicitly to command operands. When copying JVM objects, it is extremely helpful to know when an object’s identity is important or not. If an object reference is a copied operand, the system may have to create a proxy handle which points back to the original object, so that side effects are visible. Proxies must be managed carefully, and this can be expensive. On the other hand, value types are exactly those types which a JVM can “copy and forget” with no downside. Array types are crucial to bulk data interfaces. (As data sizes and rates increase, bulk data becomes more important than scalar data, so arrays are definitely accompanying us into the future of computing.) Value types are very helpful for adding structure to bulk data, so a successful value type mechanism will make it easier for us to express richer forms of bulk data. Unboxing arrays (i.e., arrays containing unboxed values) will provide better cache and memory density, and more direct data movement within clustered or heterogeneous computing systems. They require the deepest transformations, relative to today’s JVM. There is an impedance mismatch between value-type arrays and Java’s covariant array typing, so compromises will need to be struck with existing Java semantics. It is probably worth the effort, since arrays of unboxed value types are inherently more memory-efficient than standard Java arrays, which rely on dependent pointer chains. It may be sufficient to extend the “value-safe” concept to array declarations, and allow low-level transformations to change value-safe array declarations from the standard boxed form into an unboxed tuple-based form. Such value-safe arrays would not be convertible to Object[] arrays. Certain connection points, such as Arrays.copyOf and System.arraycopy might need additional input/output combinations, to allow smooth conversion between arrays with boxed and unboxed elements. Alternatively, the correct solution may have to wait until we have enough reification of generic types, and enough operator overloading, to enable an overhaul of Java arrays. Implicit Method Definitions The example of class Complex above may be unattractively complex. I believe most or all of the elements of the example class are required by the logic of value types. If this is true, a programmer who writes a value type will have to write lots of error-prone boilerplate code. On the other hand, I think nearly all of the code (except for the domain-specific parts like plus and minus) can be implicitly generated. Java has a rule for implicitly defining a class’s constructor, if no it defines no constructors explicitly. Likewise, there are rules for providing default access modifiers for interface members. Because of the highly regular structure of value types, it might be reasonable to perform similar implicit transformations on value types. Here’s an example of a “highly implicit” definition of a complex number type: public class Complex implements ValueType { // implicitly final public double re, im; // implicitly public final //implicit methods are defined elementwise from te fields: // toString, asList, equals(2), hashCode, valueOf, cast //optionally, explicit methods (plus, abs, etc.) would go here } In other words, with the right defaults, a simple value type definition can be a one-liner. The observant reader will have noticed the similarities (and suitable differences) between the explicit methods above and the corresponding methods for List<T>. Another way to abbreviate such a class would be to make an annotation the primary trigger of the functionality, and to add the interface(s) implicitly: public @ValueType class Complex { … // implicitly final, implements ValueType (But to me it seems better to communicate the “magic” via an interface, even if it is rooted in an annotation.) Implicitly Defined Value Types So far we have been working with nominal value types, which is to say that the sequence of typed components is associated with a name and additional methods that convey the intention of the programmer. A simple ordered pair of floating point numbers can be variously interpreted as (to name a few possibilities) a rectangular or polar complex number or Cartesian point. The name and the methods convey the intended meaning. But what if we need a truly simple ordered pair of floating point numbers, without any further conceptual baggage? Perhaps we are writing a method (like “divideAndRemainder”) which naturally returns a pair of numbers instead of a single number. Wrapping the pair of numbers in a nominal type (like “QuotientAndRemainder”) makes as little sense as wrapping a single return value in a nominal type (like “Quotient”). What we need here are structural value types commonly known as tuples. For the present discussion, let us assign a conventional, JVM-friendly name to tuples, roughly as follows: public class java.lang.tuple.$DD extends java.lang.tuple.Tuple { double $1, $2; } Here the component names are fixed and all the required methods are defined implicitly. The supertype is an abstract class which has suitable shared declarations. The name itself mentions a JVM-style method parameter descriptor, which may be “cracked” to determine the number and types of the component fields. The odd thing about such a tuple type (and structural types in general) is it must be instantiated lazily, in response to linkage requests from one or more classes that need it. The JVM and/or its class loaders must be prepared to spin a tuple type on demand, given a simple name reference, $xyz, where the xyz is cracked into a series of component types. (Specifics of naming and name mangling need some tasteful engineering.) Tuples also seem to demand, even more than nominal types, some support from the language. (This is probably because notations for non-nominal types work best as combinations of punctuation and type names, rather than named constructors like Function3 or Tuple2.) At a minimum, languages with tuples usually (I think) have some sort of simple bracket notation for creating tuples, and a corresponding pattern-matching syntax (or “destructuring bind”) for taking tuples apart, at least when they are parameter lists. Designing such a syntax is no simple thing, because it ought to play well with nominal value types, and also with pre-existing Java features, such as method parameter lists, implicit conversions, generic types, and reflection. That is a task for another day. Other Use Cases Besides complex numbers and simple tuples there are many use cases for value types. Many tuple-like types have natural value-type representations. These include rational numbers, point locations and pixel colors, and various kinds of dates and addresses. Other types have a variable-length ‘tail’ of internal values. The most common example of this is String, which is (mathematically) a sequence of UTF-16 character values. Similarly, bit vectors, multiple-precision numbers, and polynomials are composed of sequences of values. Such types include, in their representation, a reference to a variable-sized data structure (often an array) which (somehow) represents the sequence of values. The value type may also include ’header’ information. Variable-sized values often have a length distribution which favors short lengths. In that case, the design of the value type can make the first few values in the sequence be direct ’header’ fields of the value type. In the common case where the header is enough to represent the whole value, the tail can be a shared null value, or even just a null reference. Note that the tail need not be an immutable object, as long as the header type encapsulates it well enough. This is the case with String, where the tail is a mutable (but never mutated) character array. Field types and their order must be a globally visible part of the API. The structure of the value type must be transparent enough to have a globally consistent unboxed representation, so that all callers and callees agree about the type and order of components that appear as parameters, return types, and array elements. This is a trade-off between efficiency and encapsulation, which is forced on us when we remove an indirection enjoyed by boxed representations. A JVM-only transformation would not care about such visibility, but a bytecode transformation would need to take care that (say) the components of complex numbers would not get swapped after a redefinition of Complex and a partial recompile. Perhaps constant pool references to value types need to declare the field order as assumed by each API user. This brings up the delicate status of private fields in a value type. It must always be possible to load, store, and copy value types as coordinated groups, and the JVM performs those movements by moving individual scalar values between locals and stack. If a component field is not public, what is to prevent hostile code from plucking it out of the tuple using a rogue aload or astore instruction? Nothing but the verifier, so we may need to give it more smarts, so that it treats value types as inseparable groups of stack slots or locals (something like long or double). My initial thought was to make the fields always public, which would make the security problem moot. But public is not always the right answer; consider the case of String, where the underlying mutable character array must be encapsulated to prevent security holes. I believe we can win back both sides of the tradeoff, by training the verifier never to split up the components in an unboxed value. Just as the verifier encapsulates the two halves of a 64-bit primitive, it can encapsulate the the header and body of an unboxed String, so that no code other than that of class String itself can take apart the values. Similar to String, we could build an efficient multi-precision decimal type along these lines: public final class DecimalValue extends ValueType { protected final long header; protected private final BigInteger digits; public DecimalValue valueOf(int value, int scale) { assert(scale >= 0); return new DecimalValue(((long)value << 32) + scale, null); } public DecimalValue valueOf(long value, int scale) { if (value == (int) value) return valueOf((int)value, scale); return new DecimalValue(-scale, new BigInteger(value)); } } Values of this type would be passed between methods as two machine words. Small values (those with a significand which fits into 32 bits) would be represented without any heap data at all, unless the DecimalValue itself were boxed. (Note the tension between encapsulation and unboxing in this case. It would be better if the header and digits fields were private, but depending on where the unboxing information must “leak”, it is probably safer to make a public revelation of the internal structure.) Note that, although an array of Complex can be faked with a double-length array of double, there is no easy way to fake an array of unboxed DecimalValues. (Either an array of boxed values or a transposed pair of homogeneous arrays would be reasonable fallbacks, in a current JVM.) Getting the full benefit of unboxing and arrays will require some new JVM magic. Although the JVM emphasizes portability, system dependent code will benefit from using machine-level types larger than 64 bits. For example, the back end of a linear algebra package might benefit from value types like Float4 which map to stock vector types. This is probably only worthwhile if the unboxing arrays can be packed with such values. More Daydreams A more finely-divided design for dynamic enforcement of value safety could feature separate marker interfaces for each invariant. An empty marker interface Unsynchronizable could cause suitable exceptions for monitor instructions on objects in marked classes. More radically, a Interchangeable marker interface could cause JVM primitives that are sensitive to object identity to raise exceptions; the strangest result would be that the acmp instruction would have to be specified as raising an exception. @ValueSafe public interface ValueType extends java.io.Serializable, Unsynchronizable, Interchangeable { … public class Complex implements ValueType { // inherits Serializable, Unsynchronizable, Interchangeable, @ValueSafe … It seems possible that Integer and the other wrapper types could be retro-fitted as value-safe types. This is a major change, since wrapper objects would be unsynchronizable and their references interchangeable. It is likely that code which violates value-safety for wrapper types exists but is uncommon. It is less plausible to retro-fit String, since the prominent operation String.intern is often used with value-unsafe code. We should also reconsider the distinction between boxed and unboxed values in code. The design presented above obscures that distinction. As another thought experiment, we could imagine making a first class distinction in the type system between boxed and unboxed representations. Since only primitive types are named with a lower-case initial letter, we could define that the capitalized version of a value type name always refers to the boxed representation, while the initial lower-case variant always refers to boxed. For example: complex pi = complex.valueOf(Math.PI, 0); Complex boxPi = pi; // convert to boxed myList.add(boxPi); complex z = myList.get(0); // unbox Such a convention could perhaps absorb the current difference between int and Integer, double and Double. It might also allow the programmer to express a helpful distinction among array types. As said above, array types are crucial to bulk data interfaces, but are limited in the JVM. Extending arrays beyond the present limitations is worth thinking about; for example, the Maxine JVM implementation has a hybrid object/array type. Something like this which can also accommodate value type components seems worthwhile. On the other hand, does it make sense for value types to contain short arrays? And why should random-access arrays be the end of our design process, when bulk data is often sequentially accessed, and it might make sense to have heterogeneous streams of data as the natural “jumbo” data structure. These considerations must wait for another day and another note. More Work It seems to me that a good sequence for introducing such value types would be as follows: Add the value-safety restrictions to an experimental version of javac. Code some sample applications with value types, including Complex and DecimalValue. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. A staggered roll-out like this would decouple language changes from bytecode changes, which is always a convenient thing. A similar investigation should be applied (concurrently) to array types. In this case, it seems to me that the starting point is in the JVM: Add an experimental unboxing array data structure to a production JVM, perhaps along the lines of Maxine hybrids. No bytecode or language support is required at first; everything can be done with encapsulated unsafe operations and/or method handles. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. That’s enough musing me for now. Back to work!

Read the article

Search Results

Search found 62 results on 3 pages for 'indirection'.

Page 3/3 | < Previous Page | 1 2 3

- by erickson

- by Hugh Perkins

- by Dirk

- by Simon Cooper

- by Simon Cooper

- by simonc

- by Lasse V. Karlsen

- by Rob

- by yCalleecharan

- by Michael Williamson

- by Rick Strahl

- by john.rose

< Previous Page | 1 2 3