Languages and VMs: Features that are hard to optimize and why
- by mrjoltcola
I'm doing a survey of features in preparation for a research project.
Name a mainstream language or language feature that is hard to optimize, and why the feature is or isn't worth the price paid, or instead, just debunk my theories below with anecdotal evidence. Before anyone flags this as subjective, I am asking for specific examples of languages or features, and ideas for optimization of these features, or important features that I haven't considered. Also, any references to implementations that prove my theories right or wrong.
Top on my list of hard to optimize features and my theories (some of my theories are untested and are based on thought experiments):
1) Runtime method overloading (aka multi-method dispatch or signature based dispatch). Is it hard to optimize when combined with features that allow runtime recompilation or method addition. Or is it just hard, anyway? Call site caching is a common optimization for many runtime systems, but multi-methods add additional complexity as well as making it less practical to inline methods.
2) Type morphing / variants (aka value based typing as opposed to variable based)
Traditional optimizations simply cannot be applied when you don't know if the type of someting can change in a basic block. Combined with multi-methods, inlining must be done carefully if at all, and probably only for a given threshold of size of the callee. ie. it is easy to consider inlining simple property fetches (getters / setters) but inlining complex methods may result in code bloat. The other issue is I cannot just assign a variant to a register and JIT it to the native instructions because I have to carry around the type info, or every variable needs 2 registers instead of 1. On IA-32 this is inconvenient, even if improved with x64's extra registers. This is probably my favorite feature of dynamic languages, as it simplifies so many things from the programmer's perspective.
3) First class continuations - There are multiple ways to implement them, and I have done so in both of the most common approaches, one being stack copying and the other as implementing the runtime to use continuation passing style, cactus stacks, copy-on-write stack frames, and garbage collection. First class continuations have resource management issues, ie. we must save everything, in case the continuation is resumed, and I'm not aware if any languages support leaving a continuation with "intent" (ie. "I am not coming back here, so you may discard this copy of the world"). Having programmed in the threading model and the contination model, I know both can accomplish the same thing, but continuations' elegance imposes considerable complexity on the runtime and also may affect cache efficienty (locality of stack changes more with use of continuations and co-routines). The other issue is they just don't map to hardware. Optimizing continuations is optimizing for the less-common case, and as we know, the common case should be fast, and the less-common cases should be correct.
4) Pointer arithmetic and ability to mask pointers (storing in integers, etc.) Had to throw this in, but I could actually live without this quite easily.
My feelings are that many of the high-level features, particularly in dynamic languages just don't map to hardware. Microprocessor implementations have billions of dollars of research behind the optimizations on the chip, yet the choice of language feature(s) may marginalize many of these features (features like caching, aliasing top of stack to register, instruction parallelism, return address buffers, loop buffers and branch prediction). Macro-applications of micro-features don't necessarily pan out like some developers like to think, and implementing many languages in a VM ends up mapping native ops into function calls (ie. the more dynamic a language is the more we must lookup/cache at runtime, nothing can be assumed, so our instruction mix is made up of a higher percentage of non-local branching than traditional, statically compiled code) and the only thing we can really JIT well is expression evaluation of non-dynamic types and operations on constant or immediate types. It is my gut feeling that bytecode virtual machines and JIT cores are perhaps not always justified for certain languages because of this.
I welcome your answers.