Poor LLVM JIT performance
- by Paul J. Lucas
I have a legacy C++ application that constructs a tree of C++ objects. I want to use LLVM to call class constructors to create said tree. The generated LLVM code is fairly straight-forward and looks repeated sequences of:
; ...
%11 = getelementptr [11 x i8*]* %Value_array1, i64 0, i64 1
%12 = call i8* @T_string_M_new_A_2Pv(i8* %heap, i8* getelementptr inbounds ([10 x i8]* @0, i64 0, i64 0))
%13 = call i8* @T_QueryLoc_M_new_A_2Pv4i(i8* %heap, i8* %12, i32 1, i32 1, i32 4, i32 5)
%14 = call i8* @T_GlobalEnvironment_M_getItemFactory_A_Pv(i8* %heap)
%15 = call i8* @T_xs_integer_M_new_A_Pvl(i8* %heap, i64 2)
%16 = call i8* @T_ItemFactory_M_createInteger_A_3Pv(i8* %heap, i8* %14, i8* %15)
%17 = call i8* @T_SingletonIterator_M_new_A_4Pv(i8* %heap, i8* %2, i8* %13, i8* %16)
store i8* %17, i8** %11, align 8
; ...
Where each T_ function is a C "thunk" that calls some C++ constructor, e.g.:
void* T_string_M_new_A_2Pv( void *v_value ) {
string *const value = static_cast<string*>( v_value );
return new string( value );
}
The thunks are necessary, of course, because LLVM knows nothing about C++. The T_ functions are added to the ExecutionEngine in use via ExecutionEngine::addGlobalMapping().
When this code is JIT'd, the performance of the JIT'ing itself is very poor. I've generated a call-graph using kcachegrind. I don't understand all the numbers (and this PDF seems not to include commas where it should), but if you look at the left fork, the bottom two ovals, Schedule... is called 16K times and setHeightToAtLeas... is called 37K times. On the right fork, RAGreed... is called 35K times.
Those are far too many calls to anything for what's mostly a simple sequence of call LLVM instructions. Something seems horribly wrong.
Any ideas on how to improve the performance of the JIT'ing?