I am doing some optimizations on an MPEG decoder. To ensure my optimizations aren't breaking anything I have a test suite that benchmarks the entire codebase (both optimized and original) as well as verifying that they both produce identical results (basically just feeding a couple of different streams through the decoder and crc32 the outputs).
When using the "-server" option with the Sun 1.6.0_18, the test suite runs about 12% slower on the optimized version after warmup (in comparison to the default "-client" setting), while the original codebase gains a good boost running about twice as fast as in client mode.
While at first this seemed to be simply a warmup issue to me, I added a loop to repeat the entire test suite multiple times. Then execution times become constant for each pass starting at the 3rd iteration of the test, still the optimized version stays 12% slower than in the client mode.
I am also pretty sure its not a garbage collection issue, since the code involves absolutely no object allocations after startup. The code consists mainly of some bit manipulation operations (stream decoding) and lots of basic floating math (generating PCM audio). The only JDK classes involved are ByteArrayInputStream (feeds the stream to the test and excluding disk IO from the tests) and CRC32 (to verify the result). I also observed the same behaviour with Sun JDK 1.7.0_b98 (only that ist 15% instead of 12% there).
Oh, and the tests were all done on the same machine (single core) with no other applications running (WinXP). While there is some inevitable variation on the measured execution times (using System.nanoTime btw), the variation between different test runs with the same settings never exceeded 2%, usually less than 1% (after warmup), so I conclude the effect is real and not purely induced by the measuring mechanism/machine.
Are there any known coding patterns that perform worse on the server JIT? Failing that, what options are available to "peek" under the hood and observe what the JIT is doing there?