I know that many people, at a first glance of the question, may immediately yell out "Java", but no, I know Java's qualities. Allow me to elaborate my question first.
Normally, when we want our program to run at a native speed on a system, whether it be Windows, Mac OS X, or Linux, we need to compile from source codes. If you want to run a program of another system in your system, you need to use a virtual machine or an emulator. While these tools allow you to use the program you need on the non-native OS, they sometimes have problems of performance and glitches.
We also have a newer compiler called "JIT Compiler", where the compiler will parse the bytecode program to native machine language before execution. The performance may increase to a very good extent with JIT Compiler, but the performance is still not the same as running it on a native system.
Another program on Linux, WINE, is also a good tool for running Windows program on Linux system. I have tried running Team Fortress 2 on it, and tried experiment with some settings. I got ~40 fps on Windows at its mid-high setting on 1280 x 1024. On Linux, I need to turn everything low at 1280 x 1024 to get ~40 fps. There are 2 notable things though:
Polygon model settings do not seem to affect framerate whether I set it low or high.
When there are post-processing effects or some special effects that require manipulation of drawn pixels of the current frame, the framerate will drop to 10-20 fps.
From this point, I can see that normal polygon rendering is just fine, but when it comes to newer rendering methods that requires graphic card to the job, it slows down to a crawl.
Anyway, this question is rather theoretical. Is there anything we can do at all? I see that WINE can run STEAM and Team Fortress 2. Although there are flaws, they can run at lower setting. Or perhaps, I should also ask, "is it possible to translate one whole program on a system to another system without recompiling from source and get native speed?" I see that we also have AOT Compiler, is it possible to use it for something like this? Or there are so many constraints (such as DirectX call or differences in software architecture) that make it impossible to have a flawless and not native to the system program that runs at native speed?