I'm building a compiler/assembler/linker in Java for the x86-32 (IA32) processor targeting Windows.
High-level concepts of a "language" (in essential a Java API for creating executables) are translated into opcodes, which then are wrapped and outputted to a file. The translation process has several phases, one is the translation between languages: the highest-level code is translated into the medium-level code which is then translated into the lowest-level code (probably more than 3 levels).
My problem is the following; if I have higher-level code (X and Y) translated to lower-level code (x, y, U and V), then an example of such a translation is, in pseudo-code:
x + U(f) // generated by X
+
V(f) + y // generated by Y
(An easy example) where V is the opposite of U (compare with a stack push as U and a pop as V). This needs to be 'optimized' into:
x + y
(essentially removing the "useless" code)
My idea was to use regular expressions. For the above case, it'll be a regular expression looking like this: x:(U(x)+V(x)):null, meaning for all x find U(x) followed by V(x) and replace by null. Imagine more complex regular expressions, for more complex optimizations. This should work on all levels.
What do you suggest? What would be a good approach to optimize in these situations?