Optimization techniques for the project

Hello,

this thread is mainly meant to explore what optimizations are feasible within the three week window of the project and which are not. This includes my impressions and opinions on what I found, which might be completely off since I haven’t built a single compiler in my life. Also there are some questions on implementation details in there.

Basically I looked over this list on Wikipedia to find out some basic strategies for optimization and to maybe get an idea which problems are actually targeted by optimizations. I skipped over some because they are about functional languages, arrays or smarter code generation and thus out of the scope of the project.

Loop Optimizations

These all look cool but many exploit the rigid structure of FOR-loops:

  • Fission/Fusion: LOOP (c) {P}; LOOP (c) {Q} becomes LOOP (c) {P; Q} and in reverse (assuming that P and Q are somehow independent). From my experience most WHILE-loops in a program have different looking conditions, but maybe that is because all languages I ever used had FOR-loops. Fusion supposedly cuts down on loop overhead while fission seems to be a way to prevent the block within the loop from having to use too many registers because once those run out we have to store stuff on the stack or some other place which incurs a speed/memory penalty. (stuff in registers is readily available while everything else needs to be loaded)
  • Inversion: WHILE (c) {P} becomes IF (c) {P}; DO {P} WHILE(c). Supposedly this let’s you save a few jumps but this is not why I included it, rather it seems to be necessary for the next optimization which seems huge.
  • Loop-invariant code motion: Something within a loop is not influenced by the changes in the loop, so why not just pull it out? The first problem here is that if we have WHILE (c) {P; Q}, where P is “invariant”, then P; WHILE (c) {Q} will (potentially) produce a different result if c is not satisfied. So the way to go is to use inversion on the loop and then to pull out P from the resulting DO WHILE-loop. Interestingly enough the Wikipedia page doesn’t actually qualify what code can be pulled out safely. I think a basic prerequisite is that P is idempotent but I have no clue how to tell that to the compiler.

One big problem I see is that we don’t have DO WHILE-loops and the parser does not accept DO.

I left out loop switching/splitting and software pipelining because they seem very specifc to FOR-loops. If I understand software pipelining correctly it is this:

  • If you have a process, consisting of two tasks, that is repeated many times and switching from one task to the other incurs overhead then doing all first tasks at once and then all second tasks at once is faster overall.

But it has the drawback that you have to store all the the results of the first tasks before you can start with the second one, thus forcing you into poor storage options (stack, hard drive) which incur retrieval costs.

Expression and Statement optimizations

These seem to be concerned with removing redundant lines of code and/or expression evaluations, thus saving resources.

  • Common Sub-expression Elimination: In short it is wasteful to evaluate x + y twice in the evaluation of the term (x + y) + (x + y) * 2. The problem with this is that a term might have side effects, i.e. if I evaluate (x + y) + f() + (x + y) where f is a function that modifies x then I could be screwed depending on the order of evaluation. How is our compiler supposed to handle this?.
  • Constant Folding/Propagation: The folding is a part of the bonus points but propagation is left out here. Basically int x = 2 + 3; int y = x; becomes int x = 5; int y = x; via folding and then int x = 5; int y = 5; via propagation. Propagation probably requires clever handling of scopes, so you don’t overwrite a variable that is shadowing a previously defined variable. (Additionally certain things can be folded based on generic identities, such as 0 * a, 1 * a, a - a, a && a, a || a, etc.)
  • Dead-store Elimination: Removes any assignment that is never used, i.e. the int x = 5; from the previous optimization. This seems dangerous because overwriting memory is potentially a desirable side effect, no matter the value you write. Can this affect tests on the server? Locally the tests in CodeGenTests.java compare the output of MarsUtil.run with the expected value. But run returns the value in $a0, not $v0 or $v1. Is this the same on the server? Also doesn’t it (at least in principle) violate the calling convention?
  • Dead-code Elimination: Remove anything that isn’t used. This seems difficult because I need to know what code is actually used.
  • Inlining: Replace a function call by the function body. This seems to become more efficient for function calls with a lot of constant arguments. In case all arguments are constant the function can be evaluated and the function call can be replaced by that value. This last variant is basically folding as described above with the caveat that global variables might be in the function body, which would screw everything up.

Apparently you can do all of these separately but the cool kids use something called SSA form where they translate the program into a representation in which all variables have unique names and then they throw it all into a graph and can track exactly which things are used at all and which aren’t (which may be obscured by shadowing in the original form)
I don’t see a way around this at all, because as long as I have variables shadowing other variables any pruning of the AST means that I might accidentally kill a vital part of the program. (Which is probably a real problem for source languages which promote shadowing as a “”“feature”"")

Are there any simple intermediate representations available to perform optimizations? I would appreciate any optimization aimed at removing shadowing variables.

Try-hard optimizations

By this I mean replacing multiplications by linear combinations of left shifts and anything similar. But this is already quite close to the hardware as it must make assumptions about the relative costs of operations. Thus I think it must be a code generation issue, but I figured I would ask anyway.

Thanks for any answers, I know it is a lot of text, but with this project planning ahead before implementation is actually necessary.

1 Like

Almost none of these optimizations is feasible in the next three weeks. At least, I could not implement them.

“Trivial” AST constant folding might be feasible, and some other “easy” optimizations might. I highly doubt that you will be able to do any sophisticated loop optimizations, or any more sophisticated constant folding or what not. Be reminded that you have two and a half weeks and need to write the type checker and a working compiler first.

But feel free to try!

1 Like