Compiler organization (Bonus)

Hello,

Here are a bunch of questions regarding the organization of the compiler project, especially the bonus parts:

  1. It is stated in section 6.3 that in order to be eligible for bonus points you need to pass all public tests for non-bonus assignments. So if I did the AST, semantic analysis and code generation but NOT verification, would I be able to get the 3 = (1 + 2 + 0) bonus points for break and the 5 bonus points for optimization?
  2. From the method order in Compiler.java it looks as if optimizations come before code generation, i.e., they are applied to the AST, not to the generated code. Is that right? And would we also be awarded points for optimizing the machine code?
  3. The BREAK and CONTINUE tokens are already handled by the parser (lines 146 and 153 of Parser.java), but I see that there are many more tokens in TokenKind.java than appear in our grammar. May we modify the parser so that it can handle some of them?
  4. If the answer to the previous is yes, then which things are interesting/feasible to implement?

For the last question I have some thoughts, but since this is my first compiler I would rather ask an expert to make sure I am not taking on more work than I can handle in three weeks' time:

  • CONST: I just need to check that a variable is defined at most once. We already do this for functions. (A rough sketch of what I have in mind follows this list.)
  • GOTO: This one seems tricky because I need to restrict where I am allowed to jump, so I would have to specify which scopes are “permeable”.
  • SWITCH, CASE: I would opt for a switch expression without allowing statements as the results of cases (because that seems hard). How does the code generation for this work? I guess I jump to each branch?
  • INLINE: I don’t know exactly how our code generation works, but maybe it is not sophisticated enough to profit from this? I imagine a worst-case scenario where the compiler inlines and each inlined function is surrounded by useless moves, stores, and loads, which bloats the code instead of making it faster.
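
To make the CONST point concrete, this is roughly the check I have in mind; the class and method names are invented, and it ignores nested scopes for simplicity:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sema check: record every definition of a const variable
// and complain as soon as one is defined a second time.
class ConstChecker {
    private final Set<String> definedConsts = new HashSet<>();

    void checkConstDefinition(String name) {
        // Set.add returns false if the element was already present.
        if (!definedConsts.add(name)) {
            throw new IllegalStateException(
                "const variable '" + name + "' is defined more than once");
        }
    }
}
```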

I also saw the FLOAT and DOUBLE tokens but I am not deluded enough to attempt that :upside_down_face:

You can get points for the break codegen if you didn’t implement the verification part.
But the verification part is quite fun to implement and play around with.
Your calculation is correct.

Most interesting optimizations operate on the AST.
The AST is easy to manipulate, a bit more high-level than machine code, and lends itself to analysis.
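
To give a flavor, here is a minimal constant-folding sketch; the AST classes are invented for illustration, not your project's actual node types:

```java
// Minimal expression AST, just enough to show the idea.
interface Expr {}

final class Lit implements Expr {
    final int value;
    Lit(int value) { this.value = value; }
}

final class Add implements Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
}

class ConstantFolder {
    // Fold children first (post-order), then collapse "literal + literal".
    static Expr fold(Expr e) {
        if (e instanceof Add) {
            Add add = (Add) e;
            Expr l = fold(add.left);
            Expr r = fold(add.right);
            if (l instanceof Lit && r instanceof Lit) {
                return new Lit(((Lit) l).value + ((Lit) r).value);
            }
            return new Add(l, r);
        }
        return e; // literals (and, in a real AST, variables) stay as-is
    }
}
```

Applied like this, fold(new Add(new Lit(1), new Add(new Lit(2), new Lit(3)))) collapses to Lit(6) before codegen ever sees an addition.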

For optimizations on the MIPS code, you would need to rewrite the code generation function so that it is no longer syntax-guided.
You could optimize instruction selection, register usage, and so on.
But I would first focus on the more interesting optimizations on the AST. (We can discuss those if you are interested.)

The parser supports a bit more than the grammar.
For instance, pre- and post-increment and decrement and the ternary conditional operator are also supported.
If you want to support more, feel free to extend the parser.

First, you should implement the regular things, and maybe verification once you have implemented codegen.
But if you then want to extend the compiler, a few interesting first extensions might be:

  • break/continue (bonus)
  • short-circuit expressions (&&, ||) (see the sketch after this list)
  • increment/decrement operations and other convenience features like x += y
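
The usual approach for short-circuit expressions is to compile them as control flow rather than as values, so the right operand is never evaluated when the left one already decides the result. A rough sketch of how a && b could be lowered to MIPS-style branches; emit, freshLabel, and genExpr are invented helpers, not part of the template:

```java
// Sketch of short-circuit codegen for "a && b": if a evaluates to 0,
// b is skipped entirely and the whole expression is 0.
abstract class ShortCircuitGen {
    private int labelCounter = 0;

    String freshLabel(String hint) { return hint + "_" + labelCounter++; }

    void emit(String line) { System.out.println("    " + line); }

    // Assumed to emit code leaving the expression's value in $t0.
    abstract void genExpr(Object expr);

    void genAnd(Object a, Object b) {
        String falseL = freshLabel("and_false");
        String endL   = freshLabel("and_end");
        genExpr(a);
        emit("beq $t0, $zero, " + falseL); // a == 0: result is 0, skip b
        genExpr(b);
        emit("beq $t0, $zero, " + falseL); // b == 0: result is 0
        emit("li  $t0, 1");                // both non-zero: result is 1
        emit("j   " + endL);
        emit(falseL + ":");
        emit("li  $t0, 0");
        emit(endL + ":");
    }
}
```

|| is symmetric: jump to a "true" label as soon as the first operand is non-zero.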

In practice, switch statements are often translated into lookup tables (branch tables) to save on comparisons.
One area of programming that uses such tricks extensively is branchless programming.
The idea is that a jump is quite expensive, so you try to avoid jumps.
You are right that a switch can be quite complicated.
Even more so in C; for a syntactic horror story, see Duff’s Device.
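
To make the branch-table idea concrete, here is a sketch of the MIPS shape for a dense switch over the cases 0..3: one bounds check plus one indexed jump instead of up to four compare-and-branch pairs. The labels are invented, and in a real program the table would live in the .data segment:

```java
// Prints the MIPS skeleton of a branch table; the case selector is
// assumed to already be in $t0.
class BranchTableDemo {
    public static void main(String[] args) {
        String[] caseLabels = {"case_0", "case_1", "case_2", "case_3"};
        System.out.println("    sltiu $t1, $t0, " + caseLabels.length
                + "   # unsigned compare also rejects negative selectors");
        System.out.println("    beq   $t1, $zero, default_case");
        System.out.println("    sll   $t1, $t0, 2      # scale index by 4 bytes");
        System.out.println("    la    $t2, jump_table");
        System.out.println("    add   $t2, $t2, $t1");
        System.out.println("    lw    $t2, 0($t2)      # fetch the target address");
        System.out.println("    jr    $t2              # one jump, no comparisons");
        System.out.println("jump_table: .word " + String.join(", ", caseLabels));
    }
}
```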

The approach of inlining functions and then simplifying them when some arguments are constants is called partial evaluation.
Partial evaluation (and the surrounding optimizations) is quite powerful and can, together with other metaprogramming techniques, significantly improve performance.
You need good heuristics or measurements for when to inline (or an annotation by the programmer).
But this optimization also belongs not in the codegen phase but on the AST (or a similar representation such as a CFG).
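
As a tiny source-level illustration of the transformation (made-up function, constants chosen so that everything folds away):

```java
class PartialEvalDemo {
    // Before: a call with constant arguments.
    static int area(int w, int h) { return w * h; }
    static int before() { return area(3, 4); }

    // After inlining: the body replaces the call, parameters bound to arguments.
    static int afterInlining() { return 3 * 4; }

    // After constant folding on top: nothing left to compute at runtime.
    static int afterFolding() { return 12; }
}
```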

Float is not that difficult if you use the coprocessor 1 floating-point registers of MIPS.
But there are certainly more interesting aspects to implement first.
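
For reference, the coprocessor 1 shape really is small; here is a sketch of a single-precision addition (the labels and the emit helper are invented, while lwc1, add.s, and swc1 are the actual MIPS instructions):

```java
// Prints MIPS coprocessor 1 code for "result = a + b" on floats.
class FloatAddDemo {
    static void emit(String line) { System.out.println("    " + line); }

    public static void main(String[] args) {
        emit("lwc1  $f0, a_label       # load float a into coprocessor 1");
        emit("lwc1  $f2, b_label       # load float b");
        emit("add.s $f4, $f0, $f2      # single-precision add");
        emit("swc1  $f4, result_label  # store the sum");
    }
}
```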

Note that in practice, modifying the parser or the ASTFactory will lead to issues, since these classes may not be modified and are overwritten with the template before testing on the server. So I would advise against modifying them, or at least against attempting this before the deadline for the regular project.

Thanks for the answers! I will open a thread for optimizations shortly; that should also address some of the points in the answer that fit better there than here.

> You can get points for the break codegen if you did implement the verification part.
> But the verification part is quite fun to implement and play around with.

I think you wanted to write “didn’t”? In any case, I will give verification a shot after code generation is done.

> Most interesting optimizations operate on the AST.

> But I would first focus on the more interesting optimizations on the AST. (We can discuss those if you are interested.)

After reading up on some basic optimizations I would now agree that the AST optimizations look more interesting. I will open a separate thread for that.

> The parser supports a bit more than the grammar.
> For instance, pre- and post-increment and decrement and the ternary conditional operator are also supported.
> If you want to support more, feel free to extend the parser.
>
> But if you then want to extend the compiler, a few interesting first extensions might be:
>
> • break/continue (bonus)
> • short-circuit expressions (&&, ||)
> • increment/decrement operations and other convenience features like x += y

I did not realize that ++, -- and ? : are supported; thanks for pointing it out. It seems like starting with the already-supported statements/expressions is indeed the best way to proceed. Once I have something that works, I can still extend it.

> The idea is that a jump is quite expensive, so you try to avoid jumps.

I found that this was a common theme in the Wikipedia list of compiler optimizations: apparently, whenever there is a branch in the assembly, the processor does not know which instructions it should pipeline, so it sometimes computes something and then has to discard it, wasting time.

@johannes That is a valid point. Maybe optimizations are a better first priority in the bonus category.

Yes, autocorrect changed it again.
I wanted to express that you only need the corresponding parts:
for the codegen of break, you need the codegen tests; for break sema, the sema tests; and so on.

Exactly: a modern processor always executes multiple instructions at once, using a technique called pipelining.
For jumps, it has to speculate whether to jump away or to execute the code after the jump.
If the speculation was wrong, bad things happen, from slower execution times to security issues (Spectre, Meltdown, …). (More on this in the systems architecture lecture.)
Another important aspect of jumps concerns GPU architectures, or more generally SIMD, where one code snippet is executed many times in parallel. If there are jumps, the parallel executions might diverge, leading to overhead and non-uniform execution.
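
A classic branchless example to make this concrete (a standard bit trick; like Math.abs, it overflows for Integer.MIN_VALUE):

```java
class BranchlessDemo {
    // Branchy version: the CPU has to predict the comparison.
    static int absBranchy(int x) { return x < 0 ? -x : x; }

    // Branchless version: pure arithmetic, nothing to mispredict.
    static int absBranchless(int x) {
        int mask = x >> 31;        // all ones if x < 0, else all zeros
        return (x + mask) ^ mask;  // two's-complement negation when negative
    }

    public static void main(String[] args) {
        System.out.println(absBranchless(-5)); // prints 5
        System.out.println(absBranchless(7));  // prints 7
    }
}
```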
