The true dilemma (which one to implement for the third part of the project: CodeGen or Verification Conditions?)

Even though I did not finish all cases of the Check Semantics part, I would like to implement the third part of the project and try to get points for it. Obviously, there is not much time left to do that. I wonder which one is (relatively) easier to implement by tomorrow night, and I would like to hear your comments about it.

Thanks
Bahadir.

The verification part took fewer lines for me. However, in order to write it, you need to actually understand Hoare logic and so on.

Usually, students already understand CodeGen (and MIPS), so they can debug their issues. Most students are not that good at formal logic, so they tend to struggle with that (but I cannot speak for you personally).

However, CodeGen covers a lot more cases (for example function calls), where generating correct code is hard. Also, you need a working type checker / AST annotator to properly compile things like re-declarations.

In general, it depends. I can tell you what I would do, but my experience and background are not the same as yours (in particular, I have already written this compiler 3 times and know what to avoid).

Perhaps others can give their recommendation.

2 Likes

You are not going to get many points doing either at this point. Here are some things to consider:

  • Once you have the basic infrastructure for verification, the actual implementation of the formula generation is quite easy, especially since a lot of formulas are just true. (The while formulas make up for that, though.)
  • Verification is in a sense mathematics (with some sad realities thrown in, like variable naming), so if you like that, it is going to be enjoyable.
  • Code gen would be an interesting way to see how you can generate actual machine code from a given program, but I have a feeling that the model we use (syntax-driven code generation) is … bad? The benefit I see here is that you get a taste of the struggles involved in generating code, so that if you ever see a better way presented to you, you will be able to appreciate its design choices more.
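To illustrate the first point, here is a minimal sketch of textbook Hoare-style formula generation. It is written in Python for brevity and is not the project's actual Java code; the tuple encoding and the names `pc`, `vc`, `subst` and `conj` are assumptions made up for this example, and the rules are the standard partial-correctness ones, which may differ from the ones on the project sheet.

```python
# Formulas and expressions are nested tuples (hypothetical encoding), e.g.
#   ("var", "x"), ("num", 1), ("+", a, b), ("<=", a, b),
#   ("and", p, q), ("not", p), ("implies", p, q), ("true",)
TRUE = ("true",)


def subst(f, name, e):
    """Return f with every occurrence of variable `name` replaced by e."""
    if f[0] == "var":
        return e if f[1] == name else f
    if f[0] in ("num", "true"):
        return f
    return (f[0],) + tuple(subst(arg, name, e) for arg in f[1:])


def conj(*fs):
    """Conjoin formulas, dropping the ones that are just true."""
    fs = [f for f in fs if f != TRUE]
    if not fs:
        return TRUE
    result = fs[0]
    for f in fs[1:]:
        result = ("and", result, f)
    return result


def pc(stmt, post):
    """Precondition under which `post` holds after `stmt`."""
    kind = stmt[0]
    if kind == "assign":            # ("assign", x, e)
        return subst(post, stmt[1], stmt[2])
    if kind == "seq":               # ("seq", s1, s2)
        return pc(stmt[1], pc(stmt[2], post))
    if kind == "while":             # ("while", cond, invariant, body)
        return stmt[2]              # the annotated invariant
    raise ValueError(f"unknown statement: {kind}")


def vc(stmt, post):
    """Side conditions that have to be valid for pc(stmt, post) to be sound."""
    kind = stmt[0]
    if kind == "assign":
        return TRUE                 # nothing to prove for an assignment
    if kind == "seq":
        return conj(vc(stmt[1], pc(stmt[2], post)), vc(stmt[2], post))
    if kind == "while":
        _, cond, inv, body = stmt
        preserved = ("implies", ("and", inv, cond), pc(body, inv))
        on_exit = ("implies", ("and", inv, ("not", cond)), post)
        return conj(preserved, vc(body, inv), on_exit)
    raise ValueError(f"unknown statement: {kind}")


# Example: while (x < n) invariant (x <= n) { x = x + 1; }
x, n = ("var", "x"), ("var", "n")
inc = ("assign", "x", ("+", x, ("num", 1)))
loop = ("while", ("<", x, n), ("<=", x, n), inc)
```

In this sketch, `vc(inc, post)` is just true for any postcondition, while `vc(loop, post)` produces two implications (the invariant is preserved, and the invariant plus the negated condition imply the postcondition). That is roughly why the while case is where the work is.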

Of course, I have never written a compiler before, and my understanding is that this project is essentially a commercial for the compiler design lecture. So if you find it interesting, you can write an actual compiler there, and if not, just leave it be and study for the exam instead.

4 Likes

One goal is to write a more complex piece of software with nice ideas.
Compilers unite many areas of computer science (a few aspects, with some overlap):

  • programming language design
  • computational linguistics (parser)
  • type theory
  • algorithm design (optimizations)
  • analysis and all aspects of semantics in general
  • software design

A compiler nicely combines theory and practice.
As you noticed, some aspects are theoretical computer science, some are mathematics, and some are practical considerations.
It also fits nicely as the last project in Programming 2, as it combines many topics you learned during the course:

  • Object orientation in Java
  • Semantics (type system, evaluation, transformation, translation, analysis)
  • Formal semantics of C
  • MIPS assembly

It uses all three languages you used in the other projects:

  • Java for the compiler
  • C as compilation source
  • MIPS as the compilation target

Therefore, it recaps many aspects and also helps in exam preparation.
Simultaneously it gives you the possibility to apply what you have learned so far.

As an added bonus, it combines nicely with what you learn in the basic systems engineering lectures (for most students now in semester 4, previously in semester 2):

  • (Software stack)
  • You have a compiler in a very high level language
  • that compiles a simple (and very important) language
  • down to assembly
  • (Hardware stack)
  • that is loaded by the os (sysarch project)
  • and executed in the MIPS processor (sysarch project)

Additionally, it gives a taste of what you can learn in the compiler construction core course.

Back to the ad for compiler construction:
You will learn tips and tricks on how to generate more efficient code there.

Our code generation is not so bad.
You can save quite a few operations by using good register allocation instead of operating entirely on the stack. (Look at any generated code: half of it will be stores and loads, which are relatively slow.)
Other than that, code generation is mostly intelligent matching of patterns and emitting code for them.
In our case, the patterns are very simple and coincide with the AST nodes.
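As a rough sketch of what "one pattern per AST node" means, here is a tiny stack-based expression generator. It is written in Python for brevity and is not the project's actual generator; the tuple-shaped AST, the `OPS` table, the choice of `$t0`/`$t1`, and the convention that a variable is a frame-pointer offset are all assumptions made up for this example.

```python
# Hypothetical operator table: AST tag -> MIPS instruction.
OPS = {"+": "addu", "-": "subu", "*": "mul"}


def gen_expr(node):
    """Return MIPS lines that leave the value of `node` in $t0."""
    kind = node[0]
    if kind == "num":                       # ("num", 42)
        return [f"li $t0, {node[1]}"]
    if kind == "var":                       # ("var", offset relative to $fp)
        return [f"lw $t0, {node[1]}($fp)"]
    if kind in OPS:                         # (op, left, right)
        return (
            gen_expr(node[1])
            + ["addiu $sp, $sp, -4", "sw $t0, 0($sp)"]   # push left operand
            + gen_expr(node[2])
            + ["lw $t1, 0($sp)", "addiu $sp, $sp, 4"]    # pop left into $t1
            + [f"{OPS[kind]} $t0, $t1, $t0"]             # left op right
        )
    raise ValueError(f"unknown node: {kind}")


# Example: a + 2 * b, with a at -4($fp) and b at -8($fp)
code = gen_expr(("+", ("var", -4), ("*", ("num", 2), ("var", -8))))
print("\n".join(code))
```

For this example the sketch emits 13 instructions, 6 of which are `lw` or `sw`, so the stack traffic is easy to see. A register allocator would keep the intermediate results in registers and avoid the push/pop pairs.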

Compared to no code generation (one would write everything in assembly), our code generation is fantastic.
To quote John Ousterhout:

“The best performance improvement is the transition from the non-working state to the working state.
That’s infinite speedup.”

5 Likes

Back to the ad for compiler construction: …

What is the toolbox that we use there? I.e. programming language (source, target, implementation), generators (parser, lexer?), intermediate representations (llvm?), etc.?

You’ll write the compiler up to LLVM IR and later implement optimizations on LLVM IR. Since LLVM is written in C++ and since there are no good FFIs to it, you should write your compiler in C++, too. You can use parser generators but we recommend you do not.

LLVM is modern, well-designed and used heavily in both academia and industry, so it’s a natural choice.

The language you compile from is C with a set of restrictions. But you will need to read and understand the C standard.

1 Like

For anyone as puzzled as me: FFI stands for foreign function interface.

… up to LLVM IR …

LLVM does the codegen for us?

So in total:

Parser/Lexer (handcrafted) -> AST? -> Semantic analysis -> LLVM IR -> Optimization

What version of the C standard will be used? (There are like a million documents to choose from)
Will it be provided? (Most of them seem to be behind paywalls)

1 Like

Yes

Last time around it was C11

1 Like