Intended way to solve the Parser


I had a working parser and it essentially revolved around carefully building a Regex (or more precisely a wrapper around regex which keeps track of group names) out of the Uri grammar in small steps and then just matching that.

But this means that I hardcoded the grammar as a Regex, so I took a step back and implemented classes inheriting from an interface Expression, meant to model regular expressions as an algebraic data type, as done in the lecture and a visitor to build the Regex for me if given an expression.

So now instead of hardcoding the grammar as a Regex I have it hardcoded as an Expression. The next logical step would be to invent a file format that stores a grammar like it is done in the doc string of Uri which each rule being a definition of a regular expression and then to build a lexer that produces such an expression programmatically from the lines of the file.

If I did that I would have build a lexer/parser for regular expressions which reads a regular grammar and produces a regex out of it which I then give to Java’s Regex engine to match the String uri.

I believe that this is not the intended way to solve the problem and besides I would have still hardcoded something, namely the grammar of regular expressions. So:

  1. What was I actually meant to do? (Something like making the rules of the URI grammar the expressions? Then I hardcode the grammar again and I also have to write a lexer to create expressions out of the input string)
  2. Is there anyway to remove hardcoded grammars out of a lexer/parser/compiler/whatever?

Bonus question: My regex is build statically in a class consisting only of constants with a private constructor. I do this because:

  1. There is no main method, so I have to get the regex either statically or I would have to build it every time I call the UriParserFactory (= waste of time and not the place it should happen anyway).
  2. This is essentially the place where a lexer would go, but that is missing. So this class is a band aid.

Are there secret tests which deduct points for such bad style?

How you implement the parser is up to you.
We only require that the parser is correct, and does only use standard libraries (but not
How much you abstract or hardcode is your choice.
Certain things get easier with abstractions, certain things get harder.
As often the case in computer science, one has to balance the tradeoff.

What you describe with the approach to parse the grammar from files and use them, is called a parser generator.
Notable examples are Flex (lexer generator), Bison, Yacc, JavaCC, CoCo/R or Antlr.

You need one way to communicate with the program that generates the parser or the program that parses what grammar should be used (what is valid, what is invalid).
As you noticed, this communication can happen at different locations inside or outside the code and on different levels.
Even if you move out the rules themselves, you need to communicate how to interpret the rules.
Depending on the approach, you can use parsers that other people have written:

  • If you communicate the rules as Java code, you use the Java parser
  • If you communicate them as RegEx, you can use the RegEx functionality of Java
  • If you communicate them in some new custom format, you have to write the parser for that format

To have the RegEx/patterns as static final is a good idea as they are constant and independent of instances (they do not change).

But it might be better to move the constants to the place where they are used to group them into logical units instead of a separate class that makes refactoring harder.
Another problem might be that you overlook constants from that class and duplicate them if you introduce constants at another location.

But we won’t deduct points for such things in this project.


Note that hardcoding things is not necessarily bad. After all, the program has to parse one specific grammar, which will not change. That specific grammar needs to exist somewhere, so by building more fancy grammar parser generator parser generator generators ( :slightly_smiling_face: ), you’re just making your program more complicated.

1 Like