A few notes on writing compilers
Here are a few short notes on writing a compiler. Treat them as opinions; perhaps you even agree with them.
Notes
Store the line number and column with each token to improve error reporting and error handling.
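A minimal sketch of this idea, assuming a hypothetical `Token` struct and a toy whitespace-splitting lexer `lexWords`. A real lexer would also recognize operators and literals, but the line/column bookkeeping stays the same. The helper `errorAt` (also hypothetical) shows how the stored position becomes a diagnostic.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Every token remembers where it started in the source text.
struct Token {
    std::string text;
    std::uint32_t line;    // 1-based line number
    std::uint32_t column;  // 1-based column of the first character
};

// Toy lexer: splits on whitespace only, but tracks line and column the
// way a real lexer would. A tab counts as a single column here.
std::vector<Token> lexWords(const std::string& src) {
    std::vector<Token> out;
    std::uint32_t line = 1, col = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '\n') { ++line; col = 1; ++i; continue; }
        if (std::isspace(c)) { ++col; ++i; continue; }
        Token t{"", line, col};
        while (i < src.size() && !std::isspace(static_cast<unsigned char>(src[i]))) {
            t.text += src[i];
            ++i;
            ++col;
        }
        out.push_back(t);
    }
    return out;
}

// Formats a diagnostic as "line:column: message".
std::string errorAt(const Token& t, const std::string& msg) {
    return std::to_string(t.line) + ":" + std::to_string(t.column) + ": " + msg;
}
```

For the input `let x\n  = 1`, the `=` token is recorded at line 2, column 3, so `errorAt` can report `2:3: ...` without re-scanning the source.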
Store the lexical token type in each AST node instead of introducing a separate enumeration of AST node types. Add extra token types where needed to represent AST-only constructs.
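A sketch of what this can look like, assuming a hypothetical `Tok` enumeration. `a + b` is tagged with the lexer's own `Plus` kind, while a function call, which has no single corresponding token, gets an added `Call` kind. The names `Node`, `makeBinary` and `makeCall` are illustrative.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Token kinds produced by the lexer, plus synthetic kinds that the lexer
// never emits and that exist only to label AST nodes.
enum class Tok {
    Ident, Number, Plus, Star, LParen, RParen,  // from the lexer
    Call, Block,                                // AST-only additions
};

// A single node type for the whole tree; its "node type" is just a Tok.
struct Node {
    Tok kind;
    std::string text;            // spelling of identifiers and literals
    std::vector<Node> children;  // operands, arguments or statements
};

// `a + b` reuses the Plus token kind directly.
Node makeBinary(Tok op, Node lhs, Node rhs) {
    Node n{op, "", {}};
    n.children.push_back(std::move(lhs));
    n.children.push_back(std::move(rhs));
    return n;
}

// `f(x, y)` has no single matching token, so it uses the synthetic Call kind.
Node makeCall(const std::string& callee, std::vector<Node> args) {
    Node n{Tok::Call, callee, std::move(args)};
    return n;
}
```

The parser and later passes then switch on one enumeration instead of keeping two in sync.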
Use 32-bit (or smaller) indices instead of pointers to reference objects. On 64-bit targets this halves the size of each reference and can make data structures more cache friendly.
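A minimal sketch, assuming a hypothetical `NodePool` that stores nodes in one `std::vector` and links them with `std::uint32_t` indices. On a typical 64-bit target the index-based `BinaryNode` below is 12 bytes, whereas the same node holding two pointers would usually be 24.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A 32-bit handle replaces a 64-bit pointer.
using NodeId = std::uint32_t;
constexpr NodeId kNoNode = UINT32_MAX;  // sentinel meaning "no child"

// Hypothetical binary-expression node linked by indices, not pointers.
struct BinaryNode {
    std::int32_t op;  // operator code (illustrative)
    NodeId lhs;
    NodeId rhs;
};

// All nodes live in one contiguous vector; links are indices into it.
class NodePool {
public:
    NodeId add(const BinaryNode& n) {
        assert(nodes_.size() < kNoNode);  // keep the sentinel value unused
        nodes_.push_back(n);
        return static_cast<NodeId>(nodes_.size() - 1);
    }
    const BinaryNode& get(NodeId id) const {
        assert(id < nodes_.size());
        return nodes_[id];
    }
    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<BinaryNode> nodes_;
};
```

A side benefit: indices stay valid when the vector reallocates, which raw pointers into it would not.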
Use a Pratt parser (#2), a form of recursive descent, to handle operator precedence compactly and efficiently.
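A compact Pratt parser sketch, assuming a hypothetical `Pratt` class that evaluates integer expressions directly instead of building an AST, to keep it short. Each infix operator gets a binding power; the loop in `expr` keeps consuming operators that bind more tightly than the current minimum, and that is where precedence and associativity come from. The operator set and the powers 10/20/25/30 are arbitrary choices for the example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Evaluates + - * / ^, unary minus and parentheses over integers.
class Pratt {
public:
    explicit Pratt(std::string src) : s_(std::move(src)) {}

    long parse() {
        long v = expr(0);
        if (peek() != '\0') throw std::runtime_error("trailing input");
        return v;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    // Skips whitespace and returns the next character, or '\0' at the end.
    char peek() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        return pos_ < s_.size() ? s_[pos_] : '\0';
    }

    // Left binding power of an infix operator; 0 means "not infix".
    static int lbp(char op) {
        switch (op) {
            case '+': case '-': return 10;
            case '*': case '/': return 20;
            case '^': return 30;
            default: return 0;
        }
    }

    static long apply(char op, long a, long b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/':
                if (b == 0) throw std::runtime_error("division by zero");
                return a / b;
            default: {  // '^'
                if (b < 0) throw std::runtime_error("negative exponent");
                long r = 1;
                while (b-- > 0) r *= a;
                return r;
            }
        }
    }

    // Prefix position ("nud"): literals, groups and unary minus.
    long prefix() {
        char c = peek();
        if (std::isdigit(static_cast<unsigned char>(c))) {
            long v = 0;
            while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_])))
                v = v * 10 + (s_[pos_++] - '0');
            return v;
        }
        if (c == '\0') throw std::runtime_error("unexpected end of input");
        ++pos_;
        if (c == '-') return -expr(25);  // tighter than * /, looser than ^
        if (c == '(') {
            long v = expr(0);
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return v;
        }
        throw std::runtime_error("unexpected character");
    }

    // Core loop: absorb infix operators that bind tighter than minBp.
    long expr(int minBp) {
        long lhs = prefix();
        for (;;) {
            char op = peek();
            int bp = lbp(op);
            if (bp <= minBp) break;
            ++pos_;
            // Right-associative '^' recurses with a lower minimum power.
            long rhs = expr(op == '^' ? bp - 1 : bp);
            lhs = apply(op, lhs, rhs);
        }
        return lhs;
    }
};
```

With these binding powers, `Pratt("1 + 2 * 3").parse()` yields 7, `2 ^ 3 ^ 2` yields 512 (right-associative) and `10 - 4 - 3` yields 3 (left-associative). Adding an operator means adding a `case` to `lbp` and `apply` rather than another grammar level.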
EBNF is a good method for documenting language syntax.
C++ exceptions are a viable way to report errors in the parser and to implement continue-after-error recovery.
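A sketch of exception-based error reporting with panic-mode recovery, assuming a hypothetical toy language whose only statement is `name = number ;` and a parser that works on pre-split tokens. `ParseError` is thrown wherever the problem is detected; `parseAll` catches it once per statement, records the message, and `synchronize` skips to the next `;` so parsing can continue.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Thrown from anywhere inside the parser and caught at statement level.
struct ParseError : std::runtime_error {
    std::size_t tokenIndex;
    ParseError(const std::string& msg, std::size_t idx)
        : std::runtime_error(msg), tokenIndex(idx) {}
};

struct Assignment {
    std::string name;
    long value;
};

class StmtParser {
public:
    explicit StmtParser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    std::vector<std::string> errors;

    std::vector<Assignment> parseAll() {
        std::vector<Assignment> out;
        while (pos_ < toks_.size()) {
            try {
                out.push_back(statement());
            } catch (const ParseError& e) {
                errors.push_back(e.what());
                synchronize();  // skip to a safe point, then keep parsing
            }
        }
        return out;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    std::string cur() const { return pos_ < toks_.size() ? toks_[pos_] : "<eof>"; }

    static bool isIdent(const std::string& s) {
        return !s.empty() && std::isalpha(static_cast<unsigned char>(s[0]));
    }
    static bool isNumber(const std::string& s) {
        if (s.empty()) return false;
        for (unsigned char c : s)
            if (!std::isdigit(c)) return false;
        return true;
    }

    void expect(const std::string& t) {
        if (cur() != t) throw ParseError("expected '" + t + "', found '" + cur() + "'", pos_);
        ++pos_;
    }

    Assignment statement() {
        if (!isIdent(cur())) throw ParseError("expected identifier, found '" + cur() + "'", pos_);
        std::string name = cur();
        ++pos_;
        expect("=");
        if (!isNumber(cur())) throw ParseError("expected number, found '" + cur() + "'", pos_);
        long value = std::stol(cur());
        ++pos_;
        expect(";");
        return {name, value};
    }

    // Panic-mode recovery: discard tokens through the next ';'.
    void synchronize() {
        while (pos_ < toks_.size() && toks_[pos_] != ";") ++pos_;
        if (pos_ < toks_.size()) ++pos_;
    }
};
```

On the tokens `x = 1 ; y = ; z = 3 ;` this yields two assignments (`x` and `z`) and the single error `expected number, found ';'`. Catching at statement level keeps the nested parse functions free of error-return plumbing.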
Recursive descent parsers may require backtracking. A queued lexer, which buffers tokens so the parser can look ahead and rewind, helps with this.
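One possible shape for such a queue, assuming a hypothetical `SimpleLexer` that hands out pre-split tokens. `TokenQueue` buffers tokens in a `std::deque`, `peek(n)` looks ahead without consuming, and `mark`/`reset` let the parser try one rule and rewind if it fails. `tryIntCast` is an illustrative speculative parse of a C-style `(int)` cast.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a real lexer: returns pre-split tokens, then "<eof>" forever.
class SimpleLexer {
public:
    explicit SimpleLexer(std::vector<std::string> toks) : toks_(std::move(toks)) {}
    std::string next() { return i_ < toks_.size() ? toks_[i_++] : std::string("<eof>"); }

private:
    std::vector<std::string> toks_;
    std::size_t i_ = 0;
};

// Queue between lexer and parser. Tokens are pulled lazily and buffered, so
// the parser can peek any distance ahead and rewind to an earlier mark.
class TokenQueue {
public:
    explicit TokenQueue(SimpleLexer lex) : lex_(std::move(lex)) {}

    const std::string& peek(std::size_t n = 0) {
        while (buf_.size() <= pos_ + n) buf_.push_back(lex_.next());
        return buf_[pos_ + n];  // deque::push_back keeps references valid
    }
    std::string consume() {
        std::string t = peek();
        ++pos_;
        return t;
    }

    std::size_t mark() const { return pos_; }
    void reset(std::size_t m) { pos_ = m; }  // backtrack to a saved mark

private:
    SimpleLexer lex_;
    std::deque<std::string> buf_;
    std::size_t pos_ = 0;
};

// Speculative parse: does the input start with the cast "( int )"?
// On failure, rewind so another rule can try the same tokens.
bool tryIntCast(TokenQueue& q) {
    std::size_t m = q.mark();
    if (q.consume() == "(" && q.consume() == "int" && q.consume() == ")") return true;
    q.reset(m);
    return false;
}
```

`std::deque` is used because `push_back` does not invalidate references to existing elements, so a reference returned by `peek` stays valid. This sketch never discards tokens; a production version might drop everything before the oldest outstanding mark.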
Output options
- Transpile to another language, such as C.
- Emit LLVM IR and let LLVM generate code for the native target.
- Custom VM.
- Dedicated assembly backend.