Compiler Commentary

These pages give an overview of how the compiler fits together, and record notes on the implementation.

Intermediate Languages (IRs)

The compiler uses four intermediate languages.

  • Source Language
    A representation of the source program that preserves all the syntactic structure of the original.
  • Desugared Language
    The source is desugared into this simpler language before extracting type constraints. This makes the compiler easier to change and maintain, at the cost of not being able to produce type error messages as nice as the ones in GHC (which does type inference directly on the source). We extract type constraints from the desugared program, then solve them, which gives us enough information to convert the program to the core language.
  • Core Language
    A System-F style core language, complete with type abstraction and application. Most optimizations are performed in the core language.
  • Sea Language
    A cut-down version of C, just enough to form the target of our compiler. The Sea program can be pretty-printed into real ANSI C code, which is compiled into object code by GCC. Alternatively, the LLVM backend converts Sea code into LLVM assembly, which is compiled to object code by the LLVM compiler.
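
As a rough illustration of what "System-F style, complete with type abstraction and application" means for the Core Language, the following Python sketch models a tiny explicitly typed term language. It is not the compiler's own data type: the class names (Var, Lam, App, TyLam, TyApp), the pretty function and the concrete syntax are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Hypothetical miniature core language. None of these names come from
# the compiler itself; they only illustrate the System-F style.

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    """Value abstraction:  \\(x : t). body"""
    param: str
    param_type: str
    body: object

@dataclass(frozen=True)
class App:
    """Value application:  (f) x"""
    fun: object
    arg: object

@dataclass(frozen=True)
class TyLam:
    """Type abstraction:  /\\a. body"""
    tyvar: str
    body: object

@dataclass(frozen=True)
class TyApp:
    """Type application:  (f) [t]"""
    fun: object
    ty: str

def pretty(e):
    """Render a term in an ad hoc concrete syntax."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lam):
        return f"\\({e.param} : {e.param_type}). {pretty(e.body)}"
    if isinstance(e, App):
        return f"({pretty(e.fun)}) {pretty(e.arg)}"
    if isinstance(e, TyLam):
        return f"/\\{e.tyvar}. {pretty(e.body)}"
    if isinstance(e, TyApp):
        return f"({pretty(e.fun)}) [{e.ty}]"
    raise TypeError(f"not a core term: {e!r}")

# The polymorphic identity function: abstract over a type, then over a value.
identity = TyLam("a", Lam("x", "a", Var("x")))

# Using it means applying it to a type first, then to a value.
use = App(TyApp(identity, "Int"), Var("five"))

print(pretty(identity))
print(pretty(use))
```

The point of the sketch is that types are ordinary parts of the term: every polymorphic function is explicitly abstracted over its type variables, and every use site supplies the type arguments explicitly.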

Compiler Stages

The top-level driver lists all the stages.

A general outline is as follows:

  • Lexer converts string containing source program to tokens.
  • Parser extracts Source IR from tokens.
  • Renamer resolves scoping and checks for undefined variables.
  • Defixer resolves uses of infix operators.
  • Convert Source IR -> Desugared IR
  • Elaborator fills in missing region, effect and closure variables in type signatures with defaults.
  • Projector adds default projection functions for record types.
  • Slurper extracts type constraints from the Desugared IR.
  • Solve type constraints.
  • Use the solved type information to convert Desugared IR -> Core IR
  • Tidy performs some light simplification of the Core IR to clean up after conversion from Desugared IR.
  • Thread through type information to construct witnesses.
  • Lint the Core IR to check it is well typed.
  • Dict rewrites projection functions to their instances, and resolves statically known overloadings.
  • Prim identifies primitive operations.
  • Simplify performs simplifications and optimisations on the Core IR.
  • LambdaLift lifts nested functions to top level, producing supercombinators.
  • Prep normalises Core IR in preparation for conversion to Sea IR.
  • Curry decides how to perform function calls, i.e. which applications are under- or over-applied.
  • Create .di interface file for the module.
  • Convert Core IR -> Sea IR
  • ExpandCtors expands out code for constructors.
  • Thunking expands out code to create thunks.
  • Forcing adds code for suspension and forcing of thunks.
  • Slotify stores pointers to boxed objects on the GC shadow stack.
  • Flatten out case and match statements to use goto and switch.
  • Init generates module initialisation functions to create CAFs at startup.
  • (optionally) Convert Sea IR -> ANSI C -> Object code.
  • (optionally) Convert Sea IR -> LLVM -> Object code.
  • Object code is linked against the Runtime System, which is written in C.
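
The outline above is essentially an ordered list of passes, each consuming the output of the previous one. A minimal Python sketch of that shape follows, with toy versions of just two stages (the Lexer and the Defixer). Everything in it is an assumption for illustration: the PRECEDENCE table, the function names and the tuple-based tree are invented here, all operators are treated as left-associative, the input is assumed to be well formed, and the toy pipeline skips the Parser and Renamer that run between these two stages in the real compiler.

```python
import re

# Hypothetical fixity table; the compiler's real fixity information
# is not described on this page.
PRECEDENCE = {"+": 6, "-": 6, "*": 7, "/": 7}

def lex(source):
    """Lexer: convert a source string into a list of tokens."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/]", source)

def defix(tokens):
    """Defixer: resolve a flat run of operands and infix operators
    into a nested (operator, left, right) tree by precedence climbing.
    Assumes well-formed input with left-associative operators."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]
        pos += 1
        while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # Left associativity: the right operand binds strictly tighter.
            right = parse(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    return parse(0)

# The driver is just the ordered list of (name, pass) pairs.
STAGES = [("Lexer", lex), ("Defixer", defix)]

def run_pipeline(stages, program):
    """Feed the program through each stage in order, recording stage names."""
    trace = []
    for name, stage in stages:
        program = stage(program)
        trace.append(name)
    return program, trace

result, trace = run_pipeline(STAGES, "1 + 2 * 3 - 4")
print(trace)
print(result)
```

Keeping the stages in a single ordered list makes it easy to see the whole compilation path in one place, and to dump the intermediate program after any named stage when debugging.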
Last modified on Feb 16, 2012, 1:04:18 PM