Some useful notes by Iavor S. Diatchki. (Converted to HTML by Thomas Hallgren)
LexerGen/HsLexerGen is used to generate Lexer/HsLex.hs, which is the basic lexer. Its main goal is to define:
haskellLex :: String -> [(Token,String)]
Token is defined in Lexer/HsTokens.hs and enumerates the different kinds of token we have. The String in each pair is the text of the token.
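To make the result shape of haskellLex concrete, here is a toy lexer that returns the same kind of list. It is only a sketch: the Token constructors below and the name toyLex are hypothetical stand-ins, not the definitions from Lexer/HsTokens.hs or the generated Lexer/HsLex.hs.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

-- Hypothetical, much-reduced stand-in for the Token type in Lexer/HsTokens.hs.
data Token = Varid | IntLit | Whitespace | Special | ErrorToken
  deriving (Eq, Show)

-- Toy lexer with the same result shape as haskellLex:
-- each element pairs a token kind with the exact text it covers.
toyLex :: String -> [(Token, String)]
toyLex [] = []
toyLex s@(c:cs)
  | isSpace c           = chunk Whitespace isSpace
  | isAlpha c           = chunk Varid isAlphaNum
  | isDigit c           = chunk IntLit isDigit
  | c `elem` "()[]{},;" = (Special, [c]) : toyLex cs
  | otherwise           = (ErrorToken, [c]) : toyLex cs
  where
    -- Take the longest prefix satisfying p and continue with the rest.
    chunk tok p = let (lexeme, rest) = span p s
                  in (tok, lexeme) : toyLex rest
```

Because whitespace is kept as a token at this stage, concatenating the strings of the result gives back the input unchanged.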
Lexer/HsLexerPass1.hs makes use of the basic lexer haskellLex. Its main goal is to define:
lexerPass0 :: String -> [(Token,(Pos,String))]
lexerPass1Only :: [(Token,(Pos,String))] -> [(Token,(Pos,String))]
lexerPass0 uses haskellLex to separate the input into tokens, and then annotates them with their positions. lexerPass1Only removes whitespace tokens from the input.
At this stage Pos is simply a pair of Ints (defined in Lexer/HsLexerPass1.hs). The format is (row,column). Positions start at (1,1).
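The two passes can be sketched roughly as follows. This is not the code from Lexer/HsLexerPass1.hs: the names addPos, nextPos and dropWhite are made up for illustration, the token type is left polymorphic, and the sketch assumes that every character other than a newline (tabs included) advances the column by one.

```haskell
type Pos = (Int, Int)  -- (row, column); the first character is at (1,1)

-- Advance a position over one character.
-- Assumption: a tab counts as a single column here.
nextPos :: Pos -> Char -> Pos
nextPos (r, _) '\n' = (r + 1, 1)
nextPos (r, c) _    = (r, c + 1)

-- Attach to each token the position of its first character,
-- in the spirit of what lexerPass0 adds on top of haskellLex.
addPos :: [(tok, String)] -> [(tok, (Pos, String))]
addPos = go (1, 1)
  where
    go _ [] = []
    go p ((t, s) : rest) = (t, (p, s)) : go (foldl nextPos p s) rest

-- Drop the tokens that the given predicate classifies as whitespace,
-- in the spirit of lexerPass1Only.
dropWhite :: (tok -> Bool) -> [(tok, (Pos, String))] -> [(tok, (Pos, String))]
dropWhite isWhite = filter (not . isWhite . fst)
```

Positions have to be computed before whitespace is dropped, since the whitespace tokens are what move the row and column forward.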
Lexer/HsLexer.hs contains the real lexer that can interact with the Happy grammar. It defines lexer:
lexer :: ((Token,(SrcLoc,String)) -> PM a) -> PM a
Here PM is the parsing monad, located in ParseMonad.hs.
The lexer expects the parsing monad to have a state component of type:
type State = ([(Token,(SrcPos,String))],[Int])
The first component of the state is the list of remaining tokens. The second component is the layout context, i.e. a stack keeping track of the indentations of blocks of declarations.
The lexer accesses the state with the aid of the following functions:
get :: PM State
set :: State -> PM ()
setreturn :: a -> State -> PM a
fail :: String -> PM a
(setreturn is simply an optimization (is it worth it?), setreturn x s = set s >> return x)
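The interface above can be illustrated with a small stand-alone model. Everything here is an assumption made for the sketch: PM is modelled as a plain state-plus-failure monad, Token and SrcLoc are reduced stand-ins, the token positions in State use SrcLoc, fail is renamed failPM to avoid clashing with the Prelude, and layout handling is left out completely. The real definitions in ParseMonad.hs and Lexer/HsLexer.hs may differ.

```haskell
-- Hypothetical stand-ins for the real Token and SrcLoc types.
data Token = Varid | GotEOF deriving (Eq, Show)
data SrcLoc = SrcLoc String Int Int deriving (Eq, Show)

type PosToken = (Token, (SrcLoc, String))
type State = ([PosToken], [Int])  -- remaining tokens, layout context

-- A minimal state-plus-failure monad standing in for PM.
newtype PM a = PM { runPM :: State -> Either String (a, State) }

instance Functor PM where
  fmap f (PM m) = PM $ \s -> case m s of
    Left e        -> Left e
    Right (a, s') -> Right (f a, s')

instance Applicative PM where
  pure x = PM $ \s -> Right (x, s)
  PM mf <*> PM ma = PM $ \s -> case mf s of
    Left e        -> Left e
    Right (f, s') -> case ma s' of
      Left e         -> Left e
      Right (a, s'') -> Right (f a, s'')

instance Monad PM where
  PM m >>= k = PM $ \s -> case m s of
    Left e        -> Left e
    Right (a, s') -> runPM (k a) s'

get :: PM State
get = PM $ \s -> Right (s, s)

set :: State -> PM ()
set s = PM $ \_ -> Right ((), s)

setreturn :: a -> State -> PM a
setreturn x s = set s >> return x

failPM :: String -> PM a  -- plays the role of fail in the notes
failPM msg = PM $ \_ -> Left msg

eoftoken :: PosToken
eoftoken = (GotEOF, (SrcLoc "?" (-1) (-1), ""))

-- Continuation-style lexer: pass the next token to the parser's continuation.
-- Once the token list is exhausted, every call yields eoftoken.
lexer :: (PosToken -> PM a) -> PM a
lexer k = do
  (ts, ctx) <- get
  case ts of
    []        -> k eoftoken
    (t : ts') -> set (ts', ctx) >> k t
```

The continuation-passing type is the shape Happy expects from a monadic lexer: the parser calls lexer with "what to do with the next token" instead of pulling tokens from a list itself.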
Currently the file ParseMonad.hs also defines:
eoftoken = (GotEOF,(eof,""))
eof = SrcLoc "?" (-1) (-1) -- hmm (this is also used in HsLexer.hs)
SrcLoc is a type defined in ../AST/SrcLoc.hs; it represents a position together with a file name. The conversion from Pos to SrcLoc happens in a function parseTokens defined in ParseMonad.hs.
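The conversion step itself can be pictured as below. This is a guess at the shape only, not the body of parseTokens: the helper names toSrcLoc and locateTokens are hypothetical, and the field order of SrcLoc (file name, row, column) is an assumption inferred from the eof definition above.

```haskell
type Pos = (Int, Int)  -- (row, column)

-- Hypothetical stand-in for the type in ../AST/SrcLoc.hs.
-- Assumed field order: file name, row, column.
data SrcLoc = SrcLoc String Int Int deriving (Eq, Show)

-- Turn a bare position into a source location by adding the file name.
toSrcLoc :: FilePath -> Pos -> SrcLoc
toSrcLoc file (row, col) = SrcLoc file row col

-- Apply the conversion to every token produced by the earlier passes.
locateTokens :: FilePath -> [(tok, (Pos, String))] -> [(tok, (SrcLoc, String))]
locateTokens file = map (\(t, (p, s)) -> (t, (toSrcLoc file p, s)))
```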