Cactus

10 June 2003

MSc Thesis by Niklas Martinsson, Chalmers, Göteborg, 2001

Tuesday seminar at OGI by Thomas Hallgren


Cactus - a Concrete to Abstract syntax Conversion Tool with User-friendly Syntax

MSc Thesis by Niklas Martinsson, Chalmers, Göteborg, 2001.

Tuesday seminar at OGI on 10 June, 2003 by Thomas Hallgren.

Why a new parser generator?

There are so many already!

  • Accent
  • AnaGram
  • Antlr/PCCTS
  • BTYACC
  • BYACC
  • Centaur
  • CLaRet
  • Cocktail
  • Coco/R
  • COCOM / Russian Armoury
  • CUP
  • Depot4
  • Elegant
  • Eli
  • FNC-2
  • Gentle
  • JavaCC
  • Lemon
  • Lisa
  • Mango
  • NewYacc
  • PRECC
  • Rie
  • Rigal
  • SableCC
  • Yay
  • Yoocc/Trooper
  • Zyacc

Links to some of these tools

Goals for Cactus

A comparison

We compare how to specify the syntax of simple arithmetic expression.

Grammar:

expr ::= expr + term
       | expr - term
       | term
term ::= term * factor
       | term / factor
       | factor
factor ::= number
         | ( expr )

Happy+Alex example

SabelCC example

Package SimpleMath;

Tokens
  l_paren = '(';
  r_paren = ')';
  plus = '+';
  minus = '-';
  mult = '*';
  div = '/';
  number = ['0'..'9']+;
  blank = [' ' 9 12 8 11 13 10];

Ignored Tokens
  blank;

Productions
  expr   = {term}   term
         | {plus}   expr plus  term
         | {minus}  expr minus term ;
  term   = {factor} factor
         | {mult}   term mult factor
         | {div}    term div  factor ;
  factor = {number} number
         | {expr}   l_paren expr r_paren ;

Notes

SabelCC

Happy+Alex

A note on declaring all tokens

Can be considered good practice, but what is the point of naming all the keywords?!

[Long list of Java tokens]

Cactus example

.blank = [" \t\f\b\v\r\n"]
.number :: Int = ['0'..'9']+

expr :: Expr
  = ..       ::= term
  | Add..    ::= expr '+' term
  | Sub..    ::= expr '-' term ;

term
  = ..       ::= factor
  | Mul..    ::= term '*' factor
  | Div..    ::= term '/' factor ;

factor
  = Number.. ::= number
  | ..       ::= '(' expr ')'

Cactus example notes

From the 13 lines of code on the previous slice, Cactus generates

Haskell back-endC back-end
  • A Happy parser
  • An Alex lexer
  • Data types for the abstract syntax
  • A main program for testing
  • A Yacc parser
  • A flex lexer
  • Data types for the abstract syntax (.c and .h files)
  • A main program for testing
  • A Makefile
Cactus also supplies a generic show for abstract syntax trees.

The user needs to supply a function for converting the number strings to Int.

Cactus example made more verbose

.blank = [" \t\f\b\v\r\n"]
.number :: Int = ['0'..'9']+

expr :: Expr
  = term              ::= term
  | Add(expr1,term1)  ::= expr '+' term
  | Sub(expr,term)    ::= expr '-' term ;

term
  = $1         ::= factor
  | Mul($1,$3) ::= term '*' factor
  | Div($1,$3) ::= term '/' factor ;

factor
  = Number($1) ::= number
  | $2         ::= '(' expr ')'

%data Expr
  = Add(Expr, Expr)
  | Sub(Expr, Expr)
  | Mul(Expr, Expr)
  | Div(Expr, Expr)
  | Number(Int)

Cactus type inference

Cactus can infer many thing automatically:

CategoryCan be inferredCan be given explicitly
Types of nonterminalsby type inferenceyes
Names of typesfrom names of nonterminalsyes
Names of data constructorsnoyes
Types of data constructorsfrom their useyes
Keywordsfrom use in concrete syntaxyes
Abstract syntaxyes...yes

Cactus example made slightly simpler

Cactus supports precedence declarations.
.blank = [" \t\f\b\v\r\n"]
.number :: Int = ['0'..'9']+

%left.6 = '+', '-'
%left.7 = '*', '/'

expr
  = Add..    ::= expr '+' expr
  | Sub..    ::= expr '-' expr
  | Mul..    ::= expr '*' expr
  | Div..    ::= expr '/' expr
  | Number.. ::= number
  | ..       ::= '(' expr ')' ;

Cactus example made even simpler

.blank = [" \t\f\b\v\r\n"]
.number :: Int = ['0'..'9']+

%left.6 = '+', '-'
%left.7 = '*', '/'

expr
  = Op..    ::= expr op expr
  | Number.. ::= number
  | ..       ::= '(' expr ')' ;

%expand op ::= '+' | '-' | '*' | '/' ;

Cactus supports EBNF

To further simplify grammar specifications, Cactus support EBNF notation: But what abstract syntax is constructed, and how do you refer to parts of these compound constructions?

Cactus references

Cactus supports several ways to refer to parts of the syntax tree when specifying the abstract syntax.

By position List($2:$3.2) ::= '[' exp (',' exp)+ ']'
By name+position List(exp1:exp2) ::= '[' exp (',' exp)+ ']'
By explicit names List(e1:e2) ::= '[' exp:e1 (',' exp:e2)+ ']'
Automatically List.. ::= '[' exp:e1 (',' exp:e2)+ ']'

(In this example, $3 would work instead of $3.2)

Built-in types

Apart from (inferred or explicitly defined) algebraic data types, Cactus has the following built-in types and operators:

Bigger examples

Future work

The end


[ Valid HTML?| Check Links]