Page 1
Command-Line Parsing Combinators
Half-baked talk on February 11, 2005
Page 2
Introduction
How to implement command line parsing
- In a modular/reusable/extensible way
- In a self-documenting way
- Generate user documentation and parser from the same source
Page 3
Context
We write our programs in Haskell
- We need to provide a main program,
main :: IO ()
- We can use
getArgs :: IO [String]
- We have functions to implement the desired functionality:
ls :: Bool -> [FilePath] -> IO ()
- How to bridge the gap?
Page 4
Example
silly date
silly bla
silly help
Page 5
Documentation ≈ Implementation
- Documentation
- Implementation
Page 6
Implementation of the example
Page 7
Implementation of the example
...and implementations of the four commands:
cat :: [FilePath] -> IO ()
ls :: Bool -> [FilePath] -> IO ()
date :: IO ()
help :: IO ()
Page 8
Parsing combinators
Traditional, without documentation
Page 9
Parsing combinators
Modified for producing documentation
Page 10
Parsing combinators
Some derived combinators
Page 11
A note on how to parse sequences
- Possible alternatives
ap :: P (a->b) -> P a -> P b
>>= :: P a -> (a->P b) -> P b
pair :: P a -> P b -> P (a,b)
- Monadic parsing combinators?
- No!
- With >>= you can do more than context-free grammars.
- Static analysis of the grammar (extracting the documentation, for example)
becomes difficult...
Page 12
Traditional parsing combinators
- Backtracking parsers:
type P a = String -> Maybe (a,String)
- How to turn a failure into a list of successes:
type P a = String -> [(a,String)]
- More sophisticated implemtations are needed
- to get good error messages
- for efficiency (e.g. to avoid space leaks)
- How to extract documentation?
Page 13
Simplifying assumptions for our purposes
- Command line grammars fairly simple
- Not recursive: can extract grammar without any tricks
- The commands that we parse are not particularly long
- Efficiency is not a problem: implementation can be kept simple
Page 14
Implementation of our command line parsing combinators
Page 15
Implementation of our command line parsing combinators
But the above is not a valid Haskell 98 data type!
- Existential quantification is needed for the type of
Ap
.
- Indexed families of types (aka GADTs) are needed for
the type of
Many
Page 16
Implementation of our command line parsing combinators
Implementation using existential data types
Page 17
Implementation of our command line parsing combinators
Given the above data type we can easily implement the desired operations
on P a
parse :: P a -> [String] -> Either ErrorMessage a
- Traverse and translate to conventional parsing combinators
usage :: P a -> String
- Traverse and pretty print
- Because of the existential quantification in the data type,
polymorphic recursion is required to type these traversal
functions, so type signatures have to be given explicitly.
- See the source code.
Page 18
Alternative implementation of our command line parsing combinators
New combinators that return a pair of
- a traditional combinator parser, and
- a representation of the grammar that can be printed:
data P a = P Grammar (PM a)
data Grammar = ...
- No need for type system extensions. Pure Haskell 98 is enough!
Page 19
Other potential solutions (1)
GetOpt
An existing library for parsing command line options
- Supports the traditional, fairly flexible option syntax, but seems
more limited otherwise
- Monolithic
- Originates in a language (C) that doesn't lead the thoughts to parsing
combinators...
Page 20
Other potential solutions (2)
Use the type class system to construct parser from types
- Similar to what I did with marshalling for the FFI in 2001
- Less work for the programmer
- More complex implementation?
- How to add documentation?
Page 21
Conclusion
- Implemented but not described today: support for named nonterminals
- Lessons learnt: having access to fancy language features doesn't
make you a better programmer!
- Related work and links
Page 22
Conclusion
TODO
- Prettier printing of grammars
- Add a combinator to accept options in arbitrary order
- Extract documentation from recursive grammars
- Convenient support for extra checks that requires access to an
underlying monad. (E.g. checking that a file exists.)
- Report ambiguities in the grammar
- ...
Page 23
The end
Questions?