ANNOUNCE: grammar-combinators 0.1 (initial release): A parsing library of context-free grammar combinators

The grammar-combinators library is a parsing library employing a novel grammar representation with explicit recursion. The library features much of the power of a parser generator like Happy or ANTLR, but with the library approach and most of the benefits of a parser combinator library. Grammars and grammar algorithms are defined in a functional style. The library currently has the following features: * Grammar specified completely in Haskell using an elegant syntax * Grammar algorithms implemented in a functional style (no fresh identifiers), with elegant and meaningful types. * Multi-backend: use the same grammar with a Packrat, Parsec or UUParse parser * Grammar transformations: use left-recursive grammars directly thanks to a powerful grammar transformation library, featuring the left-corner left-recursion removal transform, a uniform version of the classic Paull left-recursion removal, and various smaller transformations (dead-branch removal, dead non-terminal removal, consecutive epsilon combination, selective unfolding etc.). * Grammar utility functions: printing of grammars, FIRST-set calculation, reachability analysis of non-terminals, etc. * Compile-time transformations (using Template Haskell), given a suitable definition of the grammar. This is currently limited to a certain set of transformations. The library is currently not intended for mainstream use. Its API is relatively stable, but performance needs to be looked at further. We are submitting a paper about the ideas behind this library to PADL 2011. A draft is linked on the project's website. More information: * Project website: http://projects.haskell.org/grammar-combinators/ * Tutorial: http://projects.haskell.org/grammar-combinators/tutorial.html * Hackage: http://hackage.haskell.org/package/grammar-combinators All comments welcome! Dominique PS. The documentation on hackage currently doesn't build because of (seemingly) a Hackage dependency problem during the build [1]. Compiling and generating the documentation locally should work fine. A version of the docs is available on the project's webpage as a temporary replacement [2]. Footnotes: [1] http://www.haskell.org/pipermail/libraries/2010-September/014168.html [2] http://projects.haskell.org/grammar-combinators/docs/index.html

Some snippets from the Tutorial [1] to give an idea of the grammar-combinator library's approach, its functional style and its additional power (e.g. the transformations used): Defining a simple expresssions grammar: grammarArith :: ExtendedContextFreeGrammar ArithDomain Char grammarArith Line = LineF $>> ref Expr >>>* endOfInput grammarArith Expr = SubtractionF $>> ref Expr >>>* token '-' >>> ref Term ||| SumF $>> ref Expr >>>* token '+' >>> ref Term ||| SingleTermF $>> ref Term grammarArith Term = SingleFactorF $>> ref Factor ||| QuotientF $>> ref Term >>>* token '/' >>> ref Factor ||| ProductF $>> ref Term >>>* token '*' >>> ref Factor grammarArith Factor = NumberF $>> many1Ref Digit ||| ParenthesizedF $>>* token '(' >>> ref Expr >>>* token ')' grammarArith Digit = DigitF $>> tokenRange ['0' .. '9'] A semantic processor: data family ArithValue ix newtype instance ArithValue Line = ArithValueL Int deriving (Show) newtype instance ArithValue Expr = ArithValueE Int deriving (Show) newtype instance ArithValue Term = ArithValueT Int deriving (Show) newtype instance ArithValue Factor = ArithValueF Int deriving (Show) newtype instance ArithValue Digit = ArithValueD Char deriving (Show) calcArith :: Processor ArithDomain ArithValue calcArith Line (LineF (ArithValueE e)) = ArithValueL e calcArith Expr (SumF (ArithValueE e) (ArithValueT t)) = ArithValueE $ e + t calcArith Expr (SingleTermF (ArithValueT t)) = ArithValueE t calcArith Term (ProductF (ArithValueT e) (ArithValueF t)) = ArithValueT $ e * t calcArith Term (SingleFactorF (ArithValueF t)) = ArithValueT t calcArith Factor (ParenthesizedF (ArithValueE e)) = ArithValueF e calcArith Factor (NumberF ds) = ArithValueF $ read $ map unArithValueD ds calcArith Digit (DigitF c) = ArithValueD c unArithValueD :: ArithValue Digit -> Char unArithValueD (ArithValueD c) = c Transforming the grammar: calcGrammarArith :: ProcessingExtendedContextFreeGrammar ArithDomain Char ArithValue calcGrammarArith = applyProcessorE grammarArith calcArith calcGrammarArithTP :: ProcessingExtendedContextFreeGrammar (UPDomain ArithDomain) Char (UPValue ArithValue) calcGrammarArithTP = transformUniformPaullE calcGrammarArith calcGrammarArithTPF :: ProcessingExtendedContextFreeGrammar (UPDomain ArithDomain) Char (UPValue ArithValue) calcGrammarArithTPF = filterDiesE (unfoldDeadE calcGrammarArithTP) calcGrammarArithTPFF :: ProcessingContextFreeGrammar (FoldLoopsDomain (UPDomain ArithDomain)) Char (FoldLoopsValue (UPValue ArithValue)) calcGrammarArithTPFF = foldAndProcessLoops calcGrammarArithTPF Parsing: *Main> parsePackrat calcGrammarArithTPFF (FLBase (UPBase Line)) "123" Parsed FLBV {unFLBV = UPBV {unUPBV = ArithValueL 123}} _ *Main> parsePackrat calcGrammarArithTPFF (FLBase (UPBase Line)) "123+" NoParse *Main> parsePackrat calcGrammarArithTPFF (FLBase (UPBase Line)) "123+12" Parsed FLBV {unFLBV = UPBV {unUPBV = ArithValueL 135}} _ *Main> parseParsec calcGrammarArithTPFF (FLBase (UPBase Line)) "" "123+12" Right (FLBV {unFLBV = UPBV {unUPBV = ArithValueL 135}}) *Main> parseUU calcGrammarArithTPFF (FLBase (UPBase Line)) "123+12" FLBV {unFLBV = UPBV {unUPBV = ArithValueL 135}} Dominique Footnotes: [1] http://projects.haskell.org/grammar-combinators/tutorial.html

.. grammar-combinator library's approach ..
am I reading this correctly: in the traditional combinator approach, a grammer (a parser) is a Haskell value, while in your approach, the grammar is a Haskell (GAD)type? then you'll get more static guarantees (e.g., context-freeness) but you need extra (type-level, or even syntax-level) machinery to handle grammars. Convince me that it's worth it ... I guess the proper solution (TM) is to blur the distiction between types and values by switching to dependent types altogether... J.W.

On Sep 8, 2010, at 7:49 AM, Johannes Waldmann wrote:
then you'll get more static guarantees (e.g., context-freeness) but you need extra (type-level, or even syntax-level) machinery to handle grammars. Convince me that it's worth it ...
Those guarantees, along with just the fact that the parser specification is data rather than a black-box function, explicitly make possible some very cool stuff, not least of which is true parser-generator-quality parsers. Just like a regex can be compiled once and then run repeatedly very fast, a context-free grammar can be compiled once to a stack-machine specification and run repeatedly. That compilation process is highly nonlocal and would never be possible with, e.g., the Parsec approach. On the other hand, parser generators such as Happy (which perform just such a transformation) cannot allow you to construct a grammar at run-time. Even if you end up deploying a parser using a different framework, the grammar transformation stuff is pretty cool too. By specifying your grammar in this system, you get to play around with it, transform it, etc., and see what the transformed grammar looks like. Incidentally, it'd be pretty nifty if someone made a Happy backend, or even just a TH snippet, that generated a grammar-combinators grammar and semantic action from a Happy parser specification. -- James

That compilation process is highly nonlocal and would never be possible with, e.g., the Parsec approach.
Pipe dream: attach such a grammar object to every Parsec parser, and include the "compiler" with the combinators, and have them run at (Haskell) compile time (in ghc's specializer). Should work for some subset (e.g., just let, not letrec, use proper combinators instead) and with some future ghc version ... When I teach parsing (in Compiler Construction), for lack of time it's either "traditional" (CFG -> PDA) or "combinator" (not both), and I'm not happy with that, since both are important concepts. But then, semantics is more important than syntax ... J.W.

Johannes,
2010/9/8 Johannes Waldmann
That compilation process is highly nonlocal and would never be possible with, e.g., the Parsec approach.
Pipe dream: attach such a grammar object to every Parsec parser, and include the "compiler" with the combinators, and have them run at (Haskell) compile time (in ghc's specializer).
You can actually use a grammar-combinators parser with Parsec (although the current implementation will use backtracking on every branch), keeping the original grammar around for other purposes. About the compile-time stuff, there is code in the library doing compile-time transformations using Template-Haskell (but requiring a grammar with embedded TH splices for injected values). You could also do a similar compilation to a PDA parser in TH if you want, again keeping the full grammar available for other stuff. Additionally, I have noted that passing certain GHC inlining flags as has been suggested for generic code [1] produces spectacular (execution time/16) optimizations for a test grammar, but I have not investigated what resulting code GHC actually produces in this case. This is also related to what you talk about, since the compiler does part of the transformation from abstract grammar at compile time.
Should work for some subset (e.g., just let, not letrec, use proper combinators instead) and with some future ghc version ...
When I teach parsing (in Compiler Construction), for lack of time it's either "traditional" (CFG -> PDA) or "combinator" (not both), and I'm not happy with that, since both are important concepts. But then, semantics is more important than syntax ...
I actually think of the grammar-combinators approach as an attempt to bring the power available in parser combinator libraries to the level of what can be done in parser generators. Dominique Footnotes: [1] http://www.cs.uu.nl/research/techreps/repo/CS-2009/2009-022.pdf

Johannes,
(sorry for the double mail)
I will give some short answers below, but you can find more details in
the paper we are submitting to PADL 2011 [1].
2010/9/8 Johannes Waldmann
.. grammar-combinator library's approach .. am I reading this correctly: in the traditional combinator approach, a grammer (a parser) is a Haskell value, while in your approach, the grammar is a Haskell (GAD)type?
Not completely. A grammar-combinators grammar is a Haskell value with a different (more complicated) type than a traditional parser combinator value. It is actually a function that returns the production rules for a given non-terminal. Because the non-terminals are modelled using a GADT and do not have the same type, the grammar's production rules' types can depend on the non-terminal in question.
then you'll get more static guarantees (e.g., context-freeness) but you need extra (type-level, or even syntax-level) machinery to handle grammars. Convince me that it's worth it ...
The advantage of the grammar-combinators approach is that grammar algorithms have a lot more power, because they can reason explicitly about the recursion in the grammar, whereas the recursion is not observable in the traditional parser combinators approach. The Parser combinator model is in fact so limited that something simple as pretty-printing a BNF representation of the grammar is fundamentally impossible. More details in the PADL-submitted draft. As James says below, a grammar algorithm using grammar-combinators grammars can observe the recursion in the grammar and can therefore do stuff for which you would otherwise have to use a parser generator.
I guess the proper solution (TM) is to blur the distiction between types and values by switching to dependent types altogether...
There is actually some very interesting work about dependently typed parser combinator libraries, I discuss this in the related work section of the PADL paper. Dominique Footnotes: [1] http://projects.haskell.org/grammar-combinators/#background
participants (3)
-
Dominique Devriese
-
James Andrew Cook
-
Johannes Waldmann