Fwd: [Haskell-cafe] Parsing in Practice

[Sending it again to haskell-cafe]
On 10/18/05, Tom Hawkins
I am writing a parser for a big, ugly, standard language and I need to decide between using either Happy or Parsec.
I wrote a parser for a big, ugly, non-standard language - Transact-SQL from MSSQL.
I currently have a priliminary LALR(1) grammar, so a port to Happy would be relatively easy.
In such situation I would try Happy first, but with some cautiousness. I also started with LALR(1) parsing in ocamlyacc, but I didn't manage to cover the whole language with this approach. The more productions I added the more conflicts I had. Of course I tried to remove the conflicts, but often when I added new productions I had to repeat the work. It is possible that my understanding of LALR parsing was insufficient, but the problem is that the set of LALR grammars is not closed under composition (as I've read in some paper on GLR parsing). Do you know that Happy supports GLR parsing?
But, I'm wondering if life would be easier if I chose Parsec's combinator parsing instead.
From my experience it was indeed quite nice. However, there were some parts of the grammer that I didn't know how to handle without using "try", which could result in worse efficiency and error reporting.
It's error reporting seems to be top notch
So long as you can avoid using "try" for non-terminals.
and it's "optional", "many", and "sepBy1" combinators are very elegant.
Indeed they are, as they correspond nicely to E in EBNF ;-)
However, I have a few concerns with Parsec. First is performance; what factor of slow-down should I expect?
I only have a Parsec parser, so I can't compare. I dimly remember that my parser had an average efficiency, ie. it could parse about 100kB per second on P3 850. There is a parser-combinator version of a parser for Haskell which you could compare to the one from GHC.
Second is bug prevention. I don't have much experience writing LL(n) grammars, so how easy is it to introduce bugs in a Parsec grammar?
I always feared that this would be a problem, but it wasn't. I used a sample of about 1MB of application code to test, and I didn't get to using QuickCheck.
Even though I hate debugging LALR(1) parsing ambiguities, it prevents problems.
Yes, when you fight the conflicts, you get static guarantees in return. I wonder how it is with GLR? Best regards Tomasz

Tomasz Zielonka
The problem is that the set of LALR grammars is not closed under composition (as I've read in some paper on GLR parsing).
If I'm right, the set of unambiguous grammars is not closed under concatenation nor union. This makes things rather difficult. If you are looking for closed sets under most language operations, but deterministic parsing, you can give a look at packrat parsers, but then you may have some issues when trying to find out which language you are really defining.
On 10/18/05, Tom Hawkins
wrote: Even though I hate debugging LALR(1) parsing ambiguities, it prevents problems.
Yes, when you fight the conflicts, you get static guarantees in return. I wonder how it is with GLR?
You will not get any static guarantees with GLR. Until you feed the generated parser with an input that reveals an ambiguity, you will not know for sure whether such a case might occur or not. Except of course if your grammar is in one of the nice subsets of all CFGs for which you do have static guarantees, like LALR(1). But in that case, there is no point in using a GLR parser. Thus, when using a GLR parser, try to have foolproof code ready to catch an unexpected ambiguity. -- Regards, Sylvain
participants (2)
-
Sylvain Schmitz
-
Tomasz Zielonka