
I am writing a parser for a big, ugly, standard language and I need to decide between using either Happy or Parsec. I currently have a priliminary LALR(1) grammar, so a port to Happy would be relatively easy. But, I'm wondering if life would be easier if I chose Parsec's combinator parsing instead. It's error reporting seems to be top notch and it's "optional", "many", and "sepBy1" combinators are very elegant. However, I have a few concerns with Parsec. First is performance; what factor of slow-down should I expect? Second is bug prevention. I don't have much experience writing LL(n) grammars, so how easy is it to introduce bugs in a Parsec grammar? Even though I hate debugging LALR(1) parsing ambiguities, it prevents problems. Also, it appears the GHC developers chose Happy for Haskell. What was their rational? Thanks for your recommendations, one way or another. -Tom

On Tue, 18 Oct 2005, Tom Hawkins wrote:
However, I have a few concerns with Parsec. First is performance; what factor of slow-down should I expect? Second is bug prevention. I don't have much experience writing LL(n) grammars, so how easy is it to introduce bugs in a Parsec grammar? Even though I hate debugging LALR(1) parsing ambiguities, it prevents problems.
I can't give you figures on performance I'm afraid - I think I saw some somewhere but I honestly can't remember where. It's possible to shoot yourself in the foot with the no-backtracking-by-default behaviour, and I've done so a couple of times when not thinking although I don't think I've ever spent more than a couple of hours spotting it. If you've got a decent chunk of test data or don't mind generating it with QuickCheck odds are you can spot it reasonably quickly when it happens. You can always litter try everywhere and then remove it when you're confident you can do so, too. More generally I find it fairly easy to 'just write the grammar' with Parsec, though I've not done anything overly complex and I've still done silly things like use *bold* as the bold syntax in a wiki clone then realise I wanted * lists * like * this which the library doesn't really do anything to help you spot until you try out the list syntax and watch it parse as bold because you listed the bold parser first. It might be interesting to try writing a variant of Parsec designed to search itself for possibilities like that - it's never going to be possible to fix it entirely though because being monadic it's also comfortably capable of parsing anything a turing machine can. IIRC there's a reasonably well-understood type of grammar that corresponds to the first-order fragment of Parsec though?
Also, it appears the GHC developers chose Happy for Haskell. What was their rational?
Amongst other things, Parsec didn't exist at the time. -- flippa@flippac.org Society does not owe people jobs. Society owes it to itself to find people jobs.

On Tue, 18 Oct 2005, Philippa Cowderoy
If you've got a decent chunk of test data or don't mind generating it with QuickCheck odds are you can spot it reasonably quickly when it happens.
When I write a parser I usually also write a pretty-printer (or ugly-printer) plus the following property: prop_parsePretty x = parse (pretty x) == x Randomly generating abstract syntax trees using QuickCheck is often straightforward. I can't remember ever having found a bug in code that has passed this kind of test. (But I guess that I haven't written parsers for very complicated languages...) -- /NAD
participants (3)
-
Nils Anders Danielsson
-
Philippa Cowderoy
-
Tom Hawkins