Why is GHC written in Happy and not a monadic parser library?

Hi! Is there any specific reason why GHC is written in a parser GENERATOR (Happy) and not in a MONADIC PARSER COMBINATOR library (like Parsec)? Is Happy faster / handles errors better / has some great features, or anything else? Thank you :)

On Fri, Aug 2, 2013 at 8:49 PM, blackbox.dev.ml wrote:
> Is there any specific reason why GHC is written in a parser GENERATOR (Happy) and not in a MONADIC PARSER COMBINATOR library (like Parsec)?
> Is Happy faster / handles errors better / has some great features, or anything else?
Most probably because GHC predates practical parser combinators. Happy is just a yacc clone, really; ancient tech. And I would suspect that replacing the parser at this point could get pretty painful.

--
brandon s allbery kf8nh                               sine nomine associates
allbery.b@gmail.com                                  ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

On Fri, Aug 2, 2013 at 5:49 PM, blackbox.dev.ml wrote:
> Hi! Is there any specific reason why GHC is written in a parser GENERATOR (Happy) and not in a MONADIC PARSER COMBINATOR library (like Parsec)?
> Is Happy faster / handles errors better / has some great features, or anything else?
One reason is that it predates monadic parser libraries.

Pros of parser generators:
+ Static analysis of the grammar (shift/reduce conflicts, debuggers, etc.)
+ More succinct definition of grammar / semantic actions (easier to formally reason about the language)
+ Efficient parser generation
+ Attribute grammars

Monadic parsers are not amenable to static analysis in the same ways. Applicative parsers hypothetically do not have this limitation, but none of the applicative parsers provide static analysis. I would expect Parsec to be slower than Happy (although I haven't benchmarked it).

The place where monadic parsers shine is when you don't know the formal grammar, or you can't have one. Personally, I prefer parser generators, although I think monadic parsers have their time and place. Parsec is a wonderful replacement for writing a recursive descent parser by hand.

Lately, I've been modernizing the Happy source code and making it integrate better with Cabal. I hope this will help improve the image of Happy.

Jason
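[For readers who have not used a parser generator: a Happy grammar pairs BNF-style productions with Haskell semantic actions, and the tool analyses the whole grammar statically (reporting shift/reduce conflicts) before any code runs. The fragment below is a hypothetical sketch for a toy expression language, not GHC's actual grammar; `TokenInt`, `TokenPlus`, `Plus`, and `Lit` are assumed token and AST constructors.]

```
%token
  int  { TokenInt $$ }
  '+'  { TokenPlus }

%%

Exp  : Exp '+' Term   { Plus $1 $3 }
     | Term           { $1 }

Term : int            { Lit $1 }
```

Because the generator sees every production up front, it can build LALR tables ahead of time and flag ambiguities — the static analysis contrasted above with monadic parsers.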

On 3 Aug 2013, at 02:20, Jason Dagit wrote:
>> Hi! Is there any specific reason why GHC is written in a parser GENERATOR (Happy) and not in a MONADIC PARSER COMBINATOR library (like Parsec)?
>> Is Happy faster / handles errors better / has some great features, or anything else?
> One reason is that it predates monadic parser libraries.
I'm not entirely sure this is true. I reckon the development of applicative parser combinators (used in the implementation of the nhc12 compiler, way back in 1995 or so) is roughly contemporaneous with the development of Happy, and its use inside ghc. (I found a release note from Sept 1997 that said ghc had just converted its interface-file parser to use Happy.) Certainly table-driven parsers in non-functional languages go back a lot further, and functional combinator-based parsing was then the relative newcomer.

As to why ghc switched to Happy, the literature of the time suggests that generated table-driven parsers were faster than combinator-based parsers. I'm not sure I have ever seen any performance figures to back that up, however. And with the general improvement in performance of idiomatic Haskell over the last twenty years, I'd be interested to see a modern comparison.

Regards,
Malcolm

On Sat, Aug 3, 2013 at 3:36 AM, Malcolm Wallace wrote:
> On 3 Aug 2013, at 02:20, Jason Dagit wrote:
>>> Hi! Is there any specific reason why GHC is written in a parser GENERATOR (Happy) and not in a MONADIC PARSER COMBINATOR library (like Parsec)?
>>> Is Happy faster / handles errors better / has some great features, or anything else?
>> One reason is that it predates monadic parser libraries.
> I'm not entirely sure this is true. I reckon the development of applicative parser combinators (used in the implementation of the nhc12 compiler, way back in 1995 or so) is roughly contemporaneous with the development of Happy, and its use inside ghc. (I found a release note from Sept 1997 that said ghc had just converted its interface-file parser to use Happy.) Certainly table-driven parsers in non-functional languages go back a lot further, and functional combinator-based parsing was then the relative newcomer.
Interesting. I know the happy source has copyright dates starting in 1991, so I figured it was developed to help write ghc. Perhaps they were using it for source parsing before they adopted it for interface-file parsing in 1997?
> As to why ghc switched to Happy, the literature of the time suggests that generated table-driven parsers were faster than combinator-based parsers. I'm not sure I have ever seen any performance figures to back that up, however. And with the general improvement in performance of idiomatic Haskell over the last twenty years, I'd be interested to see a modern comparison.
Another con of using Parsec that I forgot to mention in my previous email: with Parsec you need to be explicit about backtracking (the use of `try`). Reasoning about the correct places to put `try` is not always easy, and Parsec doesn't help you with the task. In my experience, this is the main bug that people run into when using Parsec.

Jason
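[The `try` pitfall described above can be sketched with a toy model of Parsec's consumption-sensitive choice. The combinators below are illustrative stand-ins, not the real Parsec API: `orElse` only runs its second branch when the first failed *without* consuming input, which is the semantics Parsec gives `<|>`.]

```haskell
-- Toy model of Parsec-style consumption-sensitive choice.
-- On failure, record whether any input was consumed (Left True/False);
-- on success, return the value and the remaining input.
newtype P a = P { runP :: String -> Either Bool (a, String) }

-- Match a literal string, consuming characters as it matches them.
str :: String -> P String
str s = P $ \inp ->
  let common = length (takeWhile id (zipWith (==) s inp))
  in if common == length s
       then Right (s, drop common inp)
       else Left (common > 0)   -- did we consume input before failing?

-- Parsec-style choice: the second branch runs only if the
-- first failed WITHOUT consuming any input.
orElse :: P a -> P a -> P a
orElse p q = P $ \inp -> case runP p inp of
  Left False -> runP q inp
  other      -> other

-- try: pretend nothing was consumed on failure, re-enabling backtracking.
try' :: P a -> P a
try' p = P $ \inp -> case runP p inp of
  Left _ -> Left False
  ok     -> ok

-- runP (str "for" `orElse` str "foo") "foo"
--   fails: "for" consumed "fo" before failing, so "foo" is never tried.
-- runP (try' (str "for") `orElse` str "foo") "foo"
--   succeeds with Right ("foo","").
```

Forgetting `try` on the first branch makes the second alternative unreachable whenever the branches share a prefix, which is exactly the bug described above.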

On 3 Aug 2013, at 21:03, Jason Dagit wrote:
> Another con of using parsec that I forgot to mention in my previous email is that with Parsec you need to be explicit about backtracking (use of try). Reasoning about the correct places to put try is not always easy and parsec doesn't help you with the task. In my experience, this is the main bug that people run into when using parsec.
Although the original question did not mention parsec explicitly, I find it disappointing that many people immediately think of it as the epitome of monadic combinator parsing. The power of good marketing, eh? There are so many other good parsing libraries out there. Parsec happened to cure some known space-leaks in rival libraries around the time of its release (2000 or so), but the main reason it is popular is simply that it was distributed alongside ghc for a long time.

Curiously enough, the complaint you make about parsec is exactly the same as the observation that drove the development of my own set of combinators - polyparse. But the concept of commitment to a partial parse (to prevent backtracking) was already present somewhat in Röjemo's applicative parsers way back in 1994. He had both `ap` and `apCut` (the naming of "cut" borrowed from Prolog, I suppose). Space performance and the elimination of space leaks was totally his focus: he mentions being able to recompile the compiler itself in just 3Mb of RAM.

Regards,
Malcolm
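[The commitment/cut idea mentioned above (`apCut`, polyparse's commit) can be illustrated with a small list-of-successes parser extended with a hard-failure channel. All names here (`token`, `commit`, `<+>`, `andThen`) are hypothetical sketches in the spirit of those libraries, not the actual polyparse API: once a branch commits, a later soft failure becomes a reported error instead of silent backtracking.]

```haskell
import Data.List (stripPrefix)

-- List-of-successes parser with a hard-failure channel:
-- Right alternatives means soft (backtrackable) outcomes,
-- Left err means a committed branch failed.
newtype Parser a = Parser { runParser :: String -> Either String [(a, String)] }

-- Match a literal token; failing softly just yields no alternatives.
token :: String -> Parser String
token t = Parser $ \s -> case stripPrefix t s of
  Just rest -> Right [(t, rest)]
  Nothing   -> Right []

infixl 3 <+>
-- Choice: backtrack into q only on a SOFT failure of p.
(<+>) :: Parser a -> Parser a -> Parser a
p <+> q = Parser $ \s -> case runParser p s of
  Right [] -> runParser q s
  r        -> r

-- commit: a soft failure past this point becomes a hard error.
commit :: Parser a -> Parser a
commit p = Parser $ \s -> case runParser p s of
  Right [] -> Left "committed branch failed"
  r        -> r

-- Sequence two parsers, threading hard failures through.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q = Parser $ \s -> case runParser p s of
  Left e     -> Left e
  Right alts -> go alts
  where
    go [] = Right []
    go ((a, rest) : more) = case runParser q rest of
      Left e   -> Left e
      Right bs -> case go more of
        Left e   -> Left e
        Right rs -> Right ([((a, b), rest') | (b, rest') <- bs] ++ rs)

-- Once "if " has been seen, commit: a malformed body is reported
-- rather than silently falling through to the next alternative.
stmt, stmtLax :: Parser (String, String)
stmt    = (token "if " `andThen` commit (token "x")) <+> (token "no" `andThen` token "op")
stmtLax = (token "if " `andThen` token "x")          <+> (token "no" `andThen` token "op")
```

On input `"if z"`, `stmt` reports a hard error at the committed branch, while `stmtLax` backtracks and ends with an unhelpful empty result — the error-reporting benefit of cut that the message above attributes to `apCut`.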

* Malcolm Wallace
> On 3 Aug 2013, at 21:03, Jason Dagit wrote:
>> Another con of using parsec that I forgot to mention in my previous email is that with Parsec you need to be explicit about backtracking (use of try). Reasoning about the correct places to put try is not always easy and parsec doesn't help you with the task. In my experience, this is the main bug that people run into when using parsec.
> Although the original question did not mention parsec explicitly, I find it disappointing that many people immediately think of it as the epitome of monadic combinator parsing. The power of good marketing, eh? There are so many other good parsing libraries out there. Parsec happened to cure some known space-leaks in rival libraries about the time of its release (2000 or so), but the main reason it is popular is simply because it was distributed alongside ghc for a long time.
I would also add to this a catchy and well-marketed name, which is often used as a generic name for a parser combinator library.

Roman
participants (5)
- blackbox.dev.ml
- Brandon Allbery
- Jason Dagit
- Malcolm Wallace
- Roman Cheplyaka