
Hi, I've been looking at the Haskell parser in ghc/compiler/GHC/Parser.y. It relies on post-processing pretty heavily, both to determine the type of parsed expressions (i.e. is "(x,y)" a pattern or expression?) and to reject invalid syntax (i.e. field declarations are parsed as a type, but this is rejected during postprocessing I think, except in constructor declarations). This makes the grammar rather hard to read. To quote [1]:
Insteadof describingthe languageto be parsed,thegrammardescribes theprocessused to parseit; it'smore like a hand-crafted parsing program,butcrammed into Backus-Naur Form. Does anybody know if there is another version of the parser generator uses grammar rules that are closer to the grammar rules in the 2010 report? Maybe a GLR grammar?
Also, I see that Happy is able to generate GLR parsers. I'm curious if GLR parsers aren't being used just because they are slow, or if there is some other reason they are hard to use. I am not an expert on parsing or grammars, so any insight would be appreciated. Thanks! -BenRI [1] https://escholarship.org/uc/item/8559j464

Hi! I think the updated tree sitter grammar might be relevant to you - https://github.com/tree-sitter/tree-sitter-haskell Cheers, ====== Georgi

Thanks, that is interesting. I see this is a different type of grammar than in Parser.y, in that (for example) it does not have any actions on the rules. I'm still curious about why the GHC parser does not use a grammar that is closer to the language grammar. Is this mostly for historical reasons? Are GLR parsers too slow? I don't know what fraction of the compilation time is spent in parsing, but would suspect it is not that much. -BenRI On 4/15/21 11:55 AM, Georgi Lyubenov wrote:
Hi!
I think the updated tree sitter grammar might be relevant to you -� https://github.com/tree-sitter/tree-sitter-haskell https://github.com/tree-sitter/tree-sitter-haskell
Cheers,
====== Georgi

GLR support in happy is much younger than ghc is, it'd probably be a massive amount of work to change now. On Tue, Apr 20, 2021 at 6:57 AM Benjamin Redelings < benjamin.redelings@gmail.com> wrote:
Thanks, that is interesting. I see this is a different type of grammar than in Parser.y, in that (for example) it does not have any actions on the rules.
I'm still curious about why the GHC parser does not use a grammar that is closer to the language grammar. Is this mostly for historical reasons? Are GLR parsers too slow? I don't know what fraction of the compilation time is spent in parsing, but would suspect it is not that much.
-BenRI
On 4/15/21 11:55 AM, Georgi Lyubenov wrote:
Hi!
I think the updated tree sitter grammar might be relevant to you -Â https://github.com/tree-sitter/tree-sitter-haskell https://github.com/tree-sitter/tree-sitter-haskell
Cheers,
====== Georgi
Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
-- brandon s allbery kf8nh allbery.b@gmail.com

Am Di., 20. Apr. 2021 um 12:57 Uhr schrieb Benjamin Redelings < benjamin.redelings@gmail.com>:
[...] I'm still curious about why the GHC parser does not use a grammar that is closer to the language grammar. Is this mostly for historical reasons? Are GLR parsers too slow? I don't know what fraction of the compilation time is spent in parsing, but would suspect it is not that much.
I think there are various aspects: * Tool support (i.e. historic reasons): Happy is a LALR parser generator. * Quality of error reporting: I don't have a clue how good GLR parsers are in this area. Often more powerful parsing methods have horrible reporting, at least that's my impression. * Does the GLR parser generator detect ambiguities and report them to the grammar writer? To me, ambiguities (like the shift/reduce & reduce/reduce conflicts in LALR parsing) are a big red flag and a sign of a questionable language/grammar: Even if the generator/parser is able to figure things out correctly, humans have a much harder time. Cheers, S.

Am 20.04.21 um 13:12 schrieb Sven Panne:
* Does the GLR parser generator detect ambiguities and report them to the grammar writer?
That would be a solution to the decision problem. It could try heuristics though, but I have no idea whether this particular parser does.
To me, ambiguities (like the shift/reduce &
reduce/reduce conflicts in LALR parsing) are a big red flag and a sign of a questionable language/grammar: Even if the generator/parser is able to figure things out correctly, humans have a much harder time.
Some conflicts indicate language design problems, others do not. If you are after an algorithm that has less unnecessary conflicts, try ILALR (Sönke Kannapinn's PdD thesis - well-written and easy to understand if you can deal with German; see http://webdoc.sub.gwdg.de/ebook/diss/2003/tu-berlin/diss/2001/kannapinn_soen... ; my own analysis, years ago, concluded that ILALR should have its conflicts exactly where a human would have difficulties parsing, but I never progressed to a proof of concept). Note that ILALR does does not help with conflict resolution; a user would likely have less unexpected conflicts, but the remaining ones wouldn't be easier to solve than with LALR. But I may be wrong. As I said, I never used anything like that in anger. Regards, Jo
participants (5)
-
Benjamin Redelings
-
Brandon Allbery
-
Georgi Lyubenov
-
Joachim Durchholz
-
Sven Panne