
The grammar is ambiguous regarding the extent of lambda abstractions, let expressions, and conditionals. The ambiguity is resolved by the
Hello Café, With the BlockArguments proposal[1], the formal grammar of Haskell has become much nicer, with all of the "boring" expression constructs (i.e., other than function or operator application) under one non-terminal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0090-bl... Its one flaw, inherited from Haskell2010, is ambiguity. I wonder whether that ambiguity is necessary. Currently, a meta-rule fixes that flaw: meta-rule that each of these constructs extends as far to the right as possible. That is quite informal, so interpreting it takes a bit of thinking. 1. Does the rule make sense? That seems quite likely, but at the same time it seems to depend on some good behavior of the grammar. Like, aren't there grammars where such an "extend far to the right" meta-rule would itself be ambiguous? What's the key criterion for that meta-rule to make sense? 2. How to implement this rule? It might be obvious, with shift-reduce parsing ("shift always"). It also seems feasible with Earley parsing (the only other technique I am familiar with). But, either way, you have to dig into the internals of the chosen parsing method to resolve ambiguities. Instead, what if the grammar were unambiguous to start with? I share an attempt at the bottom of this email. Which brings me to my main two questions: 3. Is this grammar unambiguous? 4. Does it match the meta-rule? I also can't help but think about how it might look like in a hypothetical language report. 5. What are the pros and cons of using a ambiguous or unambiguous grammar for a programming language standard? A plausible starting point is thus: - An ambiguous grammar may be more readable (since there are strictly more choices available), especially if you care more about the AST than the concrete syntax. - An unambiguous grammar is easier to implement: just throw it into an Earley parser. This unambiguous grammar can somewhat easily be "flattened" into the ambiguous one from the report. That challenges the assumption that ambiguous grammars are generally more readable. In fact, the current ambiguous grammar already exposes other concrete syntax concerns (via the distinction between exp, infixexp, fexp, and aexp). In contrast, turning an ambiguous grammar into an unambiguous one takes a bit of effort (when it's possible at all). So if there are ambiguities worth keeping, is there a clear threshold? One other possible issue is that non-ambiguity is a fragile property, so extending the grammar requires extra care. Furthermore, even if the updated grammar is unambiguous, it might not be obvious that it still encodes the intent behind the meta-rule. That meta-rule may still be worth clarifying as a requirement for language extensions at least. Haskell syntax can be pretty, but the process behind it is a wee bit complicated. Regards, Li-yao --- # Unambiguous grammar for expressions (some rules omitted for brevity) # Compared to Haskell2010+BlockArguments, we move the rules for if, lambdas and lets into their own non-terminal rexp, and add four extra rules in exp, ensuring that rexp occurs only to the right end of function/operator applications. # Possibly simplifiable more # * and + represent repetition. # Expressions. exp -> iexp "::" type | iexp # EXTRA RULES with rexp | rexp | iexp qop rexp | iexp qop frexp | frexp # EXTRA SYMBOL rexp # Expressions with right-recursive syntax. rexp -> "if" exp "then" exp "else" exp | "\\" var "->" exp | "let" decls "in" exp # EXTRA (AUXILIARY) SYMBOL frexp # Function application with a trailing block argument. frexp -> aexp aexp* rexp # Infix expression (was infixexp). OLD iexp -> fexp (qop fexp)* # Optional function applications. OLD fexp -> aexp | aexp aexp+ # Atomic expressions. OLD with removed if/lambda/let aexp -> qvar | "[" "]" | "(" ")" | qcon | "(" exp ")" | "do" "{" stmts "}" | "case" exp "of" "{" alts "}"

Am Mi., 16. Juni 2021 um 08:47 Uhr schrieb Li-yao Xia
[...] - An ambiguous grammar may be more readable (since there are strictly more choices available), especially if you care more about the AST than the concrete syntax. [...]
IMHO it is exactly the other way around: I consider ambiguous grammars a serious usability bug. Even if you hack around the ambiguities via meta rules and/or technical tricks like "prefer shift" in (LA)LR parsers, you as a poor human have to do the disambiguation for yourself. This makes it *harder* to read code following the grammar, not easier. The BlockArguments proposal introduced lots of additional ambiguities, and that's the reason why I'm still opposed to it. I'm too lazy to dig out the old discussions, but I gave a few examples where it was *highly* unclear/confusing what the right parse should be with BlockArguments. This has been totally ignored, and that ship has sailed... In general, I consider it a very bad idea to give more and more strings a meaning as a program: Some redundancy, like keywords and/or punctuation, is highly useful as a human reader. Remember the old saying "A good language should not make it easy to write correct programs, it should make it hard to write incorrect ones", and BlockArguments was IMHO a step into the wrong direction.

On Wed, Jun 16, 2021 at 09:48:06AM +0200, Sven Panne wrote:
Am Mi., 16. Juni 2021 um 08:47 Uhr schrieb Li-yao Xia
: [...] - An ambiguous grammar may be more readable (since there are strictly more choices available), especially if you care more about the AST than the concrete syntax. [...]
IMHO it is exactly the other way around: I consider ambiguous grammars a serious usability bug. [...] The BlockArguments proposal introduced lots of additional ambiguities, and that's the reason why I'm still opposed to it.
[...]
In general, I consider it a very bad idea to give more and more strings a meaning as a program: Some redundancy, like keywords and/or punctuation, is highly useful as a human reader. Remember the old saying "A good language should not make it easy to write correct programs, it should make it hard to write incorrect ones", and BlockArguments was IMHO a step into the wrong direction.
Entirely agreed. The additional difficulty in parsing means that BlockArguments provides *negative* value to me, even if I don't enable it: http://h2.jaguarpaw.co.uk/posts/unhelpful-ghc-error-messages/ Tom

On 6/16/2021 5:00 AM, Tom Ellis wrote:
[...]
Entirely agreed. The additional difficulty in parsing means that BlockArguments provides *negative* value to me, even if I don't enable it:
http://h2.jaguarpaw.co.uk/posts/unhelpful-ghc-error-messages/
Ouch, that error message is terrible. But isn't that an argument against BlockArguments more than against ambiguity? By that I mean that the current ambiguities for function applications were already present in a similar form for infix expressions in Haskell2010 (replacing a space with an operator). In a parallel world, the Haskell2010 grammar could be unambiguous, BlockArguments would keep the extended grammar unambiguous. But since the intended meaning is the same, GHC would be the same as in this world, and you would still be having the same issues with BlockArguments.
On Wed, Jun 16, 2021 at 09:48:06AM +0200, Sven Panne wrote:
Am Mi., 16. Juni 2021 um 08:47 Uhr schrieb Li-yao Xia
: [...] - An ambiguous grammar may be more readable (since there are strictly more choices available), especially if you care more about the AST than the concrete syntax. [...]
IMHO it is exactly the other way around: I consider ambiguous grammars a serious usability bug. [...] The BlockArguments proposal introduced lots of additional ambiguities, and that's the reason why I'm still opposed to it.
[...]
In general, I consider it a very bad idea to give more and more strings a meaning as a program: Some redundancy, like keywords and/or punctuation, is highly useful as a human reader. Remember the old saying "A good language should not make it easy to write correct programs, it should make it hard to write incorrect ones", and BlockArguments was IMHO a step into the wrong direction.
Tom _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Entirely agreed. The additional difficulty in parsing means that BlockArguments provides *negative* value to me, even if I don't enable it:
http://h2.jaguarpaw.co.uk/posts/unhelpful-ghc-error-messages/ I agree. I always use curly braces with "do" and "case", i. e.: do { ...; ...; }
== Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar

On 16 Jun 2021, at 10:38 am, Askar Safin via Haskell-Cafe
wrote: I agree. I always use curly braces with "do" and "case", i. e.: do { ...; ...; }
I, on the other hand find, e.g., BlockArguments and ApplicativeDo to be useful decluttering extensions. These are not without their "pockets of support". Perhaps some of the error messages could be less optimistic in suggesting "BlockArguments", the user who meant to use them would typically deduce the forgotten pragma even without an explicit hint when the hint degrades error quality for others. The specific error message choices are not inherent to the extension. -- Viktor.

Am Mi., 16. Juni 2021 um 17:06 Uhr schrieb Viktor Dukhovni < ietf-dane@dukhovni.org>:
I, on the other hand find, e.g., BlockArguments and ApplicativeDo to be useful decluttering extensions. [...]
Some people call this "decluttering", other people call this "obfuscating" and/or "reducing readability". ;-) Haskell already had a very lightweight syntax right from the beginning, making it even lighter transforms it slowly into https://en.wikipedia.org/wiki/Whitespace_(programming_language). ..

3. Is this grammar unambiguous? It is unambiguous.
I converted in into happy grammar ( https://www.haskell.org/happy/ ). Here is result: https://paste.debian.net/1201427/ . Then I run "happy" binary on it and got no output. This means that "happy" was able to check that this CFG is element of particular class of grammars (as well as I understand this is LALR(1) class), and thus is unambiguous. But this result is useless, because I simply assumed that symbols "type", "qvar" and others are unique terminals (I listed them as tokens in happy grammar). This is not true. For example, string "a" can be "type" and in the same time "qvar". So I suggest writing proper happy grammar with unique terminals. (Also, I converted your grammar to happy's manually, it is possible I did some mistake.) Some additional notes (with a bit of advertisement). Checking whether given CFG is ambiguous is in general impossible to do on Turing machine, this is theorem. So none of existing tools will give you answer for any grammar. Happy checks whether a grammar is element of particular class (as well as I understand, LALR(1)). If it is, happy prints nothing, this means, this is LALR(1) grammar, and thus unambiguous. If not, happy reports conflicts. But this doesn't mean the grammar is ambiguous. It can be ambiguous or unambiguous. My package http://hackage.haskell.org/package/check-cfg-ambiguity can check grammar, too. It reports given CFG either as certainly ambiguous or as possibly unambiguous. Still I was unable to find grammar, arising from practice, which can fool my package. Setting count of steps as 20 will give you very low probability of mistake. (Again: "probability of mistake" means probability of answer "it is unambiguous" being false, answer "it is ambiguous" is always certainly true.) Also you can try this package: https://www.brics.dk/grammar/ for checking ambiguity. I spent lot of time learning various methods of checking ambiguity, feel free to ask me questions. Add me to CC in case i will unsubscribe from this list. I like your desire to give proper grammar for Haskell. What you want? Fixing Haskell Report? Well, this would be very good, unfortunately Haskell Report process is staled ( https://reasonablypolymorphic.com/blog/haskell202x/ ). Of course, I hope it will resurrect. Also, I agree that BlockArguments was mistake. Also, grammar given in Haskell Report 2010 and grammar from ghc source tree are 2 completely different things, see here: https://mail.haskell.org/pipermail/haskell-cafe/2021-April/133847.html == Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar

Thanks for your great answer, Askar. Using happy to optimistically test for ambiguity is a great idea, and I'll be sure to check your other references.
I like your desire to give proper grammar for Haskell. What you want? Fixing Haskell Report? Well, this would be very good, unfortunately Haskell Report process is staled ( https://reasonablypolymorphic.com/blog/haskell202x/ ). Of course, I hope it will resurrect.
Right now I'm trying to understand Haskell's concrete syntax enough to 1. Teach it (with "better parser error messages" as a potential subgoal) 2. Implement it (as a concrete lower bound for the abstract goal of "completely understand it") And right now it seems reading the Haskell2010 report and also GHC/Parser.y is not enough. One also has to think hard on top of that. This can usually be fixed by producing a better reference document. Li-yao On 6/16/2021 10:22 AM, Askar Safin wrote:
3. Is this grammar unambiguous? It is unambiguous.
I converted in into happy grammar ( https://www.haskell.org/happy/ ). Here is result: https://paste.debian.net/1201427/ . Then I run "happy" binary on it and got no output. This means that "happy" was able to check that this CFG is element of particular class of grammars (as well as I understand this is LALR(1) class), and thus is unambiguous.
But this result is useless, because I simply assumed that symbols "type", "qvar" and others are unique terminals (I listed them as tokens in happy grammar). This is not true. For example, string "a" can be "type" and in the same time "qvar". So I suggest writing proper happy grammar with unique terminals.
(Also, I converted your grammar to happy's manually, it is possible I did some mistake.)
Some additional notes (with a bit of advertisement).
Checking whether given CFG is ambiguous is in general impossible to do on Turing machine, this is theorem. So none of existing tools will give you answer for any grammar.
Happy checks whether a grammar is element of particular class (as well as I understand, LALR(1)). If it is, happy prints nothing, this means, this is LALR(1) grammar, and thus unambiguous. If not, happy reports conflicts. But this doesn't mean the grammar is ambiguous. It can be ambiguous or unambiguous.
My package http://hackage.haskell.org/package/check-cfg-ambiguity can check grammar, too. It reports given CFG either as certainly ambiguous or as possibly unambiguous. Still I was unable to find grammar, arising from practice, which can fool my package. Setting count of steps as 20 will give you very low probability of mistake. (Again: "probability of mistake" means probability of answer "it is unambiguous" being false, answer "it is ambiguous" is always certainly true.)
Also you can try this package: https://www.brics.dk/grammar/ for checking ambiguity.
I spent lot of time learning various methods of checking ambiguity, feel free to ask me questions. Add me to CC in case i will unsubscribe from this list.
I like your desire to give proper grammar for Haskell. What you want? Fixing Haskell Report? Well, this would be very good, unfortunately Haskell Report process is staled ( https://reasonablypolymorphic.com/blog/haskell202x/ ). Of course, I hope it will resurrect.
Also, I agree that BlockArguments was mistake.
Also, grammar given in Haskell Report 2010 and grammar from ghc source tree are 2 completely different things, see here: https://mail.haskell.org/pipermail/haskell-cafe/2021-April/133847.html
== Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar
participants (5)
-
Askar Safin
-
Li-yao Xia
-
Sven Panne
-
Tom Ellis
-
Viktor Dukhovni