
The grammar is ambiguous regarding the extent of lambda abstractions, let expressions, and conditionals. The ambiguity is resolved by the
Hello Café, With the BlockArguments proposal[1], the formal grammar of Haskell has become much nicer, with all of the "boring" expression constructs (i.e., other than function or operator application) under one non-terminal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0090-bl... Its one flaw, inherited from Haskell2010, is ambiguity. I wonder whether that ambiguity is necessary. Currently, a meta-rule fixes that flaw: meta-rule that each of these constructs extends as far to the right as possible. That is quite informal, so interpreting it takes a bit of thinking. 1. Does the rule make sense? That seems quite likely, but at the same time it seems to depend on some good behavior of the grammar. Like, aren't there grammars where such an "extend far to the right" meta-rule would itself be ambiguous? What's the key criterion for that meta-rule to make sense? 2. How to implement this rule? It might be obvious, with shift-reduce parsing ("shift always"). It also seems feasible with Earley parsing (the only other technique I am familiar with). But, either way, you have to dig into the internals of the chosen parsing method to resolve ambiguities. Instead, what if the grammar were unambiguous to start with? I share an attempt at the bottom of this email. Which brings me to my main two questions: 3. Is this grammar unambiguous? 4. Does it match the meta-rule? I also can't help but think about how it might look like in a hypothetical language report. 5. What are the pros and cons of using a ambiguous or unambiguous grammar for a programming language standard? A plausible starting point is thus: - An ambiguous grammar may be more readable (since there are strictly more choices available), especially if you care more about the AST than the concrete syntax. - An unambiguous grammar is easier to implement: just throw it into an Earley parser. This unambiguous grammar can somewhat easily be "flattened" into the ambiguous one from the report. That challenges the assumption that ambiguous grammars are generally more readable. In fact, the current ambiguous grammar already exposes other concrete syntax concerns (via the distinction between exp, infixexp, fexp, and aexp). In contrast, turning an ambiguous grammar into an unambiguous one takes a bit of effort (when it's possible at all). So if there are ambiguities worth keeping, is there a clear threshold? One other possible issue is that non-ambiguity is a fragile property, so extending the grammar requires extra care. Furthermore, even if the updated grammar is unambiguous, it might not be obvious that it still encodes the intent behind the meta-rule. That meta-rule may still be worth clarifying as a requirement for language extensions at least. Haskell syntax can be pretty, but the process behind it is a wee bit complicated. Regards, Li-yao --- # Unambiguous grammar for expressions (some rules omitted for brevity) # Compared to Haskell2010+BlockArguments, we move the rules for if, lambdas and lets into their own non-terminal rexp, and add four extra rules in exp, ensuring that rexp occurs only to the right end of function/operator applications. # Possibly simplifiable more # * and + represent repetition. # Expressions. exp -> iexp "::" type | iexp # EXTRA RULES with rexp | rexp | iexp qop rexp | iexp qop frexp | frexp # EXTRA SYMBOL rexp # Expressions with right-recursive syntax. rexp -> "if" exp "then" exp "else" exp | "\\" var "->" exp | "let" decls "in" exp # EXTRA (AUXILIARY) SYMBOL frexp # Function application with a trailing block argument. frexp -> aexp aexp* rexp # Infix expression (was infixexp). OLD iexp -> fexp (qop fexp)* # Optional function applications. OLD fexp -> aexp | aexp aexp+ # Atomic expressions. OLD with removed if/lambda/let aexp -> qvar | "[" "]" | "(" ")" | qcon | "(" exp ")" | "do" "{" stmts "}" | "case" exp "of" "{" alts "}"