
Hello devs, I am trying to modify GHC's parser to allow the following syntax in types: a -> @m b but my naive attempt was unsuccessful: type :: { LHsType GhcPs } : btype { $1 } | btype '->' PREFIX_AT btype ctype ... For example when I try to parse the following code (and turn on the lexer debug log): test :: a -> @m b test = undefined I get the following token: ITvarid "test" token: ITdcolon NormalSyntax token: ITvarid "a" token: ITrarrow NormalSyntax token: ITtypeApp token: ITvarid "m" token: ITvarid "b" token: ITsemi Parse.hs:2:1: error: parse error (possibly incorrect indentation or mismatched brackets) | 2 | test = undefined I don't have much experience with hacking on the parser so I'm likely missing something obvious. Could someone please point at what I might be doing wrong? Thanks in advance. Cheers, Csongor

Hi Csongor, I believe the reason for this failure is that a -> @m b gets parsed as a -> @(m b). Why is that? Because a ‘btype’ includes type-level application. If you replace the ‘btype’ after PREFIX_AT with an ‘atype’, this particular issue should go away. At least that’s my hypothesis, I haven’t tested it. - Vlad
On 29 Aug 2020, at 01:32, Csongor Kiss
wrote: Hello devs,
I am trying to modify GHC's parser to allow the following syntax in types:
a -> @m b
but my naive attempt was unsuccessful:
type :: { LHsType GhcPs } : btype { $1 } | btype '->' PREFIX_AT btype ctype ...
For example when I try to parse the following code (and turn on the lexer debug log):
test :: a -> @m b test = undefined
I get the following
token: ITvarid "test" token: ITdcolon NormalSyntax token: ITvarid "a" token: ITrarrow NormalSyntax token: ITtypeApp token: ITvarid "m" token: ITvarid "b" token: ITsemi
Parse.hs:2:1: error: parse error (possibly incorrect indentation or mismatched brackets) | 2 | test = undefined
I don't have much experience with hacking on the parser so I'm likely missing something obvious. Could someone please point at what I might be doing wrong?
Thanks in advance.
Cheers, Csongor _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

On Fri, Aug 28, 2020 at 7:38 PM Vladislav Zavialov
Hi Csongor,
I believe the reason for this failure is that a -> @m b gets parsed as a -> @(m b). Why is that? Because a ‘btype’ includes type-level application.
If you replace the ‘btype’ after PREFIX_AT with an ‘atype’, this particular issue should go away. At least that’s my hypothesis, I haven’t tested it.
I confirm that this is correct and with that change the example string reduces as hoped. - Vlad
-- Shayne Fletcher

On Fri, Aug 28, 2020 at 7:48 PM Shayne Fletcher < shayne.fletcher.50@gmail.com> wrote:
On Fri, Aug 28, 2020 at 7:38 PM Vladislav Zavialov
wrote: Hi Csongor,
I believe the reason for this failure is that a -> @m b gets parsed as a -> @(m b). Why is that? Because a ‘btype’ includes type-level application.
If you replace the ‘btype’ after PREFIX_AT with an ‘atype’, this particular issue should go away. At least that’s my hypothesis, I haven’t tested it.
I confirm that this is correct and with that change the example string reduces as hoped.
- Vlad
Also, with that correction there are no new shift/reduce conflicts. The original rule gave rise to 3. -- Shayne Fletcher

Thanks a lot Vlad and Shayne, that indeed did the trick! Out of curiosity, how could I have figured out that this was the culprit? The parse error I got was a bit puzzling, and I couldn't find any flags that would give more information (I think I was looking for the parser equivalent of -ddump-tc-trace). Best, Csongor
On 29 Aug 2020, at 00:51, Shayne Fletcher
wrote: On Fri, Aug 28, 2020 at 7:48 PM Shayne Fletcher
mailto:shayne.fletcher.50@gmail.com> wrote: On Fri, Aug 28, 2020 at 7:38 PM Vladislav Zavialov
mailto:vladislav@serokell.io> wrote: Hi Csongor, I believe the reason for this failure is that a -> @m b gets parsed as a -> @(m b). Why is that? Because a ‘btype’ includes type-level application.
If you replace the ‘btype’ after PREFIX_AT with an ‘atype’, this particular issue should go away. At least that’s my hypothesis, I haven’t tested it.
I confirm that this is correct and with that change the example string reduces as hoped.
- Vlad
Also, with that correction there are no new shift/reduce conflicts. The original rule gave rise to 3.
-- Shayne Fletcher

The lexer produces only as many tokens as the parser requires. In the ‘lexerDbg’ dump that you included in the message, there were these lines: token: ITvarid "m" token: ITvarid "b" token: ITsemi So I knew that the parser consumed the entire string, up to the virtual semicolon. I also recognized the parse error as the one produced by ‘happy’ rather than by a later validation pass. So even though the parser consumed the entire string, it failed. Therefore, it didn’t expect this string to end so abruptly, it expected it to continue. But what did it expect to find? To figure it out, we need to know which grammar production is involved in the failure. The only grammar production that could’ve consumed the ‘->’ PREFIX_AT sequence successfully and proceed to process the rest of the string is this one: btype '->' PREFIX_AT btype ctype By inspecting the definitions of ‘btype’ and ‘ctype’, one can see that neither of those accept the empty string, and both of those accept type-level function application. Thus it’s possible that ‘btype’ consumed “m b” as an application, and ‘ctype’ failed because it didn’t accept the remaining empty string: btype = “m b” ctype = parse error (nothing to consume) But what you wanted instead was: btype = “m” ctype = “b” The solution is to use ‘atype’ instead of ‘btype’, as ‘atype’ does not accept type-level application. By the way, there’s also the input string on which the original grammar would’ve succeeded (or at least I think so): test :: a -> @m forall b. b test = undefined That’s because ‘btype’ wouldn’t have consumed the ‘forall’, it would’ve stopped at this point. And then ‘ctype’ could’ve consumed “forall b. b”. I don’t think there’s a parser equivalent of -ddump-tc-trace. You’ll need to figure this stuff out by reading the grammar and keeping in mind that ‘happy’ generates a shift-reduce parser that does not backtrack. The ‘lexerDbg’ output is useful to see how far the parser got. And there’s also this command in case you want to go low level and inspect the state machine generated by ‘happy’: happy -agc --strict compiler/GHC/Parser.y -idetailed-info Hope this helps, - Vlad
On 29 Aug 2020, at 10:16, Csongor Kiss
wrote: Thanks a lot Vlad and Shayne, that indeed did the trick!
Out of curiosity, how could I have figured out that this was the culprit? The parse error I got was a bit puzzling, and I couldn't find any flags that would give more information (I think I was looking for the parser equivalent of -ddump-tc-trace).
Best, Csongor

Thanks once again for your detailed analysis, it's really helpful. Especially the forall example - I did try constructing some string that would parse, but I was shooting in the dark. Best, Csongor
On 29 Aug 2020, at 09:05, Vladislav Zavialov
wrote: The lexer produces only as many tokens as the parser requires. In the ‘lexerDbg’ dump that you included in the message, there were these lines:
token: ITvarid "m" token: ITvarid "b" token: ITsemi
So I knew that the parser consumed the entire string, up to the virtual semicolon. I also recognized the parse error as the one produced by ‘happy’ rather than by a later validation pass. So even though the parser consumed the entire string, it failed. Therefore, it didn’t expect this string to end so abruptly, it expected it to continue.
But what did it expect to find? To figure it out, we need to know which grammar production is involved in the failure. The only grammar production that could’ve consumed the ‘->’ PREFIX_AT sequence successfully and proceed to process the rest of the string is this one:
btype '->' PREFIX_AT btype ctype
By inspecting the definitions of ‘btype’ and ‘ctype’, one can see that neither of those accept the empty string, and both of those accept type-level function application. Thus it’s possible that ‘btype’ consumed “m b” as an application, and ‘ctype’ failed because it didn’t accept the remaining empty string:
btype = “m b” ctype = parse error (nothing to consume)
But what you wanted instead was:
btype = “m” ctype = “b”
The solution is to use ‘atype’ instead of ‘btype’, as ‘atype’ does not accept type-level application.
By the way, there’s also the input string on which the original grammar would’ve succeeded (or at least I think so):
test :: a -> @m forall b. b test = undefined
That’s because ‘btype’ wouldn’t have consumed the ‘forall’, it would’ve stopped at this point. And then ‘ctype’ could’ve consumed “forall b. b”.
I don’t think there’s a parser equivalent of -ddump-tc-trace. You’ll need to figure this stuff out by reading the grammar and keeping in mind that ‘happy’ generates a shift-reduce parser that does not backtrack. The ‘lexerDbg’ output is useful to see how far the parser got. And there’s also this command in case you want to go low level and inspect the state machine generated by ‘happy’:
happy -agc --strict compiler/GHC/Parser.y -idetailed-info
Hope this helps, - Vlad
On 29 Aug 2020, at 10:16, Csongor Kiss
wrote: Thanks a lot Vlad and Shayne, that indeed did the trick!
Out of curiosity, how could I have figured out that this was the culprit? The parse error I got was a bit puzzling, and I couldn't find any flags that would give more information (I think I was looking for the parser equivalent of -ddump-tc-trace).
Best, Csongor

Another way to figure it out is the shift/reduce conflict on @, which tells
you it had two ways to recognize it. "Reduce" here means returning to your
parser rule, so "shift" means btype wanted to recognize the @. Inspecting
btype would then have shown that it was looking for a type application.
On Sat, Aug 29, 2020, 03:17 Csongor Kiss
Thanks a lot Vlad and Shayne, that indeed did the trick!
Out of curiosity, how could I have figured out that this was the culprit? The parse error I got was a bit puzzling, and I couldn't find any flags that would give more information (I think I was looking for the parser equivalent of -ddump-tc-trace).
Best, Csongor
On 29 Aug 2020, at 00:51, Shayne Fletcher
wrote: On Fri, Aug 28, 2020 at 7:48 PM Shayne Fletcher < shayne.fletcher.50@gmail.com> wrote:
On Fri, Aug 28, 2020 at 7:38 PM Vladislav Zavialov
wrote: Hi Csongor,
I believe the reason for this failure is that a -> @m b gets parsed as a -> @(m b). Why is that? Because a ‘btype’ includes type-level application.
If you replace the ‘btype’ after PREFIX_AT with an ‘atype’, this particular issue should go away. At least that’s my hypothesis, I haven’t tested it.
I confirm that this is correct and with that change the example string reduces as hoped.
- Vlad
Also, with that correction there are no new shift/reduce conflicts. The original rule gave rise to 3.
-- Shayne Fletcher
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Hi Brandon, I’m afraid your analysis is not entirely correct. The shift/reduce conflict is not on @ but after it. - Vlad
On 29 Aug 2020, at 14:42, Brandon Allbery
wrote: Another way to figure it out is the shift/reduce conflict on @, which tells you it had two ways to recognize it. "Reduce" here means returning to your parser rule, so "shift" means btype wanted to recognize the @. Inspecting btype would then have shown that it was looking for a type application.
participants (4)
-
Brandon Allbery
-
Csongor Kiss
-
Shayne Fletcher
-
Vladislav Zavialov