Lexing / Parsing and final token

I am (still) working on !2418 to bring the API Annotations into the GHC ParsedSource, and making good progress. I am currently making a rough port of ghc-exactprint, to ensure I can get all the tests around modifying the AST to work. One of the last pieces is being able to capture the spacing from the last token in the file to the EOF. I guess technically it is the second-to-last token, since ITeof itself comes last. Empirically (calling getTokenStream), it seems this is always ITsemi. I am not sure how this comes about, as the `module` parsing rule in Parser.y ends with body or body2, and those both finish with an actual or virtual '}'. Can I rely on the token before ITeof always being ITsemi? Alan

That's bizarre. Does it still happen with explicit braces?
Just to test, I tried
module Bug where { x = 5; y = 6; };
and GHC rejected it because of the trailing ;.
Richard
On Jan 19, 2021, at 4:35 PM, Alan & Kim Zimmerman wrote: [...]
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Removing the final ';' from your explicit-brace example gives a last token of ITccurly.
Changing it to the layout-based form
module Bug where
x = 5
y = 6
gives a last token of ITsemi.
Alan
On Tue, 19 Jan 2021 at 21:50, Richard Eisenberg wrote: [...]

So, I think there's your answer: the last token might be ITccurly, not ITsemi. It seems that the "insert invisible curlies and semis" rule is taken more literally for semis than for curlies. Richard
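One way to picture the difference is a toy model. This is not GHC's lexer, and the names Tok, VSemi, toyLex and lastBeforeEof are invented for illustration. The assumption it encodes is that a virtual semicolon is emitted whenever a line start falls on the layout column (taken to be column 1 here), including the empty position after the final newline, while nothing in the lexer emits a virtual close brace at end of input.

```haskell
import Data.Char (isSpace)

-- Toy tokens only; these are NOT GHC's Token constructors.
data Tok = Word String | VSemi | Eof
  deriving (Eq, Show)

-- | Toy lexer for a module body. The first argument says whether an
-- implicit layout context at column 1 is active (assumption: the body
-- starts at column 1). With explicit braces there is no such context.
toyLex :: Bool -> String -> [Tok]
toyLex layoutOn = go False 1
  where
    go :: Bool -> Int -> String -> [Tok]
    go bol col s = case s of
      '\n' : rest -> go True 1 rest                 -- next position starts a line
      c : rest | isSpace c -> go bol (col + 1) rest -- skip other whitespace
      -- Line start on the layout column: emit a virtual semicolon.
      -- This also fires when s is empty, i.e. right after the final newline.
      _ | layoutOn && bol && col == 1 -> VSemi : go False col s
      [] -> [Eof]
      _ -> let (w, rest) = break isSpace s
           in Word w : go False (col + length w) rest

-- | The token immediately before Eof, if there is one.
lastBeforeEof :: [Tok] -> Maybe Tok
lastBeforeEof ts = case reverse (takeWhile (/= Eof) ts) of
  t : _ -> Just t
  []    -> Nothing

-- lastBeforeEof (toyLex True  "x = 5\ny = 6\n")        == Just VSemi
-- lastBeforeEof (toyLex False "{ x = 5 ; y = 6 }\n")   == Just (Word "}")
```

In this model the layout-based body ends in a virtual semicolon and the explicit-brace body ends in the real '}', which matches the two cases reported above. Comments are not modelled.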
On Jan 19, 2021, at 4:58 PM, Alan & Kim Zimmerman wrote: [...]

And if there is a comment after the '}' and then more blank lines, the last token is a comment.
If there are no curlies, it is an ITsemi at the last location, after the comment.
So my hacky scheme of using ITsemi as the means to track the last gap is not viable.
And I don't want to put extra housekeeping on every token to track two tokens back, not just one. Back to the drawing board.
Thanks
Alan
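For comparison, here is a minimal sketch of measuring the gap without assuming what kind of token precedes EOF. It is a post-hoc pass over an already-lexed token list, not a change inside the lexer, and it is not the change referred to elsewhere in this thread. The types Kind, Pos and Tok and the functions tokenBeforeEof and gapToEof are hypothetical stand-ins, not GHC's Located Token.

```haskell
import Data.List (find)

-- Hypothetical stand-ins for token kinds and source positions.
data Kind = Semi | CloseCurly | Comment | Other | Eof
  deriving (Eq, Show)

data Pos = Pos { posLine :: Int, posCol :: Int }
  deriving (Eq, Show)

data Tok = Tok { tokKind :: Kind, tokStart :: Pos, tokEnd :: Pos }
  deriving (Eq, Show)

-- | The token immediately before the first Eof, whatever its kind
-- (semicolon, close brace, comment, or anything else).
tokenBeforeEof :: [Tok] -> Maybe Tok
tokenBeforeEof toks = case takeWhile ((/= Eof) . tokKind) toks of
  [] -> Nothing
  ts -> Just (last ts)

-- | Gap from the end of that token to the start of Eof, as
-- (lines down, columns across). On the same line the column part is the
-- difference in columns; on a later line it is counted from column 1.
gapToEof :: [Tok] -> Maybe (Int, Int)
gapToEof toks = do
  prev <- tokenBeforeEof toks
  eof  <- find ((== Eof) . tokKind) toks
  let Pos l1 c1 = tokEnd prev
      Pos l2 c2 = tokStart eof
  pure (if l2 == l1 then (0, c2 - c1) else (l2 - l1, c2 - 1))
```

For a stream ending with a comment that finishes at line 4, column 10, followed by an Eof at line 6, column 1, gapToEof returns Just (2, 0). The cost is one extra traversal of the token list rather than extra state on every token.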
On Tue, 19 Jan 2021 at 21:59, Richard Eisenberg wrote: [...]

FYI I did the horrible thing for now, optimisations welcome.
The change is at [1]
Alan
[1] https://gitlab.haskell.org/ghc/ghc/-/commit/742273a94c187f51e3b143f9c206c420...
On Tue, 19 Jan 2021 at 22:04, Alan & Kim Zimmerman wrote: [...]
participants (2): Alan & Kim Zimmerman, Richard Eisenberg