How to use Lexer.lexer to produce closing braces as well?

Hi, I have recently been playing with GHC's Lexer.lexer in the ghc-parser-lib package. Given module HelloWorld where main = putStrLn "Hello World!\n" it produces stack exec -- lexer-exe ./examples/HelloWorld.hs Lexing&Parsing: ./examples/HelloWorld.hs module at (1, 1): module CONID at (1, 8): CONID where at (1, 19): where vocurly at (3, 1): vocurly <==== { is inserted automatically!! VARID at (3, 1): VARID = at (3, 6): = VARID at (3, 8): VARID STRING at (3, 17): STRING ; at (4, 1): ; By the example above, the lexer automatically inserts an opening brace (i.e. vocurly) right after 'where'. But it does not insert a matching closing brace (i.e., vccurly), which would lead to a failure in parsing a list of tokens produced by the lexer. My question is how to use the GHC lexer to produce closing braces as well. All my code is available - https://github.com/kwanghoon/hslexer To save your time, the relevant part of the code is as follows: In app/HaskellLexer.hs, singleHaskellToken :: P (Located Token) singleHaskellToken = Lexer.lexer False (\locatedToken -> P (\pstate -> POk pstate locatedToken)) tokInfos :: [Terminal Token] -> P (Line, Column, [Terminal Token]) tokInfos s = do locatedToken <- singleHaskellToken case locatedToken of L srcspan ITeof -> let (start_line, start_col, end_line, end_col) = srcSpanToLineCol srcspan in return (end_line, end_col, s) L srcspan tok -> let (start_line, start_col, end_line, end_col) = srcSpanToLineCol srcspan in tokInfos (Terminal (fromToken tok) start_line start_col (Just tok) : s) Thanks in advance Best regards, Kwanghoon

Unfortunately, the current parsing rules for Haskell aren't fully
phase-separable like this.
If you look at the rules for Layout token insertion in the Haskell report
the 9th rule requires that in the event the parser encounters a parse
error it should insert a virtual close brace and continue on!
Otherwise you couldn't parse things like *let **{** foo = bar **}** in
baz *where
the {}'s are virtual without reframing *let* and *in* as a different kind
of paired opening and closing brace or using other hacks in the grammar. It
is quite difficult to hack around all the ways parses can go wrong.
The main downside this has from a language standpoint is you simply can't
properly lex Haskell without more or less fully parsing Haskell.
-Edward
On Wed, Aug 18, 2021 at 7:22 AM Kwanghoon Choi
Hi,
I have recently been playing with GHC's Lexer.lexer in the ghc-parser-lib package.
Given
module HelloWorld where
main = putStrLn "Hello World!\n"
it produces
stack exec -- lexer-exe ./examples/HelloWorld.hs Lexing&Parsing: ./examples/HelloWorld.hs module at (1, 1): module CONID at (1, 8): CONID where at (1, 19): where vocurly at (3, 1): vocurly <==== { is inserted automatically!! VARID at (3, 1): VARID = at (3, 6): = VARID at (3, 8): VARID STRING at (3, 17): STRING ; at (4, 1): ;
By the example above, the lexer automatically inserts an opening brace (i.e. vocurly) right after 'where'. But it does not insert a matching closing brace (i.e., vccurly), which would lead to a failure in parsing a list of tokens produced by the lexer.
My question is how to use the GHC lexer to produce closing braces as well.
All my code is available - https://github.com/kwanghoon/hslexer
To save your time, the relevant part of the code is as follows:
In app/HaskellLexer.hs,
singleHaskellToken :: P (Located Token) singleHaskellToken = Lexer.lexer False (\locatedToken -> P (\pstate -> POk pstate locatedToken))
tokInfos :: [Terminal Token] -> P (Line, Column, [Terminal Token]) tokInfos s = do locatedToken <- singleHaskellToken case locatedToken of L srcspan ITeof -> let (start_line, start_col, end_line, end_col) = srcSpanToLineCol srcspan in return (end_line, end_col, s)
L srcspan tok -> let (start_line, start_col, end_line, end_col) = srcSpanToLineCol srcspan in tokInfos (Terminal (fromToken tok) start_line start_col (Just tok) : s)
Thanks in advance
Best regards,
Kwanghoon
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Thank you Edward for your reply.
It took some time for me to understand a formal description (translation
function L) of
the Haskell layout rule in Section 10.3 of Haskell 2010 language report. :)
I myself tried to write an example with the Haskell layout rule for me to
understand it.
I am sharing this for those who are curious about the Haskell layout rules.
- http://lazyswamp.blogspot.com/2021/10/on-haskell-layout-rule.html
Kwanghoon
PS.
To be honest, however, I still don't know how Haskell parser (Parser.y) and
lexer (Lexer.x) interact with each other
to implement the following rule on the insertion of a closing brace on a
parse error.
[Sec. 10.3]
- L (t : ts) (m : ms) = } : (L (t : ts) ms) if m /= 0 and
parse-error(t)
It seems that Happy's error recover is relevant to the interaction via a
production rule:
[ Parser.y ]
- close :: { () }
: vccurly { () } -- context popped in lexer.
| error {% popContext }
Is it enough for a Haskell lexer just to pop the top context? It is not
clear to me how vccurly is inserted just
by having such a production rule and an error recovery.
On Wed, 18 Aug 2021 at 17:05, Edward Kmett
Unfortunately, the current parsing rules for Haskell aren't fully phase-separable like this.
If you look at the rules for Layout token insertion in the Haskell report the 9th rule requires that in the event the parser encounters a parse error it should insert a virtual close brace and continue on!
Otherwise you couldn't parse things like *let **{** foo = bar **}** in baz *where the {}'s are virtual without reframing *let* and *in* as a different kind of paired opening and closing brace or using other hacks in the grammar. It is quite difficult to hack around all the ways parses can go wrong.
The main downside this has from a language standpoint is you simply can't properly lex Haskell without more or less fully parsing Haskell.
-Edward
On Wed, Aug 18, 2021 at 7:22 AM Kwanghoon Choi
wrote: Hi,
I have recently been playing with GHC's Lexer.lexer in the ghc-parser-lib package.
Given
module HelloWorld where
main = putStrLn "Hello World!\n"
it produces
stack exec -- lexer-exe ./examples/HelloWorld.hs Lexing&Parsing: ./examples/HelloWorld.hs module at (1, 1): module CONID at (1, 8): CONID where at (1, 19): where vocurly at (3, 1): vocurly <==== { is inserted automatically!! VARID at (3, 1): VARID = at (3, 6): = VARID at (3, 8): VARID STRING at (3, 17): STRING ; at (4, 1): ;
By the example above, the lexer automatically inserts an opening brace (i.e. vocurly) right after 'where'. But it does not insert a matching closing brace (i.e., vccurly), which would lead to a failure in parsing a list of tokens produced by the lexer.
My question is how to use the GHC lexer to produce closing braces as well.
All my code is available - https://github.com/kwanghoon/hslexer
To save your time, the relevant part of the code is as follows:
In app/HaskellLexer.hs,
singleHaskellToken :: P (Located Token) singleHaskellToken = Lexer.lexer False (\locatedToken -> P (\pstate -> POk pstate locatedToken))
tokInfos :: [Terminal Token] -> P (Line, Column, [Terminal Token]) tokInfos s = do locatedToken <- singleHaskellToken case locatedToken of L srcspan ITeof -> let (start_line, start_col, end_line, end_col) = srcSpanToLineCol srcspan in return (end_line, end_col, s)
L srcspan tok -> let (start_line, start_col, end_line, end_col) = srcSpanToLineCol srcspan in tokInfos (Terminal (fromToken tok) start_line start_col (Just tok) : s)
Thanks in advance
Best regards,
Kwanghoon
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
participants (2)
-
Edward Kmett
-
Kwanghoon Choi