Unfortunately, the current parsing rules for Haskell aren't fully phase-separable like this.

If you look at the rules for Layout token insertion in the Haskell report the 9th rule requires that in the event the parser encounters a parse error it should insert a virtual close brace and continue on!

Otherwise you couldn't parse things like let { foo = bar } in baz where the {}'s are virtual without reframing let and in as a different kind of paired opening and closing brace or using other hacks in the grammar. It is quite difficult to hack around all the ways parses can go wrong.

The main downside this has from a language standpoint is you simply can't properly lex Haskell without more or less fully parsing Haskell.

-Edward

On Wed, Aug 18, 2021 at 7:22 AM Kwanghoon Choi <lazyswamp@gmail.com> wrote:

Hi,

I have recently been playing with GHC's Lexer.lexer in the ghc-parser-lib package.

Given

   module HelloWorld where

   main = putStrLn "Hello World!\n"

it produces

   stack exec -- lexer-exe ./examples/HelloWorld.hs   
   Lexing&Parsing: ./examples/HelloWorld.hs
   module at (1, 1): module
   CONID at (1, 8): CONID
   where at (1, 19): where
   vocurly at (3, 1): vocurly    <==== { is inserted automatically!!
   VARID at (3, 1): VARID
   = at (3, 6): =
   VARID at (3, 8): VARID
   STRING at (3, 17): STRING
   ; at (4, 1): ;

By the example above, the lexer automatically inserts an opening brace (i.e. vocurly) right after 'where'. But it does not insert a matching closing brace (i.e., vccurly), which would lead to a failure in parsing a list of tokens produced by the lexer. 

My question is how to use the GHC lexer to produce closing braces as well. 

All my code is available 
 - https://github.com/kwanghoon/hslexer

To save your time, the relevant part of the code is as follows:

In app/HaskellLexer.hs,

    singleHaskellToken :: P (Located Token)
    singleHaskellToken =
      Lexer.lexer False
        (\locatedToken -> P (\pstate -> POk pstate locatedToken))

    tokInfos :: [Terminal Token] -> P (Line, Column, [Terminal Token])
    tokInfos s = do
      locatedToken <- singleHaskellToken
      case locatedToken of
        L srcspan ITeof ->
          let (start_line, start_col, end_line, end_col) = srcSpanToLineCol srcspan in
          return (end_line, end_col, s)
         
        L srcspan tok ->
          let (start_line, start_col, end_line, end_col) = srcSpanToLineCol srcspan in
          tokInfos (Terminal (fromToken tok) start_line start_col (Just tok) : s)

Thanks in advance

Best regards,

Kwanghoon




_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs