[Git][ghc/ghc][master] Fix tabs in string gaps (#26415)
Marge Bot pushed to branch master at Glasgow Haskell Compiler / GHC

Commits:
e9c5e46f by Brandon Chinn at 2025-09-25T09:48:36-04:00
Fix tabs in string gaps (#26415)

Tabs in string gaps were broken in bb030d0d because previously, string gaps
were manually parsed, but now it's lexed by the usual Alex grammar and
post-processed after successful lexing.

It broke because of a discrepancy between GHC's lexer grammar and the
Haskell Report. The Haskell Report includes tabs in whitechar:

    whitechar → newline | vertab | space | tab | uniWhite

$whitechar used to include tabs until 18 years ago, when it was removed in
order to exclude tabs from $white_no_nl in order to warn on tabs: 6e202120.

In this MR, I'm adding \t back into $whitechar, and explicitly excluding \t
from the $white_no_nl+ rule ignoring all whitespace in source code, which
more accurately colocates the "ignore all whitespace except tabs, which is
handled in the next line" logic.

As a side effect of this MR, tabs are now allowed in pragmas; currently, a
pragma written as {-# \t LANGUAGE ... #-} is interpreted as the tab character
being the pragma name, and GHC warns "Unrecognized pragma". With this change,
tabs are ignored as whitespace, which more closely matches the Report anyway.

- - - - -

5 changed files:

- compiler/GHC/Parser/Lexer.x
- compiler/GHC/Parser/Lexer/String.x
- + testsuite/tests/parser/should_run/T26415.hs
- + testsuite/tests/parser/should_run/T26415.stdout
- testsuite/tests/parser/should_run/all.T

Changes:

=====================================
compiler/GHC/Parser/Lexer.x
=====================================
@@ -145,7 +145,7 @@ import GHC.Parser.String
 $unispace = \x05 -- Trick Alex into handling Unicode. See Note [Unicode in Alex].
 $nl = [\n\r\f]
 $space = [\ $unispace]
-$whitechar = [$nl \v $space]
+$whitechar = [$nl \t \v $space]
 $white_no_nl = $whitechar # \n -- TODO #8424
 $tab = \t
@@ -248,7 +248,7 @@ haskell :-
 -- Alex "Rules"

 -- everywhere: skip whitespace
-$white_no_nl+ ;
+($white_no_nl # \t)+ ;
 $tab { warnTab }

 -- Everywhere: deal with nested comments. We explicitly rule out

=====================================
compiler/GHC/Parser/Lexer/String.x
=====================================
@@ -25,7 +25,7 @@ import GHC.Utils.Panic (panic)
 $unispace = \x05 -- Trick Alex into handling Unicode. See Note [Unicode in Alex].
 $nl = [\n\r\f]
 $space = [\ $unispace]
-$whitechar = [$nl \v $space]
+$whitechar = [$nl \t \v $space]
 $tab = \t

 $ascdigit = 0-9

=====================================
testsuite/tests/parser/should_run/T26415.hs
=====================================
@@ -0,0 +1,7 @@
+{-# LANGUAGE MultilineStrings #-}
+
+main :: IO ()
+main = do
+  -- The below strings contain the characters ['\\', '\t', '\\']
+  print "\	\"
+  print """\	\"""

=====================================
testsuite/tests/parser/should_run/T26415.stdout
=====================================
@@ -0,0 +1,2 @@
+""
+""

=====================================
testsuite/tests/parser/should_run/all.T
=====================================
@@ -27,6 +27,7 @@ test('RecordDotSyntax4', [extra_files(['RecordDotSyntaxA.hs'])], multimod_compil
 test('RecordDotSyntax5', normal, compile_and_run, [''])
 test('ListTuplePunsConstraints', extra_files(['ListTuplePunsConstraints.hs']), ghci_script, ['ListTuplePunsConstraints.script'])
 test('T25937', normal, compile_and_run, [''])
+test('T26415', normal, compile_and_run, [''])

 # Multiline strings
 test('MultilineStrings', normal, compile_and_run, [''])

View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/e9c5e46ffdb3cd8725e2ffdc2c440ea5...
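
For readers who do not work with Alex, the character-class change above can be modelled in plain Haskell. This is only a rough sketch and not GHC's implementation: the function names (isWhitechar, isWhiteNoNl, isSkippedSilently, removeGaps) are hypothetical, Unicode whitespace ($unispace) is left out, and removeGaps is an illustrative stand-in for the string-gap post-processing, which this commit does not show.

```haskell
module Main (main) where

-- Hypothetical model of the character classes after this change.
-- The names echo the Alex macros, but none of this is GHC's code.

-- $whitechar with \t restored: newline-like characters, tab,
-- vertical tab and space (Unicode whitespace is omitted here).
isWhitechar :: Char -> Bool
isWhitechar c = c `elem` "\n\r\f\t\v "

-- $white_no_nl = $whitechar # \n
isWhiteNoNl :: Char -> Bool
isWhiteNoNl c = isWhitechar c && c /= '\n'

-- ($white_no_nl # \t): what the "skip whitespace" rule consumes
-- silently. A tab is left for the separate $tab rule, which warns.
isSkippedSilently :: Char -> Bool
isSkippedSilently c = isWhiteNoNl c && c /= '\t'

-- Assumed stand-in for gap removal: drop every string gap
-- (backslash, one or more whitechars, backslash) from the raw body
-- of a string literal. Other escape pairs are copied unchanged.
removeGaps :: String -> String
removeGaps ('\\' : rest)
  | (_ : _, '\\' : rest') <- span isWhitechar rest = removeGaps rest'
  | (c : rest') <- rest = '\\' : c : removeGaps rest'
removeGaps (c : rest) = c : removeGaps rest
removeGaps [] = []

main :: IO ()
main = do
  -- Raw body: backslash, tab, backslash, as in T26415.hs.
  print (removeGaps "\\\t\\")          -- prints ""
  -- A gap that spans a newline and indentation.
  print (removeGaps "ab\\ \n  \\cd")   -- prints "abcd"
  -- Space is skipped silently; tab is not.
  print (map isSkippedSilently " \t")  -- prints [True,False]
```

With the tab removed from isWhitechar, the first call would no longer see a gap, which is the shape of the regression described in the commit message.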