
Alan Zimmerman pushed to branch wip/az/ghc-cpp at Glasgow Haskell Compiler / GHC

Commits:
3b3a5dec by Ben Gamari at 2025-05-15T16:10:01-04:00
Don't emit unprintable characters when printing Uniques

When faced with an unprintable tag we now instead print the codepoint number.

Fixes #25989.

(cherry picked from commit e832b1fadee66e8d6dd7b019368974756f8f8c46)

- - - - -
e1ef8974 by Mike Pilgrem at 2025-05-16T16:09:14-04:00
Translate iff in Haddock documentation into everyday English

- - - - -
b37711f9 by Alan Zimmerman at 2025-05-17T09:54:42+01:00
GHC-CPP: first rough proof of concept

Processes

    #define FOO
    #ifdef FOO
    x = 1
    #endif

Into

    [ITcppIgnored [L loc ITcppDefine]
    ,ITcppIgnored [L loc ITcppIfdef]
    ,ITvarid "x"
    ,ITequal
    ,ITinteger (IL {il_text = SourceText "1", il_neg = False, il_value = 1})
    ,ITcppIgnored [L loc ITcppEndif]
    ,ITeof]

In time, ITcppIgnored will be pushed into a comment

- - - - -
155274a4 by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Tidy up before re-visiting the continuation mechanic

- - - - -
e67cc209 by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Switch preprocessor to continuation passing style

Proof of concept, needs tidying up

- - - - -
1a0613de by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Small cleanup

- - - - -
28bb3dcd by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Get rid of some cruft

- - - - -
af244265 by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Starting to integrate. Need to get the pragma recognised and set

- - - - -
4df9b8db by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Make cppTokens extend to end of line, and process CPP comments

- - - - -
571a3557 by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Remove unused ITcppDefined

- - - - -
04444ced by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Allow spaces between # and keyword for preprocessor directive

- - - - -
56022164 by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Process CPP continuation lines

They are emitted as separate ITcppContinue tokens.
Perhaps the processing should be more like a comment, and keep on going to the end.
BUT, the last line needs to be slurped as a whole.

- - - - -
f20ff9a2 by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Accumulate CPP continuations, process when ready

Can be simplified further, we only need one CPP token

- - - - -
35e31452 by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Simplify Lexer interface. Only ITcpp

We transfer directive lines through it, then parse them from scratch in the preprocessor.

- - - - -
c9b03ce5 by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Deal with directive on last line, with no trailing \n

- - - - -
e1f18f92 by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Start parsing and processing the directives

- - - - -
651b1e66 by Alan Zimmerman at 2025-05-17T09:54:42+01:00
Prepare for processing include files

- - - - -
76f05ae3 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Move PpState into PreProcess

And initParserState, initPragState too

- - - - -
f71d75df by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Process nested include files

Also move PpState out of Lexer.x, so it is easy to evolve it in a ghci session, loading utils/check-cpp/Main.hs

- - - - -
9a5d961d by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Split into separate files

- - - - -
12fe7c28 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Starting on expression parser.

But it hangs.
Time for Text.Parsec.Expr

- - - - -
eab98997 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Start integrating the ghc-cpp work

From https://github.com/alanz/ghc-cpp

- - - - -
23e5c90c by Alan Zimmerman at 2025-05-17T09:54:43+01:00
WIP

- - - - -
109fe6fc by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Fixup after rebase

- - - - -
b218f624 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
WIP

- - - - -
5946cc99 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Fixup after rebase, including all tests pass

- - - - -
f7c374ac by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Change pragma usage to GHC_CPP from GhcCPP

- - - - -
b63bfb1d by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Some comments

- - - - -
885d9ab6 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Reformat

- - - - -
9595ad65 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Delete unused file

- - - - -
090a3e45 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Rename module Parse to ParsePP

- - - - -
8d85e179 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Clarify naming in the parser

- - - - -
eabddb9a by Alan Zimmerman at 2025-05-17T09:54:43+01:00
WIP. Switching to alex/happy to be able to work in-tree

Since Parsec is not available

- - - - -
de751411 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Layering is now correct

- GHC lexer, emits CPP tokens
- accumulated in Preprocessor state
- Lexed by CPP lexer, CPP command extracted, tokens concatenated with spaces (to get rid of token pasting via comments)
- if directive lexed and parsed by CPP lexer/parser, and evaluated

- - - - -
e640ec71 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
First example working

Loading Example1.hs into ghci, getting the right results

```
{-# LANGUAGE GHC_CPP #-}
module Example1 where

y = 3

x =
  "hello"
  "bye now"

foo = putStrLn x
```

- - - - -
6220e06f by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Rebase, and all tests pass except whitespace for generated parser

- - - - -
183393b6 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
More plumbing.
Ready for testing tomorrow.

- - - - -
42f8b67e by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Progress. Renamed module State from Types

And at first blush it seems to handle preprocessor scopes properly.

- - - - -
c7804af7 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Insert basic GHC version macros into parser

    __GLASGOW_HASKELL__
    __GLASGOW_HASKELL_FULL_VERSION__
    __GLASGOW_HASKELL_PATCHLEVEL1__
    __GLASGOW_HASKELL_PATCHLEVEL2__

- - - - -
2bd6ae63 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Re-sync check-cpp for easy ghci work

- - - - -
5c143c42 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Get rid of warnings

- - - - -
6d51665d by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Rework macro processing, in check-cpp

Macros kept at the top level, looked up via name, multiple arity versions per name can be stored

- - - - -
93a18c1e by Alan Zimmerman at 2025-05-17T09:54:43+01:00
WIP. Can crack arguments for #define

Next step is to crack out args in an expansion

- - - - -
75bf2b6b by Alan Zimmerman at 2025-05-17T09:54:43+01:00
WIP on arg parsing.

- - - - -
8b5d99d8 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Progress. Still screwing up nested parens.
- - - - -
a529b10f by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Seems to work, but has redundant code

- - - - -
05d7ca7f by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Remove redundant code

- - - - -
fcb2387e by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Reformat

- - - - -
dfaf1a46 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Expand args, single pass

Still need to repeat until fixpoint

- - - - -
c806eb22 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Fixed point expansion

- - - - -
86403450 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Sync the playground to compiler

- - - - -
917a66b2 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Working on dumping the GHC_CPP result

But we need to keep the BufSpan in a comment

- - - - -
3bb0bb30 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Keep BufSpan in queued comments in GHC.Parser.Lexer

- - - - -
82106a2e by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Getting close to being able to print the combined tokens

showing what is in and what is out

- - - - -
493c0253 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
First implementation of dumpGhcCpp.
Example output

First dumps all macros in the state, then the source, showing which lines are in and which are out

    ------------------------------

    - |#define FOO(A,B) A + B
    - |#define FOO(A,B,C) A + B + C
    - |#if FOO(1,FOO(3,4)) == 8
    - |-- a comment
      |x = 1
    - |#else
    - |x = 5
    - |#endif

- - - - -
a8d628b2 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Clean up a bit

- - - - -
206e4773 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Add -ddump-ghc-cpp option and a test based on it

- - - - -
65bb5bc2 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Restore Lexer.x rules, we need them for continuation lines

- - - - -
5726e351 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Lexer.x: trying to sort out the span for continuations

- We need to match on \n at the end of the line
- We cannot simply back up for it

- - - - -
7612ec92 by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Inserts predefined macros. But does not dump properly

Because the cpp tokens have a trailing newline

- - - - -
62b5ea7d by Alan Zimmerman at 2025-05-17T09:54:43+01:00
Remove unnecessary Lexer rules

We *need* the ones that explicitly match to the end of the line.

- - - - -
47d703cf by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Generate correct span for ITcpp

Dump now works, except we do not render trailing `\` for continuation lines. This is good enough for use in test output.

- - - - -
aef0b466 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Reduce duplication in lexer

- - - - -
cc158a75 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Tweaks

- - - - -
c192915c by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Insert min_version predefined macros into state

The mechanism now works. Still need to flesh out the full set.

- - - - -
601395ff by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Trying my alternative pragma syntax.

It works, but dumpGhcCpp is broken, I suspect from the ITcpp token span update.
- - - - -
61117c67 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Pragma extraction now works, with both CPP and GHC_CPP

For the following

    {-# LANGUAGE CPP #-}
    #if __GLASGOW_HASKELL__ >= 913
    {-# LANGUAGE GHC_CPP #-}
    #endif

We will enable GHC_CPP only

- - - - -
96540e19 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Remove some tracing

- - - - -
2bf2c60f by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Fix test exes for changes

- - - - -
a6e90845 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
For GHC_CPP tests, normalise config-time-based macros

- - - - -
6665d0fa by Alan Zimmerman at 2025-05-17T09:54:44+01:00
WIP

- - - - -
03283165 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
WIP again. What is wrong?

- - - - -
b56db99f by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Revert to dynflags for normal not pragma lexing

- - - - -
75d67c2a by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Working on getting check-exact to work properly

- - - - -
0908eb85 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Passes CppCommentPlacement test

- - - - -
8880d51a by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Starting on exact printing with GHC_CPP

While overriding normal CPP

- - - - -
685963fd by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Correctly store CPP ignored tokens as comments

By populating the lexeme string in it, based on the bufpos

- - - - -
29f82644 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
WIP

- - - - -
addfca69 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Simplifying

- - - - -
37a6f59f by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Update the active state logic

- - - - -
e1e11679 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Work the new logic into the mainline code

- - - - -
1f8c610f by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Process `defined` operator

- - - - -
e3948c03 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Manage lexer state while skipping tokens

There is very intricate layout-related state used when lexing.
If a CPP directive blanks out some tokens, store this state when the blanking starts, and restore it when they are no longer being blanked.

- - - - -
b1ffd86f by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Track the last token buffer index, for ITCppIgnored

We need to attach the source being skipped in an ITCppIgnored token.

We cannot simply use its BufSpan as an index into the underlying StringBuffer as it counts Unicode chars, not bytes.

So we update the lexer state to store the starting StringBuffer location for the last token, and use the already-stored length to extract the correct portion of the StringBuffer being parsed.

- - - - -
68494b79 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Process the ! operator in GHC_CPP expressions

- - - - -
cd161831 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Predefine a constant when GHC_CPP is being used.

- - - - -
fcc441ab by Alan Zimmerman at 2025-05-17T09:54:44+01:00
WIP

- - - - -
42240bf2 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Skip lines directly in the lexer when required

- - - - -
df58fdcb by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Properly manage location when accepting tokens again

- - - - -
0fd128b4 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Seems to be working now, for Example9

- - - - -
73ec0a2d by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Remove tracing

- - - - -
c0f73ffd by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Fix parsing '*' in block comments

Instead of replacing them with '-'

- - - - -
089cf569 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Keep the trailing backslash in an ITcpp token

- - - - -
47d41734 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Deal with only enabling one section of a group.
A group is an instance of a conditional introduced by #if/#ifdef/#ifndef, and ending at the final #endif, including intermediate #elif sections

- - - - -
1a3104bb by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Replace remaining identifiers with 0 when evaluating

As per the spec

- - - - -
23449d3b by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Snapshot before rebase

- - - - -
ce898e7a by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Skip non-processed lines starting with #

- - - - -
428a0aa4 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Export generateMacros so we can use it in ghc-exactprint

- - - - -
8bfe5dee by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Fix rebase

- - - - -
ba5cf313 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Expose initParserStateWithMacrosString

- - - - -
7cc2a4dd by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Fix buggy lexer cppSkip

It was skipping all lines, not just ones prefixed by #

- - - - -
200ba48c by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Fix evaluation of && to use the correct operator

- - - - -
c5b56896 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Deal with closing #-} at the start of a line

- - - - -
542a9e65 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Add the MIN_VERSION_GLASGOW_HASKELL predefined macro

- - - - -
c291b710 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Include MIN_VERSION_GLASGOW_HASKELL in GhcCpp01.stderr

- - - - -
c80ad8a9 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Use a strict map for macro defines

- - - - -
b13f79c9 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Process TIdentifierLParen

Which only matters at the start of #define

- - - - -
733ba1ef by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Do not provide TIdentifierLParen paren twice

- - - - -
14469313 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Handle whitespace between identifier and '(' for directive only

- - - - -
b59f7099 by Alan Zimmerman at 2025-05-17T09:54:44+01:00
Expose some Lexer bitmap manipulation helpers

- - - - -
b5a71225 by Alan Zimmerman at 2025-05-17T09:54:45+01:00
Deal with line pragmas as tokens

Blows up for dumpGhcCpp though

- - - - -
6a4fe098 by Alan Zimmerman at 2025-05-17T09:54:45+01:00
Allow strings delimited by a single quote too

- - - - -
acf20184 by Alan Zimmerman at 2025-05-17T09:54:45+01:00
Allow leading whitespace on cpp directives

As per https://timsong-cpp.github.io/cppwp/n4140/cpp#1

- - - - -
8a829253 by Alan Zimmerman at 2025-05-17T09:54:45+01:00
Implement GHC_CPP undef

- - - - -
484c6719 by Alan Zimmerman at 2025-05-17T09:54:45+01:00
Sort out expansion of no-arg macros, in a context with args

And make the expansion bottom out, in the case of recursion

- - - - -
b44c6320 by Alan Zimmerman at 2025-05-17T09:54:45+01:00
Fix GhcCpp01 test

The LINE pragma stuff works in ghc-exactprint when specifically setting flag to emit ITline_pragma tokens

- - - - -
b9eb081e by Alan Zimmerman at 2025-05-17T09:54:45+01:00
Process comments in CPP directives

- - - - -
17348375 by Alan Zimmerman at 2025-05-17T09:54:45+01:00
Correctly lex pragmas with final #-} on a newline

- - - - -
a168f964 by Alan Zimmerman at 2025-05-17T09:54:45+01:00
Do not process CPP-style comments

- - - - -
7248c292 by Alan Zimmerman at 2025-05-18T09:10:52+01:00
Allow cpp-style comments when GHC_CPP enabled

- - - - -

80 changed files:

- compiler/GHC.hs
- compiler/GHC/Cmm/Lexer.x
- compiler/GHC/Cmm/Parser.y
- compiler/GHC/Cmm/Parser/Monad.hs
- compiler/GHC/Driver/Backpack.hs
- compiler/GHC/Driver/Config/Parser.hs
- compiler/GHC/Driver/Downsweep.hs
- compiler/GHC/Driver/Flags.hs
- compiler/GHC/Driver/Main.hs
- compiler/GHC/Driver/Pipeline.hs
- compiler/GHC/Driver/Pipeline/Execute.hs
- compiler/GHC/Driver/Session.hs
- compiler/GHC/Parser.hs-boot
- compiler/GHC/Parser.y
- compiler/GHC/Parser/Annotation.hs
- compiler/GHC/Parser/HaddockLex.x
- compiler/GHC/Parser/Header.hs
- compiler/GHC/Parser/Lexer.x
- compiler/GHC/Parser/PostProcess.hs
- compiler/GHC/Parser/PostProcess/Haddock.hs
- + compiler/GHC/Parser/PreProcess.hs
- + compiler/GHC/Parser/PreProcess/Eval.hs
- + compiler/GHC/Parser/PreProcess/Lexer.x
- + compiler/GHC/Parser/PreProcess/Macro.hs
- + compiler/GHC/Parser/PreProcess/ParsePP.hs
- + compiler/GHC/Parser/PreProcess/Parser.y
- + compiler/GHC/Parser/PreProcess/ParserM.hs
- + compiler/GHC/Parser/PreProcess/State.hs
- compiler/GHC/Parser/Utils.hs
- compiler/GHC/SysTools/Cpp.hs
- compiler/GHC/Types/Unique.hs
- compiler/ghc.cabal.in
- docs/users_guide/debugging.rst
- ghc/GHCi/UI.hs
- hadrian/src/Rules/SourceDist.hs
- hadrian/stack.yaml.lock
- libraries/ghc-internal/src/GHC/Internal/Data/Maybe.hs
- libraries/ghc-internal/src/GHC/Internal/LanguageExtensions.hs
- testsuite/tests/count-deps/CountDepsParser.stdout
- testsuite/tests/driver/T4437.hs
- testsuite/tests/ghc-api/T11579.hs
- + testsuite/tests/ghc-cpp/GhcCpp01.hs
- + testsuite/tests/ghc-cpp/GhcCpp01.stderr
- + testsuite/tests/ghc-cpp/all.T
- testsuite/tests/interface-stability/template-haskell-exports.stdout
- + testsuite/tests/printer/CppCommentPlacement.hs
- + utils/check-cpp/.ghci
- + utils/check-cpp/.gitignore
- + utils/check-cpp/Eval.hs
- + utils/check-cpp/Example1.hs
- + utils/check-cpp/Example10.hs
- + utils/check-cpp/Example11.hs
- + utils/check-cpp/Example12.hs
- + utils/check-cpp/Example13.hs
- + utils/check-cpp/Example2.hs
- + utils/check-cpp/Example3.hs
- + utils/check-cpp/Example4.hs
- + utils/check-cpp/Example5.hs
- + utils/check-cpp/Example6.hs
- + utils/check-cpp/Example7.hs
- + utils/check-cpp/Example8.hs
- + utils/check-cpp/Example9.hs
- + utils/check-cpp/Lexer.x
- + utils/check-cpp/Macro.hs
- + utils/check-cpp/Main.hs
- + utils/check-cpp/ParsePP.hs
- + utils/check-cpp/ParseSimulate.hs
- + utils/check-cpp/Parser.y
- + utils/check-cpp/ParserM.hs
- + utils/check-cpp/PreProcess.hs
- + utils/check-cpp/README.md
- + utils/check-cpp/State.hs
- + utils/check-cpp/run.sh
- utils/check-exact/Main.hs
- utils/check-exact/Parsers.hs
- utils/check-exact/Preprocess.hs
- utils/check-exact/Utils.hs
- utils/haddock/haddock-api/src/Haddock/Backends/Hyperlinker/Parser.hs
- utils/haddock/haddock-api/src/Haddock/Parser.hs
- utils/haddock/haddock-api/src/Haddock/Types.hs

The diff was not included because it is too large.

View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/497d6c7e1af9df6b9712a2e50af0481...