82fb5bed
by Alan Zimmerman at 2025-10-16T19:58:09+01:00
GHC-CPP: Initial implementation
Processes
#define FOO
#ifdef FOO
x = 1
#endif
Into
[ITcppIgnored [L loc ITcppDefine]
,ITcppIgnored [L loc ITcppIfdef]
,ITvarid "x"
,ITequal
,ITinteger (IL {il_text = SourceText "1", il_neg = False, il_value = 1})
,ITcppIgnored [L loc ITcppEndif]
,ITeof]
In time, ITcppIgnored will be pushed into a comment
Tidy up before re-visiting the continuation mechanic
Switch preprocessor to continuation passing style
Proof of concept, needs tidying up
Small cleanup
Get rid of some cruft
Summary: Patch:
Author: Alan Zimmerman <alan.zimm@gmail.com>
Date: 2025-10-12 16:23:56 +0100
Summary: Patch: rebase-and-tests-pass
Author: Alan Zimmerman <alan.zimm@gmail.com>
Date: 2025-10-12 14:19:04 +0100
Rebase, and all tests pass except whitespace for generated parser
Starting to integrate.
Need to get the pragma recognised and set
Make cppTokens extend to end of line, and process CPP comments
Remove unused ITcppDefined
Allow spaces between # and keyword for preprocessor directive
Process CPP continuation lines
They are emitted as separate ITcppContinue tokens.
Perhaps the processing should be more like a comment, and keep on
going to the end.
BUT, the last line needs to be slurped as a whole.
Accumulate CPP continuations, process when ready
Can be simplified further, we only need one CPP token
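A minimal sketch of the accumulation idea, not the code in this patch:
physical lines ending in a backslash are held back and joined with the
line that follows. `joinContinuations` is a hypothetical name, and plain
Strings stand in for lexer tokens.
```haskell
-- Hypothetical helper (not from the patch): merge physical lines that end
-- in a backslash with the following line, yielding logical lines.
joinContinuations :: [String] -> [String]
joinContinuations = go []
  where
    -- acc holds the pieces of the current logical line, most recent first.
    go acc [] = [concat (reverse acc) | not (null acc)]
    go acc (l : ls)
      | endsWithBackslash l = go (init l : acc) ls
      | otherwise = concat (reverse (l : acc)) : go [] ls

    endsWithBackslash s = not (null s) && last s == '\\'
```
A continuation on the very last line, with nothing after it, is still
flushed as a logical line by the first equation of `go`.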
Simplify Lexer interface. Only ITcpp
We transfer directive lines through it, then parse them from scratch
in the preprocessor.
Deal with directive on last line, with no trailing \n
Start parsing and processing the directives
Prepare for processing include files
Move PpState into PreProcess
And initParserState, initPragState too
Process nested include files
Also move PpState out of Lexer.x, so it is easy to evolve it in a ghci
session, loading utils/check-cpp/Main.hs
Split into separate files
Starting on expression parser.
But it hangs. Time for Text.Parsec.Expr
Start integrating the ghc-cpp work
From https://github.com/alanz/ghc-cpp
WIP
Fixup after rebase
WIP
Fixup after rebase, including all tests pass
Change pragma usage to GHC_CPP from GhcCPP
Some comments
Reformat
Delete unused file
Rename module Parse to ParsePP
Clarify naming in the parser
WIP. Switching to alex/happy to be able to work in-tree
Since Parsec is not available
Layering is now correct
- GHC lexer, emits CPP tokens
- accumulated in Preprocessor state
- Lexed by CPP lexer, CPP command extracted, tokens concatenated with
spaces (to get rid of token pasting via comments)
- if directive lexed and parsed by CPP lexer/parser, and evaluated
First example working
Loading Example1.hs into ghci, getting the right results
```
{-# LANGUAGE GHC_CPP #-}
module Example1 where
y = 3
x =
"hello"
"bye now"
foo = putStrLn x
```
Rebase, and all tests pass except whitespace for generated parser
More plumbing. Ready for testing tomorrow.
Progress. Renamed module State from Types
And at first blush it seems to handle preprocessor scopes properly.
Insert basic GHC version macros into parser
__GLASGOW_HASKELL__
__GLASGOW_HASKELL_FULL_VERSION__
__GLASGOW_HASKELL_PATCHLEVEL1__
__GLASGOW_HASKELL_PATCHLEVEL2__
Re-sync check-cpp for easy ghci work
Get rid of warnings
Rework macro processing, in check-cpp
Macros kept at the top level, looked up via name, multiple arity
versions per name can be stored
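A rough sketch of such a table, assuming a two-level map: name first,
then arity. All type and function names here are hypothetical and are
not taken from check-cpp.
```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical types: Nothing is the arity of an object-like macro
-- (no argument list); Just n is a function-like macro with n parameters.
type Arity = Maybe Int

data MacroDef = MacroDef { mdParams :: [String], mdBody :: String }
  deriving (Eq, Show)

type MacroTable = Map.Map String (Map.Map Arity MacroDef)

-- Add a definition; a later definition with the same name and arity
-- replaces the earlier one, other arities are kept.
defineMacro :: String -> Maybe [String] -> String -> MacroTable -> MacroTable
defineMacro name mparams body =
  Map.insertWith Map.union name (Map.singleton arity def)
  where
    arity = fmap length mparams
    def = MacroDef (maybe [] id mparams) body

-- Look a macro up by name, then by the arity seen at the use site.
lookupMacro :: String -> Arity -> MacroTable -> Maybe MacroDef
lookupMacro name arity tbl = Map.lookup name tbl >>= Map.lookup arity
```
With this shape, `FOO(A,B)` and `FOO(A,B,C)` can coexist under the one
name and are told apart by the number of arguments at the call site.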
WIP. Can crack arguments for #define
Next step is to crack out args in an expansion
WIP on arg parsing.
Progress. Still screwing up nested parens.
Seems to work, but has redundant code
Remove redundant code
Reformat
Expand args, single pass
Still need to repeat until fixpoint
Fixed point expansion
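Roughly, the idea is to rerun a single-pass expander until its output
stops changing. The sketch below assumes a toy expander over
whitespace-separated tokens with object-like macros only; both function
names and the step limit are illustrative assumptions, not the patch's
mechanism.
```haskell
-- Hypothetical driver: apply one expansion pass repeatedly until the
-- result is unchanged, or until the step limit runs out (so that a
-- self-referential macro cannot loop forever).
expandToFixpoint :: Eq a => Int -> (a -> a) -> a -> a
expandToFixpoint limit step x
  | limit <= 0 = x
  | x' == x = x
  | otherwise = expandToFixpoint (limit - 1) step x'
  where
    x' = step x

-- Toy single pass: replace each token that names a macro by its body.
expandOnce :: [(String, String)] -> String -> String
expandOnce macros = unwords . concatMap subst . words
  where
    subst w = maybe [w] words (lookup w macros)
```
For example, with `A` defined as `B + 1` and `B` as `2`, the first pass
turns `x = A` into `x = B + 1` and the second into `x = 2 + 1`, after
which a third pass changes nothing and the loop stops.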
Sync the playground to compiler
Working on dumping the GHC_CPP result
But we need to keep the BufSpan in a comment
Keep BufSpan in queued comments in GHC.Parser.Lexer
Getting close to being able to print the combined tokens
showing what is in and what is out
First implementation of dumpGhcCpp.
Example output
First dumps all macros in the state, then the source, showing which
lines are in and which are out
------------------------------
- |#define FOO(A,B) A + B
- |#define FOO(A,B,C) A + B + C
- |#if FOO(1,FOO(3,4)) == 8
- |-- a comment
|x = 1
- |#else
- |x = 5
- |#endif
Clean up a bit
Add -ddump-ghc-cpp option and a test based on it
Restore Lexer.x rules, we need them for continuation lines
Lexer.x: trying to sort out the span for continuations
- We need to match on \n at the end of the line
- We cannot simply back up for it
Inserts predefined macros. But does not dump properly
Because the cpp tokens have a trailing newline
Remove unnecessary Lexer rules
We *need* the ones that explicitly match to the end of the line.
Generate correct span for ITcpp
Dump now works, except we do not render trailing `\` for continuation
lines. This is good enough for use in test output.
Reduce duplication in lexer
Tweaks
Insert min_version predefined macros into state
The mechanism now works. Still need to flesh out the full set.
Trying my alternative pragma syntax.
It works, but dumpGhcCpp is broken, I suspect from the ITcpp token
span update.
Pragma extraction now works, with both CPP and GHC_CPP
For the following
{-# LANGUAGE CPP #-}
#if __GLASGOW_HASKELL__ >= 913
{-# LANGUAGE GHC_CPP #-}
#endif
We will enable GHC_CPP only
Remove some tracing
Fix test exes for changes
For GHC_CPP tests, normalise config-time-based macros
WIP
WIP again. What is wrong?
Revert to dynflags for normal not pragma lexing
Working on getting check-exact to work properly
Passes CppCommentPlacement test
Starting on exact printing with GHC_CPP
While overriding normal CPP
Correctly store CPP ignored tokens as comments
By populating the lexeme string in it, based on the bufpos
WIP
Simplifying
Update the active state logic
Work the new logic into the mainline code
Process `defined` operator
Manage lexer state while skipping tokens
There is very intricate layout-related state used when lexing. If a
CPP directive blanks out some tokens, store this state when the
blanking starts, and restore it when they are no longer being blanked.
Track the last token buffer index, for ITcppIgnored
We need to attach the source being skipped in an ITcppIgnored token.
We cannot simply use its BufSpan as an index into the underlying
StringBuffer as it counts Unicode chars, not bytes.
So we update the lexer state to store the starting StringBuffer
location for the last token, and use the already-stored length to
extract the correct portion of the StringBuffer being parsed.
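To illustrate why a character count cannot index a byte buffer, here is
a small sketch assuming the buffer holds UTF-8 encoded text. The helper
names are made up for this note and are not part of GHC.
```haskell
import Data.Char (ord)

-- Number of bytes one character occupies when encoded as UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80 = 1
  | n < 0x800 = 2
  | n < 0x10000 = 3
  | otherwise = 4
  where
    n = ord c

-- Byte offset of the character at the given character index.
byteOffset :: Int -> String -> Int
byteOffset charIx = sum . map utf8Width . take charIx
```
For pure ASCII input the two offsets agree, which is why the mismatch
only shows up once a source file contains wider characters before the
token in question.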
Process the ! operator in GHC_CPP expressions
Predefine a constant when GHC_CPP is being used.
WIP
Skip lines directly in the lexer when required
Properly manage location when accepting tokens again
Seems to be working now, for Example9
Remove tracing
Fix parsing '*' in block comments
Instead of replacing them with '-'
Keep the trailing backslash in an ITcpp token
Deal with only enabling one section of a group.
A group is an instance of a conditional introduced by
#if/#ifdef/#ifndef,
and ending at the final #endif, including intermediate #elif sections
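A sketch of the bookkeeping this implies for a single group, ignoring
nesting. The record and function names are hypothetical and do not
mirror the preprocessor state in this patch.
```haskell
-- Hypothetical state for one conditional group.
data Group = Group
  { taken :: Bool   -- has some section of this group already been enabled?
  , active :: Bool  -- is the current section being emitted?
  }
  deriving (Eq, Show)

-- Open a group with the value of the #if/#ifdef/#ifndef condition.
openGroup :: Bool -> Group
openGroup c = Group {taken = c, active = c}

-- Enter a further conditional section: it is enabled only if its
-- condition holds and no earlier section of the group was enabled.
nextSection :: Bool -> Group -> Group
nextSection c g = Group {taken = taken g || on, active = on}
  where
    on = not (taken g) && c

-- Enter the #else section, which behaves like a condition that is always true.
elseSection :: Group -> Group
elseSection = nextSection True
```
Keeping `taken` separate from `active` is what stops a later section
from switching on after an earlier one has already been emitted.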
Replace remaining identifiers with 0 when evaluating
As per the spec
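As a rough illustration, assuming macro expansion and `defined` have
already been handled and the expression is still a plain string,
the substitution could look like this. `zeroIdentifiers` is a
hypothetical helper, not a function from the patch.
```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit)

-- Hypothetical helper: replace each identifier left in an expanded
-- #if expression with 0, leaving numbers and operators untouched.
zeroIdentifiers :: String -> String
zeroIdentifiers [] = []
zeroIdentifiers s@(c : cs)
  -- Copy a numeric token whole, so the letters in 0x10 are not rewritten.
  | isDigit c = let (num, rest) = span isAlphaNum s in num ++ zeroIdentifiers rest
  | isIdentStart c = let (_, rest) = span isIdentChar s in '0' : zeroIdentifiers rest
  | otherwise = c : zeroIdentifiers cs
  where
    isIdentStart x = isAlpha x || x == '_'
    isIdentChar x = isAlphaNum x || x == '_'
```
So `#if FOO + 1 == BAR_2`, with neither name defined, is evaluated as
`0 + 1 == 0`.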
Snapshot before rebase
Skip non-processed lines starting with #
Export generateMacros so we can use it in ghc-exactprint
Fix rebase
Expose initParserStateWithMacrosString
Fix buggy lexer cppSkip
It was skipping all lines, not just ones prefixed by #
Fix evaluation of && to use the correct operator
Deal with closing #-} at the start of a line
Add the MIN_VERSION_GLASGOW_HASKELL predefined macro
Include MIN_VERSION_GLASGOW_HASKELL in GhcCpp01.stderr
Use a strict map for macro defines
Process TIdentifierLParen
Which only matters at the start of #define
Do not provide TIdentifierLParen paren twice
Handle whitespace between identifier and '(' for directive only
Expose some Lexer bitmap manipulation helpers
Deal with line pragmas as tokens
Blows up for dumpGhcCpp though
Allow strings delimited by a single quote too
Allow leading whitespace on cpp directives
As per https://timsong-cpp.github.io/cppwp/n4140/cpp#1
Implement GHC_CPP undef
Sort out expansion of no-arg macros, in a context with args
And make the expansion bottom out, in the case of recursion
Fix GhcCpp01 test
The LINE pragma stuff works in ghc-exactprint when specifically
setting flag to emit ITline_pragma tokens
Process comments in CPP directives
Correctly lex pragmas with final #-} on a newline
Do not process CPP-style comments
Allow cpp-style comments when GHC_CPP enabled
Return other pragmas as cpp ignored when GHC_CPP active
Reorganise getOptionsFromFile for use in ghc-exactprint
We want to be able to inject predefined macro definitions into the
parser preprocessor state for when we do a hackage roundtrip.
Tweak testing
Only allow unknown cpp pragmas with # in left margin
Require # against left margin for all GHC_CPP directives
Fix CPP directives appearing in pragmas
And add a test for error reporting for missing `#if`
Starting to report GHC_CPP errors using GHC machinery
More GHC_CPP diagnostic results
WIP on converting error calls to GHC diagnostics in GHC_CPP
Working on CPP diagnostic reporting
Tweak some tests/lint warnings
More error reporting in Macro
Some cleanups
Some cleanup
GHC_CPP: Working on improving error reporting
Harvest some commonality
Use PPM as Maybe inside PP
Clean up a bit
Fix GhcCpp01 test
I think this needs to be made more robust. Likely by not dumping the
(pre-)defined macros.