
Ah, I didn't think about the GHC options that change the lexical syntax. You're right, using the GHC lexer should be easier.
and, if you do that, you could also make the GHC lexer squirrel away the comments (including pragmas, if they aren't already in the AST) someplace safe, indexed by, or at least annotated with, their source locations, and make this comment/pragma storage available via the GHC API.

(1a) then, we'd need a way to merge those comments and pragmas back into the output during pretty printing, and we'd have made the first small step towards source-to-source transformations: making code survive semantically intact over (pretty . parse).

(1b) that would still not quite fulfill the GHC API comment ticket (*), but that was only a quick sketch, not a definite design. it might be sufficient to let each GHC API client do its own search to associate bits of comment/pragma storage with bits of AST.

if i understand you correctly, you are going to do (1a), so if you could add that to the GHC API, we'd only need (1b) to go from usable-for-analysis-and-extraction to usable-for-transformation. is that going to be a problem?

claus

(*) knowing the source location of some piece of AST is not sufficient for figuring out whether it has any immediately preceding or following comments (there might be other AST fragments in between, closer to the next comment). but, if one knows the nearest comment segment for each piece of AST, one could then build a map where the closest AST pieces are mapped to (Just commentID), and the other AST pieces are mapped to Nothing.
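to make the footnote concrete, here is a rough sketch of that nearest-comment map. the types are made up for illustration (locations reduced to line numbers, comments reduced to IDs), standing in for GHC's real SrcSpan and whatever shape the comment storage ends up having; the real thing would compare spans, not single lines:

```haskell
import qualified Data.Map as Map
import Data.List (minimumBy)
import Data.Ord (comparing)

-- hypothetical stand-ins for GHC's SrcSpan and the squirrelled-away comments
type Loc = Int                      -- simplified: a source line number
type CommentId = Int

-- comment/pragma storage, indexed by source location
type CommentStore = Map.Map Loc CommentId

-- AST pieces, here represented only by their locations
type AstLoc = Loc

-- for each stored comment, find the closest AST piece; each AST piece
-- then maps to (Just commentID) if it is the closest piece to some
-- comment, and to Nothing otherwise
nearestComments :: [AstLoc] -> CommentStore -> Map.Map AstLoc (Maybe CommentId)
nearestComments asts store = foldr claim initial (Map.toList store)
  where
    initial = Map.fromList [ (a, Nothing) | a <- asts ]
    claim (loc, cid) m
      | null asts = m
      | otherwise = Map.insert closest (Just cid) m
      where closest = minimumBy (comparing (\a -> abs (a - loc))) asts
```

a client could then walk the AST and, for each piece, look up whether it owns a comment that needs to be re-emitted during pretty printing.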