Ok, to answer my own question, I changed nested_comment to

    nested_comment :: P (RealLocated Token) -> Action
    nested_comment cont span buf len = do
      input <- getInput
      go (reverse $ lexemeToString buf len) (1::Int) input

The comment accumulator now starts off with the already lexed part, taken from buf and len.
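
To make the effect concrete, here is a small standalone sketch. It is a toy model on plain Strings, not GHC's lexer: the names scanComment, lexIgnoringPrefix and lexKeepingPrefix are made up for illustration, and I am assuming the Alex rule has already matched a prefix such as "{-# LANGUAGE" before the action runs.

```haskell
-- Toy model only: this is NOT GHC's Lexer.x, just an illustration on Strings.
-- All names here are hypothetical.

-- Scan the rest of a nested block comment.
--   acc   : comment text collected so far, kept in reverse
--   depth : number of "{-" currently open (>= 1)
-- Returns the collected text (without the final "-}") and the
-- remaining input, or Nothing if the comment is unterminated.
scanComment :: String -> Int -> String -> Maybe (String, String)
scanComment _   _     []             = Nothing
scanComment acc depth ('-':'}':rest)
  | depth == 1                       = Just (reverse acc, rest)
  | otherwise                        = scanComment ('}':'-':acc) (depth - 1) rest
scanComment acc depth ('{':'-':rest) = scanComment ('-':'{':acc) (depth + 1) rest
scanComment acc depth (c:rest)       = scanComment (c:acc) depth rest

-- Start with an empty accumulator: the already matched prefix is lost.
lexIgnoringPrefix :: String -> String -> Maybe (String, String)
lexIgnoringPrefix _prefix rest = scanComment "" 1 rest

-- Seed the accumulator with the (reversed) prefix: the prefix is kept.
lexKeepingPrefix :: String -> String -> Maybe (String, String)
lexKeepingPrefix prefix rest = scanComment (reverse prefix) 1 rest
```

With prefix "{-# LANGUAGE" and remaining input " PatternSynonyms #-}", lexIgnoringPrefix collects " PatternSynonyms #", while lexKeepingPrefix collects "{-# LANGUAGE PatternSynonyms #". Appending the closing "-}" to the second form gives back the original pragma text.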


On Wed, Oct 29, 2014 at 9:04 PM, Alan & Kim Zimmerman <alan.zimm@gmail.com> wrote:
As part of my ongoing efforts to round-trip source code, I have bumped into an issue around file header pragmas, e.g.

    {-# LANGUAGE PatternSynonyms #-}
    {-# Language DeriveFoldable #-}
    {-# options_ghc -w #-}


In normal mode, i.e. when the lexer is not called from headerInfo, the file header pragmas are lexed just far enough to generate a warning about an invalid pragma (if that warning is enabled). They are then lexed to completion and returned as an `ITblockComment` if `Opt_KeepRawTokenStream` is enabled.

The relevant Alex rule is

    <0> {
      -- In the "0" mode we ignore these pragmas
      "{-#"  $whitechar* $pragmachar+ / { known_pragma fileHeaderPrags }
                         { nested_comment lexToken }
    }

The problem is that the tokens returned are
   
    ITblockComment " PatternSynonyms #"
    ITblockComment " DeriveFoldable #"
    ITblockComment " -w #"

It is not possible to reproduce the original comment from these, because the part matched by the rule (e.g. `{-# LANGUAGE` versus `{-# Language`) has been dropped.


It looks like nested_comment ignores what has been lexed so far:

    nested_comment :: P (RealLocated Token) -> Action
    nested_comment cont span _str _len = do
    ...

So my question is: is there any way to make the returned comment include the prefix part? Perhaps there could be a specific variation of nested_comment that uses str and len.

Alan