8.8.x change in lookupModule / findModule semantics?

Hi ghc-devs, When upgrading my dev tool to use the 8.8.x api I have found that calls to lookupModule / findModule are failing on uncompilable code, whereas these calls succeeded on 8.4 and 8.6. Unlike another question that I have asked [1] which seems related, I don't intend to do any typechecking here, I only want access to the ModuleInfo. Has this been an intentional change? Is there a way to recover the original behaviour? If I cannot get this to work then it means that a new feature introduced in https://gitlab.haskell.org/ghc/ghc/merge_requests/1541 that I was quite excited about is unusable :-( [1] https://mail.haskell.org/pipermail/ghc-devs/2020-February/018655.html

Hi, looking at `importsOnly`: #if MIN_VERSION_GLASGOW_HASKELL(8,8,1,0) pp <- liftIO $ preprocess sess file Nothing Nothing let (dflags, tmp) = case pp of Left _ -> error $ "preprocessing failed " <> show file Right success -> success #else Since 8.8.1 `preprocess` takes an addition argument of type `Maybe StringBuffer` that you're setting to `Nothing` here, so the pipeline will read your on-disk file instead of using the buffer. The whole point of that change was to allow passing in-memory buffers straight into the pipeline for tooling so I'm hoping you can use that to simplify your workarounds, see https://gitlab.haskell.org/ghc/ghc/merge_requests/1014/ for context. --Daniel On Wed, Feb 26, 2020 at 09:58:31PM +0000, Tseen She wrote:
Hi ghc-devs,
When upgrading my dev tool to use the 8.8.x api I have found that calls to lookupModule / findModule are failing on uncompilable code, whereas these calls succeeded on 8.4 and 8.6.
Unlike another question that I have asked [1] which seems related, I don't intend to do any typechecking here, I only want access to the ModuleInfo.
Has this been an intentional change? Is there a way to recover the original behaviour?
If I cannot get this to work then it means that a new feature introduced in https://gitlab.haskell.org/ghc/ghc/merge_requests/1541 that I was quite excited about is unusable :-(
[1] https://mail.haskell.org/pipermail/ghc-devs/2020-February/018655.html
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Thanks Daniel,
I think we might be mixing up the two threads here, please let me know if
I've misunderstood, so I'll respond as if this was in response to the
thread about in-memory buffers: the function you've hightlighted is working
as intended (thanks for clarifying the API change, I had to work that out
myself and was just guessing) and produces the minimal buffer that is then
used by the next function.
As to the relation between importsOnly and lookupModule / findModule, they
are independent.
On Wed, 26 Feb 2020 at 23:05, Daniel Gröber
Hi,
looking at `importsOnly`:
#if MIN_VERSION_GLASGOW_HASKELL(8,8,1,0) pp <- liftIO $ preprocess sess file Nothing Nothing let (dflags, tmp) = case pp of Left _ -> error $ "preprocessing failed " <> show file Right success -> success #else
Since 8.8.1 `preprocess` takes an addition argument of type `Maybe StringBuffer` that you're setting to `Nothing` here, so the pipeline will read your on-disk file instead of using the buffer.
The whole point of that change was to allow passing in-memory buffers straight into the pipeline for tooling so I'm hoping you can use that to simplify your workarounds, see https://gitlab.haskell.org/ghc/ghc/merge_requests/1014/ for context.
--Daniel
On Wed, Feb 26, 2020 at 09:58:31PM +0000, Tseen She wrote:
Hi ghc-devs,
When upgrading my dev tool to use the 8.8.x api I have found that calls to lookupModule / findModule are failing on uncompilable code, whereas these calls succeeded on 8.4 and 8.6.
Unlike another question that I have asked [1] which seems related, I don't intend to do any typechecking here, I only want access to the ModuleInfo.
Has this been an intentional change? Is there a way to recover the original behaviour?
If I cannot get this to work then it means that a new feature introduced in https://gitlab.haskell.org/ghc/ghc/merge_requests/1541 that I was quite excited about is unusable :-(
[1] https://mail.haskell.org/pipermail/ghc-devs/2020-February/018655.html
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Hi, On Thu, Feb 27, 2020 at 08:02:16AM +0000, Tseen She wrote:
I think we might be mixing up the two threads here, please let me know if I've misunderstood
Ah yes, I was reading both threads and ended up replying to the wrong one :)
so I'll respond as if this was in response to the thread about in-memory buffers: the function you've hightlighted is working as intended (thanks for clarifying the API change, I had to work that out myself and was just guessing) and produces the minimal buffer that is then used by the next function.
You know, it does look like you're implementing in importsOnly is very simmilar to ghc's internal getPreprocessedImports. While that function isn't exported it's just a simple wrapper around HeaderInfo.getImports which is. You might want to use that API instead or is there a reason you had to reimplement all of it? Anyways, looking at the code for parseModule as long as the ModSummary you pass in has ms_hspp_buf set it should use the buffer instead of the file. The code will eventually call `hscParse'` which has: let src_filename = ms_hspp_file mod_summary maybe_src_buf = ms_hspp_buf mod_summary -------------------------- Parser ---------------- -- sometimes we already have the buffer in memory, perhaps -- because we needed to parse the imports out of it, or get the -- module name. buf <- case maybe_src_buf of Just b -> return b Nothing -> liftIO $ hGetStringBuffer src_filename typecheckModule will just use the ParsedSource parseModule produced so it shouldn't do any more file reading.
As to the relation between importsOnly and lookupModule / findModule, they are independent.
I'm not actually seeing any calls to {lookup,find}Module in your code so it's very hard to say if the behaviour changed. Do you have a reproducer for that? --Daniel

On Thu, 27 Feb 2020 at 11:40, Daniel Gröber
You know, it does look like you're implementing in importsOnly is very simmilar to ghc's internal getPreprocessedImports
Very likely :-) but if it's not exported I am stuck with the workaround. . While that function
isn't exported it's just a simple wrapper around HeaderInfo.getImports which is. You might want to use that API instead or is there a reason you had to reimplement all of it?
getImports only returns located module names, but I need to recover the source code with module definition, pragmas, and imports, so that I can typecheck it (actually I just need to run namer, but that isn't exposed as a separate phase and typechecking basically isn't anything more than namer when there is nothing in the file except imports). Just to be clear though, there is absolutely no problem with the importsOnly code so much as I'm aware, although I'd be happy to throw it away if this was made available inside ghc :-)
Anyways, looking at the code for parseModule as long as the ModSummary you pass in has ms_hspp_buf set it should use the buffer instead of the file.
Yes, I agree. That function is fine. Using debug tracing, I can confirm that the ms_hspp_buf of a ModSummary has my in-memory version: I have isolated the problem to a change in behaviour in the next line: typecheckModule.
typecheckModule will just use the ParsedSource parseModule produced so it shouldn't do any more file reading.
As to the relation between importsOnly and lookupModule / findModule,
This is where I am not so sure. I am fairly certain that something has broken in the typechecking line. I will try again with debug tracing to confirm what the ParsedSource looks like. they
are independent.
I'm not actually seeing any calls to {lookup,find}Module in your code so it's very hard to say if the behaviour changed. Do you have a reproducer for that?
That's because it is in a different branch than the in-memory stuff. I am more or less getting a problem / regression with this code: let target = GHC.Target (GHC.TargetModule mn) False Nothing GHC.removeTarget $ TargetModule (traceShow (showGhc mn) mn) GHC.addTarget target _ <- GHC.load $ GHC.LoadUpTo mn m <- GHC.lookupModule mn Nothing And in my WIP branch https://gitlab.com/tseenshe/hsinspect/-/blob/wip/ghc882/library/HsInspect/Im... BTW, if you can point me to something in ghc that does inferModuleName then that'd be useful in several places.

Quick followup with trace debugging:
On Thu, 27 Feb 2020 at 14:50, Tseen She
typecheckModule will just use the ParsedSource parseModule produced so it shouldn't do any more file reading.
This is where I am not so sure. I am fairly certain that something has broken in the typechecking line. I will try again with debug tracing to confirm what the ParsedSource looks like.
I can confirm that a showPpr on `pm_parsed_source` has the in-memory version. But, perhaps, the problem might be that pragmas are not carried over? In my test the "importsOnly" version of the file is preserving two LANGUAGE pragmas that are necessary in order to parse the imports section (one of my tests uses PackageImports). Just a guess, but does 8.8.x now require more manual passing of the dynflags? This seems like a regression / bug to me but I'd be willing to workaround it if it is an intentional behaviour change.

and it looks like the language pragmas slurped from the in-memory buffer is
empty
tmod <- GHC.typecheckModule (trace (showGhc . GHC.extensions $
GHC.ms_hspp_opts modSum) pmod)
gives me [] on 8.8 but is [Off ImplicitPrelude, On PackageImports] on
8.6.5... that looks like an 8.8 regression to me.
On Thu, 27 Feb 2020 at 15:02, Tseen She
Quick followup with trace debugging:
On Thu, 27 Feb 2020 at 14:50, Tseen She
wrote: typecheckModule will just use the ParsedSource parseModule produced so it shouldn't do any more file reading.
This is where I am not so sure. I am fairly certain that something has broken in the typechecking line. I will try again with debug tracing to confirm what the ParsedSource looks like.
I can confirm that a showPpr on `pm_parsed_source` has the in-memory version.
But, perhaps, the problem might be that pragmas are not carried over? In my test the "importsOnly" version of the file is preserving two LANGUAGE pragmas that are necessary in order to parse the imports section (one of my tests uses PackageImports). Just a guess, but does 8.8.x now require more manual passing of the dynflags? This seems like a regression / bug to me but I'd be willing to workaround it if it is an intentional behaviour change.

Sorry for the spam, but I think this is definitely a bug/regression in
ghc-8.8.1 (still present in 8.8.3).
I don't know which commit introduced the problem, but it seems that
getModSummary is no longer reporting the correct ms_hspp_opts, at least for
an in-memory file but it could also be for a file on disk as well (I
haven't excluded that as a possibility).
Here is my workaround
#if MIN_VERSION_GLASGOW_HASKELL(8,8,1,0)
dflags <- GHC.getSessionDynFlags
let file = GHC.ms_hspp_file modSum
buf <- case GHC.ms_hspp_buf modSum of
Nothing -> liftIO $ hGetStringBuffer file
Just b -> pure b
let pragmas = getOptions dflags buf file
(dflags', _, _) <- parseDynamicFilePragma dflags pragmas
let modSum' = modSum { GHC.ms_hspp_opts = dflags' }
#else
let modSum' = modSum
#endif
On Thu, 27 Feb 2020 at 15:18, Tseen She
and it looks like the language pragmas slurped from the in-memory buffer is empty
tmod <- GHC.typecheckModule (trace (showGhc . GHC.extensions $ GHC.ms_hspp_opts modSum) pmod)
gives me [] on 8.8 but is [Off ImplicitPrelude, On PackageImports] on 8.6.5... that looks like an 8.8 regression to me.
On Thu, 27 Feb 2020 at 15:02, Tseen She
wrote: Quick followup with trace debugging:
On Thu, 27 Feb 2020 at 14:50, Tseen She
wrote: typecheckModule will just use the ParsedSource parseModule produced so it shouldn't do any more file reading.
This is where I am not so sure. I am fairly certain that something has broken in the typechecking line. I will try again with debug tracing to confirm what the ParsedSource looks like.
I can confirm that a showPpr on `pm_parsed_source` has the in-memory version.
But, perhaps, the problem might be that pragmas are not carried over? In my test the "importsOnly" version of the file is preserving two LANGUAGE pragmas that are necessary in order to parse the imports section (one of my tests uses PackageImports). Just a guess, but does 8.8.x now require more manual passing of the dynflags? This seems like a regression / bug to me but I'd be willing to workaround it if it is an intentional behaviour change.

Hi, On Thu, Feb 27, 2020 at 03:43:35PM +0000, Tseen She wrote:
Sorry for the spam, but I think this is definitely a bug/regression in ghc-8.8.1 (still present in 8.8.3).
No worries
I don't know which commit introduced the problem, but it seems that getModSummary is no longer reporting the correct ms_hspp_opts, at least for an in-memory file but it could also be for a file on disk as well (I haven't excluded that as a possibility).
On a quick testcase I cannot reproduce this behaviour: -- $ ghc -package ghc -package ghc-paths TargetContents.hs module Main where import GHC import GHC.Paths (libdir) import MonadUtils import DynFlags import StringBuffer import Data.Time.Clock main :: IO () main = do defaultErrorHandler defaultFatalMessager defaultFlushOut $ do runGhc (Just libdir) $ do dflags0 <- getSessionDynFlags (dflags1, _, _) <- parseDynamicFlags dflags0 $ map noLoc ["-package", "base"] _ <- setSessionDynFlags dflags1 t <- liftIO getCurrentTime setTargets [Target (TargetFile "Main.hs" Nothing) False (Just (stringToStringBuffer buffer, t)) ] _ <- depanal [] False ms <- getModSummary (mkModuleName "Main") pm <- parseModule ms liftIO $ print $ extensions $ ms_hspp_opts ms _ <- typecheckModule pm return () buffer = "{-# LANGUAGE PackageImports #-}\nimport \"base\" Data.List\nmain = return ()" Running it it prints the PackageImports ext from the in-memory buffer just fine: $ ghc-8.8.1 -package ghc -package ghc-paths TargetContents.hs $ ./TargetContents [On PackageImports] One change in behaviour to note is that when doing this with <8.8 we first need to create the Main.hs file so GHC doesn't complain about it missing. Maybe you can fiddle with the test case until it reflects what you're doing? --Daniel

Bingo!
setTargets [Target (TargetFile "Main.hs" (Just $ Hsc HsSrcFile))
False (Just (stringToStringBuffer buffer, t))]
i.e. an explicit phase.
I will just use Nothing in my TargetFile. Was this an intended change?
On Thu, 27 Feb 2020 at 18:13, Daniel Gröber
Hi,
On Thu, Feb 27, 2020 at 03:43:35PM +0000, Tseen She wrote:
Sorry for the spam, but I think this is definitely a bug/regression in ghc-8.8.1 (still present in 8.8.3).
No worries
I don't know which commit introduced the problem, but it seems that getModSummary is no longer reporting the correct ms_hspp_opts, at least for an in-memory file but it could also be for a file on disk as well (I haven't excluded that as a possibility).
On a quick testcase I cannot reproduce this behaviour:
-- $ ghc -package ghc -package ghc-paths TargetContents.hs module Main where
import GHC import GHC.Paths (libdir) import MonadUtils import DynFlags import StringBuffer import Data.Time.Clock
main :: IO () main = do defaultErrorHandler defaultFatalMessager defaultFlushOut $ do runGhc (Just libdir) $ do dflags0 <- getSessionDynFlags (dflags1, _, _) <- parseDynamicFlags dflags0 $ map noLoc ["-package", "base"] _ <- setSessionDynFlags dflags1
t <- liftIO getCurrentTime setTargets [Target (TargetFile "Main.hs" Nothing) False (Just (stringToStringBuffer buffer, t)) ]
_ <- depanal [] False
ms <- getModSummary (mkModuleName "Main") pm <- parseModule ms
liftIO $ print $ extensions $ ms_hspp_opts ms
_ <- typecheckModule pm
return ()
buffer = "{-# LANGUAGE PackageImports #-}\nimport \"base\" Data.List\nmain = return ()"
Running it it prints the PackageImports ext from the in-memory buffer just fine:
$ ghc-8.8.1 -package ghc -package ghc-paths TargetContents.hs $ ./TargetContents [On PackageImports]
One change in behaviour to note is that when doing this with <8.8 we first need to create the Main.hs file so GHC doesn't complain about it missing.
Maybe you can fiddle with the test case until it reflects what you're doing?
--Daniel

Additionally, now that importsOnly is working for me on 8.8.x I can use
lookupModule on *that* instead of the original module file, and then I can
make use of the newly exposed modInfoRdrEnv thus answering my second
thread's question too.
It is, however, a shame that lookupModule seems to require the entire file
to parse / typecheck. That also smells like a regression, but not one that
impacts me anymore.
On Thu, 27 Feb 2020 at 20:37, Tseen She
Bingo!
setTargets [Target (TargetFile "Main.hs" (Just $ Hsc HsSrcFile)) False (Just (stringToStringBuffer buffer, t))]
i.e. an explicit phase.
I will just use Nothing in my TargetFile. Was this an intended change?
On Thu, 27 Feb 2020 at 18:13, Daniel Gröber
wrote: Hi,
On Thu, Feb 27, 2020 at 03:43:35PM +0000, Tseen She wrote:
Sorry for the spam, but I think this is definitely a bug/regression in ghc-8.8.1 (still present in 8.8.3).
No worries
I don't know which commit introduced the problem, but it seems that getModSummary is no longer reporting the correct ms_hspp_opts, at least for an in-memory file but it could also be for a file on disk as well (I haven't excluded that as a possibility).
On a quick testcase I cannot reproduce this behaviour:
-- $ ghc -package ghc -package ghc-paths TargetContents.hs module Main where
import GHC import GHC.Paths (libdir) import MonadUtils import DynFlags import StringBuffer import Data.Time.Clock
main :: IO () main = do defaultErrorHandler defaultFatalMessager defaultFlushOut $ do runGhc (Just libdir) $ do dflags0 <- getSessionDynFlags (dflags1, _, _) <- parseDynamicFlags dflags0 $ map noLoc ["-package", "base"] _ <- setSessionDynFlags dflags1
t <- liftIO getCurrentTime setTargets [Target (TargetFile "Main.hs" Nothing) False (Just (stringToStringBuffer buffer, t)) ]
_ <- depanal [] False
ms <- getModSummary (mkModuleName "Main") pm <- parseModule ms
liftIO $ print $ extensions $ ms_hspp_opts ms
_ <- typecheckModule pm
return ()
buffer = "{-# LANGUAGE PackageImports #-}\nimport \"base\" Data.List\nmain = return ()"
Running it it prints the PackageImports ext from the in-memory buffer just fine:
$ ghc-8.8.1 -package ghc -package ghc-paths TargetContents.hs $ ./TargetContents [On PackageImports]
One change in behaviour to note is that when doing this with <8.8 we first need to create the Main.hs file so GHC doesn't complain about it missing.
Maybe you can fiddle with the test case until it reflects what you're doing?
--Daniel

Hi, On Thu, Feb 27, 2020 at 08:37:12PM +0000, Tseen She wrote:
setTargets [Target (TargetFile "Main.hs" (Just $ Hsc HsSrcFile)) False (Just (stringToStringBuffer buffer, t))]
i.e. an explicit phase.
I will just use Nothing in my TargetFile. Was this an intended change?
I found the reason for the change, my commit 0f9ec9d1ff ("Allow using tagetContents for modules needing preprocessing") removes the special casing in `preprocessFile` for in-memory buffers: -preprocessFile hsc_env src_fn mb_phase (Just (buf, _time)) - = do - let dflags = hsc_dflags hsc_env - let local_opts = getOptions dflags buf src_fn - - (dflags', leftovers, warns) - <- parseDynamicFilePragma dflags local_opts It used to do a `parseDynamicFilePragma` which parses the LANGUAGE and OPTIONS pragmas. I think this change is actually for the better though, as this dflags modification does not occur with a regular file, so we're actually being more consistent. I just tested this out and when setting the phase but not passing a buffer 8.6 will also fail to get the pragma in the ModSummary. When not giving an explicit phase it works for 8.6, 8.8, with and without in-memory buffers though. So I think that is the proper solution here, unless you can think of a reason running the rest of the pipeline is a problem in this case? On Thu, Feb 27, 2020 at 08:49:43PM +0000, Tseen She wrote:
It is, however, a shame that lookupModule seems to require the entire file to parse / typecheck. That also smells like a regression, but not one that impacts me anymore.
I would still like to reproduce your problem but I'm again not sure how you're triggering it, this is what I have so far: -- $ ghc -package ghc -package ghc-paths FindModule.hs module Main where import GHC import GHC.Paths (libdir) import MonadUtils import DynFlags import Module main :: IO () main = do defaultErrorHandler defaultFatalMessager defaultFlushOut $ do runGhc (Just libdir) $ do dflags0 <- getSessionDynFlags (dflags1, _, _) <- parseDynamicFlags dflags0 $ map noLoc ["-package", "base"] _ <- setSessionDynFlags dflags1 setTargets [Target (TargetFile "Main.hs" Nothing) False Nothing] _ <- load LoadAllTargets m <- findModule (mkModuleName "Main") Nothing liftIO $ print $ (unitIdFS (moduleUnitId m), moduleNameFS (moduleName m)) Running it with 8.8 and 8.4 i get the same results: $ ghc-$ver -package ghc -package ghc-paths FindModule.hs $ echo 'main = return()' > Main.hs # a working file $ ./FindModule ("main","Main") $ echo '=' > Main.hs # a syntax error $ ./FindModule Main.hs:1:1: error: parse error on input ‘=’ Perhaps you need a 'let' in a 'do' block? e.g. 'let x = 5' instead of 'x = 5' | 1 | = | ^ <command line>: module is not loaded: ‘Main’ (Main.hs) so this seems to be consistent? --Daniel

Thanks you Daniel, that explains the mystery of the in-memory buffer.
With regards to findModule / lookupModule, it seems that the semantics have
not changed. This is perhaps an artefact of the previous issue: i.e. the
lookupModule was likely loading the disk version (not the in-memory)
version of a file.
If I discover anything else, I will create a new thread. I will also let
the group know when I cut a release, because this has been very helpful and
I would like to get a wider audience looking at the code to give me
recommendations for improvements (and especially to avoid reinventing the
wheel of what is already inside ghc).
On Thu, 27 Feb 2020 at 23:21, Daniel Gröber
Hi,
On Thu, Feb 27, 2020 at 08:37:12PM +0000, Tseen She wrote:
setTargets [Target (TargetFile "Main.hs" (Just $ Hsc HsSrcFile)) False (Just (stringToStringBuffer buffer, t))]
i.e. an explicit phase.
I will just use Nothing in my TargetFile. Was this an intended change?
I found the reason for the change, my commit 0f9ec9d1ff ("Allow using tagetContents for modules needing preprocessing") removes the special casing in `preprocessFile` for in-memory buffers:
-preprocessFile hsc_env src_fn mb_phase (Just (buf, _time)) - = do - let dflags = hsc_dflags hsc_env - let local_opts = getOptions dflags buf src_fn - - (dflags', leftovers, warns) - <- parseDynamicFilePragma dflags local_opts
It used to do a `parseDynamicFilePragma` which parses the LANGUAGE and OPTIONS pragmas.
I think this change is actually for the better though, as this dflags modification does not occur with a regular file, so we're actually being more consistent.
I just tested this out and when setting the phase but not passing a buffer 8.6 will also fail to get the pragma in the ModSummary. When not giving an explicit phase it works for 8.6, 8.8, with and without in-memory buffers though. So I think that is the proper solution here, unless you can think of a reason running the rest of the pipeline is a problem in this case?
It is, however, a shame that lookupModule seems to require the entire file to parse / typecheck. That also smells like a regression, but not one
On Thu, Feb 27, 2020 at 08:49:43PM +0000, Tseen She wrote: that
impacts me anymore.
I would still like to reproduce your problem but I'm again not sure how you're triggering it, this is what I have so far:
-- $ ghc -package ghc -package ghc-paths FindModule.hs module Main where
import GHC import GHC.Paths (libdir) import MonadUtils import DynFlags import Module
main :: IO () main = do defaultErrorHandler defaultFatalMessager defaultFlushOut $ do runGhc (Just libdir) $ do dflags0 <- getSessionDynFlags (dflags1, _, _) <- parseDynamicFlags dflags0 $ map noLoc ["-package", "base"] _ <- setSessionDynFlags dflags1
setTargets [Target (TargetFile "Main.hs" Nothing) False Nothing]
_ <- load LoadAllTargets
m <- findModule (mkModuleName "Main") Nothing liftIO $ print $ (unitIdFS (moduleUnitId m), moduleNameFS (moduleName m))
Running it with 8.8 and 8.4 i get the same results:
$ ghc-$ver -package ghc -package ghc-paths FindModule.hs $ echo 'main = return()' > Main.hs # a working file $ ./FindModule ("main","Main")
$ echo '=' > Main.hs # a syntax error $ ./FindModule Main.hs:1:1: error: parse error on input ‘=’ Perhaps you need a 'let' in a 'do' block? e.g. 'let x = 5' instead of 'x = 5' | 1 | = | ^ <command line>: module is not loaded: ‘Main’ (Main.hs)
so this seems to be consistent?
--Daniel
participants (2)
-
Daniel Gröber
-
Tseen She