[GHC] #13299: Typecheck multiple modules at the same time

#13299: Typecheck multiple modules at the same time
----------------------------------------------+---------------------------------
           Reporter:  ezyang                  |             Owner:  (none)
               Type:  feature request         |            Status:  new
           Priority:  normal                  |         Milestone:
          Component:  Compiler (Type checker) |           Version:  8.0.1
           Keywords:                          |  Operating System:  Unknown/Multiple
       Architecture:  Unknown/Multiple        |   Type of failure:  None/Unknown
          Test Case:                          |        Blocked By:
           Blocking:                          |   Related Tickets:
Differential Rev(s):                          |         Wiki Page:
----------------------------------------------+---------------------------------

angerman asked me to outline how one might go about fixing #1409 (mutually recursive modules without hs-boot). Here is the most recent plan based on #10681 and discussion with SPJ.

**The general approach.** Traditionally, users write hs-boot files, which compile to hi-boot files that are subsequently used for compilation. The approach that SPJ and I would like to take is to replace this step with a new one that generates hi-boot files from hs files. Everything else stays the same.

**More details.** Suppose we have A.hs and B.hs, which import each other: A imports B using a `SOURCE` import, but no B.hs-boot is defined.

We ask GHC to typecheck A.hs and B.hs together to produce hi-boot files for each of the modules. To implement this, we need a new major mode for this operation (similar to `ghc -c`), and GhcMake needs to be adjusted to call this step on every SCC in the import graph when one or more modules in the import graph do not have an hs-boot file. This part of the implementation is a bit annoying and is what thwarted me when I made some stabs at this issue in the past. Probably the easiest thing to do initially is to fix up GhcMake to call your new frontend (you'll put it in `HscMain`) on every SCC. An easy way to check progress here is to get `ghc --make` to print out SCCs before it starts compiling them.

GHC needs to learn how to typecheck multiple modules at the same time.
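To make the "call this step on every SCC" idea concrete, here is a minimal sketch of computing and printing the SCCs of an import graph. It is not GHC code: `ToySummary`, `importSCCs`, `recursiveGroups` and `example` are names made up for illustration, standing in for GHC's much richer `ModSummary` and for whatever GhcMake actually does. The only library call assumed is `stronglyConnComp` (with `flattenSCC`) from the `containers` package.

```haskell
import Data.Graph (SCC, flattenSCC, stronglyConnComp)
import Data.List (sort)

-- | Toy stand-in for a module summary (hypothetical, not GHC's ModSummary):
-- a module name plus the names of the modules it imports.
data ToySummary = ToySummary
  { tsName    :: String
  , tsImports :: [String]
  }

-- | Strongly connected components of the import graph. Imports that do
-- not name a module in the list (e.g. library modules) are ignored by
-- 'stronglyConnComp'.
importSCCs :: [ToySummary] -> [SCC String]
importSCCs summaries =
  stronglyConnComp [ (tsName s, tsName s, tsImports s) | s <- summaries ]

-- | The components with more than one module, i.e. the recursive groups
-- that would need to be typechecked together. Members are sorted so the
-- output is deterministic.
recursiveGroups :: [ToySummary] -> [[String]]
recursiveGroups summaries =
  [ sort ms
  | scc <- importSCCs summaries
  , let ms = flattenSCC scc
  , length ms > 1
  ]

-- | A and B import each other; Main imports A.
example :: [ToySummary]
example =
  [ ToySummary "Main" ["A"]
  , ToySummary "A" ["B", "Prelude"]
  , ToySummary "B" ["A"]
  ]

-- | Print every component, one per line, as a progress check.
printSCCs :: [ToySummary] -> IO ()
printSCCs = mapM_ (putStrLn . unwords . flattenSCC) . importSCCs
```

On `example`, the only recursive group is A together with B; `Main` forms a component of its own, and the `Prelude` import is dropped because it is not a node in the graph.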
Let's talk a little bit about how typechecking works today. By the time we are at `HscMain`, we generally have a `ModSummary` per source module to be compiled. You pass the `ModSummary` to something like `tcRnModule` and you get back a `TcGblEnv` containing the results of typechecking. Look at `hscIncrementalCompile`: if you're compiling a module proper, we desugar and optimize it (`finish`) and then create an interface for it; if we're only typechecking (`finishTypecheckOnly`), we go straight to generating the interface file after checking.

All of these functions assume, of course, that only one module is being typechecked at a time, so you must break this assumption. This comes in multiple steps. First, the actual typechecking: `tcRnModule` needs to be adjusted. Notice that `tcRnModule` takes a single `HsParsedModule`; now you need to feed it multiple parsed modules. You probably want a new function for this. What should this function look like? Trace further into `initTc`: notice that the `TcGblEnv` structure assumes that there is only one module being compiled at a time (see `tcg_mod`, among others). So this assumption needs to be broken.

Now, trace into the main body of typechecking, `tcRnModuleTcRnM`. Normally we rename imports, and then we rename and typecheck declarations. Clearly each of your parsed modules needs to have its imports resolved separately; furthermore, the modules might import each other. This needs to be made to work. I think this will have to be totally reimplemented, because you are going to have to deal with cases like:

{{{
module A(T) where
import B(T)

module B(T) where
import A(T)
}}}

This is obviously nonsense, and your algorithm needs to identify this and kill it. Once you've done this you should have a separate `tcg_rdr_env` for each of the parsed modules. `tcRnImports` sets up a pile of other variables in `TcGblEnv` too (see bottom) and I'm not sure what should be done with those.
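One possible way to "identify this and kill it" is a least-fixpoint computation over export lists: a name counts as exportable only if the module defines it, or imports it from a module already known to export it. The sketch below is a toy model under that assumption, not GHC's renamer; `ToyModule`, `resolveExports`, `ungrounded` and the two example maps are hypothetical names introduced here.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type ModName = String
type OccName = String

-- | Toy module header (hypothetical, not a GHC type): the names the module
-- defines itself, the names it imports from other modules, and its
-- export list.
data ToyModule = ToyModule
  { tmDefines :: Set.Set OccName
  , tmImports :: [(ModName, OccName)]
  , tmExports :: Set.Set OccName
  }

type Exports = Map.Map ModName (Set.Set OccName)

-- | One round: keep an export-list entry if it is defined locally, or is
-- imported from a module already known to export it.
step :: Map.Map ModName ToyModule -> Exports -> Exports
step mods known = Map.map grounded mods
  where
    grounded m = Set.filter (ok m) (tmExports m)
    ok m n =
      n `Set.member` tmDefines m
        || or [ n `Set.member` Map.findWithDefault Set.empty src known
              | (src, n') <- tmImports m, n' == n ]

-- | Least fixpoint, starting from "nobody exports anything".
resolveExports :: Map.Map ModName ToyModule -> Exports
resolveExports mods = go (Map.map (const Set.empty) mods)
  where
    go known
      | next == known = known
      | otherwise     = go next
      where next = step mods known

-- | Export-list entries that never bottom out in a definition.
ungrounded :: Map.Map ModName ToyModule -> [(ModName, OccName)]
ungrounded mods =
  [ (name, n)
  | (name, m) <- Map.toList mods
  , n <- Set.toList (tmExports m)
  , n `Set.notMember` Map.findWithDefault Set.empty name resolved
  ]
  where resolved = resolveExports mods

-- | The nonsense case: A and B re-export T from each other, nobody defines it.
nonsense :: Map.Map ModName ToyModule
nonsense = Map.fromList
  [ ("A", ToyModule Set.empty [("B", "T")] (Set.fromList ["T"]))
  , ("B", ToyModule Set.empty [("A", "T")] (Set.fromList ["T"]))
  ]

-- | The same shape, except that B actually defines T.
sensible :: Map.Map ModName ToyModule
sensible = Map.insert "B"
  (ToyModule (Set.fromList ["T"]) [("A", "T")] (Set.fromList ["T"]))
  nonsense
```

In this model, `ungrounded nonsense` reports `T` for both modules, while `ungrounded sensible` is empty because the chain of re-exports ends at B's definition.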
Now we need to rename and typecheck the top-level declarations. Renaming of imported entities should proceed straightforwardly because you set up the GlobalRdrEnv correctly, but you need to give the correct module to each of the top-level declarations. Maybe assume no Template Haskell for now because I have no idea how that's supposed to work. The crux of the matter, though, is that once you've renamed all of the declarations, you now need to compute SCCs over ALL of the modules, because how else are you going to typecheck two mutually recursive declarations spread over two modules? At this point the one-module assumption of `TcGblEnv` shouldn't be a problem anymore, because when we're dealing with renamed source everything knows its name.

There's a little more to do for the export list (`tcRnExports`), but you've probably already handled this in order to resolve the recursive imports correctly.

Finally, what you'll get out in the end is a big pile of types from DIFFERENT modules in `tcg_type_env` (and all of the other things: instances, etc., though I guess if an instance is defined in one module of a recursive module loop, it should be in scope everywhere?!). So now, in the final stage, serializing to interface files, we need to disentangle everything and put the declarations for each module into a separate interface file per module. Maybe it would be best to have kept them separate to begin with.

To conclude, adding support for typechecking multiple modules at once will probably involve rewriting large swathes of the renamer and top-level typechecking driver, but everything past that should basically be unchanged.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13299
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#13299: Typecheck multiple modules at the same time

Comment (by simonpj):
> The approach that SPJ and I would like to take is to replace this step with a new one that generates hi-boot files from hs files.
Really? Everything that follows appears to follow a different plan: compile the modules of a mutually recursive group of modules all together, as if they were one big module. Which makes sense. I don't get the bit about generating hi-boot files.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13299#comment:1

#13299: Typecheck multiple modules at the same time
New description:

angerman asked me to outline how one might go about fixing #1409 (mutually recursive modules without hs-boot). Here is the most recent plan based on #10681 and discussion with SPJ.

**The general approach.** Traditionally, users write hs-boot files, which compile to hi-boot files that are subsequently used for compilation. The approach that SPJ and I would like to take is to replace this step with a new one that typechecks all of the hs files at once.

**More details.** Suppose we have A.hs and B.hs, which import each other: A imports B using a `SOURCE` import, but no B.hs-boot is defined.

We ask GHC to typecheck A.hs and B.hs together to produce hi-boot files for each of the modules. To implement this, we need a new major mode for this operation (similar to `ghc -c`), and GhcMake needs to be adjusted to call this step on every SCC in the import graph when one or more modules in the import graph do not have an hs-boot file. This part of the implementation is a bit annoying and is what thwarted me when I made some stabs at this issue in the past. Probably the easiest thing to do initially is to fix up GhcMake to call your new frontend (you'll put it in `HscMain`) on every SCC. An easy way to check progress here is to get `ghc --make` to print out SCCs before it starts compiling them.

GHC needs to learn how to typecheck multiple modules at the same time. Let's talk a little bit about how typechecking works today. By the time we are at `HscMain`, we generally have a `ModSummary` per source module to be compiled. You pass the `ModSummary` to something like `tcRnModule` and you get back a `TcGblEnv` containing the results of typechecking. Look at `hscIncrementalCompile`: if you're compiling a module proper, we desugar and optimize it (`finish`) and then create an interface for it; if we're only typechecking (`finishTypecheckOnly`), we go straight to generating the interface file after checking.
All of these functions assume, of course, that only one module is being typechecked at a time, so you must break this assumption. This comes in multiple steps. First, the actual typechecking: `tcRnModule` needs to be adjusted. Notice that `tcRnModule` takes a single `HsParsedModule`; now you need to feed it multiple parsed modules. You probably want a new function for this. What should this function look like? Trace further into `initTc`: notice that the `TcGblEnv` structure assumes that there is only one module being compiled at a time (see `tcg_mod`, among others). So this assumption needs to be broken.

Now, trace into the main body of typechecking, `tcRnModuleTcRnM`. Normally we rename imports, and then we rename and typecheck declarations. Clearly each of your parsed modules needs to have its imports resolved separately; furthermore, the modules might import each other. This needs to be made to work. I think this will have to be totally reimplemented, because you are going to have to deal with cases like:

{{{
module A(T) where
import B(T)

module B(T) where
import A(T)
}}}

This is obviously nonsense, and your algorithm needs to identify this and kill it. Once you've done this you should have a separate `tcg_rdr_env` for each of the parsed modules. `tcRnImports` sets up a pile of other variables in `TcGblEnv` too (see bottom) and I'm not sure what should be done with those.

Now we need to rename and typecheck the top-level declarations. Renaming of imported entities should proceed straightforwardly because you set up the GlobalRdrEnv correctly, but you need to give the correct module to each of the top-level declarations. Maybe assume no Template Haskell for now because I have no idea how that's supposed to work.
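One way to read "give the correct module to each of the top-level declarations" is to key every declaration by a module-qualified name, so that declarations from the whole recursive group can be pooled for dependency analysis and the results sorted back into per-module buckets afterwards. The following is only a toy sketch of that bookkeeping: `QName`, `ToyDecl`, `bindingGroups`, `partitionByModule` and `pooled` are hypothetical names, and nothing here mirrors GHC's actual `Name` or `TcGblEnv` machinery.

```haskell
import Data.Graph (flattenSCC, stronglyConnComp)
import Data.List (sort)
import qualified Data.Map.Strict as Map

type ModName = String

-- | A top-level name tagged with its defining module (a toy analogue of a
-- renamed name; hypothetical, not GHC's Name type).
type QName = (ModName, String)

-- | A renamed declaration: its own name and the top-level names it uses.
data ToyDecl = ToyDecl
  { declName :: QName
  , declUses :: [QName]
  }

-- | Dependency analysis over declarations pooled from every module in the
-- group. Groups come out dependencies-first; members of each group are
-- sorted so the result is deterministic.
bindingGroups :: [ToyDecl] -> [[QName]]
bindingGroups decls =
  map (sort . flattenSCC)
      (stronglyConnComp [ (declName d, declName d, declUses d) | d <- decls ])

-- | Split per-declaration results back out by defining module, preserving
-- the input order within each module.
partitionByModule :: [(QName, ty)] -> Map.Map ModName [(String, ty)]
partitionByModule results =
  Map.fromListWith (flip (++)) [ (m, [(n, t)]) | ((m, n), t) <- results ]

-- | A.f and B.g call each other across the module boundary; A.h uses A.f.
pooled :: [ToyDecl]
pooled =
  [ ToyDecl ("A", "f") [("B", "g")]
  , ToyDecl ("B", "g") [("A", "f")]
  , ToyDecl ("A", "h") [("A", "f")]
  ]
```

Here `bindingGroups pooled` puts `A.f` and `B.g` in one group even though they live in different modules, which is the situation a single-module dependency analysis cannot express.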
The crux of the matter, though, is that once you've renamed all of the declarations, you now need to compute SCCs over ALL of the modules, because how else are you going to typecheck two mutually recursive declarations spread over two modules? At this point the one-module assumption of `TcGblEnv` shouldn't be a problem anymore, because when we're dealing with renamed source everything knows its name.

There's a little more to do for the export list (`tcRnExports`), but you've probably already handled this in order to resolve the recursive imports correctly.

Finally, what you'll get out in the end is a big pile of types from DIFFERENT modules in `tcg_type_env` (and all of the other things: instances, etc., though I guess if an instance is defined in one module of a recursive module loop, it should be in scope everywhere?!). So now, in the final stage, serializing to interface files, we need to disentangle everything and put the declarations for each module into a separate interface file per module. Maybe it would be best to have kept them separate to begin with.

To conclude, adding support for typechecking multiple modules at once will probably involve rewriting large swathes of the renamer and top-level typechecking driver, but everything past that should basically be unchanged.

--

Comment (by ezyang):

Sorry, that sentence wasn't worded very clearly. The distinction is when we actually run the optimizer. The "obvious" thing to do, after typechecking all of the modules in the loop together, is to optimize all of the modules together.

However, an alternative strategy that gets you more separate compilation is to stop, emit an hi-boot file for each module you typechecked, and then do another pass compiling the modules in order (using the hi-boot interfaces to break loops, as is the case today). The benefit of this strategy is that, while you have to retypecheck every module in the loop any time you edit any module, you don't necessarily have to recompile every module. It also saves you from having to teach the optimizer how to optimize multiple modules at the same time.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13299#comment:2

#13299: Typecheck multiple modules at the same time

Changes (by duog):

 * keywords:  => hs-boot

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13299#comment:3