
However, the GHC API doesn't provide a way to do this directly (I hadn't really thought about this when I suggested the idea before, sorry). The GHC API provides support for compiling multiple modules in the way that GHCi and --make work; each module is added to the HPT as it is compiled. But when compiling single modules, GHC doesn't normally use the HPT - interfaces for modules in the home package are normally demand-loaded in the same way as interfaces for package modules, and added to the PIT. The crucial difference between the HPT and the PIT is that the PIT supports demand-loading of interfaces, but the HPT is supposed to be populated in the right order by the compilation manager - home package modules are assumed to be present in the HPT when they are required.
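To make the HPT/PIT distinction above concrete, here is a toy model in plain Haskell. It does not use the GHC API: `Iface`, `Table`, `Disk`, `lookupPIT`, `lookupHPT`, `compileMake` and `compileOneShots` are all hypothetical names invented for illustration, and an "interface" is reduced to a module's import list. It only sketches the one property described above: a PIT-style table demand-loads on a miss, while an HPT-style table treats a miss as an error because the compilation manager was supposed to populate it in dependency order.

```haskell
module Main where

import Control.Monad (foldM)
import qualified Data.Map.Strict as Map

-- Toy stand-ins; none of these are GHC's real types.
type ModName = String

-- An "interface" here is just the module's import list.
newtype Iface = Iface [ModName]
  deriving (Show, Eq)

type Table = Map.Map ModName Iface

-- Hypothetical "disk": interfaces that could be read from .hi files.
type Disk = Map.Map ModName Iface

-- PIT-style lookup: on a miss, load from disk and cache the result.
lookupPIT :: Disk -> ModName -> Table -> Either String (Iface, Table)
lookupPIT disk m pit =
  case Map.lookup m pit of
    Just i  -> Right (i, pit)
    Nothing ->
      case Map.lookup m disk of
        Just i  -> Right (i, Map.insert m i pit)
        Nothing -> Left ("no interface file for " ++ m)

-- HPT-style lookup: no demand-loading, so a miss is an error.
lookupHPT :: ModName -> Table -> Either String Iface
lookupHPT m hpt =
  maybe (Left (m ++ " not in HPT (compiled out of order?)")) Right
        (Map.lookup m hpt)

-- "--make"-like: every import must already be in the HPT; the module
-- itself is added to the HPT once it has been compiled.
compileMake :: [(ModName, [ModName])] -> Either String Table
compileMake = foldM step Map.empty
  where
    step hpt (m, imports) = do
      mapM_ (`lookupHPT` hpt) imports
      Right (Map.insert m (Iface imports) hpt)

-- Repeated one-shot compiles in one session: imports are demand-loaded
-- from disk, and the PIT accumulates across compiles.
compileOneShots :: Disk -> [(ModName, [ModName])] -> Either String Table
compileOneShots disk = foldM step Map.empty
  where
    step pit (_, imports) = foldM load pit imports
    load acc m = snd <$> lookupPIT disk m acc

main :: IO ()
main = do
  let mods = [("A", []), ("B", ["A"]), ("C", ["A", "B"])]
      disk = Map.fromList [(m, Iface is) | (m, is) <- mods]
  print (Map.keys <$> compileMake mods)            -- dependency order
  print (Map.keys <$> compileMake (reverse mods))  -- wrong order
  print (Map.keys <$> compileOneShots disk (reverse mods))
```

In this model the out-of-order `compileMake` run fails on the first missing import, while `compileOneShots` succeeds in any order as long as the interfaces already exist on the (pretend) disk. The real tables hold far more than import lists, so this should be read as a picture of the lookup discipline only.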
Yah, that's what I don't understand about HscEnv. The HPT doc says that in one-shot mode, the HPT is empty and even local modules are demand-cached in the ExternalPackageState (which the PIT belongs to). And the EPS doc itself reinforces that where it says in one-shot mode "home-package modules accumulate in the external package state".

So why not just ignore the HPT, and run multiple "one-shot" compiles, and let all the info accumulate in the PIT?

A fair amount of work in GhcMake is concerned with trimming old data out of the HPT, I assume this is for ghci that wants to reload changed modules but keep unchanged ones. I don't actually care about that since I can assume the modules will be unchanged over one run.

So I tried just calling compileFile multiple times in the same GhcMonad, assuming the mutable bits of the HscEnv get updated appropriately. Here are the results for a build of about 200 modules:

with persistent server:
  no link:
    3.30s user 1.60s system 12% cpu 38.323 total
    3.50s user 1.66s system 13% cpu 38.368 total
  link:
    21.66s user 4.13s system 35% cpu 1:11.62 total
    21.59s user 4.54s system 38% cpu 1:08.13 total
    21.82s user 4.70s system 35% cpu 1:14.56 total

without server (ghc -c):
  no link:
    109.25s user 19.90s system 240% cpu 53.750 total
    109.11s user 19.23s system 243% cpu 52.794 total
  link:
    128.10s user 21.66s system 201% cpu 1:14.29 total

ghc --make (with linking since I can't turn that off):
    42.57s user 5.83s system 74% cpu 1:05.15 total

The 'user' is low for the server because it doesn't count time spent by the subprocesses on the other end of the socket, but excluding linking it looks like I can shave about 25% off compile time. Unfortunately it winds up being just about the same as ghc --make, so it seems too low. Perhaps I should be using the HPT?

I'm also falling back to plain ghc for linking, maybe --make can link faster when it has everything cached? I guess it shouldn't, because it presumably just dispatches to ld.
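
As a rough check on the "about 25%" figure, the no-link wall-clock totals quoted above can be compared directly. The snippet below is only arithmetic on those numbers; the helper names `mean` and `reduction` are made up for this sketch, and averaging the two runs of each configuration is an assumption about how the estimate was meant.

```haskell
module Main where

-- Wall-clock totals in seconds, copied from the no-link runs above.
serverNoLink, ghcCNoLink :: [Double]
serverNoLink = [38.323, 38.368]  -- persistent server
ghcCNoLink   = [53.750, 52.794]  -- one ghc -c process per module

-- Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Fractional reduction in wall-clock time relative to the baseline.
reduction :: Double -> Double -> Double
reduction baseline candidate = 1 - candidate / baseline

main :: IO ()
main = do
  let r = reduction (mean ghcCNoLink) (mean serverNoLink)
  putStrLn ("wall-clock reduction: " ++ show (round (r * 100) :: Int) ++ "%")
```

On these four samples the reduction comes out a little under 30%, which is in line with the rough estimate. With only two runs per configuration it should be treated as indicative rather than precise.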