
I recently switched from ghc --make to a parallelized build system. I was looking forward to faster builds, and while the new system is much faster at figuring out what has to be rebuilt (which is most of the cost of a small rebuild, since ld dominates), compiling the whole system is either the same speed or slightly slower than the single-threaded ghc --make version. My guess is that the overhead of starting up lots of individual ghcs, each of which has to read all the .hi files over again, just about cancels out the parallelism gains.

One way around that would be to parallelize --make, which has been a TODO for a long time. However, I believe that's never going to be satisfactory for a project involving several different languages, because ghc itself is never going to be a general-purpose build system. ghc --make really provides two things: a dependency chaser and a way to keep the compiler resident as it compiles new files. Since the dependency chaser will never be as powerful as a real build system, it occurs to me that the only reasonable way forward is to split out the second part, by adding an --interactive flag to ghc. It would then read filenames on stdin, compiling each one in turn, and exit only when it sees EOF. A separate program, ghc-fe, could then wrap ghc and act as a drop-in replacement for it.

It would be nice if ghc could atomically read one line from its input; then you could just start a bunch of ghcs behind a named pipe and each would steal its own work. But I don't think that's possible with unix pipes, and of course there are still a few non-unix systems out there. Also, ghc-fe has to wait for the compilation to finish, so ghc would have to print a status line when it completes (or fails) a module.
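A minimal sketch of what that stdin loop could look like. Note that compileFile is a made-up stand-in for whatever GHC API call would actually do the work, and the status-line format is invented for illustration:

```haskell
-- Sketch of the proposed mode: read one filename per line from stdin,
-- compile it with the resident session, print a status line so a
-- wrapper knows when the module is done, and exit at EOF.
import System.IO (isEOF, hFlush, stdout)

-- Hypothetical stand-in for a real GHC API call that compiles a file
-- while keeping the loaded .hi caches warm between requests.
compileFile :: FilePath -> IO Bool
compileFile _ = return True

-- Invented status-line format: "OK Foo.hs" or "FAIL Foo.hs".
statusLine :: Bool -> FilePath -> String
statusLine ok src = (if ok then "OK " else "FAIL ") ++ src

main :: IO ()
main = do
    eof <- isEOF
    if eof
        then return ()
        else do
            src <- getLine
            ok <- compileFile src
            putStrLn (statusLine ok src)
            hFlush stdout  -- the wrapper is blocked on this line
            main
```

The flush matters: whatever ghc-fe turns out to be, it is blocked waiting on that status line, so it must not sit in a buffer.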
But it can still be done with an external distributor program that acts like a server: it starts up n ghcs, distributes source files between them, and shuts them down when given the command. In pseudocode (startup, while, tellGhc, and the rest are sketched helpers):

    data Status = Free | Busy
    data Ghc = Ghc { status :: Status, inH :: Handle, outH :: Handle, pid :: Int }

    main = do
        origFlags <- getArgs
        ghcs <- mapM (startup origFlags) [0..cpus]
        socket <- accept
        while $ read socket >>= \case
            Quit -> return False
            Compile ghcFlags src -> do
                forkIO $ do
                    assert (ghcFlags == origFlags)
                    result <- bracket (findFreeAndMarkBusy ghcs) markFree $ \ghc -> do
                        tellGhc ghc src
                        readResult ghc
                    write socket result
                return True
        mapM_ shutdown ghcs

The ghc-fe program then starts a distributor if one is not already running, sends it a source file, and waits for the response, acting like a drop-in replacement for the ghc command line. Build systems just call ghc-fe, with the extra responsibility of calling ghc-fe --quit when they are done. And if they know how many files they want to rebuild, it won't be worth it below a certain threshold.

So I'm wondering: does this seem reasonable and feasible? Is there a better way to do it? Even if it could be done, would it be worth it? If the answers are "yes", "maybe not", and "maybe yes", then how hard would it be and where should I start looking? I'm assuming start at GhcMake.hs and work outwards from there... I'm not entirely sure it would be worth it to me even if it did make full builds, say, 1.5x faster on my dual-core i5, but it's interesting to think about all the same.
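For what it's worth, the ghc-fe side could be quite small. Here is a sketch using the network package's Network.Socket; the socket path and the one-line wire format (a "compile ..." request out, "ok" or "fail" back) are assumptions for illustration, not a defined protocol:

```haskell
-- Sketch of ghc-fe: connect to the distributor, send the compile
-- request, block on the status line, and exit with a matching code so
-- build systems can call it exactly like ghc.
import Network.Socket
import System.Environment (getArgs)
import System.Exit (exitSuccess, exitFailure)
import System.IO

-- Invented wire format: one request line out, one status line back.
request :: [String] -> String
request args = unwords ("compile" : args)

main :: IO ()
main = do
    args <- getArgs
    sock <- socket AF_UNIX Stream defaultProtocol
    connect sock (SockAddrUnix "/tmp/ghc-distributor.sock")  -- assumed path
    h <- socketToHandle sock ReadWriteMode
    hSetBuffering h LineBuffering
    hPutStrLn h (request args)
    result <- hGetLine h  -- blocks until the distributor reports back
    hClose h
    if result == "ok" then exitSuccess else exitFailure
```

Exiting with the compile's status code is what makes it a drop-in replacement: the build system only looks at the exit code, the same as with plain ghc.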