
I recently switched from ghc --make to a parallelized build system. I was looking forward to faster builds, and while they are much faster at figuring out what has to be rebuilt (which is most of the time for a small rebuild, since ld dominates), compilation of the whole system is either the same or slightly slower than the single-threaded ghc --make version. My guess is that the overhead of starting up lots of individual ghcs, each of which has to read all the .hi files all over again, just about cancels out the parallelism gains.

One way around that would be parallelizing --make, which has been a TODO for a long time. However, I believe that's never going to be satisfactory for a project involving various different languages, because ghc itself is never going to be a general-purpose build system.

So ghc --make provides two things: a dependency chaser and a way to keep the compiler resident as it compiles new files. Since the dependency chaser will never be as powerful as a real build system, it occurs to me that the only reasonable way forward is to split out the second part, by adding an --interactive flag to ghc. It would then read filenames on stdin, compiling each one in turn, only exiting when it sees EOF. Then a separate program, ghc-fe, can wrap ghc and act as a drop-in replacement for it.

It would be nice if ghc could atomically read one line from the input; then you could just start a bunch of ghcs behind a named pipe and each would steal its own work. But I don't think that's possible with unix pipes, and of course there are still a few non-unix systems out there. And ghc-fe has to wait for the compilation to finish, so I guess ghc has to print a status line when it completes (or fails) a module. But it can still be done with an external distributor program that acts like a server: it starts up n ghcs, distributes src files between them, and shuts them down when given the command:

    -- pseudocode: helpers like startup, while, tellGhc, etc. are elided
    data Status = Free | Busy

    data Ghc = Ghc
        { status :: Status
        , input :: Handle   -- 'in' is a reserved word, hence input/output
        , output :: Handle
        , pid :: Int
        }

    main = do
        origFlags <- getArgs
        ghcs <- mapM (startup origFlags) [0..cpus]
        socket <- accept
        while $ read socket >>= \case
            Quit -> return False
            Compile ghcFlags src -> do
                forkIO $ do
                    assert (ghcFlags == origFlags)
                    result <- bracket (findFreeAndMarkBusy ghcs) markFree $ \ghc -> do
                        tellGhc ghc src
                        readResult ghc
                    write socket result
                return True
        mapM_ shutdown ghcs

The ghc-fe then starts a distributor if one is not running, sends a src file, and waits for the response, acting like a drop-in replacement for the ghc cmdline. Build systems just call ghc-fe and have an extra responsibility to call ghc-fe --quit when they are done. And I guess if they know how many files they want to rebuild, it won't be worth it below a certain threshold.

So I'm wondering, does this seem reasonable and feasible? Is there a better way to do it? Even if it could be done, would it be worth it? If the answers are "yes", "maybe not", and "maybe yes", then how hard would this be to do and where should I start looking? I'm assuming start at GhcMake.hs and work outwards from there... I'm not entirely sure it would be worth it to me even if it did make full builds, say, 1.5x faster on my dual-core i5, but it's interesting to think about all the same.
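A minimal sketch of what the ghc-fe side of that protocol could look like, assuming a unix socket to the distributor and a one-request-per-line wire format; the socket path, the "ok" status line, and the framing are illustrative assumptions, not anything that exists:

    import Control.Monad (unless)
    import Network.Socket
    import System.Environment (getArgs)
    import System.Exit (ExitCode (ExitFailure), exitWith)
    import System.IO

    main :: IO ()
    main = do
        args <- getArgs
        sock <- socket AF_UNIX Stream defaultProtocol
        connect sock (SockAddrUnix "/tmp/ghc-distributor.sock")
        h <- socketToHandle sock ReadWriteMode
        hSetBuffering h LineBuffering
        hPutStrLn h (unwords args)  -- forward the whole ghc command line
        status <- hGetLine h        -- block until the distributor reports back
        hClose h
        unless (status == "ok") $ exitWith (ExitFailure 1)

Propagating ghc's real exit code and starting the distributor on demand are left out here, but both would be needed for a true drop-in replacement.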

Hi,
On Tue, Jan 24, 2012 at 4:53 AM, Evan Laforge
[...]
So ghc --make provides two things: a dependency chaser and a way to keep the compiler resident as it compiles new files. Since the dependency chaser will never be as powerful as a real build system, it occurs to me that the only reasonable way forward is to split out the second part, by adding an --interactive flag to ghc. It would then read filenames on stdin, compiling each one in turn, only exiting when it sees EOF.
There is in fact an '--interactive' flag already, 'ghc --interactive' is a synonym for 'ghci'.
So I'm wondering, does this seem reasonable and feasible? Is there a better way to do it? Even if it could be done, would it be worth it? If the answers are "yes", "maybe not", and "maybe yes", then how hard would this be to do and where should I start looking? I'm assuming start at GhcMake.hs and work outwards from there...
I'm also interested in a "build server" mode for ghc. I have written a parallel wrapper for 'ghc --make' [1], but the speed gains are not as impressive [2] as I hoped because of the duplicated work.

[1] https://github.com/23Skidoo/ghc-parmake
[2] https://gist.github.com/1360470

Hi,
On Tue, Jan 24, 2012 at 4:53 AM, Evan Laforge
So ghc --make provides two things: a dependency chaser and a way to keep the compiler resident as it compiles new files. Since the dependency chaser will never be as powerful as a real build system, it occurs to me that the only reasonable way forward is to split out the second part, by adding an --interactive flag to ghc. It would then read filenames on stdin, compiling each one in turn, only exiting when it sees EOF.
Then a separate program, ghc-fe, can wrap ghc and act as a drop-in replacement for it.
One immediate problem I see with this is linking - 'ghc --make Main.hs' is able to figure out what packages a program depends on, while 'ghc Main.o ... -o Main' requires the user to specify them manually with -package. So you'll either need to pass this information back to the parent process, or use 'ghc --make' for linking (which adds more overhead).

One immediate problem I see with this is linking - 'ghc --make Main.hs' is able to figure out what packages a program depends on, while 'ghc Main.o ... -o Main' requires the user to specify them manually with -package. So you'll either need to pass this information back to the parent process, or use 'ghc --make' for linking (which adds more overhead).
Well, figuring out dependencies is the job of the build system. I'd be perfectly happy to just invoke ghc with a hardcoded package list as I do currently, or as you said, invoke --make just to figure out the package list for me. The time is going to be dominated by linking, which is single threaded anyway, so either way works. It would be a neat feature to be able to ask ghc to figure out the packages needed for a particular file and emit them for the build system (or is there already a way to do that currently?), but it's orthogonal I think. Probably not hard though, just stick a knob on --make that prints the link line instead of running it.
There is in fact an '--interactive' flag already, 'ghc --interactive' is a synonym for 'ghci'.
Oh right, well some other name then :)
I'm also interested in a "build server" mode for ghc. I have written a parallel wrapper for 'ghc --make' [1], but the speed gains are not as impressive [2] as I hoped because of the duplicated work.
Was the duplicated work rereading .hi files, or was there something else?

Hi,
On Tue, Jan 24, 2012 at 7:04 PM, Evan Laforge
I'm also interested in a "build server" mode for ghc. I have written a parallel wrapper for 'ghc --make' [1], but the speed gains are not as impressive [2] as I hoped because of the duplicated work.
Was the duplicated work rereading .hi files, or was there something else?
I think so - according to the GHC manual, the main speed improvement comes from caching the information between compilations.

On Wed, Jan 25, 2012 at 11:42 AM, Ryan Newton
package list for me. The time is going to be dominated by linking, which is single threaded anyway, so either way works.
What is the state of incremental linkers? I thought those existed now.
I think in some specific cases. I've heard there's a Microsoft one? It would be Windows-only, of course. Is anyone using that with ghc? gold is supposed to be multi-threaded and fast (I don't know about incremental), but once again it's ELF-only. I've heard a few people talking about gold with ghc, but I don't know what the results were. Unfortunately I'm on OS X, and I don't know of any incremental or multithreaded linking here.

On 24/01/2012 03:53, Evan Laforge wrote:
I recently switched from ghc --make to a parallelized build system. I was looking forward to faster builds, and while they are much faster at figuring out what has to be rebuilt (which is most of the time for a small rebuild, since ld dominates), compilation of the whole system is either the same or slightly slower than the single-threaded ghc --make version. My guess is that the overhead of starting up lots of individual ghcs, each of which has to read all the .hi files all over again, just about cancels out the parallelism gains.
I'm slightly surprised by this - in my experience parallel builds beat --make as long as the parallelism is a factor of 2 or more. Is your dependency graph very narrow, or do you have lots of very small modules?
So I'm wondering, does this seem reasonable and feasible? Is there a better way to do it? Even if it could be done, would it be worth it? If the answers are "yes", "maybe not", and "maybe yes", then how hard would this be to do and where should I start looking? I'm assuming start at GhcMake.hs and work outwards from there...
I like the idea! And it should be possible to build this without modifying GHC at all, on top of the GHC API. As you say, you'll need a server process, which accepts command lines, executes them, and sends back the results. A local socket should be fine (and will work on both Unix and Windows).

The server process can either do the compilation itself, or have several workers. Unfortunately the workers would have to be separate processes, because the GHC API is single threaded. When a worker gets too large, just kill it and start a new one.

Cheers,
Simon
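A rough sketch of the server loop described above, assuming the network package (recent versions export bind) and a unix socket; dispatchToWorker is a made-up placeholder for handing the command line to a worker process, not a real GHC API:

    import Control.Monad (forever)
    import Network.Socket
    import System.IO

    main :: IO ()
    main = do
        sock <- socket AF_UNIX Stream defaultProtocol
        bind sock (SockAddrUnix "/tmp/ghc-server.sock")
        listen sock 4
        forever $ do
            (conn, _) <- accept sock
            h <- socketToHandle conn ReadWriteMode
            cmdline <- hGetLine h   -- one command line per connection
            result <- dispatchToWorker cmdline
            hPutStrLn h result
            hClose h

    -- Placeholder: a real version would pick a free worker process (the
    -- GHC API being single threaded) and block until that worker finishes.
    dispatchToWorker :: String -> IO String
    dispatchToWorker cmd = return ("ok: " ++ cmd)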

I'm slightly surprised by this - in my experience parallel builds beat --make as long as the parallelism is a factor of 2 or more. Is your dependency graph very narrow, or do you have lots of very small modules?
From scratch, --make (that's what 'make -j3' winds up calling) wins slightly. --make loses handily at detecting that nothing need be done :) And as expected, modifying one file is all about the linking.
I get full parallelism, 4 threads at once on a 2-core i5 with 2 hyperthreads per core, and an SSD. Maybe I should try with just 2 threads. I only ever get 200% CPU at most, so it seems like the hyperthreads are not really much like a whole core. The modules are usually around 150-250 lines. Here are the timings for an older run:

from scratch (191 modules):
    runghc Shake/Shakefile.hs build/debug/seq  128.43s user 20.04s system 178% cpu 1:23.01 total
no link:
    runghc Shake/Shakefile.hs build/debug/seq  118.92s user 19.21s system 249% cpu 55.383 total
    make -j3 build/seq  68.81s user 9.98s system 98% cpu 1:19.60 total
modify nothing:
    runghc Shake/Shakefile.hs build/debug/seq  0.65s user 0.10s system 96% cpu 0.780 total
    make -j3 build/seq  6.05s user 1.21s system 85% cpu 8.492 total
modify one file:
    runghc Shake/Shakefile.hs build/debug/seq  19.50s user 2.37s system 94% cpu 23.166 total
    make -j3 build/seq  12.81s user 1.85s system 94% cpu 15.586 total

though it's odd how --make was faster.
I like the idea! And it should be possible to build this without modifying GHC at all, on top of the GHC API. As you say, you'll need a server process, which accepts command lines, executes them, and sends back the results. A local socket should be fine (and will work on both Unix and Windows).
The server process can either do the compilation itself, or have several workers. Unfortunately the workers would have to be separate processes, because the GHC API is single threaded.
When a worker gets too large, just kill it and start a new one.
A benefit of real processes: I'm pretty confident all the memory will be GCed after the whole process is killed :) I'll start looking into the ghc api. I have no experience with it, but I assume I can look at what GhcMake.hs is doing and learn from that.

On 26/01/2012 23:37, Evan Laforge wrote:
I'm slightly surprised by this - in my experience parallel builds beat --make as long as the parallelism is a factor of 2 or more. Is your dependency graph very narrow, or do you have lots of very small modules?
I get full parallelism, 4 threads at once on a 2-core i5 with 2 hyperthreads per core, and an SSD. Maybe I should try with just 2 threads. I only ever get 200% CPU at most, so it seems like the hyperthreads are not really much like a whole core.
The modules are usually around 150-250 lines. Here are the timings for an older run:
from scratch (191 modules):
    runghc Shake/Shakefile.hs build/debug/seq  128.43s user 20.04s system 178% cpu 1:23.01 total
no link:
    runghc Shake/Shakefile.hs build/debug/seq  118.92s user 19.21s system 249% cpu 55.383 total
    make -j3 build/seq  68.81s user 9.98s system 98% cpu 1:19.60 total
This looks a bit suspicious. The Shake build is doing nearly twice as much work as the --make build, in terms of CPU time, but because it is getting nearly 2x parallelism it comes in a close second. How many processes is the Shake build using?

I'd investigate this further. Are you sure there's no swapping going on? How many processes is the Shake build creating - perhaps too many?

Cheers,
Simon

Hi Simon,

I have found that a factor of 2 parallelism is required on Linux to draw with ghc --make. In particular:

    GHC --make     = 7.688
    Shake -j1      = 11.828 (of which 11.702 is spent running system commands)
    Shake full -j4 = 7.414 (of which 12.906 is spent running system commands)

This is for a Haskell program which has several bottlenecks; you can see a graph of spawned processes here: http://community.haskell.org/~ndm/darcs/shake/academic/icfp2012/profile.eps - everything above the 1 mark is more than one process in parallel, so it gets to 4 processes, but not all the time - roughly an average of ~2x parallelism.

On Windows the story is much worse. If you -j4 then the time spent executing system commands shoots up from ~15s to around ~25s, since even on a 4-core machine the contention between the processes is high. I tried investigating this, checking for things like a locked file (none I can find), or disk/CPU/memory contention (it's basically taking no system resources), but couldn't find anything.

If you specify -O2 then the parallel performance also goes down - I suspect because each ghc process needs to read inline information for packages that are imported multiple times, and ghc --make gets away with doing that once?
This looks a bit suspicious. The Shake build is doing nearly twice as much work as the --make build, in terms of CPU time, but because it is getting nearly 2x parallelism it comes in a close second. How many processes is the Shake build using?
Shake uses at most the number of processes you specify - it never exceeds the -j flag - so in the above example it caps out at 4. It is very good at getting parallelism (I believe it to be perfect, but the code is 150 lines of IORef twiddling, so I wouldn't guarantee it), and very safe about never exceeding the cap you specify (I think I can even prove that, for some value of proof). The profiling makes it easy to verify these claims after the fact.

Thanks,
Neil
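Not Shake's actual code, but the invariant described here - never exceeding the -j cap - can be sketched with a quantity semaphore:

    import Control.Concurrent.QSem
    import Control.Exception (bracket_)

    -- Returns a wrapper that runs an action only while holding one of the
    -- j slots, so at most j wrapped actions ever run at the same time.
    newJobPool :: Int -> IO (IO a -> IO a)
    newJobPool j = do
        sem <- newQSem j
        return (bracket_ (waitQSem sem) (signalQSem sem))

Wrapping every spawned system command in the returned function enforces the cap, and bracket_ releases the slot even if the command throws.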

I like the idea! And it should be possible to build this without modifying GHC at all, on top of the GHC API. As you say, you'll need a server process, which accepts command lines, executes them, and sends back the results. A local socket should be fine (and will work on both Unix and Windows).
I took a whack at this, but I'm having to backtrack a bit now because I don't fully understand the GHC API, so I thought I should explain my understanding to make sure I'm on the right track.

It appears the cached information I want to preserve between compiles is in HscEnv. At first I thought I could just do what --make does, but what it does is call 'GHC.load', which maintains the HscEnv (which mostly means loading already compiled modules into the HomePackageTable, since the other cache entries are apparently loaded on demand by DriverPipeline.compileFile). But actually it does a lot of things, such as detecting that a module doesn't need recompilation and directly loading the interface in that case. So I thought it would be quickest to just use it: add a new target to the set of targets and call load again.

However, there are problems with that. The first is it doesn't pay attention to DynFlags.outputFile, which makes sense because it's expecting to compile multiple files. The bigger problem is that it apparently wants to reload the whole set each time, so it winds up being slower rather than faster. I guess 'load' is really set up to figure out dependencies on its own and compile a set of modules, so I'm talking at the wrong level.

So I think I need to rewrite the HPT-maintaining parts of GHC.load and write my own compileFile that *does* maintain the HPT. And also figure out what other parts of the HscEnv should be updated, if any. Sound about right?

Along the way I ran into the problem that it's impossible to re-parse GHC flags to compare them to previous runs, because static flags only export a parsing function that mutates global variables and can only be called once. So I parse out the dynamic flags, strip out the *.hs args, and assume the rest are static flags. I noticed comments about converting them all to dynamic, I guess that might make a nice housekeeping project some day.
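For concreteness, the "add a new target and call load again" attempt described above might look roughly like this, assuming the GHC API of this era and the ghc-paths package for libdir; the file names stand in for whatever arrives over the socket:

    import GHC
    import GHC.Paths (libdir)
    import Control.Monad.IO.Class (liftIO)

    main :: IO ()
    main = runGhc (Just libdir) $ do
        dflags <- getSessionDynFlags
        _ <- setSessionDynFlags dflags
        mapM_ compileOne ["A.hs", "B.hs"]
      where
        compileOne src = do
            t <- guessTarget src Nothing
            addTarget t
            -- load re-examines the whole accumulated target set each time,
            -- which is the slowdown described above
            ok <- load LoadAllTargets
            liftIO $ putStrLn (src ++ if succeeded ok then ": ok" else ": failed")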

On 10/02/2012 08:01, Evan Laforge wrote:
I like the idea! And it should be possible to build this without modifying GHC at all, on top of the GHC API. As you say, you'll need a server process, which accepts command lines, executes them, and sends back the results. A local socket should be fine (and will work on both Unix and Windows).
I took a whack at this, but I'm having to backtrack a bit now because I don't fully understand the GHC API, so I thought I should explain my understanding to make sure I'm on the right track.
It appears the cached information I want to preserve between compiles is in HscEnv. At first I thought I could just do what --make does, but what it does is call 'GHC.load', which maintains the HscEnv (which mostly means loading already compiled modules into the HomePackageTable, since the other cache entries are apparently loaded on demand by DriverPipeline.compileFile). But actually it does a lot of things, such as detecting that a module doesn't need recompilation and directly loading the interface in that case. So I thought it would be quickest to just use it: add a new target to the set of targets and call load again.
However, there are problems with that. The first is it doesn't pay attention to DynFlags.outputFile, which makes sense because it's expecting to compile multiple files. The bigger problem is that it apparently wants to reload the whole set each time, so it winds up being slower rather than faster. I guess 'load' is really set up to figure out dependencies on its own and compile a set of modules, so I'm talking at the wrong level.
So I think I need to rewrite the HPT-maintaining parts of GHC.load and write my own compileFile that *does* maintain the HPT. And also figure out what other parts of the HscEnv should be updated, if any. Sound about right?
What you're trying to do is mimic the operation of 'ghc -c Foo.hs ..' but cache any loaded interface files and re-use them. This means you need to retain the contents of HscEnv (as you say), because that contains the cached information. However, the GHC API doesn't provide a way to do this directly (I hadn't really thought about this when I suggested the idea before, sorry).

The GHC API provides support for compiling multiple modules in the way that GHCi and --make work; each module is added to the HPT as it is compiled. But when compiling single modules, GHC doesn't normally use the HPT - interfaces for modules in the home package are normally demand-loaded in the same way as interfaces for package modules, and added to the PIT. The crucial difference between the HPT and the PIT is that the PIT supports demand-loading of interfaces, but the HPT is supposed to be populated in the right order by the compilation manager - home package modules are assumed to be present in the HPT when they are required.

For 'ghc -c Foo.hs' you want to demand-load interfaces for other modules in the same package (and cache them), but you want them to not get mixed up with interfaces from other packages that may be being compiled simultaneously by other clients. There's no easy way to solve this. You could avoid the problem by not caching home-package interfaces, but that may throw away a lot of the benefit of doing this. Or you could maintain some kind of session state with the client over multiple compilations, and only discard the home package interfaces if another client connects.

There are further complications in that certain flags can invalidate the information you have cached: changing the package flags, for instance. So I think some additions to the API are almost certainly needed. But this is as far as I have got in thinking about the problem...

Cheers,
Simon
Along the way I ran into the problem that it's impossible to re-parse GHC flags to compare them to previous runs, because static flags only export a parsing function that mutates global variables and can only be called once. So I parse out the dynamic flags, strip out the *.hs args, and assume the rest are static flags. I noticed comments about converting them all to dynamic, I guess that might make a nice housekeeping project some day.

However, the GHC API doesn't provide a way to do this directly (I hadn't really thought about this when I suggested the idea before, sorry).

The GHC API provides support for compiling multiple modules in the way that GHCi and --make work; each module is added to the HPT as it is compiled. But when compiling single modules, GHC doesn't normally use the HPT - interfaces for modules in the home package are normally demand-loaded in the same way as interfaces for package modules, and added to the PIT. The crucial difference between the HPT and the PIT is that the PIT supports demand-loading of interfaces, but the HPT is supposed to be populated in the right order by the compilation manager - home package modules are assumed to be present in the HPT when they are required.
Yah, that's what I don't understand about HscEnv. The HPT doc says that in one-shot mode, the HPT is empty and even local modules are demand-cached in the ExternalPackageState (which the PIT belongs to). And the EPS doc itself reinforces that where it says in one-shot mode "home-package modules accumulate in the external package state". So why not just ignore the HPT, and run multiple "one-shot" compiles, and let all the info accumulate in the PIT?

A fair amount of work in GhcMake is concerned with trimming old data out of the HPT, I assume this is for ghci that wants to reload changed modules but keep unchanged ones. I don't actually care about that since I can assume the modules will be unchanged over one run.

So I tried just calling compileFile multiple times in the same GhcMonad, assuming the mutable bits of the HscEnv get updated appropriately. Here are the results for a build of about 200 modules:

with persistent server:
no link:
    3.30s user 1.60s system 12% cpu 38.323 total
    3.50s user 1.66s system 13% cpu 38.368 total
link:
    21.66s user 4.13s system 35% cpu 1:11.62 total
    21.59s user 4.54s system 38% cpu 1:08.13 total
    21.82s user 4.70s system 35% cpu 1:14.56 total

without server (ghc -c):
no link:
    109.25s user 19.90s system 240% cpu 53.750 total
    109.11s user 19.23s system 243% cpu 52.794 total
link:
    128.10s user 21.66s system 201% cpu 1:14.29 total

ghc --make (with linking since I can't turn that off):
    42.57s user 5.83s system 74% cpu 1:05.15 total

The 'user' is low for the server because it doesn't count time spent by the subprocesses on the other end of the socket, but excluding linking it looks like I can shave about 25% off compile time. Unfortunately it winds up being just about the same as ghc --make, so it seems too low. Perhaps I should be using the HPT? I'm also falling back to plain ghc for linking, maybe --make can link faster when it has everything cached? I guess it shouldn't, because it presumably just dispatches to ld.
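The "compileFile multiple times in the same GhcMonad" loop might look roughly like this; DriverPipeline is an internal GHC module, so this is a sketch against the API of this era rather than a stable interface:

    import GHC
    import GHC.Paths (libdir)
    import DriverPipeline (compileFile)
    import DriverPhases (Phase (StopLn))
    import Control.Monad.IO.Class (liftIO)

    main :: IO ()
    main = runGhc (Just libdir) $ do
        dflags <- getSessionDynFlags
        _ <- setSessionDynFlags dflags
        hscEnv <- getSession
        -- one-shot compile each file to object code, reusing the caches in
        -- the session's HscEnv; the names stand in for files off the socket
        liftIO $ mapM_ (\src -> compileFile hscEnv StopLn (src, Nothing))
                       ["A.hs", "B.hs"]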

On 17/02/2012 01:59, Evan Laforge wrote:
However, the GHC API doesn't provide a way to do this directly (I hadn't really thought about this when I suggested the idea before, sorry). The GHC API provides support for compiling multiple modules in the way that GHCi and --make work; each module is added to the HPT as it is compiled. But when compiling single modules, GHC doesn't normally use the HPT - interfaces for modules in the home package are normally demand-loaded in the same way as interfaces for package modules, and added to the PIT. The crucial difference between the HPT and the PIT is that the PIT supports demand-loading of interfaces, but the HPT is supposed to be populated in the right order by the compilation manager - home package modules are assumed to be present in the HPT when they are required.
Yah, that's what I don't understand about HscEnv. The HPT doc says that in one-shot mode, the HPT is empty and even local modules are demand-cached in the ExternalPackageState (which the PIT belongs to). And the EPS doc itself reinforces that where it says in one-shot mode "home-package modules accumulate in the external package state".
So why not just ignore the HPT, and run multiple "one-shot" compiles, and let all the info accumulate in the PIT?
Sure, except that if the server is to be used by multiple clients, you will get clashes in the PIT when say two clients both try to compile a module with the same name. The PIT is indexed by Module, which is basically the pair (package,modulename), and the package for the main program is always the same: "main". This will work fine if you spin up a new server for each program you want to build - maybe that's fine for your use case? Don't forget to make sure the GhcMode is set to OneShot, not CompManager, BTW.
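The OneShot reminder in sketch form - GhcMode and its OneShot constructor live in DynFlags, re-exported by the GHC module:

    import GHC

    -- Flip the session into one-shot mode before compiling, so home-package
    -- interfaces are demand-loaded into the PIT rather than expected in the HPT.
    setOneShot :: GhcMonad m => m ()
    setOneShot = do
        dflags <- getSessionDynFlags
        _ <- setSessionDynFlags (dflags { ghcMode = OneShot })
        return ()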
A fair amount of work in GhcMake is concerned with trimming old data out of the HPT, I assume this is for ghci that wants to reload changed modules but keep unchanged ones. I don't actually care about that since I can assume the modules will be unchanged over one run.
So I tried just calling compileFile multiple times in the same GhcMonad, assuming the mutable bits of the HscEnv get updated appropriately. Here are the results for a build of about 200 modules:
with persistent server:
no link:
    3.30s user 1.60s system 12% cpu 38.323 total
    3.50s user 1.66s system 13% cpu 38.368 total
link:
    21.66s user 4.13s system 35% cpu 1:11.62 total
    21.59s user 4.54s system 38% cpu 1:08.13 total
    21.82s user 4.70s system 35% cpu 1:14.56 total
without server (ghc -c):
no link:
    109.25s user 19.90s system 240% cpu 53.750 total
    109.11s user 19.23s system 243% cpu 52.794 total
link:
    128.10s user 21.66s system 201% cpu 1:14.29 total
ghc --make (with linking since I can't turn that off):
    42.57s user 5.83s system 74% cpu 1:05.15 total
Yep, it seems to be doing the right thing.
The 'user' is low for the server because it doesn't count time spent by the subprocesses on the other end of the socket, but excluding linking it looks like I can shave about 25% off compile time. Unfortunately it winds up being just about the same as ghc --make, so it seems too low.
But that's what you expect, isn't it?
Perhaps I should be using the HPT? I'm also falling back to plain ghc for linking, maybe --make can link faster when it has everything cached? I guess it shouldn't, because it presumably just dispatches to ld.
--make has a slight advantage for linking in that it knows which packages it needs to link against, whereas plain ghc will link against all the packages on the command line.

Cheers,
Simon

Sure, except that if the server is to be used by multiple clients, you will get clashes in the PIT when say two clients both try to compile a module with the same name.
The PIT is indexed by Module, which is basically the pair (package,modulename), and the package for the main program is always the same: "main".
This will work fine if you spin up a new server for each program you want to build - maybe that's fine for your use case?
Yep, I have a new compile server for each CPU. So compiling one program will start up (say) 4 compile servers and one distributor. Then shake will start throwing source files at the distributor, in the proper dependency order, and the distributor will distribute the input files among the 4 compile servers. Each compile server is single-threaded, so I don't have to worry about calling GHC functions reentrantly.

But --make is single-threaded as well, so why doesn't it just call compileFile repeatedly instead of bothering with all that HPT stuff? Is it just for ghci?
The 'user' is low for the server because it doesn't count time spent by the subprocesses on the other end of the socket, but excluding linking it looks like I can shave about 25% off compile time. Unfortunately it winds up being just about the same as ghc --make, so it seems too low.
But that's what you expect, isn't it?
It's surprising to me that the serial --make is just about the same speed as a parallelized one. The whole point was to compile faster! Granted, each interface has to be loaded for each processor while --make only needs to do it once, but once loaded they should stay loaded and I'd expect the benefit from two processors would win out pretty quickly.
--make has a slight advantage for linking in that it knows which packages it needs to link against, whereas plain ghc will link against all the packages on the command line.
Ohh, so maybe with --make it can omit some packages and do less work. Let me try minimizing the -packages and see if that helps.

As an aside, it would be handy to be able to ask ghc "given this main module, which -packages should the final program get?" but not actually compile anything. Is there a way to do that, short of writing my own with the ghc api? Would it be a reasonable ghc flag, along the lines of -M but for packages?

BTW, in case anyone is interested, a darcs repo is at http://ofb.net/~elaforge/ghc-server/

On 17/02/2012 18:12, Evan Laforge wrote:
Sure, except that if the server is to be used by multiple clients, you will get clashes in the PIT when say two clients both try to compile a module with the same name.
The PIT is indexed by Module, which is basically the pair (package,modulename), and the package for the main program is always the same: "main".
This will work fine if you spin up a new server for each program you want to build - maybe that's fine for your use case?
Yep, I have a new compile server for each CPU. So compiling one program will start up (say) 4 compile servers and one distributor. Then shake will start throwing source files at the distributor, in the proper dependency order, and the distributor will distribute the input files among the 4 compile servers. Each compile server is single-threaded, so I don't have to worry about calling GHC functions reentrantly.
But --make is single-threaded as well, so why doesn't it just call compileFile repeatedly instead of bothering with all that HPT stuff? Is it just for ghci?
That might be true, but I'm not completely sure. The HPT stuff was added with a continuous edit-recompile cycle in mind (i.e. for GHCi), and we added --make at the same time because it fitted nicely. It might be that just calling compileFile repeatedly works, and it would end up storing the interfaces for the home-package modules in the PackageIfaceTable, but we never considered this use case. One thing that worries me: will it be reading the .hi file for a module off the disk after compiling it? I suspect it might, whereas the HPT method will be caching the iface in the HPT.
The 'user' is low for the server because it doesn't count time spent by the subprocesses on the other end of the socket, but excluding linking it looks like I can shave about 25% off compile time. Unfortunately it winds up being just about the same as ghc --make, so it seems too low.
But that's what you expect, isn't it?
It's surprising to me that the serial --make is just about the same speed as a parallelized one. The whole point was to compile faster!
Ah, so maybe the problem is that the compileFile method is re-reading .hi files off the disk (and typechecking them), and that is making it slower.
Granted, each interface has to be loaded for each processor while --make only needs to do it once, but once loaded they should stay loaded and I'd expect the benefit from two processors would win out pretty quickly.
--make has a slight advantage for linking in that it knows which packages it needs to link against, whereas plain ghc will link against all the packages on the command line.
Ohh, so maybe with --make it can omit some packages and do less work. Let me try minimizing the -packages and see if that helps.
As an aside, it would be handy to be able to ask ghc "given this main module, which -packages should the final program get?" but not actually compile anything. Is there a way to do that, short of writing my own with the ghc api? Would it be a reasonable ghc flag, along the lines of -M but for packages?
I don't think we can calculate the package dependencies without knowing the ModIface, which is generated by compiling (or at least typechecking) each module.

Cheers,
Simon
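For what it's worth, once a ModIface does exist, the package dependencies live in its Dependencies record; a sketch, with the caveat that the element type of dep_pkgs has varied between releases (hence no type signature):

    import HscTypes (dep_pkgs, mi_deps)

    -- The packages a compiled (or typechecked) module depends on.
    packagesOf iface = dep_pkgs (mi_deps iface)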
BTW, in case anyone is interested, a darcs repo is at http://ofb.net/~elaforge/ghc-server/
participants (5): Evan Laforge, Mikhail Glushenkov, Neil Mitchell, Ryan Newton, Simon Marlow