GHC build times on newer MacBook Pros?

Hi! I'm looking to upgrade my laptop so that I can (among other things) compile GHC faster. I'll either get the 13" 2.7GHz dual-core Intel Core i7 model or the 15" 2.2GHz quad-core Intel Core i7 model. Anyone know if it's worth getting the 15" model? According to these benchmarks it should be quite a bit faster: http://www.primatelabs.ca/geekbench/mac-benchmarks/ but I don't know if I can get enough parallelism out of GHC's build to use all 4 cores in the 15" model. -- Johan

Hi, go for 4 cores if the price is not prohibitive. I'm using a Q6600 here and all cores are quite busy, except for the configuration and compilation steps done by cabal (if only cabal were parallel too!). On ARM/Linux, a dual-core Cortex-A9 (OMAP4430), I've noticed that the build process sometimes strangely occupies just one core, but I don't have the resources to track down the culprit; the majority of the time both cores are 100% busy... Karel

I'm using a MBP with a quad-core 2GHz Core i7; it has 8 hardware threads in total. GHC's build process using 'make -j9' or 'make -j12' totally saturates all my cores. I believe I can clock a full build at well under 10 minutes (with BuildFlavour = quick in mk/build.mk). For comparison, I also have a dual-core 2GHz Core i7 (4 hardware threads) in a Lenovo sitting next to it running Linux, and a full GHC build takes a bit longer. I can get real numbers later if it's actually that interesting.
I'd recommend getting the quad-core machine, if at all possible.
-- Regards, Austin
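For reference, the quick flavour mentioned above is selected in mk/build.mk; a minimal sketch, assuming the usual workflow of copying mk/build.mk.sample (which contains the actual option stanzas for each flavour):

    # mk/build.mk -- minimal sketch. Copy mk/build.mk.sample to
    # mk/build.mk and uncomment the flavour you want; "quick" builds
    # an unoptimised stage-2 compiler, which is much faster.
    BuildFlavour = quick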

Sounds promising. Now I just have to decide whether to get the 2.2 or
2.3 GHz version. I suspect the latter is a bit overpriced.

I have a 16-core machine at work (with 48 GB of RAM, a perk of the job :)). GHC can saturate them all; I can validate GHC in well under 10 minutes on it.
I also just got the 15" Core i7 2.3GHz less than a week ago. It's a very nice machine, so I would recommend it. But yeah, the 2.2GHz is the better value for money.

On Tue, Aug 23, 2011 at 10:24 AM, David Terei wrote:
I have a 16 core machine at work (with 48GB of ram, a perk of the job :)). GHC can saturate them all. Can validate GHC in well under 10 minutes on it.
To wander a bit from the topic: when I first saw this I thought "wow, ghc builds in parallel now, I want that", but then I realized it's because ghc itself uses make, not --make. --make's automatic dependencies are convenient, but since it figures out dependencies on every build and isn't parallel, plain make should be a lot faster. Also, --make doesn't understand the hsc->hs link, so in practice I have to do a fair amount of manual dependencies anyway. So it inspired me to try to switch from --make to make for my own project.

I took a look at the ghc build system, and even after reading the documentation it's hard for me to understand. The first issue is how to get ghc -M to understand hsc2hs. My workaround was to fetch *.hsc and have 'make depend' depend on $(patsubst %.hsc, %.hs, $(all_hsc)), so that by the time ghc -M runs it can find the .hs files (a sketch follows this message). The more perplexing issue is that I'm used to using -odir and -hidir with --make to maintain separate trees of .o and .hi files built for profiling and testing, but I'm not sure how make does that, and fiddling with VPATH has been unsuccessful so far. Otherwise I could do wholesale preprocessing of the ghc-generated deps file, but that seems clunky, in addition to tripling its size. I know ghc has "ways", and it's hard for me to read the rules/*dependencies* stuff, but I don't think ghc is doing that. Maybe it just doesn't allow profiling and non-profiling to coexist? Maybe I shouldn't be asking make questions on the ghc list, but it's related to how ghc does -odir and -hidir and the best way to build haskell, so it's at least somewhat relevant :)

To bring it back to ghc a bit: wouldn't it be nice if there didn't have to be a tradeoff between fast but awkward to set up vs. slow but convenient? For larger projects either make or something with equivalent power is probably necessary, given that C, hsc2hs, etc. all need to be integrated, but a wiki page with some make recipes for ghc could help a bunch there. I'd be happy to put one up as soon as I figure out the current situation.

Then there's simply making --make faster... I saw a talk about a failed attempt to parallelize ghc, but it seems like he was trying to parallelize the compiler itself... why not take the make approach and simply start many ghcs? You'd have to pay the ghc startup cost, but relatively speaking I think that's pretty fast nowadays. Or you could do a work-stealing kind of thing where ghc marks a file as "in progress" and each ghc tries to grab a file to compile. Then you just start a whole bunch of 'ghc --make's and let them fight it out.
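A minimal sketch of the hsc2hs workaround just described, assuming GNU make and ghc's -M/-dep-makefile flags; the variable names and paths (all_hsc, src/, Main.hs, depend.mk) are illustrative, not from any real build system:

    # Generate .hs from .hsc up front, so that ghc -M can see the real
    # Haskell sources when it computes dependencies.
    # (Recipe lines must begin with a literal tab.)
    all_hsc := $(wildcard src/*.hsc)
    all_hs_from_hsc := $(patsubst %.hsc,%.hs,$(all_hsc))

    %.hs: %.hsc
    	hsc2hs $<

    # ghc -M chases imports from the root module and writes
    # make-format dependency rules to depend.mk.
    depend: $(all_hs_from_hsc)
    	ghc -M -dep-makefile depend.mk src/Main.hs

    -include depend.mk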

I'm confused by this as well. Parallelizing --make was one of the first case studies in the SMP runtime paper (section 7 of "Haskell on a Shared-Memory Multiprocessor"). There's also a trac ticket, http://hackage.haskell.org/trac/ghc/ticket/910, with a vague comment that the patch from the paper "almost certainly isn't ready for prime time", but I haven't seen any description of specific problems.
Brandon

From what I remember someone tried to parallelize GHC but it turned out to be tricky in practice. At the moment we're trying to parallelize Cabal instead, which would allow us to build packages/modules in parallel using ghc -c and let Cabal handle dependency management (including preprocessing of .hsc files).
Johan

Right, that's probably the one I mentioned. And I think he was trying to parallelize ghc internally, so even compiling one file could parallelize. That would be cool and all, but it seems like a lot of work compared to just parallelizing at the file level, as make would do.
A parallel cabal build would be excellent, but AFAIK not much help for mixed-language projects, though I admit I haven't tried cabal for that yet. I'm sure it could launch make to build the C, but can it track .h -> .hsc dependencies? A parallel cabal build would tempt me to give it a try.

It was Thomas Schilling, and he wasn't trying to parallelise the compilation of a single file. He was just trying to make access to the various bits of shared state GHC uses thread safe. This mostly worked but caused an unacceptable performance penalty to single-threaded compilation. Max

The performance problem was due to the use of unsafePerformIO and other thunk-locking functions. Such functions can cause severe slowdowns when the stack is deep, because they need to traverse the stack in order to atomically claim thunks that might be under evaluation by multiple threads.
The latest version of GHC should no longer have this problem (or not as severely), because the stack is now split into chunks (see [1] for performance tuning options), only one of which needs to be scanned. So it might be worth a try to re-apply that thread-safety patch.
[1]: https://plus.google.com/107890464054636586545/posts/LqgXK77FgfV
-- Push the envelope. Watch it bend.

I think I would do it differently. Rather than using unsafePerformIO, use unsafeDupablePerformIO with an atomic, idempotent operation. Looking up or adding an entry to the FastString table can be done using an atomicModifyIORef, so this should be fine. The other place you have to look at carefully is the NameCache; again, an atomicModifyIORef should do the trick there.
In GHC 7.2.1 we also have a casMutVar# primitive which can be used to build lower-level atomic operations, so that might come in handy too.
Cheers, Simon
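A minimal sketch of the idempotent-insert pattern Simon describes, with an illustrative global table standing in for GHC's FastString table (intern, table, and the use of Data.Map are made up for the example, not GHC's actual code):

    import Data.IORef (IORef, newIORef, atomicModifyIORef)
    import qualified Data.Map as Map
    import System.IO.Unsafe (unsafeDupablePerformIO, unsafePerformIO)

    -- Illustrative stand-in for the global FastString table.
    table :: IORef (Map.Map String Int)
    table = unsafePerformIO (newIORef Map.empty)
    {-# NOINLINE table #-}

    -- Look a string up, inserting it with a fresh id if absent. The
    -- lookup and the insert happen inside one atomicModifyIORef, so
    -- the operation is idempotent: if unsafeDupablePerformIO runs it
    -- twice in racing threads, whichever commits second finds the
    -- entry already present, and both callers get the same id. No
    -- thunk locking (and hence no stack traversal) is needed.
    intern :: String -> Int
    intern s = unsafeDupablePerformIO $
      atomicModifyIORef table $ \m ->
        case Map.lookup s m of
          Just i  -> (m, i)
          Nothing -> let i = Map.size m in (Map.insert s i m, i)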

On Mon, Aug 29, 2011 at 1:50 PM, Max Bolingbroke wrote:
It was Thomas Schilling, and he wasn't trying to parallelise the compilation of a single file. He was just trying to make access to the various bits of shared state GHC uses thread safe. This mostly worked but caused an unacceptable performance penalty to single-threaded compilation.
Interesting, maybe I misremembered? Or maybe there was some other guy who was trying to parallelize?
Just out of curiosity, what benefit does a thread-safe ghc provide? I know ghc api users have to go to some bother to avoid calling it re-entrantly... what neat stuff could we do with a re-entrant ghc? Could it eventually lead to an internally parallel ghc, or are there deeper reasons it's hard to parallelize compilation? That would be really cool, if possible. In fact, I don't know of any parallel compilers.

Yes, the plan was to eventually have a parallel --make mode.

On 1 September 2011 08:44, Evan Laforge wrote:
If that's the goal, wouldn't it be easier to start many ghcs?
Yes. With Scion I'm in the process of moving away from using GHC's compilation manager (i.e., --make) towards a multi-process setup. This has a number of advantages:
- Less memory usage. Loading lots of modules (e.g., GHC itself) can take up to 1G of memory. There are also a number of caches that can only be flushed by restarting the session.
- Sidestepping a few bugs in the compilation manager, such as non-flushable instance caches which lead to spurious instance overlaps. (Sorry, can't find the corresponding ticket right now.)
- An external compilation manager (e.g., Shake) can also handle preprocessing of other extensions, such as .y, .chs, etc.
- Support for different static flags (e.g., -prof). Static flags should eventually be removed from GHC, but it's low-priority and difficult to do.
- Uniform handling of compilation with multiple versions of GHC.
- Parallel building, as you mentioned.
There may be more. It also comes with disadvantages, such as the need to serialise more data, but I think it's worth it. This is the main reason why I stopped working on a thread-safe GHC.
Personally, I believe the GHC API should just include a simple API for compiling a single module and returning some binary value (i.e., not automatically writing things to a file). Everything else, including GHCi, should be separate. But that's a different matter...
-- Push the envelope. Watch it bend.

On 01/09/2011 08:44, Evan Laforge wrote:
If that's the goal, wouldn't it be easier to start many ghcs?
It's an interesting idea that I hadn't thought of. There would have to be an atomic file system operation to "commit" a compiled module - getting that right could be a bit tricky (compilation isn't deterministic, so the commit has to be atomic). Then you would probably want to randomise the build order of each --make run to maximise the chance that each GHC does something different. Fun project for someone? Cheers, Simon

I suppose you could just rename it into place when you're done. -Edward

I was imagining that it could create Module.o.compiling and then rename it into place when it's done. Then each ghc would do a work-stealing thing where it tries to find output to produce that doesn't have an accompanying .compiling file, or sleeps for a bit if all work at this stage is already taken, which is likely to happen since the graph sometimes goes through a bottleneck. Then it's easy to clean up if work gets interrupted: just rm **/*.compiling.
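A sketch of the claim-and-commit steps, with one substitution: instead of exclusively creating the .compiling file itself, this uses createDirectory as the atomic claim, since mkdir fails atomically if the path already exists and base has no portable exclusive file creation; the names and layout are illustrative:

    import Control.Exception (IOException, try)
    import System.Directory (createDirectory, removeDirectory, renameFile)

    -- Try to claim the right to produce obj; exactly one of several
    -- racing ghc processes will succeed at the mkdir.
    claim :: FilePath -> IO Bool
    claim obj = do
      r <- try (createDirectory (obj ++ ".compiling"))
             :: IO (Either IOException ())
      return (either (const False) (const True) r)

    -- After compiling into a temporary file, commit the result with
    -- an atomic rename and release the claim. An interrupted build
    -- leaves only *.compiling entries behind, which are easy to
    -- clean up.
    commit :: FilePath -> FilePath -> IO ()
    commit tmp obj = do
      renameFile tmp obj
      removeDirectory (obj ++ ".compiling")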

Right, using a Module.o.compiling file as a lock would work.
Another way to do this would be to have GHC --make invoke itself to compile each module separately. Actually I think I prefer this method, although it might be a bit slower, since each individual compilation has to read lots of interface files. The main GHC --make process would do the final link only. A fun hack for somebody?
Cheers, Simon
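A toy sketch of the driver shape Simon suggests: the --make process computes the module graph, then runs one-shot compilations level by level, compiling each level's modules in parallel child processes. The grouping into levels and all error handling are elided, and compileLevel and the use of rawSystem are illustrative:

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Monad (forM)
    import System.Process (rawSystem)

    -- Compile one dependency level's modules in parallel, each in a
    -- fresh one-shot ghc, and wait for them all to finish.
    compileLevel :: [FilePath] -> IO ()
    compileLevel mods = do
      dones <- forM mods $ \m -> do
        done <- newEmptyMVar
        _ <- forkIO $ do
          _exit <- rawSystem "ghc" ["-c", m]  -- exit code ignored here
          putMVar done ()
        return done
      mapM_ takeMVar dones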

Hi, on Friday, 2 September 2011 at 09:07 +0100, Simon Marlow wrote:
Another way to do this would be to have GHC --make invoke itself to compile each module separately. Actually I think I prefer this method, although it might be a bit slower since each individual compilation has to read lots of interface files. The main GHC --make process would do the final link only. A fun hack for somebody?
this would also help building large libraries on architectures with little memory, as it seems to me that when one ghc instance compiles multiple modules in a row, some leaked memory/unevaluated thunks pile up and eventually cause the compilation to abort. I suspect that building each file on its own avoids this issue. (But this is only based on observation, not on hard facts.)
Greetings, Joachim
-- Joachim "nomeata" Breitner
mail@joachim-breitner.de | nomeata@debian.org | GPG: 0x4743206C
xmpp: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/

Another way to do this would be to have GHC --make invoke itself to compile each module separately. Actually I think I prefer this method, although it might be a bit slower since each individual compilation has to read lots of interface files. The main GHC --make process would do the final link only. A fun hack for somebody?
In my experience, reading all those .hi files is not so quick: about 1.5s for around 200 modules, on an SSD. It gets worse with a pgmF, since ghc wants to preprocess each file; it's a minimum of 5s even with 'cat' as the preprocessor.
Part of my wanting to use make instead of --make was to avoid this re-preprocessing delay. It's nice that --make automatically notices which modules to recompile if a CPP define changes, but not so nice that it takes a lot of time to figure that out on every single compile, or for a preprocessor that doesn't even have the power to change whether the module should be recompiled.

Ah, but you're measuring the startup time of ghc --make, which is not the same as the work that each individual ghc would do if ghc were invoked separately on each module, for two reasons:
- when used in one-shot mode (i.e. without --make), ghc only reads and processes the interface files it needs, lazily
- the individual ghc's would not need to preprocess modules; that would only be done once, by the master process, before starting the subprocesses. The preprocessed source would be cached, exactly as it is now by --make.
Cheers, Simon
participants (11)
- austin seipp
- Brandon Moore
- David Terei
- Edward Kmett
- Evan Laforge
- Joachim Breitner
- Johan Tibell
- Karel Gardas
- Max Bolingbroke
- Simon Marlow
- Thomas Schilling