
Folks,

first of all, I remember someone recently mentioned an issue with decreased parallelism of the GHC build somewhere, but I can't find it now. Sorry about that; if it was on this mailing list, I would have used that thread.

Anyway, while working on the SPARC NCG I'm using a T2000, which provides a 32-thread/8-core UltraSPARC T1 CPU. The property of this machine is that it's really slow on single-threaded work; to squeeze some performance out of it, one really needs to push 32 threads of work at it. Now, it really hurts my nerves to see the build lazily running just one or two ghc processes. To verify this, I've created a simple script to collect the number of ghc processes over time and plot it as a graph. The result is in the attached picture. The graph is the result of running:

    gmake -j64

Anyway, the average number of running ghc processes is 4.4 and the median value is 2. IMHO such a low number not only hurts build times on something like a CMT SPARC machine, but also on, let's say, a cluster of ARM machines using NFS, and also on common engineering workstations, which these days provide (IMHO!) around 8-16 cores (and double that number of hardware threads).

My naive ideas for fixing this issue are (I'm assuming no Haskell file has unused imports here, but perhaps this may also be investigated):

1) provide explicit dependencies which guide make to build in a more optimal way

2) hack GHC's make depend to compute the explicit dependencies from (1) in an optimal way automatically

3) someone already mentioned using Shake for building GHC. I don't know Shake, but perhaps this is the right direction?

4) hack GHC to compile a needed hi file directly in memory if the hi file is not (yet!) available (there is the issue of how to get the compilation options right here). Also, I don't know the hi file semantics yet, so bear with me on this.

Is there anything else which may be done to fix this issue? Is someone already working on any of those (I mean the reasonable ones from the list)?

Thanks!
Karel
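P.S. The collecting script is conceptually just a sampling loop along these lines (an illustrative sketch, not the literal script; adjust the ps/grep matching for your platform):

    # Sample the number of running ghc processes once per second.
    while :; do
      echo "`date` `ps -e | grep -c ghc`" >> ghc-count.log
      sleep 1
    done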

On 2015-03-07 at 11:49:53 +0100, Karel Gardas wrote: [...]
Is there anything else which may be done to fix this issue? Is someone already working on any of those (I mean the reasonable ones from the list)?
are you aware of https://ghc.haskell.org/trac/ghc/wiki/Building/Shake and https://github.com/snowleopard/shaking-up-ghc ? Cheers, hvr

On 03/ 7/15 12:09 PM, Herbert Valerio Riedel wrote:
On 2015-03-07 at 11:49:53 +0100, Karel Gardas wrote:
[...]
Is there anything else which may be done to fix this issue? Is someone already working on any of those (I mean the reasonable ones from the list)?
are you aware of
https://ghc.haskell.org/trac/ghc/wiki/Building/Shake
and
https://github.com/snowleopard/shaking-up-ghc ?
I am. Is this the agreed way among the GHC developers? I wasn't sure, so I only mentioned Shake lightly... Thanks, Karel

Hi Karel,
I am. Is this the agreed way among the GHC developers? I wasn't sure, so I only mentioned Shake lightly...
It's certainly agreed to give it a go, and implementation work is ongoing. If it works better than the existing system then I suspect we'll switch. I certainly hope to get full parallelism, although I don't have as many cores as you do to test with! Thanks, Neil

Hi Neil, On 03/ 8/15 02:58 PM, Neil Mitchell wrote:
Hi Karel,
I am. Is this the agreed way among the GHC developers? I wasn't sure, so I only mentioned Shake lightly...
It's certainly agreed to give it a go, and implementation work is ongoing. If it works better than the existing system then I suspect we'll switch. I certainly hope to get full parallelism, although I don't have as many cores as you do to test with!
I've briefly investigated the stage1 dependencies and what make does with them, and so far I haven't found any issue that could be fixed by adding a dependency to the makefile build system by hand. For example, one of the biggest issues in stage1 is the excessive dependency on DynFlags, at least. Anyway, I'll see what shake/ghc brings to the table -- perhaps I haven't looked closely enough and there is a way to solve that... Also, the idea of separately compiling interface files still strikes me as a nice experiment to try. Thanks, Karel

As Neil says, I'm hoping that the new Shake-based build system will make a big difference. It's not certain that we'll switch to it, but I very much hope that we will. Fortunately we can run it side-by-side with the old system, so I hope it'll just be a question of switching over because it is manifestly better. Andrey can give a status report.

Simon

| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Karel Gardas
| Sent: 07 March 2015 10:50
| To: ghc-devs
| Subject: How to better parallelize GHC build.
|
| [...]

Hi Karel,
could you try adding `-j8` to `SRC_HC_OPTS` for the build flavor you're using in `mk/build.mk`, and running `gmake -j8` instead of `gmake -j64`? A graph like the one you attached will likely look even worse, but the walltime of your build should hopefully improve.
The build system currently seems to rely entirely on `make` for parallelism. It doesn't exploit ghc's own parallel `--make` at all, unless you explicitly add `-jn` to SRC_HC_OPTS, with n>1 (which also sets the number of capabilities for the runtime system, so additionally passing `+RTS -Nn` is not needed).
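Concretely, a minimal sketch (put it next to the settings for whatever flavour you use in mk/build.mk):

    # mk/build.mk
    SRC_HC_OPTS += -j8

and then:

    gmake -j8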
Case study: one of the first things the build system does is build ghc-cabal and Cabal with the stage 0 compiler, through a single invocation of `ghc --make`. All the later make targets depend on that step completing first. Because `ghc --make` is not instructed to build in parallel, using `make -j1` or `make -j100000` doesn't make any difference (for that step).
I think your graph shows that there are many more such bottlenecks.
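Illustratively, for that step the difference is roughly between:

    ghc --make utils/ghc-cabal/Main.hs        # one module at a time
    ghc --make -j8 utils/ghc-cabal/Main.hs    # up to 8 modules at a time

(a sketch only -- the real invocation passes many more flags).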
You would have to find out empirically how best to divide your number of threads (32) between `make` and `ghc --make`. From reading comment https://ghc.haskell.org/trac/ghc/ticket/9221#comment:12 by Simon in #9221, I understand it's better not to call `ghc --make -jn` with `n` higher than the number of physical cores of your machine (8 in your case). Once you get some better parallelism, other flags like `-A` might also have an effect on walltime (see that ticket).
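For example, one split to try on your 32-thread/8-core machine might be (numbers illustrative, to be tuned empirically):

    # mk/build.mk
    SRC_HC_OPTS += -j8 +RTS -A64m -RTS

    gmake -j4

i.e. up to 4 concurrent ghc invocations, each compiling up to 8 modules in parallel.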
-Thomas
On Sat, Mar 7, 2015 at 11:49 AM, Karel Gardas wrote:
[...]

Hi Thomas,

thanks for your suggestion, and also for the ticket number. I've tried the quick way (build.mk) and benchmarked GHC compiling ghc-cabal manually; here are the results:

-j1: 45s
-j2: 28s
-j3: 26s
-j4: 24s
-j5: 24s
-j6: 25s
-j6 -A32m: 23s
-j6 -A64m: 21s
-j6 -A128m: 23s

Real time is reported; GHC compiles to i386 code on Solaris 11. GHC is located in /tmp, hence basically in RAM. The CPU is a 6c/12ht E5-2620. So not that bad, but on the other hand also not that good a result.

Unfortunately, this will probably not help me on my Niagara, since I guess --make -jX is a recent addition and probably not present in 7.6.x, right? If so, then I'm afraid it won't help, since on the Niagara I'm using a patched 7.6.x with a fixed SPARC NCG, and that, even single-threaded, will probably be faster than 7.10.1 running multithreaded but building unregisterised (hence via a C compiler...). Anyway, I'll try to benchmark this tomorrow and will keep you posted.

Thanks!
Karel
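P.S. For concreteness, the benchmark had roughly this shape (a sketch, not the literal command line; a real build passes more flags):

    time ghc --make -j6 +RTS -A64m -RTS utils/ghc-cabal/Main.hs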
On 04/ 1/15 12:34 PM, Thomas Miedema wrote:
[...]
participants (5)
- Herbert Valerio Riedel
- Karel Gardas
- Neil Mitchell
- Simon Peyton Jones
- Thomas Miedema