Slower Compilation on GHC 7.8.4 (vs. 7.6.3)

Hello,

Apologies upfront for the long post below, but I thought our experience on this particular point might be worth sharing. I present some observations first and then mention the particular problem around the "cabal repl"-facilitated development workflow. Any ideas/feedback on the latter point would be much appreciated, and please let me know if there are other metrics, info or output I can produce here for further investigation/clarification.

Recently at work, we upgraded a decent-size project from GHC 7.6.3 to GHC 7.8.4 and as a result observed a very significant slowdown in compile times. Code changes were very minimal and should not be a factor. Any mistakes below are mine, but I can fairly confidently say that there is a visible-to-the-eye slowdown in compile times. We made a few measurements to quantify the issue and here is the data:

- This project has 220 direct (non-dependency) modules that are compiled with each "cabal build", and we always use sandboxes
- Tests were performed on OS X, though we saw similar results on Linux
- Single-threaded (-j=1) wall-clock compile time has gone up by around 30% for the overall project
- With O1 and -j=N, the overall wall-clock time is approximately the same (around 8 min), but the CPU time spent is a staggering 7.5X higher (!)
- With O0 and -j=N, the overall wall-clock time is 40% higher, but the CPU time spent is a staggering 7.5X higher (!)
- With -O0, "cabal repl" load times have gone up by a staggering 2.5X (!)
  - 7.8.4: 3 min. 35 seconds
  - 7.6.3: 1 min. 2 seconds
- Measuring compile times for individual modules, we see that those heavily loaded with types, TH-facilitated type and typeclass derivations, and large "blobs" of values directly at the top level now take much longer to compile:
  - We have 10 modules that each take over 10 seconds
  - We have 3 modules that each take over 35 seconds
- Sidenote observation: in general, parallel builds with -j appear to cause a very significant overhead under the "system" part of timing:
  - Example with 7.6.3, O0: cabal build 140.57s user 13.25s system 100% cpu 2:33.70 total
  - Example with 7.8.4, O0: cabal build 507.83s user 655.43s system 549% cpu 3:31.59 total

The main hurt here has been the infeasibility of using "cabal repl / ghci" fast-feedback style development (via emacs, vim, command-line, etc.), since:

- Unlike "cabal build", "cabal repl" re-compiles from scratch with every invocation of ":l App.Foo.Bar", which itself is the necessary first step when starting to hack on a module (we can then use ":r" to somewhat help the situation; see the sketch at the end of this message)
- The slowdown is about 2.5X as stated above: what used to take a minute now takes 3.5 minutes
- cabal repl does not seem to benefit from -j at all
- cabal repl under 7.8.4 appears to be hurt particularly by the heavier modules
- The heavier modules are often at the top of the compile tree (types, derivations, etc.) and are practically always on the critical path

Let me know if I'm missing anything here; any/all feedback is much appreciated!

Cheers,
Oz
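PS: for concreteness, the workflow in question is simply the following (output elided, prompts abbreviated; App.Foo.Bar is the placeholder module name used above):

    $ cabal repl
    > :l App.Foo.Bar   -- the slow, from-scratch load every time the repl starts
    ...
    > :r               -- later reloads only rebuild what has changed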

Oz, and others,

Thanks for this data point. Two things:

· Oz: is there any chance you could boil out a test case that highlights the regression most starkly? It is sad to know that GHC's compilation time is increasing, but it is hard to tackle. A test case, preferably one that doesn't rely on zillions of other libraries, increases the chances of something getting done by a factor of 10 or 100. Do make a ticket for it as well; otherwise it just gets lost.

· Everyone. It's really hard to keep focus on keeping GHC's compilation time and space down. Everyone is usually focused on bugs and features. It would be incredibly helpful if someone, or a small group, could build a profiled version of GHC 7.8, 7.10, and HEAD, and track down what is happening. Usually there are multiple causes, but a factor of 7.5 ought not to be hard to nail down. Please!

Thanks,
Simon

Simon Peyton Jones wrote:

· Everyone. It's really hard to keep focus on keeping GHC's compilation time and space down. Everyone is usually focused on bugs and features. It would be incredibly helpful if someone, or a small group, could build a profiled version of GHC 7.8, 7.10, and HEAD, and track down what is happening. Usually there are multiple causes, but a factor of 7.5 ought not to be hard to nail down.
Please!
This is a great newcomer-to-ghc-hacking task. It's both relatively easy to do and gives a newcomer a chance to tour the codebase. The person doing the profiling doesn't also have to fix the bug: just knowing what module or component of GHC is causing the trouble helps immensely.

As a further plug: when a change I made introduced a performance regression, I went into the debugging process very warily, expecting profiling GHC to be painful. It was just the opposite -- remarkably straightforward, and I was able to produce results without terribly much effort. On the other hand, it does generally require having multiple (profiled) builds of GHC to get meaningful comparisons.

Thanks to whoever takes this on!

Richard

Just a comment on this one point.

On 05/04/2015 19:01, Ozgun Ataman wrote:
* With O0 and -j=N the overall wall-clock time is 40% higher, but the CPU time spent is a staggering 7.5X higher (!)
When looking at Haskell programs executing in parallel, it's normal to see a high CPU time, because the GC threads spin looking for work. The extra CPU time is just time spent spinning; it doesn't imply that the compiler was doing extra work. What matters is the overall wall-clock time - a worse wall-clock time indicates that something has gone wrong.

We should look at the compile-time regression independently of performance issues with -j. Parallel compilation is a new feature and most likely needs a lot of tuning. Note that cabal's -j feature is different from GHC's (and is likely to give much more reliable improvements) because it runs multiple compiler instances in parallel.

Cheers,
Simon
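PS: one way to see how much of that extra CPU time is GC/spinning rather than real compilation work is to ask GHC's own RTS for a summary. An illustrative invocation (not taken from the measurements above; adjust to your setup):

    cabal build --ghc-options="+RTS -s -RTS"

The statistics printed at the end separate MUT (mutator, i.e. actual compiler work) time from GC time, for both CPU and elapsed clocks.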

Hello all,
Thank you for the comments and apologies that it took a bit of time to get
back to you. My colleague Michael and I have taken a stab at a package
demonstrating the slowdown. Our hypothesis was that the presence of type
declarations, large values/literals, typeclass instantiations and
engagement of TH caused at least some of the slowdown we were seeing in our
app code. It turns out this was only partially true:
- 500 random type declarations were a modest 10-25% slower.
- Simply instantiating types with a given class did not contribute that
much to the slowdown (around 5% on top of type declarations)
- There was no notable slowdown on large values/literals (false
hypothesis)
   - TH-enabled derivations were over 100% slower (lens and safecopy used
   as test subjects; a sketch of such a module is included near the end of
   this message)
- lens: ~100% slower
- safecopy: ~125% slower
   - The slowdowns appeared to be super-linear in at least some cases.
   Doubling the declaration count to 1000 increased the relative slowdown:
- Declarations: 30% slower
- lens TH slowdown: ~100% slower (stayed the same)
- safecopy TH slowdown: ~140% slower
I'd be happy to create a ticket if this is sufficient support material.
It's hard to say these conclusively isolate *all* the culprits for the
full slowdown we observe in our codebase, but they hopefully account for
a significant part of it.
Here's the package:
https://dl.dropboxusercontent.com/u/58525/ghc-slowdown.tar.gz
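For concreteness, the kind of declaration we benchmarked looks roughly like the sketch below. The module, type and field names are made up for illustration and are not the actual contents of the package; the package simply repeats declarations of this shape hundreds of times:

    {-# LANGUAGE TemplateHaskell #-}
    module Bench.Decl001 where

    import Control.Lens.TH (makeLenses)
    import Data.SafeCopy (base, deriveSafeCopy)

    -- A plain record type; the benchmark generates many of these.
    data Decl001 = Decl001
      { _decl001Name  :: String
      , _decl001Count :: Int
      , _decl001Flag  :: Bool
      } deriving (Eq, Show)

    -- The TH-enabled derivations whose compile cost is measured:
    makeLenses ''Decl001              -- lens
    deriveSafeCopy 0 'base ''Decl001  -- safecopy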
Best,
Oz

This is all very helpful, Oz. Thanks.
Even more helpful would be ticket(s) that focus on individual bad aspects, especially ones that are non-linear, e.g. “Here are modules with 500, 1000, 2000 declarations, and compile time goes up non-linearly”.
The two big ones seem to be:
· Type declarations
· TH-enabled derivations (whatever they are)
The simpler the better. E.g. safecopy probably has fewer dependencies than lens. And maybe it’s nothing to do with safecopy, but any old derivation will do?
What’s ideal is to be able to turn up the size knob, with a profiled compiler, and see which bits of the compiler are going non-linear.
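For example, with an in-tree profiled build the invocation might look like this (an illustrative sketch: the stage-2 path assumes a standard make-based build, and Decls500.hs stands for one of the generated test modules):

    inplace/bin/ghc-stage2 -O0 -fforce-recomp Decls500.hs +RTS -p -RTS

Because the compiler itself is profiled, -p makes the run write a cost-centre profile (a .prof file named after the ghc binary, in the current directory), and comparing those profiles as the declaration count grows should show which parts of the compiler go non-linear.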
Thanks!
Simon

participants (5)
- Andrés Sicard-Ramírez
- Ozgun Ataman
- Richard Eisenberg
- Simon Marlow
- Simon Peyton Jones