Fwd: Is anything being done to remedy the soul crushing compile times of GHC?

There is currently an interesting discussion on Reddit on GHC compile times https://www.reddit.com/r/haskell/comments/45q90s/is_anything_being_done_to_r... I feel that this is a serious problem, so it probably ought to be discussed here as well. Manuel

| There is currently an interesting discussion on Reddit on GHC compile times
Crikey. That's a long and rather discouraging thread. (It comes as a bit of a surprise to me: I don't have much problem myself with GHC, which is by any standards a big project; but clearly others do.)
I hate hearing people feeling so unhappy with GHC. And yet, and yet -- I don't feel able to just down-tools on everything else and work on perf. And that is true for many, I think. It's a tragedy-of-the-commons problem.
What to do? Some thoughts.
* GHC itself may or may not be the main culprit. There seems to be some
muttering about Cabal.
* We have had various attempts to appoint a Performance Tsar, but all have
come to nothing.
* We discussed it in our weekly GHC Skype chat yesterday. One thing that
would really help is to make it laughably easy to track
- Micro: whether my commit made anything significantly worse
(compile time/allocs, run time/allocs, binary size)
- Macro: how HEAD is doing relative to the previous release
Our current tools are weak, and make it too hard.
- We need a small number of aggregated numbers,
not hundreds of numbers. (You might want the hundreds
when digging into the details.)
- We should include nofib, not just tests/perf
- Our current perf tests only complain when you go outside
a window, but 90% of the lossage might have been from other
patches, which demotivates dealing with it
- We don’t have any tools that show what time/alloc gets spent
in which pass.
* You would think that perf improvements would be something
lots of people would like to work on: EVERYONE will love you.
But in fact no one does.
Increase incentives: maybe our tooling could generate a
leader-board showing who is responsible for what total perf
improvement, so that glory gets properly allocated.
Paying for it. Austin found companies who are nearly at the
point of hiring someone to work on perf. Maybe a collection
of companies could together pay Well Typed for a GHC perf
engineer, who focused on nothing else. That would be amazing.
Any other ideas?
Simon
| -----Original Message-----
| From: Manuel M T Chakravarty [mailto:chak@justtesting.org]
| Sent: 14 February 2016 23:14
| To: GHC developers

On Tue, Feb 16, 2016, at 05:49, Simon Peyton Jones wrote:
* We discussed it in our weekly GHC Skype chat yesterday. One thing that would really help is to make it laughably easy to track
  - Micro: whether my commit made anything significantly worse (compile time/allocs, run time/allocs, binary size)
  - Our current perf tests only complain when you go outside a window, but 90% of the lossage might have been from other patches, which demotivates dealing with it
It might be useful if Phabricator ran the perf tests / nofib for every patch and displayed a warning (think a lint warning) if any of the metrics got worse. The warning would foster discussion about what caused the perf regression and whether it needs to be fixed *before* merging the patch.

The current process for dealing with perf regressions seems to revolve around Joachim noticing that gipeda is reporting a regression, and raising a concern with the patch *after* it's been landed. This is entirely too late because the author will have moved on to something else, and having to go back and work on a patch you thought was done is a bit demoralizing.

To be clear, I'm very grateful for Joachim's work here, even when it involves flagging my patches :) But I think it would be better if we were *proactive* about the regressions rather than *reactive*. Eric
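To make the idea concrete, here is a minimal sketch of the comparison step such a check would need (purely illustrative; none of these names exist in Phabricator or the testsuite driver): given per-test metrics for the baseline and for the patched tree, report anything that regressed by more than a small relative tolerance.

    -- Hypothetical sketch of a lint-style perf comparison; not part of
    -- Phabricator or the GHC testsuite driver.
    import qualified Data.Map.Strict as M

    type Metrics = M.Map String Double   -- e.g. "T1234(compile_allocs)" -> bytes

    -- Report every metric that got worse by more than the given relative
    -- tolerance (e.g. 0.02 for 2%) compared to the baseline.
    regressions :: Double -> Metrics -> Metrics -> [String]
    regressions tol baseline patched =
      [ name ++ ": " ++ show (delta * 100) ++ "% worse"
      | (name, old) <- M.toList baseline
      , Just new   <- [M.lookup name patched]
      , let delta = (new - old) / old
      , delta > tol
      ]

    main :: IO ()
    main = mapM_ putStrLn (regressions 0.02 base patch)
      where
        base  = M.fromList [("T1234(compile_allocs)", 1.0e9), ("T5678(compile_time)", 2.0)]
        patch = M.fromList [("T1234(compile_allocs)", 1.1e9), ("T5678(compile_time)", 1.9)]

The hard part in practice is not this comparison but getting stable baseline numbers on shared CI hardware, which is presumably why allocation counts are preferred over wall-clock time.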

Eric Seidel
On Tue, Feb 16, 2016, at 05:49, Simon Peyton Jones wrote:
* We discussed it in our weekly GHC Skype chat yesterday. One thing that would really help is to make it laughably easy to track
  - Micro: whether my commit made anything significantly worse (compile time/allocs, run time/allocs, binary size)
  - Our current perf tests only complain when you go outside a window, but 90% of the lossage might have been from other patches, which demotivates dealing with it
It might be useful if Phabricator ran the perf tests / nofib for every patch and displayed a warning (think a lint warning) if any of the metrics got worse. The warning would foster discussion about what caused the perf regression and whether it needs to be fixed *before* merging the patch.
Indeed we do already run the performance tests but at the moment you only get a thumbs-up or thumbs-down. One of my tasks for this week is to try adding better reporting of compiler performance in the testsuite driver. Cheers, - Ben

Manuel M T Chakravarty
There is currently an interesting discussion on Reddit on GHC compile times
https://www.reddit.com/r/haskell/comments/45q90s/is_anything_being_done_to_r...
I feel that this is a serious problem, so it probably ought to be discussed here as well.
One area that is in terrible need of attention is nofib itself. It's a good start but it needs many more testcases representative of today's idiomatic Haskell. It would be great if we could get users to submit their computationally-heavy, toy projects. Unfortunately, the best testcases for us are those with no dependencies outside of the core libraries and these projects aren't terribly common. Cheers, - Ben

Ben Gamari
It would be great if we could get users to submit their computationally-heavy, toy projects. Unfortunately, the best testcases for us are those with no dependencies outside of the core libraries and these projects aren't terribly common.
This appears extremely unfortunate, because it is *the* metric that is really representative of end user experience. There are a number of factors that affect build speed, and one can imagine how the profile of the entire build process varies due to, for example, the amount of cross-module dependencies. -- с уважением / respectfully, Косырев Сергей

Kosyrev Serge <_deepfire@feelingofgreen.ru> writes:
Ben Gamari
writes: It would be great if we could get users to submit their computationally-heavy, toy projects. Unfortunately, the best testcases for us are those with no dependencies outside of the core libraries and these projects aren't terribly common.
This appears extremely unfortunate, because it is *the* metric that is really representative of end user experience.
Multiple modules aren't a problem. It is dependencies on Hackage packages that complicate matters. Cheers, - Ben

On Wed, Feb 17, 2016 at 4:38 AM, Ben Gamari
Multiple modules aren't a problem. It is dependencies on Hackage packages that complicate matters.
I guess the problem is when ghc breaks a bunch of hackage packages, you can't build with it anymore until those packages are updated, which won't happen until after the release?

From a certain point of view, this could be motivation to either break fewer things, or to patch breaking dependents as soon as the breaking patch goes into ghc. Which doesn't sound so bad in theory. Of course someone would need to spend time doing boring maintenance, but it seems that will be required regardless. And ultimately someone has to do it eventually. Of course, said person's boring time might be better spent directly addressing known performance problems.

My impression from the reddit thread is that three things are going on:

1 - cabal has quite a bit of startup overhead
2 - ghc takes a long time on certain inputs, e.g. long list literals. There are probably already tickets for these.
3 - and of course, ghc can be just generally slow, in the million tiny cuts sense.

I personally haven't run into these (though I don't doubt those who have), so don't get too discouraged!

Evan Laforge
On Wed, Feb 17, 2016 at 4:38 AM, Ben Gamari
wrote: Multiple modules aren't a problem. It is dependencies on Hackage packages that complicate matters.
I guess the problem is when ghc breaks a bunch of hackage packages, you can't build with it anymore until those packages are updated, which won't happen until after the release?
This is one issue, although perhaps not the largest. Here are some of the issues I can think of off the top of my head:

* The issue you point out: Hackage packages need to be updated.

* Hackage dependencies mean that the performance of the testcase is now dependent upon code over which we have no control. If a test's performance improves, is this because the compiler improved or merely because a dependency of the testcase was optimized? Of course, you could maintain a stable fork of the dependency, but at this point you might as well just take the pieces you need and fold them into the testcase.

* Hackage dependencies greatly complicate packaging. You need to somehow download and install them. The obvious approach here is to use cabal-install, but it is unavailable during a GHC build.

* Hackage dependencies make it much harder to determine what the compiler is doing. If I have a directory of modules, I can rebuild all of them with `ghc --make -fforce-recomp`. Things are quite a bit trickier when packages enter the picture.

In short, the whole packaging system really acts as nothing more than a confounding factor for performance analysis, in addition to making implementation quite a bit trickier.

That being said, developing another performance testsuite consisting of a set of larger, dependency-ful applications may be useful at some point. I think the first priority, however, should be nofib.
From a certain point of view, this could be motivation to either break fewer things, or to patch breaking dependents as soon as the breaking patch goes into ghc. Which doesn't sound so bad in theory. Of course someone would need to spend time doing boring maintenance, but it seems that will be required regardless. And ultimately someone has to do it eventually.
Much of the effort necessary to bring Hackage up to speed with a new GHC release isn't due to breakage; it's just bumping version bounds. I'm afraid the GHC project really doesn't have the man-power to do this work consistently. We already owe hvr a significant amount of gratitude for handling so many of these issues leading up to the release.
Of course, said person's boring time might be better spent directly addressing known performance problems.
Indeed.
My impression from the reddit thread is that three things are going on:
1 - cabal has quite a bit of startup overhead
Yes, it would be great if someone could step up to look at Cabal's performance. Running `cabal build` on an up-to-date tree of a moderately-sized (10 kLoC, 8 components, 60 modules) Haskell project I have laying around takes over 5 seconds from start-to-finish. `cabal build`ing just a single executable component takes 4 seconds. This same executable takes 48 seconds for GHC to build from scratch with optimization and 12 seconds without.
2 - ghc takes a long time on certain inputs, e.g. long list literals. There are probably already tickets for these.
Indeed, there are plenty of pathological cases. For better or worse, these are generally the "easier" performance problems to tackle.
3 - and of course, ghc can be just generally slow, in the million tiny cuts sense.
And this is the tricky one. Beginning to tackle this will require that someone perform some very careful measurements on current and previous releases. Performance issues are always on my and Austin's to-do list, but we are unfortunately rather limited in the amount of time we can spend on these due to funding considerations. As Simon mentioned, if someone would like to see this fixed and has money to put towards the cause, we would love to hear from you. Cheers, - Ben

Another large culprit for performance is the fact that ghc --make must preprocess and parse the header of every local Haskell file: https://ghc.haskell.org/trac/ghc/ticket/618 (as well as https://ghc.haskell.org/trac/ghc/ticket/1290). Neil and I have observed that when you use something better (like Shake) recompilation performance gets a lot better, esp. when you have a lot of modules. Edward
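For a sense of what "something better" can look like, here is a rough, illustrative Shake sketch (not ghc --make's actual logic, and deliberately simplified: it assumes a flat directory of modules and ignores the import graph, which a real build system would have to track via the generated .hi files). Each module is compiled with a plain `ghc -c`, so an unchanged module costs the build system a metadata check rather than a preprocess-and-parse of its source.

    -- Rough illustrative sketch using the Shake library; not a complete
    -- build system (it ignores inter-module dependencies).
    import Development.Shake
    import Development.Shake.FilePath

    main :: IO ()
    main = shakeArgs shakeOptions $ do
        -- Ask for the object file of the root module.
        want ["Main.o"]

        -- One rule: an object file is produced by compiling the matching
        -- .hs file with `ghc -c`. Shake only reruns the rule when the
        -- source (or anything else we `need`) changes.
        "*.o" %> \out -> do
            let hs = out -<.> "hs"
            need [hs]
            cmd "ghc" ["-c", hs]

As Evan notes in his reply, the attraction of integrating this into GHC itself is getting Shake-style change detection and parallelism without giving up --make's ability to keep .hi files in memory.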

On Wed, Feb 17, 2016 at 2:14 AM, Edward Z. Yang
Another large culprit for performance is that the fact that ghc --make must preprocess and parse the header of every local Haskell file: https://ghc.haskell.org/trac/ghc/ticket/618 (as well as https://ghc.haskell.org/trac/ghc/ticket/1290). Neil and I have observed that when you use something better (like Shake) recompilation performance gets a lot better, esp. when you have a lot of modules.
I can second this, and I suspect the reason I've never had slowness problems is that I use shake exclusively. I'm looking forward to your work on integrating shake into ghc. If it means we can get both shake's speed and parallelism as well as --make's ability to retain hi files in memory, it should give a really nice speed boost. I see a lot of "compilation IS NOT required", presumably that would be even faster if it didn't have to start a new ghc and inspect all the hi files again before finding that out!

On Wed, Feb 17, 2016 at 2:58 AM, Ben Gamari
Yes, it would be great if someone could step up to look at Cabal's performance. Running `cabal build` on an up-to-date tree of a moderately-sized (10 kLoC, 8 components, 60 modules) Haskell project I have laying around takes over 5 seconds from start-to-finish.
`cabal build`ing just a single executable component takes 4 seconds. This same executable takes 48 seconds for GHC to build from scratch with optimization and 12 seconds without.
I have contributed several performance patches to Cabal in the past, so I feel somewhat qualified to speak here. The remaining slowness in `cabal build` is mostly due to the pre-process phase. There is work in progress which may improve this situation. We could also look at separating the pre-process phase from the build phase. (It seems odd to call it `pre-process` when it is *always* run during the build phase, doesn't it?) This has the advantage of sidestepping the slowness issue entirely, but it may break some users' workflows. Is that trade-off worth it? We could use user feedback here. Regards, Tom

Thomas Tuegel
On Wed, Feb 17, 2016 at 2:58 AM, Ben Gamari
wrote: Yes, it would be great if someone could step up to look at Cabal's performance. Running `cabal build` on an up-to-date tree of a moderately-sized (10 kLoC, 8 components, 60 modules) Haskell project I have laying around takes over 5 seconds from start-to-finish.
`cabal build`ing just a single executable component takes 4 seconds. This same executable takes 48 seconds for GHC to build from scratch with optimization and 12 seconds without.
I have contributed several performance patches to Cabal in the past, so I feel somewhat qualified to speak here. The remaining slowness in `cabal build` is mostly due to the pre-process phase. There is work in progress which may improve this situation. We could also look at separating the pre-process phase from the build phase. (It seems odd to call it `pre-process` when it is *always* run during the build phase, doesn't it?) This has the advantage of sidestepping the slowness issue entirely, but it may break some users' workflows. Is that trade-off worth it? We could use user feedback here.
What exactly does the pre-process phase do, anyways? Cheers, - Ben

On Wed, Feb 17, 2016 at 2:21 PM, Ben Gamari
Thomas Tuegel
writes: I have contributed several performance patches to Cabal in the past, so I feel somewhat qualified to speak here. The remaining slowness in `cabal build` is mostly due to the pre-process phase. There is work in progress which may improve this situation. We could also look at separating the pre-process phase from the build phase. (It seems odd to call it `pre-process` when it is *always* run during the build phase, doesn't it?) This has the advantage of sidestepping the slowness issue entirely, but it may break some users' workflows. Is that trade-off worth it? We could use user feedback here.
What exactly does the pre-process phase do, anyways?
It runs the appropriate pre-processor (Alex, Happy, c2hs, etc.) for modules that require it. It's slow because of the way the process is carried out: For each module in the package description, Cabal tries to find an associated .hs source file in the hs-source-dirs. If it cannot, it looks for a file with an extension matching one of the pre-processors it knows about. If it finds one, it runs the corresponding program if the output files are missing or outdated.

If this doesn't sound TOO bad, consider: how many modules on Hackage use pre-processors? Certainly less than 5%, maybe even less than 1%. That's a LOT of work every time you call `cabal build`.

- Tom

P.S. If I may get a little philosophical, this is representative of the problems we have in Cabal. Cabal tries to be very automagical, at the cost of being sometimes slow and always opaque when things break!
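A hypothetical paraphrase of the search Tom describes (this is not Cabal's actual code; the module names, directories, and helper are made up for illustration): for every module, every source directory is probed first for a plain .hs file and then for each known pre-processor extension, and it is the sheer number of these filesystem probes, repeated on every `cabal build`, that adds up.

    -- Illustrative only; not Cabal's real implementation.
    import Control.Monad (filterM, forM_)
    import Data.Maybe (listToMaybe)
    import System.Directory (doesFileExist)
    import System.FilePath ((</>), (<.>))

    -- Extensions handled by the pre-processors Cabal knows about
    -- (Alex, Happy, c2hs, hsc2hs, ...).
    ppExtensions :: [String]
    ppExtensions = ["x", "y", "chs", "hsc"]

    -- For a single module: collect every candidate path ("src/Data/Foo.hs",
    -- "src/Data/Foo.x", ...) and keep the first one that actually exists.
    -- Cabal then still has to decide whether pre-processor output is stale.
    findSource :: [FilePath] -> FilePath -> IO (Maybe FilePath)
    findSource srcDirs modPath = do
        let candidates = [ dir </> modPath <.> ext
                         | dir <- srcDirs, ext <- "hs" : ppExtensions ]
        listToMaybe <$> filterM doesFileExist candidates

    main :: IO ()
    main = forM_ ["Main", "Data/Foo"] $ \m ->
        findSource ["src", "."] m >>= print

Even when the answer is almost always "a plain .hs file, no pre-processor needed", the probes still happen, which is Tom's point.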

On 17 February 2016 at 07:40, Evan Laforge
My impression from the reddit thread is that three things are going on:
1 - cabal has quite a bit of startup overhead 2 - ghc takes a long time on certain inputs, e.g. long list literals. There are probably already tickets for these.
In my experience GHC startup overhead (time) has increased quite a lot somewhere in 7.x. I don't know if it's the cause, but perhaps dyn libs may be part of the reason. I'm not sure because I once (7.8 I believe) tried to build without dynlink support and couldn't measure a substantial improvement.

So, if you start ghc(i) for the first time from a spinning disk, it's very noticeable and quite a delay. Once it's cached, it's fast, so I think it's primarily due to reading stuff from disk.

Just to mention the ideal overhead: anything below 400ms is small enough to not disrupt the flow and feels responsive. Go over 1s and it breaks.

Tuncer Ayaz
On 17 February 2016 at 07:40, Evan Laforge
wrote: My impression from the reddit thread is that three things are going on:
1 - cabal has quite a bit of startup overhead 2 - ghc takes a long time on certain inputs, e.g. long list literals. There are probably already tickets for these.
In my experience GHC startup overhead (time) has increased quite a lot somewhere in 7.x. I don't know if it's the cause, but perhaps dyn libs may be part of the reason. I'm not sure because I once (7.8 I believe) tried to build without dynlink support and couldn't measure a substantial improvement.
I'm not sure this is going to be a significant effect, but last night thomie, rwbarton, and I discovered that the way we structure the Haskell library directory makes the dynamic linker do significantly more work than necessary. Merely compiling "Hello world" requires 800 `open` system calls with a dynamically linked compiler. Seems like we should really try to fix this. See #11587. Cheers, - Ben

On 17 February 2016 at 14:31, Tuncer Ayaz
On 17 February 2016 at 07:40, Evan Laforge
wrote: My impression from the reddit thread is that three things are going on:
1 - cabal has quite a bit of startup overhead 2 - ghc takes a long time on certain inputs, e.g. long list literals. There are probably already tickets for these.
In my experience GHC startup overhead (time) has increased quite a lot somewhere in 7.x. I don't know if it's the cause, but perhaps dyn libs may be part of the reason. I'm not sure because I once (7.8 I believe) tried to build without dynlink support and couldn't measure a substantial improvement.
So, if you start ghc(i) for the first time from a spinning disk, it's very noticeable and quite a delay. Once it's cached, it's fast, so I think it's primarily due to reading stuff from disk.
Just to mention the ideal overhead: anything below 400ms is small enough to not disrupt the flow and feels responsive. Go over 1s and it breaks.
A freshly booted machine with an SSD required 2 seconds for GHCi, so maybe it's just that there's a lot more stuff to load. Which leads me to the next question: (better) dead code elimination would probably help, where at minimum only the modules actually used are included, with a future improvement of skipping unused functions, etc. as well. But I don't know enough about GHC's DCE to talk about it.

Here's a thought: has anyone experience with limiting a certain major release to just bug fixes and perf regression fixes, while postponing all feature patches? It sounds like a good idea on paper, but has anyone seen it work, and would this be something to consider for GHC? I'm not suggesting the even/odd versioning scheme, if anyone wonders. These don't work so well and nobody tests odd versions.

The better approach, I think, might be to section off certain times in
a release period where we only allow such changes. Only for a month or
so, for example, and you're just encouraged to park your current work
for a little while, during that time, and just improve things.
The only problem is, it's not clear if people will want to commit as
much if the hard rule is just to fix bugs/improve performance for a
select time. Nobody is obligated to contribute, so it could easily
fall into a lull period if people get tired of it. But maybe the
shared sense of community in doing it would help.
Whatever we do, it has to be strict in these times, because in
practice we have a policy like this ("just bug/perf fixes") during the
time leading up to the RC, but we always slip and merge other things
regardless. So, if we do this, we must be quite strict about it in
practice and police ourselves better, I think.
On Wed, Feb 17, 2016 at 7:35 AM, Tuncer Ayaz
Here's a thought: has anyone experience with limiting a certain major release to just bug fixes and perf regression fixes, while postponing all feature patches? It sounds like a good idea on paper, but has anyone seen it work, and would this be something to consider for GHC? I'm not suggesting the even/odd versioning scheme, if anyone wonders. These don't work so well and nobody tests odd versions. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

On 17 February 2016 at 15:47, Austin Seipp
The better approach, I think, might be to section off certain times in a release period where we only allow such changes. Only for a month or so, for example, and you're just encouraged to park your current work for a little while, during that time, and just improve things.
The only problem is, it's not clear if people will want to commit as much if the hard rule is just to fix bugs/improve performance for a select time. Nobody is obligated to contribute, so it could easily fall into a lull period if people get tired of it. But maybe the shared sense of community in doing it would help.
Whatever we do, it has to be strict in these times, because in practice we have a policy like this ("just bug/perf fixes") during the time leading up to the RC, but we always slip and merge other things regardless. So, if we do this, we must be quite strict about it in practice and police ourselves better, I think.
Exactly, the time-boxing aspect is what I tried to express with my concern about the even/odd branching model, which failed for Linux pre-Bitkeeper. So, maybe a model like the one Linus uses, with two weeks of a merge window and then strictly fixes, but that would require a ghc-next branch with a maintainer, so it's probably not feasible with the resources right now.
On Wed, Feb 17, 2016 at 7:35 AM, Tuncer Ayaz
wrote: Here's a thought: has anyone experience with limiting a certain major release to just bug fixes and perf regression fixes, while postponing all feature patches? It sounds like a good idea on paper, but has anyone seen it work, and would this be something to consider for GHC? I'm not suggesting the even/odd versioning scheme, if anyone wonders. These don't work so well and nobody tests odd versions.
participants (11)
- Austin Seipp
- Ben Gamari
- Ben Gamari
- Edward Z. Yang
- Eric Seidel
- Evan Laforge
- Kosyrev Serge
- Manuel M T Chakravarty
- Simon Peyton Jones
- Thomas Tuegel
- Tuncer Ayaz