
Friends

The thread below concerns GHC's performance. I'm writing to ask for your help.

In developing GHC we always run 'validate', which runs a lot of regression tests. A few of those are performance tests, but because we run it so frequently, none of these performance tests run for long, and none depend on other packages. And they tend to test something very specific.

What we lack is a sustained effort to track
a) the performance of GHC itself
b) the performance of GHC-compiled programs

At GHC HQ we *aspire* to track this stuff, but in practice we simply don't. Bug-fixing, portability, etc. always end up taking priority.

But it's a pity that we don't. For example, Michal's comment below (that his GHC-compiled program has run 10-20% faster with each release of GHC since 6.12) is fantastic -- but we have no data to back it up, or to know whether it's just Michal, or more widely true. I also suspect that sometimes we regress and don't know it. People do report this (e.g. #8971), but it's a bit random. Again, it would be great to have a more systematic way to know. In many cases it might be easy to fix; but we can only fix it if we know.

We have the nofib suite, and we take that very seriously, but it is showing its age, uses no advanced features, and I'm not sure how representative it is any more.

Johan Tibell and Bryan O'Sullivan agreed to become GHC's Performance Tsars a year or two ago, with a view to focusing on (b) at least, but they are both extremely busy. And I don't think they even attempt to focus on (a).

So I'm wondering: would anyone like to help here? It would mean
* soliciting and gathering together some more substantial benchmarks,
* gathering performance numbers,
* yelling quickly if the numbers go bad, and
* investigating why (e.g. no one has seriously profiled GHC itself for a while, for both space and time; I bet there are improvements to be had there).

Maybe it could be part of the new buildbot team's work? Certainly it'd make sense to use the same nightly-build infrastructure.

Anyway, I'm advertising that there's an unmet need, and I'd love some help.

Thanks

Simon

| -----Original Message-----
| From: Michal J. Gajda [mailto:mjgajda@gmail.com]
| Sent: 10 April 2014 07:26
| To: ghc-devs@haskell.org; Sergei Trofimovich; Manuel Chakravarty; Simon Peyton Jones
| Subject: Re: ghc-7.8-rc2 in -O2 mode eats all stack and RAM on pandoc-citeproc / highlighting-kate
|
| Dear Devs,
|
| On 04/10/2014 09:09 AM, ghc-devs-request@haskell.org wrote:
| > Filed the reproducer as a new ticket:
| > https://ghc.haskell.org/trac/ghc/ticket/8980
| >
| > [ Looks like highlighting-kate asks to be added to
| > compiler performance benchmarks (are there such ones?)
| > It tends to stress ghc all the time:
| > http://hackage.haskell.org/trac/ghc/ticket/3664
| > ]
| Please consider adding hPDB too, if you want to stress the optimizer.
| It shows the GHC optimizer at its best, with at least a 10-20% improvement
| in every major version of the compiler since 6.12. Unfortunately at the
| cost of very long compile times.
| Please let me know if I should submit driver code for automatic
| benchmarking of it (it is in hPDB-examples). Thanks!
| >> > SpecConstr is too aggressive: it sometimes blows up the program
| badly and we have no good solution. See Trac #7898, #7068, #7944, #5550, #8836.
| And #8960, where GHC runs out of memory. (Only in 7.8.) Should be easy
| to reproduce by just `cabal install hPDB`.
| >> > I notice that the latter three are actually fixed in 7.8, so worth
| trying that. If it still fails, do add instructions to reproduce to one
| of the above open tickets, or make a new one.
| >> >
| >> > Meanwhile you can use -fno-spec-constr to simply switch it off for
| offending modules. That should get you going.
| --
| Best regards
| Michal
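For anyone hitting the SpecConstr blow-ups discussed in the quoted message, the per-module workaround looks like the sketch below. The module name and function are hypothetical placeholders, not code from hPDB; only the pragma itself is the workaround being described.

```haskell
{-# OPTIONS_GHC -fno-spec-constr #-}
-- Disable SpecConstr for this module only, as suggested in the quoted
-- message. The module and its contents are illustrative placeholders.
module OffendingModule (sumLoop) where

-- A small recursive loop of the sort SpecConstr would normally specialise.
sumLoop :: Int -> Int
sumLoop n = go 0 1
  where
    go acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)
```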

There are a few "projects" in this area I think we should undertake: ## Set up a performance dashboard Example: http://goperfd.appspot.com/ (try clicking the "Perf" and "Graphs" buttons.) I think the way to go here is: * Figure out how to build GHC and run the nofib suite in a repeatable (i.e. scripted) manner. * Set up a Jenkins build bot on a "quiet" machine. Have it run the above script in "exclusive" mode (i.e. no other Jenkins jobs can run in parallel). * Write a script that gathers the nofib output and sticks it in a database. The database should be keyed by the commit hash. Use a mature DB (e.g. MySQL, sqlite, postgress). * Write a little web frontend that graphs the results over time. Don't reinvent the wheel/shave all the Yaks. You might be able to reuse the Go frontend and run it on appengine. * Bonus points: have the jenkins job send an email if performance regresses above some threshold. There might already exist Jenkins plugins that can help with the last 4 steps. ## Improve our benchmarks Many of the benchmarks in nofib have too short runtime nowadays to be accurate enough when comparing the performance of two GHC builds. The shootout benchmarks are good (I've checked), but the remaining ones should be considered suspicious. We ought to weed out or improve benchmarks that are no longer accurate. This might involve having the run on bigger inputs and thus run for longer. In addition, running the existing benchmark suites of some core libraries (e.g. from the HP) would also be very useful. The difficulty is to get Criterion to build with HEAD reliably. ## One-off nofib run against the lastest N major GHC releases To see if we have already regressed (and see where we have regressed) I think we should just run the current nofib suite (probably using the "slow" mode) against 6.12 and up to see where we have already regressed. File a bug for each regression, optionally with a small analysis of why we regressed (e.g. look at the Core and the output of +RTS -s). Cheers, Johan

Dear Friends,
I have a few weeks free just now, and a keen interest in GHC performance,
so please count me in as a person interested in developing the solution :-).
On Thu, Apr 10, 2014 at 5:57 PM, Johan Tibell wrote:
There are a few "projects" in this area I think we should undertake:
## Set up a performance dashboard
Example: http://goperfd.appspot.com/ (try clicking the "Perf" and "Graphs" buttons.)
I think the way to go here is:
* Figure out how to build GHC and run the nofib suite in a repeatable (i.e. scripted) manner.
Wouldn't it be faster to take GHC builds from the current builders? I understand that running a GHC build for each commit may take some time...

Where would the infrastructure be put? I could help to set it up, but I do not possess enough CPU power to host it for a long time.

I do not know how the current builders are set up, and whether it would be possible to designate some of them as having exclusive use of their machine. If so, would it be better to add the perf tests and database submission as a last step of the test suite? As long as access is not anonymous (to prevent spam), such a database might provide better coverage of performance data over tier-2 architectures.

--
Best regards
Michal

On Fri, Apr 18, 2014 at 8:12 AM, Michał J Gajda wrote:
Dear Friends,
I have a few weeks free just now, and a keen interest in GHC performance, so please count me in as a person interested in developing the solution :-).
On Thu, Apr 10, 2014 at 5:57 PM, Johan Tibell
wrote: There are a few "projects" in this area I think we should undertake:
## Set up a performance dashboard
Example: http://goperfd.appspot.com/ (try clicking the "Perf" and "Graphs" buttons.)
I think the way to go here is:
* Figure out how to build GHC and run the nofib suite in a repeatable (i.e. scripted) manner.
Wouldn't it be faster to take GHC builds from the current builders? I understand that running a GHC build for each commit may take some time... Where would the infrastructure be put? I could help to set it up, but I do not possess enough CPU power to host it for a long time.
Don't worry too much about how you build GHC. Any way that works is fine. The important part is that it's simple and works. If it can be written as a simple shell script + some tools to post-process the output, it would be easy to run on Jenkins.

Also, don't worry about finding a machine that can run the benchmarks concurrently. If you get a dashboard and continuous build that works on your development machine, we can find a real server for it (e.g. I have an unused http://www.hetzner.de/en/hosting/produkte_rootserver/ex40).

So here's what I'd suggest:

1. Write a script that builds and runs the benchmarks on your local machine.
2. Write something that massages the output into a format that can be pushed to whatever database the perf dashboard would use (see the sketch after this message).
3. Get a dashboard up and running.
4. Tell us about the results.

We'll find machines to run it on. Feel free to ask questions if you get stuck anywhere.

--
Johan
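As one way to approach step 2, here is a small Haskell sketch that converts per-benchmark runtime statistics into the tab-separated triples a database loader could consume. It assumes each benchmark was run with `+RTS -t --machine-readable`, which emits a Haskell-readable association list of statistics; the file-naming scheme is an assumption for illustration.

```haskell
-- A sketch of "massage the output" (step 2): turn RTS statistics into
-- benchmark<TAB>metric<TAB>value triples on stdout. Assumes each stats
-- file was produced with "+RTS -t --machine-readable"; the file naming
-- and output format are illustrative choices, not an existing GHC tool.
module MassageStats (main) where

import System.Environment (getArgs)

-- The "--machine-readable" stats are a Haskell-readable [(String, String)];
-- skip any leading noise before the opening bracket, just in case.
parseStats :: String -> [(String, String)]
parseStats = read . dropWhile (/= '[')

-- Derive a benchmark name from its stats file name (e.g. "fibheaps.stats").
benchName :: FilePath -> String
benchName = takeWhile (/= '.')

main :: IO ()
main = do
  files <- getArgs                      -- one stats file per benchmark
  mapM_ emit files
  where
    emit file = do
      contents <- readFile file
      sequence_
        [ putStrLn (benchName file ++ "\t" ++ metric ++ "\t" ++ value)
        | (metric, value) <- parseStats contents ]
```

Its output is exactly the input format assumed by the database-loading sketch earlier in the thread, so the two can be chained in a Jenkins job.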

On 10/04/2014 08:58, Simon Peyton Jones wrote:
Friends
The thread below concerns GHC's performance. I'm writing to ask for your help.
In developing GHC we always run 'validate', which runs a lot of regression tests. A few of those are performance tests, but because we run it so frequently, none of these performance tests run for long, and none depend on other packages. And they tend to test something very specific.
What we lack is a sustained effort to track
a) the performance of GHC itself
b) the performance of GHC-compiled programs
At GHC HQ we *aspire* to track this stuff, but in practice we simply don't. Bug-fixing, portability, etc etc always ends up taking priority.
But it's a pity that we don't. For example, Michal's comment below (that his GHC-compiled program has run 10-20% faster with each release of GHC since 6.12) is fantastic -- but we have no data to back it up, or to know whether it's just Michal, or more widely true. I also suspect that sometimes we regress and don't know it. People do report this (e.g. #8971), but it's a bit random. Again, it would be great to have a more systematic way to know. In many cases it might be easy to fix; but we can only fix it if we know.
We have the nofib suite, and we take that very seriously, but it is showing its age, uses no advanced features, and I'm not sure how representative it is any more.
Johan Tibell and Bryan O'Sullivan agreed to become GHC's Performance Tsars a year or two ago, with a view to focusing on (b) at least, but they are both extremely busy. And I don't think they even attempt to focus on (a).
So I'm wondering: would anyone like to help here? It would mean
* soliciting and gathering together some more substantial benchmarks,
* gathering performance numbers,
* yelling quickly if the numbers go bad, and
* investigating why (e.g. no one has seriously profiled GHC itself for a while, for both space and time; I bet there are improvements to be had there).
Maybe it could be part of the new buildbot team's work? Certainly it'd make sense to use the same nightly-build infrastructure.
Anyway, I'm advertising that there's an unmet need, and I'd love some help.
Let me second this. In particular, I think we regress on GHC performance regularly, because the perf tests just aren't a good way to prevent regressions. When we have a +/- 10% window, someone can commit a 9% regression without triggering a perf failure, but the next patch to come along with a 1% regression will be unfairly blamed. (A small worked example of this follows the message.)

Furthermore, by the time we get to the perf tests we're nearly done and just want to push and go home, not go back and profile GHC. Yet the perf tests have an important purpose: the idea is to catch the problem when we have the crucial piece of information: the patch that caused the regression. Someone can try to optimise GHC later, but they have to start from scratch without the information about what caused the regressions.

Having an automated system to track GHC performance would help a lot with this, I think.

Cheers,
Simon
Thanks
Simon
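To make the windowing problem Simon describes concrete, here is a tiny sketch with made-up numbers: a fixed +/-10% window around a baseline of 100 lets a real 9% regression through, then blames the innocent 1% patch that follows. This is not GHC's actual perf-test logic, just the arithmetic from the message.

```haskell
-- Illustrates the perf-test window problem with made-up numbers.
module Window (main) where

baseline :: Double
baseline = 100

-- A fixed +/-10% acceptance window around the baseline.
withinWindow :: Double -> Bool
withinWindow x = abs (x - baseline) / baseline <= 0.10

main :: IO ()
main = do
  let afterA = baseline * 1.09  -- commit A: a real 9% regression
      afterB = afterA * 1.01    -- commit B: a harmless 1% on top (110.09)
  print (withinWindow afterA)   -- True:  A slips through unnoticed
  print (withinWindow afterB)   -- False: B trips the test and gets blamed
  -- Comparing each commit against its predecessor instead points at A:
  print ((afterA - baseline) / baseline)  -- 0.09, the real regression
```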

On Fri, Apr 11, 2014 at 10:58 AM, Simon Marlow wrote:
Let me second this. In particular, I think we regress on GHC performance regularly, because the perf tests just aren't a good way to prevent regressions. When we have a +/- 10% window, someone can commit a 9% regression without triggering a perf failure, but the next patch to come along with a 1% regression will be unfairly blamed.
Agreed. I think graphing results helps here. It's often easier to visually identify which commit is the real culprit.

Aside: I think moving completely to subrepos will generally help us track down regressions, both performance and correctness, faster. Being able to `git bisect` your way to the cause saves a lot of time.

Furthermore, by the time we get to the perf tests we're nearly done and just want to push and go home, not go back and profile GHC. Yet the perf tests have an important purpose: the idea is to catch the problem when we have the crucial piece of information: the patch that caused the regression. Someone can try to optimise GHC later, but they have to start from scratch without the information about what caused the regressions.

Having an automated system to track GHC performance would help a lot with this, I think.

Agreed 100%. Automation is what's needed here.

On 11/04/2014 11:55, Johan Tibell wrote:
On Fri, Apr 11, 2014 at 10:58 AM, Simon Marlow <marlowsd@gmail.com> wrote:

Let me second this. In particular, I think we regress on GHC performance regularly, because the perf tests just aren't a good way to prevent regressions. When we have a +/- 10% window, someone can commit a 9% regression without triggering a perf failure, but the next patch to come along with a 1% regression will be unfairly blamed.
Agreed. I think graphing results helps here. It's often easier to visually identify which commit is the real culprit.
Aside: I think moving completely to subrepos will generally help us track down regressions, both performance and correctness, faster. Being able to `git bisect` your way to the cause saves a lot of time.
Furthermore, by the time we get to the perf tests we're nearly done and just want to push and go home, not go back and profile GHC. Yet the perf tests have an important purpose: the idea is to catch the problem when we have the crucial piece of information: the patch that caused the regression. Someone can try to optimise GHC later, but they have to start from scratch without the information about what caused the regressions.
Having an automated system to track GHC performance would help a lot with this, I think.
Agreed 100%. Automation is what's needed here.
Just to add one thing: when I wrote that above I was thinking primarily about the performance of GHC itself, but of course it all applies to both GHC and the code that GHC generates. Since the latter also affects the former, if we track both together we'll be able to see when changes in GHC performance aren't related to changes in compiled-code performance.

I care a *lot* about the performance of GHC itself these days, because the performance of GHC directly impacts how fast we can get code into production.

Cheers,
Simon
participants (4):

- Johan Tibell
- Michał J Gajda
- Simon Marlow
- Simon Peyton Jones