

Could we not configure travis-ci to run the benchmarks for us or something like that? A simple (free) CI setup would be easier than finding a pair of hands to do this regularly, I would've thought.
On 30 Nov 2012, at 14:42, Simon Peyton-Jones wrote:
| > While writing a new nofib benchmark today I found myself wondering | > whether all the nofib benchmarks are run just before each release,
I think we could do with a GHC Performance Tsar. Especially now that Simon has changed jobs, we need to try even harder to broaden the base of people who help with GHC. It would be amazing to have someone who was willing to:
* Run nofib benchmarks regularly, and publish the results
* Keep baseline figures for GHC 7.6, 7.4, etc so we can keep track of regressions
* Investigate regressions to see where they come from; ideally propose fixes.
* Extend nofib to contain more representative programs (as Johan is currently doing).
That would help keep us on the straight and narrow.
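The first three bullets amount to diffing each run against a stored baseline. A minimal Haskell sketch of that comparison, assuming a made-up (name, seconds) result format — a real run would go through nofib and nofib-analyse rather than this toy input:

```haskell
import Text.Printf (printf)

-- Benchmark name paired with runtime in seconds (hypothetical format;
-- actual nofib output would be parsed with nofib-analyse).
type Results = [(String, Double)]

-- Benchmarks whose runtime grew by more than the given fraction
-- relative to the baseline, paired with their slowdown ratio.
regressions :: Double -> Results -> Results -> [(String, Double)]
regressions threshold baseline current =
  [ (name, ratio)
  | (name, new) <- current
  , Just old <- [lookup name baseline]
  , let ratio = new / old
  , ratio > 1 + threshold
  ]

main :: IO ()
main = do
  -- Illustrative numbers only.
  let baseline = [("spectral/fibheaps", 0.31), ("real/cacheprof", 2.10)]
      current  = [("spectral/fibheaps", 0.42), ("real/cacheprof", 2.08)]
  mapM_ (\(n, r) -> printf "%s: %.0f%% slower than baseline\n" n ((r - 1) * 100))
        (regressions 0.05 baseline current)
```

Run against a fixed baseline release (7.4, 7.6, ...), this is the kind of report a nightly job could mail out.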
Any offers? It could be more than one person.
Simon
| -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell- | users-bounces@haskell.org] On Behalf Of Simon Marlow | Sent: 30 November 2012 12:11 | To: Johan Tibell | Cc: glasgow-haskell-users | Subject: Re: Is the GHC release process documented? | | On 30/11/12 03:54, Johan Tibell wrote: | > While writing a new nofib benchmark today I found myself wondering | > whether all the nofib benchmarks are run just before each release, | > which the drove me to go look for a document describing the release | > process. A quick search didn't turn up anything, so I thought I'd ask | > instead. Is there a documented GHC release process? Does it include | > running nofib? If not, may I propose that we do so before each release | > and compare the result to the previous release*. | > | > * This likely means that nofib has to be run for the upcoming release | > and the prior release each time a release is made, as numbers don't | > translate well between machines so storing the results somewhere is | > likely not that useful. | | I used to do this on an ad-hoc basis: the nightly builds at MSR spit out | nofib results that I compared against previous releases. | | In practice you want to do this much earlier than just before a release, | because it can take time to investigate and squash any discrepancies. | | On the subject of the release process, I believe Ian has a checklist | that he keeps promising to put on the wiki (nudge :)). | | Cheers, | Simon | | | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

| Could we not configure travis-ci to run the benchmarks for us or
| something like that? A simple (free) ci setup would be easier than
| finding a pair of hands to do this regularly I would've thought.
Of course automation is great. The pair of hands is still needed to figure out what to do, set it up, make sure it stays working, and investigate performance regressions. But it'd be silly to run nofib *manually* every time!
Simon
| -----Original Message-----
| From: Tim Watson [mailto:watson.timothy@gmail.com]
| Sent: 30 November 2012 15:51
| To: Simon Peyton-Jones
| Cc: Simon Marlow; Johan Tibell; glasgow-haskell-users
| Subject: Re: GHC Performance Tsar
|
| Could we not configure travis-ci to run the benchmarks for us or
| something like that? A simple (free) ci setup would be easier than
| finding a pair of hands to do this regularly I would've thought.
|
| On 30 Nov 2012, at 14:42, Simon Peyton-Jones

On Fri, 2012-11-30 at 15:51 +0000, Tim Watson wrote:
Could we not configure travis-ci to run the benchmarks for us or something like that? A simple (free) CI setup would be easier than finding a pair of hands to do this regularly, I would've thought.
AFAIK Travis uses some IaaS service (EC2 if I'm not mistaken) to execute CI jobs. Experience has shown benchmark results from VMs (especially ones running on a public/shared IaaS service) are rather useless: there's huge variance between results, even when executing the exact same binaries, depending on host CPU and IO load (the latter even if your benchmark doesn't perform any IO itself), time of day (which influences load), the actual hardware your VM gets deployed on, and so on.
Just my .02,
Nicolas
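The instability Nicolas describes can be checked before trusting a host: time the same binary several times and look at the coefficient of variation. A small sketch, with made-up timings and purely illustrative 5%/10% thresholds:

```haskell
-- Judge timing stability by the coefficient of variation (stddev/mean)
-- over repeated runs of the same binary.
mean, stddev :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)
stddev xs = sqrt (mean [(x - m) ^ 2 | x <- xs]) where m = mean xs

coeffOfVariation :: [Double] -> Double
coeffOfVariation xs = stddev xs / mean xs

main :: IO ()
main = do
  -- Hypothetical timings: a quiet dedicated host vs. a noisy shared VM.
  let quietHost = [1.02, 1.01, 1.03, 1.02]
      sharedVM  = [1.02, 1.35, 0.98, 1.60]
  print (coeffOfVariation quietHost < 0.05)  -- stable enough to benchmark
  print (coeffOfVariation sharedVM  > 0.10)  -- too noisy to trust
```

A benchmarking bot could refuse to publish results from a run whose calibration timings exceed some such threshold.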

Hi Simon,
I will try to find some time to set up an automatic run of nofib on my buildbot (which is powerful enough) and have it graph the results over time (and perhaps even email us when a benchmark dips).
-- Johan

If Bryan and Johan are the Performance Tsars, the future looks bright. Or at least fast. Thank you.
Simon
From: Bryan O'Sullivan [mailto:bos@serpentine.com]
Sent: 30 November 2012 16:53
To: Johan Tibell
Cc: Simon Peyton-Jones; glasgow-haskell-users@haskell.org
Subject: Re: GHC Performance Tsar
On Fri, Nov 30, 2012 at 8:48 AM, Johan Tibell wrote:

On Fri, Nov 30, 2012 at 09:38:10AM -0800, Johan Tibell wrote:
On Fri, Nov 30, 2012 at 9:11 AM, Simon Peyton-Jones wrote:
If Bryan and Johan are the Performance Tsars the future looks bright. Or at least fast. Thank you.
If someone could point me to the build bot script that we run today that would be a great start.
The code is at http://darcs.haskell.org/builder/. The config, including the build steps, is attached.
Thanks
Ian

Bryan O'Sullivan writes:
On Fri, Nov 30, 2012 at 8:48 AM, Johan Tibell wrote:
I will try to find some time to set up an automatic run of nofib on my buildbot (which is powerful enough) and have it graph the results over time (and perhaps even email us when a benchmark dips).
I'll pitch in with this too.
I'd like to offer to help with benchmarking on Mac x86_64, if it would be useful to add another architecture to the mix. I just need a little hand-holding to get started.
-- John Wiegley
FP Complete Haskell tools, training and consulting http://fpcomplete.com
johnw on #haskell/irc.freenode.net

I can also offer a decently spec'd linux x86_64 machine, and a
functional OS X x86_64 Mountain Lion machine too. If possible I'll
offer my ARMv7 board as well, which currently fails late in the stage2
build on DPH. I haven't figured that one out just yet. All these can
be available on a regular basis (for nightly builds or whatever)
with little interruption. Any ARM machine now is slow enough that
builds would need to be once per day at best, anyway.
I was thinking of something like arewefastyet.com that Mozilla has for
JavaScript, instead comparing different GHC versions. CodeSpeed seems
to be that and much more, after looking at the PyPy speed website. It
looks really nice. If it can accept JSON requests for build results
from certain platforms, I think that tying it into the current builder
infrastructure (which runs nofib every night anyway from my
understanding) would be relatively easy, and save a lot of effort. It
looks like it tracks the differences between runs (and stores them
in a database), so you wouldn't need to use nofib-analyse or anything,
and can just submit raw metrics.
On Fri, Nov 30, 2012 at 1:59 PM, John Wiegley wrote:
I'd like to offer to help with benchmarking on Mac x86_64, if it would be useful to add another architecture to the mix. I just need a little hand-holding to get started.
You can find some information about the Builder infrastructure here, which currently controls the nightly build bots: http://hackage.haskell.org/trac/ghc/wiki/Builder - I imagine any solution will likely tie in with it (as Johan mentioned). If you want to run nofib manually for fun to test results locally, there's this page: http://hackage.haskell.org/trac/ghc/wiki/Building/RunningNoFib
-- Regards,
Austin

On Fri, 2012-11-30 at 08:48 -0800, Johan Tibell wrote:
Hi Simon,
I will try to find some time to set up an automatic run of nofib on my buildbot (which is powerful enough) and have it graph the results over time (and perhaps even email us when a benchmark dips).
You might be interested in CodeSpeed [1], of which an instance runs at [2]. It supports different benchmark suites using different platforms/compiler(version)s/... across different hosts (which might be interesting for cross-architecture comparison?), and benchmark results can be submitted to the app using some HTTP call in JSON format. When integrated with some CI system, it can also point at commits related to a certain benchmark run.
Nicolas
[1] https://github.com/tobami/codespeed/
[2] http://speed.pypy.org/
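For concreteness, a result submission of the kind described might be assembled roughly like this. The field names are modelled on the CodeSpeed README, but they, along with the project and environment values, should be treated as assumptions, not a tested integration:

```haskell
import Text.Printf (printf)

-- Build a CodeSpeed-style JSON result for one benchmark timing.
-- Field names follow the upstream README; values here are illustrative.
resultJSON :: String -> String -> Double -> String
resultJSON commit benchmark seconds = concat
  [ "{\"commitid\": \"", commit, "\""
  , ", \"project\": \"GHC\""
  , ", \"executable\": \"ghc-HEAD\""
  , ", \"benchmark\": \"", benchmark, "\""
  , ", \"environment\": \"nightly-builder\""
  , ", \"result_value\": ", printf "%.3f" seconds
  , "}"
  ]

main :: IO ()
main = putStrLn (resultJSON "abc123" "spectral/fibheaps" 0.314)
```

A nightly job would POST one such object per benchmark to the CodeSpeed instance's result endpoint (in practice with a real JSON library rather than string concatenation).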

This is something I'd be happy to help out with.
On 30 November 2012 11:48, Johan Tibell wrote:
Hi Simon,
I will try to find some time to set up an automatic run of nofib on my buildbot (which is powerful enough) and have it graph the results over time (and perhaps even email us when a benchmark dips).
-- Johan

On 01/12/2012, at 1:42 AM, Simon Peyton-Jones wrote:
| > While writing a new nofib benchmark today I found myself wondering | > whether all the nofib benchmarks are run just before each release,
I think we could do with a GHC Performance Tsar. Especially now that Simon has changed jobs, we need to try even harder to broaden the base of people who help with GHC. It would be amazing to have someone who was willing to:
* Run nofib benchmarks regularly, and publish the results
* Keep baseline figures for GHC 7.6, 7.4, etc so we can keep track of regressions
* Investigate regressions to see where they come from; ideally propose fixes.
* Extend nofib to contain more representative programs (as Johan is currently doing).
That would help keep us on the straight and narrow.
I was running a performance regression buildbot for a while a year ago, but gave it up because I didn't have time to chase down the breakages. At the time we were primarily worried about the asymptotic performance of DPH, and fretting about a few percent absolute performance was too much of a distraction.
However: if someone wants to pick this up then they may get some use out of the code I wrote for it. The dph-buildbot package in the DPH repository should still compile. This package uses http://hackage.haskell.org/package/buildbox-1.5.3.1 which includes code for running tests, collecting the timings, comparing against a baseline, making pretty reports etc. There is then a second package buildbox-tools which has a command line tool for listing the benchmarks that have deviated from the baseline by a particular amount.
Here is an example of a report that dph-buildbot made:
http://log.ouroborus.net/limitingfactor/dph/nightly-20110809_000147.txt
Ben.

I'm particularly interested in parallel performance in the >8 core space.
(In fact, we saw some regressions from 7.2->7.4 that we never tracked down
properly, but maybe can now.)
If the buildbot can make it easy to add a new "slave" machine that runs and
uploads its result to a central location, then I would be happy to donate a
few hours of dedicated time (no other logins) on a 32-core Westmere
machine, and hopefully other architectures soon.
Maybe this use case is well-covered by creating a Jenkins/Travis slave and
letting it move the data around? (CodeSpeed looks pretty nice too.)
Cheers,
-Ryan
On Wed, Dec 5, 2012 at 12:40 AM, Ben Lippmeier wrote:
On 01/12/2012, at 1:42 AM, Simon Peyton-Jones wrote:
| > While writing a new nofib benchmark today I found myself wondering | > whether all the nofib benchmarks are run just before each release,
I think we could do with a GHC Performance Tsar. Especially now that Simon has changed jobs, we need to try even harder to broaden the base of people who help with GHC. It would be amazing to have someone who was willing to:
* Run nofib benchmarks regularly, and publish the results
* Keep baseline figures for GHC 7.6, 7.4, etc so we can keep track of regressions
* Investigate regressions to see where they come from; ideally propose fixes.
* Extend nofib to contain more representative programs (as Johan is currently doing).
That would help keep us on the straight and narrow.
I was running a performance regression buildbot for a while a year ago, but gave it up because I didn't have time to chase down the breakages. At the time we were primarily worried about the asymptotic performance of DPH, and fretting about a few percent absolute performance was too much of a distraction.
However: if someone wants to pick this up then they may get some use out of the code I wrote for it. The dph-buildbot package in the DPH repository should still compile. This package uses http://hackage.haskell.org/package/buildbox-1.5.3.1 which includes code for running tests, collecting the timings, comparing against a baseline, making pretty reports etc. There is then a second package buildbox-tools which has a command line tool for listing the benchmarks that have deviated from the baseline by a particular amount.
Here is an example of a report that dph-buildbot made:
http://log.ouroborus.net/limitingfactor/dph/nightly-20110809_000147.txt
Ben.
participants (11)
- Austin Seipp
- Ben Lippmeier
- Bryan O'Sullivan
- David Terei
- Ian Lynagh
- Johan Tibell
- John Wiegley
- Nicolas Trangez
- Ryan Newton
- Simon Peyton-Jones
- Tim Watson