
This is great! Can it do alerts, e.g. send a mail to the list when a metric moves by a certain amount?

Cheers,
Simon

On 16/07/2014 09:02, Joachim Breitner wrote:
Hi,
I guess it’s time to talk about this, especially as Richard just brought it up again...
I felt that we were seriously lacking in our grip on performance issues. We don’t even know whether 7.8.3 was better or worse than 7.6.3 in terms of nofib, not to speak of the effect of each single commit.
I want to change that, so I set up a benchmark monitoring dashboard. You can currently reach it at:
http://ghcspeed-nomeata.rhcloud.com/
What does it do?
~~~~~~~~~~~~~~~~
It monitors the repository (master branch only) and builds each commit, complete with the test suite and nofib. The log is saved and analyzed, and some numbers are extracted:
* The build time
* The test suite summary numbers
* Runtime (if >1s), allocations and binary sizes of the nofib benchmarks
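To make the shape of the extracted data concrete, each such number ends up as one record roughly like the following. This is just an illustrative Haskell sketch; the type and field names are made up and do not appear in the actual scripts:

    -- Hypothetical shape of one extracted number; in the real pipeline the
    -- data only takes a structured form once the log has been parsed.
    data Metric = Metric
      { metricName :: String   -- e.g. "spectral/simple/allocs" (example name)
      , commitId   :: String   -- the git commit that was built
      , value      :: Double   -- seconds, bytes allocated, or binary size in bytes
      } deriving Show
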
These are uploaded to the website above, which is powered by codespeed, a general performance dashboard, implemented in Python using Django.
Under _Changes_, it provides a report for each commit (changes wrt. the previous version, and wrt. 10 revisions earlier, the so-called “trend”). A summary of these reports is visible on the front page.
The _Timeline_ is a graph for each individual performance number. If there are bumps, you can hopefully find them there! You can also compare to 7.8.3, which is available as a “baseline”.
_Comparison_ will be more useful once we have more tagged revisions, or if we were benchmarking various options (e.g. -fllvm): here you can do bar-chart comparisons.
Why codespeed?
~~~~~~~~~~~~~~
For a long time I searched for a suitable software product, and my criteria were that it should be open source, rather simple to set up, and mostly decoupled from other tools, i.e. something that I throw numbers at and which then displays them nicely. While I don’t think codespeed is the best performance dashboard out there (I find http://goperfd.appspot.com/perf a bit better; I wonder how well codespeed scales to even larger numbers of benchmarks and I wish it were more git-aware), it was the easiest to get started with. And thanks to the loose coupling of (1) running the tests to acquire a log, (2) parsing the log to get numbers and (3) putting them on a server, we can hopefully replace it when we come across something better. I was hoping for the Phabricator guys to have something in their tool suite, but it doesn’t look like it.
How does it work (currently)?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
My office PC is underused (I work on my laptop), so it’s currently dedicated to this. I have a simple shell script that monitors the repo for new versions. It builds the newest revision and works itself back to the commit where everything was turned into submodules: https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/watch.sh
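For illustration, here is a stripped-down Haskell sketch of what such a watcher has to do. This is not watch.sh itself: the way run-speed.sh is invoked is a guess, the bookkeeping helper is made up, and it does not stop at the submodule-conversion commit:

    import Control.Concurrent (threadDelay)
    import Control.Monad (forM_, unless)
    import System.Process (callProcess, readProcess)

    -- Poll the repository, then build every commit on master that has not
    -- been benchmarked yet, newest first.
    main :: IO ()
    main = loop
      where
        loop = do
          callProcess "git" ["-C", "ghc", "fetch", "origin"]
          commits <- fmap lines $
            readProcess "git" ["-C", "ghc", "rev-list", "origin/master"] ""
          forM_ commits $ \commit -> do
            done <- alreadyBuilt commit
            -- Assumed interface: hand the commit to the build script.
            unless done $ callProcess "./run-speed.sh" [commit]
          threadDelay (10 * 60 * 1000000)  -- sleep ten minutes, then poll again
          loop

    -- Placeholder: the real script would check whether a log for this
    -- commit already exists.
    alreadyBuilt :: String -> IO Bool
    alreadyBuilt _ = return False
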
It calls a script that does the actual building: https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/run-speed.sh This produces a log file which should contain all the required numbers somewhere.
A second script extracts these numbers (with the help of nofib-analyse) and converts them into codespeed-compatible JSON files: https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/log2json.pl
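To give an idea of the target format, here is a rough Haskell sketch of one such result entry, using aeson. The field names follow my reading of the codespeed JSON interface and should be checked against its documentation; all concrete values are invented:

    {-# LANGUAGE OverloadedStrings #-}
    import Data.Aeson (encode, object, (.=))
    import qualified Data.ByteString.Lazy.Char8 as BL

    -- One benchmark result as it could be sent to codespeed; log2json.pl
    -- emits a whole list of these per build log.
    main :: IO ()
    main = BL.putStrLn . encode $
      [ object
          [ "commitid"     .= ("abc123"    :: String)  -- example commit hash
          , "branch"       .= ("master"    :: String)
          , "project"      .= ("GHC"       :: String)
          , "executable"   .= ("ghc"       :: String)
          , "environment"  .= ("office-pc" :: String)  -- name of the benchmark machine
          , "benchmark"    .= ("spectral/simple/allocs" :: String)
          , "result_value" .= (123456789   :: Int)
          ]
      ]

A file containing such a list is what the upload step then sends to the server.
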
Finally, a simple curl invocation uploads them to codespeed: https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/upload.sh
So if you want additional benchmarks to be tracked, make sure they are present in the logs and adjust log2json.pl. codespeed will automatically pick up new benchmarks in these logs. Reimplementations in Haskell are also welcome :-)
The testsuite is run with VERBOSE=4, so the performance numbers are also shown for failing test cases. So once a test case goes over the limit, you can grep through previous logs to try to find the real culprit. I uploaded the logs (so far) to https://github.com/nomeata/ghc-speed-logs (but this is not automated yet, ping me if you need an update on this).
What next?
~~~~~~~~~~
Clearly, the current setup is only good enough to evaluate the system. Eventually, I might want to use my office PC again, and the free hosting on openshift is not very powerful.
So if we want to keep this setup and make it “official”, we need to find a permanent solution.¹ This involves:
* A dedicated machine to run the benchmarks. This probably shouldn’t be a VM, if we want to keep the noise in the runtime down.
* A machine to run the codespeed server. Can be a VM, or even run on any of the systems that we have right now. Just needs a database (preferably PostgreSQL) and a webserver supporting WSGI (i.e. any of them).
* Maybe a better place to store the logs for public consumption.
Also, there are ways to improve the system:
* As I said, I don’t think codespeed is the best. If we find something better, we can replace it. Since we have all the logs, we can easily fill the new system with the data, or even run both at the same time.
* We might want to have more numbers. I am already putting lines-of-code and disk space usage numbers into the logs, but do not parse them yet.
* In particular, we might want to put in each performance test case as a benchmark of its own, to make it easier to find commits that degrade (or improve!) performance. I’m not sure how well the web page will handle that.
* We might want to replace my rather simple watch.sh script by something more serious. In particular, I imagine that our builder setup could manage this, with a dedicated builder doing the benchmark runs and the builder server scheduling a build for each commit.
That’s it for now. Enjoy clicking around!
Greetings, Joachim
¹ I guess that could be considered beta-reduction :-)