
Many thanks, Richard, Andreas, Joachim, and Ben, for your responses! I have a few things to try now. :)
* what I call the "Cabal test"; namely:
$ _build/stage1/bin/ghc -O -ilibraries/Cabal/Cabal \
      libraries/Cabal/Cabal/Setup.hs +RTS -s
Thanks for spelling it out like that, Ben! I'm slightly embarrassed to say that I hadn't been aware that I could use GHC directly in this way to build a package! Andreas, you wrote:
In general I only compile, as linking adds overhead which isn't really part of GHC.
How do I tell GHC to build e.g. nofib/spectral/simple/Main.hs or Cabal without linking? I'll eventually try to distill a wiki page from all this!

Cheers,
Simon
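Regarding building without linking: GHC's documented -no-link flag should stop the build before the link step. A sketch, not verified against Andreas' exact workflow:

$ _build/stage1/bin/ghc -O -no-link -ilibraries/Cabal/Cabal \
      libraries/Cabal/Cabal/Setup.hs +RTS -s

For single modules, -c likewise stops after generating the object file.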
* My WIP nofib branch [1] makes nofib much faster and easier to work with and adds the ability to measure perf counters, in addition to the usual RTS and cachegrind statistics.
* My nofib branch produces output in a uniform, easy to consume format and provides a tool for comparing sets of measurements in this format.
* My ghc_perf tool [2] is very useful for extracting runtime and perf statistics from Haskell program runs; furthermore, it produces output in the same format as expected by the aforementioned nofib-compare utility.
* I have a utility [3] which I use to reproducibly build a set of branches, run the testsuite, nofib, and the Cabal test on each of them. Admittedly it could use a bit of cleanup but it does its job reasonably well, making performance measurement a "set it and forget it" sort of task.
* We collect and record a complete set of testsuite statistics (saved to git notes [4]); however, we currently do not import these into gipeda.
* We don't currently have a box which can measure reliable timings (since our builders are nearly all virtualised cloud instances). I'm going to need to do some shuffling to change this.
* One potentially useful source of performance information (which sadly we currently do not exploit) is the -ddump-timing output produced during head.hackage runs (an example follows the references below).
[1] https://gitlab.haskell.org/ghc/nofib/merge_requests/24
[2] https://gitlab.haskell.org/bgamari/ghc-utils/blob/master/ghc_perf.py
[3] https://gitlab.haskell.org/bgamari/ghc-utils/-/tree/master/build-all
[4] https://gitlab.haskell.org/ghc/ghc/-/wikis/building/running-tests/performanc...
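For instance, compiling a single module with timing output might look like this (a sketch; the exact set of passes and the output format vary by GHC version):

$ _build/stage1/bin/ghc -O -c -ddump-timing nofib/spectral/simple/Main.hs

Each compiler pass is reported along with the time it took and the amount it allocated, so these logs could be aggregated across an entire head.hackage run.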
A problem in this context is that reliable performance measurements require a quiet machine. Closing my browser and turning off other programs is – in my perception – rather inconvenient, particularly when I have to do it for a prolonged time.
Ideally I wouldn't have to perform these measurements on my local machine at all! Do you usually use a separate machine for this? _Very_ convenient would be some kind of bot whom I could tell e.g.
@perf-bot compiler perf
…or more concretely
@perf-bot compile nofib/spectral/simple/Main.hs
…or just
@nofib-bot run
…or something like that.
Indeed it is inconvenient. I am in the lucky situation that I have another machine locally that can be made reasonably quiet without interfering with my workflow. However, in general…
I've noticed that CI now includes a perf-nofib job. But since it appears to run on a different machine each time, I'm not sure whether it's actually useful for comparing performance. Could it be made more useful by running it consistently on the same dedicated machine?
Indeed, we currently don't have a dedicated machine for timings. However, allocations and executable sizes are still useful.
Nevertheless, as noted above, I think we should make more of an effort to measure time. I need to do some shuffling of our runners so that we have a quiet bare-metal machine which can be dedicated to performance measurement. I'll try to get to this in the next day or so.
Another question regarding performing compiler perf measurements locally is which build flavour to use. So far I have used the "perf" flavour, but a full build with it seems to take close to an hour; a rebuild with --freeze1 takes ~15 minutes on my machine. Is this the right flavour to use?
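Concretely, the invocations I've been timing look like this (assuming Hadrian's build script in a GHC checkout):

$ hadrian/build --flavour=perf -j             # full build, close to an hour
$ hadrian/build --flavour=perf --freeze1 -j   # rebuild with stage 1 frozen, ~15 minutes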
I think perf is the best option for performance measurement (after all, we want to know what users would see). However, it is indeed a bit painful.
BTW, what's the purpose of the profiled GHC modules built with this flavour? They seem only to prolong compile time further, and I don't see a ghc-prof binary or similar in _build/stage1/bin.
Indeed; there is little sense in building profiled modules just for performance measurement. However, I don't believe we currently have a build flavour which provides comparable optimisation but without the profiled way. Perhaps we should add one.
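If your Hadrian is recent enough to support flavour transformers, something like

$ hadrian/build --flavour=perf+no_profiled_libs -j

might drop the profiled libraries while keeping perf's optimisation settings. I should say that no_profiled_libs here is an assumption on my part; check hadrian/doc/flavours.md for what your checkout actually supports.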
Also, what's the status of gipeda? The most recent commit at https://perf.haskell.org/ghc/ is from "about a year ago"?
Indeed the machine which was previously providing gipeda builds is sadly no longer around; consequently it's on ice at the moment. I would like to get it going again but recently correctness issues have been taking up more time than I would like to admit.
Sorry for this load of questions and complaints! I do believe, though, that if work on compiler performance were a bit better documented and more convenient, we might see even more progress on that front. :)
Quite alright! Typing out the points above made me realize that there is indeed quite a bit of knowledge that the wiki leaves unsaid.
Cheers,
- Ben