A small but useful tool for performance characterisation

Hi everyone,

I have recently been doing a fair amount of performance characterisation and have long wanted a convenient means of collecting GHC runtime statistics for later analysis. For this I quickly developed a small wrapper utility [1].

To see what it does, let's consider an example. Say we made a change to GHC which we believe might affect the runtime performance of Program.hs. We could quickly check this by running,

    $ ghc-before/_build/stage1/bin/ghc -O Program.hs
    $ ghc_perf.py -o before.json ./Program
    $ ghc-after/_build/stage1/bin/ghc -O Program.hs
    $ ghc_perf.py -o after.json ./Program

This will produce two files, before.json and after.json, which contain the various runtime statistics emitted by +RTS -s --machine-readable. These files are in the same format as is used by my nofib branch [2] and can therefore be compared using `nofib-compare` from that branch.

In addition to collecting runtime metrics, ghc_perf can also collect performance counters (on Linux only) using perf. For instance,

    $ ghc_perf.py -o program.json \
        -e instructions,cycles,cache-misses ./Program

will produce program.json containing not only RTS statistics but also event counts from the perf instructions, cycles, and cache-misses events. Alternatively, simply passing `ghc_perf.py --perf` enables a reasonable default set of events (namely instructions, cycles, cache-misses, branches, and branch-misses).

Finally, ghc_perf can also handle repeated runs. For instance,

    $ ghc_perf.py -o program.json -r 5 --summarize \
        -e instructions,cycles,cache-misses ./Program

will run Program 5 times, emit all of the collected samples to program.json, and print a (very basic) statistical summary of what it collected on stdout.

Note that there are a few possible TODOs that I've been considering:

 * I chose JSON as the output format to accommodate structured data (e.g. to capture experimental parameters in a structured way). However, in practice this choice has led to significantly more inconvenience than I would like, especially given that so far I've only used the format to capture basic key/value pairs. Perhaps reverting to CSV would be preferable.

 * It might be nice to also add support for cachegrind.

Anyways, I hope that others find this as useful as I have.

Cheers,

- Ben

[1] https://gitlab.haskell.org/bgamari/ghc-utils/blob/master/ghc_perf.py
[2] https://gitlab.haskell.org/ghc/nofib/merge_requests/24
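For readers who want a feel for the data involved, here is a minimal Python sketch of the general idea behind the message above: parsing the key/value pairs that the RTS prints with --machine-readable and computing the relative change between two runs. This is not the code of ghc_perf.py or nofib-compare. The function names `parse_rts_stats` and `relative_change` are hypothetical, the sample strings are made up for illustration, and the sketch assumes the RTS output is a list of quoted ("key", "value") pairs.

```python
import re

# Assumption: each statistic appears as a quoted ("key", "value") pair,
# e.g. ("bytes allocated", "51528"), in the machine-readable RTS output.
PAIR_RE = re.compile(r'\("([^"]*)",\s*"([^"]*)"\)')


def parse_rts_stats(text):
    """Turn machine-readable RTS statistics into a dict.

    Numeric values become floats; anything else is kept as a string.
    """
    stats = {}
    for key, value in PAIR_RE.findall(text):
        try:
            stats[key] = float(value)
        except ValueError:
            stats[key] = value
    return stats


def relative_change(before, after):
    """Return (after - before) / before for numeric metrics in both runs.

    Metrics whose 'before' value is zero are skipped to avoid dividing
    by zero.
    """
    changes = {}
    for key in before.keys() & after.keys():
        b, a = before[key], after[key]
        if isinstance(b, float) and isinstance(a, float) and b != 0:
            changes[key] = (a - b) / b
    return changes


if __name__ == "__main__":
    # Made-up sample data, not real measurements.
    sample_before = ' [("bytes allocated", "1000")\n ,("num_GCs", "4")\n ]'
    sample_after = ' [("bytes allocated", "1100")\n ,("num_GCs", "4")\n ]'
    changes = relative_change(parse_rts_stats(sample_before),
                              parse_rts_stats(sample_after))
    for name, change in sorted(changes.items()):
        print(f"{name}: {change:+.1%}")
```

Because the result is a flat mapping from metric name to number, it could be written out as either JSON or CSV, which is the trade-off raised in the TODO list.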

Hi Ben,

This sounds great. Is there a place on the wiki to catalog tools like this?

Thanks for telling us about it!
Richard
On Jan 4, 2020, at 7:37 PM, Ben Gamari wrote:
> [...]

There is the "useful tools" page [1], which has mentioned the ghc-utils repository, where the aforementioned script lives, for a few years now. That being said, I get the impression that not many people have found it via this page: everyone I know of who has used anything in ghc-utils discovered it via word of mouth.
I'm not sure what to do about this. The page isn't *that* buried: from the wiki home page one arrives at it via the link path Working Conventions/Various tools.
Cheers,
- Ben
On January 4, 2020 8:51:07 PM EST, Richard Eisenberg wrote:
> [...]