Feedback request regarding HSOC project (bringing sanity to the GHC performance test-suite)

1 Sep 2017

Hey y'all,

A quick ToC before I dive right in:

* What my HSOC project is on
* My progress so far
* Feedback welcome
* What I have left to do
* Theoretical potential improvements

-----------

My HSOC project was on bringing sanity to the GHC performance test-suite.
My blog post on this is here:
https://jaredweakly.com/blog/haskell-summer-of-code/
The Trac ticket that corresponds to this is here:
https://ghc.haskell.org/trac/ghc/ticket/12758
The Phabricator ticket for this patch: https://phabricator.haskell.org/D3758

The tl;dr of my HSOC project is that GHC's performance tests currently
require the programmer to add in expected numbers manually, update
them, handhold the testsuite, etc. This is a bit absurd, and my
project's overall aim is to reduce the effort required of the
programmer to as close to zero as possible while simultaneously
increasing the ability of the testsuite to catch regressions as much
as possible.

------------

My progress so far:
 - I have a few comparison tools in perf_notes.py. These allow people
to compare performance numbers of tests across commits.
 - The performance numbers generated by running the tests are now
stored automatically in git notes, where both the comparison tool and
the testsuite read them.
 - I have refactored the testsuite to use my new code, which pulls
expected numbers automatically from git notes (a test passes trivially
if no note exists for it yet) and compares each expected number with
the number obtained by running the testsuite on the latest commit. The
comparison passes if the new number is within a certain deviation of
the expected one (20% by default, but the programmer can customize
this).
 - I have refactored all of the all.T files to use the new comparison
functions for the performance tests and ensured that this doesn't
break any existing tests.
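
For concreteness, the comparison step described above can be sketched
roughly as follows. This is only an illustration, not the code from
the patch: the name `within_deviation` and its parameters are
hypothetical, and I am assuming the deviation is applied symmetrically
around the expected number. Only the 20% default and the "pass
trivially when there is no note" behavior come from the description
above.

```python
from typing import Optional


def within_deviation(expected: Optional[float], actual: float,
                     deviation_pct: float = 20.0) -> bool:
    """Return True if `actual` is acceptably close to `expected`.

    Hypothetical helper for illustration; not the code from the patch.
    """
    # No stored number yet (no git note for this test): pass trivially.
    if expected is None:
        return True
    # Width of the allowed window on either side of the expected number.
    allowed = abs(expected) * deviation_pct / 100.0
    return abs(actual - expected) <= allowed
```

With the default, `within_deviation(100.0, 119.0)` passes and
`within_deviation(100.0, 121.0)` fails; a test with no recorded number
(`expected=None`) always passes.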

------------

Anyone who wants to check out the wip/perf-testsuite branch and try
this out is more than welcome. Feedback on anything is welcome;
comments and discussion are appreciated.

-------------

What I have left to do is:

1. Finish writing up the documentation
2. Update the wiki in all the relevant places concerning
additions/modifications to the testsuite and test driver
3. Make sure everyone is happy with the change (and make small changes
as necessary)

--------------

Possible features and improvements I am thinking about adding in:
* As a stopgap to full integration with performance tracking tools
(such as Gipeda), optionally emitting a test warning with the test
summary if there is any regression detected whatsoever (even if the
number falls within the allowed deviation)
* Some tests, such as T7702, have a somewhat nonsensical regression
percentage. Ideally the testsuite could handle those better. I could
potentially build in multiple ways to determine a regression
(percentage, 'above a certain value', 'taking longer than X amount of
time', as potential examples)
* Currently some tests require installing some Haskell packages; they
are skipped if the packages are not installed. I could try to build in
a way to automatically attempt to install all necessary Haskell
packages if someone attempts to run a test that requires them.
(Perhaps using a command such as 'make test exhaustive')
* The performance metric 'peak_megabytes' is sometimes not accurate
enough; I could see if adding something like `+RTS -h -i0.01`
automatically to tests that use 'peak_megabytes' would resolve that.
Currently it is a manual debugging step.
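
To give a rough idea of what "multiple ways to determine a
regression" might look like, here is a small sketch. None of this
exists in the patch; the names `percentage_check`, `ceiling_check`, and
`is_regression` are hypothetical, and the design (each test lists the
checks that apply to it) is just one possibility.

```python
from typing import Callable, Iterable

# A check receives (expected, actual) and returns True when the result is acceptable.
Check = Callable[[float, float], bool]


def percentage_check(deviation_pct: float) -> Check:
    """Acceptable if actual is within deviation_pct percent of expected."""
    def check(expected: float, actual: float) -> bool:
        return abs(actual - expected) <= abs(expected) * deviation_pct / 100.0
    return check


def ceiling_check(limit: float) -> Check:
    """Acceptable if actual does not exceed a fixed limit (bytes, seconds, ...)."""
    def check(_expected: float, actual: float) -> bool:
        return actual <= limit
    return check


def is_regression(checks: Iterable[Check], expected: float, actual: float) -> bool:
    """A result counts as a regression if any configured check rejects it."""
    return not all(check(expected, actual) for check in checks)
```

A test like T7702 could then use only a `ceiling_check`, while most
tests would keep the default percentage comparison.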

Any thoughts? Comments? Questions?

Regards,
Jared Weakly