
Hi,

On Saturday, 05.05.2018 at 12:33 -0400, Daniel Cartwright wrote:
> I write this out of curiosity, as well as concern, over how this may affect GHC.

Our performance measurements are pretty non-scientific. For many decades, developers just ran our benchmark suite (nofib) before and after their change, hopefully on a cleanly built working copy, and pasted the most interesting numbers into the commit log. Maybe some went for coffee to leave the machine otherwise relatively quiet (or had some remote setup), maybe not.

In the end, the run-time performance numbers are often ignored, and we focus on comparing the effects on *dynamic heap allocations*, which are much more stable across different environments, and which we believe are a good proxy for actual performance, at least for the kind of high-level optimizations that we work on in the core-to-core pipeline. But this assumption is folklore and has not been scientifically investigated.

About two years ago we started collecting performance numbers for every commit to the GHC repository, and I wrote a tool to print comparisons: https://perf.haskell.org/ghc/

This runs on a dedicated physical machine, and still the run-time numbers varied too widely and gave us many false warnings (and probably reported many false improvements, which we of course were happy to believe). I have since switched to measuring only dynamic instruction counts with valgrind. This means that we cannot detect improvements or regressions due to certain low-level effects, but we gain the ability to reliably measure *something* that we expect to change when we improve (or accidentally worsen) the high-level transformations.

I wish there were a better way of getting a reliable, stable number that reflects the actual performance.

Cheers,
Joachim

--
Joachim Breitner
  mail@joachim-breitner.de
  http://www.joachim-breitner.de/