
Hi,

while keeping an eye on the performance numbers, I notice a pattern
where basically any change to the RTS makes some benchmarks go up or
down by a significant percentage. Recent example:

  https://git.haskell.org/ghc.git/commitdiff/0aba999f60babe6878a1fd2cc84101393...

which exposed an additional secure modular power function in integer
(and should really not affect any of our test cases), yet it causes
these changes:

  Benchmark name     prev      change      now
  nofib/time/FS      0.434     - 4.61%     0.414 seconds
  nofib/time/VS      0.369     + 15.45%    0.426 seconds
  nofib/time/scs     0.411     - 4.62%     0.392 seconds

  https://perf.haskell.org/ghc/#revision/0aba999f60babe6878a1fd2cc8410139358cad16

The new effBench benchmarks (FS, VS) are particularly often affected,
but so are old friends like scs, lambda, integer…

In a case like this I can see that the effect is spurious, but it
really limits our ability to properly evaluate changes to the compiler:
in some cases it makes us cheer about improvements that are not really
there, in others it makes us hunt for ghosts.

Does anyone have a solid idea of what is causing these differences? Are
they specific to the builder for perf.haskell.org, or do you observe
them as well? And what can we do here?

For the measurements in my thesis I switched to measuring instruction
counts (using valgrind) instead. These are much more stable, require
only a single NoFibRun, and the machine does not have to be otherwise
quiet. Should I start using these on perf.haskell.org? Or would we lose
too much by not tracking actual running times any more?

Greetings,
Joachim

--
Joachim Breitner
  mail@joachim-breitner.de
  http://www.joachim-breitner.de/
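PS: For concreteness, a rough sketch of the kind of measurement I mean:
run the benchmark binary once under valgrind's cachegrind tool and take
the total instruction count from its summary. The benchmark path
"./my-benchmark" and the parsing of valgrind's stderr output are just
illustrative assumptions here, not the actual perf.haskell.org or nofib
setup.

  -- Sketch: run one benchmark under "valgrind --tool=cachegrind" and
  -- read the total instruction count from the "I refs:" summary line
  -- that valgrind prints on stderr. (Summary format assumed; it may
  -- vary between valgrind versions.)
  import System.Process (readProcessWithExitCode)
  import Data.Char (isDigit)

  instructionCount :: FilePath -> [String] -> IO Integer
  instructionCount bin args = do
    (_code, _out, err) <- readProcessWithExitCode "valgrind"
                            (["--tool=cachegrind", bin] ++ args) ""
    -- summary lines look roughly like "==PID== I   refs:  27,742,716"
    let isIRefs l = case words l of
          (_pid : "I" : "refs:" : _) -> True
          _                          -> False
    case filter isIRefs (lines err) of
      (l:_) -> return (read (filter isDigit (last (words l))))
      []    -> fail "no 'I refs:' line in valgrind output"

  main :: IO ()
  main = instructionCount "./my-benchmark" [] >>= print

Since only instructions are counted, a single run suffices and the
numbers stay essentially the same even on a busy machine.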