
Hi,

while keeping an eye on the performance numbers, I notice a pattern
where basically any change to the RTS makes some benchmarks go up or
down by a significant percentage. Recent example:

  https://git.haskell.org/ghc.git/commitdiff/0aba999f60babe6878a1fd2cc84101393...

which exposed an additional secure modular power function in integer
(and should really not affect any of our test cases), yet it causes
these changes:

  Benchmark name     prev      change      now
  nofib/time/FS      0.434     - 4.61%     0.414 seconds
  nofib/time/VS      0.369     + 15.45%    0.426 seconds
  nofib/time/scs     0.411     - 4.62%     0.392 seconds

  https://perf.haskell.org/ghc/#revision/0aba999f60babe6878a1fd2cc8410139358cad16

The new effBench benchmarks (FS, VS) are particularly often affected,
but so are old friends like scs, lambda, integer…

In a case like this I can see that the effect is spurious, but it
really limits our ability to properly evaluate changes to the compiler:
in some cases it makes us cheer about improvements that are not really
there, in others it makes us hunt for ghosts.

Does anyone have a solid idea of what is causing these differences? Are
they specific to the builder for perf.haskell.org, or do you observe
them as well? And what can we do here?

For the measurements in my thesis I switched to measuring instruction
counts (using valgrind) instead. These are much more stable, require
only a single NoFibRun, and the machine does not have to be otherwise
quiet. Should I start using these on perf.haskell.org? Or would we lose
too much by not tracking actual running times any more?

Greetings,
Joachim

--
Joachim Breitner
  mail@joachim-breitner.de
  http://www.joachim-breitner.de/
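PS: For concreteness, a rough sketch of the kind of measurement I mean:
run the benchmark binary once under valgrind's cachegrind tool and take
the total instruction count from its summary. The benchmark path
"./my-benchmark" and the parsing of valgrind's stderr output are just
illustrative assumptions here, not the actual perf.haskell.org or nofib
setup.

  -- Sketch: run one benchmark under "valgrind --tool=cachegrind" and
  -- read the total instruction count from the "I refs:" summary line
  -- that valgrind prints on stderr. (Summary format assumed; it may
  -- vary between valgrind versions.)
  import System.Process (readProcessWithExitCode)
  import Data.Char (isDigit)

  instructionCount :: FilePath -> [String] -> IO Integer
  instructionCount bin args = do
    (_code, _out, err) <- readProcessWithExitCode "valgrind"
                            (["--tool=cachegrind", bin] ++ args) ""
    -- summary lines look roughly like "==PID== I   refs:  27,742,716"
    let isIRefs l = case words l of
          (_pid : "I" : "refs:" : _) -> True
          _                          -> False
    case filter isIRefs (lines err) of
      (l:_) -> return (read (filter isDigit (last (words l))))
      []    -> fail "no 'I refs:' line in valgrind output"

  main :: IO ()
  main = instructionCount "./my-benchmark" [] >>= print

Since only instructions are counted, a single run suffices and the
numbers stay essentially the same even on a busy machine.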