Potential for GHC benchmarks w.r.t. optimisations being incorrect

I am admittedly unsure of how GHC's optimisation benchmarks are currently implemented and carried out, but this paper and its findings could be relevant to GHC devs: http://cis.upenn.edu/~cis501/papers/producing-wrong-data.pdf

According to the paper, the cache effects of changing where the stack starts (which depends on the size of the environment variables) are huge for many compiler benchmarks, and adjusting for this bias shows that gcc -O3 is in reality only about 1% faster than gcc -O2.

Some further thoughts, per http://aftermath.rocks/2016/04/11/wrong_data/ :

"The question they looked at was the following: does the compiler’s -O3 optimization flag result in speedups over -O2? This question is investigated in the light of measurement biases caused by two sources: Unix environment size, and linking order. [The former refers] to the total size of the representation of Unix environment variables (such as PATH, HOME, etc.). Typically, these variables are part of the memory image of each process. The call stack begins where the environment ends. This gives rise to the following hypothesis: changing the sizes of (unused!) environment variables can change the alignment of variables on the stack and thus the performance of the program under test due to different behavior of hardware buffers such as caches or TLBs. (This is the source of the hypothetical example in the first paragraph, which I made up. On the machine where I am typing this, my user name appears in 12 of the environment variables that are set by default. All other things being equal, another user with a user name of a different length will have an environment size that differs by a multiple of 12 bytes.)"

"So does this hypothesis hold? Yes. Using a simple computational kernel the authors observe that changing the size of the environment can often cause a slowdown of 33% and, in one particular case, by 300%. On larger benchmarks the effects are less pronounced but still present. Using the C programs from the standard SPEC CPU2006 benchmark suite, the effects of -O2 and -O3 optimizations were compared across a wide range of environment sizes. For several of the programs a wide range of variations was observed, and the results often included both positive and negative observations. The effects were not correlated with the environment size. All this means that for some benchmarks, a compiler engineer might by accident test a purported optimization in a lucky environment and observe a 10% speedup, while users of the same optimization in an unlucky environment may have a 10% slowdown on the same workload."

I write this out of curiosity, as well as concern, over how this may affect GHC.
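For concreteness, the experiment the paper describes can be sketched in a few lines of Haskell: run the same benchmark binary while padding the environment with a dummy variable of varying size, and watch whether the measured time moves with it. This is only an illustration of the idea, not the paper's setup; the binary path ./mybench and the variable name BENCH_PADDING are placeholders.

    -- Crude probe for environment-size bias: run the same benchmark binary
    -- with a dummy environment variable of varying size, so the start of the
    -- stack shifts, and see whether the measured wall-clock time shifts too.
    import Control.Monad (forM_)
    import Data.Time.Clock (diffUTCTime, getCurrentTime)
    import System.Environment (getEnvironment)
    import System.Process (CreateProcess (..), proc, readCreateProcess)

    -- Run "./mybench" (placeholder) with padBytes extra bytes of environment.
    timeWithPadding :: Int -> IO Double
    timeWithPadding padBytes = do
      baseEnv <- getEnvironment
      let padded = ("BENCH_PADDING", replicate padBytes 'x') : baseEnv
      start <- getCurrentTime
      _ <- readCreateProcess (proc "./mybench" []) { env = Just padded } ""
      end <- getCurrentTime
      pure (realToFrac (end `diffUTCTime` start))

    main :: IO ()
    main = forM_ [0, 256 .. 4096] $ \pad -> do
      t <- timeWithPadding pad
      putStrLn (show pad ++ " bytes of padding: " ++ show t ++ " s")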

Hi,

On Saturday, 05.05.2018, at 12:33 -0400, Daniel Cartwright wrote:
I write this out of curiosity, as well as concern, over how this may affect GHC.
Our performance measurements are pretty non-scientific. For many decades, developers just ran our benchmark suite (nofib) before and after their change, hopefully on a cleanly built working copy, and pasted the most interesting numbers into the commit logs. Maybe some went for coffee to have an otherwise relatively quiet machine (or had some remote setup), maybe not.

In the end, the run-time performance numbers are often ignored and we focus on comparing the effects on *dynamic heap allocations*, which are much more stable across different environments, and which we believe are a good proxy for actual performance, at least for the kind of high-level optimisations that we work on in the core-to-core pipeline. But this assumption is folklore, and not scientifically investigated.

About two years ago we started collecting performance numbers for every commit to the GHC repository, and I wrote a tool to print comparisons: https://perf.haskell.org/ghc/

This runs on a dedicated physical machine, and still the run-time numbers varied too widely and gave us many false warnings (and probably reported many false improvements, which we of course were happy to believe). I have since switched to measuring only dynamic instruction counts with valgrind. This means that we cannot detect improvements or regressions due to certain low-level stuff, but we gain the ability to reliably measure *something* that we expect to change when we improve (or accidentally worsen) the high-level transformations.

I wish there were a better way of getting a reliable, stable number that reflects the actual performance.

Cheers,
Joachim

--
Joachim Breitner
mail@joachim-breitner.de
http://www.joachim-breitner.de/
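As a rough illustration of the instruction-counting idea (not the actual nofib or perf.haskell.org machinery), one can run a benchmark under valgrind's cachegrind tool and pick the total instruction count out of its summary. The sketch below assumes cachegrind's usual "I   refs:" summary line on stderr, and ./mybench is again a placeholder.

    -- Counting dynamic instructions with valgrind/cachegrind instead of
    -- measuring wall-clock time. Only a sketch of the idea, not what nofib
    -- does; it assumes cachegrind prints a summary line of the form
    -- "==PID== I   refs:  1,234,567" to stderr.
    import Data.Char (isDigit)
    import Data.List (isInfixOf)
    import System.Process (readProcessWithExitCode)

    instructionCount :: FilePath -> [String] -> IO (Maybe Integer)
    instructionCount bin args = do
      (_exitCode, _stdout, stderrText) <-
        readProcessWithExitCode "valgrind" ("--tool=cachegrind" : bin : args) ""
      -- Pick out the "I   refs:" line and keep only the digits after the colon.
      pure $ case filter ("I   refs:" `isInfixOf`) (lines stderrText) of
        (l : _) -> Just (read (filter isDigit (dropWhile (/= ':') l)))
        []      -> Nothing

    main :: IO ()
    main = instructionCount "./mybench" [] >>= print  -- "./mybench" is a placeholder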

Joachim Breitner wrote:
This runs on a dedicated physical machine, and still the run-time numbers varied too widely and gave us many false warnings (and probably reported many false improvements, which we of course were happy to believe). I have since switched to measuring only dynamic instruction counts with valgrind. This means that we cannot detect improvements or regressions due to certain low-level stuff, but we gain the ability to reliably measure *something* that we expect to change when we improve (or accidentally worsen) the high-level transformations.

While this matches my experience with the default settings, I had good results by tuning the number of measurements nofib does. With a high number of NoFibRuns (30+), disabling frequency scaling, stopping background tasks and walking away from the computer until it was done, I got the noise down to about +/-0.2% between subsequent runs. This doesn't eliminate alignment bias and the like, but at least it gives fairly reproducible results.
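A minimal sketch of that repeat-and-aggregate style of measurement, with ./mybench standing in for one nofib-style benchmark run:

    -- Repeat a timed run many times (in the spirit of NoFibRuns=30) and
    -- report the mean and the spread, to get a feel for the remaining noise.
    import Control.Monad (replicateM)
    import Data.Time.Clock (diffUTCTime, getCurrentTime)
    import System.Process (callCommand)

    -- One timed run; "./mybench" is a placeholder benchmark binary.
    runBenchmark :: IO Double
    runBenchmark = do
      start <- getCurrentTime
      callCommand "./mybench"
      end <- getCurrentTime
      pure (realToFrac (end `diffUTCTime` start))

    main :: IO ()
    main = do
      times <- replicateM 30 runBenchmark
      let mean   = sum times / fromIntegral (length times)
          spread = (maximum times - minimum times) / mean * 100
      putStrLn ("mean: " ++ show mean ++ " s, spread: +/-" ++ show (spread / 2) ++ "%")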
Sven Panne wrote:

4% is far from being "big". Look e.g. at https://dendibakh.github.io/blog/2018/01/18/Code_alignment_issues where changing just the alignment of the code led to a 10% difference; the code itself wasn't changed at all. :-/ The "Producing Wrong Data Without Doing Anything Obviously Wrong!" paper gives more funny examples.
I'm not saying that code layout has no impact, quite the opposite. The main point is: do we really have a benchmarking machinery in place which can tell you whether you've improved the real run time or made it worse? I doubt that, at least at the scale of a few percent. To reach just that simple yes/no conclusion, you would need quite heavy machinery involving randomized linking order, varying environments (in the sense of "number and contents of environment variables"), various CPU models, etc. If you do not do that, modern hardware will leave you with a lot of "WTF?!" moments and wrong conclusions.

You raise good points. While the example in the blog seems a bit constructed, with the whole loop fitting in a cache line, the principle is a real concern. I've hit alignment issues and WTF moments plenty of times in the past when looking at microbenchmarks.
However, on the scale of nofib I haven't really seen this happen so far. It's good to be aware of the chance of a whole suite giving wrong results, though.

I wonder if this effect is limited by GHC's tendency to use 8-byte alignment for all code (at least with tables-next-to-code)? If we only consider 16 bytes (the DSB buffer) and 32 bytes (cache lines) relevant, this reduces the possibilities by a lot after all.

In the particular example I've hit, however, it's pretty obvious that alignment is not the issue (and I still verified that). In the end, how big the impact of a better layout would be in general is hard to quantify. Hence the question whether anyone has pointers to good literature which looks into this.

Cheers,
Andreas
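A toy illustration of that counting argument (purely illustrative, not GHC code): with 8-byte-aligned code, a block can only start at a few distinct offsets within a decode window or a cache line (64 bytes, per the correction below), so far fewer alignment scenarios need to be considered.

    -- With 8-byte code alignment, enumerate the possible start offsets of a
    -- code block within a hardware buffer of the given size.
    possibleOffsets :: Int -> Int -> [Int]
    possibleOffsets alignment window = [0, alignment .. window - 1]

    main :: IO ()
    main = do
      putStrLn ("16-byte DSB window: " ++ show (possibleOffsets 8 16))
      putStrLn ("64-byte cache line: " ++ show (possibleOffsets 8 64))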

Hi,

On Sunday, 06.05.2018, at 16:41 +0200, Andreas Klebinger wrote:
With a high number of NoFibRuns (30+), disabling frequency scaling, stopping background tasks and walking away from the computer until it was done, I got the noise down to about +/-0.2% between subsequent runs. This doesn't eliminate alignment bias and the like, but at least it gives fairly reproducible results.
That’s true, but it leaves the alignment bias. This bit me in my work on Call Arity, as I write in my thesis:

Initially, I attempted to use the actual run time measurements, but it turned out to be a mostly pointless endeavour. For example the knights benchmark would become 9% slower when enabling Call Arity (i.e. when comparing (A) to (B)), a completely unexpected result, given that the changes to the GHC Core code were reasonable. Further investigation using performance data obtained from the CPU indicated that with the changed code, the CPU’s instruction decoder was idling for more cycles, hinting at cache effects and/or bad program layout. Indeed: When I compiled the code with the compiler flag -g, which includes debugging information in the resulting binary, but should otherwise not affect the relative performance characteristics much, the unexpected difference vanished. I conclude that non-local changes to the Haskell or Core code will change the layout of the generated program code in unpredictable ways and render such run time measurements mostly meaningless.

This conclusion has been drawn before [MDHS09], and recently, tools to mitigate this effect, e.g. by randomising the code layout [CB13], were created. Unfortunately, these currently target specific C compilers, so I could not use them here.

In the following measurements, I avoid this problem by not measuring program execution time, but simply by counting the number of instructions performed. This way, the variability in execution time due to code layout does not affect the results. To obtain the instruction counts I employ valgrind [NS07], which runs the benchmarks on a virtual CPU and thus produces more reliable and reproducible measurements.

Unpleasant experience.

Cheers,
Joachim

--
Joachim Breitner
mail@joachim-breitner.de
http://www.joachim-breitner.de/

2018-05-06 16:41 GMT+02:00 Andreas Klebinger:
[...] If we only consider 16 bytes (the DSB buffer) and 32 bytes (cache lines) relevant, this reduces the possibilities by a lot after all. [...]
Nitpick: Cache lines on basically all Intel/AMD processors contain 64 bytes; see e.g. http://www.agner.org/optimize/microarchitecture.pdf
participants (4):
- Andreas Klebinger
- Daniel Cartwright
- Joachim Breitner
- Sven Panne