Re: [GHC] #15999: Stabilise nofib runtime measurements

21 Dec 2018

      #15999: Stabilise nofib runtime measurements
-------------------------------------+-------------------------------------
        Reporter:  sgraf             |                Owner:  (none)
            Type:  task              |               Status:  new
        Priority:  normal            |            Milestone:  ⊥
       Component:  NoFib benchmark   |              Version:  8.6.2
  suite                              |
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #5793 #9476       |  Differential Rev(s):  Phab:D5438
  #15333 #15357                      |
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Description changed by sgraf:

Old description:
...
With Phab:D4989 (cf. #15357) having hit `nofib` master, there are still
many benchmarks that are unstable. I identified three causes for
instability in https://ghc.haskell.org/trac/ghc/ticket/5793#comment:38.
With system overhead mostly out of the equation, there are still two
related tasks left:
1. Identify benchmarks with GC wibbles. Plan: Look at counted
instructions while varying heap size with just one generation. A wibbling
benchmark should have quite diverse sampled maximum residency (as opposed
to a microbenchmark, which should have quite stable instruction count).
Then fix these by iterating `main` 'often enough'. Maybe look at total
bytes allocated for that, we want this to be monotonically declining as
the initial heap size grows.
2. Now, all benchmarks should have stable instruction count. If not,
maybe there's another class of benchmarks I didn't identify yet in #5793.
Of these benchmarks, there are a few, like `real/eff/CS`, that still have
highly unstable runtimes. Fix these 'microbenchmarks' by hiding them
behind a flag.

New description:

 With Phab:D4989 (cf. #15357) having hit `nofib` master, there are still
 many benchmarks that are unstable in one way or another. I identified
 three causes for instability in
 https://ghc.haskell.org/trac/ghc/ticket/5793#comment:38. With system
 overhead mostly out of the equation, there are still two related tasks
 left:

 1. Identify benchmarks with GC wibbles. Plan: Look at how productivity
 rate changes while increasing gen 0 heap size. A GC-sensitive benchmark
 should have a non-monotonic or discontinuous productivity-rate-over-
 nursery-size curve. Then fix these by iterating `main` often enough for
 the curve to become smooth and monotone.
 2. Now, all benchmarks should have monotonically decreasing instruction
 count for increasing nursery sizes. If not, maybe there's another class of
 benchmarks I didn't identify yet in #5793. Of these benchmarks, there are
 a few, like `real/eff/CS`, whose runtimes are still highly sensitive to
 code layout. Fix these 'microbenchmarks' by hiding them behind a flag.
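
 The monotonicity checks in both tasks could be sketched roughly as follows.
 This is only an illustration, not part of nofib: the function name
 `non_monotone_steps`, the sample layout and the 1% relative tolerance are
 all assumptions, and it further assumes that a well-behaved productivity
 curve does not fall as the nursery grows. It checks monotonicity only, not
 discontinuity.

```python
from typing import List, Tuple

# (nursery size in bytes, measured value). The value is either a
# productivity rate in [0, 1] or a counted instruction total.
Sample = Tuple[int, float]


def non_monotone_steps(
    samples: List[Sample],
    increasing: bool,
    rel_tol: float = 0.01,  # hypothetical noise threshold, not from the ticket
) -> List[Tuple[int, int]]:
    """Return the (smaller size, larger size) pairs where the curve moves
    against the expected direction by more than rel_tol."""
    ordered = sorted(samples)  # order by nursery size
    bad = []
    for (size_a, val_a), (size_b, val_b) in zip(ordered, ordered[1:]):
        if increasing:
            # Expected to rise (or stay flat): flag a noticeable drop.
            violated = val_b < val_a * (1 - rel_tol)
        else:
            # Expected to fall (or stay flat): flag a noticeable rise.
            violated = val_b > val_a * (1 + rel_tol)
        if violated:
            bad.append((size_a, size_b))
    return bad


if __name__ == "__main__":
    # Made-up numbers, purely to show the call shape.
    productivity = [(1 << 20, 0.80), (2 << 20, 0.70), (4 << 20, 0.88)]
    instructions = [(1 << 20, 1000.0), (2 << 20, 900.0), (4 << 20, 950.0)]
    print(non_monotone_steps(productivity, increasing=True))
    print(non_monotone_steps(instructions, increasing=False))
```

 A benchmark for which the first call returns a non-empty list would be a
 candidate for iterating `main` more often (task 1). One for which the
 second call returns a non-empty list would need a closer look (task 2).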

--

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15999#comment:8
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler