Re: Perf regression: ghc --make: add nicer names to RTS threads (threaded IO manager, make workers) (f686682)

On Wed, 6 Aug 2014 22:15:34 +0300, Sergei Trofimovich wrote:

On Wed, 06 Aug 2014 09:30:45 +0200, Joachim Breitner wrote:

Hi,
the attached commit seems to have regressed the scs nofib benchmark by ~3%: http://ghcspeed-nomeata.rhcloud.com/timeline/?ben=nofib/time/scs&env=1#/?exe=2&base=2+68&ben=nofib/time/scs&env=1&revs=50&equid=on
That's a test of compiled binary performance, not the compiler, right?
The graph is unfortunately in the wrong order, as the tool gets confused by timezones and by commits with identical CommitDate, e.g. due to rebasing. That needs to be fixed; I manually verified that the commit below is the first that shows the above-noise-level increase in runtime.
(Other benchmarks seem to be unaffected.)
Is this regression expected and intended, or unexpected? Is it fixable? Or is this simply inexplicable?
The graph looks mysterious (an 18 ms bump). The benchmark does not use Haskell threads at all.
I'll try to reproduce the degradation locally and investigate.
I think I know what happens. According to perf, the benchmark spends 34%+ of its time in garbage collection ('perf record -- $args' / 'perf report'):

27,91% test test [.] evacuate
 9,29% test test [.] s9Lz_info
 7,46% test test [.] scavenge_block

And the whole benchmark runs a tiny bit more than 300ms, which is exactly in line with the major GC timer (0.3s).

If we run

$ time ./test inverter 345 10n 4u 1>/dev/null

multiple times, there is heavy instability in there (with my patch reverted):

real 0m0.319s
real 0m0.305s
real 0m0.307s
real 0m0.373s
real 0m0.381s

which is +/- 80ms of drift!

Let's try to kick off the major GC earlier instead of running it right at runtime shutdown time:

$ time ./test inverter 345 10n 4u +RTS -I0.1 1>/dev/null

real 0m0.304s
real 0m0.308s
real 0m0.302s
real 0m0.304s
real 0m0.308s
real 0m0.306s
real 0m0.305s
real 0m0.312s

which is much more stable behaviour.

Thus my theory is that my change stepped from "90% of time 1 GC run per run" to "90% of time 2 GC runs per run".

--
Sergei
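One direct way to check the one-versus-two-major-GC theory is to ask the RTS itself for its GC counters. The program below is only a sketch, not code from the thread: it assumes the modern GHC.Stats API (GHC 8.2 or later), uses a placeholder comment where the real benchmark workload would go, and needs to be run with +RTS -T so the RTS collects statistics.

import Control.Monad (when)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

main :: IO ()
main = do
  -- ... the real benchmark workload would run here ...
  enabled <- getRTSStatsEnabled          -- True only when run with +RTS -T (or -s)
  when enabled $ do
    s <- getRTSStats
    putStrLn ("major GCs: " ++ show (major_gcs s))
    putStrLn ("total GCs: " ++ show (gcs s))

Comparing the printed major-GC count with and without the patch (or with and without -I0.1) would confirm whether a run really tips from one major collection to two.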

Hi,

On Wednesday, 06.08.2014, at 23:40 +0300, Sergei Trofimovich wrote:
I think I know what happens. According to perf, the benchmark spends 34%+ of its time in garbage collection ('perf record -- $args' / 'perf report'):

27,91% test test [.] evacuate
 9,29% test test [.] s9Lz_info
 7,46% test test [.] scavenge_block

And the whole benchmark runs a tiny bit more than 300ms, which is exactly in line with the major GC timer (0.3s).

If we run

$ time ./test inverter 345 10n 4u 1>/dev/null

multiple times, there is heavy instability in there (with my patch reverted):

real 0m0.319s
real 0m0.305s
real 0m0.307s
real 0m0.373s
real 0m0.381s

which is +/- 80ms of drift!

Let's try to kick off the major GC earlier instead of running it right at runtime shutdown time:

$ time ./test inverter 345 10n 4u +RTS -I0.1 1>/dev/null

real 0m0.304s
real 0m0.308s
real 0m0.302s
real 0m0.304s
real 0m0.308s
real 0m0.306s
real 0m0.305s
real 0m0.312s

which is much more stable behaviour.

Thus my theory is that my change stepped from "90% of time 1 GC run per run" to "90% of time 2 GC runs per run".
Great analysis, thanks. I think in this case we should not worry about it. From a QA perspective we are already doing well if we look into the apparent regressions; if we can explain them and consider them acceptable, then it's fine.

Greetings,
Joachim

--
Joachim “nomeata” Breitner
mail@joachim-breitner.de • http://www.joachim-breitner.de/
Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F
Debian Developer: nomeata@debian.org

On 06/08/2014 21:40, Sergei Trofimovich wrote:
I think I know what happens. According to perf, the benchmark spends 34%+ of its time in garbage collection ('perf record -- $args' / 'perf report'):

27,91% test test [.] evacuate
 9,29% test test [.] s9Lz_info
 7,46% test test [.] scavenge_block

And the whole benchmark runs a tiny bit more than 300ms, which is exactly in line with the major GC timer (0.3s).
0.3s is the *idle* GC timer; it has no effect when the program is running normally. There's no timed GC or anything like that. It sometimes happens that a tiny change somewhere tips a program over into doing one more major GC, though.
If we run

$ time ./test inverter 345 10n 4u 1>/dev/null

multiple times, there is heavy instability in there (with my patch reverted):

real 0m0.319s
real 0m0.305s
real 0m0.307s
real 0m0.373s
real 0m0.381s

which is +/- 80ms of drift!

Let's try to kick off the major GC earlier instead of running it right at runtime shutdown time:

$ time ./test inverter 345 10n 4u +RTS -I0.1 1>/dev/null

real 0m0.304s
real 0m0.308s
real 0m0.302s
real 0m0.304s
real 0m0.308s
real 0m0.306s
real 0m0.305s
real 0m0.312s

which is much more stable behaviour.

Thus my theory is that my change stepped from "90% of time 1 GC run per run" to "90% of time 2 GC runs per run".
Is this program idle? I have no idea why this might be happening! If the program is busy computing stuff, the idle GC should not be firing. If it is, that's a bug.

Cheers,
Simon
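Whether the idle GC fires at all for a CPU-bound program can be tested in isolation. The program below is a hypothetical sketch, not taken from the thread: it assumes GHC 8.2+ for GHC.Stats, compilation with ghc -O -threaded (the idle GC only exists in the threaded RTS, and -O keeps the busy loop essentially allocation-free), and a run with +RTS -T.

import Control.Concurrent (threadDelay)
import GHC.Stats (RTSStats (..), getRTSStats)

-- Strict, CPU-bound loop; with -O it stays essentially allocation-free,
-- so no ordinary GC should be triggered while it runs.
busy :: Int -> Int -> Int
busy acc 0 = acc
busy acc n = let acc' = acc + n in acc' `seq` busy acc' (n - 1)

majorGCs :: IO Word
majorGCs = fromIntegral . major_gcs <$> getRTSStats

main :: IO ()
main = do
  g0 <- majorGCs
  print (busy 0 200000000)   -- a CPU-bound phase of a few hundred milliseconds
  g1 <- majorGCs
  threadDelay 1000000        -- an idle second; the idle GC (-I, default 0.3s) may fire here
  g2 <- majorGCs
  putStrLn ("major GCs while busy: " ++ show (g1 - g0))
  putStrLn ("major GCs while idle: " ++ show (g2 - g1))

If the busy-phase counter ever increments for such a non-allocating loop, the idle GC is firing while the mutator is running, which is exactly the buggy behaviour described above.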

On Thu, 07 Aug 2014 15:42:07 +0100, Simon Marlow wrote:
On 06/08/2014 21:40, Sergei Trofimovich wrote:
I think I know what happens. According to perf, the benchmark spends 34%+ of its time in garbage collection ('perf record -- $args' / 'perf report'):

27,91% test test [.] evacuate
 9,29% test test [.] s9Lz_info
 7,46% test test [.] scavenge_block

And the whole benchmark runs a tiny bit more than 300ms, which is exactly in line with the major GC timer (0.3s).
0.3s is the *idle* GC timer; it has no effect when the program is running normally. There's no timed GC or anything like that.
It sometimes happens that a tiny change somewhere tips a program over into doing one more major GC, though.
If we run

$ time ./test inverter 345 10n 4u 1>/dev/null

multiple times, there is heavy instability in there (with my patch reverted):

real 0m0.319s
real 0m0.305s
real 0m0.307s
real 0m0.373s
real 0m0.381s

which is +/- 80ms of drift!

Let's try to kick off the major GC earlier instead of running it right at runtime shutdown time:

$ time ./test inverter 345 10n 4u +RTS -I0.1 1>/dev/null

real 0m0.304s
real 0m0.308s
real 0m0.302s
real 0m0.304s
real 0m0.308s
real 0m0.306s
real 0m0.305s
real 0m0.312s

which is much more stable behaviour.

Thus my theory is that my change stepped from "90% of time 1 GC run per run" to "90% of time 2 GC runs per run".
Is this program idle? I have no idea why this might be happening! If the program is busy computing stuff, the idle GC should not be firing. If it is, that's a bug.
The task is a completely CPU-bound thing, so I was just lucky that the -I option changed anything. I'll try to find the exact place where the profiling times change (or just float), but it will take more time.

--
Sergei
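One hypothetical way to see whether the floating times live in the mutator or in the collector is to let the RTS report the split itself. This is again only a sketch (it assumes the modern GHC.Stats API of GHC 8.2+ and a run with +RTS -T), with a placeholder where the workload under investigation would go.

import GHC.Stats (RTSStats (..), getRTSStats)
import Text.Printf (printf)

main :: IO ()
main = do
  -- ... the workload under investigation would run here ...
  s <- getRTSStats                              -- needs +RTS -T
  let ms t = fromIntegral t / 1e6 :: Double     -- RtsTime fields are in nanoseconds
  printf "mutator: %.1f ms, GC: %.1f ms (of %.1f ms elapsed), major GCs: %d\n"
         (ms (mutator_elapsed_ns s))
         (ms (gc_elapsed_ns s))
         (ms (elapsed_ns s))
         (fromIntegral (major_gcs s) :: Int)

Run over a handful of iterations before and after the commit, a drift that shows up only in the GC column (and in the major-GC count) would support the one-extra-major-GC explanation rather than a real mutator slowdown.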
participants (3):

- Joachim Breitner
- Sergei Trofimovich
- Simon Marlow