Re: Perf regression: ghc --make: add nicer names to RTS threads (threaded IO manager, make workers) (f686682)

On Wed, 6 Aug 2014 22:15:34 +0300, Sergei Trofimovich wrote:

On Wed, 06 Aug 2014 09:30:45 +0200, Joachim Breitner wrote:

Hi,
the attached commit seems to have regressed the scs nofib benchmark by ~3%: http://ghcspeed-nomeata.rhcloud.com/timeline/?ben=nofib/time/scs&env=1#/?exe=2&base=2+68&ben=nofib/time/scs&env=1&revs=50&equid=on
That's a test of compiled binary performance, not the compiler, right?
The graph is unfortunately in the wrong order, as the tool gets confused by timezones and by commits with identical CommitDate, e.g. due to rebasing. That needs to be fixed; I manually verified that the commit below is the first that shows the above-noise-level increase in runtime.
(Other benchmarks seem to be unaffected.)
Is this regression expected and intended, or unexpected? Is it fixable? Or is this simply inexplicable?
The graph looks mysterious (an 18 ms bump). The benchmark does not use Haskell threads at all.
I'll try to reproduce the degradation locally and investigate.
I think I know what happens. According to perf, the benchmark spends 34%+ of its time in garbage collection ('perf record -- $args' / 'perf report'):

27,91% test test [.] evacuate
 9,29% test test [.] s9Lz_info
 7,46% test test [.] scavenge_block

And the whole benchmark runs a tiny bit more than 300ms, which is exactly in line with the major GC timer (0.3s).

If we run

$ time ./test inverter 345 10n 4u 1>/dev/null

multiple times, there is heavy instability in there (with my patch reverted):

real 0m0.319s
real 0m0.305s
real 0m0.307s
real 0m0.373s
real 0m0.381s

which is +/- 80ms of drift!

Let's try to kick off the major GC earlier instead of running it right at runtime shutdown time:

$ time ./test inverter 345 10n 4u +RTS -I0.1 1>/dev/null

real 0m0.304s
real 0m0.308s
real 0m0.302s
real 0m0.304s
real 0m0.308s
real 0m0.306s
real 0m0.305s
real 0m0.312s

which is much more stable behaviour.

Thus my theory is that my change stepped from "90% of time 1 GC run per run" to "90% of time 2 GC runs per run".

--
Sergei
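One direct way to check the one-versus-two-major-GC theory is to ask the RTS itself for its GC counters. The program below is only a sketch, not code from the thread: it assumes the modern GHC.Stats API (GHC 8.2 or later), uses a placeholder comment where the real benchmark workload would go, and needs to be run with +RTS -T so the RTS collects statistics.

import Control.Monad (when)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

main :: IO ()
main = do
  -- ... the real benchmark workload would run here ...
  enabled <- getRTSStatsEnabled          -- True only when run with +RTS -T (or -s)
  when enabled $ do
    s <- getRTSStats
    putStrLn ("major GCs: " ++ show (major_gcs s))
    putStrLn ("total GCs: " ++ show (gcs s))

Comparing the printed major-GC count with and without the patch (or with and without -I0.1) would confirm whether a run really tips from one major collection to two.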

Hi,

On Wednesday, 06.08.2014, at 23:40 +0300, Sergei Trofimovich wrote:
I think I know what happens. According to perf, the benchmark spends 34%+ of its time in garbage collection ('perf record -- $args' / 'perf report'):

27,91% test test [.] evacuate
 9,29% test test [.] s9Lz_info
 7,46% test test [.] scavenge_block

And the whole benchmark runs a tiny bit more than 300ms, which is exactly in line with the major GC timer (0.3s).

If we run

$ time ./test inverter 345 10n 4u 1>/dev/null

multiple times, there is heavy instability in there (with my patch reverted):

real 0m0.319s
real 0m0.305s
real 0m0.307s
real 0m0.373s
real 0m0.381s

which is +/- 80ms of drift!

Let's try to kick off the major GC earlier instead of running it right at runtime shutdown time:

$ time ./test inverter 345 10n 4u +RTS -I0.1 1>/dev/null

real 0m0.304s
real 0m0.308s
real 0m0.302s
real 0m0.304s
real 0m0.308s
real 0m0.306s
real 0m0.305s
real 0m0.312s

which is much more stable behaviour.

Thus my theory is that my change stepped from "90% of time 1 GC run per run" to "90% of time 2 GC runs per run".
Great analysis, thanks. I think in this case we should not worry about it. From a QA perspective we are already doing well if we look into the apparent regressions; if we can explain them and consider them acceptable, then it's fine.

Greetings,
Joachim

--
Joachim “nomeata” Breitner
mail@joachim-breitner.de • http://www.joachim-breitner.de/
Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F
Debian Developer: nomeata@debian.org

On 06/08/2014 21:40, Sergei Trofimovich wrote:
I think I know what happens. According to perf, the benchmark spends 34%+ of its time in garbage collection ('perf record -- $args' / 'perf report'):

27,91% test test [.] evacuate
 9,29% test test [.] s9Lz_info
 7,46% test test [.] scavenge_block

And the whole benchmark runs a tiny bit more than 300ms, which is exactly in line with the major GC timer (0.3s).
0.3s is the *idle* GC timer; it has no effect when the program is running normally. There's no timed GC or anything like that. It sometimes happens that a tiny change somewhere tips a program over into doing one more major GC, though.
If we run

$ time ./test inverter 345 10n 4u 1>/dev/null

multiple times, there is heavy instability in there (with my patch reverted):

real 0m0.319s
real 0m0.305s
real 0m0.307s
real 0m0.373s
real 0m0.381s

which is +/- 80ms of drift!

Let's try to kick off the major GC earlier instead of running it right at runtime shutdown time:

$ time ./test inverter 345 10n 4u +RTS -I0.1 1>/dev/null

real 0m0.304s
real 0m0.308s
real 0m0.302s
real 0m0.304s
real 0m0.308s
real 0m0.306s
real 0m0.305s
real 0m0.312s

which is much more stable behaviour.

Thus my theory is that my change stepped from "90% of time 1 GC run per run" to "90% of time 2 GC runs per run".
Is this program idle? I have no idea why this might be happening! If the program is busy computing stuff, the idle GC should not be firing. If it is, that's a bug.

Cheers,
Simon
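Whether the idle GC fires at all for a CPU-bound program can be tested in isolation. The program below is a hypothetical sketch, not taken from the thread: it assumes GHC 8.2+ for GHC.Stats, compilation with ghc -O -threaded (the idle GC only exists in the threaded RTS, and -O keeps the busy loop essentially allocation-free), and a run with +RTS -T.

import Control.Concurrent (threadDelay)
import GHC.Stats (RTSStats (..), getRTSStats)

-- Strict, CPU-bound loop; with -O it stays essentially allocation-free,
-- so no ordinary GC should be triggered while it runs.
busy :: Int -> Int -> Int
busy acc 0 = acc
busy acc n = let acc' = acc + n in acc' `seq` busy acc' (n - 1)

majorGCs :: IO Word
majorGCs = fromIntegral . major_gcs <$> getRTSStats

main :: IO ()
main = do
  g0 <- majorGCs
  print (busy 0 200000000)   -- a CPU-bound phase of a few hundred milliseconds
  g1 <- majorGCs
  threadDelay 1000000        -- an idle second; the idle GC (-I, default 0.3s) may fire here
  g2 <- majorGCs
  putStrLn ("major GCs while busy: " ++ show (g1 - g0))
  putStrLn ("major GCs while idle: " ++ show (g2 - g1))

If the busy-phase counter ever increments for such a non-allocating loop, the idle GC is firing while the mutator is running, which is exactly the buggy behaviour described above.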

On Thu, 07 Aug 2014 15:42:07 +0100, Simon Marlow wrote:
On 06/08/2014 21:40, Sergei Trofimovich wrote:
I think I know what happens. According to perf, the benchmark spends 34%+ of its time in garbage collection ('perf record -- $args' / 'perf report'):

27,91% test test [.] evacuate
 9,29% test test [.] s9Lz_info
 7,46% test test [.] scavenge_block

And the whole benchmark runs a tiny bit more than 300ms, which is exactly in line with the major GC timer (0.3s).
0.3s is the *idle* GC timer; it has no effect when the program is running normally. There's no timed GC or anything like that.
It sometimes happens that a tiny change somewhere tips a program over into doing one more major GC, though.
If we run

$ time ./test inverter 345 10n 4u 1>/dev/null

multiple times, there is heavy instability in there (with my patch reverted):

real 0m0.319s
real 0m0.305s
real 0m0.307s
real 0m0.373s
real 0m0.381s

which is +/- 80ms of drift!

Let's try to kick off the major GC earlier instead of running it right at runtime shutdown time:

$ time ./test inverter 345 10n 4u +RTS -I0.1 1>/dev/null

real 0m0.304s
real 0m0.308s
real 0m0.302s
real 0m0.304s
real 0m0.308s
real 0m0.306s
real 0m0.305s
real 0m0.312s

which is much more stable behaviour.

Thus my theory is that my change stepped from "90% of time 1 GC run per run" to "90% of time 2 GC runs per run".
Is this program idle? I have no idea why this might be happening! If the program is busy computing stuff, the idle GC should not be firing. If it is, that's a bug.
The task is a completely CPU-bound thing, so I was just lucky that the -I option changed anything. I'll try to find the exact place where the profiling times change (or just float), but it will take more time.

--
Sergei
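One hypothetical way to see whether the floating times live in the mutator or in the collector is to let the RTS report the split itself. This is again only a sketch (it assumes the modern GHC.Stats API of GHC 8.2+ and a run with +RTS -T), with a placeholder where the workload under investigation would go.

import GHC.Stats (RTSStats (..), getRTSStats)
import Text.Printf (printf)

main :: IO ()
main = do
  -- ... the workload under investigation would run here ...
  s <- getRTSStats                              -- needs +RTS -T
  let ms t = fromIntegral t / 1e6 :: Double     -- RtsTime fields are in nanoseconds
  printf "mutator: %.1f ms, GC: %.1f ms (of %.1f ms elapsed), major GCs: %d\n"
         (ms (mutator_elapsed_ns s))
         (ms (gc_elapsed_ns s))
         (ms (elapsed_ns s))
         (fromIntegral (major_gcs s) :: Int)

Run over a handful of iterations before and after the commit, a drift that shows up only in the GC column (and in the major-GC count) would support the one-extra-major-GC explanation rather than a real mutator slowdown.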
participants (3):

- Joachim Breitner
- Sergei Trofimovich
- Simon Marlow