[GHC] #8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF ------------------------------+-------------------------------------------- Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime performance bug Unknown/Multiple | Test Case: Difficulty: Unknown | Blocking: Blocked By: | Related Tickets: | ------------------------------+-------------------------------------------- The code generator unnecessarily spills and reloads function arguments if the scrutinee turns out to be already evaluated (i.e. has non-zero tag bits). Here's the beginning of a function body, taken from the `insert` function at https://github.com/tibbe/unordered- containers/blob/master/Data/HashMap/Base.hs#L303: {{{ c2wQ: // stack check if ((Sp + -72) < SpLim) goto c2wR; else goto c2wS; c2wR: // stack check failure R1 = PicBaseReg + $wpoly_go_closure; I64[Sp - 40] = R2; I64[Sp - 32] = R3; P64[Sp - 24] = R4; I64[Sp - 16] = R5; P64[Sp - 8] = R6; Sp = Sp - 40; call (I64[BaseReg - 8])(R1) args: 48, res: 0, upd: 8; c2wS: // stack check success I64[Sp - 40] = PicBaseReg + block_c2my_info; // return addr for eval R1 = R6; // t I64[Sp - 32] = R2; // spill: s I64[Sp - 24] = R3; // spill: x P64[Sp - 16] = R4; // spill: k I64[Sp - 8] = R5; // spill: h Sp = Sp - 40; if (R1 & 7 != 0) goto c2my; else goto c2mz; // eval check of t c2mz: // eval check failed call (I64[R1])(R1) returns to c2my, args: 8, res: 8, upd: 8; // eval c2my: // eval check succeeded _s2b1::I64 = I64[Sp + 8]; // reload: h _s2b2::I64 = I64[Sp + 16]; // reload: k _s2b3::P64 = P64[Sp + 24]; // reload: x _s2b4::I64 = I64[Sp + 32]; // reload: s switch [0 .. 4] (R1 & 7 - 1) { case 0 : goto c2wK; case 1 : goto c2wL; case 2 : goto c2wM; case 3 : goto c2wN; case 4 : goto c2wO; } }}} It seems to me that all the spills/reloads could be pushed into the `c2mz` block. The `c2my` block, in its current form, is reused for a heap check failure case, so the heap check most likely will have to do its own spilling/reloading. However, since the scrutinee not having tags bit or the eval checking failing is not the common case, they should be out of the common path. If it matters the data type is spine strict so GHC should have enough information to know that the common case (e.g. self-recursive calls) already have an evaluated argument (although there might be an indirection in some cases). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by tibbe): If code duplication is a concern, perhaps the spill and load code could be give their own basic blocks that could be jumped to. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by tibbe): The register allocator doesn't tidy this up either: {{{ #!asm _c2wS: leaq block_c2my_info(%rip),%rax movq %rax,-40(%rbp) movq %r9,%rbx movq %r14,-32(%rbp) movq %rsi,-24(%rbp) movq %rdi,-16(%rbp) movq %r8,-8(%rbp) addq $-40,%rbp testq $7,%rbx jne _c2my }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by carter): @tibbe, is this possibly related to https://ghc.haskell.org/trac/ghc/ticket/8048 ? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonmar): Yeah, it is somewhat by design (to get smaller code), but you're quite right that we should have faster code for the common case. I'll look into it and see if we can improve things. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by tibbe): If the stack change code can be moved out of the way, couldn't the switch statements also include checking the tags bit for 0? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by tibbe): To provide some motivation, I've included the full Cmm of the '''common''' path below. Every line that starts with `>` is what I would considered unnecessary spills/loads for this path: {{{ c2wQ: // stack check if ((Sp + -72) < SpLim) goto c2wR; else goto c2wS; c2wS: // stack check success
I64[Sp - 40] = PicBaseReg + block_c2my_info; // return addr for
eval R1 = R6; // t
I64[Sp - 32] = R2; // spill: s I64[Sp - 24] = R3; // spill: x P64[Sp - 16] = R4; // spill: k I64[Sp - 8] = R5; // spill: h
Sp = Sp - 40; if (R1 & 7 != 0) goto c2my; else goto c2mz; // eval check of t c2my: // eval check succeded
_s2b1::I64 = I64[Sp + 8]; // reload: h _s2b2::I64 = I64[Sp + 16]; // reload: k _s2b3::P64 = P64[Sp + 24]; // reload: x _s2b4::I64 = I64[Sp + 32]; // reload: s
switch [0 .. 4] (R1 & 7 - 1) { case 0 : goto c2wK; case 1 : goto c2wL; case 2 : goto c2wM; case 3 : goto c2wN; case 4 : goto c2wO; } c2wN: // Full _s2cx::P64 = P64[R1 + 4]; // ary _s2cy::I64 = (_s2b1::I64 >> _s2b4::I64) & 15; // i _s2cC::P64 = P64[(_s2cx::P64 + 24) + (_s2cy::I64 << 3)]; // st I64[Sp] = PicBaseReg + block_c2nJ_info; // return addr R6 = _s2cC::P64; // arg: st R5 = _s2b4::I64 + 4; // arg: s + bitsPerSubkey R4 = _s2b3::P64; // arg: x R3 = _s2b2::I64; // arg: k R2 = _s2b1::I64; // arg: h P64[Sp + 8] = _s2cC::P64; // spill: st I64[Sp + 16] = _s2cy::I64; // spill: i P64[Sp + 24] = _s2cx::P64; // spill: ary P64[Sp + 32] = R1; // spill: t (only used in uncommon branch) call $wpoly_go_info(R6, R5, R4, R3, R2) returns to c2nJ, args: 8, res: 8, upd: 8; c2nJ:
_s2b6::P64 = P64[Sp + 32]; // reload: t (only used in uncommon
branch) _s2cx::P64 = P64[Sp + 24]; // reload: ary _s2cy::I64 = I64[Sp + 16]; // reload: i _s2cE::P64 = R1; _s2cF::I64 = R1 == P64[Sp + 8]; if (_s2cF::I64 != 1) goto c2nR; else goto c2AE; c2nR: // heap check Hp = Hp + 176; if (Hp > I64[BaseReg + 856]) goto c2AB; else goto c2AA; c2AA: // heap check success I64[Hp - 168] = I64[PicBaseReg + stg_MUT_ARR_PTRS_DIRTY_info@GOTPCREL]; I64[Hp - 160] = 16; I64[Hp - 152] = 17; _c2nT::I64 = Hp - 168; call MO_Memcpy(_c2nT::I64 + 24, _s2cx::P64 + 24, 128, 8); P64[(_c2nT::I64 + 24) + (_s2cy::I64 << 3)] = _s2cE::P64; I64[_c2nT::I64] = I64[PicBaseReg + stg_MUT_ARR_PTRS_DIRTY_info@GOTPCREL]; I8[(_c2nT::I64 + 24) + ((I64[_c2nT::I64 + 8] << 3) + (_s2cy::I64 >> 7))] = 1 :: W8; I64[_c2nT::I64] = I64[PicBaseReg + stg_MUT_ARR_PTRS_FROZEN0_info@GOTPCREL]; I64[Hp - 8] = PicBaseReg + Full_con_info; I64[Hp] = _c2nT::I64; R1 = Hp - 4; Sp = Sp + 40; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; // return }}} The main body of this function, except for the recursive call, is in the last block, `c2AA`. Quite of bit of time is spent on "administrative" things. Also note that there are a bunch of static arguments passed around (`x`, `k`, and `h`). I will try to see what the Cmm looks like if I manually perform a static argument transform on this code. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonmar): I attached a patch you can try. It definitely increases code size, but I'd be interested how much benefit it gives in your benchmarks. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonpj): I'm puzzled by why the spill code is before the eval check rather than after. Before spilling etc I'd expect to see: {{{ ... if <eval check> goto Leval else goto Ldone Leval: call (I64[R1])(R1) returns to Ldone Ldone: // Now it's evaluated }}} Then the stack-layout stuff should insert spills and reloads ''around the call'', but not interfering with the fast path. So how come the spills affect the fast path at all? Is there some special code-size-saving magic in the code generator? Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonmar): Simon - this is one of the things I changed when I worked on the new code generator. We could certainly investigate alternatives, the patch I attached earlier changes it but increases code size, I haven't done a full nofib run though. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonmar): Oh, I just remembered one thing: when we are splitting proc points, then the "faster" sequence is actually worse because it generates two separate proc points, one for the call-return and one for the joint point. If we always do the copyout, then the call-return can be the join point too, which means we only need one proc point. However, we now don't generate proc points (when using the NCG), so we can probably use the faster sequence with only a small increase in code size. Someone needs to measure. And figure out whether we still want to do this when using LLVM or not. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by tibbe): @simonmar, I will try to benchmark your patch against my code tonight. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by tibbe): Replying to [comment:7 simonmar]:
I attached a patch you can try. It definitely increases code size, but I'd be interested how much benefit it gives in your benchmarks.
I will try to benchmark the patch on my code tonight. Will you do a nofib run? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by tibbe): Here are the nofib results on my machine {{{ $ nofib-analyse/nofib-analyse spill.log no-spill.log | head -n 150 NoFib Results -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- anna +2.2% 0.0% 0.07 0.08 0.0% ansi 0.0% 0.0% 0.00 0.00 0.0% atom 0.0% 0.0% +7.9% +6.1% 0.0% awards 0.0% 0.0% 0.00 0.00 0.0% banner 0.0% 0.0% 0.00 0.00 0.0% bernouilli 0.0% 0.0% 0.11 0.12 0.0% binary-trees 0.0% 0.0% -14.8% -15.1% 0.0% boyer 0.0% 0.0% 0.02 0.03 0.0% boyer2 +0.2% 0.0% 0.00 0.00 0.0% bspt +1.7% 0.0% 0.00 0.01 0.0% cacheprof +0.5% -0.0% +11.8% +12.7% +5.5% calendar +0.1% 0.0% 0.00 0.00 0.0% cichelli +0.1% 0.0% 0.05 0.05 0.0% circsim +0.2% 0.0% +15.0% +15.8% 0.0% clausify 0.0% 0.0% 0.02 0.03 0.0% comp_lab_zift +0.2% 0.0% 0.12 0.13 0.0% compress +0.1% 0.0% 0.11 0.12 0.0% compress2 +0.6% 0.0% 0.10 0.12 0.0% constraints +0.1% 0.0% +9.4% +9.4% 0.0% cryptarithm1 0.0% 0.0% +19.2% +18.4% 0.0% cryptarithm2 +0.1% 0.0% 0.00 0.01 0.0% cse 0.0% 0.0% 0.00 0.00 0.0% eliza 0.0% 0.0% 0.00 0.00 0.0% event +0.1% 0.0% 0.09 0.10 0.0% exp3_8 0.0% 0.0% 0.15 0.16 0.0% expert +0.2% 0.0% 0.00 0.00 0.0% fannkuch-redux +0.1% 0.0% -7.6% -7.3% 0.0% fasta 0.0% 0.0% -4.5% -4.2% 0.0% fem +1.0% 0.0% 0.01 0.02 0.0% fft +0.1% 0.0% 0.02 0.03 0.0% fft2 +0.1% 0.0% 0.03 0.03 0.0% fibheaps +0.2% 0.0% 0.02 0.02 0.0% fish +0.1% 0.0% 0.01 0.01 0.0% fluid +2.5% 0.0% 0.00 0.01 0.0% fulsom +0.8% 0.0% 0.16 0.18 0.0% gamteb +0.3% 0.0% 0.02 0.03 0.0% gcd 0.0% 0.0% 0.02 0.02 0.0% gen_regexps 0.0% 0.0% 0.00 0.00 0.0% genfft 0.0% 0.0% 0.02 0.02 0.0% gg +0.5% 0.0% 0.00 0.01 0.0% grep +0.3% 0.0% 0.00 0.00 0.0% hidden +0.5% 0.0% +1.0% -0.0% 0.0% hpg +0.3% 0.0% 0.04 0.11 0.0% ida +0.2% 0.0% 0.06 0.06 0.0% infer +0.2% 0.0% 0.03 0.04 0.0% integer 0.0% 0.0% +5.6% +5.7% 0.0% integrate 0.0% 0.0% 0.06 0.09 0.0% k-nucleotide +0.2% 0.0% -4.6% -4.8% 0.0% kahan 0.0% 0.0% 0.17 0.18 0.0% knights +0.3% 0.0% 0.00 0.00 0.0% lcss 0.0% -0.0% +2.9% +2.6% 0.0% life 0.0% 0.0% 0.17 0.17 0.0% lift +0.3% 0.0% 0.00 0.00 0.0% listcompr 0.0% 0.0% 0.04 0.05 0.0% listcopy 0.0% 0.0% 0.05 0.06 0.0% maillist 0.0% +0.0% 0.03 0.17 +2.2% mandel 0.0% 0.0% 0.03 0.04 0.0% mandel2 +0.1% 0.0% 0.00 0.00 0.0% minimax +0.1% 0.0% 0.00 0.00 0.0% mkhprog +0.1% 0.0% 0.00 0.00 0.0% multiplier +0.1% 0.0% 0.07 0.08 0.0% n-body 0.0% 0.0% -6.3% -6.2% 0.0% nucleic2 +1.8% 0.0% 0.04 0.05 0.0% para +0.5% 0.0% 0.20 0.22 0.0% paraffins +0.1% 0.0% 0.07 0.09 0.0% parser +1.0% 0.0% 0.02 0.02 0.0% parstof +0.4% 0.0% 0.00 0.01 0.0% pic +0.5% 0.0% 0.00 0.01 0.0% pidigits 0.0% 0.0% -6.3% -5.5% 0.0% power +0.1% 0.0% +7.7% +7.4% +1.8% pretty 0.0% 0.0% 0.00 0.00 0.0% primes 0.0% 0.0% 0.04 0.04 0.0% primetest 0.0% 0.0% 0.06 0.07 0.0% prolog +0.2% 0.0% 0.00 0.00 0.0% puzzle +0.3% 0.0% 0.09 0.13 0.0% queens 0.0% 0.0% 0.01 0.01 0.0% reptile +0.5% 0.0% 0.00 0.01 0.0% reverse-complem +0.1% 0.0% 0.05 0.08 0.0% rewrite +0.2% 0.0% 0.01 0.01 0.0% rfib 0.0% 0.0% 0.01 0.01 0.0% rsa 0.0% 0.0% 0.01 0.01 0.0% scc 0.0% 0.0% 0.00 0.00 0.0% sched +0.1% 0.0% 0.01 0.01 0.0% scs +1.8% 0.0% -10.4% -13.1% 0.0% simple +2.9% 0.0% 0.15 0.17 0.0% solid +0.1% 0.0% 0.09 0.11 0.0% sorting +0.1% 0.0% 0.00 0.00 0.0% spectral-norm 0.0% 0.0% +3.5% +3.1% 0.0% sphere +0.6% 0.0% 0.02 0.03 0.0% symalg +0.2% 0.0% 0.00 0.00 0.0% tak 0.0% 0.0% 0.01 0.01 0.0% transform +0.4% 0.0% 0.22 +12.6% 0.0% treejoin +0.1% 0.0% 0.09 0.11 0.0% typecheck +0.1% 0.0% 0.14 0.15 0.0% veritas +1.9% 0.0% 0.00 0.01 0.0% wang +0.1% 0.0% 0.07 0.09 0.0% wave4main +0.1% 0.0% 0.14 0.16 0.0% wheel-sieve1 +0.1% 0.0% +3.8% +4.5% 0.0% wheel-sieve2 0.0% 0.0% 0.12 0.14 0.0% x2n1 0.0% 0.0% 0.00 0.00 0.0% -------------------------------------------------------------------------------- Min 0.0% -0.0% -14.8% -15.1% 0.0% Max +2.9% +0.0% +19.2% +18.4% +5.5% Geometric Mean +0.3% -0.0% +1.5% +1.8% +0.1% }}} A more detailed analysis of the results would be interesting. For example, I note that programs that are already heavily tuned (from the shootout suite) get faster, while some more "traditional" benchmarks get slower. However, given that it's a slight regressions on average on nofib, I think this isn't something we should go ahead with without further investigation. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF --------------------------------------------+------------------------------ Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by tibbe): I've attached the Cmm for the biggest loss in the nofib run above, `cryptarithm1`. There's definitely more code generated with simonmar's patch (which nofib strangely doesn't report). Just to validate nofib's result, I reran this benchmark manually and grabbed the results for the best out of five runs: master: {{{ real 0m0.328s user 0m0.320s sys 0m0.007s }}} no-spill: {{{ real 0m0.333s user 0m0.325s sys 0m0.007s }}} That's a 1.5% runtime difference. Looking at the mutator time with `+RTS -s` (using five new runs), I can see no difference (`+RTS -s` has lower resolution). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF -------------------------------------+------------------------------------- Reporter: tibbe | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by thomie): * component: Compiler => Compiler (CodeGen) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF -------------------------------------+------------------------------------- Reporter: tibbe | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => CodeGen -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8905: Function arguments are always spilled/reloaded if scrutinee is already in WHNF -------------------------------------+------------------------------------- Reporter: tibbe | Owner: kavon Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 (CodeGen) | Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by kavon): * owner: (none) => kavon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8905#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC