
#8275: Loopification breaks profiling
----------------------------------------+----------------------------------
        Reporter:  jstolarek            |            Owner:  jstolarek
            Type:  bug                  |           Status:  new
        Priority:  highest              |        Milestone:
       Component:  Profiling            |          Version:  7.7
      Resolution:                       |         Keywords:
Operating System:  Unknown/Multiple     |     Architecture:
 Type of failure:  Building GHC failed  |  Unknown/Multiple
       Test Case:                       |       Difficulty:  Unknown
        Blocking:  8298                 |       Blocked By:
                                        |  Related Tickets:
----------------------------------------+----------------------------------

Comment (by jstolarek):

Loopification triggers for fully saturated (but not over-saturated!) tail calls. So in this code:

{{{
f 0 = 4
f 1 = 5
f n = case f (n - 2) of
        4 -> 4
        5 -> f (n - 1)
}}}

it will trigger for the call to `f (n - 1)` in the second branch of the case, but not for `f (n - 2)` in the case scrutinee. I checked, and this code actually causes a segfault when compiled with `-prof -fprof-auto -rtsopts` (assuming you add `main = print (f 5)` to the file). It only happens when `f :: Integer -> Integer`, not when `f :: Int -> Int`, so I suspect that the bug might actually be hidden somewhere in the libraries and that I might be looking at the wrong code.

The idea behind loopification is that it should put the parameters in local variables (instead of global registers) and make a jump (instead of a call).
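
For convenience, the snippet above can be put into a standalone file. This is only a sketch of the reproduction described in this comment: the type signature and `main` are the ones mentioned in the text, while the module header and the file name `Main.hs` are assumptions.

```haskell
module Main where

-- Integer (not Int) is the variant reported to segfault under profiling.
f :: Integer -> Integer
f 0 = 4
f 1 = 5
f n = case f (n - 2) of       -- call in the scrutinee: not a tail call, not loopified
        4 -> 4
        5 -> f (n - 1)        -- fully saturated tail call: candidate for loopification

main :: IO ()
main = print (f 5)
```

Built with something like `ghc -prof -fprof-auto -rtsopts Main.hs`, a correctly working binary should print `4`; the report above is that the profiled build crashes instead.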

The `f` function begins like this in Cmm:

{{{
cYp:
    _sUP::P64 = R2;
    _sUO::P64 = R1;
    if (%MO_UU_Conv_W32_W64(I32[era]) <= 0) goto cW1; else goto cVZ;
cVZ:
    I64[R1 + 15] = I64[R1 + 15] & 1152921503533105152
                   | %MO_UU_Conv_W32_W64(I32[era])
                   | 1152921504606846976;
    goto cW1;
cW1:
    if (Sp - 104 < SpLim) goto cYq; else goto cYr;
cYr:
    Hp = Hp + 40;
    if (Hp > HpLim) goto cYt; else goto cYs
}}}

Without loopification, the tail call is a normal call:

{{{
cYM:
    I64[CCCS + 72] = I64[CCCS + 72] + %MO_UU_Conv_W64_W64(6 - 2);
    I64[Hp - 40] = sat_sVa_info;
    I64[Hp - 32] = CCCS;
    I64[Hp - 24] = (%MO_UU_Conv_W32_W64(I32[era]) << 30) | 0;
    P64[Hp - 8] = _sUN::P64;
    P64[Hp] = _sUP::P64;
    _cXT::P64 = Hp - 40;
    R2 = _cXT::P64;
    R1 = _sUO::P64;
    Sp = Sp + 40;
    call f1_sUQ_info(R2, R1) args: 8, res: 0, upd: 8;
}}}

With loopification we get:

{{{
cYM:
    I64[CCCS + 72] = I64[CCCS + 72] + %MO_UU_Conv_W64_W64(6 - 2);
    I64[Hp - 40] = sat_sV8_info;
    I64[Hp - 32] = CCCS;
    I64[Hp - 24] = (%MO_UU_Conv_W32_W64(I32[era]) << 30) | 0;
    P64[Hp - 8] = _sUL::P64;
    P64[Hp] = _sUP::P64;
    _cXT::P64 = Hp - 40;
    _sUP::P64 = _cXT::P64;
    goto cW2;
cW2:
    if (Sp - 104 < SpLim) goto uZq; else goto uZp;
uZq:
    Sp = Sp + 40;
    goto cYq;
cYq:
    R2 = _sUP::P64;
    R1 = _sUO::P64;
    call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
uZp:
    Sp = Sp + 40;
    goto cYr;
}}}

What might be surprising is that the value of `_sUO` is not set before making the tail call, but that *seems* to be OK - it is only shuffled between the stack and a local variable. Note also that the loopified call doesn't jump directly to the second label, `cVZ`, but instead jumps to `cYr`. In principle this is OK (we want to skip the stack check but not the heap check), but TBH I can't tell whether that is correct in this case - I don't know what the magic numbers in `cVZ` do.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:5
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
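
On the two constants in `cVZ`: a quick arithmetic check suggests they are bit masks rather than arbitrary numbers. The sketch below only verifies the bit patterns. The names `createMask`, `useFlag` and `recordUse` are my own labels, and reading the word at `R1 + 15` as a profiling word with a 30-bit "creation era" field and a "used" flag is an assumption that this ticket does not confirm.

```haskell
import Data.Bits (bit, shiftL, (.&.), (.|.))
import Data.Word (Word64)

-- Constants copied verbatim from the cVZ block.
maskFromCmm, flagFromCmm :: Word64
maskFromCmm = 1152921503533105152
flagFromCmm = 1152921504606846976

-- Hypothetical names: bits 30..59 set, and bit 60 set.
createMask, useFlag :: Word64
createMask = (bit 30 - 1) `shiftL` 30
useFlag    = bit 60

-- Model of the store in cVZ: keep bits 30..59 of the old word,
-- put the current era in the low bits, and set bit 60.
recordUse :: Word64 -> Word64 -> Word64
recordUse era w = (w .&. maskFromCmm) .|. era .|. flagFromCmm

main :: IO ()
main = do
  print (maskFromCmm == createMask)  -- True
  print (flagFromCmm == useFlag)     -- True
```

If that reading is right, `cVZ` only updates profiling bookkeeping on entry to the closure, and the `<< 30` in the allocation code of `cYM` would be writing the same 30-bit field.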