[GHC] #8275: Loopification breaks profiling

#8275: Loopification breaks profiling ----------------------------------+---------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: Component: Profiling | Version: 7.7 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Building GHC failed Unknown/Multiple | Test Case: Difficulty: Unknown | Blocking: Blocked By: | Related Tickets: | ----------------------------------+---------------------------------------- Profiling is currently broken in HEAD. Setting `BuildFlavour = prof` in `build.mk` leads to a segfault during build. After reverting d61c3ac186c94021c851f7a2a6d20631e35fc1ba (loopification performed in the code generator) everything builds without problems. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Changes (by leroux): * cc: leroux@… (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Changes (by ezyang): * priority: high => highest -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Comment (by ezyang): {{{ Program received signal SIGSEGV, Segmentation fault. 0x000000000509dced in enterFunCCS (reg=0x64540d8, ccsfn=0x644496000) at rts/Profiling.c:367 367 if (ccsfn->prevStack == CCS_MAIN) { (gdb) bt #0 0x000000000509dced in enterFunCCS (reg=0x64540d8, ccsfn=0x644496000) at rts/Profiling.c:367 #1 0x000000000095ad66 in s47v_info () #2 0x0000000000000000 in ?? () }}} where `0x644496000` is nonsense memory. My guess is that the nodeReg isn't being restored properly upon the loop back. Some sample code which the loopification triggers on would be helpful for debugging; you will probably be able to figure it out by eyeballing the C--. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Comment (by jstolarek): Loopification triggers for fully-saturated (but not over-saturated!) tail calls. So in this code: {{{ f 0 = 4 f 1 = 5 f n = case f (n - 2) of 4 -> 4 5 -> f (n - 1) }}} If will trigger for call to `f (n - 1)` in second branch of case, but will not trigger for `f (n - 2)` in case scrutinee. I checked and this code actually causes segfault when compiled with `-prof -fprof-auto -rtsopts` (assuming you add `main = print (f 5)` in the file), but it only happens when `f :: Integer -> Integer` and not when `f :: Int -> Int`, so I suspect that the bug might actually be hidden somewhere in the libraries and I might be looking at wrong code. The idea behind the loopification is that it should put parameters in the local variables (instead of global registers) and make a jump (instead of call). `f` function begins like this in Cmm: {{{ cYp: _sUP::P64 = R2; _sUO::P64 = R1; if (%MO_UU_Conv_W32_W64(I32[era]) <= 0) goto cW1; else goto cVZ; cVZ: I64[R1 + 15] = I64[R1 + 15] & 1152921503533105152 | %MO_UU_Conv_W32_W64(I32[era]) | 1152921504606846976; goto cW1; cW1: if (Sp - 104 < SpLim) goto cYq; else goto cYr; cYr: Hp = Hp + 40; if (Hp > HpLim) goto cYt; else goto cYs }}} Without loopification tail call will be a normal call: {{{ cYM: I64[CCCS + 72] = I64[CCCS + 72] + %MO_UU_Conv_W64_W64(6 - 2); I64[Hp - 40] = sat_sVa_info; I64[Hp - 32] = CCCS; I64[Hp - 24] = (%MO_UU_Conv_W32_W64(I32[era]) << 30) | 0; P64[Hp - 8] = _sUN::P64; P64[Hp] = _sUP::P64; _cXT::P64 = Hp - 40; R2 = _cXT::P64; R1 = _sUO::P64; Sp = Sp + 40; call f1_sUQ_info(R2, R1) args: 8, res: 0, upd: 8; }}} With loopification we get: {{{ cYM: I64[CCCS + 72] = I64[CCCS + 72] + %MO_UU_Conv_W64_W64(6 - 2); I64[Hp - 40] = sat_sV8_info; I64[Hp - 32] = CCCS; I64[Hp - 24] = (%MO_UU_Conv_W32_W64(I32[era]) << 30) | 0; P64[Hp - 8] = _sUL::P64; P64[Hp] = _sUP::P64; _cXT::P64 = Hp - 40; _sUP::P64 = _cXT::P64; goto cW2; cW2: if (Sp - 104 < SpLim) goto uZq; else goto uZp; uZq: Sp = Sp + 40; goto cYq; cYq: R2 = _sUP::P64; R1 = _sUO::P64; call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8; uZp: Sp = Sp + 40; goto cYr; }}} What might be surprising is that value of `_sUO` is not set before making tail call, but that *seems* to be OK - it is only shuffled between the stack and local variable. Note also that loopified call doesn't jump directly to second label `cVZ`, but instead it jumps to `cYr`. In principle this is OK (we want to skip stack check but not heap check), but TBH I can't tell whether in this case that is correct - I don't know what the magical numbers in `cVZ` do. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Comment (by ezyang): The magic numbers are for LDV (lag-drag-void) accounting; they are probably irrelevant. No comment for the rest of the code fragments yet. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Comment (by jstolarek): Another thing I observed is that my example is very sensitive. If you modify it by removing any call to `f` (for example by changing scrutinee to `case n of...` or second branch to `5 -> 5`) or replace second pattern match in case alternatives with `_` (i.e second branch becomes `_ -> f (n - 1)`) the segfault disappears. I haven't yet examined the differences in Cmm between these versions, but if the problem is caused by code generated for `f` this should allow us to track the cause. I don't know yet how to debug this issue if the problem is hidden in the libraries :-/ -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling
----------------------------------------+----------------------------------
Reporter: jstolarek | Owner: jstolarek
Type: bug | Status: new
Priority: highest | Milestone:
Component: Profiling | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Building GHC failed | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: 8298 | Blocked By:
| Related Tickets:
----------------------------------------+----------------------------------
Comment (by Jan Stolarek

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Comment (by jstolarek): Edward, I disabled loopification by default - this should allow you to use profiling. I noticed that enabling loopification for my test program does not trigger segfault. So the problem was in the libraries, not in code snippet that I uploaded here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Comment (by ezyang): OK, makes sense; I couldn't find anything obviously wrong in your example. My suggestion is to refactor it so the library dep goes away, and then we'll know what's up. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling
----------------------------------------+----------------------------------
Reporter: jstolarek | Owner: jstolarek
Type: bug | Status: new
Priority: highest | Milestone:
Component: Profiling | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Building GHC failed | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: 8298 | Blocked By:
| Related Tickets:
----------------------------------------+----------------------------------
Comment (by jstolarek):
Edward, there is some progress. There is a single test in the testsuite
that fails when loopification is turned on, but works if it is off. That
test is T1735. Assuming that you built GHC with profiling libraries you
can run this test from `testsuite/tests` with:
{{{
make WAY=normal EXTRA_HC_OPTS="-prof -fprof-auto -rtsopts" TEST=T1735
}}}
and it should pass. Including `-floopification` on the list of extra
options should make the test fail. Examining the Cmm dumps is a pain -
they are over 200k lines - but I found a single place where loopification
occurs. Attached are the dumps of this single function with and without
loopification. `-input` suffix denotes code produced by the code
generator, `-output` suffix denotes code after it has gone through all
optimisations in Cmm pipeline.
If you look at the output code you will notice one incorrect
transformation in the loopified version: stack check gets duplicated and
inserted at the end of function, right before making a tail call. There
are two issues here:
1) Correctness. Cmm pipeline performs an incorrect transformation on a
valid Cmm input program. It duplicates stack check at the end of a
function, which violates assumption that stack checks are made only at the
entry to a function (see !CmmLayoutStack, line 208). Putting stack check
at the end makes it invalid, because we set required amount of stack to be
a fixed value. As a result at the end of a function we are checking for
more stack that we actually need (becuase we have moved the stack
pointer). Simon proposes to fix that by referring to `

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Comment (by ezyang): If the C-- dump is too large, minimize the test case! Then you can look at each of the phases and see which one is inserting the extra stack check. It sounds like you have almost figured it out though. As for whether or not the LDV code needs to be done again, it depends on whether node (R1) changes when looping. If it does, yes, you need to run the LDV code, since the closure needs to be marked as used. Otherwise it will be idempotent, and you only need to run it once. This also reminds me, please make sure loopification interacts properly with `-falways-yields` (which guarantees that Haskell code will eventually yields); it sounds like either loopification must be disabled if always- yields is enabled, or the loop-back needs to insert a heap check for preemption purposes. And if a loop can be preempted, then the era can change, and yes, you need to run the LDV check after the preemption. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Comment (by jstolarek): Replying to [comment:12 ezyang]:
As for whether or not the LDV code needs to be done again, it depends on whether node (R1) changes when looping. Yes, R1 does change inside the loop.
the loop-back needs to insert a heap check for preemption purposes. And if a loop can be preempted, then the era can change, and yes, you need to run the LDV check after the preemption. I don't fully understand this. What is preemption? If upon entry to the loop I will do the LDV code and heap check, will this ensure correctnes of profiling and `-falways-yields`? Is this described on the wiki somewhere (preemption and LDV)?
-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Changes (by jstolarek): * owner: jstolarek => Comment: OK, I know what preemption is. There are three steps that need to be done: 1) Fix generation of stack checks, i.e. fix correctness (this should be simple, I know how to do this). 2) Make sure that stack check doesn't get duplicated, i.e. fix performance 3) Make sure that loopification jumps before LDV code and hope that this solves the problem. There is one thing that worries me. We want the loop to begin after the stack check but before the heap check. Now if LDV code is placed before the stack check then of course this requirement cannot be fulfilled. I'm not sure how to handle such situation. Anyway, I'm leaving MSR today and my work on this ticket will be put on hold for a couple of weeks. If anyone wants to take over feel free. I'm unassigning myself for the time being. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Comment (by jstolarek): I actually already implemented 1), but the implementation is only a proof- of-concept. It's available [https://github.com/jstolarek/ghc/commit/66d5598bc1163c32267c55670c579d93cce9... here], but it need testing, cleanup (we can just drop sp_high and use final_hwn instead and perhaps entry_args are not used anymore?) and improving `constantFoldMachOpM` in `CmmOpt`, so that `if ((Sp + 56) - 96 < SpLim) goto u4yi; else goto u4yh;` gets constant folded properly. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: high | Milestone: 7.10.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Changes (by ezyang): * priority: highest => high * milestone: => 7.10.1 Old description:
Profiling is currently broken in HEAD. Setting `BuildFlavour = prof` in `build.mk` leads to a segfault during build. After reverting d61c3ac186c94021c851f7a2a6d20631e35fc1ba (loopification performed in the code generator) everything builds without problems.
We want the loop to begin after the stack check but before the heap check. Now if LDV code is placed before the stack check then of course
New description: Profiling is currently broken in HEAD. Setting `BuildFlavour = prof` in `build.mk` leads to a segfault during build. After reverting d61c3ac186c94021c851f7a2a6d20631e35fc1ba (loopification performed in the code generator) everything builds without problems. I am going to lower the priority since loopification is now no longer by default, and it looks like loopification by default will not make it to GHC 7.8. -- Comment: this requirement cannot be fulfilled. One way to deal with this, at the cost of increased code size, is to place a second copy of the LDV code in the function epilogue right before the loopification jump back. However, it might be OK to move the LDV check after the stack/heap checks. Simon Marlow might be able to better say. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: high | Milestone: 7.10.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Changes (by simonmar): * cc: simonmar (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Changes (by jstolarek): * milestone: 7.10.1 => 7.8.1 Comment: I'm setting this to milestone 7.8.1 - we certainly don't want that breakage released. I will be fairly busy for the next 2-3 weeks, but I should have time in November to look into this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling
----------------------------------------+----------------------------------
Reporter: jstolarek | Owner:
Type: bug | Status: new
Priority: high | Milestone: 7.8.1
Component: Profiling | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Building GHC failed | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: 8298 | Blocked By:
| Related Tickets:
----------------------------------------+----------------------------------
Comment (by Jan Stolarek

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Changes (by jstolarek): * owner: => jstolarek Comment: Resuming work on this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Comment (by jstolarek): I believe I know the reason for duplicating the stack check. There is at least one bug in !CmmContFlowOpt module. Consider this: {{{ L1: goto L2 L2: whatever L3: goto L1 }}} We are processing blocks from the end. When we reach L3 first guard in `maybe_concat` function (!CmmContFlowOpt.hs, line 123) will turn that blocks into: {{{ L1: goto L2 L2: whatever L3: goto L2 }}} However, the number of predecessors of L2 block is not updated because `backEdges` is computed once before we run `maybe_concat` and is not updated when we make changes to the block structure. Another issue, which I have not yet encountered but I believe may arise as well, comes from the fact that we may map one label to a different one, but again we don't that take into account when determining the number of predecessors. I wrote a fix for that sometime in July, but I did not merged into HEAD - I guess I'll do that now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: | Related Tickets: ----------------------------------------+---------------------------------- Comment (by simonmar): Yes, good catch. I wonder if the right way to fix these issues is to not shortcut or concat a jump when the jump destination has not already been processed. This might be easier than keeping track of the predecessor counts accurately, and should give results that are just as good. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Comment (by simonpj): It's absolutely right to refer to `(old+0)`. But I don't understand the patch (in comment 19), to `CmmLayoutStack`. The change made there is simply to ''pass a different (larger) value to `manifestSp`''. And that's the only change to that module. So we'll generate different, wrong code. I don't see how this can possibly work. And yet apparently it does. What am I missing? (The change to `StgCmmHeap` is different, and is fine.) Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Comment (by simonpj): About `CmmContFlowOpt`, it looks to me as if there may be other bugs. When deciding in `shouldConcatWith` whether the destination block is small, we look in `blocks`; but those are the original un-rewritten blocks, and we might have already rewritten the destination block. I think it might be better to consider this algorithm as something that rewrites a full graph to a graph (both represented as usual as a finite map). The algorithm takes a post-order DFS list of block-ids (not blocks!) which is used as the list of blocks to consider, one by one. Look up that block id to find the block, consider whether to rewrite it; if so, rewrite it and replace it in the finite map, and move on to the next one. You are right that rewriting changes predecessor counts, so you'd need to maintain that alongside the current graph. An easy approximiation would be this: keep a set of block-ids that are referenced exactly once, and delete things from it when necessary. (But never add anything.) That would be an approximation, but probably a reasonable one. I don't agree with Simon M that "not shortcut or concat backward jumps" will give results that are at least as good. Imagine a backward goto to a goto! Thanks for working on this Jan Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Comment (by jstolarek): Replying to [comment:24 simonpj]:
But I don't understand the patch (in comment 19), to `CmmLayoutStack`. The change made there is simply to ''pass a different (larger) value to `manifestSp`''. And that's the only change to that module. So we'll generate different, wrong code. I don't see how this can possibly work. And yet apparently it does. What am I missing?
I'll try to explain this with an example of function that uses no stack space. Previously we generated stack checks that looked like this: {{{ if (Sp - <highSp> < SpLim) ... }}} If the function used no stack, then stack layout (`areaToSp` function, second guard) transformed this into: {{{ if (Sp - 0 < SpLim) ... }}} This check is always false and it used to be eliminated by the third guard in `areaToSp`. The value of `<highSp>` is computed to be `0` by the line of code that I removed in !CmmLayoutStack. Now consider new version. Code generator gives us: {{{ if ((old + 0) - <highSp> < SpLim) ... }}} As you pointed out, `<highSp>` will now be larger. But `(old + 0)` will not be `Sp`! This check will be turned into something this: {{{ if ((Sp + 8) - 8 < SpLim) ... }}} That is the reason why the elimination of always false stack checks was killed by my change. BTW. Simon, you wrote the code for that patch when I was in Cambridge :-) Do you remember how we (well, mostly you) updated [http://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAreas this wiki page] and how we were confused with addressing offsets into an `Old` area? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Comment (by jstolarek): Simon, does my explanation make sense? I fixed the duplicate stack check, but loopification still causes segfault with T1735. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:27 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Comment (by simonpj): Yes, that makes sense thank you. I was forgetting about the late-bound constant `CmmHighStackMark`, which `manifestSp` manifests along with the areas etc. I've added yet more comments! Disappointing that you still get the seg fault. I hope you make progress in tracking it down. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:28 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Changes (by jstolarek): * owner: jstolarek => Comment: Sadly, it quite possible that I won't have time to work on this for the next week (or even two, if things go bad). I'm unassigning myself from this ticket for the moment. If anyone wishes to take over the next thing to do is ensuring that loopified call jumps to the correct cmm block: after the stack check, but before LDV code and heap check. Janek -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:29 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8298 | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Comment (by simonmar): Replying to [comment:26 jstolarek]:
BTW. Simon, you wrote the code for that patch when I was in Cambridge :-) Do you remember how we (well, mostly you) updated [http://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAreas this wiki page] and how we were confused with addressing offsets into an `Old` area?
I am afraid that wiki page was not updated when I rewrote the stack layout algorithm, and it's woefully out of date. In particular there are no `RegSlots` any more. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:30 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: patch Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Changes (by parcs): * status: new => patch Comment: I have attached a patch that fixes loopification with profiling. The problem is that `nodeReg` aka R1 may get clobbered inside the body of a closure. With profiling, upon entering a closure, `enterFunCCS` is called with R1 passed to it. But since R1 may get clobbered inside the body of a closure, and since a self-recursive tail call does not restore R1, a subsequent call to `enterFunCCS` may be passed a bogus value from R1. The solution is to not pass `nodeReg` aka R1 to `enterFunCCS`. Instead, we pass `node`, the callee-saved temporary that stores the original value of R1. This way R1 may get clobbered and loopification will not care. I don't know whether it's necessary to re-run the LDV code after looping. Not doing so certainly does not cause a segfault, anyway. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:32 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: patch Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Comment (by jstolarek): Thanks Patrick! Currently I'm without my main laptop so I can't verify your patch - I will do it in Wednesday, unless someone else can do it before. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:33 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: patch Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Comment (by parcs): Replying to [comment:33 jstolarek]:
Thanks Patrick! Currently I'm without my main laptop so I can't verify your patch - I will do it in Wednesday, unless someone else can do it before.
Okay, great. And here's an ugly PoC patch that moves the self-loop label below the stack check: {{{ #!diff diff --git a/compiler/codeGen/StgCmmBind.hs b/compiler/codeGen/StgCmmBind.hs index 16477c8..029b8af 100644 --- a/compiler/codeGen/StgCmmBind.hs +++ b/compiler/codeGen/StgCmmBind.hs @@ -481,7 +481,6 @@ closureCodeBody top_lvl bndr cl_info cc args arity body fv_details -- of a self-recursive tail call. See Note -- [Self-recursive tail calls] in StgCmmExpr ; loop_header_id <- newLabelC - ; emitLabel loop_header_id -- Extend reader monad with information that -- self-recursive tail calls can be optimized into local -- jumps diff --git a/compiler/codeGen/StgCmmHeap.hs b/compiler/codeGen/StgCmmHeap.hs index 55ddfd4..f01d042 100644 --- a/compiler/codeGen/StgCmmHeap.hs +++ b/compiler/codeGen/StgCmmHeap.hs @@ -615,6 +615,12 @@ do_checks mb_stk_hwm checkYield mb_alloc_lit do_gc = do Nothing -> return () Just stk_hwm -> tickyStackCheck >> (emit =<< mkCmmIfGoto (sp_oflo stk_hwm) gc_id) + self_loop_info <- getSelfLoop + case self_loop_info of + Just (_,loop_header_id,_) + | checkYield && isJust mb_stk_hwm -> emitLabel loop_header_id + _otherwise -> return () + if (isJust mb_alloc_lit) then do tickyHeapCheck }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:34 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: patch Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Changes (by jstolarek): * owner: => jstolarek Comment: OK, I got my laptop back :-) Patch validates, but before I push it I want to make a few more tests to see that it does the right thing.
And here's an ugly PoC patch that moves the self-loop label below the stack check:
Could you explain how and why this works? I mostly interested in knowing how does it guarantee to place self-loop label after stack check but before heap check. I don't have more time to investigate this at the moment but if the patch is ready and working it would be good to have it pushed. Once I push your patch for loopification I'll close this ticket and start a new one for moving self-loop label after the stack check. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:35 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling
----------------------------------------+----------------------------------
Reporter: jstolarek | Owner: jstolarek
Type: bug | Status: patch
Priority: high | Milestone: 7.8.1
Component: Profiling | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Building GHC failed | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: | Blocked By: 8456
| Related Tickets:
----------------------------------------+----------------------------------
Comment (by Jan Stolarek

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: closed Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Changes (by jstolarek): * status: patch => closed * resolution: => fixed Comment: OK, I see how the loopification patch works. Previously we had: {{{ c4AC: call "ccall" arg hints: [PtrHint, PtrHint] result hints: [] enterFunCCS(BaseReg, I64[R1 - 1 + 8]); }}} Now we have: {{{ c4AC: call "ccall" arg hints: [PtrHint, PtrHint] result hints: [] enterFunCCS(BaseReg, I64[_s2UW::P64 - 1 + 8]); }}} The `_s2UW::P64` is declared anyway so no extra work required :-) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:37 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: closed Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Comment (by hvr): btw, shall we mention the new `-floopification` flag in the 7.8.1 release notes? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:38 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling
----------------------------------------+----------------------------------
Reporter: jstolarek | Owner: jstolarek
Type: bug | Status: closed
Priority: high | Milestone: 7.8.1
Component: Profiling | Version: 7.7
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Building GHC failed | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: | Blocked By: 8456
| Related Tickets:
----------------------------------------+----------------------------------
Comment (by Jan Stolarek

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Changes (by simonmar): * owner: jstolarek => * status: closed => new * resolution: fixed => Comment: The fix looks right to me, thanks. (we should never be referring to global regs explicity except during copy-in or copy-out, so it was just a bug before.)
I don't know whether it's necessary to re-run the LDV code after looping. Not doing so certainly does not cause a segfault, anyway.
Yes, it is necessary to do this. Remember that looping should behave in exactly the same way as making a recursive call to the function. Could someone look into this please? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:40 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling
----------------------------------------+----------------------------------
Reporter: jstolarek | Owner:
Type: bug | Status: new
Priority: high | Milestone: 7.8.1
Component: Profiling | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Building GHC failed | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: | Blocked By: 8456
| Related Tickets:
----------------------------------------+----------------------------------
Comment (by Patrick Palka

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Comment (by parcs): Replying to [comment:40 simonmar]:
The fix looks right to me, thanks. (we should never be referring to global regs explicity except during copy-in or copy-out, so it was just a bug before.)
I don't know whether it's necessary to re-run the LDV code after looping. Not doing so certainly does not cause a segfault, anyway.
Yes, it is necessary to do this. Remember that looping should behave in exactly the same way as making a recursive call to the function. Could someone look into this please?
Okay, I optimistically pushed a pair of commits to make the LDV coderun after looping. It involved eliminating another explicit use of `nodeReg` and then moving the LDV code below the self-loop label. Is this fine? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:42 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8275: Loopification breaks profiling ----------------------------------------+---------------------------------- Reporter: jstolarek | Owner: Type: bug | Status: closed Priority: high | Milestone: 7.8.1 Component: Profiling | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Building GHC failed | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: 8456 | Related Tickets: ----------------------------------------+---------------------------------- Changes (by simonmar): * status: new => closed * resolution: => fixed Comment: Ok, looks good, thanks. I'll re-open if I find any problems. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8275#comment:43 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC