[GHC] #8456: Control flow optimisations duplicate blocks

#8456: Control flow optimisations duplicate blocks ------------------------------+-------------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 7.7 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime performance bug Unknown/Multiple | Test Case: Difficulty: Unknown | Blocking: 8275 Blocked By: | Related Tickets: | ------------------------------+-------------------------------------------- During work on #8275 I observed that control flow optimisation pass in the Cmm pipeline duplicates block that does the stack check, which is completely redundant. There is at least one bug in !CmmContFlowOpt module. Consider this: {{{ L1: goto L2 L2: whatever L3: goto L1 }}} We are processing blocks from the end. When we reach L3 first guard in `maybe_concat` function (!CmmContFlowOpt.hs, line 123) will turn that blocks into: {{{ L1: goto L2 L2: whatever L3: goto L2 }}} However, the number of predecessors of L2 block is not updated because `backEdges` is computed once before we run `maybe_concat` and is not updated when we make changes to the block structure. Another issue, which I have not yet encountered in practice but I believe may arise as well, comes from the fact that we may map one label to a different one, but again we don't that take into account when determining the number of predecessors. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by jstolarek): Simon Marlow says: I wonder if the right way to fix these issues is to not shortcut or concat a jump when the jump destination has not already been processed. This might be easier than keeping track of the predecessor counts accurately, and should give results that are just as good. Let me see if I understand this correctly. Consider again the same code as in the ticket report: {{{ L1: goto L2 L2: whatever L3: goto L1 }}} According to your proposal L3 will not process its destination - L1 - because L1 itself has not been processed (we process blocks from the back). L1 however will concatenate with L2, causing L2 to be unreachable and removed from the graph later on: {{{ L1: whatever L2: whatever -- UNREACHABLE, eliminated later L3: goto L1 }}} Now consider this: {{{ L1: goto L2 L2: whatever L3: goto L1 L4: goto L2 }}} L3 will not shortcut its destination, because it has not been processed yet. L1 will not concatenate L2, because L2 has two predecessors. So this graph will not be modified in any way. However, using current algorithm (and providing that we correctly keep track of predecessors) we will optimise that graph to: {{{ L1: goto L2 -- UNREACHABLE, eliminated later L2: whatever L3: goto L2 L4: goto L2 }}} Do I got this right? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by jstolarek): * cc: simonmar (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonpj): About `CmmContFlowOpt`, it looks to me as if there may be other bugs. When deciding in `shouldConcatWith` whether the destination block is small, we look in `blocks`; but those are the original un-rewritten blocks, and we might have already rewritten the destination block. I think it might be better to consider this algorithm as something that rewrites a full graph to a graph (both represented as usual as a finite map). The algorithm takes a post-order DFS list of block-ids (not blocks!) which is used as the list of blocks to consider, one by one. Look up that block id to find the block, consider whether to rewrite it; if so, rewrite it and replace it in the finite map, and move on to the next one. You are right that rewriting changes predecessor counts, so you'd need to maintain that alongside the current graph. An easy approximiation would be this: keep a set of block-ids that are referenced exactly once, and delete things from it when necessary. (But never add anything.) That would be an approximation, but probably a reasonable one. I don't agree with Simon M that "not shortcut or concat backward jumps" will give results that are at least as good. Imagine a backward goto to a goto! Thanks for working on this Jan Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonmar): Simon - the set of blocks is updated as we go along (there's a foldr at the top). So it does what you're suggesting. For a backward goto to a goto, I was imagining that the forward goto would be concatenated with its destination. But it might not be, if the destination has multiple predecessors, so I withdraw my suggestion. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonpj): Ah yes, you're right. In this line 135 {{{ , Just blk' <- mapLookup b' blocks }}} I thought `blocks` was the original blocks. But it's the working set of blocks. (Shadowing.) I'll rename it Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by jstolarek):
I'll rename it Please, don't - I have a major patch coming for this module. I'll write more in a moment, just let me finish my lunch :)
-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by jstolarek): I have rewritten `maybe_concat` function and some subsequent functions - code is [https://github.com/jstolarek/ghc/blob/95938fc84b4310cb04383594db35ce7ae15820... here on github]. In the modified algorithm I maintain an up-to-date map of predecessors. I don't like the idea of having an approximation - up till now we had an approximation (i.e. changing number of predecessors usually doesn't matter) and now we're fixing it because it does the wrong thing. Having an approximation just feels dangerous. My patch validates, except for the fact that it slightly increases allocation and trips two performance tests. I think that's to be expected, given that we are passing around one more data structure and updating it. I'll be updating the comments for the next hour or so, so if there's something wrong or unclear about the patch please tell me. As a side note I see some recurring problems in the design of Cmm pipeline: * here we have a problem, because we create a list of predecessors, which we use to perform graph transformations but at the same time we invalidate that list * in #8205 we computed a list of proc-points and passed it to stack layout, which needed it for graph transformations but at the same time invalidated that list * !CmmContFlowOpt creates unreachable blocks and relies on common block elimination to remove them * stack layout creates unreachable blocks and relies on sinking pass to remove them The last two have not manifested themselves as bugs, but fall into the category of introducing inconsistencies in our data and assuming that someone will later clean up those inconsistencies. I will not be surprised if more such problems are lurking in the code. At the moment I don't know how to do better, but certainly the current design feels tightly coupled and that is not a good thing. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks
--------------------------------------------+------------------------------
Reporter: jstolarek | Owner: jstolarek
Type: bug | Status: new
Priority: high | Milestone:
Component: Compiler | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime performance bug | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: 8275 | Blocked By:
| Related Tickets:
--------------------------------------------+------------------------------
Comment (by Jan Stolarek

#8456: Control flow optimisations duplicate blocks
--------------------------------------------+------------------------------
Reporter: jstolarek | Owner: jstolarek
Type: bug | Status: new
Priority: high | Milestone:
Component: Compiler | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime performance bug | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: 8275 | Blocked By:
| Related Tickets:
--------------------------------------------+------------------------------
Comment (by Jan Stolarek

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: closed Priority: high | Milestone: Component: Compiler | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by jstolarek): * status: new => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by simonmar): * owner: jstolarek => * priority: high => highest * status: closed => new * resolution: fixed => Comment: I looked at this patch, I think we need to revisit it. {{{ | CmmBranch b' <- last , Just blk' <- mapLookup b' blocks , hasOnePredecessor b' , hasNotBeenMappedTo b' shortcut_map }}} In particular, the `hasNotBeenMappedTo` here is very suspicious: {{{ hasNotBeenMappedTo :: BlockId -> BlockEnv BlockId -> Bool hasNotBeenMappedTo b successor_map = mapMember b successor_map }}} It looks like the condition is reversed. Indeed, fixing this makes the performance problem go away, probably because this condition is preventing any block concatenation from happening at all, which means later phases are seeing a lot more code than they would if concatenation was happening. What's more, even when it is reversed, I don't think this condition does what you want, because what you actually wanted to do was look in the range of the map, not the domain. And we don't want to do that, because searching the range of a map is O(n) operation, leaving the whole pass O(n^2). I suggest just deleting this condition. If the block is in the `shortcut_map`, then it will also have another predecessor (the call itself), and we won't attempt to concatenate with it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: Type: bug | Status: new Priority: highest | Milestone: 7.8.1 Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by simonmar): * milestone: => 7.8.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: 7.8.1 Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by jstolarek): * owner: => jstolarek Comment:
It looks like the condition is reversed Yes, you're right.
what you actually wanted to do was look in the range of the map, not the domain. Argh... I don't know why I assumed that it looks in the range :-/
searching the range of a map is O(n) operation According to documentation of !IntMap `lookup` has O(min(n,W)) complexity:
Many operations have a worst-case complexity of O(min(n,W)). This means that the operation can become linear in the number of elements with a maximum of W -- the number of bits in an Int (32 or 64). I'm not sure if I understand correctly, but this means that for n < W `lookup` is linear, right?
I suggest just deleting this condition. If the block is in the shortcut_map, then it will also have another predecessor (the call itself), and we won't attempt to concatenate with it. Not any more - see third guard and `update_cont` function. I wanted to avoid this kind of hidden invariants, because they often cause us trouble.
My idea here would be to replace {{{ , hasNotBeenMappedTo b' shortcut_map }}} with {{{ , Nothing <- mapLookup b' shortcut_map }}} Would that be acceptable? I think that for the most time we keep a small number of entries in `shortcut_map`, so `lookup` will be linear, but then again if these numbers are small it shouldn't be that much of a problem? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: 7.8.1 Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonmar): `lookup` looks up in the domain, not the range. You would need to do something like `elem b' (mapElems shortcut_map)`. This is O(n). But I still think you don't need this test. You're keeping `backEdges` up to date, so there's no reason to believe that you have something "hidden" in the `shortcut_map` that would represent more predecessors, right? The `shortcut_map` is for replacing labels inside expressions after the pass is complete. It used to update calls, but as you point out, the pass now does this as it goes along. There are no extra invariants. The `shortcut_map` doesn't represent any extra edges in the graph. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: new Priority: highest | Milestone: 7.8.1 Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by jstolarek): Ah, now I see. Yes, I think you're right - the test for mapping can be removed. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks
--------------------------------------------+------------------------------
Reporter: jstolarek | Owner: jstolarek
Type: bug | Status: new
Priority: highest | Milestone: 7.8.1
Component: Compiler | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime performance bug | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: 8275 | Blocked By:
| Related Tickets:
--------------------------------------------+------------------------------
Comment (by Jan Stolarek

#8456: Control flow optimisations duplicate blocks
--------------------------------------------+------------------------------
Reporter: jstolarek | Owner: jstolarek
Type: bug | Status: new
Priority: highest | Milestone: 7.8.1
Component: Compiler | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime performance bug | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: 8275 | Blocked By:
| Related Tickets:
--------------------------------------------+------------------------------
Comment (by Jan Stolarek

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: closed Priority: highest | Milestone: 7.8.1 Component: Compiler | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by jstolarek): * status: new => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8456: Control flow optimisations duplicate blocks --------------------------------------------+------------------------------ Reporter: jstolarek | Owner: jstolarek Type: bug | Status: closed Priority: highest | Milestone: 7.8.1 Component: Compiler | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: 8275 | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonmar): Thanks! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8456#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC