[GHC] #15560: Full laziness destroys opportunities for join points

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Keywords: JoinPoints | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: #14287 Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Even if we already know a binding is a join point we STILL float it to the top and turn it into a function. The simple example below results in a join point after the first simplifier run. Then we run the float out pass immediately undoing this by making it a top level binding. It then stays at the top till we are done resulting in the core I've put in the comments. {{{ #!haskell data T = A | B | C | D | E | F | G {-# NOINLINE n #-} n :: T -> T n A = B n B = C n _ = A f :: Int -> T -> T -> T f sel x y = -- function large enough to avoid being simply inlined let j z = n . n . n . n . n . n $ z in case sel of -- j is always tailcalled 0 -> j x _ -> j y -- j is floated to top level instead of ending up as joinpoint. -- T.f_j -- = \ (eta_B1 [OS=OneShot] :: T) -> n (n (n (n (n (n eta_B1))))) -- -- RHS size: {terms: 14, types: 6, coercions: 0, joins: 0/0} -- f :: Int -> T -> T -> T -- f = \ (sel_aYP :: Int) (x_aYQ :: T) (y_aYR :: T) -> -- case sel_aYP of { GHC.Types.I# ds_d2fL -> -- case ds_d2fL of { -- __DEFAULT -> T.f_j y_aYR; -- 0# -> T.f_j x_aYQ -- } -- } }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Generally, I think we should not float join points -- in general floating them will stop them being join points. It'd be easy to change `SetLevels` thus -- but intuition is a poor guide and it'd be really worth trying it and measuring the effect. However, if the binding can go all the way to the top level, then there seems no downside to floating it: we end up with a smaller (and hence perhaps more inlinable) function; and the jump to the join point is still a nice, efficient jump. What's not to like? (Your Description suggests that you think that doing so is Bad. Why?) All that said, there is a Huge Mess in `SetLevels` around join points with stuff about the "join ceiling". I want to expunge all that, but have lacked the time. The trouble is that we do want some limited, local floating -- I have a whole WIP tree pending on that, but it's in limbo. If anyone would like to work on this, I'll commit my WIP to a branch and offer advice/support on taking it forward. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

However, if the binding can go all the way to the top level, then there seems no downside to floating it: we end up with a smaller (and hence
#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): perhaps more inlinable) function; and the jump to the join point is still a nice, efficient jump. What's not to like? There are no top level join points as far as I know. (I might be mistaken though?) Turning them into a top level binding means the calling convention changes into the same that is used with regular functions. So we have the overhead of register saving, stack checks, layout penalty, the works. Limiting them to the top of the RHS instead seems like an advantage. Maybe that should already happen and I just hit a bug there? I haven't looked too far into the SetLevel machinery. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Turning them into a top level binding means the calling convention changes into the same that is used with regular functions. So we have the overhead of register saving, stack checks, layout penalty, the works
Can you be more specific. I think that the jumps to the join points will turn into jumps to the top-level function code, no arity checks nothing. You could look at the code and see, but I think it'll be no less efficient. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * cc: sgraf (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): If j becomes a top level binding we use the general calling convention. Which at the assembly level is still a jump as you said. However there are a subtle differences between jumping to top level bindings versus jumping into a basic block which can have a major performance impact. Things I can immediatly think of are: * If we jump a top level symbol we can't place the jump target immediately after the caller. This means we: * Can't eliminate one of the jump instructions, so they take up resource for branch prediction and need to be executed by the CPU. * The code won't be placed sequentially in memory leading to worse cache utilization. * Top level bindings require an additional info table compared to a regular jump target. This means more code size which is never a good thing. * Being a top level function that uses the stack `j` now performs a stack check. For very small functions this can be a lot of overhead. It's quite possible that in the general case more inlining is offsetting this cost, but in some cases this makes a major difference. For example the program below has ~7% speedup when disabling full laziness(780 vs 730ms). {{{ #!haskell --Simpler core to read without worker/wrapper {-# OPTIONS_GHC -fno-full-laziness -fno-worker-wrapper #-} {-# LANGUAGE MagicHash, BangPatterns #-} module Main where import System.Environment import GHC.Prim data T = A | B | C -- If we inline the functions case of known constructors kicks in. -- Which is good! But means j becomes small enough to be inlined -- and won't become an join point. So for this example we don't -- want that. {-# NOINLINE n #-} {-# NOINLINE f #-} n :: T -> T n A = B n B = C n _ = A toInt :: T -> Int toInt A = 1 toInt B = 2 toInt C = 3 f :: Int -> T -> T -> T f sel x y = -- function large enough to avoid being simply inlined let j z = n . n . n . n . n . n $ z in case sel of -- j is always tailcalled 0 -> j x _ -> j y main = do print $ sum . map toInt . map (\n -> f n A B) $ [0..50000000] }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Wow, that's an impressively large effect. Thanks -- I had no idea. If you stare at the assembly code, can you guess which of your bullets is causing the effect here? E.g. do we in fact end up eliminating one of the jump instruction? Your point about the stack check is a good one. Info tables. Suppose a function is: * top level * not exported * always called with know, saturated calls Then it does not need either slow entry code or an info table. So rather than avoid creating such functions (by not floating join points) maybe we should apply the optimisation uniformly to all top-level functions? Hmm. What about heap checks? If a join point `j` does heap allocations, where do we do the heap-overflow check? Maybe we should absorb the heap allocation into the jump site (as if the code was inlined)? That could avoid doing two heap checks where only one is needed. (Would only work for non-recursive join points.) Also I'm unclear about how we save live variables around a GC call at the start of a join point. (On function entry we use the function's info table; on a case return point we use its info table.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Hmm. What about heap checks? If a join point `j` does heap allocations, where do we do the heap-overflow check? Maybe we should absorb the heap allocation into the jump site (as if the code was inlined)? That could avoid doing two heap checks where only one is needed. (Would only work for non-recursive join points.)
Also I'm unclear about how we save live variables around a GC call at
#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by sgraf): Replying to [comment:6 simonpj]: the start of a join point. (On function entry we use the function's info table; on a case return point we use its info table.) I was under the impression that join points never allocate and that they rather re-use the closure of their enclosing scope. Also, currently full laziness will never float join points (or any other binding) that closes over local variables to top-level, so we can probably disregard heap overflow checks. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
I was under the impression that join points never allocate and that they rather re-use the closure of their enclosing scope
Correct. But consider {{{ f x = let j y = Just (y,y) in case x of A -> j (True, False) B -> j (False, True) C -> j (True, True) }}} Suppose branch `A` is taken. Then GHC must allocate the object `(True, False)` and jump to `j`. Then `j` must allocate `(y,y)` and return it. I suspect that there's a heap check at the beginning of the `A` branch; and then a second heap check at the start of the body of `j`. But instead we could make `j` not do a heap check (ever) and instead put on the caller (of `j`) the responsibility for making sure that there's enough heap space for `j` to do its allocation. That way we'd get one heap check, at the beginning of the `A` branch, which would ensure there was enough space for both pairs. (As you say, no closure is allocated for `j` itself, but that's an entirely different matter.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): Replying to [comment:6 simonpj]:
Wow, that's an impressively large effect. Thanks -- I had no idea. If you stare at the assembly code, can you guess which of your bullets is causing the effect here? E.g. do we in fact end up eliminating one of the jump instruction?
It's hard to tell really. I've looked at vtune and it seems the top level variant has more cache misses and decoder stalls. So it seems to be primarily a case of code size and worse code layout. While we also execute about 2,5% more instructions which definitely will come at a cost given the difference I would expect layout/code size to be larger factors.
Your point about the stack check is a good one.
Info tables. Suppose a function is: * top level * not exported * always called with know, saturated calls
Then it does not need either slow entry code or an info table. So
rather than avoid creating such functions (by not floating join points) maybe we should apply the optimisation uniformly to all top-level functions? I assume you are talking about removing the stack check here with the optimisation? Then we would still get the cache fragmentation/layout penalty. So ideally we would do both. * Remove the stack check if it's a floated join point called from many places to avoid code bloat. * Push them back into the rhs if there is only a single call site.
I suspect that there's a heap check at the beginning of the A branch; and then a second heap check at the start of the body of j. But instead we could make j not do a heap check (ever) and instead put on the caller (of j) the responsibility for making sure that there's enough heap space for j to do its allocation.
Ideally we would combine the checks for all branches into a single one to begin with. But that seems like a different issue to me as this would be beneficial with or without join points. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by AndreasK): * related: #14287 => #14287 #13286 Comment: It seems this was changed deliberately in #13286. The ticket does mention examples where join points become top level functions as having improved but doesn't contain any performance metrics to judge the impact. I can see how it might be beneficial for exported functions, but I'm not yet convinced that this is better in general. I've also looked at the core output of the two testsuite files and at least at O2 there seems to be the same amount of floating happening with 8.0.2 and 8.4. I also don't think we will get much benefit out of the original example. Looking at the floated code: {{{ $wg_s6x5 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int# -> Int# $wg_s6x5 (ww5_s6wV :: Int#) (ww6_s6wZ :: Int#) (ww7_s6x3 :: Int#) = case remInt# ww6_s6wZ 2# of { __DEFAULT -> case ww6_s6wZ of wild5_Xen { __DEFAULT -> jump $wg_s6x5 (*# ww5_s6wV ww5_s6wV) (quotInt# (-# wild5_Xen 1#) 2#) (*# ww5_s6wV ww7_s6x3); 1# -> *# ww5_s6wV ww7_s6x3 }; 0# -> $wg_s6x5 (*# ww5_s6wV ww5_s6wV) (quotInt# ww6_s6wZ 2#) ww7_s6x3 $wf_s6xg [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int# $wf_s6xg (ww3_X6Hi :: Int#) (ww4_s6xe :: Int#) = case remInt# ww4_s6xe 2# of { __DEFAULT -> case ww4_s6xe of wild3_Xe3 { __DEFAULT -> $wg_s6x5 (*# ww3_X6Hi ww3_X6Hi) (quotInt# (-# wild3_Xe3 1#) 2#) ww3_X6Hi; 1# -> ww3_X6Hi }; 0# -> $wf_s6xg (*# ww3_X6Hi ww3_X6Hi) (quotInt# ww4_s6xe 2#) GHC.Real.$w$s^1 [InlPrag=[0]] :: Int -> Int# -> Int# GHC.Real.$w$s^1 = \ (w_s6xh :: Int) (ww_s6xl :: Int#) -> case tagToEnum# @ Bool (<# ww_s6xl 0#) of { False -> case ww_s6xl of wild1_XdK { __DEFAULT -> case w_s6xh of { I# ww2_s6xa -> $wf_s6xg ww2_s6xa wild1_XdK }; 0# -> 1# }; True -> case GHC.Real.^2 of wild1_00 { } } }}} * For exponent < 0 we throw an exception so we can probably ignore that case when it comes to performance. * For exponent == 0 there is an advantage IF `GHC.Real.$w$s^1` get's inlined. But an exponent of zero seems like an unlikely case to me. * For exponents > 0 it depends heavily on how much get's inlined. * If all get inlined (essentially undoing the floating) we save nothing as the unfloated variant would have been inlined as well. * If we inline `GHC.Real.$w$s^1` and `$wf_s6xg` we save at best one call overhead if we don't jump into `wg_s6x5`, otherwise we save nothing. * If nothing get's inlined we are worse off as we now have at least as much overhead if not more should we jump into the floated functions. On the bright side the floated bindings work only on unboxed int's so might not cause an additional stack check. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Ideally we would combine the checks for all branches into a single one to begin with.
Do you mean that in {{{ case x of True -> e1 False -> e2 }}} instead of a heap-check at the start of `e1` and another at the start of `e2`, we could have a single one before the case? No, we can't do this: evaluating `x` might force a thunk, and hence allocate an arbitrary amount of stuff. If the `case` is strutinising an unlifted type (which does not require evaluating) then yes it's different, and indeed in that case we sometimes ''do'' move the heap check up. See the long `Note [Compiling case expressions]` in `StgCmmExpr.hs`. (Another possibility that looks unattractive, and that I have not explored: put the heap check after returning from evaluating `x` but before doing the case-analysis to decide which branch to take. That might reduce code size, but would never eliminate a heap check altogether; indeed it might put one in the code path that was not there before, for a branch that did not allocate at all.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
I assume you are talking about removing the stack check here with the optimisation?
I wasn't very clear. My idea is to ''ignore'' whether a top-level binding started life as a join point, and instead optimise ''all'' top-level bindings the same way. There are three things in play: 1. '''Eliminating the closure and info-table for some top-level functions'''. Consider a top-level binding {{{ f = \xy. e }}} When can we do without a closure and info table for `f`? Answer: if * All calls to `f` are known, saturated calls (no partial applications). * `f` is not exported NB 1: it's fine if the call to `f` is in a lazy context, e.g. {{{ g y = let t = f (y+1) in Just t }}} NB 2: for a ''nested'', let-bound function, we do need a function closure, even if all the calls are known, saturated calls. Why? Because we need a heap-allocated container for its free variables. So we need the info table. But if all the calls are known, saturated, we do ''not'' need any slow-entry code, so the code pointer for the info table could be "crash" (will never happen). 2. '''Eliminating stack checks'''. Join points already don't have a stack check; regular let-bindings do (of course). Idea: extend to the stack- check elimination to more functions; that is, absorb any stack use for that function into its call site. When can we do this? Answer: * All calls to `f` are known, saturated calls (no partial applications). * `f` is not exported * `f` is not recursive; or `f` is top-level and tail-calls itself (or is part of a top-level tail-recursive nest). NB 1: this works for ''non-top-level'' functions too. Example: {{{ g x = let f y = blah t = f 4 + f 5 in Just t }}} Currently the thunk for `t` will do a stack check, and `f` will do its own stack check too. We could absorb `f`'s check into `t`. NB 2: why not recursive functions? Consider {{{ let f x = if x==0 then 0 else 1 + f (x-1) }}} Clearly the stack grows, so we can't eliminate the stack check. But if `f` is a join point there is no problem, and similarly at top level (i.e. not a join point) if it tail-calls itself or some other top-level function with the same property.... let's call these "top-level join points". 3. '''Eliminating heap checks'''. Idea: absorb the heap check for a function into its caller. When can we do this? * All calls to `f` are known, saturated calls (no partial applications). * `f` is not exported * `f` is not recursive. Reason: if it's recursive, one of its call sites is in its own RHS, so we can't statically bound how much heap it uses. NB 1: absorbing the heap check is ''not'' currently done even for join points. (Reason: it only works for ''non-recursive'' functions.) NB: None of these optimisations rely on `f` being a join point. All of this is STG-level/code-gen level optimisation, not Core. These would be interesting ideas to try out. If you feel motivated, I could advise. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): Replying to [comment:11 simonpj]:
Ideally we would combine the checks for all branches into a single one to begin with.
(Another possibility that looks unattractive, and that I have not explored: put the heap check after returning from evaluating `x` but before doing the case-analysis to decide which branch to take. That might reduce code size, but would never eliminate a heap check altogether; indeed it might put one in the code path that was not there before, for a branch that did not allocate at all.)
I wasn't very clear. My idea is to ignore whether a top-level binding started life as a join point, and instead optimise all top-level bindings
The obvious solution seems to me is to simply limit this to cases where all branches allocate. This would reduce code size while coming with few penalties. It would probably make sense to special case two other scenarios: * If the difference in allocation is huge. * If the non allocating branches are bottoming (eg pattern match failures). But I'm not sure how easy it ease to check these conditions during StgToCmm generation. I'm not really familiar with that part of codegen yet. the same way. That sounds like something we should do! Making them join points would still have additional advantages for code layout and possible register allocation. So a late "float in" pass of sorts after we are done inlining might still make sense. But your suggest changes should still reduce overhead of these calls quite a bit compared to what we have now.
These would be interesting ideas to try out. If you feel motivated, I could advise.
I'm interested as I might need these optimizations for other things I'm working on anyway. I guess a good way to start would be to look into 1) starting at the StgToCmm code? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
(comment:11) Another possibility that looks unattractive, ... The obvious solution seems to me is to simply limit this to cases where all branches allocate
Perhaps, but I think the other 1,2,3 possibilities look as if they'll have more payoff per hour invested. Once they are done, maybe come back to this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
I guess a good way to start would be to look into 1) starting at the StgToCmm code?
Yes, sounds good to me! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): Replying to [comment:14 simonpj]:
(comment:11) Another possibility that looks unattractive, ... The obvious solution seems to me is to simply limit this to cases where all branches allocate
Perhaps, but I think the other 1,2,3 possibilities look as if they'll have more payoff per hour invested. Once they are done, maybe come back to this.
Hard to judge given that the heap check issue also arises in code not using join points at all. But it certainly seems like an orthogonal issue. I will open a seperate ticket for this so we don't get too much noise on this ticket. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: chessai Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by chessai): * owner: (none) => chessai * milestone: 8.6.1 => 8.8.1 Comment: I am looking into this. Not sure if it's going to be better, but it would at least be good to have benchmarks for the differences over various mock setups and popular libraries. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: chessai Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
I am looking into this.
Great -- thank you! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: chessai Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): @chessai Are you still interested in this? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: chessai Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by chessai): Replying to [comment:19 AndreasK]:
@chessai Are you still interested in this?
Yes - I am working on https://ghc.haskell.org/trac/ghc/ticket/13104 as a sort of preparation, it's a simpler issue related to join points, in that it's less time consuming and has a straightforward solution, as far as i can tell. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: chessai Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by chessai): is it ok to make a `joinpoints` directory in testsuite/tests/? if not, where should tests for this go? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: chessai Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): There is already `testsuite/tests/perf/join_points` which you can use. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15560: Full laziness destroys opportunities for join points -------------------------------------+------------------------------------- Reporter: AndreasK | Owner: chessai Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 (CodeGen) | Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14287 #13286 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by chessai): ah, i didn't look in perf/. thanks -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC