[GHC] #15560: Full laziness destroys opportunities for join points

GHC

23 Aug

9:55 p.m.

#15560: Full laziness destroys opportunities for join points
-------------------------------------+-------------------------------------
        Reporter:  AndreasK          |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Compiler (CodeGen)|              Version:  8.4.3
      Resolution:                    |             Keywords:  JoinPoints
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #14287            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):

Generally, I think we should not float join points -- in general floating them will stop them being join points. It'd be easy to change `SetLevels` thus -- but intuition is a poor guide and it'd be really worth trying it and measuring the effect.

However, if the binding can go all the way to the top level, then there seems no downside to floating it: we end up with a smaller (and hence perhaps more inlinable) function; and the jump to the join point is still a nice, efficient jump. What's not to like? (Your Description suggests that you think that doing so is Bad. Why?)

All that said, there is a Huge Mess in `SetLevels` around join points with stuff about the "join ceiling". I want to expunge all that, but have lacked the time. The trouble is that we do want some limited, local floating -- I have a whole WIP tree pending on that, but it's in limbo.

If anyone would like to work on this, I'll commit my WIP to a branch and offer advice/support on taking it forward.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:1
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
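As a rough illustration of the two shapes being compared in this thread, the sketch below writes the same function once with a local, always-tail-called `j` (a join point candidate) and once with `j` floated by hand to the top level. The names `fLocal`, `fFloated` and `jTop` are made up for this example; whether GHC actually treats `j` as a join point, or floats it, depends on the optimisation flags in use.

```haskell
-- Illustrative sketch only: fLocal, fFloated and jTop are hypothetical names.
data T = A | B | C deriving (Eq, Show)

n :: T -> T
n A = B
n B = C
n _ = A

-- `j` is let-bound and only ever tail-called with all of its arguments,
-- so it is a candidate for being compiled as a join point.
fLocal :: Int -> T -> T -> T
fLocal sel x y =
    let j z = n (n (n z))
    in case sel of
        0 -> j x
        _ -> j y

-- The same function with `j` floated to the top level by hand.
-- `j` mentions no local variables, so nothing prevents the float.
jTop :: T -> T
jTop z = n (n (n z))

fFloated :: Int -> T -> T -> T
fFloated sel x y = case sel of
    0 -> jTop x
    _ -> jTop y

main :: IO ()
main = print (and [ fLocal s x y == fFloated s x y
                  | s <- [0, 1], x <- [A, B, C], y <- [A, B, C] ])
```

Both versions compute the same results; the discussion is only about the code the back end generates for each.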


GHC

24 Aug

9:56 a.m.

Comment (by simonpj):

...

> Turning them into a top level binding means the calling convention changes into the same that is used with regular functions. So we have the overhead of register saving, stack checks, layout penalty, the works

Can you be more specific? I think that the jumps to the join points will turn into jumps to the top-level function code: no arity checks, nothing. You could look at the code and see, but I think it'll be no less efficient.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:3


GHC

1:22 p.m.

Comment (by AndreasK):

If j becomes a top level binding we use the general calling convention, which at the assembly level is still a jump, as you said.

However, there are subtle differences between jumping to a top level binding and jumping into a basic block, which can have a major performance impact. Things I can immediately think of are:

 * If we jump to a top level symbol we can't place the jump target immediately after the caller. This means:
   * We can't eliminate one of the jump instructions, so they take up resources for branch prediction and need to be executed by the CPU.
   * The code won't be placed sequentially in memory, leading to worse cache utilization.
 * Top level bindings require an additional info table compared to a regular jump target. This means more code size, which is never a good thing.
 * Being a top level function that uses the stack, `j` now performs a stack check. For very small functions this can be a lot of overhead.

It's quite possible that in the general case more inlining offsets this cost, but in some cases this makes a major difference. For example, the program below has a ~7% speedup when disabling full laziness (780 vs 730ms).
{{{
#!haskell
--Simpler core to read without worker/wrapper
{-# OPTIONS_GHC -fno-full-laziness -fno-worker-wrapper #-}
{-# LANGUAGE MagicHash, BangPatterns #-}

module Main where

import System.Environment
import GHC.Prim

data T = A | B | C

-- If we inline the functions case of known constructors kicks in.
-- Which is good! But means j becomes small enough to be inlined
-- and won't become an join point. So for this example we don't
-- want that.
{-# NOINLINE n #-}
{-# NOINLINE f #-}

n :: T -> T
n A = B
n B = C
n _ = A

toInt :: T -> Int
toInt A = 1
toInt B = 2
toInt C = 3

f :: Int -> T -> T -> T
f sel x y =
    -- function large enough to avoid being simply inlined
    let j z = n . n . n . n . n . n $ z
    in case sel of
        -- j is always tailcalled
        0 -> j x
        _ -> j y

main = do
    print $ sum . map toInt . map (\n -> f n A B) $ [0..50000000]
}}}

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:5


GHC

4:55 p.m.

Comment (by sgraf):

Replying to [comment:6 simonpj]:

...

> Hmm. What about heap checks? If a join point `j` does heap allocations, where do we do the heap-overflow check? Maybe we should absorb the heap allocation into the jump site (as if the code was inlined)? That could avoid doing two heap checks where only one is needed. (Would only work for non-recursive join points.)
>
> Also I'm unclear about how we save live variables around a GC call at the start of a join point. (On function entry we use the function's info table; on a case return point we use its info table.)

I was under the impression that join points never allocate and that they rather re-use the closure of their enclosing scope.

Also, currently full laziness will never float join points (or any other binding) that closes over local variables to top-level, so we can probably disregard heap overflow checks.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:7


GHC

25 Aug

12:55 p.m.

Comment (by AndreasK):

Replying to [comment:6 simonpj]:

...

> Wow, that's an impressively large effect. Thanks -- I had no idea. If you stare at the assembly code, can you guess which of your bullets is causing the effect here? E.g. do we in fact end up eliminating one of the jump instructions?

It's hard to tell, really. I've looked at VTune, and it seems the top level variant has more cache misses and decoder stalls. So it seems to be primarily a case of code size and worse code layout. We also execute about 2.5% more instructions, which will definitely come at a cost, but given the size of the difference I would expect layout and code size to be the larger factors.

...

> Your point about the stack check is a good one.
>
> Info tables. Suppose a function is:
>  * top level
>  * not exported
>  * always called with known, saturated calls
>
> Then it does not need either slow entry code or an info table. So rather than avoid creating such functions (by not floating join points) maybe we should apply the optimisation uniformly to all top-level functions?

I assume you are talking about removing the stack check here with the optimisation? Then we would still get the cache fragmentation/layout penalty. So ideally we would do both.

 * Remove the stack check if it's a floated join point called from many places to avoid code bloat.
 * Push them back into the rhs if there is only a single call site.
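For readers unfamiliar with the three conditions in the quoted text (top level, not exported, only known saturated calls), here is a small sketch of a function that meets all of them. The names `step` and `total` are invented for the example, and nothing here shows what GHC currently emits for such a function; it only shows the source-level shape the quoted proposal is about.

```haskell
module Main (main) where

-- `step` is top level, is not in the export list, and every call site
-- below applies it to both of its arguments (a known, saturated call).
{-# NOINLINE step #-}
step :: Int -> Int -> Int
step acc x = acc + 2 * x

-- Calls `step` directly with two arguments. Writing `foldl step 0`
-- instead would pass `step` around as a value, which would no longer
-- satisfy the "always called with known, saturated calls" condition.
total :: [Int] -> Int
total = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (step acc x) xs

main :: IO ()
main = print (total [1 .. 10])
```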

...

> I suspect that there's a heap check at the beginning of the A branch; and then a second heap check at the start of the body of j. But instead we could make j not do a heap check (ever) and instead put on the caller (of j) the responsibility for making sure that there's enough heap space for j to do its allocation.

Ideally we would combine the checks for all branches into a single one to begin with. But that seems like a different issue to me, as this would be beneficial with or without join points.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:9


GHC

28 Aug

8:18 a.m.

Related Tickets: #14287 #13286

Comment (by simonpj):

...

> Ideally we would combine the checks for all branches into a single one to begin with.

Do you mean that in
{{{
case x of
  True -> e1
  False -> e2
}}}
instead of a heap-check at the start of `e1` and another at the start of `e2`, we could have a single one before the case?

No, we can't do this: evaluating `x` might force a thunk, and hence allocate an arbitrary amount of stuff.

If the `case` is scrutinising an unlifted type (which does not require evaluating) then yes it's different, and indeed in that case we sometimes ''do'' move the heap check up. See the long `Note [Compiling case expressions]` in `StgCmmExpr.hs`.

(Another possibility that looks unattractive, and that I have not explored: put the heap check after returning from evaluating `x` but before doing the case-analysis to decide which branch to take. That might reduce code size, but would never eliminate a heap check altogether; indeed it might put one in the code path that was not there before, for a branch that did not allocate at all.)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:11
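The distinction drawn in comment:11 between lifted and unlifted scrutinees can be seen at the source level in a sketch like the following. The functions `pick` and `classify` are invented for illustration, and the code does not demonstrate where GHC places heap checks; it only shows a `case` whose scrutinee may be an arbitrary thunk next to one whose scrutinee is an unlifted `Int#` comparison.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), (>#))

-- Lifted scrutinee: `b` may arrive as an unevaluated thunk, so forcing it
-- can run arbitrary code (and allocate) before a branch is chosen.
-- Both branches allocate a pair.
pick :: Bool -> Int -> (Int, Int)
pick b k = case b of
    True  -> (k, k + 1)
    False -> (k - 1, k)

-- Unlifted scrutinee: `k# ># 0#` is a primitive comparison on Int#,
-- which needs no evaluation of a thunk.
classify :: Int -> (Int, Int)
classify (I# k#) = case k# ># 0# of
    1# -> (I# k#, 1)
    _  -> (0, I# k#)

main :: IO ()
main = do
    -- The scrutinee here builds and sums a list before `pick` can branch.
    print (pick (sum (map (* 2) [1 .. 1000]) > 1000000) 5)
    print (classify 7)
```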


GHC

9:58 a.m.

Comment (by AndreasK):

Replying to [comment:11 simonpj]:

...

...
> > Ideally we would combine the checks for all branches into a single one to begin with.
>
> (Another possibility that looks unattractive, and that I have not explored: put the heap check after returning from evaluating `x` but before doing the case-analysis to decide which branch to take. That might reduce code size, but would never eliminate a heap check altogether; indeed it might put one in the code path that was not there before, for a branch that did not allocate at all.)

The obvious solution, it seems to me, is to simply limit this to cases where all branches allocate. This would reduce code size while coming with few penalties.

It would probably make sense to special case two other scenarios:

 * If the difference in allocation is huge.
 * If the non allocating branches are bottoming (eg pattern match failures).

But I'm not sure how easy it is to check these conditions during StgToCmm generation. I'm not really familiar with that part of codegen yet.

...

> I wasn't very clear. My idea is to ignore whether a top-level binding started life as a join point, and instead optimise all top-level bindings the same way.

That sounds like something we should do!

Making them join points would still have additional advantages for code layout and possibly register allocation. So a late "float in" pass of sorts after we are done inlining might still make sense. But your suggested changes should still reduce the overhead of these calls quite a bit compared to what we have now.
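The rule proposed above for when to use a single heap check (all branches allocate, with special cases for huge allocation differences and bottoming branches) can be modelled as a small decision function. This is a hypothetical model, not GHC code: `Branch`, `hugeGap` and `shouldShareCheck` do not exist in the compiler, the threshold is arbitrary, and treating a huge allocation gap as a reason not to share the check is only one reading of the comment.

```haskell
-- Hypothetical model only: none of these names exist in GHC.
data Branch = Branch
    { allocWords :: Int   -- words the branch allocates
    , bottoming  :: Bool  -- True if the branch diverges (e.g. match failure)
    }

-- Assumed, arbitrary threshold for "the difference in allocation is huge".
hugeGap :: Int
hugeGap = 64

-- Share one heap check when every relevant branch allocates and the
-- allocation amounts are not too far apart. Non-allocating branches that
-- bottom are ignored, following the second special case.
shouldShareCheck :: [Branch] -> Bool
shouldShareCheck branches =
    not (null relevant) && all ((> 0) . allocWords) relevant && gap <= hugeGap
  where
    relevant = filter (\b -> not (bottoming b && allocWords b == 0)) branches
    sizes    = map allocWords relevant
    gap      = maximum sizes - minimum sizes

main :: IO ()
main = do
    print (shouldShareCheck [Branch 2 False, Branch 3 False])
    print (shouldShareCheck [Branch 2 False, Branch 0 False])
    print (shouldShareCheck [Branch 2 False, Branch 0 True])
    print (shouldShareCheck [Branch 2 False, Branch 500 False])
```

Laziness keeps `gap` from being evaluated on an empty branch list, since `&&` short-circuits on the `null` test.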

...

> These would be interesting ideas to try out. If you feel motivated, I could advise.

I'm interested, as I might need these optimizations for other things I'm working on anyway. I guess a good way to start would be to look into 1) starting at the StgToCmm code?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:13


GHC

10:09 a.m.

Comment (by simonpj):

...

> I guess a good way to start would be to look into 1) starting at the StgToCmm code?

Yes, sounds good to me!

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:15


GHC

29 Sep

4:05 p.m.

Changes (by chessai):

 * owner:  (none) => chessai
 * milestone:  8.6.1 => 8.8.1

Comment:

I am looking into this. Not sure if it's going to be better, but it would at least be good to have benchmarks for the differences over various mock setups and popular libraries.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:17


GHC

22 Oct

1:25 p.m.

Comment (by AndreasK):

@chessai Are you still interested in this?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:19


GHC

5 Nov

2:17 p.m.

Comment (by chessai):

is it ok to make a `joinpoints` directory in testsuite/tests/? if not, where should tests for this go?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:21


GHC

4:05 p.m.

Comment (by chessai):

ah, i didn't look in perf/. thanks

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:23
