[GHC] #10069: CPR related performance issue

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Revisions: | -------------------------------------+------------------------------------- By default CRP analysis can be too aggressive in trying to pass as much as possible in unboxed tuples, in general it's not a problem but when one big datatype is passed to several consumers it might end up pushed to stack several times instead of once - to heap, things are getting worse when there are sufficient fields to cause stack overflow which otherwise is possible to avoid - in our codebase adding one field with ExistentialQuantification (unused, but that prevents ghc from doing CRP transformation) reduces number of stack overflow by a factor of 1000 and increases overall performance by 10%. In provided example performance for both A and B should be identical and yet B is consistently faster by 3-5% It's possible to increase this performance gap by adding more and more fields. I was able to replicate this issue in ghc 7.8.3 and 7.10,1rc2 {{{#!hs {-# LANGUAGE ExistentialQuantification #-} module Blah where import Criterion import Criterion.Main import Data.Typeable data A = A () !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int data B = forall rep. (Typeable rep) => B rep !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int !Int a :: A a = A () 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 b :: B b = B () 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 {-# NOINLINE a1 #-} a1 :: A -> Int a1 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f1 {-# NOINLINE a2 #-} a2 :: A -> Int a2 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f2 {-# NOINLINE a3 #-} a3 :: A -> Int a3 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f3 {-# NOINLINE a4 #-} a4 :: A -> Int a4 (A _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f4 {-# NOINLINE b1 #-} b1 :: B -> Int b1 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f1 {-# NOINLINE b2 #-} b2 :: B -> Int b2 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f2 {-# NOINLINE b3 #-} b3 :: B -> Int b3 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f3 {-# NOINLINE b4 #-} b4 :: B -> Int b4 (B _ f1 f2 f3 f4 f5 f6 f7 f8 g1 g2 g3 g4 g5 g6 g7 g8 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = f4 {-# NOINLINE fa #-} fa :: A -> Int fa a = a1 a + a2 a + a3 a + a4 a {-# NOINLINE fb #-} fb :: B -> Int fb b = b1 b + b2 b + b3 b + b4 b main :: IO () main = defaultMain [ bgroup "single call" [ bench "A" $ whnf fa a , bench "B" $ whnf fb b ] ] }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by akio): * cc: tkn.akio@… (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by maoe): * cc: maoe@… (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): For me `fa` and `fb` generate identical code. Are you sure about those NOINLINE pragmas? Maybe give the command line you use for compiling, and show the output of `-ddump-simpl`? Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): Those NOINLINE pragmas are required to replicate the behavior I'm getting in the production code. Generated core looks the same, generated assembly code is different though. With more fields added (see attachment) performance difference is 4x, goes away as soon as I get rid of forall'ed field. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): It's compiled with ghc -O2 bench.hs -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): Thanks for the `-ddump-simpl` output. I can see what is happening. The trouble is that `a1`, `a2` etc are marked `NOINLINE`, but their strictness information is still visible. That makes `fa` strict too. So the original code looks like this (I have decreased the number of selectors to make it easy to read): {{{ --------- Before strictness analysis ------------- fa a = case a1 a of I# i1 -> case a2 a of I# i2 -> I# (i1 +# i2) --------- After strictness analysis and worker/wrapper ------------- fa a = case a of A _ _ _ _ i1 _ _ _ _ i2 _ _ _ _ -> $wfa i1 i2 $wfa i1 i2 = let a = A bot bot bot bot i1 bot bot bot bot i2 bot bot bot bot in case a1 a of I# i1 -> case a2 a of I# i2 -> I# (i1 +# i2) }}} The worker `$wfa` would normally collapse into nice tight code, when the workers for `a1` and `a2` are inlined. But here they are not! So `$wfa` does reboxing which is terrible. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): There are no NOINLINE in code we are using in production but behavior is similar - works faster with unused field with ExistentialQuantification. So it disables not only CPR but some other optimizations as well. What's the conclusion then? Is that fixable or I need to look for other examples? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): That's odd. I don't understand how that is happening. I need more information. Can you show a `-ddump-simpl` of the bit of your production code that is allocating more? (With and without the change.) If you compile with `-ticky` and then run with `+RTS -rfoo.ticky` the file `foo.ticky` will show which function is allocating how much. If you compare with and without the existential change you should see an obvious culprit. Thanks Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): Found a file, it contains -fno-spec-constr on top, a comment to that key: -- this was added because it makes compilation blow up with ghc-7.8.3, and -- sims were neutral. -- Consider removing after ghc is fixed. It contains relatively simple Parser parser and a bunch of functions modifying a huge data structure - different fields in this structure. 507572 1368414112 0 313 >MSMLSSSSSSLLMMSDDDD XXXXXXX-0.1.0.0:XXXXXXX.$wa{v rbsKm} With removed -fno-spec-constr it takes forever to compile (30 minutes already, that's 700 lines of code). Trying to pick relevant bits of core. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): Leave the `-fno-spec-constr`. Let's change only one thing at a time. Does it happen with `-O` or only with `-O2`? Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): With -O $wa is still there and looks about the same. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): And performance also suffers. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): Like I say in comment:8, I can't help without more info. Can you give a `-ddump-simpl` of code that shows the problem without use of NOINLINE? Maybe your production code. I understand why it happens with NOINLINE (comment:6). But you say you don't use it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): 22k lines, trying to pick relevant bits. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): Simon, can I send you an email? Core is 1.3Mb so I can't attach it here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): $wa takes half of the file. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): send email if you like, yes -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): Did you got my email? Can I get any other extra information for you? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): Yes, got it, but I'm just snowed under. It would be super-helpful to be able to reproduce your problem myself, if you were somehow able to boil your production code down into a test case. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): Emailed 10x smaller version. I'm still trying to untangle some dependencies to provide hs file. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): OK, I looked at the 10x smaller version. Can you attach it to the ticket? (180kb is ok I think.) I could do so, but do not want to upload sensitive info. I have learned some things (see below). But to explore solutions I really do need a reproducible test case. So if you untangle dependencies that would be really helpful. It's hard for me to make more progress without that. E.g. I want to see the code just before strictness analysis. I want to see the code and strictness for `Trader.Order.Logging.$wa`, which is in a different module. The rest of this is notes for me or other hackers, to record what I have learned. I'm certain that difficulty is to do with '''reboxing''' (c.f. #2289). I see a function `processManualOrders` that I believe looks somethign like this (before strictness analysis) {{{ pma _ p q r = let help x v = case select_deep_field_of p of True -> ...(help x v')... False -> log_fn x -- log_fn is lazy in x in help p r }}} So the `help` function is not strict in `x`. But `pma` is strict in `p` because of the `(select_deep_field p)` in `help`. Sadly that means that `p` is passed to `help` in ''boxed'' form (since `help` is lazy), but is passed to `pma` in ''unboxed'' form (since `pma` is strict). The result is that `p` is re-boxed in the worker for `pma`. And `p` is a record with many tens of fields. Various things would help here: * Running the static argument transformation on `help` would solve the problem completely. * It probably does not make sense to do worker/wrapper on records with huge numbers of fields. We end up passing arguments in memory anyway, just on the stack instead of in the heap. And the reboxing penalty is large. * In `Demand.bothUse` I see {{{ bothUse (UProd {}) (UCall {}) = Used -- bothUse (UProd {}) Used = Used -- Note [Used should win] bothUse Used (UProd ux) = UProd (map (`bothMaybeUsed` useTop) ux) bothUse (UProd ux) Used = UProd (map (`bothMaybeUsed` useTop) ux) bothUse Used _ = Used -- Note [Used should win] }}} Note that commented out line. If `Used` really did win, I think that most of this fruitless work would go away. But it's delicate and I'm sure the commented out line was commented out for a reason. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): Haskell code itself that generates slowdown, I've managed to get rid of most of dependencies that not in Haskell Platform. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): The dependencies are still substantial, on `lens` in particular. pacak tells me that he is working on cutting it down further. (Thank you.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by pacak): Simon, did you checked most recent upload? From non-base packages that's only enummapset-th and lifted-base -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by liyang): * cc: ghc.haskell.org@… (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by Fuuzetsu): Data-point from 8.6.2 using original code (first comment): {{{ [nix-shell:/tmp]$ ghc -O2 Blah.hs && ./Blah benchmarking single call/A time 21.83 ns (21.55 ns .. 22.09 ns) 0.999 R² (0.999 R² .. 1.000 R²) mean 21.77 ns (21.62 ns .. 22.07 ns) std dev 705.5 ps (434.8 ps .. 1.223 ns) variance introduced by outliers: 53% (severely inflated) benchmarking single call/B time 9.741 ns (9.581 ns .. 9.937 ns) 0.998 R² (0.996 R² .. 0.999 R²) mean 9.762 ns (9.641 ns .. 9.942 ns) std dev 490.4 ps (347.4 ps .. 649.6 ps) variance introduced by outliers: 74% (severely inflated) }}} I'm attaching -ddump-simpl output (Blah.dump-simpl). It seems the ticket is still a problem and not 3-5% but 100% -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by Fuuzetsu): * Attachment "Blah.dump-simpl" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by Fuuzetsu): I ran with multiple GHC versions. I'm attaching a tarball with the results above and Core from each version. It appears the problem went way briefly in 8.0.2 but came back worse than ever afterwards. For exact package sets, I used these: {{{ resolver: lts-6.35 # 7.10.3 resolver: lts-9.21 # 8.0.2 resolver: lts-11.22 # 8.2.2 resolver: lts-12.20 # 8.4.4 resolver: nightly-2018-11-24 # 8.6.2 }}} GHC version 7.10.3 {{{ benchmarking single call/A time 9.800 ns (9.730 ns .. 9.890 ns) 1.000 R² (0.999 R² .. 1.000 R²) mean 9.907 ns (9.844 ns .. 9.996 ns) std dev 250.2 ps (167.4 ps .. 344.7 ps) variance introduced by outliers: 42% (moderately inflated) benchmarking single call/B time 10.09 ns (10.00 ns .. 10.20 ns) 0.999 R² (0.999 R² .. 0.999 R²) mean 10.37 ns (10.25 ns .. 10.49 ns) std dev 403.4 ps (353.8 ps .. 475.9 ps) variance introduced by outliers: 63% (severely inflated) }}} GHC version 8.0.2 {{{ benchmarking single call/A time 9.794 ns (9.695 ns .. 9.914 ns) 0.999 R² (0.999 R² .. 1.000 R²) mean 9.884 ns (9.805 ns .. 10.01 ns) std dev 308.2 ps (211.6 ps .. 449.5 ps) variance introduced by outliers: 52% (severely inflated) benchmarking single call/B time 9.861 ns (9.784 ns .. 9.946 ns) 1.000 R² (0.999 R² .. 1.000 R²) mean 9.834 ns (9.799 ns .. 9.879 ns) std dev 133.3 ps (92.41 ps .. 201.6 ps) variance introduced by outliers: 17% (moderately inflated) }}} GHC version 8.2.2 {{{ benchmarking single call/A time 22.82 ns (22.59 ns .. 23.16 ns) 0.999 R² (0.997 R² .. 1.000 R²) mean 22.83 ns (22.60 ns .. 23.15 ns) std dev 890.2 ps (655.8 ps .. 1.268 ns) variance introduced by outliers: 62% (severely inflated) benchmarking single call/B time 10.10 ns (10.06 ns .. 10.15 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 10.11 ns (10.08 ns .. 10.15 ns) std dev 120.7 ps (72.17 ps .. 184.5 ps) variance introduced by outliers: 14% (moderately inflated) }}} GHC version 8.4.4 {{{ benchmarking single call/A time 21.51 ns (21.33 ns .. 21.72 ns) 0.999 R² (0.999 R² .. 1.000 R²) mean 21.55 ns (21.35 ns .. 22.21 ns) std dev 1.128 ns (374.2 ps .. 2.297 ns) variance introduced by outliers: 75% (severely inflated) benchmarking single call/B time 9.505 ns (9.449 ns .. 9.568 ns) 1.000 R² (0.999 R² .. 1.000 R²) mean 9.509 ns (9.454 ns .. 9.570 ns) std dev 197.7 ps (159.3 ps .. 243.1 ps) variance introduced by outliers: 33% (moderately inflated) }}} GHC version 8.6.2 {{{ benchmarking single call/A time 21.71 ns (21.44 ns .. 22.06 ns) 0.998 R² (0.998 R² .. 0.999 R²) mean 21.93 ns (21.70 ns .. 22.24 ns) std dev 884.3 ps (721.8 ps .. 1.228 ns) variance introduced by outliers: 64% (severely inflated) benchmarking single call/B time 10.28 ns (10.20 ns .. 10.40 ns) 0.999 R² (0.999 R² .. 1.000 R²) mean 10.31 ns (10.24 ns .. 10.41 ns) std dev 286.9 ps (224.3 ps .. 372.6 ps) variance introduced by outliers: 46% (moderately inflated) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:27 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1-rc2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by Fuuzetsu): * Attachment "results.tar.gz" added. Results from runs of multiple GHC versions. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by pacak): * version: 7.10.1-rc2 => 8.6.2 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:28 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by Fuuzetsu): * cc: Fuuzetsu (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:29 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * cc: sgraf (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:30 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.2 Resolution: | Keywords: CPRAnalysis, | DemandAnalysis Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => CPRAnalysis, DemandAnalysis -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:31 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.2 Resolution: | Keywords: | DemandAnalysis Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * keywords: CPRAnalysis, DemandAnalysis => DemandAnalysis Comment: Looking at https://ghc.haskell.org/trac/ghc/attachment/ticket/10069/Blah .dump-simpl#L1668, I don't think this is related to CPR analysis but to the worker/wrapper transformation having issues with NOINLINE functions. What happens here is that `f1` to `f4` can't be inlined (so we don't see the case on the `A`), but `fa` still gets a strictness signature saying that all but arguments 2 to 5 are dead. WW will now split `fa` into a wrapper function that scrutinises the `A` to just project out the 4 arguments that aren't dead and pass it on to the worker `$wfa` unboxed. So far so good. Now, WW arranges it so that the worker `$wfa` builds up a new `A` with dummy values for absent fields (0# for Int#). Normally, this new `A` binding would cancel out with case matches in `$wfa`, because the strictness signature must ultimately come from some case expression. These however are hidden in `NOINLINE` functions, so no cancelling is happening. As a result, we allocate the dummy `A` for nothing, we could have just passed along the old `A`. Here's an example demonstrating this in the small: {{{#!hs data C = C !Int !Int {-# NOINLINE c1 #-} c1 :: C -> Int c1 (C _ c) = c {-# NOINLINE fc #-} fc :: C -> Int fc c = c1 c + c1 c }}} Relevant Core: {{{#!hs c1_rP = \ (ds_d3af :: C) -> case ds_d3af of { C dt_d3PA dt1_d3PB -> GHC.Types.I# dt1_d3PB } Main.$wfc = \ (ww_s7DJ :: GHC.Prim.Int#) -> case c1_rP (Main.C 0# ww_s7DJ) of { GHC.Types.I# x_a4kS -> GHC.Prim.*# 2# x_a4kS } fc = \ (w_s7DF :: C) -> case w_s7DF of { C ww1_s7DI ww2_s7DJ -> case Main.$wfc ww2_s7DJ of ww3_s7DN { __DEFAULT -> GHC.Types.I# ww3_s7DN } } }}} The problem I see here is that we don't WW `c1`, or that we don't inline the resulting wrapper into `fc` before the hypothetical worker `$wc1` of `c1` gets inlined back into `c1` because it's so small. If we inlined `$wc1` into `$wfc`, the case on `C` would cancel out with the dummy `C` and everything would be well. So: If we WW `fc`, we should also WW `c1`, otherwise we end up with bad code. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:32 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.2 Resolution: | Keywords: | DemandAnalysis Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
So: If we WW fc, we should also WW c1, otherwise we end up with bad code.
Quite right. Why does that not happen? It's because of {{{ tryWW dflags fam_envs is_rec fn_id rhs -- See Note [Worker-wrapper for NOINLINE functions] | Just stable_unf <- certainlyWillInline dflags fn_info = return [ (fn_id `setIdUnfolding` stable_unf, rhs) ] -- See Note [Don't w/w INLINE things] -- See Note [Don't w/w inline small non-loop-breaker things] }}} The Notes are informative. What happens here is that * `certainlyWillInline` returns True for `c1`, despite the NOINLINE pragma * We set a stable unfolding, for reasons described in `Note [Don't w/w inline small non-loop-breaker things]` * But alas the NOINLINE stays there and prevents `c1` inlining. I think we should always create a wrapper for NOINLINE things, just as if they were big. We already have technology for transferring the NOINLINE pragma to the worker; see * `Note [Worker-wrapper for INLINABLE functions]` * `Note [Worker activation]`, which even talks about NOINLINE specifically. So the bottom line is, I think, that the `certainlyWillInline` test should return False for NOINLINE functions. This call in `WorkWrap` is its only call site, so the change should be easy. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:33 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.6.2 Resolution: | Keywords: | DemandAnalysis Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: T10069 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): !401 Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * status: new => patch * testcase: => T10069 * differential: => !401 Comment: I prepared a fix in https://gitlab.haskell.org/ghc/ghc/merge_requests/401. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:34 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.6.2 Resolution: | Keywords: | DemandAnalysis Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: T10069 Blocked By: | Blocking: Related Tickets: #13143 | Differential Rev(s): !401 Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * related: => #13143 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:35 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue -------------------------------------+------------------------------------- Reporter: pacak | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.6.2 Resolution: | Keywords: | DemandAnalysis Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: T10069 Blocked By: | Blocking: Related Tickets: #13143 | Differential Rev(s): !401 Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * milestone: => 8.8.1 Comment: I think this would be nice to have in 8.8. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10069#comment:36 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10069: CPR related performance issue
-------------------------------------+-------------------------------------
Reporter: pacak | Owner: (none)
Type: bug | Status: patch
Priority: normal | Milestone: 8.8.1
Component: Compiler | Version: 8.6.2
Resolution: | Keywords:
| DemandAnalysis
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case: T10069
Blocked By: | Blocking:
Related Tickets: #13143 | Differential Rev(s): !401
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Marge Bot
participants (1)
-
GHC