[GHC] #10788: performance regression involving minimum (and maybe Vector)

#10788: performance regression involving minimum (and maybe Vector) -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Revisions: | -------------------------------------+------------------------------------- This program (taken from http://stackoverflow.com/questions/32158319 /difference-in-performance-for-coin-change-between-haskell-and-c) runs about 50% slower when compiled with `ghc-7.10.1 -O` compared to `ghc-7.8.4 -O`. {{{ import Data.Vector.Unboxed as U ((!), constructN, length) coinchangev :: Int -> [Int] -> Int coinchangev n cs = v ! n where v = constructN (n+1) f f w = case U.length w of 0 -> 0 m -> 1 + minimum [w ! x | x <- map (m-) cs, x >= 0] main = print $ coinchangev 10000000 [1, 5, 10, 25, 100] }}} However if I change `minimum` to `sum`, while the runtime in 7.8.4 is unchanged, the runtime in 7.10.1 drops by a factor of 5! Allocations also decrease by a large factor. So my guess is that something has gone wrong with call arity analysis for `minimum` (and gone very right for `sum`). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10788: performance regression involving minimum (and maybe Vector) -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata): In 7.8, both `minimum` and `sum` were non-fusing left-folds. In 7.10, `sum`, via `foldl`, is fusing, so this is the 5× improvement you observe. `minimum` is not easily foldable: It is a `foldl1`, which treats the first `(:)` different from the rest, and it is not clear how to fix that. So performance difference between `minimum` and `sum` in 7.10 can be explained. What needs to be investigated is why 7.10 degraded by 50% over 7.8. I do not expect Call Arity/foldl fusion to play a role here, but I might be wrong. BTW: One could reasonably expect the compiler to transform `minimum (x:xs)` into `foldl' min x xs`, which could then maybe fuse, but that does not seem to be the case. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10788: performance regression involving minimum (and maybe Vector)
-------------------------------------+-------------------------------------
Reporter: rwbarton | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.10.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Revisions:
-------------------------------------+-------------------------------------
Comment (by nomeata):
GHC 7.8 will actually inline `minimum` (and hence `foldl1` and `foldl`),
allowing the compiler to specialize it for the type at hand:
{{{#!hs
Rec {
$wlgo_r6X4 :: GHC.Prim.Int# -> [GHC.Types.Int] -> GHC.Prim.Int#
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType ]
$wlgo_r6X4 =
\ (ww_s6TV :: GHC.Prim.Int#) (w_s6TS :: [GHC.Types.Int]) ->
case w_s6TS of _ [Occ=Dead] {
[] -> ww_s6TV;
: x_a52E xs_a52F ->
case x_a52E of _ [Occ=Dead] { GHC.Types.I# y1_a52Q ->
case GHC.Prim.tagToEnum#
@ GHC.Types.Bool (GHC.Prim.<=# ww_s6TV y1_a52Q)
of _ [Occ=Dead] {
GHC.Types.False -> $wlgo_r6X4 y1_a52Q xs_a52F;
GHC.Types.True -> $wlgo_r6X4 ww_s6TV xs_a52F
}
}
}
end Rec }
}}}
GHC-7.10 will ''not'' inline `minimum`, but only replace it by
`minimumStrict` via a rule, and the latter then called polymorphically:
{{{#!hs
...
case strictMinimum @ Int GHC.Classes.$fOrdInt (go_a6gJ
cs_r9bl)
of _ [Occ=Dead] { I# y_a6hm ->
}}}
GHC-7.8’s inlining seems to be a little excessive, but in 7.10 there is
certainly a lack of specialization. Maybe some `INLINEABLE` pragma would
help? Not sure...
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:2
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#10788: performance regression involving minimum (and maybe Vector)
-------------------------------------+-------------------------------------
Reporter: rwbarton | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.10.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Revisions:
-------------------------------------+-------------------------------------
Comment (by rwbarton):
It looks like more inlining happened in 7.10.1 at the definition site of
`strictMinimum`, and the result was that the unfolding became too large
for GHC to want to inline it at use sites.
7.8.4:
{{{
Considering inlining: Data.List.strictMinimum
arg infos [ValueArg, NonTrivArg]
uf arity 2
interesting continuation CaseCtxt
some_benefit True
is exp: True
is work-free: True
guidance IF_ARGS [30 30] 80 0
discounted size = -10
ANSWER = YES
strictMinimum :: GHC.Classes.Ord a -> [a] -> a
{- Arity: 2, Strictness: ,
Unfolding: (\ @ a $dOrd :: GHC.Classes.Ord a ds :: [a] ->
case ds of wild {
[] -> Data.List.minimum1 @ a
: ipv ipv1
-> Data.List.foldl'
@ a
@ a
(GHC.Classes.min @ a $dOrd)
ipv
ipv1 }) -}
}}}
7.10.1:
{{{
Considering inlining: strictMinimum
arg infos [ValueArg, NonTrivArg]
interesting continuation CaseCtxt
some_benefit True
is exp: True
is work-free: True
guidance IF_ARGS [30 30] 200 0
discounted size = 110
ANSWER = NO
strictMinimum :: Ord a => [a] -> a
{- Arity: 2, Strictness: ,
Unfolding: (\ @ a $dOrd :: Ord a ds :: [a] ->
case ds of wild {
[] -> minimum1 @ a
: ipv ipv1
-> let {
k :: a -> a -> a = min @ a $dOrd
} in
letrec {
go :: [a] -> a -> a {- Arity: 2, Strictness:
-}
= \ ds1 :: [a] eta :: a ->
case ds1 of wild1 {
[] -> eta : y ys -> case eta of z { DEFAULT ->
go ys (k z y) } }
} in
go ipv1 ipv }) -}
}}}
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:3
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#10788: performance regression involving minimum -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: ekmett Type: bug | Status: new Priority: normal | Milestone: Component: Core Libraries | Version: 7.10.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by rwbarton): * owner: => ekmett * component: Compiler => Core Libraries -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10788: performance regression involving minimum -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: ekmett Type: bug | Status: new Priority: normal | Milestone: Component: Core Libraries | Version: 7.10.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata): So one solution would be to mark `strictMinimum` as `INLINE`, so that its unfolding stays small and both `strictMinimum` and `foldl` will be inlined at the use site? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

So one solution would be to mark strictMinimum as INLINE, so that its unfolding stays small and both strictMinimum and foldl will be inlined at
#10788: performance regression involving minimum -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: ekmett Type: bug | Status: new Priority: normal | Milestone: Component: Core Libraries | Version: 7.10.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata): the use site? Tried that. The unfolding will then mention `foldr` instead of the above code, but is still too large. I was expecting the unfolding to mention `foldl1'`, as I thought the unfolding of something marked `INLINE` is never changed? Weird. Maybe someone else can give this a shot? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10788: performance regression involving minimum
-------------------------------------+-------------------------------------
Reporter: rwbarton | Owner: ekmett
Type: bug | Status: new
Priority: normal | Milestone:
Component: Core Libraries | Version: 7.10.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Revisions:
-------------------------------------+-------------------------------------
Comment (by nomeata):
With
{{{
{-# SPECIALIZE minimum :: [Int] -> Int #-}
}}}
instead of the rule rewriting it to strictMinimum, and *not* adding an
INLINE pragma to `minimum`, I get good code in `GHC.List`, and this is
being used here without too much inlining (it inlines the wrapper that
distinguishes `[]` from a non-empty list and unboxes the int, but then
calls the wrapper in `GHC.List.$wgo1`.
Removing `INLINE` is important, as otherwise we’d be having this worker in
every use of minimum.
Even without `INLINE` we have this in the interface
{{{
minimum :: Ord a => [a] -> a
{- Arity: 2, Strictness: ,
Unfolding: (\ @ a11 $dOrd :: Ord a11 ds :: [a11] ->
case ds of wild {
[] -> minimum3 @ a11
: ipv ipv1
-> let {
k :: a11 -> a11 -> a11 = min @ a11 $dOrd
} in
letrec {
go :: [a11] -> a11 -> a11 {- Arity: 2, Strictness:

#10788: performance regression involving minimum -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: ekmett Type: bug | Status: new Priority: normal | Milestone: Component: Core Libraries | Version: 7.10.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj):
This is all so brittle...
I agree that brittle-ness is bad. Can you stand back and give a description of the brittle-ness? * Of course, if a function is inlined, it can be specialised for the call site, and without pragmas that decision is indeed dependent on how big the function is. I see no way to avoid that. * But pragmas should not be brittle. Can you show an example in which they seem to be. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10788: performance regression involving minimum -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: ekmett Type: bug | Status: new Priority: normal | Milestone: Component: Core Libraries | Version: 7.10.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata):
Can you stand back and give a description of the brittle-ness?
I actually planned to write something to the list about this... and I [https://mail.haskell.org/pipermail/libraries/2015-September/026099.html now have]. In this particular case, I am quite happy with the `INLINEABLE`/`SPECIALIZE` solution, and will submit a DR soon. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10788: performance regression involving minimum -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: ekmett Type: bug | Status: new Priority: normal | Milestone: Component: Core Libraries | Version: 7.10.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata):
In this particular case, I am quite happy with the INLINEABLE/SPECIALIZE solution, and will submit a DR soon.
Spoke too soon. Looking at the core of `List`, the `maximum` for `Int` is
great (worker with a strict unboxed `Int`), but for `Integer`, the
strictness analyzer is failing me, and I get this loop:
{{{#!hs
Rec {
-- RHS size: {terms: 12, types: 8, coercions: 0}
maximum_go [Occ=LoopBreaker] :: [Integer] -> Integer -> Integer
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType ]
maximum_go =
\ (ds_a2d4 :: [Integer]) (eta_B1 :: Integer) ->
case ds_a2d4 of _ [Occ=Dead] {
[] -> eta_B1;
: y_a2d9 ys_a2da ->
maximum_go
ys_a2da
(integer-gmp-1.0.0.0:GHC.Integer.Type.$fOrdInteger_$cmax
eta_B1 y_a2d9)
}
end Rec }
}}}
So it sees that `go` is strict in its second argument. Why is it then not
strictly evaluated before the recursive call, avoiding this obvious space
leak?
Note that `max` is not inlined (as it is for Int), but the strictness data
is there, so that should not make a difference.
Anyways, I guess this discussion is derailing for this particular ticket.
I’ll look into a combination of rewriting with `RULES` to `strictMinimum`
to avoid relying on the strictness analyzer to produce good code.
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:10
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#10788: performance regression involving minimum
-------------------------------------+-------------------------------------
Reporter: rwbarton | Owner: ekmett
Type: bug | Status: new
Priority: normal | Milestone:
Component: Core Libraries | Version: 7.10.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Revisions:
-------------------------------------+-------------------------------------
Comment (by Joachim Breitner

#10788: performance regression involving minimum -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: ekmett Type: bug | Status: closed Priority: normal | Milestone: Component: Core Libraries | Version: 7.10.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by nomeata): * status: new => closed * resolution: => fixed Comment: I applied a small (and probably uncontroversial) change that improves upon this particular issue. There is no good story for `minimum` applied to any other data type with a strict `min`, though. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10788: performance regression involving minimum -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: ekmett Type: bug | Status: closed Priority: normal | Milestone: Component: Core Libraries | Version: 7.10.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata): And a slightly larger patch, which avoids the use of `strictMinimum`, for this is currently tested via Phabricator at Phab:1229. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10788: performance regression involving minimum
-------------------------------------+-------------------------------------
Reporter: rwbarton | Owner: ekmett
Type: bug | Status: closed
Priority: normal | Milestone:
Component: Core Libraries | Version: 7.10.1
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Revisions:
-------------------------------------+-------------------------------------
Comment (by Joachim Breitner

#10788: performance regression involving minimum -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: ekmett Type: bug | Status: closed Priority: normal | Milestone: Component: Core Libraries | Version: 7.10.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj):
Update: I looked into this, and it seems that even if -ddump-simpl looks like this, with a space-leaky way of accumulating the argument, that in the CorePrep stage the evaluation of cmax is pulled before the call to go and alls is well. I did not know that there are still such things going on in that stage.
Suppose we have a function call `f (g x)` and `f` is strict. Then GHC keeps it looking like that so that rewrite rules apply easily. In `CorePrep` GHC just makes the order of evaluation explicit, by moving to {{{ case (g x) of y -> f y }}} There is no new analysis; the call always was call-by-value; but after `CorePrep` that fact is 100% explicit. I believe that the conclusion here is that GHC is behaving perfectly predictably; but library authors need to take a little care with pragmas and rules if they want to get predictable fusion. That's what your email in comment:9 is about, if I read it right. That is, you are not seeking any change to GHC. Correct? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10788: performance regression involving minimum -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: ekmett Type: bug | Status: closed Priority: normal | Milestone: Component: Core Libraries | Version: 7.10.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata):
That is, you are not seeking any change to GHC. Correct?
Correct! I was a bit confused along the way, due to mis-reading the Core, but there is purely a library issue at hand. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10788#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC