[GHC] #13001: EnumFromThenTo is is not a good producer

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: Type: bug | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: MacOS X Architecture: x86_64 | Type of failure: Runtime (amd64) | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- {{{#!hs {-# OPTIONS_GHC -Wall #-} module Foo where testFromTo :: Int -> Int testFromTo n = length ([0..(10^n)] :: [Int]) testFromThenTo :: Int -> Int testFromThenTo n = length ([0,2..(10^n)] :: [Int]) }}} {{{ $ ghci -fobject-code -O GHCi, version 8.0.1: http://www.haskell.org/ghc/ :? for help Prelude> :load Foo :load Foo [1 of 1] Compiling Foo ( Foo.hs, Foo.o ) Ok, modules loaded: Foo (Foo.o). Prelude Foo> :set +s :set +s Prelude Foo> testFromTo 5 testFromTo 5 100001 (0.02 secs, 97,992 bytes) Prelude Foo> testFromThenTo 5 testFromThenTo 5 50001 (0.00 secs, 5,694,424 bytes) Prelude Foo> testFromThenTo 6 testFromThenTo 6 500001 (0.02 secs, 56,095,288 bytes) Prelude Foo> }}} I set the Type to bug rather than feature request as the source code in http://hackage.haskell.org/package/base-4.9.0.0/docs/src/GHC.Enum.html#enumF... seems to be trying to do list fusion: {{{#!hs -- efdInt and efdtInt deal with [a,b..] and [a,b..c]. -- The code is more complicated because of worries about Int overflow. -- See Note [How the Enum rules work] {-# RULES "efdtInt" [~1] forall x1 x2 y. efdtInt x1 x2 y = build (\ c n -> efdtIntFB c n x1 x2 y) "efdtIntUpList" [1] efdtIntFB (:) [] = efdtInt #-} }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: Type: bug | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by akio): * cc: akio (added) Comment: I have a fix to this problem locally, but I've never got around to submitting a patch. I believe the problem is that `efdtIntFB` is not marked `INLINE [0]`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: Type: bug | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by akio): I tested a [https://github.com/takano- akio/ghc/commit/930d42755e6753ba695157b96b8d651263b1934c a patch] and indeed marking `*FB` functions `INLINE[0]` seems to help. However this change increases binary sizes of some programs in nofib: {{{ -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- gen_regexps -0.3% 0.0% 0.000 0.000 0.0% puzzle +0.8% 0.0% 0.089 0.090 0.0% reptile +0.8% -0.0% 0.008 0.008 0.0% -------------------------------------------------------------------------------- Min -0.3% -0.0% -7.3% -7.1% 0.0% Max +0.8% +0.0% +7.8% +7.7% +1.8% Geometric Mean +0.0% -0.0% +0.2% +0.2% +0.0% }}} I have only looked at the code for `puzzle` so far. The increase in the code size seems to come from inlining `efdtInt*FB` into a derived `Enum` instance declaration. I wonder if this change is acceptable. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: Type: bug | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
I wonder if this change is acceptable.
I’d say yes. The mean changes by <0.1%, and we really want fusion to be as reliable as possible. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: Type: bug | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): It's true that doing more inlining will generally improve things, but we could be a bit more forensic about this. Just sprinkling more inline pragmas, some of which may do no good, is a bit of a blunt instrument. I checked the code for `testFromThenTo`, compiled with -O, and got {{{ Foo.$wtestFromThenTo = \ (ww_s2vE :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<# ww_s2vE 0#) of { False -> case ww_s2vE of wild2_a2uk { __DEFAULT -> case GHC.Real.$wf1 10# wild2_a2uk of ww4_a2uE { __DEFAULT -> GHC.Enum.efdtIntUpFB @ (Int -> Int) (GHC.List.lengthFB @ Int) (id @ Int) 0# 2# ww4_a2uE Foo.testFromThenTo2 }; 0# -> Foo.testFromThenTo1 }; True -> GHC.Real.^2 } }}} Fusion has happened (there is a use of the `foldr/build` rule), but the call to `edftIntUpFB` remains. That's not necessarily bad. As you'll see, it's not a tiny function: {{{ efdtIntUpFB :: (Int -> r -> r) -> r -> Int# -> Int# -> Int# -> r efdtIntUpFB c n x1 x2 y -- Be careful about overflow! | isTrue# (y <# x2) = if isTrue# (y <# x1) then n else I# x1 `c` n | otherwise = -- Common case: x1 <= x2 <= y let !delta = x2 -# x1 -- >= 0 !y' = y -# delta -- x1 <= y' <= y; hence y' is representable -- Invariant: x <= y -- Note that: z <= y' => z + delta won't overflow -- so we are guaranteed not to overflow if/when we recurse go_up x | isTrue# (x ># y') = I# x `c` n | otherwise = I# x `c` go_up (x +# delta) in I# x1 `c` go_up x2 }}} BUT the bad thing is that it allocated loads of `Int` values! Why does it do that? Becuase the function passed to it (`c` in the above definition) takes an `Int` as its argument. So if `efdtIntUpFB` isn't inlined, it ''must'' allocate an `Int` box for every iteration. Bad! But this is an internal function. Suppose we gave it this signature: {{{ efdtIntUpFB :: (Int# -> r -> r) -> r -> Int# -> Int# -> Int# -> r -- ^^^ NB Int# not Int }}} Now it won't allocate! Can we make the rest of the pieces fit together now? Would you like to have a try? I also looked at the call to `lengthFB` in the above optimised code. It's defined like this: {{{ lengthFB :: x -> (Int -> Int) -> Int -> Int lengthFB _ r = \ !a -> r (a + 1) }}} Uh-ho! More `Int` boxes! So I tried rewriting the `length` moving parts (in `GHC.List`) like this: {{{ {-# INLINE xlength #-} xlength :: [a] -> Int xlength xs = I# (xlenAcc xs 0#) xlenAcc :: [a] -> Int# -> Int# xlenAcc [] n = n xlenAcc (_:ys) n = xlenAcc ys (n +# 1#) {-# RULES "xlength" [~1] forall xs . xlenAcc xs = foldr xlengthFB idXlength xs "xlengthList" [1] foldr xlengthFB idXlength = xlenAcc #-} -- The lambda form turns out to be necessary to make this inline -- when we need it to and give good performance. {-# INLINE [0] xlengthFB #-} xlengthFB :: x -> (Int# -> Int#) -> Int# -> Int# xlengthFB _ r = \ a -> r (a +# 1#) {-# INLINE [0] idXlength #-} idXlength :: Int# -> Int# idXlength x = x }}} That compiles fine. Even if it generates the same code as before, GHC will have to do less work to optimise it, so it's a win. Would you like to try that change to `length` and see if it is an improvement? Maybe you can do the same for `efdtIntUpFB`? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: Type: bug | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by akio): Hmm, you could eliminate allocation of `Int`s that way, but the code still has to allocate one function closure for each element in the list, right? It seems to me that inlining `efdtIntUpFB` is the only way to eliminate this allocation, if the final list consumer is `length` or `foldl'`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: Type: bug | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I think perhaps you mean that in the call {{{ I# x `c` go_up (x +# delta) }}} since GHC doesn't know `c` is strict, it'll allocate a thunk for the second argument. Quite true. And that does mean it'd be good to inline `efdtIntUpFB`, I agree. But still, fewer Int boxes is better regardless. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: Type: bug | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by akio):
since GHC doesn't know c is strict, it'll allocate a thunk for the second argument.
This is true. I was actually referring to another source of allocation: since the call to `c` is under-saturated, evaluating this call involves allocating a closure for the partial-application of `c`. This is also the case when the final consumer is `foldl'` instead of `length`. I think this is a general argument for marking *all* `FB` functions `INLINE [0]`. If `p = build pFB` is advertised as a good producer, a user would expect `foldl' f z p` to allocate no intermediate data structure. However, if `pFB` is not inlined, this expression will necessarily allocate a few closures for each eliminated cons cell. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: Type: bug | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I think that's a reasonable point. If you follow it up, could you include a Note to explain the reasoning? And refer to the Note from the relevant INLINE pragmas? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: akio Type: bug | Status: new Priority: low | Milestone: Component: libraries/base | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2951 Wiki Page: | -------------------------------------+------------------------------------- Changes (by akio): * owner: => akio * component: Compiler => libraries/base * differential: => Phab:D2951 * os: MacOS X => Unknown/Multiple -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer
-------------------------------------+-------------------------------------
Reporter: George | Owner: akio
Type: bug | Status: new
Priority: low | Milestone:
Component: libraries/base | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: x86_64
Type of failure: Runtime | (amd64)
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2951
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: akio Type: bug | Status: new Priority: low | Milestone: 8.2.1 Component: libraries/base | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2951 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: => 8.2.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: akio Type: bug | Status: new Priority: low | Milestone: 8.4.1 Component: libraries/base | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2951 Wiki Page: | -------------------------------------+------------------------------------- Comment (by George): although this is low priority, it looks like it is done. Is there any reason not to put it into 8.4.1? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13001: EnumFromThenTo is is not a good producer -------------------------------------+------------------------------------- Reporter: George | Owner: akio Type: bug | Status: closed Priority: low | Milestone: 8.4.1 Component: libraries/base | Version: 8.0.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2951 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: new => closed * resolution: => fixed Comment: Indeed it's already in 8.4.1; it looks like the ticket was simply never closed. Thanks George! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13001#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC