[GHC] #12354: Word foldl' isn't optimized as well as Int foldl'

#12354: Word foldl' isn't optimized as well as Int foldl' -------------------------------------+------------------------------------- Reporter: kjslag | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- {{{#!hs import Data.List test :: Int -> Int test n = foldl' (+) 0 [1..n] main :: IO () main = do print $ test $ 10^8 }}} GHC optimizes the above code to the point that the garbage collector doesn't even have to do anything: {{{ $ ghc -rtsopts -O2 testInt && ./testInt +RTS -s [1 of 1] Compiling Main ( testInt.hs, testInt.o ) Linking testInt ... 5000000050000000 51,752 bytes allocated in the heap 3,480 bytes copied during GC 44,384 bytes maximum residency (1 sample(s)) 17,056 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s Gen 1 1 colls, 0 par 0.000s 0.000s 0.0001s 0.0001s INIT time 0.000s ( 0.000s elapsed) MUT time 0.101s ( 0.101s elapsed) GC time 0.000s ( 0.000s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 0.103s ( 0.102s elapsed) %GC time 0.1% (0.1% elapsed) Alloc rate 511,162 bytes per MUT second Productivity 99.8% of total user, 100.9% of total elapsed }}} However, if I change the type of {{{test}}} to {{{test :: Word -> Word}}}, then a lot of garbage is produced and the code runs 40x slower: {{{ ghc -rtsopts -O2 testWord && ./testWord +RTS -s [1 of 1] Compiling Main ( testWord.hs, testWord.o ) Linking testWord ... 5000000050000000 11,200,051,784 bytes allocated in the heap 1,055,520 bytes copied during GC 44,384 bytes maximum residency (2 sample(s)) 21,152 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 21700 colls, 0 par 0.077s 0.073s 0.0000s 0.0000s Gen 1 2 colls, 0 par 0.000s 0.000s 0.0001s 0.0001s INIT time 0.000s ( 0.000s elapsed) MUT time 4.551s ( 4.556s elapsed) GC time 0.077s ( 0.073s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 4.630s ( 4.630s elapsed) %GC time 1.7% (1.6% elapsed) Alloc rate 2,460,957,186 bytes per MUT second Productivity 98.3% of total user, 98.3% of total elapsed }}} I expected the performance to be nearly identical. I'm using GHC version 8.0.1 on x86_64 Arch Linux. I asked about this on stackoverflow, and the issue appears to be related to rewrite rules: [http://stackoverflow.com/a/38113639/6531137] -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12354 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12354: Word foldl' isn't optimized as well as Int foldl' -------------------------------------+------------------------------------- Reporter: kjslag | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): As pointed out by dfeuer on stackexchange, the `Enum` instance for `Int` is better than the one for `Word`: `Int`: {{{ instance Enum Int where {-# INLINE enumFromTo #-} enumFromTo (I# x) (I# y) = eftInt x y {-# RULES "eftInt" [~1] forall x y. eftInt x y = build (\ c n -> eftIntFB c n x y) "eftIntList" [1] eftIntFB (:) [] = eftInt #-} {- Note [How the Enum rules work] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Phase 2: eftInt ---> build . eftIntFB * Phase 1: inline build; eftIntFB (:) --> eftInt * Phase 0: optionally inline eftInt -} {-# NOINLINE [1] eftInt #-} eftInt :: Int# -> Int# -> [Int] -- [x1..x2] eftInt x0 y | isTrue# (x0 ># y) = [] | otherwise = go x0 where go x = I# x : if isTrue# (x ==# y) then [] else go (x +# 1#) {-# INLINE [0] eftIntFB #-} eftIntFB :: (Int -> r -> r) -> r -> Int# -> Int# -> r eftIntFB c n x0 y | isTrue# (x0 ># y) = n | otherwise = go x0 where go x = I# x `c` if isTrue# (x ==# y) then n else go (x +# 1#) -- Watch out for y=maxBound; hence ==, not > -- Be very careful not to have more than one "c" -- so that when eftInfFB is inlined we can inline -- whatever is bound to "c" }}} Now `Word` actually uses the implementation for `Integer` {{{ enumFromTo n1 n2 = map integerToWordX [wordToIntegerX n1 .. wordToIntegerX n2] }}} which uses {{{ instance Enum Integer where enumFromTo x lim = enumDeltaToInteger x 1 lim }}} Now `enumDeltaToInteger` has rewrite rules set up, but it turns out that `Word`’s `enumFromTo` is never inlined, so this setup has no chance of fusing here. Inlining this function into my test code causes `fold/build` to fire, cutting down allocation severely, but the conversion from and to `Integer` remains. One could of course write similar hand-written code such as for `Int` also for `Word`. But what about `Word8`, `Word16`, `Word32` and `Word64` then? Where does it stop? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12354#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12354: Word foldl' isn't optimized as well as Int foldl' -------------------------------------+------------------------------------- Reporter: kjslag | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
it turns out that Word’s enumFromTo is never inlined, so this setup has no chance of fusing here
So the short term solution is to add an INLINE for Word's `enumFromTo`?
Where does it stop?
Well you might hope that you could write `eftInt`, `eftIntFB` once, at type `a` and SPECIALISE them. That would save copying them manually. Writing rewrite rules that also specialise is something I have not thought much about though. I suppose that, conceivably, all the fusion could happen generically (i.e. on the class-overloaded functions), before we specialise to the particular type. To achieve that, we'd have to delay the class-op selection e.g. `eunmFrom dEnumInt` --> `enumFrom_Int`. But that ought to be possible. A good project here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12354#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12354: Word foldl' isn't optimized as well as Int foldl' -------------------------------------+------------------------------------- Reporter: kjslag | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): I think it's reasonable to expect that `Word` and `Int` optimize roughly as well. I might even go so far to say that the same should be expected of the narrower `Word` and `Int` variants. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12354#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12354: Word foldl' isn't optimized as well as Int foldl' -------------------------------------+------------------------------------- Reporter: kjslag | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari):
Well you might hope that you could write `eftInt`, `eftIntFB` once, at type `a` and SPECIALISE them. That would save copying them manually.
Indeed, unfortunately this would require that `eftInt` be runtime- representationally polymorphic since `Word :: RuntimeRep WordRep` and `Int :: RuntimeRep IntRep`. This is a rather unfortunate limitation since we would like to ensure that the polymorphism is resolved at compile-time via inlining. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12354#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12354: Word foldl' isn't optimized as well as Int foldl' -------------------------------------+------------------------------------- Reporter: kjslag | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I think the generic versions could work with boxed values and we could fuse there. After fusion the unboxed versions could kick in. Short term: just replicate Int stuff for Word. Medium term: let's hope someone takes up the challenge. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12354#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12354: Word foldl' isn't optimized as well as Int foldl' -------------------------------------+------------------------------------- Reporter: kjslag | Owner: Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2376 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: new => patch * differential: => Phab:D2376 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12354#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12354: Word foldl' isn't optimized as well as Int foldl'
-------------------------------------+-------------------------------------
Reporter: kjslag | Owner:
Type: bug | Status: patch
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2376
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#12354: Word foldl' isn't optimized as well as Int foldl' -------------------------------------+------------------------------------- Reporter: kjslag | Owner: Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2376 Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Merged to `master`. Seems like this would be an easy thing to merge for 8.0.2. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12354#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12354: Word foldl' isn't optimized as well as Int foldl' -------------------------------------+------------------------------------- Reporter: kjslag | Owner: Type: bug | Status: merge Priority: normal | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2376 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: patch => merge * milestone: => 8.0.2 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12354#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12354: Word foldl' isn't optimized as well as Int foldl'
-------------------------------------+-------------------------------------
Reporter: kjslag | Owner:
Type: bug | Status: merge
Priority: normal | Milestone: 8.0.2
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2376
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#12354: Word foldl' isn't optimized as well as Int foldl' -------------------------------------+------------------------------------- Reporter: kjslag | Owner: Type: bug | Status: closed Priority: normal | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2376 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: merge => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12354#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC