[GHC] #8766: length [Integer] is twice as slow but length [Int] is 10 times faster

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster ------------------------------+-------------------------------------------- Reporter: George | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime performance bug Unknown/Multiple | Test Case: Difficulty: Unknown | Blocking: Blocked By: | Related Tickets: | ------------------------------+-------------------------------------------- Compared to 7.6.3 length in 7.8.1-rc1 has a performance regression: {{{ ghc --version The Glorious Glasgow Haskell Compilation System, version 7.8.20140130 bash-3.2$ ghc -O2 LengthIntegerList.hs [1 of 1] Compiling Main ( LengthIntegerList.hs, LengthIntegerList.o ) Linking LengthIntegerList ... bash-3.2$ time ./LengthIntegerList 1073741824 real 0m45.344s user 0m44.230s sys 0m0.494s bash-3.2$ /usr/bin/ghc --version The Glorious Glasgow Haskell Compilation System, version 7.6.3 bash-3.2$ /usr/bin/ghc -O2 LengthIntegerList.hs [1 of 1] Compiling Main ( LengthIntegerList.hs, LengthIntegerList.o ) Linking LengthIntegerList ... bash-3.2$ time ./LengthIntegerList 1073741824 real 0m22.769s user 0m22.042s sys 0m0.385s bash-3.2$ cat LengthIntegerList.hs {-# OPTIONS_GHC -Wall #-} module Main where main :: IO() main = print $ length [1..(2^(30::Int)::Integer)] }}} thus length of [Integer] is twice as slow in rc1 but length of [Int] is 10 times faster: {{{ ghc --version The Glorious Glasgow Haskell Compilation System, version 7.8.20140130 bash-3.2$ ghc -O2 LengthIntList.hs [1 of 1] Compiling Main ( LengthIntList.hs, LengthIntList.o ) Linking LengthIntList ... bash-3.2$ time ./LengthIntList 1073741824 real 0m0.723s user 0m0.693s sys 0m0.003s bash-3.2$ /usr/bin/ghc --version The Glorious Glasgow Haskell Compilation System, version 7.6.3 bash-3.2$ /usr/bin/ghc -O2 LengthIntList.hs [1 of 1] Compiling Main ( LengthIntList.hs, LengthIntList.o ) Linking LengthIntList ... bash-3.2$ time ./LengthIntList 1073741824 real 0m11.805s user 0m10.900s sys 0m0.351s bash-3.2$ cat LengthIntList.hs {-# OPTIONS_GHC -Wall #-} module Main where main :: IO() main = print $ length [1..(2^(30::Int)::Int)] }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by hvr): ...here's what simplified Core of {{{#!hs intlen :: Int intlen = length [1..(2^(30::Int))::Int] }}} looks like for GHC 7.6.3: {{{ intlen = case $wf 2 30 of ww { __DEFAULT -> case $wlen (eftInt 1 ww) 0 of ww1 { __DEFAULT -> I# ww1 } } }}} vs. GHC 7.8.20140130: {{{ intlen = case $wf 2 30 of ww4 { __DEFAULT -> case tagToEnum# (># 1 ww4) of _ { False -> letrec { $wgo $wgo = \ w w1 -> case tagToEnum# (==# w ww4) of _ { False -> $wgo (+# w 1) (+# w1 1); True -> +# w1 1 }; } in case $wgo 1 0 of ww { __DEFAULT -> I# ww }; True -> I# 0 } } }}} So the significant runtime reduction for the `Int`-case is seemingly due to proper inlining of `eftInt` and thus turning a `length [a..b]` into a tighter counting-loop w/o any `[Int]` involved anymore. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by nomeata): The improvement for `Int` is most likely from 82f56e5 (#876). But I guess the important point of the ticket is the regression for `Integer`. Maybe it is because of #8638? There might now be many checks whether the value fits in a `S#` that were not done before. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by nomeata): Nevermind, 82f56e5/integer-gmp did not change addition, so it should not be the problem here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by hvr): Replying to [comment:2 nomeata]:
The improvement for `Int` is most likely from 82f56e5 (#876). But I guess the important point of the ticket is the regression for `Integer`.
I was getting to that (needed to verify something first though): The Core for {{{#!hs intlen = length [1..(2^(30::Int))::Integer] }}} Otoh, looks differs as following: GHC 7.6.3: {{{ intlen1 = enumDeltaToInteger intlen3 intlen3 intlen2 }}} vs. GHC 7.8.20140130: {{{ intlen1 = enumDeltaToIntegerFB (incLen) I# intlen3 intlen3 intlen2 }}} The use of `enumDeltaToIntegerFB` instead of `enumDeltaToInteger` accounts for the speed difference. I've verified that by copying the definition from `GHC.Enum` into the test-code and using `enumDeltaToInteger` directly with GHC 7.8.20140130 which resulted in a slightly better runtime than for GHC 7.6.3 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by George): Right, the important point of the ticket is the regression for [Integer] but implicit in mentioning the order of magnitude improvement for [Int] is the question of why [Integer] is, even after this bug is fixed, an order of magnitude slower than [Int] and would it be possible to make the time for [Integer] much closer to [Int]. After all length only traverses the spine of the list so why the big difference? This is probably obvious to many but not to me and I'd like to know why. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by nomeata): hvr, your analysis is a bit misleading: In GHC 7.6.3 we have {{{#!haskell intlen1 :: [Integer] intlen1 = enumDeltaToInteger intlen4 intlen4 intlen2 intlen :: Int intlen = case $wlen @ Integer intlen1 0 of ww { __DEFAULT -> I# ww } }}} and in 7.9 we have {{{#!haskell intlen1 :: Int# -> Int intlen1 = enumDeltaToIntegerFB @ (Int# -> Int) (incLen @ Integer) I# intlen4 intlen4 intlen2 intlen :: Int intlen = intlen1 0 }}} Note how the `intlen1` do not directly correspond to each other. So we have a case of successful list fusion that does ''not'' speed up the program. The two `enumDeltaToInteger...` functions (at the end of source:base/GHC/Enum.lhs) are almost the same; both call auxiliary functions, the only difference is whether `:` is used, or an explicitly passed `c`. I believe the problem is that we are using `enumDeltaToIntegerFB` in a higher order way (note the `Int# -> Int`), which allocates partial function applications – basically the same that happens when `foldl` is implemented with `foldr` (#7994). The fix for that would to make sure that stuff is inlined far enough for the `go` from `up_fb` can be visible (as it is for `Int` in comment:1). It could be related to this comment with `Int` (or not, because inlining does not even go that far): {{{#!haskell {-# INLINE [0] eftIntFB #-} eftIntFB :: (Int -> r -> r) -> r -> Int# -> Int# -> r eftIntFB c n x0 y | isTrue# (x0 ># y) = n | otherwise = go x0 where go x = I# x `c` if isTrue# (x ==# y) then n else go (x +# 1#) -- Watch out for y=maxBound; hence ==, not > -- Be very careful not to have more than one "c" -- so that when eftInfFB is inlined we can inline -- whatever is bound to "c" }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by nomeata): Ok, with this change (i.e. a special `eftIntegerFB`) I get good results: Run time decreases from 16.3s in 7.6.3 to 11.7s in GHC HEAD. But if I use `-fno-call-arity` (Call arity analysis is not in 7.8), I still get 23s, which is still a regression over 16s, but better than the 30 I got with the unmodified library code. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by nomeata): The same is achieved with a much simpler change, namely the rule {{{ "enumDeltaToInteger1" [0] forall c n x . enumDeltaToIntegerFB c n x 1 = up_fb c n x 1 }}} now I get 11.7s with `-fcall-arity` and 19.7s without, which is quite close the original 11.7s, and non-invasive enough to go into 7.8, I’d say. I’ll prepare a patch. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: 7.8.1 Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by nomeata): * owner: => nomeata * milestone: => 7.8.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster
--------------------------------------------+------------------------------
Reporter: George | Owner: nomeata
Type: bug | Status: new
Priority: normal | Milestone: 7.8.1
Component: Compiler | Version: 7.8.1-rc1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime performance bug | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: | Blocked By:
| Related Tickets:
--------------------------------------------+------------------------------
Comment (by Joachim Breitner

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster
--------------------------------------------+------------------------------
Reporter: George | Owner: nomeata
Type: bug | Status: new
Priority: normal | Milestone: 7.8.1
Component: Compiler | Version: 7.8.1-rc1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime performance bug | Unknown/Multiple
Test Case: | Difficulty: Unknown
Blocking: | Blocked By:
| Related Tickets:
--------------------------------------------+------------------------------
Comment (by Joachim Breitner

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: nomeata Type: bug | Status: merge Priority: normal | Milestone: 7.8.1 Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: T8755 | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by nomeata): * status: new => merge * testcase: => T8755 Comment: Pushed, please merge to 7.8. If you also merge the test case (the first performance test in `libraries/base/tests` – I hope there was no good reason why there weren’t any before) you probably have to adjust the numbers, unless we also merge the call-arity analysis (which I don’t think has seen enough testing to go into a stable release). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: nomeata Type: bug | Status: closed Priority: normal | Milestone: 7.8.1 Component: Compiler | Version: 7.8.1-rc1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: T8755 | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by thoughtpolice): * status: merge => closed * resolution: => fixed Comment: Merged. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.1 Component: Compiler | Version: 7.8.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: T8755 | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by nomeata): * owner: nomeata => * status: closed => new * resolution: fixed => Comment: Hmm, this is not fully fixed, it seems. Consider {{{ f :: Integer -> Integer f n = n {-# NOINLINE f #-} main :: IO () main = sum (map f [0 .. 10000]) `seq` return () }}} Here, the rule {{{ {-# RULES "enumDeltaToInteger1" [0] forall c n x . enumDeltaToIntegerFB c n x 1 = up_fb c n x 1 #-} }}} does not fire and we end up with {{{ main3 :: Integer -> Integer main3 = enumDeltaToIntegerFB @ (Integer -> Integer) main6 (id @ Integer) main2 main5 main4 main5 :: Integer main5 = __integer 1 }}} and that despite that we have this code in `Phase = 0`: {{{ enumDeltaToIntegerFB @ (Integer -> Integer) (\ (x :: Integer) (ys [OS=OneShot] :: Integer -> Integer) -> let { ds [OS=ProbOneShot] :: Integer ds = f x } in \ (ds2 :: Integer) -> ys (plusInteger ds2 ds)) (id @ Integer) (__integer 0) (__integer 1) (__integer 10000) (__integer 0) }}} The rule engine is still a bit of a mystery to me. Why does the rule not match reliably here? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8766: length [Integer] is twice as slow but length [Int] is 10 times faster --------------------------------------------+------------------------------ Reporter: George | Owner: Type: bug | Status: closed Priority: normal | Milestone: 7.8.1 Component: Compiler | Version: 7.8.1-rc1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: T8755 | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by nomeata): * status: new => closed * resolution: => fixed Comment: Darn, ignore me. Used the binary from the wrong working copy. That’s what happens when I start work in `ghc-master/` that I don’t get to push right away, and `ghc/` becomes my working copy that tracks master... -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8766#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC