[GHC] #11701: ghc generates significant slower code

#11701: ghc generates significant slower code -------------------------------------+------------------------------------- Reporter: HuStmpHrrr | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Keywords: efficiency | Operating System: Linux Architecture: x86_64 | Type of failure: Runtime (amd64) | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- I've already started a discussion here in SO: http://stackoverflow.com/questions/35941674/latest-ghc-generates-slower- code So again, I realized that the latest version of ghc produces significantly slower code than older version. my default ghc is the latest version as of now: {{{ $ ghc --version The Glorious Glasgow Haskell Compilation System, version 7.10.3 }}} I have also two other old versions installed in my local machine. test code as following {{{#!hs import Data.Word import Data.List import System.Environment collatzNext :: Word32 -> Word32 collatzNext a = (if even a then a else 3*a+1) `div` 2 -- new code collatzLen :: Word32 -> Int collatzLen a0 = lenIterWhile collatzNext (/= 1) a0 lenIterWhile :: (a -> a) -> (a -> Bool) -> a -> Int lenIterWhile next notDone start = len start 0 where len n m = if notDone n then len (next n) (m+1) else m -- End of new code main = do [a0] <- getArgs let max_a0 = (read a0)::Word32 print $ maximum $ map (\a0 -> (collatzLen a0, a0)) [1..max_a0] }}} following are the three runs in my machine: {{{ $ ~/Tools/ghc-7.6.1/bin/ghc -O2 Test.hs [1 of 1] Compiling Main ( Test.hs, Test.o ) Linking Test ... $ time ./Test 1000000 (329,837799) real 0m1.901s user 0m1.896s sys 0m0.000s $ ~/Tools/ghc/bin/ghc -O2 Test.hs [1 of 1] Compiling Main ( Test.hs, Test.o ) Linking Test ... $ time ./Test 1000000 (329,837799) real 0m10.562s user 0m10.528s sys 0m0.036s $ ~/Tools/ghc-7.4.2/bin/ghc -O2 Test.hs [1 of 1] Compiling Main ( Test.hs, Test.o ) Linking Test ... $ time ./Test 1000000 (329,837799) real 0m1.879s user 0m1.876s sys 0m0.000s }}} Obviously we can tell latest version of ghc produces worse code than the older two versions. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11701 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11701: ghc generates significant slower code -------------------------------------+------------------------------------- Reporter: HuStmpHrrr | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: efficiency Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by HuStmpHrrr): according to the investigation of Zeta in SO, it's because `even` is not inlined. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11701#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11701: ghc generates significant slower code -------------------------------------+------------------------------------- Reporter: HuStmpHrrr | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: efficiency Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Wow, indeed 7.10.3 chooses not to inline `even`. I've not looked into why but this is quite surprising. Inlining `even` produces a significant speedup (roughly a factor of five) as one would expect. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11701#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11701: ghc generates significant slower code -------------------------------------+------------------------------------- Reporter: HuStmpHrrr | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: efficiency Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Ahh, the problem appears to be that `GHC.Real.even` and `odd` are not marked as `INLINE` but rather are merely specialized to `Integer` and `Int`, whereas you use it at `Word32`. We should specialize these to these other types but it seems to me like they are cheap enough operations to be worth inlining. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11701#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11701: ghc generates significant slower code -------------------------------------+------------------------------------- Reporter: HuStmpHrrr | Owner: Type: bug | Status: new Priority: normal | Milestone: 8.0.1 Component: Compiler | Version: 7.10.3 Resolution: | Keywords: efficiency Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1997 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * differential: => Phab:D1997 * milestone: => 8.0.1 Comment: I've opened Phab:D1997 to address this particular issue although I suspect we generally need to audit our use of inlining vs. specialization. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11701#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11701: ghc generates significant slower code -------------------------------------+------------------------------------- Reporter: HuStmpHrrr | Owner: Type: bug | Status: patch Priority: normal | Milestone: 8.0.1 Component: Compiler | Version: 7.10.3 Resolution: | Keywords: efficiency Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1997 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: new => patch -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11701#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11701: ghc generates significant slower code -------------------------------------+------------------------------------- Reporter: HuStmpHrrr | Owner: Type: bug | Status: patch Priority: normal | Milestone: 8.0.1 Component: Compiler | Version: 7.10.3 Resolution: | Keywords: efficiency Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1997 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): A factor of 5 is an impressively big loss from failing to inline `even`! I think we should investigate why it ''isn't'' being inlined automatically. I tried a tiny case: {{{ module Foo where even :: (Integral a) => a -> Bool even n = n `rem` 2 == 0 module Bar where import Foo f :: Int -> Bool f x = Foo.even x }}} Sure enough, `even` is not inlined. With `-dverbose-core2core -ddump- inlinings` we get {{{ Considering inlining: even arg infos [ValueArg, TrivArg] interesting continuation BoringCtxt some_benefit True is exp: True is work-free: True guidance IF_ARGS [60 0] 240 0 discounted size = 120 ANSWER = NO }}} So we are only getting a tiny discount from the fact that we are giving a completely fixed dictionary to `even`; even though `even`'s body is dominated by dictionary selections that would disappear if we inlined `even`. So maybe we should look at our discounting scheme. See `CoreUnfold.classOpSize` and `ufDictDiscount` in particular. But there's a bit more to it than that. Here's the unfolding for `even`: {{{ Unfolding: (\ @ a ($dIntegral :: Integral a) (eta :: a) -> let { $dReal :: Real a = $p1Integral @ a $dIntegral } in let { $dNum :: Num a = $p1Real @ a $dReal } in == @ a ($p1Ord @ a ($p2Real @ a $dReal)) (rem @ a $dIntegral eta (fromInteger @ a $dNum even2)) (fromInteger @ a $dNum even1)) -} }}} The original argument is `$dIntegral` and only two of the seven dictionary-selection operations are applied to that argument; and only they attract discounts. Food for thought here. This function ''obviously'' should be inlined (when applied to a particular fixed dictionary) but it's not quite clear how to make that happen. Making it INLINEABLE, as the patch does, makes it specialisable which is good. Then it becomes small, and then it gets inlined. That path works quite well. Maybe all overloaded functions, perhaps up to some size limit, should automatically be INLINEABLE. Food for thought Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11701#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11701: ghc generates significant slower code
-------------------------------------+-------------------------------------
Reporter: HuStmpHrrr | Owner:
Type: bug | Status: patch
Priority: normal | Milestone: 8.0.1
Component: Compiler | Version: 7.10.3
Resolution: | Keywords: efficiency
Operating System: Linux | Architecture: x86_64
Type of failure: Runtime | (amd64)
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D1997
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#11701: ghc generates significant slower code -------------------------------------+------------------------------------- Reporter: HuStmpHrrr | Owner: Type: bug | Status: closed Priority: normal | Milestone: 8.0.1 Component: Compiler | Version: 7.10.3 Resolution: fixed | Keywords: efficiency Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1997 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: patch => closed * resolution: => fixed Comment: Merged to `ghc-8.0` as 92465259ae875a2fece5ab37a45e358ba1819d83. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11701#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC