[GHC] #8313: Poor performance of higher-order functions with unboxing

#8313: Poor performance of higher-order functions with unboxing -------------------------------------+------------------------------------- Reporter: dolio | Owner: Type: task | Status: new Priority: low | Milestone: _|_ Component: Compiler | Version: 7.6.3 Keywords: slow unboxed | Operating System: Unknown/Multiple higher order | Type of failure: Runtime Architecture: Unknown/Multiple | performance bug Difficulty: Easy (less than 1 | Test Case: hour) | Blocking: Blocked By: | Related Tickets: | -------------------------------------+------------------------------------- I was testing out some code to see how GHC handled some unboxed higher- order functions, and was suprised by the results I found. Here is some sample code: {{{ {-# LANGUAGE BangPatterns #-} {-# LANGUAGE LambdaCase #-} {-# LANGUAGE MagicHash #-} import GHC.Exts import System.Environment rel# :: Int# -> Int# -> Int# -> Bool rel# i# j# k# = (i# +# j# +# k#) ># 100000000# rel :: Int -> Int -> Int -> Bool rel (I# i#) (I# j#) (I# k#) = rel# i# j# k# manual :: (Int# -> Int# -> Int# -> Bool) -> (Int, Int, Int) manual r# = go 0# 0# 0# where go i# j# k# | r# i# j# k# = (I# i#, I# j#, I# k#) | otherwise = go j# k# (i# +# 1#) {-# NOINLINE manual #-} auto :: (Int -> Int -> Int -> Bool) -> (Int, Int, Int) auto r = go 0 0 0 where go !i !j !k | r i j k = (i, j, k) | otherwise = go j k (i+1) {-# NOINLINE auto #-} main = getArgs >>= \case "manual" : _ -> print $ manual rel# -- This case is significantly slower. "auto" : _ -> print $ auto rel -- Why? }}} A loop that has to box its loop parameters to call a predicate turns out to be significantly faster than one that uses a predicate that takes unboxed values directly. The answer turns out to be (I believe) in ghc/utils/genapply/GenApply.hs. applyTypes has an entry [P,P,P], but only [N]. This means that the manual loop has to use a slower calling convention than the boxed loop. I'm not sure whether this should be 'fixed,' as my encounter with it was experimental in nature, and I don't have a real use case. The comment on applyTypes says its cases cover 99% of uses, and mine is not a real use. This ticket may serve as documentation at least, though. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8313 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8313: Poor performance of higher-order functions with unboxing -------------------------------------+------------------------------------- Reporter: dolio | Owner: Type: task | Status: new Priority: low | Milestone: _|_ Component: Compiler | Version: 7.6.3 Resolution: | Keywords: slow unboxed Operating System: Unknown/Multiple | higher order Type of failure: Runtime | Architecture: Unknown/Multiple performance bug | Difficulty: Easy (less than 1 Test Case: | hour) Blocking: | Blocked By: | Related Tickets: -------------------------------------+------------------------------------- Comment (by simonmar): Apply the patch from #6084 and see if that helps. If it's a win, let's commit it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8313#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8313: Poor performance of higher-order functions with unboxing -------------------------------------+------------------------------------- Reporter: dolio | Owner: Type: task | Status: new Priority: low | Milestone: _|_ Component: Compiler | Version: 7.6.3 Resolution: | Keywords: slow unboxed Operating System: Unknown/Multiple | higher order Type of failure: Runtime | Architecture: Unknown/Multiple performance bug | Difficulty: Easy (less than 1 Test Case: | hour) Blocking: | Blocked By: 6084 | Related Tickets: -------------------------------------+------------------------------------- Changes (by simonmar): * blockedby: => 6084 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8313#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8313: Poor performance of higher-order functions with unboxing -------------------------------------+------------------------------------- Reporter: dolio | Owner: Type: task | Status: new Priority: low | Milestone: _|_ Component: Compiler | Version: 7.6.3 Resolution: | Keywords: slow unboxed Operating System: Unknown/Multiple | higher order Type of failure: Runtime | Architecture: Unknown/Multiple performance bug | Difficulty: Easy (less than 1 Test Case: | hour) Blocking: | Blocked By: 6084 | Related Tickets: -------------------------------------+------------------------------------- Comment (by simonmar): The optimisation from #6084 is now committed. Todo: turn this program into a ''performance'' test, so that we can tell if the optimisation stops firing for any reason. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8313#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8313: Poor performance of higher-order functions with unboxing
-------------------------------------+-------------------------------------
Reporter: dolio | Owner:
Type: task | Status: new
Priority: low | Milestone: _|_
Component: Compiler | Version: 7.6.3
Resolution: | Keywords: slow unboxed
Operating System: Unknown/Multiple | higher order
Type of failure: Runtime | Architecture: Unknown/Multiple
performance bug | Difficulty: Easy (less than 1
Test Case: | hour)
Blocking: | Blocked By: 6084
| Related Tickets:
-------------------------------------+-------------------------------------
Comment (by nomeata):
Checking if this is really fixed, but here, `manual` is still slower than
`auto`, so it does not seem to be fixed (although it might have been even
slower before). Also, `manual` allocates much more – is that the symptom
of this problem, or is it something else?
I had slightly change the test due to
[f6e2398adb63f5c35544333268df9c8837fd2581/base] to
{{{#!haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MagicHash #-}
import GHC.Exts
import System.Environment
rel# :: Int# -> Int# -> Int# -> Int#
rel# i# j# k# = (i# +# j# +# k#) ># 100000000#
rel :: Int -> Int -> Int -> Bool
rel (I# i#) (I# j#) (I# k#) = tagToEnum# (rel# i# j# k#)
manual :: (Int# -> Int# -> Int# -> Int#) -> (Int, Int, Int)
manual r# = go 0# 0# 0#
where
go i# j# k# | tagToEnum# (r# i# j# k#) = (I# i#, I# j#, I# k#)
| otherwise = go j# k# (i# +# 1#)
{-# NOINLINE manual #-}
auto :: (Int -> Int -> Int -> Bool) -> (Int, Int, Int)
auto r = go 0 0 0
where
go !i !j !k | r i j k = (i, j, k)
| otherwise = go j k (i+1)
{-# NOINLINE auto #-}
main = getArgs >>= \case
"manual" : _ -> print $ manual rel# -- This case is significantly
slower.
"auto" : _ -> print $ auto rel -- Why?
}}}
and I get these numbers:
{{{
$ ./T8313 manual +RTS -t
(33333333,33333334,33333334)
<

#8313: Poor performance of higher-order functions with unboxing
-------------------------------------+-------------------------------------
Reporter: dolio | Owner:
Type: task | Status: new
Priority: low | Milestone: ⊥
Component: Compiler | Version: 7.6.3
Resolution: | Keywords: slow unboxed
Operating System: | higher order
Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: Runtime | Difficulty: Easy (less than 1
performance bug | hour)
Test Case: | Blocked By: 6084
Blocking: | Related Tickets:
Differential Revisions: |
-------------------------------------+-------------------------------------
Comment (by thomie):
After compiling the program from comment:4 with `-O2`, `manual` is now 4x
faster.
{{{
$ ghc-7.8.3 -O2 T8313.hs
$ ./T8313 manual +RTS -t
(33333333,33333334,33333334)
<
participants (1)
-
GHC