[GHC] #12781: Significantly higher allocation with INLINE vs NOINLINE

#12781: Significantly higher allocation with INLINE vs NOINLINE -------------------------------------+------------------------------------- Reporter: | Owner: MikolajKonarski | Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Linux Architecture: x86_64 | Type of failure: Runtime (amd64) | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- The following program allocates much more memory ("bytes allocated in the heap") in its current form than with INLINE replaced by NOINLINE, as in the comment. That may be normal, but virtually every feature of this example is needed to trigger the behaviour and so I have no clear idea which language feature is responsible and should be avoided and, consequently, how to optimize my much more complex codebase (in which INLINES of once used functions tend to help and sometimes significantly so). As a comment to this ticket I will post another example, where the difference in allocation is pumped to a factor of 2000, as a proof that the issue is serious. {{{ {-# LANGUAGE RankNTypes #-} import Control.Monad.ST.Strict import qualified Data.IntMap.Strict as IM -- ghc --make -O1 InlineBloat.hs; ./InlineBloat +RTS -s data P = P Int instance Enum P where fromEnum (P x) = x toEnum n = undefined main = do let {-# NOINLINE z #-} z = 44 dis :: Int -> () {-# INLINE dis #-} -- change here to NOINLINE and observe lower alloc dis pI = let p0 = let (_, x) = pI `quotRem` z in P x p1 = let (_, x) = pI `quotRem` z in P x m = IM.lookup (fromEnum p0) IM.empty b = IM.member (fromEnum p1) IM.empty in m == Just 'c' `seq` b `seq` () {-# NOINLINE l #-} l = [0..10000000] mapVT :: forall s. () -> ST s () {-# INLINE mapVT #-} mapVT _ = mapM_ (\x -> return $! dis x) l return $! runST (mapVT ()) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12781 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12781: Significantly higher allocation with INLINE vs NOINLINE -------------------------------------+------------------------------------- Reporter: MikolajKonarski | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): Here is the example where the code with INLINE allocates 2000 times more heap memory that with NOINLINE: {{{ {-# LANGUAGE BangPatterns, RankNTypes #-} import Control.Monad.ST.Strict import qualified Data.IntMap.Strict as IM import Data.List import Data.Maybe -- ghc --make -O1 InlineBloat.hs; ./InlineBloat +RTS -s data P = P !Int instance Enum P where fromEnum (P x) = x toEnum n = undefined main = do let {-# NOINLINE z #-} z = 44 dis :: Int -> () {-# INLINE dis #-} -- change here to NOINLINE -- -- with INLINE: -- 384,409,080 bytes allocated in the heap -- with NOINLINE: -- 169,080 bytes allocated in the heap dis pI = let p0 = let (_, x) = pI `quotRem` z in P x p1 = let (y, _) = pI `quotRem` z in P y !_ = isJust $ IM.lookup (fromEnum p0) IM.empty !_ = isJust $ IM.lookup (fromEnum p1) IM.empty in () {-# NOINLINE l #-} l = [0..1600] mapVT :: forall s. Int -> ST s () {-# INLINE mapVT #-} mapVT _ = mapM_ (\x -> return $! dis x) l !runRes = foldl' (\() n -> runST (mapVT n)) () [1..10000] return () }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12781#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12781: Significantly higher allocation with INLINE vs NOINLINE -------------------------------------+------------------------------------- Reporter: MikolajKonarski | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * related: => #12747 #12781 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12781#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12781: Significantly higher allocation with INLINE vs NOINLINE -------------------------------------+------------------------------------- Reporter: MikolajKonarski | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * related: #12747 #12781 => #12747 #12603 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12781#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12781: Significantly higher allocation with INLINE vs NOINLINE -------------------------------------+------------------------------------- Reporter: MikolajKonarski | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12603 | Differential Rev(s): #5775 | Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * related: #12747 #12603 => #12747 #12603 #5775 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12781#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12781: Significantly higher allocation with INLINE vs NOINLINE -------------------------------------+------------------------------------- Reporter: MikolajKonarski | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12603 | Differential Rev(s): #5775 | Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * cc: maurerl@…, pdownen@…, ariola@… (added) Comment: Interesting. I know exactly what is going on. With `NOINLINE dis` we get {{{ dis = \p. let $w$j ww = blah in case p of -1# -> $w$j (...) DEFAULT -> $w$j (..) go xs z = case xs of [] -> ... (y:ys) -> case dis y of () -> go ys z }}} Note that `$w$j` is a join point, so incurs no allocation cost. There is just one call to `dis` in the body of `go`. So without the `NOINLINE dis`, the function `dis` inlines in the body of `go`. But then we get something like {{{ go xs z = case xs of [] -> ... (y:ys) -> let $w$j ww = blah in case p of -1# -> case ($w$j (...)) of () -> go ys z DEFAULT -> case ($w$j (...)) of () -> go yz z }}} Yikes! `$w$j` is no longer a join point, so allocation goes up. Six months ago I would have felt sad. But today I feel happy. In our paper [https://www.microsoft.com/en-us/research/publication/compiling- without-continuations/ Compiling without continuations], we explain how to ensure that join points are never destroyed. Moreover, Luke Maurer is well advanced on a full implementation in GHC. Luke, how is it going? When will it land in GHC? We need this! This ticket would make a lovely example to add to the paper, because it's a true "from the wild" example. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12781#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12781: Significantly higher allocation with INLINE vs NOINLINE -------------------------------------+------------------------------------- Reporter: MikolajKonarski | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12603 | Differential Rev(s): #5775 | Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => JoinPoints -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12781#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12781: Significantly higher allocation with INLINE vs NOINLINE -------------------------------------+------------------------------------- Reporter: MikolajKonarski | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12603 | Differential Rev(s): #5775 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by lukemaurer): Confirmed—with join points, `INLINE` vs. `NOINLINE` makes no difference. And indeed the key is that we move the case into `$w$j`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12781#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12781: Significantly higher allocation with INLINE vs NOINLINE -------------------------------------+------------------------------------- Reporter: MikolajKonarski | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12603 | Differential Rev(s): #5775 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): I've tried both examples with 8.2.1-rc1 and indeed INLINE vs. NOINLINE makes no difference. My original big program is 10% faster than with 8.0.1 (I guess most of that is due to GHC, but some may also be due to newer containers, etc.). Congratulations! Closing. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12781#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12781: Significantly higher allocation with INLINE vs NOINLINE -------------------------------------+------------------------------------- Reporter: MikolajKonarski | Owner: (none) Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: fixed | Keywords: JoinPoints Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12603 | Differential Rev(s): #5775 | Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * status: new => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12781#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC