[GHC] #8662: GHC does not inline cheap inner loop when used in two places

#8662: GHC does not inline cheap inner loop when used in two places ------------------------------+-------------------------------------------- Reporter: nh2 | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime performance bug Unknown/Multiple | Test Case: Difficulty: Unknown | Blocking: Blocked By: | Related Tickets: | ------------------------------+-------------------------------------------- When I made a Criterion benchmark of Neil Michell's "Tight Inner Loop" blog post (http://neilmitchell.blogspot.co.uk/2014/01/optimising-haskell- for-tight-inner-loop.html), I noticed that GHC 7.6.3 will not inline the performance-critical function when it's called from two different places (inside the same module). {{{ break0_2pleaseinline :: (Char -> Bool) -> ByteString0 -> (ByteString, ByteString0) break0_2pleaseinline f (BS0 bs) = (BS.unsafeTake i bs, BS0 $ BS.unsafeDrop i bs) where i = Internal.inlinePerformIO $ BS.unsafeUseAsCString bs $ \ptr -> do let start = castPtr ptr :: Ptr Word8 let end = go start return $! Ptr end `minusPtr` start go s@(Ptr a) | c == '\0' || f c = a | otherwise = go $ inc s where c = chr s versionInl1 :: ByteString0 -> (ByteString, ByteString0) versionInl1 str = break0_2pleaseinline test str where test x = x <= '$' && (x == ' ' || x == '\r' || x == '\n' || x == '$') versionInl2 :: ByteString0 -> (ByteString, ByteString0) versionInl2 str = break0_2pleaseinline test str where test x = x <= '$' && (x == ' ' || x == '\r' || x == '\n' || x == '$') }}} Full code here: https://github.com/nh2/inner-loop- benchmarks/blob/6715d1e9946d6b5e6d9bb53203982ed3d2ed32ff/Bench.hs#L166. {{{break0_2pleaseinline}}} does not get inlined, which makes {{{versionInl1}}} and {{{versionInl2}}} over 4 times slower than when inlined. The inlining doens't happen, not even with {{{-O2}}} and {{{-O3}}}, only an {{{INLINE}}} pragma will move GHC to do it. If I was a compiler, I would so inline that function! I am surprised that GHC doesn't decide to do so. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8662 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8662: GHC does not inline cheap inner loop when used in two places --------------------------------------------+------------------------------ Reporter: nh2 | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by schyler): Maybe you just need to tune the numbers higher. Will `-funfolding-use-threshold=16` do it? Maybe this is relevant to #1371 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8662#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8662: GHC does not inline cheap inner loop when used in two places --------------------------------------------+------------------------------ Reporter: nh2 | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by nh2): Why should it not do this automatically? It is a pretty cheap function (from my human-not-compiler perspective). Why do you need to set such a big threshold to get it inlined? Is there a page that describes how these thresholds work? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8662#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8662: GHC does not inline cheap inner loop when used in two places --------------------------------------------+------------------------------ Reporter: nh2 | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Comment (by simonpj): GHC decides when to inline like this: * It computes the "size" of the function * At a call site, it takes the size of the function, subtracts a call- site "discount", and if the result is less than the "unfolding threshold" it does the inlining. * The discount is increased if both (a) an actual argument has some structure (eg is a constructor application) and (b) that argument is scrutinised by a case expression in the function body. * There is also a "result discount" if (a) the function call is consumed by a `case`, and (b) the function body returns a constructor application All this is computed in module `CoreUnfold`, so it is nicely separated from the rest of the compiler. Do by all means have a go at changing the computation of size or discount. Then measure (a) the change in code size and complication time vs (b) the change in allocation and/or run-time. If you can reduce (b) without increasing (a) you are winning! But it's easy to mess up. I have spent many happy hours fiddling with these heuristics. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8662#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8662: GHC does not inline cheap inner loop when used in two places -------------------------------------+------------------------------------- Reporter: nh2 | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Inlining Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by mpickering): * keywords: => Inlining -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8662#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC