[GHC] #14211: Compiler is unable to INLINE as well as the programmer can manually

#14211: Compiler is unable to INLINE as well as the programmer can manually -------------------------------------+------------------------------------- Reporter: harendra | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- The test case is in this repo on the `inlining-issue` branch: https://github.com/harendra-kumar/ghc-perf/tree/inlining-issue. Performance with manually inlining a function is more than 10% faster compared to factoring out code and using INLINE pragma. `stack bench` for compiler inlined code {{{ time 46.71 ms (45.53 ms .. 47.79 ms) }}} `stack bench --flag ghc-perf:manual` for manually inlined code {{{ time 39.46 ms (38.92 ms .. 39.94 ms) }}} Here is the relevant code: {{{#!hs {-# INLINE bindWith #-} bindWith :: (forall c. AsyncT m c -> AsyncT m c -> AsyncT m c) -> AsyncT m a -> (a -> AsyncT m b) -> AsyncT m b bindWith k (AsyncT m) f = AsyncT $ \_ stp yld -> let run x = (runAsyncT x) Nothing stp yld yield a _ Nothing = run $ f a yield a _ (Just r) = run $ f a `k` (bindWith k r f) in m Nothing stp yield instance Monad m => Monad (AsyncT m) where return a = AsyncT $ \ctx _ yld -> yld a ctx Nothing #ifdef MANUAL_INLINE AsyncT m >>= f = AsyncT $ \_ stp yld -> let run x = (runAsyncT x) Nothing stp yld yield a _ Nothing = run $ f a yield a _ (Just r) = run $ f a <> (r >>= f) in m Nothing stp yield #else (>>=) = bindWith (<>) #endif }}} I have seen this many times. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14211 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14211: Compiler is unable to INLINE as well as the programmer can manually -------------------------------------+------------------------------------- Reporter: harendra | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: | StaticArgumentTransformation, | Inlining Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by mpickering): * keywords: => StaticArgumentTransformation, Inlining Comment: `bindWith` is a self-recursive function so GHC won't inline it by itself. If you rewrite `bindWith` to {{{ bindWith :: (forall c. AsyncT m c -> AsyncT m c -> AsyncT m c) -> AsyncT m a -> (a -> AsyncT m b) -> AsyncT m b bindWith k m f = go m where go (AsyncT m) = AsyncT $ \_ stp yld -> let run x = (runAsyncT x) Nothing stp yld yield a _ Nothing = run $ f a yield a _ (Just r) = run $ f a `k` (go r) in m Nothing stp yield }}} so it delegates the recursion to the `go` function then it seems like the performance is the same. This is the transformation the static argument transformation is meant to achieve (turned on by `-fstatic-argument-transformation`). It didn't seem to work just by turning on this flag but I don't have time to investigate more now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14211#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14211: Compiler is unable to INLINE as well as the programmer can manually -------------------------------------+------------------------------------- Reporter: harendra | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: | StaticArgumentTransformation, | Inlining Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by harendra): That information is a gem for optimization! I need to change how I code recursive functions. Every programmer must know that a static argument impacts performance significantly and can be easily optimized. It not only recovered the lost performance but gave a lot more because the manually inlined bind function also has a static argument which got removed in this version. I think the problem is not really inlining. GHC won't be able to inline the recursive call of a function to itself, of course, but it can still inline the whole function. So `bindWith` is likely getting inlined, I can see that with `-ddump-inlinings`. I guess, the problem is that we have added one more static argument which degrades the performance. The compiler is able to inline the function but it cannot undo that static argument. I see that the INLINE section in the GHC manual mentions mutually recursive and recursive functions, however it does not mention `-fstatic- argument-transformation`. It may be useful to mention because the programmer naively expects that inlining will give us equivalent code but in reality the added static argument is an overhead that cannot be reverted by the compiler. I have a few questions, suggestions: 1) The INLINE section of the manual should mention `-fstatic-argument- transformation` and how factoring out of code can add a static argument that cannot be undone by the compiler unless we use this option. Also, an example of how the programmer can manually perform static argument transformation and achieve what (s)he wants in case this option does not work. 2) Can we have a warning option which warns the programmer about missed inlining? If I have placed an INLINE pragma and the compiler is unable to inline the function for whatever reason I want to know. The `-ddump- inlinings` shows what got inlined but what did not get inlined will be more useful information. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14211#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14211: Compiler is unable to INLINE as well as the programmer can manually -------------------------------------+------------------------------------- Reporter: harendra | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: | StaticArgumentTransformation, | Inlining Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): There is a warning in core lint which warns about `INLINE` pragmas on loops breakers but I don't know why it is not on by default. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14211#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14211: Compiler is unable to INLINE as well as the programmer can manually -------------------------------------+------------------------------------- Reporter: harendra | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: | StaticArgumentTransformation, | Inlining Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): I looked at this issue all of this afternoon and don't feel any closer to understanding what is going on but have some diagnosis. Manually performing SAT makes the program much faster. The SAT pass itself does nothing to the definition when there is an `INLINE` pragma on the definition. Removing the `INLINE` pragma causes SAT is happen but makes the program much slower as it is not inlined. Adding `-fexpose-all- unfoldings` to the defining module again makes the program much faster. I also noticed some interaction with SAT and inline pragmas, obviously the unsatted definition is included as the unfolding when an INLINE pragma is present even if it is a loop-breaker. SAT has the effect of changing a definition from a loop-breaker into an inlinable function but because we only export one unfolding this is then not usable across modules. All very unsatisfactory. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14211#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14211: Compiler is unable to INLINE as well as the programmer can manually -------------------------------------+------------------------------------- Reporter: harendra | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: | StaticArgumentTransformation, | Inlining Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by harendra): {{{ -dcore-lint Turn on heavyweight intra-pass sanity-checking within GHC, at Core level. (It checks GHC’s sanity, not yours.) }}} Someone who is not working on GHC would not even look at such flags and rightly so. We need a flag for regular programmers helping them in the optimization process, something like `-Wmissed-specialisations` or does that flag cover inlining as well? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14211#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14211: Compiler is unable to INLINE as well as the programmer can manually -------------------------------------+------------------------------------- Reporter: harendra | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: | StaticArgumentTransformation, | Inlining Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by harendra): Thanks Matthew for looking into this. It looks like as of now inline and SAT are exclusive to each other. What's the solution to this - we either need to inline the SATed definition or can we do SAT again after inlining? I observed that my package had several SAT opportunities that I fixed manually but `-fstatic-argument-transformation` did not make any difference, it could be because of this same reason perhaps. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14211#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14211: Compiler is unable to INLINE as well as the programmer can manually -------------------------------------+------------------------------------- Reporter: harendra | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: | StaticArgumentTransformation, | Inlining Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I'm a bit lost in all this. Is there a small reproducible test case I can look at? What exactly is being compared with what? I'm also lost with the SAT discussion. I think SAT is off by default; and it usually applies to recursive definitions, which in turn usually do not have ININE pragmas. Sorry to be dim. Maybe start from zero and explain one bit at a time? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14211#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14211: Compiler is unable to INLINE as well as the programmer can manually -------------------------------------+------------------------------------- Reporter: harendra | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: | StaticArgumentTransformation, | Inlining Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): The reproducible example is in this github repo https://github.com /harendra-kumar/ghc-perf/tree/inlining-issue in the `inlining-issue` branch. You can compile it with `cabal new-build all` and then run the benchmark. The interesting point is the definition of `bindWith` which is a self- recursive function with 2 static arguments. Harendra observes that by "manually inlining" it, ie replacing the arguments with the statically known values produces much faster code than GHC produces on its own. I then observe that 1. Removing the `INLINE` pragma and turning on `-fstatic-argument- transformation` has the same effect. 2. The `INLINE` (or `INLINABLE`) pragma inhibits `-fstatic-argument- transformation` by creating a mutually recursive group (see https://mail.haskell.org/pipermail/ghc-devs/2017-September/014672.html). 3. However, there is no way to make sure the optimised satted unfolding for `bindWith` is exposed so it can be inlined across modules (hence the comment about `-fexpose-all-unfoldings`). Harendra then has two questions about how this information should be communicated to users. 1. Whether the `INLINE` section should mention about SAT. 2. Whether we have a warning option which warns the programmer about missed inlining. This is only communicated by running `-dcore-lint` currently. Does that clear things up? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14211#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC