Re: [GHC] #14208: Performance with O0 is much better than the default or with -O2, runghc performs the best

13 Sep 2017

      #14208: Performance with O0 is much better than the default or with -O2, runghc
performs the best
-------------------------------------+-------------------------------------
        Reporter:  harendra          |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by MikolajKonarski):

 Replying to [comment:17 harendra]:
 > ...
 > The combination of `-fexpose-all-unfoldings` and
 > `-fspecialise-aggressively` is not "exactly" equivalent to putting
 > everything in the same module. O1 with everything in the same module
 > finishes in 8ms while with the combination of these two finishes in 4ms.
 > So they do something more. I guess the added effect is that they make
 > everything INLINEABLE.

 Yep, I forgot that bit. That's exactly what I use the two options for: to
 be able to split things among modules and to avoid writing INLINEABLE on
 every polymorphic function. With them, I only need an occasional INLINE
 here and there, and then it is for actual inlining, not for
 specialization.
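
 To illustrate the trade-off described above, here is a minimal sketch, not
 code from the ticket. The function names (`combineAll`, `combineAllPragma`,
 `totalOf`) are hypothetical. The first function relies on the two flags
 being set for the module; the second uses the per-function pragma that the
 flags are meant to make unnecessary.

```haskell
{-# OPTIONS_GHC -O1 -fexpose-all-unfoldings -fspecialise-aggressively #-}
-- Hypothetical example; none of these names come from the ticket.
import Data.Monoid (Sum (..))

-- Overloaded in the Monoid. With the two flags above, GHC keeps the
-- unfolding in the interface file and may specialise the function at
-- call sites in other modules, without any pragma here.
combineAll :: Monoid m => [m] -> m
combineAll = foldr mappend mempty

-- The per-function alternative: an explicit INLINABLE pragma exposes the
-- unfolding so that importing modules can specialise it.
combineAllPragma :: Monoid m => [m] -> m
combineAllPragma = foldr mappend mempty
{-# INLINABLE combineAllPragma #-}

-- A monomorphic use site, where specialisation to `Sum Int` can happen.
totalOf :: [Int] -> Int
totalOf = getSum . combineAll . map Sum
```

 Whether specialisation actually fires depends on the GHC version and the
 surrounding code, so the generated Core is worth checking with
 `-ddump-simpl`.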

 > ...
 > When everything is in the same module and `toList` marked NOINLINE then
 > it takes 14ms (i.e. the worst case) irrespective of the monoid functions
 > being marked INLINE or not.

 And what if they are marked NOINLINE? In any case, that means we now have
 an example of failed fusion that fits in one module. Additionally, we
 know that GHC can effectively generate such an example from an
 innocent-looking set of modules, by automatically inlining too much (or
 not enough).
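
 A single-module case of this kind can be sketched as follows. This is an
 assumed, simplified stand-in rather than the code from the ticket:
 `toListN` is a hypothetical producer playing the role of `toList`, and
 `sumTo` is a hypothetical consumer.

```haskell
-- Hypothetical example; not the code attached to the ticket.
import Data.List (foldl')

-- Producer standing in for the ticket's `toList`. NOINLINE hides its
-- definition from call sites, so the list it builds cannot fuse with
-- whatever consumes it.
toListN :: Int -> [Int]
toListN n = [1 .. n]
{-# NOINLINE toListN #-}

-- Consumer. Because the producer is opaque here, the intermediate list
-- is expected to be allocated and then traversed.
sumTo :: Int -> Int
sumTo n = foldl' (+) 0 (toListN n)
```

 Changing the pragma to INLINE, or removing it, gives GHC the chance to
 fuse producer and consumer at `-O1`, although that is not guaranteed.
 Comparing the `-ddump-simpl` output for both variants shows whether the
 intermediate list survives.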

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14208#comment:19
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler