
#14208: Performance with O0 is much better than the default or with -O2, runghc performs the best
-------------------------------------+-------------------------------------
        Reporter:  harendra          |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by MikolajKonarski):

Replying to [comment:17 harendra]:
> The combination of `-fexpose-all-unfoldings` and `-fspecialise-aggressively` is not "exactly" equivalent to putting everything in the same module. O1 with everything in the same module finishes in 8ms, while the combination of these two finishes in 4ms. So they do something more. I guess the added effect is that they make everything INLINEABLE.
Yep, forgot that bit. That's exactly what I use the two options for: to be able to split things among modules and to avoid INLINEABLE for every polymorphic function. With this, I only ever need an occasional INLINE in random places, but then it's not for specialization, but real inlining.
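
To make the trade-off concrete, here is a minimal sketch of the two styles. It is not the code from this ticket: `sumVia` is a hypothetical polymorphic helper invented for illustration. The per-function pragma is what the two flags let me skip. Being a single file, it only shows the syntax; the actual effect appears when the definition and its call sites live in different modules.

```haskell
-- Illustration only: with these two flags set, the INLINABLE pragma below
-- should become unnecessary. Compile with optimisation, e.g. ghc -O1.
{-# OPTIONS_GHC -fexpose-all-unfoldings -fspecialise-aggressively #-}
module Main (main, sumVia) where

import Data.Monoid (Sum (..))

-- Hypothetical overloaded helper (not from the ticket). Without the flags
-- above, the pragma is what exposes its unfolding so that GHC can
-- specialise it at call sites in other modules.
sumVia :: Monoid m => (a -> m) -> [a] -> m
sumVia f = foldr (\x acc -> f x `mappend` acc) mempty
{-# INLINABLE sumVia #-}

main :: IO ()
main = print (getSum (sumVia Sum [1 .. 10 :: Int]))  -- prints 55
```

Whether specialisation actually fires is best checked in the Core output (`-ddump-simpl`), not assumed from the pragmas or flags.
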
> When everything is in the same module and `toList` is marked NOINLINE, it takes 14ms (i.e. the worst case), irrespective of the monoid functions being marked INLINE or not.
And what if they are marked NOINLINE? In any case, that means we now have an example of failed fusion that fits in one module. Additionally, we know that GHC can effectively generate such an example from an innocent-looking set of modules by automatically inlining too much (or not enough).

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14208#comment:19
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler