Re: [GHC] #14208: Performance with O0 is much better than the default or with -O2, runghc performs the best

28 Mar 2018

      #14208: Performance with O0 is much better than the default or with -O2, runghc
performs the best
-------------------------------------+-------------------------------------
        Reporter:  harendra          |                Owner:  osa1
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):
...
 > If I change the optimization flags to -O0 for benchmark stanza in cabal
 > file I can get close to ghci performance.

 That contradicts what Omer found in comment:27.

 Nevertheless, if what you say is true, it'd be easier to debug with -O0
 than GHCi (which brings the bytecode generator into the picture).
...
 > GHCi is 6x faster than my regular compiled code

 This is totally bonkers and we MUST find out what is happening :-).

 I suggest not getting diverted into speculation about CPS.  We have a
 repro case; let's just dig into it and find out what is going on.

 My suggestions:

 * In comment:31, does the same thing happen with -O0 vs -O, or only with
 GHCi vs -O?

 * In all repros, do the huge differences also show up in the bytes-
 allocated numbers?  (If so, we don't need the Criterion apparatus.)

 * I notice that in comment:27, in the 2-module case, comparing -O0 and
 -O1:
   * Allocation is about halved in -O1
   * But runtime actually increases

   That is most peculiar.

 * Matthew says in comment:34 "I can reproduce this..".  That's great.  But
 what is "this" precisely?  Which version of GHC?  What timing data?  What
 happened to allocation and GC numbers?

 Somehow a 6x increase in execution time ought not to be hard to find!
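
 To make the bytes-allocated comparison concrete, here is a minimal sketch
 of reading allocation and mutator/GC time directly from the RTS, without
 Criterion. It assumes GHC 8.2 or later (for the GHC.Stats API used here).
 The `workload` function is a hypothetical stand-in, not code from this
 ticket; substitute the actual repro. Because the RTS may only refresh its
 allocation counter at a GC, the sketch forces a GC before each snapshot,
 which adds a little to the reported GC time.

```haskell
import Control.Exception (evaluate)
import Control.Monad (unless)
import Data.List (foldl')
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Exit (die)
import System.Mem (performGC)

-- Hypothetical stand-in workload; replace with the code under investigation.
workload :: Int -> Int
workload n = foldl' (+) 0 (map (* 2) [1 .. n])

-- Force a GC so the RTS counters are up to date, then read them.
snapshot :: IO RTSStats
snapshot = performGC >> getRTSStats

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  unless enabled $ die "RTS stats are disabled; run with +RTS -T"
  before <- snapshot
  _ <- evaluate (workload 1000000)  -- force the result to WHNF
  after <- snapshot
  let delta field = field after - field before
  putStrLn $ "bytes allocated: " ++ show (delta allocated_bytes)
  putStrLn $ "mutator CPU ns:  " ++ show (delta mutator_cpu_ns)
  putStrLn $ "GC CPU ns:       " ++ show (delta gc_cpu_ns)
```

 One way to use it: build the same file twice, once with `ghc -O0 -rtsopts`
 and once with `ghc -O -rtsopts`, run each binary with `+RTS -T`, and
 compare the three numbers. Running with `+RTS -s` instead prints the
 RTS's own whole-program summary, which may already be enough.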

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14208#comment:38
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler