Hi Nicolas,

In my opinion we should look at nofib (slow) and make sure that

 1) it's at least neutral on average (runtimes and preferably allocations too),
 2) there are some benchmarks that improve significantly (that's why we're making the change after all), and
 3) we can attribute the losses to something other than significantly worse Core (or at least more programs get better than get worse).

If these 3 hold and the compile times aren't up too much, I think it's a candidate for being on by default in -O2.
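
To make the first two checks (and the fallback in the third) concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the function name, the 5% "significant" threshold, the 1% tolerance for "neutral", and the use of a geometric mean as the average. The benchmark names and numbers are made up, not taken from any nofib run. Attributing losses to something other than worse Core still means reading the Core by hand.

```python
import math

def summarize(results, significant=0.05, neutral_tolerance=0.01):
    """Summarize a benchmark comparison.

    results maps a benchmark name to (baseline, with_flag), for example
    runtimes in seconds or bytes allocated. Both thresholds are
    hypothetical defaults, not values from the original discussion.
    """
    # Ratio below 1 means the run with the flag is better.
    ratios = {name: new / old for name, (old, new) in results.items()}

    # Assumption: "on average" is taken as the geometric mean of ratios.
    geo_mean = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))

    big_wins = sorted(n for n, r in ratios.items() if r <= 1 - significant)
    better = sum(1 for r in ratios.values() if r < 1)
    worse = sum(1 for r in ratios.values() if r > 1)

    return {
        "geo_mean": geo_mean,
        "neutral_on_average": geo_mean <= 1 + neutral_tolerance,  # check 1
        "big_wins": big_wins,                                     # check 2
        "more_better_than_worse": better > worse,                 # fallback in 3
    }

# Made-up numbers, purely for illustration.
example = {
    "prog_a": (1.00, 0.80),
    "prog_b": (2.00, 2.00),
    "prog_c": (1.00, 1.10),
}
summary = summarize(example)
```

The same function can be run once on runtimes and once on allocations.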

In my mind the key is to understand why the programs that got worse got worse. For example, when I enabled -funbox-small-strict-fields by default there were some losers, but the reasons they lost were more accidental than due to -funbox-small-strict-fields, so I was happy to turn it on by default anyway.

-- Johan



On Fri, Aug 30, 2013 at 12:28 PM, Nicolas Frisby <nicolas.frisby@gmail.com> wrote:
TO: Performance czars and devs

I pushed a patch yesterday enabling a second demand analysis at the end of the core2core simplification pipeline. The flag is -flate-dmd-anal, and it is off by default.

My question:

    What's the protocol for deciding if -O2 should imply it?

See http://ghc.haskell.org/trac/ghc/wiki/LateDmd for context.

In particular, this section includes highlights of some nofib runs I did.

  http://ghc.haskell.org/trac/ghc/wiki/LateDmd#Newperformancenumbers

For some tests, it decreases allocation by 10% to 20%. But on the platforms I have tried, it causes a couple of repeatable slowdowns, up to 10%. I've investigated a bit, but haven't found any clear explanations. I'm worried that the cause is something like caching effects.

Any suggestions on how I should proceed with my investigation?

Also: I'd appreciate it if any developers would generously run some benchmarks on various platforms they might have and add the results to the same section in the wiki page.

  http://ghc.haskell.org/trac/ghc/wiki/LateDmd#Newperformancenumbers

NB: Unfortunately, it is key to build the libraries twice: once with -flate-dmd-anal in GhcLibHcOpts and once without. I have not determined how to do this robustly without a distclean; please let me know if you have a better method.

So I've used

# one of the following
#GhcLibHcOpts    = -O2  # both with and without -flate-dmd-anal
GhcLibHcOpts    = -O2 -flate-dmd-anal
SplitObjs          = NO
DYNAMIC_BY_DEFAULT   = NO
DYNAMIC_GHC_PROGRAMS = NO

The last three aren't necessary, but please record what you use, if you are so generous as to run it :).

Thanks.