TO: Performance czars and devs

I pushed a patch yesterday that enables a second demand analysis at the end of the Core-to-Core simplification pipeline. The flag is -flate-dmd-anal, and it is off by default.

My question:

    What's the protocol for deciding if -O2 should imply it?

See http://ghc.haskell.org/trac/ghc/wiki/LateDmd for context.

In particular, this section includes highlights of some nofib runs I did.

  http://ghc.haskell.org/trac/ghc/wiki/LateDmd#Newperformancenumbers

For some tests, it decreases allocation by 10% to 20%. But on the platforms I have tried, it also causes a couple of repeatable slowdowns of up to 10%. I've investigated a bit, but haven't found any clear explanation. I'm worried that the slowdowns are due to caching effects, for example.

Any suggestions on how I should proceed with my investigation?

Also: I'd appreciate it if any developers would run some benchmarks on whatever platforms they have available and add the results to the same section of the wiki page.

  http://ghc.haskell.org/trac/ghc/wiki/LateDmd#Newperformancenumbers

NB: Unfortunately, it is key to build the libraries twice: once with -flate-dmd-anal in GhcLibHcOpts and once without. I have not determined how to do this robustly without a distclean; please let me know if you have a better method.

So I've used

# Use exactly one of the following two lines per build (-O2 in both cases):
#GhcLibHcOpts        = -O2
GhcLibHcOpts         = -O2 -flate-dmd-anal
SplitObjs            = NO
DYNAMIC_BY_DEFAULT   = NO
DYNAMIC_GHC_PROGRAMS = NO

The last three settings aren't necessary, but if you are generous enough to run the benchmarks, please record what you used :).

Thanks.