
Perf builds of GHC also use -O2 for ghc-stage2, so check out what happens to GHC itself with late demand analysis.

Edward

Excerpts from Nicolas Frisby's message of Fri Aug 30 10:28:24 -0700 2013:
TO: Performance czars and devs
I pushed a patch yesterday enabling a second demand analysis at the end of the core2core simplification pipeline. The flag is -flate-dmd-anal, and it is off by default.
My question:
What's the protocol for deciding if -O2 should imply it?
See http://ghc.haskell.org/trac/ghc/wiki/LateDmd for context.
In particular, this section includes highlights of some nofib runs I did.
http://ghc.haskell.org/trac/ghc/wiki/LateDmd#Newperformancenumbers
For some tests, it decreases allocation by 10% to 20%. But on the platforms I have tried, it causes a couple of repeatable slowdowns of up to 10%. I've investigated a bit but haven't found any clear explanation. I'm worried that they are due to, e.g., caching effects.
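For anyone comparing their own runs against these figures, the percentages are relative changes against the baseline build without -flate-dmd-anal. A minimal sketch of that arithmetic follows; the function name and the sample numbers are hypothetical and are not taken from the nofib results.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent.

    Negative means the candidate is lower (e.g. less allocation);
    positive means it is higher (e.g. a slowdown in run time).
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


# Hypothetical figures for one benchmark, for illustration only.
alloc_change = percent_change(1_000_000, 850_000)  # bytes allocated
time_change = percent_change(2.00, 2.15)           # seconds elapsed

print(f"allocation: {alloc_change:+.1f}%")
print(f"run time:   {time_change:+.1f}%")
```

With this sign convention, a drop in allocation shows up as a negative number and a slowdown as a positive one.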
Any suggestions on how I should proceed with my investigation?
Also: I'd appreciate it if any developers would generously run some benchmarks on whatever platforms they have and add the results to the same section of the wiki page.
http://ghc.haskell.org/trac/ghc/wiki/LateDmd#Newperformancenumbers
NB: Unfortunately, it is key to build the libraries twice: once with -flate-dmd-anal in GhcLibHcOpts and once without. I have not determined how to do this robustly without a distclean; please let me know if you have a better method.
So I've used
# one of the following
#GhcLibHcOpts = -O2 # both with and without -flate-dmd-anal
GhcLibHcOpts = -O2 -flate-dmd-anal
SplitObjs = NO
DYNAMIC_BY_DEFAULT = NO
DYNAMIC_GHC_PROGRAMS = NO
The last three aren't necessary, but please record what you use, if you are so generous as to run it :).
Thanks.