So I tried to compile Idris/Agda with profiled compilers, but this
didn't quite work out because some dependencies wouldn't compile
(apparently it's not possible to use Template Haskell with a profiled
compiler).
Out of curiosity I had a look at compiling haskell-src-exts, since that
takes quite a while. I used GHC HEAD and 7.8.4 (both built with
BuildFlavour=prof and bootstrapped with a standard GHC 7.8.4), and the
results are interesting: the current HEAD takes about 59% longer
(147.8 s vs. 93.1 s) and allocates about 67% more (172.4 GB vs.
103.1 GB) than 7.8.4. One of the main things that stands out is the
CallArity analysis (which IIRC was not there in 7.8.4): in HEAD it
accounts for 18.4% of the time and 25.6% of the allocation, and with
-fno-call-arity the totals drop to 113.7 s and 121.9 GB. So unless I
messed something up with measuring, the analysis seems to be pretty
expensive.
Anyway, the results are below.
Cheers,
Michal
** HEAD
Sun Apr 12 15:52 2015 Time and Allocation Profiling Report (Final)
ghc +RTS -p -RTS [...]
total time = 147.84 secs (147841 ticks @ 1000 us, 1 processor)
total alloc = 172,378,600,408 bytes (excludes profiling overheads)
COST CENTRE       MODULE         %time %alloc
SimplTopBinds     SimplCore       32.4   28.8
CallArity         SimplCore       18.4   25.6
lintAnnots        CoreLint         4.5    4.6
CoreTidy          HscMain          4.5    5.1
pprNativeCode     AsmCodeGen       3.2    3.4
OccAnal           SimplCore        3.2    3.1
occAnalBind.assoc OccurAnal        2.6    2.5
StgCmm            HscMain          2.3    1.9
Simplify          SimplCore        2.1    0.2
RegAlloc          AsmCodeGen       2.1    2.4
FloatOutwards     SimplCore        2.0    1.6
regLiveness       AsmCodeGen       1.9    1.9
tc_rn_src_decls   TcRnDriver       1.8    1.3
sink              CmmPipeline      1.7    1.5
NewStranal        SimplCore        1.3    1.5
genMachCode       AsmCodeGen       1.1    1.0
layoutStack       CmmPipeline      1.0    1.0
** HEAD with -fno-call-arity
Sun Apr 12 18:16 2015 Time and Allocation Profiling Report (Final)
ghc +RTS -p -RTS [...] -fno-call-arity
total time = 113.71 secs (113714 ticks @ 1000 us, 1 processor)
total alloc = 121,884,896,720 bytes (excludes profiling overheads)
COST CENTRE       MODULE         %time %alloc
SimplTopBinds     SimplCore       37.2   36.6
CoreTidy          HscMain          6.0    7.3
lintAnnots        CoreLint         5.8    6.5
pprNativeCode     AsmCodeGen       4.1    4.8
OccAnal           SimplCore        3.6    3.8
occAnalBind.assoc OccurAnal        2.9    3.2
StgCmm            HscMain          2.9    2.6
RegAlloc          AsmCodeGen       2.6    3.4
FloatOutwards     SimplCore        2.6    2.3
regLiveness       AsmCodeGen       2.5    2.8
tc_rn_src_decls   TcRnDriver       2.4    1.9
Simplify          SimplCore        2.4    0.3
sink              CmmPipeline      2.1    2.2
NewStranal        SimplCore        1.7    2.1
genMachCode       AsmCodeGen       1.4    1.4
layoutStack       CmmPipeline      1.4    1.4
NativeCodeGen     CodeOutput       1.1    1.2
FloatInwards      SimplCore        1.1    1.4
do_block          Hoopl.Dataflow   1.0    0.6
Digraph.scc       Digraph          0.8    1.3
** 7.8.4
Sun Apr 12 15:41 2015 Time and Allocation Profiling Report (Final)
ghc +RTS -p -RTS [...]
total time = 93.11 secs (93112 ticks @ 1000 us, 1 processor)
total alloc = 103,135,975,120 bytes (excludes profiling overheads)
COST CENTRE                 MODULE         %time %alloc
SimplTopBinds               SimplCore       38.5   37.4
pprNativeCode               AsmCodeGen       6.2    7.2
StgCmm                      HscMain          3.9    4.2
RegAlloc                    AsmCodeGen       3.7    5.1
occAnalBind.assoc           OccurAnal        3.3    3.6
OccAnal                     SimplCore        3.3    3.6
regLiveness                 AsmCodeGen       3.1    3.4
FloatOutwards               SimplCore        2.9    2.4
sink                        CmmPipeline      2.8    2.8
Simplify                    SimplCore        2.6    0.3
tc_rn_src_decls             TcRnDriver       2.4    2.1
genMachCode                 AsmCodeGen       1.9    2.0
NewStranal                  SimplCore        1.8    2.1
layoutStack                 CmmPipeline      1.8    1.8
Core2Core                   HscMain          1.3    1.2
deSugar                     HscMain          1.1    1.1
do_block                    Hoopl.Dataflow   1.1    0.7
CoreTidy                    HscMain          1.0    1.1
CorePrep                    HscMain          1.0    1.1
Digraph.scc                 Digraph          0.9    1.5
versioninfo                 MkIface          0.9    1.0
zonkEvBndr_zonkTcTypeToType TcHsSyn          0.6    1.4
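For a quick cross-check, the totals from the three reports above can be compared with a few lines of Python. This is only a back-of-the-envelope sketch, not part of the original measurements: the figures are copied by hand from the reports, "GB" means 10^9 bytes, the CallArity cost is just its %time/%alloc applied to the HEAD totals, and the names (`REPORTS`, `ratio`, `call_arity_cost`, `flag_savings`) are made up for illustration.

```python
"""Back-of-the-envelope comparison of the three GHC profiling reports."""

# Totals copied by hand from the reports: (total time in s, total alloc in bytes).
REPORTS = {
    "HEAD": (147.84, 172_378_600_408),
    "HEAD -fno-call-arity": (113.71, 121_884_896_720),
    "7.8.4": (93.11, 103_135_975_120),
}

# CallArity cost centre in the HEAD report: (%time, %alloc).
CALL_ARITY_PCT = (18.4, 25.6)


def ratio(a: str, b: str) -> tuple[float, float]:
    """Time and allocation of report `a` relative to report `b`."""
    (ta, aa), (tb, ab) = REPORTS[a], REPORTS[b]
    return ta / tb, aa / ab


def call_arity_cost() -> tuple[float, float]:
    """Seconds and bytes charged to the CallArity cost centre in HEAD."""
    total_t, total_a = REPORTS["HEAD"]
    pct_t, pct_a = CALL_ARITY_PCT
    return total_t * pct_t / 100, total_a * pct_a / 100


def flag_savings() -> tuple[float, int]:
    """Seconds and bytes saved in HEAD by passing -fno-call-arity."""
    (t_on, a_on), (t_off, a_off) = REPORTS["HEAD"], REPORTS["HEAD -fno-call-arity"]
    return t_on - t_off, a_on - a_off


if __name__ == "__main__":
    GB = 1e9  # decimal gigabytes
    for name in ("HEAD", "HEAD -fno-call-arity"):
        t, a = ratio(name, "7.8.4")
        print(f"{name} vs 7.8.4: {t:.2f}x time, {a:.2f}x alloc")
    t, a = call_arity_cost()
    print(f"CallArity cost centre: {t:.1f} s, {a / GB:.1f} GB")
    t, a = flag_savings()
    print(f"-fno-call-arity saves: {t:.1f} s, {a / GB:.1f} GB")
```

Run as a script, this gives roughly 1.59x time and 1.67x allocation for HEAD vs. 7.8.4, and 1.22x / 1.18x for HEAD with -fno-call-arity. Turning the flag off saves 34.1 s and 50.5 GB, a bit more than the 27.2 s and 44.1 GB charged to the CallArity cost centre itself, so that cost centre alone doesn't account for the whole difference.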