
Alfredo Di Napoli
Hey Ben,
Hi Alfredo,

Sorry for the late response! The email queue from the weekend was a bit longer than I would like.
as promised, I’m back to you with something more articulated and hopefully meaningful. I do hear you perfectly: trying to dive head-first into this without at least a rough understanding of the performance hotspots or of GHC's overall architecture is probably going to do me more harm than good (I get the overall picture and I’m aware of the different stages of the GHC compilation pipeline, but that's far from saying I’m proficient with the architecture as a whole). A couple of years ago I also read the GHC chapter of the “Architecture of Open Source Applications” book, but I don’t know how much of that is still relevant. If it is, I guess I should refresh my memory.
It sounds like you have done a good amount of reading. That's great. Perhaps skimming the AOSA chapter again wouldn't hurt, but otherwise it's likely worthwhile diving in.
I’m currently trying to move on two fronts; please advise if I’m a fool flogging a dead horse or if I have any hope of getting anything done ;)
1. I’m indeed trying to treat the compiler as a black box (as you advised), building a sufficiently large program on which GHC is not “as fast as I would like” (I know that’s a very lame definition of “slow”, hehe). In particular, I have built the stage2 compiler with the “prof” flavour as you suggested, and I have chosen two examples as a reference “benchmark” for performance: DynFlags.hs (which seems to have been mentioned multiple times as a GHC perf killer) and the highlighting-kate package, as posted here: https://ghc.haskell.org/trac/ghc/ticket/9221 .
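For reference, selecting the prof flavour with the make-based build system looks roughly like this (a sketch; the exact lines in mk/build.mk.sample may differ between GHC versions):

```
# Rough sketch: build a profiled stage2 compiler with the make-based build
# system. Assumes a standard GHC checkout; adjust -j to taste.
cd ghc
cp mk/build.mk.sample mk/build.mk
# edit mk/build.mk and uncomment the "BuildFlavour = prof" line, then:
./boot
./configure
make -j8
```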
Indeed, #9221 would be a very interesting ticket to look at. The highlighting-kate package is interesting in the context of that ticket as it has a very large amount of parallelism available. If you do want to look at #9221, note that the cost centre profiler may not provide the whole story. In particular, it has been speculated that the scaling issues may be due to either:

* threads hitting a blackhole, resulting in blocking
* the usual scaling limitations of GHC's stop-the-world GC

The eventlog may be quite useful for characterising these.
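For instance, collecting an eventlog for a parallel compile might look roughly like this (a sketch: it assumes the stage2 compiler's RTS was built with eventlog support, and the paths are illustrative):

```
# Sketch: collect an eventlog from a parallel compile, assuming the stage2
# compiler's RTS supports the eventlog (+RTS -l).
inplace/bin/ghc-stage2 --make -j8 SomeTopModule.hs +RTS -l -RTS

# Inspect the resulting ghc-stage2.eventlog:
threadscope ghc-events.eventlog      # visual per-capability timeline
ghc-events show ghc-stage2.eventlog  # textual dump; look for threads blocked
                                     # on black holes and for long GC pauses
```

ThreadScope in particular makes idle capabilities and stop-the-world GC pauses easy to spot.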
The idea would be to compile those with -v +RTS -p -hc -RTS enabled, look at the output from the .prof file AND the `-v` flag, find any hotspot, try to change something, recompile, observe the diff, rinse and repeat. Do you think I have any hope of making progress this way? In particular, I think compiling DynFlags.hs is a bit of a dead end; I whipped up this buggy script, which escalated into a behemoth that compiles pretty much half of the compiler once again :D
```
#!/usr/bin/env bash

# Compile DynFlags.hs (and, as it turns out, half of GHC) with the profiled
# stage2 compiler, collecting time (-p) and heap (-h) profiles.
../ghc/inplace/bin/ghc-stage2 --make -j8 -v +RTS -A256M -qb0 -p -h \
  -RTS -DSTAGE=2 -I../ghc/includes -I../ghc/compiler -I../ghc/compiler/stage2 \
  -I../ghc/compiler/stage2/build \
  -i../ghc/compiler/utils:../ghc/compiler/types:../ghc/compiler/typecheck:../ghc/compiler/basicTypes \
  -i../ghc/compiler/main:../ghc/compiler/profiling:../ghc/compiler/coreSyn:../ghc/compiler/iface:../ghc/compiler/prelude \
  -i../ghc/compiler/stage2/build:../ghc/compiler/simplStg:../ghc/compiler/cmm:../ghc/compiler/parser:../ghc/compiler/hsSyn \
  -i../ghc/compiler/ghci:../ghc/compiler/deSugar:../ghc/compiler/simplCore:../ghc/compiler/specialise \
  -fforce-recomp -c "$@"
```
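As an aside, the .hp heap profile that the -h flag above produces can be rendered into a graph with hp2ps, which ships with GHC (assuming the profile lands in the working directory under the program's name):

```
# Render the heap profile emitted by +RTS -h into a PostScript graph;
# -c enables colour. The .hp file is named after the executable.
hp2ps -c ghc-stage2.hp
# produces ghc-stage2.ps, viewable with any PostScript/PDF viewer
```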
I’m running it with `./dynflags.sh ../ghc/compiler/main/DynFlags.hs`, but it’s taking a long time to compile (20+ minutes on my 2014 Mac Pro) because it’s pulling in half of the compiler anyway :D I tried to reuse the .hi files from my stage2 compilation, but I failed (GHC was complaining about an interface file mismatch). Long story short, I don’t think it will be a very agile way to proceed. Am I right? Do you have any recommendations here? Do I have any hope of compiling DynFlags.hs in a way that would make this perf investigation feasible?
What I usually do in this case is just take the relevant `ghc` command line directly from the `make` output and execute it manually. I would imagine your debug cycle would look something like:

* instrument the compiler
* build stage1
* build stage2's DynFlags with the stage1 compiler (using a saved command line)
* think
* repeat

This should only take a few minutes per iteration.
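A sketch of the "saved command line" idea, under the assumption that the full ghc invocations appear in the build output (the grep pattern and file names are illustrative):

```
# Capture the build output once, pull out the stage1 invocation that compiles
# DynFlags, and replay just that command on each iteration.
make -j8 2>&1 | tee build.log
grep 'compiler/main/DynFlags.hs' build.log > compile-dynflags.sh
# trim compile-dynflags.sh down to the single ghc command line, then:
time sh compile-dynflags.sh
```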
The second example (the highlighting-kate package) seems much more promising. It takes maybe 1-2 minutes on my machine, which is enough to take a look at the perf output. Do you think I should follow this second lead? In principle I think any 50+ module package would do (better if it uses a lot of TH ;) ), but this seems like a low-barrier-to-entry start.
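Concretely, the plan would be something along these lines (a sketch: the package's top-level module path is from memory, the compiler path is illustrative, and installing the package's dependencies with that compiler is hand-waved over):

```
# Sketch: unpack highlighting-kate and compile it in parallel with the
# profiled stage2 compiler, collecting GHC's own time and heap profiles.
cabal get highlighting-kate
cd highlighting-kate-*
~/ghc/inplace/bin/ghc-stage2 --make -j8 -v \
    Text/Highlighting/Kate.hs +RTS -p -hc -RTS
# the cost-centre profile (ghc-stage2.prof) and heap profile (ghc-stage2.hp)
# end up in the working directory
```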
2. The second path I’m exploring is simply to take a less holistic approach and try to dive into a performance ticket like the ones listed here: https://www.reddit.com/r/haskell/comments/45q90s/is_anything_being_done_to_r... Maybe some are very specific, but it seems like fixing small things and moving forward could help give me an understanding of the different sub-parts of GHC, which seems less intimidating than the black-box approach.
Do you have any specific tickets from these lists that you found interesting?
In conclusion, what do you think is the best approach, 1 or 2, both or none? ;)
I would say that it largely depends upon what you feel most comfortable with. If you feel up for it, I think #9221 would be a nice, fairly self-contained, yet high-impact ticket which would be worth spending a few days diving further into.

Cheers,

- Ben