Be aware that some of the biggest performance problems with TH simply can't be fixed without changes to the TH language. For details, see Edward Yang's blog post: http://blog.ezyang.com/2016/07/what-template-haskell-gets-wrong-and-racket-gets-right/
There was a Reddit thread discussing that post at https://www.reddit.com/r/haskell/comments/4tfzah/what_template_haskell_gets_wrong_and_racket_gets/
-------- Original message --------
From: Alfredo Di Napoli <alfredo.dinapoli@gmail.com>
Date: 4/9/17 5:37 AM (GMT-05:00)
To: Ben Gamari <ben@smart-cactus.org>
Cc: ghc-devs@haskell.org
Subject: Re: Where do I start if I would like help improve GHC compilation times?
Hey Ben,
as promised I’m back to you with something more articulated and hopefully meaningful. I do hear you perfectly: diving head-first into this without at least a rough understanding of the performance hotspots or of GHC’s overall architecture is probably going to do me more harm than good (I get the overall picture and I’m aware of the different stages of the GHC compilation pipeline, but that’s far from saying I’m proficient with the architecture as a whole). A couple of years ago I also read the GHC chapter of the “Architecture of Open Source Applications” book, but I don’t know how much of it is still relevant. If it is, I guess I should refresh my memory.
I’m currently trying to move on 2 fronts; please advise if I’m a fool flogging a dead horse or if I have any hope of getting anything done ;)
1. I’m indeed trying to treat the compiler as a black box (as you advised), trying to build a sufficiently large program where GHC is not “as fast as I would like” (I know that’s a very lame definition of “slow”, hehe). In particular, I have built the stage2 compiler with the “prof” flavour as you suggested, and I have chosen 2 examples as a reference “benchmark” for performance: DynFlags.hs (which seems to have been mentioned multiple times as a GHC perf killer) and the highlighting-kate package as posted here:
https://ghc.haskell.org/trac/ghc/ticket/9221. The idea would be to compile those with `-v +RTS -p -hc -RTS` enabled, look at the output from the .prof file AND the `-v` flag, find any hotspot, try to change something, recompile, observe the diff, rinse and repeat. Do you think I have any hope of making progress this way? In particular, I think compiling DynFlags.hs is a bit of a dead end; I whipped up this buggy script, which escalated into a behemoth that compiles pretty much half of the compiler once again :D
```
#!/usr/bin/env bash
../ghc/inplace/bin/ghc-stage2 --make -j8 -v +RTS -A256M -qb0 -p -h \
-RTS -DSTAGE=2 -I../ghc/includes -I../ghc/compiler -I../ghc/compiler/stage2 \
-I../ghc/compiler/stage2/build \
-i../ghc/compiler/utils:../ghc/compiler/types:../ghc/compiler/typecheck:../ghc/compiler/basicTypes \
-i../ghc/compiler/main:../ghc/compiler/profiling:../ghc/compiler/coreSyn:../ghc/compiler/iface:../ghc/compiler/prelude \
-i../ghc/compiler/stage2/build:../ghc/compiler/simplStg:../ghc/compiler/cmm:../ghc/compiler/parser:../ghc/compiler/hsSyn \
-i../ghc/compiler/ghci:../ghc/compiler/deSugar:../ghc/compiler/simplCore:../ghc/compiler/specialise \
-fforce-recomp -c "$@"
```
I’m running it with `./dynflags.sh ../ghc/compiler/main/DynFlags.hs` but it’s taking a long time to compile (20+ mins on my 2014 Mac Pro) because it’s pulling in half of the compiler anyway :D I tried to reuse the .hi files from my stage2 compilation but I failed (GHC was complaining about an interface file mismatch). Long story short, I don’t think it will be a very agile way to proceed. Am I right? Do you have any recommendations in that sense? Do I have any hope of compiling DynFlags.hs in a way that would make this perf investigation feasible?
The second example (the highlighting-kate package) seems much more promising. It takes maybe 1-2 mins on my machine, which is enough to take a look at the perf output. Do you think I should follow this second lead? In principle I think any package with 50+ modules would do (better if it has a lot of TH ;) ) but this seems like a start with a low barrier to entry.
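To make the “look at the .prof, change something, recompile, observe the diff” loop a bit more concrete, here is a minimal sketch of a helper that pulls the top rows out of a profile so that two runs can be compared. The function name `prof_summary` is hypothetical, and the sketch assumes the usual `.prof` layout: a summary table introduced by a line starting with `COST CENTRE`, followed by a blank line, the rows, and another blank line. If the profile looks different, the awk patterns will need adjusting.

```shell
#!/usr/bin/env bash
# prof_summary FILE [N]
# Print the first N rows (default 10) of the summary table of a GHC .prof file.
# Assumption: the summary table starts at the first line beginning with
# "COST CENTRE" and ends at the first blank line after its rows.
prof_summary() {
  local file=$1 limit=${2:-10}
  awk -v limit="$limit" '
    # First "COST CENTRE" line: header of the summary table, skip it.
    /^COST CENTRE/ && !seen { seen = 1; next }
    # A blank line after at least one row marks the end of the table.
    seen && NF == 0 && rows > 0 { exit }
    # Print rows until the limit is reached.
    seen && NF > 0 && rows < limit { print; rows++ }
  ' "$file"
}
```

Assuming the profiles from two runs were saved as `before.prof` and `after.prof` (hypothetical file names), something like `diff <(prof_summary before.prof) <(prof_summary after.prof)` would show which of the top cost centres moved between runs.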
Maybe some are very specific, but it seems like fixing small things and moving forward could help me gain an understanding of different sub-parts of GHC, which seems less intimidating than the black-box approach.
In conclusion, what do you think is the best approach, 1 or 2, both or none? ;)
Thank you!
Alfredo