
"Young, Jeff"
Hi ghc devs,
I'm a long-time Haskeller but am just getting into GHC development. I started a 12-week internship at Tweag I/O under Richard Eisenberg this week with the singular goal of speeding up GHC compile times. I'm specifically looking to contribute to GHC issues 18541 https://gitlab.haskell.org/ghc/ghc/-/issues/18541 and 18535 https://gitlab.haskell.org/ghc/ghc/-/issues/18535. So I thought I would reach out to the community to get some direction on issues/features/problems to tackle in the pursuit of faster compilation times. This is a full-time internship, so I think there is a real opportunity to nail down a deliverable for the community, but I want to get some guidance from the experts (you fine people!) before going down a rabbit hole.
To be specific I'm looking for lingering items such as:
1. It would be great if we had <thing-here> but no one has time.
2. Primop foo is half complete but is the right type for <common-use-case-which-is-currently-just-a-list>.
3. Swap <some-type> to an array-like type *non-incrementally*, that is, establish a patch that rips out the previous type and replaces it with the array-like across the entire compiler, rather than module-by-module.
Point 2 is a specific reference to MR 3571 https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3571 but I'm unsure of the status and etiquette around MRs, and I'm unsure exactly how fulfilling the todos at the end of that MR would aid in faster compilation times (and if there is evidence to that effect somewhere).
Hi Jeff,
Indeed, I'm a bit skeptical that (2) will produce a meaningful compile-time improvement on typical programs. I would likely not prioritise this if the goal is compile time in particular.
A few people (namely Sebastian Graf and Andreas Klebinger) have thought about (3) in the past (e.g. for the arguments of TyConApps); preliminary experiments suggest that it's not as clear a win as one might expect, although AFAIK no one has tried to optimise pattern matching on the head, which may help matters (Sebastian has thought about this).
One area where I feel a bit of attention may be useful is the performance of substitution. In particular, tracking "taintedness" of the substitution result (as suggested in #19537) will help reduce garbage produced by substitution and potentially reduce compiler residency.
Another related class of ideas can be found in #19538, which suggests that we try deferring substitution (or, alternatively, annotating expressions with free variable sets). The payoff here is far less certain than that of the taintedness idea, and consequently I would only explore it after doing the former.
This is all that comes to mind at the moment. I'll continue pondering other ideas, however.
Cheers,
- Ben
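For readers unfamiliar with the "taintedness" idea mentioned above, here is a rough sketch of one plausible reading of it: the substitution reports whether it actually changed anything, so untouched subtrees can be reused rather than rebuilt. This is not GHC's code and not necessarily what #19537 proposes. The Expr type, the association-list substitution, and the names Result, substTracked, orElse and subst are all hypothetical, and the toy language has no binders, so variable capture is ignored entirely.

```haskell
-- Toy expression type (an assumption for illustration; GHC's Core and
-- Type representations are far richer and include binders).
data Expr
  = Var String
  | Lit Int
  | App Expr Expr
  deriving (Eq, Show)

-- A substitution as a plain association list from variable names to
-- replacement expressions (hypothetical; GHC uses specialised maps).
type Subst = [(String, Expr)]

-- Outcome of substituting into an expression: either nothing changed,
-- or a new expression had to be built.
data Result = Unchanged | Changed Expr
  deriving (Eq, Show)

-- Substitute, tracking whether the result differs from the input.
substTracked :: Subst -> Expr -> Result
substTracked s e = case e of
  Var v -> maybe Unchanged Changed (lookup v s)
  Lit _ -> Unchanged
  App f a ->
    case (substTracked s f, substTracked s a) of
      -- Neither child changed: report that, allocating no new App node.
      (Unchanged, Unchanged) -> Unchanged
      -- At least one child changed: rebuild this node only, reusing
      -- whichever child was left untouched.
      (rf, ra) -> Changed (App (orElse f rf) (orElse a ra))

-- Pick the original expression when the result is Unchanged.
orElse :: Expr -> Result -> Expr
orElse original Unchanged     = original
orElse _        (Changed new) = new

-- Ordinary interface: returns the original expression when untouched.
subst :: Subst -> Expr -> Expr
subst s e = orElse e (substTracked s e)
```

With this shape, substituting for a variable that does not occur in an expression returns the original expression without building any new App nodes, which is the kind of garbage reduction the reply alludes to. Whether this pays off in GHC itself would need measuring.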