Looking at this with ghc-debug, I can at least see why memory usage is so high when recreating ModDetails: there are lots of HscEnvs stored all over the heap (I assume in thunks inside already-compiled modules). When I recreate the ModDetails and replace them in the HUG, I accumulate more and more copies of the same ModDetails, since the other HscEnvs are of course still pointing to the HUG that had the previous ModDetails, and so on.
Given this, I don’t really know yet whether remaking the ModDetails would help me or not: trying it is now blocked on figuring out how to avoid having multiple HscEnvs in memory at the same time…
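To make the retention pattern concrete, here is a toy model of what I think is happening. It is only a sketch: Details, Hug, Env, Compiled, compile and refresh are made-up stand-ins, not the real GHC types or functions, and the closure in compile merely plays the role of a thunk that captured an HscEnv.

```haskell
import qualified Data.Map.Strict as M

-- Toy stand-ins; these are NOT the real GHC types.
newtype Details = Details { generation :: Int }
type Hug = M.Map String Details          -- module name -> details
newtype Env = Env { envHug :: Hug }

-- A "compiled module" holds a closure over the Env it was compiled in,
-- much like a thunk that captured an HscEnv.
newtype Compiled = Compiled { lookupDep :: String -> Maybe Details }

compile :: Env -> Compiled
compile env = Compiled (\name -> M.lookup name (envHug env))

-- Replace every entry in the graph with a fresh copy tagged with `gen`.
refresh :: Int -> Env -> Env
refresh gen (Env hug) = Env (M.map (const (Details gen)) hug)

-- Which generation of module "A" each compiled module still points to.
generationsSeen :: [Maybe Int]
generationsSeen = map (\c -> generation <$> lookupDep c "A") [c0, c1, c2]
  where
    env0 = Env (M.fromList [("A", Details 0)])
    env1 = refresh 1 env0
    env2 = refresh 2 env1
    c0   = compile env0
    c1   = compile env1
    c2   = compile env2

main :: IO ()
main = print generationsSeen
-- prints [Just 0,Just 1,Just 2]
```

Each compiled module still sees the generation it was compiled against, so as long as all three are alive, all three copies of the graph stay reachable. Refreshing the newest graph does nothing to free the older ones.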
From: ghc-devs <ghc-devs-bounces@haskell.org> On Behalf Of Gergo Érdi
Sent: Friday, January 17, 2025 4:08 PM
To: GHC Devs <ghc-devs@haskell.org>
Cc: Montelatici, Raphael Laurent <Raphael.Montelatici@sc.com>
Subject: [External] GHC memory usage when typechecking from source vs. loading ModIfaces
Hi,
I’m using the GHC API to typecheck 35,000 modules that form a complicated dependency graph (with multiple top-level modules, i.e. there’s no single “god module” that would transitively depend on everything else), and I noticed that peak memory usage is wildly different when everything is typechecked from scratch vs. when everything is loaded from files containing ModIfaces: 17G vs. 8G. Roughly the same ratio shows up for smaller samples as well, e.g. 80M vs. 33M for 407 modules.
I’m aware of https://gitlab.haskell.org/ghc/ghc/-/issues/13586 and so when I finish typechecking a module, I take the resulting ModIface and create the ModDetails that ends up in the HomeUnitGraph from that. My understanding of Matt’s original GHC fix in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5478 is that it does the same, i.e. it makes a fresh ModDetails only once per module, after the ModIface is ready.
But of course that still means that a ModDetails can only keep growing as more and more parts of it are used for typechecking more and more dependants. Could that be the cause? I tried a crude experiment of “putting the toothpaste back in the tube” by replacing all ModDetails in the HUG with fresh ones after each module finishes typechecking, but that’s a complete disaster for memory usage: even for the small 407-module example, the memory usage shoots up to 1.5G. I can imagine it’s because imported Ids are probably not shared anymore between different importer modules.
Any ideas on how I could improve memory usage in the from-scratch case, so that it's more similar to the from-ModIface case?
Thanks,
Gergo