
On 7 June 2018 at 22:25, Evan Laforge wrote:

On Thu, Jun 7, 2018 at 1:47 PM, Simon Marlow wrote:

For loading large amounts of code into GHCi, you want to add -j<n> +RTS -A128m, where <n> is the number of cores on your machine. We've found that parallel compilation works really well in GHCi provided you use a nice large allocation area for the GC. This dramatically speeds up working with large numbers of modules in GHCi. (500 is small!)
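For example, on an 8-core machine (the core count and the Main module name below are just stand-ins for your own setup) that looks like:

  $ ghci -j8 +RTS -A128m -RTS
  ghci> :load Main

The same flags can also go through a build tool, e.g. something like cabal repl --ghc-options="-j8 +RTS -A128m -RTS".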
This is a bit of a thread hijack (feel free to change the subject), but I also have a workflow that involves loading a lot of modules in ghci (500-700). As long as I can coax ghci to load them, things are fast and work well, but my impression is that this isn't a common workflow, and specifically that ghc developers don't do this: just about every ghc release breaks it in one way or another (e.g. by putting more flags into the recompilation check hash), and no one seems to understand what I'm talking about when I suggest features to improve it (e.g. the recent message about modtimes and recompilation avoidance).
Given the uphill battle, I've been thinking that linking most of those modules into a package and loading far fewer will be a better supported workflow. It's actually less convenient, because the code is then divided between the package level (which requires a restart and relink when it changes) and the ghci level (which doesn't), but it's maybe less likely to be broken by ghc changes. Also, all those loaded modules consume a huge amount of memory, which I haven't tracked down yet, but maybe packages will load more efficiently.
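A minimal sketch of that layout (all names hypothetical, and the package has to be built and registered before ghci can see it): the stable modules go into a small library, and ghci then loads only the modules that still change against it:

  -- library stanza in my-core.cabal (other required fields elided)
  library
    hs-source-dirs:  src
    exposed-modules: Core.A Core.B
    build-depends:   base

  $ ghci -package my-core App/Main.hs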
But ideally I would prefer to keep not using packages, and in fact to do per-module loading more aggressively for larger codebases, because the need to restart ghci (or the ghc-API-using program) and do a lengthy relink every time a module in the "wrong place" changes seems like it could get annoying (in fact it already is, for a cabal-oriented workflow).
Does the workflow at Facebook involve loading tons of individual modules as I do?
Yes, our workflow involves loading a large number of modules into GHCi. However, we have run into memory issues, which was the reason for the recent work on fixing this space leak: https://phabricator.haskell.org/D4659

As it is, this workflow is OK thanks to Bartosz's work on speedups for large numbers of modules, tweaking the RTS flags as I mentioned, and some other fixes we've made in GHCi to avoid performance issues (all of this is upstream, incidentally). There is probably low-hanging fruit to be had in reducing the memory usage of GHCi; nobody has really attacked this with the heap profiler for a while.

However, I imagine at some point loading everything into GHCi will become unsustainable and we'll have to explore other strategies. There are a couple of options here:

- pre-compile modules so that GHCi is loading the .o files instead of interpreted code
- move some of the code into pre-compiled packages, as you mentioned

Cheers
Simon
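To sketch the first of those options (Main.hs stands in for the top-level module, and the flags have to match between the ghc and ghci runs or the recompilation check will recompile instead of reusing the .o files):

  $ ghc --make -j8 -O0 Main.hs
  $ ghci -j8 +RTS -A128m -RTS Main.hs

Or, staying inside GHCi, compile to object code instead of bytecode:

  ghci> :set -fobject-code
  ghci> :reload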
Or do they get packed into packages? If it's the many-modules approach, do you have recommendations for making that work well and keeping it working? If packages are the way you're "supposed" to do things, then is there any idea of how hard it would be to reload packages at runtime? If both modules and packages can be reloaded, is there an intended conceptual difference between a package and an unpackaged collection of modules?

To illustrate, I would treat packages purely as a way to organize builds and distribution, with no meaning at the compiler level, which is how I gather C compilers traditionally work (e.g. 'cc a.o b.o c.o' is the same as 'ar rcs abc.a a.o b.o c.o; cc abc.a'). But that's clearly not how ghc sees it!
thanks!