
On 7 June 2018 at 22:25, Evan Laforge wrote:

On Thu, Jun 7, 2018 at 1:47 PM, Simon Marlow wrote:

For loading large amounts of code into GHCi, you want to add -j<n> +RTS -A128m, where <n> is the number of cores on your machine. We've found that parallel compilation works really well in GHCi provided you use a nice large allocation area for the GC. This dramatically speeds up working with large numbers of modules in GHCi. (500 is small!)
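For example, on an 8-core machine (the core count and the Main module name below are just stand-ins for your own setup) that looks like:

  $ ghci -j8 +RTS -A128m -RTS
  ghci> :load Main

The same flags can also go through a build tool, e.g. something like cabal repl --ghc-options="-j8 +RTS -A128m -RTS".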
This is a bit of a thread hijack (feel free to change the subject), but I also have a workflow that involves loading a lot of modules in ghci (500-700). As long as I can coax ghci to load them, things are fast and work well, but my impression is that this isn't a common workflow, and specifically that ghc developers don't do this: just about every ghc release breaks it in one way or another (e.g. by putting more flags into the recompilation check hash), and no one seems to understand what I'm talking about when I suggest features to improve it (e.g. the recent message about modtimes and recompilation avoidance).
Given the uphill battle, I've been thinking that linking most of those modules into a package and loading far fewer will be a better supported workflow. It's actually less convenient, because the code is then divided between the package level (which requires a restart and relink when it changes) and the ghci level (which doesn't), but it's maybe less likely to be broken by ghc changes. Also, all those loaded modules consume a huge amount of memory, which I haven't tracked down yet, but maybe packages will load more efficiently.
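A minimal sketch of that layout (all names hypothetical, and the package has to be built and registered before ghci can see it): the stable modules go into a small library, and ghci then loads only the modules that still change against it:

  -- library stanza in my-core.cabal (other required fields elided)
  library
    hs-source-dirs:  src
    exposed-modules: Core.A Core.B
    build-depends:   base

  $ ghci -package my-core App/Main.hs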
But ideally I would prefer to keep not using packages, and in fact to do per-module loading more aggressively for larger codebases, because the need to restart ghci (or the ghc-API-using program) and do a lengthy relink every time a module in the "wrong place" changes seems like it could get annoying (in fact it already is, for a cabal-oriented workflow).
Does the workflow at Facebook involve loading tons of individual modules as I do?
Yes, our workflow involves loading a large number of modules into GHCi. However, we have run into memory issues, which was the reason for the recent work on fixing this space leak: https://phabricator.haskell.org/D4659

As it is, this workflow is OK thanks to Bartosz's work on speedups for large numbers of modules, tweaking the RTS flags as I mentioned, and some other fixes we've made in GHCi to avoid performance issues (all of this is upstream, incidentally). There is probably low-hanging fruit to be had in reducing the memory usage of GHCi; nobody has really attacked this with the heap profiler for a while.

However, I imagine at some point loading everything into GHCi will become unsustainable and we'll have to explore other strategies. There are a couple of options here:

- pre-compile modules so that GHCi is loading the .o files instead of interpreted code
- move some of the code into pre-compiled packages, as you mentioned

Cheers
Simon
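To sketch the first of those options (Main.hs stands in for the top-level module, and the flags have to match between the ghc and ghci runs or the recompilation check will recompile instead of reusing the .o files):

  $ ghc --make -j8 -O0 Main.hs
  $ ghci -j8 +RTS -A128m -RTS Main.hs

Or, staying inside GHCi, compile to object code instead of bytecode:

  ghci> :set -fobject-code
  ghci> :reload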
Or do they get packed into packages? If it's the many-modules approach, do you have recommendations for making that work well and keeping it working? If packages are the way you're "supposed" to do things, then is there any idea of how hard it would be to reload packages at runtime? If both modules and packages can be reloaded, is there an intended conceptual difference between a package and an unpackaged collection of modules?

To illustrate, I would treat packages purely as a way to organize builds and distribution, with no meaning at the compiler level, which is how I gather C compilers traditionally work (e.g. 'cc a.o b.o c.o' is the same as 'ar rcs abc.a a.o b.o c.o; cc abc.a'). But that's clearly not how ghc sees it!
thanks!