Re: Improving GHC GC for latency-sensitive networked services

17 Oct 2016

      Christopher Allen  writes:
...
It'd be unfortunate if more companies trying out Haskell came to the
same result: https://blog.pusher.com/latency-working-set-ghc-gc-pick-two/#comment-2866985...
(They gave up and rewrote the service in Golang)
Aside: Go strikes me as an odd choice here; I would have thought they
would just move to something like Rust or C++ to avoid GC entirely and
still benefit from a reasonably expressive type system. Anyways, moving
along...
...
Most of the state of the art I'm aware of (such as from Azul Systems)
is from when I was using a JVM language, which isn't necessarily
applicable for GHC.
I understand Marlow's thread-local heaps experiment circa 7.2/7.4 was
abandoned because it penalized performance too much. Does the
impression that there isn't the labor to maintain two GCs still hold?
It seems like thread-local heaps would be pervasive.
Yes, I believe that this indeed still holds. In general the RTS lacks
hands and garbage collectors (especially parallel implementations)
require a fair bit of background knowledge to maintain.
...
Does anyone know what could be done in GHC itself to improve this
situation? Stop-the-world is pretty painful when the otherwise
excellent concurrency primitives are much of why you're using Haskell.
Indeed it is quite painful. However, I suspect that compact regions
(coming in 8.2) could help in many workloads.

In the case of Pusher's workload (which isn't very precisely described,
so I'm guessing here) I suspect you could take batches of N messages and
add them to a compact region, essentially reducing the number of live
heap objects (and hence work that the GC must perform) by a factor of N.
Of course, in doing this you give up the ability to "retire" messages
individually. To recover this ability one could introduce a Haskell
"garbage collector" task to scan the active regions and copy messages
that should be kept into a new region, dropping those that
should be retired. Here you benefit from the fact that copying into a
compact region can be done in parallel (IIRC), allowing us to
essentially implement a copying, non-stop-the-world GC in our Haskell
program. This allows the runtime's GC to handle a large, static heap as
though it were a constant factor smaller, hopefully reducing pause
duration. That being said, this is all just wild speculation; I could be
wrong, YMMV, etc.

Of course, another option is splitting your workload across multiple
runtime systems. Cloud Haskell is a very nice tool for this which I've
used on client projects with very good results. Obviously it isn't
always possible to segment your heap as required by this approach, but
it is quite effective when possible.

While clearly neither of these are as convenient as a more scalable
garbage collector, they are both things we can (nearly) do today.

Looking farther into the future, know there is a group looking to add
linear types to GHC/Haskell with a separate linear heap (which needn't
be garbage collected). I'll let them elaborate if they so desire.

Cheers,

- Ben

Re: Improving GHC GC for latency-sensitive networked services

Ben Gamari