
Christopher Allen
It'd be unfortunate if more companies trying out Haskell came to the same result: https://blog.pusher.com/latency-working-set-ghc-gc-pick-two/#comment-2866985... (They gave up and rewrote the service in Golang)
Aside: Go strikes me as an odd choice here; I would have thought they would just move to something like Rust or C++ to avoid GC entirely and still benefit from a reasonably expressive type system. Anyways, moving along...
Most of the state of the art I'm aware of (such as from Azul Systems) is from when I was using a JVM language, which isn't necessarily applicable for GHC.
I understand Marlow's thread-local heaps experiment circa 7.2/7.4 was abandoned because it penalized performance too much. Does the impression that there isn't the labor to maintain two GCs still hold? It seems like thread-local heaps would be pervasive.
Yes, I believe that this indeed still holds. In general the RTS lacks hands and garbage collectors (especially parallel implementations) require a fair bit of background knowledge to maintain.
Does anyone know what could be done in GHC itself to improve this situation? Stop-the-world is pretty painful when the otherwise excellent concurrency primitives are much of why you're using Haskell.
Indeed it is quite painful. However, I suspect that compact regions (coming in 8.2) could help in many workloads. In the case of Pusher's workload (which isn't very precisely described, so I'm guessing here) I suspect you could take batches of N messages and add them to a compact region, essentially reducing the number of live heap objects (and hence work that the GC must perform) by a factor of N. Of course, in doing this you give up the ability to "retire" messages individually. To recover this ability one could introduce a Haskell "garbage collector" task to scan the active regions and copy messages that should be kept into a new region, dropping those that should be retired. Here you benefit from the fact that copying into a compact region can be done in parallel (IIRC), allowing us to essentially implement a copying, non-stop-the-world GC in our Haskell program. This allows the runtime's GC to handle a large, static heap as though it were a constant factor smaller, hopefully reducing pause duration. That being said, this is all just wild speculation; I could be wrong, YMMV, etc. Of course, another option is splitting your workload across multiple runtime systems. Cloud Haskell is a very nice tool for this which I've used on client projects with very good results. Obviously it isn't always possible to segment your heap as required by this approach, but it is quite effective when possible. While clearly neither of these are as convenient as a more scalable garbage collector, they are both things we can (nearly) do today. Looking farther into the future, know there is a group looking to add linear types to GHC/Haskell with a separate linear heap (which needn't be garbage collected). I'll let them elaborate if they so desire. Cheers, - Ben