Re: What is the mutator?

Message: 1
Date: Thu, 6 Aug 2009 00:38:08 -0700
From: Jason Dusek
Subject: What is the mutator?
To: glasgow-haskell-users@haskell.org

I've been reading a little about GC latency and have run across statements like this:
One solution to the GC synchronisation problem would be to implement a concurrent garbage collector. Typically, however, concurrent GC adds some overhead to the mutator, since it must synchronise with the collector.

... some thunks are never "black-holed", so giving a potential performance win. Unfortunately, in the parallel setting, it substantially enlarges the time window in which two or more duplicate threads might evaluate the same thunk, and thus ...
-- Comparing and Optimising Parallel Haskell Implementations for Multicore Machines
What is the mutator?
Hi Jason,

as Malcolm already said, the "mutator" in this text is the/a thread evaluating some Haskell expression. Just to add some more details to the picture...

In general, a Haskell expression is a computation represented as a graph in the heap. Haskell evaluates lazily and does not have to fully evaluate every part of it for the program to finish. Unevaluated parts are "thunks". As soon as one of potentially several concurrent (mutator) threads starts to evaluate a thunk, it is replaced by a blackhole, which keeps other threads out of it until the node in the graph is evaluated (say, to a list cons (:), with probably unevaluated head and tail). Then the blackhole is updated with the new value. Other threads block on the blackhole in the meantime (so hitting a blackhole does not necessarily mean an exception in the case of concurrent mutator threads) and are woken up by the update.

The passage you quote above is about two separate aspects:

1. Garbage collection and mutator running concurrently: while they usually do, they do not _have_ to exclude each other, but not doing so means that the objects they are treating have to be locked.

2. About "Blackholing": in sequential evaluation (where hitting a blackhole indeed means there is a loop), some performance can be gained by not blackholing a thunk immediately, so this was done in GHC earlier. However, it increases the chance of two mutator threads evaluating the same thunk (duplicated work), and we got better performance by blackholing immediately.

Cheers,
Jost
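To make the "evaluate once, then update" behaviour concrete, here is a minimal sketch (assuming GHC; Debug.Trace is used only to make the single evaluation visible). The let-bound thunk is forced by the first print, the closure is updated with the result, and the second print simply reuses it:

  import Debug.Trace (trace)

  -- The let binding builds one shared thunk.  The trace message fires
  -- only on the first demand; after that the closure has been updated
  -- with the value 42, so the second use just reads the result.
  main :: IO ()
  main = do
    let expensive = trace "evaluating the thunk" (6 * 7) :: Int
    print (expensive + 1)   -- forces the thunk, trace appears once
    print (expensive + 2)   -- reuses the updated value, no trace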

2009/08/06 Jost Berthold
as Malcolm already said, the "mutator" in this text is the/a thread evaluating some Haskell expression.
I want to thank everyone for taking the time to clarify that to me; I'm now much more able to follow discussions of Haskell garbage collection.
1. Garbage collection and mutator running concurrently: while they usually do, they do not _have_ to exclude each other, but not doing so means that the objects they are treating have to be locked.
So this is the part that actually led me here. Say you are implementing a network server, for example -- you don't want to have big spikes in the request latency due to GC. Not that Haskell is so much worse off relative to Java, say; Erlang is the only language I'm aware of that takes concurrent GC seriously. However, it seems that this problem is hard to solve for Haskell:

Parallel GC is when the whole system stops and performs multi-threaded GC, as opposed to "concurrent GC", which is when the GC runs concurrently with the program. We think concurrent GC is unlikely to be practical in the Haskell setting, due to the extra synchronisation needed in the mutator. However, there may always be clever techniques that we haven't discovered, and synchronisation might become less expensive, so the balance may change in the future. -- Simon Marlow

So I wonder, to what degree is GC latency controllable in Haskell? It seems that, pending further research, we can not hope for concurrent GC.
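One crude latency-control tactic, offered only as a sketch (the server loop below is a placeholder, and whether this helps depends entirely on the workload): trigger a major collection from a background thread at moments the application knows are quiet, using performGC from System.Mem, so a request is less likely to be interrupted mid-flight.

  import Control.Concurrent (forkIO, threadDelay)
  import Control.Monad (forever)
  import System.Mem (performGC)

  main :: IO ()
  main = do
    -- Background thread: every ten seconds, force a major collection
    -- at a moment we choose, rather than whenever the heap fills up.
    _ <- forkIO . forever $ do
           threadDelay (10 * 1000 * 1000)
           performGC
    serveRequests

  -- Placeholder for the real request loop.
  serveRequests :: IO ()
  serveRequests = forever (threadDelay 1000000)

Beyond that, the usual knobs are RTS flags such as +RTS -s (to see how much time the collector actually takes) and -A<size> (a larger allocation area means fewer minor collections); the right settings are workload-dependent.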
2. About "Blackholing": in sequential evaluation (where hitting a blackhole indeed means there is a loop), some performance can be gained by not blackholing a thunk immediately, so this was done in GHC earlier. However, it increases the chance of two mutator threads evaluating the same thunk (duplicated work), and we got better performance by blackholing immediately.
Could blackholing too early result in non-termination ("...hitting a blackhole indeed means there is a loop")? Then it's not just a matter of performance when we do it? -- Jason Dusek
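For reference, the kind of self-dependent thunk in question, assuming GHC: evaluating the thunk immediately demands its own value, so re-entering the blackhole left by the first entry is how the runtime notices the loop and raises the NonTermination exception printed as <<loop>>. With blackholing deferred, the same program may simply run longer (or exhaust the stack) before the loop is caught.

  -- A thunk whose evaluation demands its own value.  Hitting the
  -- blackhole left by the first entry is what lets GHC report <<loop>>
  -- instead of silently diverging.
  main :: IO ()
  main = print loop
    where
      loop :: Int
      loop = loop + 1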

Say you are implementing a network server, for example -- you don't want to have big spikes in the request latency due to GC.
We think concurrent GC is unlikely to be practical in the Haskell setting, due to the extra synchronisation needed in the mutator. -- Simon Marlow
It is perfectly possible to do real-time concurrent garbage collection for Haskell, in an incremental fashion that guarantees a small maximum bound on each packet of GC work. The tradeoff is that the percentage of time devoted to GC in total is much greater, and the mutator must do more work to co-operate with the GC. In other words, the program runs slower. This tradeoff is the same for all real-time garbage collection schemes as far as I am aware, in any language - either you can have responsiveness, or you can have better overall application speed, but you cannot have both.
So I wonder, to what degree is GC latency controllable in Haskell? It seems that, pending further research, we can not hope for concurrent GC.
There have been several papers on real-time GC in Haskell (including one of my own). There is no technical problem, only performance worries. This is what I think Simon means by "unlikely to be practical". Regards, Malcolm

2009/08/07 Malcolm Wallace
There have been several papers on real-time GC in Haskell (including one of my own). There is no technical problem, only performance worries. This is what I think Simon means by "unlikely to be practical".
So I guess there is no "right answer" here -- a realistic solution would offer two GC options, maybe set by an option to the RTS or on a per-thread level.

I guess your work was specifically on garbage collection of functional languages in embedded systems?

An incremental garbage collector for embedded real-time systems
ftp://ftp.cs.york.ac.uk/pub/malcolm/rtgc.html

Also there is the "Non-stop Haskell" paper:

Non-stop Haskell
http://research.microsoft.com/en-us/um/people/simonpj/Papers/inc-gc.htm

Both of these papers are from some time ago; the most recent thing I can find is:

Exploring the Barrier to Entry - Incremental Generational Garbage Collection for Haskell

-- Jason Dusek

Malcolm Wallace wrote:
Say you are implementing a network server, for example -- you don't want to have big spikes in the request latency due to GC.
We think concurrent GC is unlikely to be practical in the Haskell setting, due to the extra synchronisation needed in the mutator. -- Simon Marlow
It is perfectly possible to do real-time concurrent garbage collection for Haskell, in an incremental fashion that guarantees a small maximum bound on each packet of GC work. The tradeoff is that the percentage of time devoted to GC in total is much greater, and the mutator must do more work to co-operate with the GC. In other words, the program runs slower. This tradeoff is the same for all real-time garbage collection schemes as far as I am aware, in any language - either you can have responsiveness, or you can have better overall application speed, but you cannot have both.
So I wonder, to what degree is GC latency controllable in Haskell? It seems that, pending further research, we can not hope for concurrent GC.
There have been several papers on real-time GC in Haskell (including one of my own). There is no technical problem, only performance worries. This is what I think Simon means by "unlikely to be practical".
You're quite right, I don't really mean "practical", more like "not cheap enough to replace the existing GC as the default".

My current thoughts on reducing pause times are to adopt a region-style GC, where the heap is divided into regions and any subset of the regions can be collected in any GC cycle. This generalisation of generational GC is becoming quite popular amongst the Java folks in particular. Without going to proper incremental GC, this eliminates the need to do full-heap collections (but introduces a slight danger due to cycles between regions), and leads to shorter, or even bounded, pauses.

Cheers,
Simon
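For intuition only, a toy model of the region idea (the representation and every name here are invented for illustration; this is not GHC's implementation): collecting a chosen subset of regions treats references from the untouched regions as extra roots, which is also where the "slight danger" with cycles comes from, since a dead cycle that straddles a collected and an uncollected region still looks reachable.

  import qualified Data.Map.Strict as Map
  import qualified Data.Set as Set

  type ObjId  = Int
  type Region = Int

  -- An object lives in a region and points at other objects.
  data Obj = Obj { objRegion :: Region, objRefs :: [ObjId] }

  type Heap = Map.Map ObjId Obj

  -- Collect only the 'victims' regions: an object in a victim region
  -- survives iff it is reachable from the global roots or from a
  -- reference coming in from an uncollected region (a remembered set).
  -- Objects outside the victim regions are never touched.
  collectRegions :: Set.Set Region -> [ObjId] -> Heap -> Heap
  collectRegions victims globalRoots heap =
      Map.filterWithKey keep heap
    where
      inVictim oid =
        maybe False ((`Set.member` victims) . objRegion) (Map.lookup oid heap)

      -- References from surviving regions into the victim regions.
      rememberedSet =
        [ r | Obj reg refs <- Map.elems heap
            , not (reg `Set.member` victims)
            , r <- refs
            , inVictim r ]

      reachable = mark (filter inVictim globalRoots ++ rememberedSet) Set.empty

      -- Plain mark phase, tracing only inside the victim regions.
      mark []       seen = seen
      mark (o:rest) seen
        | o `Set.member` seen = mark rest seen
        | otherwise =
            let next = filter inVictim (maybe [] objRefs (Map.lookup o heap))
            in  mark (next ++ rest) (Set.insert o seen)

      keep oid obj
        | objRegion obj `Set.member` victims = oid `Set.member` reachable
        | otherwise                          = True

The point of the sketch is only that the work (and so the pause) scales with the victim regions rather than the whole heap, and that the remembered set is deliberately conservative: anything it points to is kept, dead cross-region cycles included, until those regions happen to be collected together.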
participants (4):
- Jason Dusek
- Jost Berthold
- Malcolm Wallace
- Simon Marlow