What makes GHC's GC good for purely functional workloads?

Apologies if this has been asked before. In some ways GHC's garbage collectors are very traditional, so I was wondering what magic dust makes it do better in Haskelly workloads (e.g. whereas you'd need to do a bit more work in the JVM to make some allocation pattern play nice.) Edward

On 23/11/2011 04:09, Edward Z. Yang wrote:
Apologies if this has been asked before. In some ways GHC's garbage collectors are very traditional, so I was wondering what magic dust makes it do better in Haskelly workloads (e.g. whereas you'd need to do a bit more work in the JVM to make some allocation pattern play nice.)
Our GC is generally tuned towards making allocation fast. So we use bump-pointer allocation and allocate several objects at once, and we use generational GC where the default nursery size is small (about L2 cache-sized). Young-generation collections are very quick - on the order of 10us. So we collect a lot of the garbage before it even hits main memory; this is also what makes GHC's GC scale reasonably well for parallel programs. On the other hand, our write barriers are more expensive than you'd typically find in a JVM, but they are more accurate: remembered set rather than card marking. This is a good tradeoff because mutation is rare in Haskell. The one exception is thunk update - there might be room for experimenting with a cheaper write barrier there. One thing I'm particularly proud of in our GC is the block layer. All memory is managed in units of blocks (currently 4k, but that's just a constant you can change). It makes everything a lot simpler, including parallel GC. We have a paper about it: http://community.haskell.org/~simonmar/papers/parallel-gc.pdf Cheers, Simon

On Wed, Nov 23, 2011 at 4:31 AM, Simon Marlow
On 23/11/2011 04:09, Edward Z. Yang wrote:
Apologies if this has been asked before. In some ways GHC's garbage collectors are very traditional, so I was wondering what magic dust makes it do better in Haskelly workloads (e.g. whereas you'd need to do a bit more work in the JVM to make some allocation pattern play nice.)
Our GC is generally tuned towards making allocation fast. So we use bump-pointer allocation and allocate several objects at once, and we use generational GC where the default nursery size is small (about L2 cache-sized). Young-generation collections are very quick - on the order of 10us. So we collect a lot of the garbage before it even hits main memory; this is also what makes GHC's GC scale reasonably well for parallel programs.
On the other hand, our write barriers are more expensive than you'd typically find in a JVM, but they are more accurate: remembered set rather than card marking. This is a good tradeoff because mutation is rare in Haskell. The one exception is thunk update - there might be room for experimenting with a cheaper write barrier there.
One thing I'm particularly proud of in our GC is the block layer. All memory is managed in units of blocks (currently 4k, but that's just a constant you can change). It makes everything a lot simpler, including parallel GC. We have a paper about it:
http://community.haskell.org/~simonmar/papers/parallel-gc.pdf
It sounds like the new "Garbage First" collector might be closer to these goals than the older HotSpot JVM collectors: http://labs.oracle.com/jtech/pubs/04-g1-paper-ismm.pdf It was built with different goals, but the end result is that it (can) focus on collecting garbage while its young. The paper was written a few years before the first version shipped, so I'm not sure how the implementation differs.
Cheers, Simon
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

On Wed, Nov 23, 2011 at 7:55 AM, Antoine Latter
It sounds like the new "Garbage First" collector might be closer to these goals than the older HotSpot JVM collectors:
http://labs.oracle.com/jtech/pubs/04-g1-paper-ismm.pdf
It was built with different goals, but the end result is that it (can) focus on collecting garbage while its young.
The paper was written a few years before the first version shipped, so I'm not sure how the implementation differs.
Although it looks like the young-generation collections pause for quite a bit longer than 10us, and if I'm reading the paper correctly don't necessarily collect the entire young generation.
participants (3)
-
Antoine Latter
-
Edward Z. Yang
-
Simon Marlow