Current state of garbage collection in Haskell

Hi, I was looking at a video that talks about GC pauses. That got me curious about the current state of GC in Haskell - say ghc 7.4.1. Would it suffer from lengthy pauses when we talk about memory in the range of 500M +? What would be a good way to keep abreast with the progress on haskell GC? Regards, Kashyap

On Sun, Jul 29, 2012 at 12:52 AM, C K Kashyap wrote:
> Hi, I was looking at a video that talks about GC pauses. That got me curious about the current state of GC in Haskell - say ghc 7.4.1. Would it suffer from lengthy pauses when we talk about memory in the range of 500M +? What would be a good way to keep abreast with the progress on haskell GC? Regards, Kashyap
Have you read the latest GHC manual pages? [1] They have a list of options, suggestions, gotchas, etc. I haven't read the GHC-specific mailing lists, but cvs-ghc sounds like where you might get real-time updates.
[1] http://www.haskell.org/ghc/docs/latest/html/users_guide/runtime-control.html...

GHC does not provide any form of real-time guarantees (and support for
them is not planned).
That said, it's not as bad as it sounds:
- Collecting the first (young) generation is fast and you can control
the size of that first generation via runtime system (RTS) options.
- The older generation is collected rarely and can be collected in parallel.
- You can explicitly invoke the GC via System.Mem.performGC
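To make the last two points concrete, here is a minimal sketch (not taken from the thread) that forces a collection with System.Mem.performGC and times it. The helper name timeMajorGC, the list size, and the file name GcPause.hs are made up for illustration. GHC.Clock needs a much newer base than the one shipped with GHC 7.4.1, so treat this as an example for a recent GHC.

```haskell
-- Build:  ghc -O2 -rtsopts GcPause.hs
-- Run:    ./GcPause +RTS -A4m -s -RTS
--   -A sets the size of the allocation area (the young generation);
--   -s prints a GC summary when the program exits.
module Main (main) where

import Control.Exception (evaluate)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)
import System.Mem (performGC)

-- | Force a major collection and return the elapsed wall-clock time
-- in nanoseconds. (Hypothetical helper, not part of any library.)
timeMajorGC :: IO Word64
timeMajorGC = do
  start <- getMonotonicTimeNSec
  performGC
  end <- getMonotonicTimeNSec
  return (end - start)

main :: IO ()
main = do
  -- Illustrative live data: a fully built list of one million Ints.
  let xs = [1 .. 1000000] :: [Int]
  _ <- evaluate (length xs)
  ns <- timeMajorGC
  putStrLn ("forced GC took " ++ show (ns `div` 1000) ++ " us")
  -- Use the list afterwards so it is still live during the collection.
  print (last xs)
```

Calling performGC at a moment when a pause is harmless (for instance between requests) does not remove pauses; it only moves one to a point you choose.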
In a multi-threaded / multi-core program, collecting the first
generation still requires stopping all application threads, even though
only one thread (CPU) will perform the GC (having other threads help
out usually doesn't work out due to locality issues). This can be
particularly expensive if the OS decides to deschedule an OS thread,
as then the GHC RTS has to wait for the OS. You can avoid that
particular problem by properly configuring the OS (via the Linux boot
option isolcpus=... and taskset(8)). The GHC team has been working on
an independent *local* GC, but it's unlikely to make it into the main
branch at this time. It turned out to be very difficult to implement,
and the gains were not large enough. Building a fully-concurrent GC is
(AFAICT) even harder.
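As a rough illustration of the setup described above, the following sketch prints how many processors and capabilities the RTS sees, so you can check that pinning and -N agree. The file name Caps.hs and the CPU numbers are placeholders. The RTS flags in the comments reflect my reading of the user's guide (-qg1 restricts parallel GC to generation 1 and older, -qa asks the RTS to use OS affinity); please check them against the guide for your GHC version.

```haskell
-- Build:  ghc -O2 -threaded -rtsopts Caps.hs
-- Run:    taskset -c 2,3 ./Caps +RTS -N2 -qg1 -RTS
--   taskset pins the process to the listed CPUs (placeholder numbers);
--   -N2 asks for two capabilities;
--   -qg1 (assumption, see lead-in) collects the young generation
--   sequentially and uses parallel GC only for older generations.
module Main (main) where

import GHC.Conc (getNumCapabilities, getNumProcessors)

main :: IO ()
main = do
  procs <- getNumProcessors
  caps <- getNumCapabilities
  putStrLn ("processors reported by the RTS: " ++ show procs)
  putStrLn ("capabilities in use:            " ++ show caps)
```

If the number of capabilities exceeds the number of CPUs the process may actually run on, a descheduled OS thread becomes more likely, which is the situation the paragraph above warns about.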
I don't know how long the pause times for your 500MB live heap would
be. Generally, you want your heap to be about twice the size of your
live data. Other than that, it depends heavily on the characteristics
of your heap objects. E.g., if it's mostly arrays of unboxed
non-pointer data, then it'll be very quick to collect (since the GC
doesn't have to do anything with the contents of these arrays). If
it's mostly many small objects with pointers to other objects, GC will
be very expensive and bound by the latency of your RAM. So, I suggest
you run some tests with realistic heaps.
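One way such a test could look, as a sketch only: time a forced collection once while an unboxed array is live and once while a pointer-heavy list of the same length is live. The helper timeMajorGC, the element count, and the file name HeapShape.hs are invented for this example; it assumes a recent GHC with the array package (which ships with GHC). Real measurements should use data shaped like your application's.

```haskell
-- Build:  ghc -O2 -rtsopts HeapShape.hs
-- Run:    ./HeapShape +RTS -s -RTS
module Main (main) where

import Control.Exception (evaluate)
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)
import System.Mem (performGC)

-- | Force a major collection and return the elapsed time in nanoseconds.
-- (Hypothetical helper, not part of any library.)
timeMajorGC :: IO Word64
timeMajorGC = do
  start <- getMonotonicTimeNSec
  performGC
  end <- getMonotonicTimeNSec
  return (end - start)

main :: IO ()
main = do
  let n = 1000000 :: Int

  -- Case 1: one large unboxed array; the GC never scans its contents.
  let arr = listArray (0, n - 1) [1 .. n] :: UArray Int Int
  _ <- evaluate arr
  tUnboxed <- timeMajorGC
  print (arr ! (n - 1))          -- keeps the array live across the GC

  -- Case 2: a list of boxed Ints; every cell holds pointers to follow.
  let xs = [1 .. n] :: [Int]
  _ <- evaluate (length xs)
  tBoxed <- timeMajorGC
  print (last xs)                -- keeps the list live across the GC

  putStrLn ("unboxed array live: " ++ show (tUnboxed `div` 1000) ++ " us")
  putStrLn ("boxed list live:    " ++ show (tBoxed `div` 1000) ++ " us")
```

The absolute numbers depend on the machine and RTS settings, so compare the two cases against each other rather than reading either figure in isolation. The "twice the live data" rule of thumb matches the default of the RTS -F option as far as I know, but check the user's guide for your version.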
Regarding keeping up, Simon Marlow is the main person working on GHC's
GC (often collaborating with others) and he keeps a list of papers on
his homepage: http://research.microsoft.com/en-us/people/simonmar/
If you have further questions about GHC's GC, you can ask them on the
glasgow-haskell-users@haskell.org mailing list (but please consult the
GHC user's guide section on RTS options first).
HTH
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Push the envelope. Watch it bend.

Thank you so much Alexander and Thomas.
Regards,
Kashyap
On Sun, Jul 29, 2012 at 11:59 PM, Thomas Schilling wrote:
> [...]
participants (3)
- Alexander Solla
- C K Kashyap
- Thomas Schilling