An interesting paper on VM-friendly GC

Somebody showed me this the other day, and I thought it was interesting:
http://www.cs.umass.edu/~emery/pubs/f034-hertz.pdf
Basically, "we designed a garbage collector which tries to avoid touching memory pages that have been swapped out to disk just because we need to do a GC sweep". Which is a pretty obvious thing to do, when you think about it, and has several obvious performance implications. Maybe we should think about how GHC handles this?
On the other hand, their implementation uses a modified Linux kernel, and no sane person is going to recompile their OS kernel with a custom patch just to run Haskell applications, so we can't do quite as well as they did. But still, an interesting read...
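To make the idea concrete, here is a toy model in Python. It is only a sketch of the general trick (scan a page just before it is swapped out, remember what it points to, and afterwards treat that page as live without reading it); it is not the paper's implementation, and the names `Obj`, `ToyHeap`, `evict` and `build_demo_heap` are invented for illustration. Page faults are simulated by a counter.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    page: int                                  # page that holds this object
    refs: list = field(default_factory=list)   # names of objects it points to


class ToyHeap:
    def __init__(self, objects, roots):
        self.objects = {o.name: o for o in objects}
        self.roots = list(roots)
        self.evicted = set()     # pages currently swapped out
        self.bookmarks = set()   # targets of pointers stored on evicted pages
        self.faulted = set()     # evicted pages the collector had to read

    def evict(self, page):
        """Scan a page while it is still in memory, then mark it swapped out."""
        for obj in self.objects.values():
            if obj.page == page:
                self.bookmarks.update(obj.refs)
        self.evicted.add(page)

    def _read(self, name):
        """Read an object's fields; reading an evicted page counts as a fault."""
        obj = self.objects[name]
        if obj.page in self.evicted:
            self.faulted.add(obj.page)
        return obj

    def mark_naive(self):
        """Plain reachability trace. Returns (live names, pages faulted in)."""
        self.faulted = set()
        live, stack = set(), list(self.roots)
        while stack:
            name = stack.pop()
            if name in live:
                continue
            live.add(name)
            stack.extend(self._read(name).refs)
        return live, len(self.faulted)

    def mark_avoiding_evicted(self):
        """Trace resident objects only. Returns (live names, pages faulted in)."""
        self.faulted = set()
        # Conservatively keep everything on swapped-out pages, unread. A real
        # collector would know an object's page from its address alone.
        live = {n for n, o in self.objects.items() if o.page in self.evicted}
        stack = list(self.roots) + sorted(self.bookmarks)
        while stack:
            name = stack.pop()
            if name in live:
                continue         # this also skips every object on an evicted page
            live.add(name)
            stack.extend(self._read(name).refs)
        return live, len(self.faulted)


def build_demo_heap():
    heap = ToyHeap(
        objects=[
            Obj("a", page=0, refs=["b"]),
            Obj("b", page=1, refs=["c"]),
            Obj("c", page=0),
            Obj("d", page=1, refs=["e"]),   # unreachable, but lives on page 1
            Obj("e", page=0),               # referenced only from d
            Obj("f", page=0),               # plain garbage on a resident page
        ],
        roots=["a"],
    )
    heap.evict(1)                           # page 1 gets swapped out
    return heap
```

In this model the naive trace has to read object `b` on the swapped-out page, while the second trace never reads that page at all. The price is that it keeps `d` and `e` alive even though they are garbage, because it cannot look inside the evicted page to find out.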

On 10/15/2010 03:15 PM, Andrew Coppin wrote:
On the other hand, their implementation uses a modified Linux kernel, and no sane person is going to recompile their OS kernel with a custom patch just to run Haskell applications, so we can't do quite as well as they did. But still, an interesting read...
Ah, but you are missing an important fact about the article: it is not about improving garbage collection for Haskell, it is about improving collection for *Java*, which is a language in heavy use on servers. If this performance gain really is such a big win, then I bet that it would highly motivate people to make this extension part of the standard Linux kernel, at which point we could use it in the Haskell garbage collector.
Cheers, Greg

On 15/10/2010 11:50 PM, Gregory Crosswhite wrote:
On 10/15/2010 03:15 PM, Andrew Coppin wrote:
On the other hand, their implementation uses a modified Linux kernel, and no sane person is going to recompile their OS kernel with a custom patch just to run Haskell applications, so we can't do quite as well as they did. But still, an interesting read...
Ah, but you are missing an important fact about the article: it is not about improving garbage collection for Haskell, it is about improving collection for *Java*, which is a language in heavy use on servers. If this performance gain really is such a big win, then I bet that it would highly motivate people to make this extension part of the standard Linux kernel, at which point we could use it in the Haskell garbage collector.
Mmm, that's interesting. The paper talks about "Jikes", but I have no idea what that is. So it's a Java implementation then?
Also, it's news to me that Java finds heavy use anywhere yet. (Then again, if they run Java server-side, how would you tell?)
It seems to me that most operating systems are designed with the assumption that all the code being executed will be C or C++ with manual memory management. Ergo, however much memory the process has requested, it actually *needs* all of it. With GC, this assumption is violated. If you ask the GC nicely, it may well be able to release some memory back to you. It's just that the OS isn't designed to do this, so the GC has no idea whether it's starving the system of memory, or whether there's plenty spare.
I know the GC engine in the GHC RTS just *never* releases memory back to the OS. (I imagine that's a common choice.) It means that if the amount of truly live data fluctuates up and down, you don't spend forever allocating and freeing memory from the OS. I think we could probably do better here. (There's an [ancient] feature request ticket for it somewhere on the Trac...) At a minimum, I'm not even sure how much notice the current GC takes of memory page boundaries and cache effects...
GC languages are not exactly rare, so maybe we'll see some OSes start adding new system calls to allow the OS to ask the application whether there's any memory it can cheaply hand back. We'll see...
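The trade-off between never releasing memory and releasing it eagerly can be sketched with a small retention rule. This is a hypothetical policy written for illustration, not what the GHC RTS does: after each collection the runtime keeps up to `retain_factor` times the live data and hands back only the excess. The names `blocks_to_release` and `simulate`, and the block counts, are assumptions.

```python
def blocks_to_release(live_blocks, held_blocks, retain_factor=2.0, min_held=0):
    """Number of free blocks to return to the OS after a collection.

    The runtime keeps up to retain_factor * live_blocks (and never fewer than
    min_held), so a heap whose live data goes up and down does not keep
    freeing and re-requesting the same memory.
    """
    target = max(min_held, int(live_blocks * retain_factor))
    return max(0, held_blocks - target)


def simulate(live_series, retain_factor, min_held=0):
    """Replay a series of live-data sizes.

    Returns (number of requests/releases made to the OS, blocks held at end).
    """
    held = 0
    os_calls = 0
    for live in live_series:
        if live > held:
            # Not enough memory held: request more from the OS.
            held = live
            os_calls += 1
        extra = blocks_to_release(live, held, retain_factor, min_held)
        if extra:
            # Holding more than the policy allows: give the excess back.
            held -= extra
            os_calls += 1
    return os_calls, held


# Live data that swings between 10 and 4 blocks.
swinging = [10, 4, 10, 4, 10, 4]
eager = simulate(swinging, retain_factor=1.0)   # release everything unused
lazy = simulate(swinging, retain_factor=3.0)    # keep some slack around
```

With `retain_factor=1.0` every swing costs a round trip to the OS; with a larger factor the heap is requested once and then reused, at the cost of holding memory the program is not currently using.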

On 16 October 2010 10:35, Andrew Coppin wrote:
On 15/10/2010 11:50 PM, Gregory Crosswhite wrote:
On 10/15/2010 03:15 PM, Andrew Coppin wrote:
On the other hand, their implementation uses a modified Linux kernel, and no sane person is going to recompile their OS kernel with a custom patch just to run Haskell applications, so we can't do quite as well as they did. But still, an interesting read...
Ah, but you are missing an important fact about the article: it is not about improving garbage collection for Haskell, it is about improving collection for *Java*, which is a language in heavy use on servers. If this performance gain really is such a big win, then I bet that it would highly motivate people to make this extension part of the standard Linux kernel, at which point we could use it in the Haskell garbage collector.
Mmm, that's interesting. The paper talks about "Jikes", but I have no idea what that is. So it's a Java implementation then?
Jikes is a virtual machine used for research; it actually has a decent just-in-time compiler. Its memory management toolkit (MMTk) also makes it quite easy to experiment with new GC designs.
Also, it's news to me that Java finds heavy use anywhere yet. (Then again, if they run Java server-side, how would you tell?)
Oh, it's *very* heavily used. Many commercial products run on Java both server and client.
It seems to me that most operating systems are designed with the assumption that all the code being executed will be C or C++ with manual memory management. Ergo, however much memory the process has requested, it actually *needs* all of it. With GC, this assumption is violated. If you ask the GC nicely, it may well be able to release some memory back to you. It's just that the OS isn't designed to do this, so the GC has no idea whether it's starving the system of memory, or whether there's plenty spare.
I know the GC engine in the GHC RTS just *never* releases memory back to the OS. (I imagine that's a common choice.) It means that if the amount of truly live data fluctuates up and down, you don't spend forever allocating and freeing memory from the OS. I think we could probably do better here. (There's an [ancient] feature request ticket for it somewhere on the Trac...) At a minimum, I'm not even sure how much notice the current GC takes of memory page boundaries and cache effects...
Actually that's been fixed in GHC 7.
GC languages are not exactly rare, so maybe we'll see some OSes start adding new system calls to allow the OS to ask the application whether there's any memory it can cheaply hand back. We'll see...
I wouldn't be surprised if some OS kernels already have some undocumented features to aid VM-friendly GC. I think it's probably going to have to be the other way around, though: rather than the OS asking for its memory back, the application should ask for the page access bits and then decide for itself (as done in the paper). I don't know how that interacts with the VM paging strategy, though. Microkernels such as L4 already support these things (e.g., L4 via its UNMAP system call). Xen and co. probably have something similar.
-- Push the envelope. Watch it bend.

On 10/16/10 05:35, Andrew Coppin wrote:
GC languages are not exactly rare, so maybe we'll see some OSes start adding new system calls to allow the OS to ask the application whether there's any memory it can cheaply hand back. We'll see...
I thought Windows already had a system message for something like that. Or at least it used to, although I can see why it would have been removed or at least deprecated.
Unix could do it with a signal, but in general the application can't easily do that at times chosen by an external entity. (Consider that the act of finding such memory could inadvertently *increase* memory pressure on the system, since an application can't tell which of its pages aren't in core.) The correct solution is to give the application the tools necessary for it to do its own memory management, which is what the paper is about.
-- 
brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
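As a rough sketch of the signal approach and its limits: the handler below only records that pressure was reported, and the application frees memory later at a point of its own choosing. The class `PressureAwareCache`, the `install` helper and the choice of SIGUSR1 are all assumptions made for illustration; no OS sends such a signal by convention. Note that the sketch does nothing about the residency problem described above, since the application still cannot tell which of its pages are in core.

```python
import signal


class PressureAwareCache:
    """A cache that can shrink when memory pressure has been reported."""

    def __init__(self, keep_under_pressure=2):
        self.entries = {}                       # insertion-ordered in Python 3.7+
        self.keep_under_pressure = keep_under_pressure
        self.pressure_flagged = False

    def on_pressure(self, signum=None, frame=None):
        """Signal handler: only record the request, do no real work here."""
        self.pressure_flagged = True

    def put(self, key, value):
        self.entries[key] = value

    def safe_point(self):
        """Called by the application when convenient.

        Drops the oldest entries if pressure was flagged since the last call.
        Returns the number of entries dropped.
        """
        if not self.pressure_flagged:
            return 0
        self.pressure_flagged = False
        excess = max(0, len(self.entries) - self.keep_under_pressure)
        for key in list(self.entries)[:excess]:
            del self.entries[key]
        return excess


def install(cache, signum=None):
    """Route a signal to the cache's handler (main thread only).

    SIGUSR1 is an arbitrary choice for this sketch and exists only on Unix.
    """
    if signum is None:
        signum = signal.SIGUSR1
    signal.signal(signum, cache.on_pressure)
```

A program would call `install(cache)` once at startup and then call `cache.safe_point()` from its main loop, so the trimming happens when the application is ready for it rather than whenever the signal happens to arrive.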

On Sat, Oct 16, 2010 at 8:45 AM, Brandon S Allbery KF8NH wrote:
I thought Windows already had a system message for something like that. Or at least it used to, although I can see why it would have been removed or at least deprecated.
You're probably thinking of CreateMemoryResourceNotification [1], available since Windows XP. If they were to deprecate it (doubtful), it typically takes two major releases to do so.
-n
[1] http://msdn.microsoft.com/en-us/library/aa366541(VS.85).aspx

On 10/18/10 01:18, Nathan Howell wrote:
On Sat, Oct 16, 2010 at 8:45 AM, Brandon S Allbery KF8NH wrote:
I thought Windows already had a system message for something like that. Or at least it used to, although I can see why it would have been removed or at least deprecated.
You're probably thinking of CreateMemoryResourceNotification [1], available since Windows XP. If they were to deprecate it (doubtful) it typically takes two major releases to do so.
No, the one I'm thinking of was around (and buggy enough to get some press as the target of a service pack) for NT4.

On 16/10/2010, at 10:35 PM, Andrew Coppin wrote:
Also, it's news to me that Java finds heavy use anywhere yet. (Then again, if they run Java server-side, how would you tell?)
Java seems to be used heavily for developing mobile phone applications (other than iPhone, of course). My young daughters have el-cheapo phones (< NZD 100), but even they run Java. Android even has its own Java VM called Dalvik, which doesn't use Sun's byte codes. There are alleged to be nine million Java developers (whatever that means, don't ask _me_, ask Oracle).
It seems to me that most operating systems are designed with the assumption that all the code being executed will be C or C++ with manual memory management.
You have to use Objective C for the iPhone. Recent versions of Objective C support garbage collection. For that matter, Sun shipped a conservative garbage collector with Solaris for years; look for 'libgc'.
Participants (6): Andrew Coppin, Brandon S Allbery KF8NH, Gregory Crosswhite, Nathan Howell, Richard O'Keefe, Thomas Schilling