
On 16 October 2010 10:35, Andrew Coppin wrote:
On 15/10/2010 11:50 PM, Gregory Crosswhite wrote:
On 10/15/2010 03:15 PM, Andrew Coppin wrote:
On the other hand, their implementation uses a modified Linux kernel, and no sane person is going to recompile their OS kernel with a custom patch just to run Haskell applications, so we can't do quite as well as they did. But still, an interesting read...
Ah, but you are missing an important fact about the article: it is not about improving garbage collection for Haskell, it is about improving collection for *Java*, which is a language in heavy use on servers. If this performance gain really is such a big win, then I bet that it would highly motivate people to make this extension part of the standard Linux kernel, at which point we could use it in the Haskell garbage collector.
Mmm, that's interesting. The paper talks about "Jikes", but I have no idea what that is. So it's a Java implementation then?
Jikes is a virtual machine used for research; it actually has a decent just-in-time compiler. Its memory management toolkit (MMTk) also makes it quite easy to experiment with new GC designs.
Also, it's news to me that Java finds heavy use anywhere yet. (Then again, if they run Java server-side, how would you tell?)
Oh, it's *very* heavily used. Many commercial products run on Java, on both the server and the client.
It seems to me that most operating systems are designed with the assumption that all the code being executed will be C or C++ with manual memory management. Ergo, however much memory the process has requested, it actually *needs* all of it. With GC, this assumption is violated. If you ask the GC nicely, it may well be able to release some memory back to you. It's just that the OS isn't designed to do this, so the GC has no idea whether it's starving the system of memory, or whether there's plenty spare.
I know the GC engine in the GHC RTS just *never* releases memory back to the OS. (I imagine that's a common choice.) It means that if the amount of truly live data fluctuates up and down, you don't spend forever allocating and freeing memory from the OS. I think we could probably do better here. (There's an [ancient] feature request ticket for it somewhere on the Trac...) At a minimum, I'm not even sure how much notice the current GC takes of memory page boundaries and cache effects...
Actually that's been fixed in GHC 7.
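For anyone wondering what "releasing memory back to the OS" can look like at the system-call level, here is a minimal C sketch. It is not how the GHC RTS does it; the helper names (heap_reserve, heap_release_pages, heap_unreserve, demo_release) are made up for illustration, and it assumes a POSIX-like system that provides mmap and madvise with MADV_DONTNEED. The idea is that a collector can keep its address range reserved while telling the kernel that the pages behind it are no longer needed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve `len` bytes of anonymous memory for a heap. */
static void *heap_reserve(size_t len) {
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: advise the kernel that these pages are no longer
 * needed. The address range stays mapped, so the heap can grow back into
 * it later without another mmap call. Returns 0 on success. */
static int heap_release_pages(void *p, size_t len) {
    return madvise(p, len, MADV_DONTNEED);
}

/* Hypothetical helper: give the address range itself back. */
static int heap_unreserve(void *p, size_t len) {
    return munmap(p, len);
}

/* Fill a small heap with "live data", pretend a collection found it all
 * dead, release the pages, then reuse the same range.
 * Returns 0 on success and -1 on any failure. */
int demo_release(void) {
    size_t len = 64 * (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *heap = heap_reserve(len);
    if (heap == NULL) return -1;

    memset(heap, 0xAB, len);            /* heap is full of data */

    if (heap_release_pages(heap, len) != 0) {
        heap_unreserve(heap, len);
        return -1;
    }

    /* The range is still valid. On Linux, private anonymous pages released
     * this way read back as zeros; other systems may behave differently. */
    heap[0] = 1;                        /* touching a page brings it back */

    return heap_unreserve(heap, len) == 0 ? 0 : -1;
}
```

Calling demo_release() from a test program should return 0 on a system where these calls are available. The trade-off is the one described above: releasing pages eagerly costs system calls and page faults when the live data grows again, so a real collector would presumably only do this after the heap has stayed small for a while.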
GC languages are not exactly rare, so maybe we'll see some OSes start adding new system calls to allow the OS to ask the application whether there's any memory it can cheaply hand back. We'll see...
I wouldn't be surprised if some OS kernels already have some undocumented features to aid VM-friendly GC. I think it's probably going to have to be the other way around, though: rather than the OS asking for its memory back, the application should ask for the page access bits and then decide for itself (as done in the paper). I don't know how that interacts with the VM paging strategy, though. Microkernels such as L4 already support these things (e.g., L4 via its UNMAP system call). Xen and co. probably have something similar.

--
Push the envelope. Watch it bend.