
On Sat, 2005-04-23 at 14:10 -0400, Jan-Willem Maessen wrote:
> On Apr 22, 2005, at 5:33 PM, Duncan Coutts wrote:
> > Though aren't there some issues with the fact that regular garbage
> > collection touches most of the heap (even if it doesn't modify it),
> > and so very little of it can be paged out of physical RAM?
> This is a common misconception about garbage collection in general.
>
> There are only two reasons for a garbage collector to walk through a
> given piece of memory:
>
>   * The memory is live, and may contain pointers; those pointers must
>     be found and traced.
>
>   * A copying/compacting collector needs to move the data.
>
> Most collectors keep a special large-object area which contains big
> arrays. Even if copying collection is used for other objects, these
> large objects never move.
Yes, indeed.
> Furthermore, if an array contains no pointers (because, for example,
> it's a byte array read from a file) it does not need to be scanned by
> the garbage collector.
Like these unboxed array types.
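To illustrate what I mean: a strict ByteString or an unboxed array stores its payload as a flat, pointer-free block, so the collector keeps the block alive but never scans its contents. A minimal sketch (the data here is made up, standing in for something read from a file):

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import qualified Data.ByteString.Char8 as BS

-- A strict ByteString's payload is a pinned, pointer-free byte block;
-- the GC never traverses its contents.
fileData :: BS.ByteString
fileData = BS.pack "pretend this was read from a file"

-- Likewise, a UArray of Doubles is one flat block of machine doubles
-- with no pointers inside, so there is nothing for the GC to trace.
samples :: UArray Int Double
samples = listArray (0, 9) [fromIntegral i / 2 | i <- [0 .. 9 :: Int]]

main :: IO ()
main = print (BS.length fileData, samples ! 4)
```

Contrast this with a boxed `Array Int Double`, where every element is a pointer to a heap cell that the collector must follow on each major GC.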
> So I wouldn't worry about having your huge binary objects walked by
> the garbage collector. Whatever GC may do to a heap chock-full of
> tiny objects, a single large pointer-free object should be left
> alone.
Sadly the case I had in mind is exactly the former: large syntax trees
and large symbol tables. About 400MB of seldom-accessed, mostly
read-only and yet unpageable data.

Then to make things worse we've got some nasty little piece of code
which walks the AST and for some inexplicable reason generates vast
amounts of garbage. To make things work on normal machines we have to
set the heap limit as low as possible, so the garbage collector has to
run very frequently, reclaiming very little each time, and yet it has
to touch all of the rest of the 400MB dataset, which prevents it being
paged out. My tests indicate that 3/4 of the running time is spent
doing GC. </grumble>

:-)

Duncan
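P.S. For anyone following along at home, the "heap limit" tuning above is done with GHC's RTS flags. A sketch of the invocation (the program and input names are made up):

```shell
# -M400m  hard cap on total heap size; the program aborts with a heap
#         overflow if it would exceed this
# -c      use the compacting collector for the oldest generation, which
#         helps when running close to the memory limit
# -s      print a GC/mutator time summary to stderr on exit, which is
#         how I measured the 3/4-of-runtime-in-GC figure
./mycompiler +RTS -M400m -c -s -RTS some-input-file
```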