
On Apr 23, 2005, at 2:54 PM, Duncan Coutts wrote:
On Sat, 2005-04-23 at 14:10 -0400, Jan-Willem Maessen wrote:
So I wouldn't worry about having your huge binary objects walked by the garbage collector. Whatever GC may do to a heap chock-full of tiny objects, a single large pointer-free object should be left alone.
Sadly, the case I had in mind is exactly the former: large syntax trees and large symbol tables, about 400Mb of seldom-accessed, mostly read-only, and yet unpageable data.
Ah. Now that's another kettle of fish entirely... However, generational GC *ought* to help here. If you're using GHC, I assume you've turned on compacting GC to avoid doubling your memory use, and have set an appropriate upper bound on the heap size.
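For the record, the knobs I have in mind are the RTS flags for compaction and the maximum heap size, along these lines (the 450m figure is only a placeholder for whatever limit suits your machine):

    ./prog +RTS -c -M450m -RTS

where -c turns on compacting collection for the oldest generation and -M puts a hard upper bound on the total heap.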
Then, to make things worse, we've got some nasty little piece of code which walks the AST and for some inexplicable reason generates vast amounts of garbage. To make things work on normal machines we have to set the heap limit as low as possible, so the garbage collector runs very frequently and reclaims very little each time, yet it still has to touch all of the rest of the 400Mb dataset, which prevents it being paged out. My tests indicate that 3/4 of the running time is spent doing GC. </grumble> :-)
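(For anyone following along: GHC's runtime will print that mutator/GC time split itself if you run the program with something like +RTS -sstderr.)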
Hmm; this sounds like a lot of full-heap collections, which is exactly what generational GC is trying to avoid. A very large old generation (like, say, 500+Mb) might help a lot in this instance; I have no idea how GHC decides generation sizes. It might also help to set a very large allocation area to reduce the promotion rate into the second generation and give the gobs of transient data some time to die, or, similarly, to increase the number of generations so that things take longer to reach the old generation. Fundamentally, though, when you run really close to your memory limits GC tends to be unhappy.

-Jan-Willem Maessen
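P.S. To be concrete, the knobs I'm thinking of are the allocation-area and generation-count RTS flags, roughly like this (the numbers are purely illustrative):

    ./prog +RTS -A64m -G3 -M600m -RTS

Here -A sets the size of the young-generation allocation area, -G the number of generations, and -M the overall heap cap; a bigger allocation area gives transient garbage more chance to die before anything gets promoted.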
Duncan