
On 14 October 2005 15:17, Ketil Malde wrote:
I have a program that uses hash tables to store word counts. It can use a few large hash tables, or many small ones. The problem is that it uses an inordinate amount of time in the latter case, and profiling (and the +RTS -sstderr statistics) shows it is GC that is causing it, accounting for up to 99% of the run time(!)
Is there any reason to expect this behavior?
Heap profiling shows that each hash table seems to incur a memory overhead of approx 5K, but apart from that, I'm not able to find any leaks or unexpected space consumption.
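
For concreteness, here is a minimal sketch of the kind of word-counting loop being described, written against the Data.HashTable module that shipped with GHC around the time of this thread. Ketil's actual code is not shown anywhere in the exchange, so every name below is illustrative only.

-- Minimal sketch of a hash-table word counter (illustrative only; the
-- real program from the thread is not shown).  Uses the old
-- Data.HashTable module from base.
import qualified Data.HashTable as HT

-- Count occurrences of each word, mutating a single hash table in place.
countWords :: [String] -> IO (HT.HashTable String Int)
countWords ws = do
    tbl <- HT.new (==) HT.hashString
    mapM_ (bump tbl) ws
    return tbl
  where
    bump tbl w = do
        prev <- HT.lookup tbl w
        case prev of
            Nothing -> HT.insert tbl w 1
            Just n  -> HT.update tbl w (n + 1) >> return ()

The "many small tables" variant mentioned above would presumably create one such table per input unit rather than accumulating into a single shared one.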
I'm not certain that this is your problem, but hash tables are by definition mutable objects, and mutable objects don't work too well with generational garbage collection, GHC's in particular. Basically every GC, even the minor ones, will have to scan all the mutable objects. GHC doesn't make any attempt to record which mutable objects have actually been mutated since the last GC.

This is a compromise: it makes writing to a mutable object faster, at the expense of GC time later if you have lots of mutable objects. We haven't tried the alternative approach, but it's almost certainly a case of swings and roundabouts: programs with lots of infrequently-mutated data would benefit, programs with a few often-mutated objects would lose. I guess we currently optimise for the "few often-mutated objects" case.

Cheers,
Simon
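
To make the trade-off concrete, the following is a hedged sketch (not from the thread; the table count and workload are invented) of the "lots of mutable objects" case Simon describes: once thousands of mutable tables are live, every collection has to walk all of them.

-- Hedged illustration of the "lots of mutable objects" case described
-- above.  The table count and the busy-work loop are made up.
import Control.Monad (forM, forM_)
import qualified Data.HashTable as HT

main :: IO ()
main = do
    -- Many small mutable tables, e.g. one per document or per key group.
    tbls <- forM [1 .. 10000 :: Int] $ \i -> do
        t <- HT.new (==) HT.hashString
        HT.insert t (show i) i
        return t
    -- Ordinary allocation triggers minor collections; each of them must
    -- scan all 10000 live tables, since GHC does not record which ones
    -- were actually mutated since the previous collection.
    forM_ [1 .. 100 :: Int] $ \_ ->
        print (length (show [1 .. 10000 :: Int]))
    -- Keep the tables reachable until the end so the collections above
    -- really do see them as live mutable objects.
    found <- mapM (\t -> HT.lookup t (show (1 :: Int))) tbls
    print (length [() | Just _ <- found])

Run with +RTS -sstderr, as in the original report, the proportion of time spent in GC should grow with the number of live tables, matching the behaviour described at the top of the thread.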