
On 14 October 2005 15:17, Ketil Malde wrote:
I have a program that uses hash tables to store word counts. It can use a few large hash tables, or many small ones. The problem is that it uses an inordinate amount of time in the latter case, and profiling (and the +RTS -sstderr statistics) shows it is GC that is causing it, accounting for up to 99% of the run time(!)
Is there any reason to expect this behavior?
Heap profiling shows that each hash table seems to incur a memory overhead of approx 5K, but apart from that, I'm not able to find any leaks or unexpected space consumption.
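
For concreteness, here is a minimal sketch of the kind of word-counting loop being described, written against the Data.HashTable module that shipped with GHC around the time of this thread. Ketil's actual code is not shown anywhere in the exchange, so every name below is illustrative only.

-- Minimal sketch of a hash-table word counter (illustrative only; the
-- real program from the thread is not shown).  Uses the old
-- Data.HashTable module from base.
import qualified Data.HashTable as HT

-- Count occurrences of each word, mutating a single hash table in place.
countWords :: [String] -> IO (HT.HashTable String Int)
countWords ws = do
    tbl <- HT.new (==) HT.hashString
    mapM_ (bump tbl) ws
    return tbl
  where
    bump tbl w = do
        prev <- HT.lookup tbl w
        case prev of
            Nothing -> HT.insert tbl w 1
            Just n  -> HT.update tbl w (n + 1) >> return ()

The "many small tables" variant mentioned above would presumably create one such table per input unit rather than accumulating into a single shared one.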
I'm not certain that this is your problem, but hash tables are by definition mutable objects, and mutable objects don't work too well with generational garbage collection, GHC's in particular. Basically every GC, even the minor ones, will have to scan all the mutable objects. GHC doesn't make any attempt to record which mutable objects have actually been mutated since the last GC.

This is a compromise: it makes writing to a mutable object faster, at the expense of GC time later if you have lots of mutable objects. We haven't tried the alternative approach, but it's almost certainly a case of swings and roundabouts: programs with lots of infrequently-mutated data would benefit, programs with a few often-mutated objects would lose. I guess we currently optimise for the "few often-mutated objects" case.

Cheers,
Simon
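
To make the trade-off concrete, the following is a hedged sketch (not from the thread; the table count and workload are invented) of the "lots of mutable objects" case Simon describes: once thousands of mutable tables are live, every collection has to walk all of them.

-- Hedged illustration of the "lots of mutable objects" case described
-- above.  The table count and the busy-work loop are made up.
import Control.Monad (forM, forM_)
import qualified Data.HashTable as HT

main :: IO ()
main = do
    -- Many small mutable tables, e.g. one per document or per key group.
    tbls <- forM [1 .. 10000 :: Int] $ \i -> do
        t <- HT.new (==) HT.hashString
        HT.insert t (show i) i
        return t
    -- Ordinary allocation triggers minor collections; each of them must
    -- scan all 10000 live tables, since GHC does not record which ones
    -- were actually mutated since the previous collection.
    forM_ [1 .. 100 :: Int] $ \_ ->
        print (length (show [1 .. 10000 :: Int]))
    -- Keep the tables reachable until the end so the collections above
    -- really do see them as live mutable objects.
    found <- mapM (\t -> HT.lookup t (show (1 :: Int))) tbls
    print (length [() | Just _ <- found])

Run with +RTS -sstderr, as in the original report, the proportion of time spent in GC should grow with the number of live tables, matching the behaviour described at the top of the thread.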