
On 14 October 2005 20:31, Jan-Willem Maessen wrote:
That "5K" number made me immediately suspicious, so I took a look at the source code to Data.HashTable. Sure enough, it's allocating a number of large IOArrays, which are filled with pointers. The practical upshot is that, for a hash table with (say) 24 entries, the GC must scan an additional 1000 pointers and discover that each one is [].
I've seen other implementations of this two-level technique which use a smallish sEGMENT_SIZE in order to avoid excessive GC overhead for less-than-gigantic hash tables. This might be worth doing in the Data.HashTable implementation.
[Curious: what (if anything) is being used to test Data.HashTable? I'd be willing to undertake very small amounts of fiddling if I could be sure I wasn't slowing down something which mattered.]
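(For illustration, a minimal sketch of the two-level layout with a smaller, tunable segment size follows. This is not the actual Data.HashTable source; the HT type, the helper names, and the 256-slot segment are invented for the example, and the sketch never resizes the table.)

import Data.Array.IO (IOArray, newArray, readArray, writeArray)
import Data.IORef (IORef, newIORef, readIORef)

sEGMENT_SIZE :: Int
sEGMENT_SIZE = 256   -- hypothetical smaller segment (vs. the ~1000-slot segments described above)

-- A table is a directory of fixed-size segments; each segment is an IOArray
-- of buckets (association lists).  The segment is the unit of allocation,
-- and hence the unit of pointers the GC has to scan.
data HT k v = HT
  { htSegments :: IORef [IOArray Int [(k, v)]]
  , htBuckets  :: IORef Int                     -- total number of buckets
  }

newHT :: IO (HT k v)
newHT = do
  seg  <- newArray (0, sEGMENT_SIZE - 1) []     -- one segment of empty buckets
  segs <- newIORef [seg]
  n    <- newIORef sEGMENT_SIZE
  return (HT segs n)

-- Map a hash value to (segment, slot within the segment).
bucketFor :: Int -> HT k v -> IO (IOArray Int [(k, v)], Int)
bucketFor h ht = do
  n    <- readIORef (htBuckets ht)
  segs <- readIORef (htSegments ht)
  let i             = h `mod` n
      (segIx, slot) = i `divMod` sEGMENT_SIZE
  return (segs !! segIx, slot)

insertHT :: Eq k => (k -> Int) -> HT k v -> k -> v -> IO ()
insertHT hash ht k v = do
  (seg, slot) <- bucketFor (hash k) ht
  bucket      <- readArray seg slot
  writeArray seg slot ((k, v) : filter ((/= k) . fst) bucket)

lookupHT :: Eq k => (k -> Int) -> HT k v -> k -> IO (Maybe v)
lookupHT hash ht k = do
  (seg, slot) <- bucketFor (hash k) ht
  fmap (lookup k) (readArray seg slot)

main :: IO ()
main = do
  ht <- newHT
  insertHT id ht (42 :: Int) "answer"
  lookupHT id ht 42 >>= print               -- Just "answer"

With a 256-slot segment, a table holding a couple of dozen entries pre-allocates 256 bucket pointers instead of roughly a thousand, so each collection scans correspondingly less dead weight; the trade-off is that a genuinely large table allocates segments more often as it grows.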
The few benchmarks I've tried seem to indicate that Data.HashTable performance isn't overwhelming, and can often be bettered by Data.Map. I've never investigated too deeply, but GC overhead seems a very plausible cause.

Please, by all means, tweak the implementation and try some benchmarks; I'll incorporate any performance fixes that arise.

Cheers,
Simon
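(For anyone who wants to try this, a rough benchmark along the following lines exercises both structures. It is only a sketch, not a benchmark from this thread: it assumes the Data.HashTable interface that was in base at the time (H.new, H.insert, H.lookup, H.hashInt), and the workload of 100,000 Int keys is arbitrary.)

import qualified Data.HashTable as H
import qualified Data.Map as M
import Control.Exception (evaluate)
import Control.Monad (forM_)
import System.CPUTime (getCPUTime)

n :: Int
n = 100000                                  -- arbitrary workload size

-- Time an IO action with the CPU clock (getCPUTime is in picoseconds).
timed :: String -> IO a -> IO a
timed label act = do
  t0 <- getCPUTime
  r  <- act
  t1 <- getCPUTime
  putStrLn (label ++ ": " ++ show (fromIntegral (t1 - t0) / 1e12 :: Double) ++ "s")
  return r

main :: IO ()
main = do
  _ <- timed "Data.HashTable" $ do
         ht <- H.new (==) H.hashInt
         forM_ [1 .. n] $ \i -> H.insert ht i i
         forM_ [1 .. n] $ \i -> H.lookup ht i >>= evaluate
  _ <- timed "Data.Map" $ do
         let m = M.fromList [(i, i) | i <- [1 .. n]]
         evaluate (sum [v | i <- [1 .. n], Just v <- [M.lookup i m]])
  return ()

Running the program with +RTS -sstderr shows how much of any difference is GC time rather than mutator time.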