
On 14 October 2005 20:31, Jan-Willem Maessen wrote:
That "5K" number made me immediately suspicious, so I took a look at the source code to Data.HashTable. Sure enough, it's allocating a number of large IOArrays, which are filled with pointers. The practical upshot is that, for a hash table with (say) 24 entries, the GC must scan an additional 1000 pointers and discover that each one is [].
I've seen other implementations of this two-level technique which use a smallish sEGMENT_SIZE in order to avoid excessive GC overhead for less-than-gigantic hash tables. This might be worth doing in the Data.HashTable implementation.
[Curious: what (if anything) is being used to test Data.HashTable? I'd be willing to undertake very small amounts of fiddling if I could be sure I wasn't slowing down something which mattered.]
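(For illustration, a minimal sketch of the two-level layout with a smaller, tunable segment size follows. This is not the actual Data.HashTable source; the HT type, the helper names, and the 256-slot segment are invented for the example, and the sketch never resizes the table.)

import Data.Array.IO (IOArray, newArray, readArray, writeArray)
import Data.IORef (IORef, newIORef, readIORef)

sEGMENT_SIZE :: Int
sEGMENT_SIZE = 256   -- hypothetical smaller segment (vs. the ~1000-slot segments described above)

-- A table is a directory of fixed-size segments; each segment is an IOArray
-- of buckets (association lists).  The segment is the unit of allocation,
-- and hence the unit of pointers the GC has to scan.
data HT k v = HT
  { htSegments :: IORef [IOArray Int [(k, v)]]
  , htBuckets  :: IORef Int                     -- total number of buckets
  }

newHT :: IO (HT k v)
newHT = do
  seg  <- newArray (0, sEGMENT_SIZE - 1) []     -- one segment of empty buckets
  segs <- newIORef [seg]
  n    <- newIORef sEGMENT_SIZE
  return (HT segs n)

-- Map a hash value to (segment, slot within the segment).
bucketFor :: Int -> HT k v -> IO (IOArray Int [(k, v)], Int)
bucketFor h ht = do
  n    <- readIORef (htBuckets ht)
  segs <- readIORef (htSegments ht)
  let i             = h `mod` n
      (segIx, slot) = i `divMod` sEGMENT_SIZE
  return (segs !! segIx, slot)

insertHT :: Eq k => (k -> Int) -> HT k v -> k -> v -> IO ()
insertHT hash ht k v = do
  (seg, slot) <- bucketFor (hash k) ht
  bucket      <- readArray seg slot
  writeArray seg slot ((k, v) : filter ((/= k) . fst) bucket)

lookupHT :: Eq k => (k -> Int) -> HT k v -> k -> IO (Maybe v)
lookupHT hash ht k = do
  (seg, slot) <- bucketFor (hash k) ht
  fmap (lookup k) (readArray seg slot)

main :: IO ()
main = do
  ht <- newHT
  insertHT id ht (42 :: Int) "answer"
  lookupHT id ht 42 >>= print               -- Just "answer"

With a 256-slot segment, a table holding a couple of dozen entries pre-allocates 256 bucket pointers instead of roughly a thousand, so each collection scans correspondingly less dead weight; the trade-off is that a genuinely large table allocates segments more often as it grows.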
The few benchmarks I've tried seem to indicate that Data.HashTable performance isn't overwhelming, and can often be bettered by Data.Map. I've never investigated too deeply, but GC overhead seems a very plausible cause.

Please, by all means, tweak the implementation and try some benchmarks; I'll incorporate any performance fixes that arise.

Cheers,
Simon
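(For anyone who wants to try this, a rough benchmark along the following lines exercises both structures. It is only a sketch, not a benchmark from this thread: it assumes the Data.HashTable interface that was in base at the time (H.new, H.insert, H.lookup, H.hashInt), and the workload of 100,000 Int keys is arbitrary.)

import qualified Data.HashTable as H
import qualified Data.Map as M
import Control.Exception (evaluate)
import Control.Monad (forM_)
import System.CPUTime (getCPUTime)

n :: Int
n = 100000                                  -- arbitrary workload size

-- Time an IO action with the CPU clock (getCPUTime is in picoseconds).
timed :: String -> IO a -> IO a
timed label act = do
  t0 <- getCPUTime
  r  <- act
  t1 <- getCPUTime
  putStrLn (label ++ ": " ++ show (fromIntegral (t1 - t0) / 1e12 :: Double) ++ "s")
  return r

main :: IO ()
main = do
  _ <- timed "Data.HashTable" $ do
         ht <- H.new (==) H.hashInt
         forM_ [1 .. n] $ \i -> H.insert ht i i
         forM_ [1 .. n] $ \i -> H.lookup ht i >>= evaluate
  _ <- timed "Data.Map" $ do
         let m = M.fromList [(i, i) | i <- [1 .. n]]
         evaluate (sum [v | i <- [1 .. n], Just v <- [M.lookup i m]])
  return ()

Running the program with +RTS -sstderr shows how much of any difference is GC time rather than mutator time.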