
On Oct 14, 2005, at 10:17 AM, Ketil Malde wrote:
Hi all,
I have a program that uses hash tables to store word counts. It can use a few large hash tables, or many small ones. The problem is that it takes an inordinate amount of time in the latter case, and profiling and the +RTS -sstderr statistics show that it is GC that is causing it (accounting for up to 99% of the run time(!)).
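For concreteness, here is a minimal sketch of the many-small-tables pattern being described, written against the Data.HashTable API that shipped with GHC's base library at the time (new, lookup, insert, update, toList, hashString); the document count and word lists are made up for illustration, and the real program will of course differ. Running the compiled program with +RTS -sstderr shows how much of the elapsed time goes to the garbage collector.

-- Illustration only: build many small word-count tables with the old
-- Data.HashTable from base, then run with +RTS -sstderr to see GC time.
import qualified Data.HashTable as H
import Control.Monad (forM, forM_)

-- Count word occurrences in one "document".
countWords :: [String] -> IO (H.HashTable String Int)
countWords ws = do
  t <- H.new (==) H.hashString
  forM_ ws $ \w -> do
    mc <- H.lookup t w
    case mc of
      Nothing -> H.insert t w 1
      Just c  -> H.update t w (c + 1) >> return ()
  return t

main :: IO ()
main = do
  -- Many small tables: 10000 "documents" of 24 distinct words each.
  let docs = replicate 10000 (map (("w" ++) . show) [1 .. 24 :: Int])
  tables <- forM docs countWords
  sizes  <- mapM (fmap length . H.toList) tables
  print (sum sizes)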
Is there any reason to expect this behavior?
Heap profiling shows that each hash table seems to incur a memory overhead of approx 5K, but apart from that, I'm not able to find any leaks or unexpected space consumption.
That "5K" number made me immediately suspicious, so I took a look at the source code to Data.HashTable. Sure enough, it's allocating a number of large IOArrays, which are filled with pointers. The practical upshot is that, for a hash table with (say) 24 entries, the GC must scan an additional 1000 pointers and discover that each one is []. I've seen other implementations of this two-level technique which use a smallish sEGMENT_SIZE in order to avoid excessive GC overhead for less-than-gigantic hash tables. This might be worth doing in the Data.HashTable implementation. [Curious: what (if anything) is being used to test Data.HashTable? I'd be willing to undertake very small amounts of fiddling if I could be sure I wasn't slowing down something which mattered.] -Jan-Willem Maessen
Suggestions?
-k

--
If I haven't seen further, it is by standing in the footprints of giants