
On Oct 14, 2005, at 10:17 AM, Ketil Malde wrote:
Hi all,
I have a program that uses hash tables to store word counts. It can use a few large hash tables, or many small ones. The problem is that it takes an inordinate amount of time in the latter case, and profiling and the +RTS -sstderr statistics show that it is GC that is causing it (accounting for up to 99% of the run time(!)).
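For concreteness, here is a minimal sketch of the many-small-tables pattern being described, written against the Data.HashTable API that shipped with GHC's base library at the time (new, lookup, insert, update, toList, hashString); the document count and word lists are made up for illustration, and the real program will of course differ. Running the compiled program with +RTS -sstderr shows how much of the elapsed time goes to the garbage collector.

-- Illustration only: build many small word-count tables with the old
-- Data.HashTable from base, then run with +RTS -sstderr to see GC time.
import qualified Data.HashTable as H
import Control.Monad (forM, forM_)

-- Count word occurrences in one "document".
countWords :: [String] -> IO (H.HashTable String Int)
countWords ws = do
  t <- H.new (==) H.hashString
  forM_ ws $ \w -> do
    mc <- H.lookup t w
    case mc of
      Nothing -> H.insert t w 1
      Just c  -> H.update t w (c + 1) >> return ()
  return t

main :: IO ()
main = do
  -- Many small tables: 10000 "documents" of 24 distinct words each.
  let docs = replicate 10000 (map (("w" ++) . show) [1 .. 24 :: Int])
  tables <- forM docs countWords
  sizes  <- mapM (fmap length . H.toList) tables
  print (sum sizes)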
Is there any reason to expect this behavior?
Heap profiling shows that each hash table seems to incur a memory overhead of approx 5K, but apart from that, I'm not able to find any leaks or unexpected space consumption.
That "5K" number made me immediately suspicious, so I took a look at the source code to Data.HashTable. Sure enough, it's allocating a number of large IOArrays, which are filled with pointers. The practical upshot is that, for a hash table with (say) 24 entries, the GC must scan an additional 1000 pointers and discover that each one is []. I've seen other implementations of this two-level technique which use a smallish sEGMENT_SIZE in order to avoid excessive GC overhead for less-than-gigantic hash tables. This might be worth doing in the Data.HashTable implementation. [Curious: what (if anything) is being used to test Data.HashTable? I'd be willing to undertake very small amounts of fiddling if I could be sure I wasn't slowing down something which mattered.] -Jan-Willem Maessen
Suggestions?
-k

--
If I haven't seen further, it is by standing in the footprints of giants