
On Nov 18, 2008, at 7:03 AM, Bulat Ziganshin wrote:
Hello Tillmann,
Tuesday, November 18, 2008, 2:46:47 PM, you wrote:
Why should a Haskell hash table need more memory then a Python hash table? I've heard that Data.HashTable is bad, so maybe writing a good one could be an option.
about Data.HashTable: it uses one huge array to store all the entries. the catch is that GC need to scan entire array on every (major) GC.
Actually, the scan on every major (full) GC is unavoidable. What *can* be avoided is a scan on every *minor* GC that occurs after an update. I forget what the exact strategy is here, but I know that one write used to cause the entire array to be re-scanned; what I don't remember is when/if the array transitions back to a state where it isn't being scanned by minor GC anymore.
using array of hashtables may improve situation a lot
Yes, this would be worth trying. Understanding the current GC strategy would make it easier to make the right tradeoffs here; we expect n insertions will touch O(n) subtables, so repeated insertion will make life worse if we're not careful. -Jan-Willem Maessen
plus check GC times for every version: +RTS -Soutfile
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe