
On Jan 4, 2006, at 5:30 AM, Simon Marlow wrote:
On 30 December 2005 01:23, Jan-Willem Maessen wrote:
Probably. The minimum table chunk size was rather large. I have been experimenting (tests are running even as I type) with alternate implementations of Data.HashTable. So far the winning implementation is one based on multiplicative hashing and simple table doubling. This seems to be consistently competitive with / faster than Data.Map. At this point my inclination is to make the whole suite available:
Data.HashTable.Class Data.HashTable.DataMap Data.HashTable.Doubling Data.HashTable.Flat Data.HashTable.Freeze Data.HashTable.Modulus Data.HashTable.Multiplicative Data.HashTable.Original Data.HashTable.Test Data.HashTable
I've separated out the hashing technique (Modulus, Multiplicative) from the hash table implementation. Note that a few of the above are bogus, and this is only a fraction of the implementations tried thus far.
I need to get distribution clearance for a bunch of this code from my employer, and figure out how to package it. The latter may be tricky, as Data.Hashtable is currently rather intimately tied to a bunch of rather tricky bits of GHC.
I wonder if you could put together a drop-in replacement for Data.HashTable that we can incorporate? There's not much point in us still providing the inefficient version as standard after you've done all this work to figure out how to do it better.
That'd be good---though what qualifies as the "efficient version" will, I think, change based on the GC changes you said you made (I haven't had the time/patience to try to bootstrap the development version of GHC). This is one of the reasons I've been saving so many bread crumbs along the way. All of the above will work as drop-in Data.HashTable replacements, with Data.HashTable simply re-exporting multiplicative hashing and table doubling. The tricky bit, as you probably know, is the rather intimate ties between Data.HashTable and the rest of the builtin code. This requires the shipping Data.HashTable library to import most things from the source, rather than from the place where they're made publicly available. Do you have any suggestions as to how I might go about testing that integration? -Jan
Cheers, Simon