Re: [Haskell-cafe] Boxed Mutable Arrays

16 Dec 2009

      On Wed, Dec 16, 2009 at 2:42 PM, Richard Kelsall
wrote:
...
Brad Larsen wrote:
...
I have considered using Data.IntMap to implement a sort of faux hash
table, e.g., introduce a Hashable class, and then use an IntMap to
lists of Hashable.  The result would be a pure, persistent ``hash
table''.  In such a case, however, lookup/insert/delete operations
would have average complexity logarithmic in the number of buckets.
I'll throw in an idea that has been nagging me about hash tables.
You could have a list of hash tables of increasing size. On an insert
collision in a smaller table all the entries get put into the next one
up. For a lookup you would have to check all the tables until you find
the hash.
e.g.
With three hash table in the list of these example sizes.
Table size 2^2 = 2 probable entries before collision.
          2^4 = 4
          2^6 = 8
h..h
...h.....h.h...h
h......h.......h....h...........h......h.h..........h...........
I expect if it's any good it has already been done.
Your asymptotics will take a hit if the height of the tower is logarithmic
in the array size, and you'll take a constant hit for bounded-height towers.

Consider that if you have density such that the biggest array has an
expected population, then your smaller arrays will be relatively drastically
over populated, so you'll bump up to the next largest array in more cases,
paying extra seeks against disproportionally heavily loaded buckets, whereas
the time could have been more gainfully employed seeking against the more
uniformly distributed larger bucket.

If you have a decent hash function, you won't have lumps in your
distribution, so having lumps in your bucket densities is purely a
pessimization. Especially when you've ensured that those buckets that are
going to be overpopulated are exactly the ones you are checking first.

You might consider looking at Witold Litwin's Sorted Linear Hash Table. I
have a hackish implementation on hackage somewhere, but bear in mind it was
the first piece of Haskell I'd ever written and the STM version I have
bottlenecks on the root of the tree, so would be better served by multiple
root TVars.

I can't find it on Hackage, but here it is:

http://comonad.com/haskell/thash/

-Edward Kmett