Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

23 Feb 2011

      Here's the approach I was attempting: a while back there was a proposal to
modify Data.Map to support a sort of zipper, including the following API:

data Location k a -- represents a map with a "hole" at a particular key

search :: k -> Map k a -> (Maybe a, Location k a)

key :: Location k a -> k

assign :: a -> Location k a -> Map k a
clear :: Location k a -> Map k a

and from here you might define

insert :: k -> a -> Map k a -> Map k a
insert k a m = case search k m of
   (_, loc) -> assign a loc

The surprising thing was that this approach *increased* the speed of the Map
implementation by something like 7%.  I don't know what happened to the
original proposal, but I was going to try something similar, with something
like

data HashMap k v = HM !Int !(Tree k v)

insert k v (HM sz tree) = case search k tree of
    (Nothing, loc) -> HM (sz + 1) (assign v loc)
    (Just _, loc) -> HM sz (assign v loc)

Louis Wasserman
wasserman.louis@gmail.com
http://profiles.google.com/wasserman.louis

On Tue, Feb 22, 2011 at 8:45 PM, Johan Tibell wrote:
...
On Tue, Feb 22, 2011 at 6:28 PM, Louis Wasserman
 wrote:
...
Apologies!
Accepted. :)
...
I was, in point of fact, working on such a patch, but I think I've been
convinced that it's a legitimate position for a package to take.  (I'm
also
curious, Johann, about the approach you figured out that didn't cost a
word
per node!)
I'm working on a patch that provides O(1) size right now. The trick is
to define HashMap as:
data HashMap k v = HM {-# UNPACK #-} !Int !(Tree k v)
data Tree k v
   = Nil
   | Tip {-# UNPACK #-} !Hash
         {-# UNPACK #-} !(FL.FullList k v)
   | Bin {-# UNPACK #-} !Prefix
         {-# UNPACK #-} !Mask
         !(Tree k v)
         !(Tree k v)
And have insert, delete, and other operations that change the size
return both the updated tree and the size delta (-1, 0, or 1 in the
case of insert/delete). The size delta is then applied to the stored
size.
At the moment I'm trying to figure out the performance impact of
making this change.
Johan