
On Tue, Feb 21, 2012 at 3:58 PM, Ben Gamari wrote:
On Tue, 21 Feb 2012 11:25:05 -0800, Johan Tibell wrote:
On Tue, Feb 21, 2012 at 11:20 AM, Ben Gamari wrote:
This is certainly an option. It's actually been suggested that I use HashMaps instead of IntMaps to get more uniform use of the key-space. I have been a little worried that the cost of hashing would outweigh the benefit of shallower trees, but I suppose memory accesses are expensive. That being said, I could certainly just use the Enum instance.
In the case of Ints and newtypes thereof, hashing is very cheap: a no-op or a multiplication by a large prime.
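A minimal sketch of what that cheap hash looks like in practice (assuming the hashable package): a newtype'd Int simply inherits Int's Hashable instance, so hashing it is plain arithmetic, not a structure traversal.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Data.Hashable (Hashable, hash)

-- A sequential token identifier, as in the discussion above.
newtype Token = Token Int
  deriving (Eq, Show, Hashable)

main :: IO ()
main = do
  -- Hashing an Int (and hence a Token) is a cheap arithmetic
  -- operation; the exact values are implementation-defined.
  print (hash (Token 1))
  print (hash (Token 2))
```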
Sure. In my application (machine learning) we were mapping from tokens (short strings, generally less than 15 bytes) to sequential newtype'd Ints (newtype Token = Token Int). However, the fact that these identifiers were sequential meant that IntMap's trees would be close to the worst-case log(N) depth.
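A hypothetical sketch of the swap under discussion (the names and toy data are illustrative, not from the original application): keying a HashMap directly by the newtype'd token, versus unwrapping it into an IntMap.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Data.Hashable (Hashable)
import qualified Data.IntMap as IM
import qualified Data.HashMap.Strict as HM

newtype Token = Token Int
  deriving (Eq, Show, Hashable)

-- Sequential token ids, the pattern described above.
tokens :: [(Token, String)]
tokens = [ (Token i, "word" ++ show i) | i <- [0 .. 9] ]

-- IntMap, keyed by unwrapping the newtype.
asIntMap :: IM.IntMap String
asIntMap = IM.fromList [ (i, v) | (Token i, v) <- tokens ]

-- HashMap, keyed directly by the newtype via its Hashable instance.
asHashMap :: HM.HashMap Token String
asHashMap = HM.fromList tokens

main :: IO ()
main = do
  print (IM.lookup 3 asIntMap)          -- Just "word3"
  print (HM.lookup (Token 3) asHashMap) -- Just "word3"
```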
Either way, the tries only contain levels on which keys differ, so tree depths shouldn't vary by more than 1 or so.

HashMap uses lsb-first splitting in its current incarnation, so you end up balancing contiguous keys across the trie. This is the main argument in favor of low-order-first splitting. The main argument against is that high-order-first splitting yields a standard binary search tree for lookup operations if you encode the pivots appropriately. I don't recall which method is used by Johan's wide-fanout tries.

I seem to recall that at wider fanouts you're happier (for GC reasons) churning the right spine in response to contiguous insertions, rather than churning the entire tree.

The takeaway: if you're seeing performance problems from contiguous insertion, it might be worth comparing with HashMap.

-Jan-Willem Maessen
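The splitting-order argument above can be made concrete with a small sketch. The helper names and the 4-bit fanout here are assumptions for illustration, not HashMap's actual internals: they show how a run of contiguous keys fans out across children under low-order-first splitting but piles into a single subtree under high-order-first splitting.

```haskell
import Data.Bits (shiftR, (.&.))

-- Child index at a given trie level, fanout 16 (4 bits per level),
-- illustrated on a 16-bit key space.

-- Low-order-first: level 0 looks at the least significant 4 bits.
lsbIndex :: Int -> Int -> Int
lsbIndex level k = (k `shiftR` (4 * level)) .&. 0xF

-- High-order-first: level 0 looks at the most significant 4 bits.
msbIndex :: Int -> Int -> Int
msbIndex level k = (k `shiftR` (12 - 4 * level)) .&. 0xF

main :: IO ()
main = do
  -- Contiguous keys 0..15 land in 16 different root children
  -- under lsb-first splitting...
  print (map (lsbIndex 0) [0 .. 15])  -- [0,1,2,...,15]
  -- ...but all in the same root child under msb-first splitting.
  print (map (msbIndex 0) [0 .. 15])  -- [0,0,0,...,0]
```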