
On Tue, Feb 21, 2012 at 3:58 PM, Ben Gamari wrote:
On Tue, 21 Feb 2012 11:25:05 -0800, Johan Tibell wrote:
On Tue, Feb 21, 2012 at 11:20 AM, Ben Gamari wrote:
This is certainly an option. It's actually been suggested that I use HashMaps instead of IntMaps to get more uniform use of the key-space. I have been a little worried that the cost of hashing would outweigh the benefit of shallower trees, but I suppose memory accesses are expensive. That being said, I could certainly just use the Enum instance.
In the case of Ints and newtypes thereof, hashing is very cheap: a no-op or a multiplication by a large prime.
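A minimal sketch of what that cheap hash looks like in practice (assuming the hashable package): a newtype'd Int simply inherits Int's Hashable instance, so hashing it is plain arithmetic, not a structure traversal.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Data.Hashable (Hashable, hash)

-- A sequential token identifier, as in the discussion above.
newtype Token = Token Int
  deriving (Eq, Show, Hashable)

main :: IO ()
main = do
  -- Hashing an Int (and hence a Token) is a cheap arithmetic
  -- operation; the exact values are implementation-defined.
  print (hash (Token 1))
  print (hash (Token 2))
```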
Sure. In my application (machine learning) we were mapping from tokens (short strings, generally less than 15 bytes) to sequential newtype'd Ints (newtype Token = Token Int). However, the fact that these identifiers were sequential meant that IntMap's trees would be close to the worst-case log(N) depth.
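A hypothetical sketch of the swap under discussion (the names and toy data are illustrative, not from the original application): keying a HashMap directly by the newtype'd token, versus unwrapping it into an IntMap.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Data.Hashable (Hashable)
import qualified Data.IntMap as IM
import qualified Data.HashMap.Strict as HM

newtype Token = Token Int
  deriving (Eq, Show, Hashable)

-- Sequential token ids, the pattern described above.
tokens :: [(Token, String)]
tokens = [ (Token i, "word" ++ show i) | i <- [0 .. 9] ]

-- IntMap, keyed by unwrapping the newtype.
asIntMap :: IM.IntMap String
asIntMap = IM.fromList [ (i, v) | (Token i, v) <- tokens ]

-- HashMap, keyed directly by the newtype via its Hashable instance.
asHashMap :: HM.HashMap Token String
asHashMap = HM.fromList tokens

main :: IO ()
main = do
  print (IM.lookup 3 asIntMap)          -- Just "word3"
  print (HM.lookup (Token 3) asHashMap) -- Just "word3"
```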
Either way, the tries only contain levels on which keys differ, so tree depths shouldn't vary by more than 1 or so.

HashMap uses lsb-first splitting in its current incarnation, so you end up balancing contiguous keys across the trie. This is the main argument in favor of low-order-first splitting. The main argument against is that high-order-first splitting yields a standard binary search tree for lookup operations if you encode the pivots appropriately. I don't recall which method is used by Johan's wide-fanout tries.

I seem to recall that at wider fanouts you're happier (for GC reasons) churning the right spine in response to contiguous insertions, rather than churning the entire tree.

The takeaway: if you're seeing performance problems from contiguous insertion, it might be worth comparing with HashMap.

-Jan-Willem Maessen
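The splitting-order argument above can be made concrete with a small sketch. The helper names and the 4-bit fanout here are assumptions for illustration, not HashMap's actual internals: they show how a run of contiguous keys fans out across children under low-order-first splitting but piles into a single subtree under high-order-first splitting.

```haskell
import Data.Bits (shiftR, (.&.))

-- Child index at a given trie level, fanout 16 (4 bits per level),
-- illustrated on a 16-bit key space.

-- Low-order-first: level 0 looks at the least significant 4 bits.
lsbIndex :: Int -> Int -> Int
lsbIndex level k = (k `shiftR` (4 * level)) .&. 0xF

-- High-order-first: level 0 looks at the most significant 4 bits.
msbIndex :: Int -> Int -> Int
msbIndex level k = (k `shiftR` (12 - 4 * level)) .&. 0xF

main :: IO ()
main = do
  -- Contiguous keys 0..15 land in 16 different root children
  -- under lsb-first splitting...
  print (map (lsbIndex 0) [0 .. 15])  -- [0,1,2,...,15]
  -- ...but all in the same root child under msb-first splitting.
  print (map (msbIndex 0) [0 .. 15])  -- [0,0,0,...,0]
```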