Re: [Haskell-cafe] ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps

2 Jul 2009

      Alex Mason wrote:
...
TernaryTrees is a package that extends Data.Set ad Data.Map with some 
ternary tree structures, based on the article 
[http://www.pcplus.co.uk/node/3074/] .
For the string (or rather ByteString) version:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie

Which has a number of other significant performance improvements (e.g. 
node fusion, ByteString instead of String) and a highly expressive 
interface. Because it uses ByteStrings it can trie any type which can be 
serialized into a vector of bits[1], albeit indirectly.

The real trick with tries is not in just having them[2], it's in having 
the right interface to make use of what they're good at. For example, if 
I have multiple tries, I'd like to merge them without doing it element 
by element[3]. Or if I know I'm going to be making a number of similar 
queries, it'd be nice if I could cache my position in the trie[4] to 
avoid repeating the work for the prefixes of all my queries[5]. Using 
tricks like these leads to significant improvements over using them like 
hashtables; tries aren't hashtables just like lists aren't arrays.

[1] Where the ByteString encodings are equal iff the values are equal. 
Ideally, the encoding should ensure that for any given set of queries, 
the discriminative bits are closest to the front. You can also use 
lossless compression algorithms (e.g. Golomb codes) to shrink the size 
of the keys if you happen to know their distribution; and if your 
queries are already compressed, you don't need to decompress them to do 
lookup!

[2] And they seem to be all the rage for people reinventing them these 
days...

[3] Supported by Data.Trie.mergeBy for the general case, with unionL and 
unionR for common use cases.

[4] Supported by Data.Trie.lookupBy

[5] In the general case, this is trie intersection. You can hack this 
together with the bytestring-trie-0.1.4 interface, but an efficient 
version is on my todo list. A special case of intersection is provided 
by Data.Trie.submap.

-- 
Live well,
~wren

Re: [Haskell-cafe] ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps

wren ng thornton