Re[2]: [Haskell-cafe] Pure hashtable library

28 Aug 2008

      Hello Jules,

Wednesday, August 27, 2008, 7:59:55 PM, you wrote:
...
...
...
To get "much better efficient" than a trie, the hash function has to be
so fast that it is faster than following (log n) pointers
afaiu, trie search follows n pointers
...
No.
...
"n" is the number of strings in my data set (dictionary).
...
If I have "n" strings the average string length is asymptotically, in 
some sense, "log n". Of course for particular data sets it's may be more .
...
But "log n" is the length of the shortest unique coding, it's also the
number of characters you typically need to traverse before you have 
reached a unique prefix, at which point your trie can short-circuit.
...
I appreciate that I didn't define my terminology ;) You might have a 
different "n".
...
I repeat my challenge "Prove it". I will be interested to see how much a
good immutable hash outperforms Data.Map.
...
I would then also be interested to see how much it outperforms a decent
Data.Map (such as your own AVL one) and a decent trie.
1) i don't have time/interest to do it, so i just pushed the idea

2) what you have proved there is that for set of n randomly chosen
strings trie lookup need O(log n) time - the same is true for hash

3) practical performance of trie will suffer by the need to follow
several trie levels each providing several elements to check. OTOH
hash lookup will usually need only 1-2 checks, including one check on
full string size

4) using ByteString for hash indexes would make lookups much faster

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com

Re[2]: [Haskell-cafe] Pure hashtable library

Bulat Ziganshin