
Hello Jules, Wednesday, August 27, 2008, 7:59:55 PM, you wrote:
To get "much better efficient" than a trie, the hash function has to be so fast that it is faster than following (log n) pointers
afaiu, trie search follows n pointers
No.
"n" is the number of strings in my data set (dictionary).
If I have "n" strings the average string length is asymptotically, in some sense, "log n". Of course for particular data sets it's may be more .
But "log n" is the length of the shortest unique coding, it's also the number of characters you typically need to traverse before you have reached a unique prefix, at which point your trie can short-circuit.
I appreciate that I didn't define my terminology ;) You might have a different "n".
I repeat my challenge "Prove it". I will be interested to see how much a good immutable hash outperforms Data.Map.
I would then also be interested to see how much it outperforms a decent Data.Map (such as your own AVL one) and a decent trie.
1) i don't have time/interest to do it, so i just pushed the idea 2) what you have proved there is that for set of n randomly chosen strings trie lookup need O(log n) time - the same is true for hash 3) practical performance of trie will suffer by the need to follow several trie levels each providing several elements to check. OTOH hash lookup will usually need only 1-2 checks, including one check on full string size 4) using ByteString for hash indexes would make lookups much faster -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com