Re: [Haskell-cafe] Pure hashtable library

Someone wrote:
trie: O(len)*O(width) hashed trie: O(len) hash: O(len)
If "width" here refers to the branching factor of the trie, it's actually O(len.lg(width)), and the width that matters is not the *possible* number of choices but the *actual* number. The great problem with hash tables is devising good hash functions. There is a vast literature about hash tables, but there is very little about how to design good hash functions for anything other than numbers and maybe strings. It would be nice to think that _had__ there been plenty of good advice about constructing good hash functions the Java library implementors would have taken it. As it is, their hashing functions for collections leave much to be desired.

I would much rather have a pure Trie that is foldable. If we have a Trie, we get a space efficient sorted list, too. -- _jsn

Jason Dusek wrote:
I would much rather have a pure Trie that is foldable. If we have a Trie, we get a space efficient sorted list, too.
Well, Data.Sequence can be used as a space efficient sorted list which is Foldable - if you make the decision to insert elements into it in a sorted way, obviously. It's a fingertree not a trie, of course. What advantages would a Trie have over Data.Sequence? Jules

Jules Bean
Jason Dusek wrote:
I would much rather have a pure Trie that is foldable. If we have a Trie, we get a space efficient sorted list, too.
Well, Data.Sequence can be used as a space efficient sorted list which is Foldable - if you make the decision to insert elements into it in a sorted way, obviously.
What advantages would a Trie have over Data.Sequence?
A trie is guaranteed sorted -- so using a trie amounts to a "type level" guarantee for binary search and any other algorithm that relies on sortedness. -- _jsn

Jason Dusek wrote:
Jules Bean
wrote: I would much rather have a pure Trie that is foldable. If we have a Trie, we get a space efficient sorted list, too. Well, Data.Sequence can be used as a space efficient sorted
Jason Dusek wrote: list which is Foldable - if you make the decision to insert elements into it in a sorted way, obviously.
What advantages would a Trie have over Data.Sequence?
A trie is guaranteed sorted -- so using a trie amounts to a "type level" guarantee for binary search and any other algorithm that relies on sortedness.
...No more so than a simple wrapper over a Data.Sequence which puts the elements in the right place. Insert for Data.Sequence is log(i) where i is the position of the insertion; clearly bounded by log(n). toList is O(n) and index is (at worst) log(i). I think the corresponding operations with tries are log(n), so asymptotically tries are identical for 'uniformly distributed' keys although fingertrees are faster if there is a bias towards elements near the ends of the lists. So, it's all in the constants, isn't it? Jules

Jules Bean
Jason Dusek wrote:
Jules Bean
wrote: Jason Dusek wrote:
I would much rather have a pure Trie that is foldable. If we have a Trie, we get a space efficient sorted list, too.
Well, Data.Sequence can be used as a space efficient sorted list which is Foldable - if you make the decision to insert elements into it in a sorted way, obviously.
What advantages would a Trie have over Data.Sequence?
A trie is guaranteed sorted -- so using a trie amounts to a "type level" guarantee for binary search and any other algorithm that relies on sortedness.
...No more so than a simple wrapper over a Data.Sequence which puts the elements in the right place.
If by a wrapper you mean a function, well that's not type level at all. A binary search that insists on tries will, with a correct trie implementation, behave correctly on every input; a binary search that works with `Data.Sequence` will fail on a substantial portion of the inputs accepted by the type system, regardless of the data structure's correctness. -- _jsn

Jason Dusek wrote:
Jules Bean
wrote: Jules Bean
wrote: Jason Dusek wrote:
I would much rather have a pure Trie that is foldable. If we have a Trie, we get a space efficient sorted list, too. Well, Data.Sequence can be used as a space efficient sorted list which is Foldable - if you make the decision to insert elements into it in a sorted way, obviously.
What advantages would a Trie have over Data.Sequence? A trie is guaranteed sorted -- so using a trie amounts to a "type level" guarantee for binary search and any other algorithm that relies on sortedness. ...No more so than a simple wrapper over a Data.Sequence which
Jason Dusek wrote: puts the elements in the right place.
If by a wrapper you mean a function, well that's not type level at all. A binary search that insists on tries will, with a correct trie implementation, behave correctly on every input; a binary search that works with `Data.Sequence` will fail on a substantial portion of the inputs accepted by the type system, regardless of the data structure's correctness.
No. I meant a newtype with a non-exported constructor and exporting only methods which preserve sortedness. Jules

Hello Richard, Thursday, August 28, 2008, 5:28:46 AM, you wrote:
trie: O(len)*O(width) hashed trie: O(len) hash: O(len)
If "width" here refers to the branching factor of the trie, it's actually O(len.lg(width)), and the width that matters is not the *possible* number of choices but the *actual* number.
i thought about using list to hold all variations at the trie node. with a (balanced) tree at each node we will have even more overheads
The great problem with hash tables is devising good hash functions. There is a vast literature about hash tables, but there is very little about how to design good hash functions for anything other than numbers and maybe strings.
1. tries also work only for strings and other lists 2. i don't want to go into discussing well-known pluses and minuses of data structures. my point was just that we have great alternative to trees/tries which should be implemented many years ago. i've used a lots of assoclists in my program, sometimes this really degraded performance (although it's yet another question - why tree/trie structures doesn't provide simple List.lookup replacement function and why i'm so lazy to still not learning their APIs) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:
Hello Richard,
Thursday, August 28, 2008, 5:28:46 AM, you wrote:
trie: O(len)*O(width) hashed trie: O(len) hash: O(len)
If "width" here refers to the branching factor of the trie, it's actually O(len.lg(width)), and the width that matters is not the *possible* number of choices but the *actual* number.
i thought about using list to hold all variations at the trie node. with a (balanced) tree at each node we will have even more overheads
The great problem with hash tables is devising good hash functions. There is a vast literature about hash tables, but there is very little about how to design good hash functions for anything other than numbers and maybe strings.
1. tries also work only for strings and other lists 2. i don't want to go into discussing well-known pluses and minuses of data structures. my point was just that we have great alternative to trees/tries which should be implemented many years ago. i've used a lots of assoclists in my program, sometimes this really degraded performance (although it's yet another question - why tree/trie structures doesn't provide simple List.lookup replacement function and why i'm so lazy to still not learning their APIs)
Seems like you can make a pure hashtable by unsafePerformIO on the impure one, and exporting only 'lookup'.. -- Don

Hello Don, Thursday, August 28, 2008, 10:32:43 AM, you wrote:
Seems like you can make a pure hashtable by unsafePerformIO on the impure one, and exporting only 'lookup'..
probably yes, but it will lose a bit of performance due to incremental building of hashtable. actually, writing HT module from scratch is very easy - all we need is a prime searching function (in order to establish array size). everything else: data HT a b = HT (a->Int) (Array Int [(a,b)]) create size hash list = HT hashfunc (accumArray (flip (:)) [] (0, arrsize-1) (map (\(a,b) -> (hashfunc a,b)) list) ) where arrsize = head$ filter (>size)$ iterate (\x->3*x+1) 1 hashfunc a = hash a `mod` arrsize lookup a (HT hash arr) = List.lookup a (arr!hash a) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (5)
-
Bulat Ziganshin
-
Don Stewart
-
Jason Dusek
-
Jules Bean
-
Richard A. O'Keefe