
Thomas Conway wrote:
On 6/18/07, apfelmus
wrote: Do you need the hash function for a hash table or for fingerprints/signatures? In the former case, Tries are a much better choice. For launching your own trie, see also
I'm actually using them for bucket addressing for external indexing with a linear hash table. (Yes, the hashing does count, because buckets are cached in memory.)
It always seems something of a shame that if you want all the benefits of lazy functional programming, you too often have to settle for O(n log n) data structures.
Trie it is, not balanced tree. A logarithm in this would be new to me. :) As a side node, Mr. Exp says: 64 is large enough for the size needs of any logarithm.
type HashTable k v = TVar (Array Int (TVar [(k,v)]))
Don't you want a TArray Int [(k,v)]? In any case, you could be able to set up an infinite trie and have lazy evaluation allocate space as needed: type Trie a = Branch (TVar a) (Trie a) (Trie a) -- an infinite tree with different TVars at every branch {-# NOINLINE tree -#} tree :: a -> Trie a tree x = Binary (unsafePerformIO $ newTVarIO x) (tree x) (tree x) The intention is that the different threads concurrently evaluate the suspension as far as they need to lookup/insert a key. The associated value can be put into the fresh TVar they find there. The tree structure itself is left untouched. Of course, it is imperative that all threads see the same TVars. I don't know how thunk updates are handled in a concurrent setting, but as they are write-once only and referentially transparent, i see no major problem with them. Regards, apfelmus PS: Hm, it's probably safer to code the lazy evaluation in the trie manually. As a side effect, you are then able to garbage collect unused but expanded parts of the trie from time to time.