RE: Data.HashTable and duplicate keys

On 28 September 2004 12:49, George Russell wrote:
Einar wrote (snipped):
I recently noticed that Data.HashTable silently accepts duplicate keys and keeps all the values. This is very counterintuitive from the interface and makes updating a hashtable quite tedious (lookup + delete + insert).
Gosh, I'm glad *someone* told me of this bizarre behaviour. Why on earth hash tables should mimic association lists of all things is a mystery to me. (Should GHC also insert delay loops to simulate their poor performance?)
Sigh, off I go to fix my code. Just as well I only use hash tables in one place.
Making insert do delete+insert is pretty inefficient if you know that the key isn't in the table. Providing update (==delete+insert) seems a reasonable solution, no? Cheers, Simon

"Simon Marlow"
Making insert do delete+insert is pretty inefficient if you know that the key isn't in the table. Providing update (==delete+insert) seems a reasonable solution, no?
This depends whether the implementation uses buckets for elements with the same hash modulo size (like in OCaml), or stores elements directly in the array, with some conflict resolution strategy (like in Python). I don't know which scheme is more efficient in general. Buckets are easier to implement if deletion is to be supported, but a flat array may have a faster optimistic case. Their memory overhead is similar because a flat array can't be as dense as an array of buckets. -- __("< Marcin Kowalczyk \__/ qrczak@knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/

Simon Marlow wrote (snipped):
Making insert do delete+insert is pretty inefficient if you know that the key isn't in the table. Providing update (==delete+insert) seems a reasonable solution, no?
You could provide a general function generalUpdate :: HashTable key value -> key -> (Maybe value -> IO (Maybe value,b)) -> IO b and then start from there. This would mean you could to lookup+delete+insert in just *one* operation (rather than 3). I really don't see the point of having multiple values for the same key; it's just confusing. If that's really what users want they can use a HashTable key [value] rather than a HashTable key value.
participants (4)
-
Christian Maeder
-
George Russell
-
Marcin 'Qrczak' Kowalczyk
-
Simon Marlow