
On Mar 6, 2008, at 9:28 AM, Johannes Waldmann wrote:
Hello.
When I insert 50 values in an empty hashtable, I get longest chains of length 10. Like this:
import Data.HashTable ( HashTable ) import qualified Data.HashTable as H import System.Random
main = do t <- H.new ( == ) H.hashInt sequence $ replicate 50 $ do x <- randomRIO ( 0 , 1000 ) H.update t x () lc <- H.longestChain t print lc
output: [(831,()),(346,()),(773,()),(475,()),(812,()),(307,()),(113,()),(637, ()),(100,())]
The Data.HashTable code was tuned to favor relatively high bucket loads. Is that a good idea (I ask, having done some of this tuning myself)? It's been a long time since the code was re-tuned, and it's likely some of the decisions ought to change in light of changes in GHC's compiler, GC, and so forth. At the moment: -- ----------------------------------------------------------------------------- -- Parameters tABLE_MAX :: Int32 tABLE_MAX = 32 * 1024 * 1024 -- Maximum size of hash table tABLE_MIN :: Int32 tABLE_MIN = 8 hLOAD :: Int32 hLOAD = 7 -- Maximum average load of a single hash bucket hYSTERESIS :: Int32 hYSTERESIS = 64 -- entries to ignore in load computation {- Hysteresis favors long association-list-like behavior for small tables. -} It's likely modern versions of GHC would fare better with smaller settings of tABLE_MIN, hLOAD, and hYSTERESIS. I seem to recall that doubling the table used to be rather hard on the GC. -Jan
This is not what I expected (quite the contrary). Did I miss something here? The access time is linear in the length of the longest chain? Perhaps the initial table (produced by "new") is too small, so the table is always overloaded?
Looking at http://java.sun.com/javase/6/docs/api/java/util/HashMap.html they start out with table size 16 by default *and* (more importantly) they have a constructor that allows to set the initial table size and the load factor.
Best regards, Johannes Waldmann. _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries