
On Mar 6, 2008, at 2:15 PM, Don Stewart wrote:
jmaessen:
On Mar 6, 2008, at 9:28 AM, Johannes Waldmann wrote:
Hello.
When I insert 50 values into an empty hash table, I get longest chains of length 10, like this:
import Data.HashTable ( HashTable )
import qualified Data.HashTable as H
import System.Random

main = do
    t <- H.new (==) H.hashInt
    sequence_ $ replicate 50 $ do
        x <- randomRIO (0, 1000)
        H.update t x ()
    lc <- H.longestChain t
    print lc
output: [(831,()),(346,()),(773,()),(475,()),(812,()),(307,()),(113,()),(637,()),(100,())]
The Data.HashTable code was tuned to favor relatively high bucket loads. Is that a good idea (I ask, having done some of this tuning myself)? It's been a long time since the code was re-tuned, and it's likely some of the decisions ought to change in light of changes in GHC's compiler, GC, and so forth. At the moment:
After modifying it for the shootout, I noticed it's missing a lot of inline stuff (i.e., it tends not to get inlined or specialised).
Agreed. I'd be curious to see which is more important in practice: picking the right bucket and loading parameters, or doing better specialization. Because the hash function is wrapped by the table itself (as opposed to, e.g., using a Hashable class; I think HBC did this), it's a bit harder to get meaningful specialization: the table itself acts as the dictionary for the hash function, which is the thing one wants inlined. A bit of hand-rolled worker/wrapper might do wonders, though. -Jan
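A minimal sketch of the class-based design Jan describes, assuming a toy chained table; the Hashable class, Table type, and update below are illustrative, not the actual Data.HashTable API:

import Data.Array.IO
import Data.Int ( Int32 )

class Hashable k where
    hash :: k -> Int32

instance Hashable Int where
    hash = fromIntegral  -- placeholder; a real instance would mix bits

-- A tiny chained table: a fixed array of buckets, each an association list.
data Table k v = Table (IOArray Int32 [(k, v)])

newTable :: Int32 -> IO (Table k v)
newTable n = do
    buckets <- newArray (0, n - 1) []
    return (Table buckets)

-- The hash function arrives via the Hashable dictionary rather than being
-- stored in the table, so a SPECIALIZE pragma lets GHC compile a
-- monomorphic copy of 'update' with 'hash' inlined at Int.
update :: (Eq k, Hashable k) => Table k v -> k -> v -> IO ()
update (Table buckets) k v = do
    (lo, hi) <- getBounds buckets
    let i = lo + hash k `mod` (hi - lo + 1)
    chain <- readArray buckets i
    writeArray buckets i ((k, v) : filter ((/= k) . fst) chain)

{-# SPECIALIZE update :: Table Int v -> Int -> v -> IO () #-}

With the function in a class dictionary, a call such as update t (42 :: Int) () can be compiled against the specialised copy; with the function stored inside the table, GHC only ever sees a call through an unknown closure.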
Needs a big overhaul.
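One hedged reading of the hand-rolled worker/wrapper Jan suggests (lookupChain here is illustrative, not Data.HashTable code): keep a small INLINE wrapper that closes over the function argument, and put the recursion in a local worker, so that when the wrapper inlines at a call site where the function is known, GHC can optimise the loop with it.

-- The INLINE'd wrapper disappears at each call site; the worker 'go' is
-- then compiled with 'eq' and 'k' known, e.g. (==) at Int.
lookupChain :: (k -> k -> Bool) -> k -> [(k, v)] -> Maybe v
lookupChain eq k = go
  where
    go []               = Nothing
    go ((k', v) : rest)
        | eq k k'       = Just v
        | otherwise     = go rest
{-# INLINE lookupChain #-}

Whether this wins in practice depends on the call sites GHC actually sees, which is presumably what a re-tuning of the library would have to measure.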