
On Mar 6, 2008, at 2:15 PM, Don Stewart wrote:
jmaessen:
On Mar 6, 2008, at 9:28 AM, Johannes Waldmann wrote:
Hello.
When I insert 50 values into an empty hash table, I get longest chains of length 10, like this:
import Data.HashTable ( HashTable )
import qualified Data.HashTable as H
import System.Random

main = do
    t <- H.new (==) H.hashInt
    sequence_ $ replicate 50 $ do
        x <- randomRIO (0, 1000)
        H.update t x ()
    lc <- H.longestChain t
    print lc
output: [(831,()),(346,()),(773,()),(475,()),(812,()),(307,()),(113,()),(637,()),(100,())]
The Data.HashTable code was tuned to favor relatively high bucket loads. Is that a good idea (I ask, having done some of this tuning myself)? It's been a long time since the code was re-tuned, and it's likely some of the decisions ought to change in light of changes in GHC's compiler, GC, and so forth. At the moment:
After modifying it for the shootout, I noticed it's missing a lot of inline stuff (i.e., it tends not to get inlined or specialised).
Agreed. I'd be curious to see which is more important in practice: picking the right bucket and loading parameters, or doing better specialization. Because the hash function is wrapped by the table itself (as opposed to, e.g., using a Hashable class; I think HBC did this), it's a bit harder to get meaningful specialization: the table itself acts as the dictionary for the hash function, which is the thing one wants inlined. A bit of hand-rolled worker/wrapper might do wonders, though. -Jan
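A minimal sketch of the class-based design Jan describes, assuming a toy chained table; the Hashable class, Table type, and update below are illustrative, not the actual Data.HashTable API:

import Data.Array.IO
import Data.Int ( Int32 )

class Hashable k where
    hash :: k -> Int32

instance Hashable Int where
    hash = fromIntegral  -- placeholder; a real instance would mix bits

-- A tiny chained table: a fixed array of buckets, each an association list.
data Table k v = Table (IOArray Int32 [(k, v)])

newTable :: Int32 -> IO (Table k v)
newTable n = do
    buckets <- newArray (0, n - 1) []
    return (Table buckets)

-- The hash function arrives via the Hashable dictionary rather than being
-- stored in the table, so a SPECIALIZE pragma lets GHC compile a
-- monomorphic copy of 'update' with 'hash' inlined at Int.
update :: (Eq k, Hashable k) => Table k v -> k -> v -> IO ()
update (Table buckets) k v = do
    (lo, hi) <- getBounds buckets
    let i = lo + hash k `mod` (hi - lo + 1)
    chain <- readArray buckets i
    writeArray buckets i ((k, v) : filter ((/= k) . fst) chain)

{-# SPECIALIZE update :: Table Int v -> Int -> v -> IO () #-}

With the function in a class dictionary, a call such as update t (42 :: Int) () can be compiled against the specialised copy; with the function stored inside the table, GHC only ever sees a call through an unknown closure.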
Needs a big overhaul.
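One hedged reading of the hand-rolled worker/wrapper Jan suggests (lookupChain here is illustrative, not Data.HashTable code): keep a small INLINE wrapper that closes over the function argument, and put the recursion in a local worker, so that when the wrapper inlines at a call site where the function is known, GHC can optimise the loop with it.

-- The INLINE'd wrapper disappears at each call site; the worker 'go' is
-- then compiled with 'eq' and 'k' known, e.g. (==) at Int.
lookupChain :: (k -> k -> Bool) -> k -> [(k, v)] -> Maybe v
lookupChain eq k = go
  where
    go []               = Nothing
    go ((k', v) : rest)
        | eq k k'       = Just v
        | otherwise     = go rest
{-# INLINE lookupChain #-}

Whether this wins in practice depends on the call sites GHC actually sees, which is presumably what a re-tuning of the library would have to measure.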