
On 6/19/07, Jan-Willem Maessen
This looks like a version of the Bob Jenkins hash function from burtleburtle.net. I implemented the 32-bit version of this as follows:
Indeed. It's the 64-bit version. 32 bits is oh-so-last-century. ;-)
mix :: Int32 -> Int32 -> Int32 -> (Int32 -> Int32 -> Int32 -> a) -> a [deletia] I mention this because your code writes the whole thing out longhand---which might be faster, or might not, but certainly misses the highest-level structural patterns in the original. Your use of a data type to represent triples is probably better nowadays than my rather quirky use of CPS (in other words, this could have been a function Triple -> Triple instead of the rather odd type you see above).
Well, the main difference, is the CPS version just folds the uses of (.) into the individual groups of arithmetic. Actually, without knowing what GHC *actually* does, it is conceivable that a compiler could do better with the CPS version, precisely because there's one less layer of abstraction to inline/fold away. I'll have to give it a go if I get a chance (this is code for my Real Job (TM), and tuning the life out of the code isn't necessary right now, but I thought I'd float this, as much because I might learn something, as anything. The thinkon flux in this list is pretty favourable).
That said, I assume you instrumented your code and determined that hash collisions are actually a bottleneck for you, and that a hash table is the right structure to begin with?
I'm implementing a species of database (trade secrets, blah, blah, blah), which needs to handle *large* data sets, and actually, an external B-tree is probably better than an external hash table. I decided to do a hash table first though to iron out some of the issues to do with concurrent external structures. A linear hash table is pretty simple compared to a B-tree. The Jenkins' hash function comes into it because you really want to avoid overfull buckets. It's also one of those cases, where you'd like the compiler to do a good job. If the compiler can't do a good job of straight line operations on [essentially] built in data types, then what hope have we of convincing anyone, including ourselves, that Haskell is fit for Real Programs (TM).
When last I checked the result was faster than Data.Map, but not by much, and using strings probably wipes out that advantage vs. tries.
<edna-e-mode-voice> No Strings, darling! No Strings. </edna-e-mode-voice> cheers, T. -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.