
I wrote:
More interesting are Maybe, Either, lists, tuples, Data.Map, Data.Set, etc. These are far from trivial; I did some work on it for Python. If you think you can just do something fairly simple and reasonable and hope that it will have the right statistical properties, I promise you that it won't work.
Thomas Schilling wrote:
Right, murmurhash2 is a very good and well-tested hash function.
It's good for strings. So we can use it for String and Data.Text. But how about [Data.Map Data.Text (String, [(Int, Bool)])] or whatever. Do you have a hash function for that? When you have containers that can be used to build structures of arbitrary shape, writing a hash function that works in general is a hard problem. Were you thinking that you could serialize those things and then just use murmurhash on them? Try it, but my guess is it won't work well. Murmurhash is designed for strings, not for those things. It's not a cryptographic hash; it's designed to have good statistical properties for a certain very specific types of input. If you abuse it, you'll likely lose those good properties. The parameters for murmerhash were tweaked using keys that were English words, and very short strings of random bytes. There is no evidence it will work well with other kinds of keys. I'd be very happy to be proven wrong though. Regards, Yitz