Very imperfect hash function

I'm looking for some algorithmic suggestions:

I have a set of several hundred key/value pairs. The keys are 32-bit integers, and are all distinct. The values are also integers, but the number of values is small (only six in my current problem). So, obviously, several keys map to the same value.

For some subsets of keys, examining only a small portion of the key's bits is enough to determine the associated value. For example, there may be 250 keys that all have the same most-significant byte, and all 250 map to the same value. There are also keys at the other extreme, where two keys that differ in only one bit position map to different values.

The data are currently in a large lookup table. To save space, I'd like to convert that into a sort of hash function:

    hash :: key -> value

My question is this: Is there any kind of generic approach that can make use of the knowledge about the internal redundancy of the keys to come up with an efficient function?

Steve Schafer
Fenestra Technologies Corp.
http://www.fenestra.com/

On Thu, 2010-01-28 at 14:07 -0500, Steve Schafer wrote:
> I'm looking for some algorithmic suggestions:
> I have a set of several hundred key/value pairs. The keys are 32-bit integers, and are all distinct. The values are also integers, but the number of values is small (only six in my current problem). So, obviously, several keys map to the same value.
> For some subsets of keys, examining only a small portion of the key's bits is enough to determine the associated value. For example, there may be 250 keys that all have the same most-significant byte, and all 250 map to the same value. There are also keys at the other extreme, where two keys that differ in only one bit position map to different values.
> The data are currently in a large lookup table. To save space, I'd like to convert that into a sort of hash function:
> hash :: key -> value
> My question is this: Is there any kind of generic approach that can make use of the knowledge about the internal redundancy of the keys to come up with an efficient function?
Maybe:

    import Data.Bits (testBit)
    import Data.Int (Int32)

    -- Inner nodes test one bit of the key; leaves hold the value.
    data TTree a = TTree Int (TTree a) (TTree a)  -- bit index, 0-branch, 1-branch
                 | TNode a
                 -- | THashNode <some hash table>

    hash :: TTree a -> Int32 -> a
    hash (TNode v) _ = v
    hash (TTree b l r) k = if testBit k b then hash r k else hash l k
    -- hash (THashNode h) k = lookupHashTable h k

Of course, you need to encode the tree efficiently.

Regards
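The thread does not say how such a tree would be obtained from the existing table. A minimal sketch, assuming a greedy rule that is not from the thread: stop as soon as all remaining pairs share one value, otherwise split on the most significant bit where the remaining keys disagree. The names `build` and `size` are hypothetical; `TTree` and `hash` are repeated so the block compiles on its own.

```haskell
import Data.Bits (testBit)
import Data.Int (Int32)
import Data.List (partition)

-- Maciej's tree: inner nodes test one key bit, leaves hold the value.
data TTree a = TTree Int (TTree a) (TTree a) | TNode a
  deriving Show

hash :: TTree a -> Int32 -> a
hash (TNode v) _ = v
hash (TTree b l r) k = if testBit k b then hash r k else hash l k

-- Hypothetical greedy construction. Assumes a non-empty list of distinct
-- keys. Keys not in the list still reach some leaf, so they get an
-- arbitrary (but defined) value.
build :: Eq a => [(Int32, a)] -> TTree a
build [] = error "build: empty key set"
build kvs@((_, v0) : _)
  | all ((== v0) . snd) kvs = TNode v0               -- one value left: leaf
  | otherwise = TTree b (build zeros) (build ones)
  where
    keys = map fst kvs
    -- A bit splits the set if some keys have it set and some do not.
    splits i = any (`testBit` i) keys && not (all (`testBit` i) keys)
    b = case filter splits [31, 30 .. 0] of
          (i : _) -> i
          []      -> error "build: duplicate keys"
    (ones, zeros) = partition (\(k, _) -> testBit k b) kvs

-- Number of nodes, to compare against the size of a flat table.
size :: TTree a -> Int
size (TNode _) = 1
size (TTree _ l r) = 1 + size l + size r
```

For the 250-key case from the question (one shared most-significant byte, one shared value), `build` returns a single `TNode`. Choosing the split bit by how well it separates the values, as decision-tree learners do, may give smaller trees than this simple rule.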

On Thursday, 28.01.2010, at 19:37 +0000, Maciej Piechotka wrote:
> On Thu, 2010-01-28 at 14:07 -0500, Steve Schafer wrote:
>> I'm looking for some algorithmic suggestions:
>> I have a set of several hundred key/value pairs. The keys are 32-bit integers, and are all distinct. The values are also integers, but the number of values is small (only six in my current problem). So, obviously, several keys map to the same value.
>> For some subsets of keys, examining only a small portion of the key's bits is enough to determine the associated value. For example, there may be 250 keys that all have the same most-significant byte, and all 250 map to the same value. There are also keys at the other extreme, where two keys that differ in only one bit position map to different values.
>> The data are currently in a large lookup table. To save space, I'd like to convert that into a sort of hash function:
>> hash :: key -> value
>> My question is this: Is there any kind of generic approach that can make use of the knowledge about the internal redundancy of the keys to come up with an efficient function?
> Maybe:
> data TTree a = TTree Int (TTree a) (TTree a) | TNode a -- | THashNode <some hash table>
> hash :: TTree a -> Int32 -> a
> hash (TNode v) _ = v
> hash (TTree b l r) k = if testBit k b then hash r k else hash l k
> -- hash (THashNode h) k = lookupHashTable h k
This looks like you have re-invented Binary Decision Diagrams (BDDs). :)
> Of course, you need to encode the tree efficiently.
When you fix the order in which the bits are tested, you can take advantage of sharing: identical sub-diagrams are stored only once. This way you arrive at an efficient representation called a Reduced Ordered Binary Decision Diagram (ROBDD). Unfortunately, a bad order may lead to a size exponential in the number of bits, and finding an optimal order is NP-hard.

Regards,
Holger
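Holger's point can be sketched by building the diagram bottom-up with a fixed order (bit 31 down to bit 0) and a unique table. Because there are six values rather than two, this is strictly a multi-terminal BDD (MTBDD). How keys absent from the table are treated is an assumption made for this sketch: a subset whose values all agree becomes a leaf, and a bit on which all remaining keys agree is not tested. `Node`, `BDD`, `mkNode`, `build` and `eval` are hypothetical names. `mkNode` applies the two ROBDD reduction rules: a test whose two children are the same node is dropped, and an existing identical node is reused.

```haskell
import Data.Bits (testBit)
import Data.Int (Int32)
import Data.List (partition)
import qualified Data.Map.Strict as M

-- A node is a terminal holding a value, or a test of one key bit with the
-- ids of its 0-child and 1-child.
data Node v = Leaf v | Test Int Int Int
  deriving (Eq, Ord, Show)

-- Unique table (node -> id) and its inverse (id -> node) for evaluation.
data BDD v = BDD { ids :: M.Map (Node v) Int, nodes :: M.Map Int (Node v) }

emptyBDD :: BDD v
emptyBDD = BDD M.empty M.empty

-- Hash-consing constructor enforcing both reduction rules.
mkNode :: Ord v => Node v -> BDD v -> (Int, BDD v)
mkNode (Test _ lo hi) bdd | lo == hi = (lo, bdd)   -- redundant test
mkNode n bdd@(BDD is ns) = case M.lookup n is of
  Just i  -> (i, bdd)                               -- shared subgraph
  Nothing -> let i = M.size is
             in (i, BDD (M.insert n i is) (M.insert i n ns))

-- Fixed order: bit 31, 30, ..., 0. Assumes a non-empty list of distinct keys.
build :: Ord v => [(Int32, v)] -> (Int, BDD v)
build kvs = go 31 kvs emptyBDD
  where
    go _ [] _ = error "build: empty key set"
    go bit xs@((_, v0) : _) bdd
      | all ((== v0) . snd) xs = mkNode (Leaf v0) bdd
      | null zeros || null ones = go (bit - 1) xs bdd  -- keys agree on this bit
      | otherwise =
          let (lo, bdd1) = go (bit - 1) zeros bdd
              (hi, bdd2) = go (bit - 1) ones bdd1
          in mkNode (Test bit lo hi) bdd2
      where
        (ones, zeros) = partition (\(k, _) -> testBit k bit) xs

-- Follow the tested bits from the root down to a leaf.
eval :: BDD v -> Int -> Int32 -> v
eval bdd i k = case nodes bdd M.! i of
  Leaf v       -> v
  Test b lo hi -> eval bdd (if testBit k b then hi else lo) k
```

With `[(k, even k) | k <- [0 .. 255]]`, a plain bit-testing tree needs 255 tests, but every subtree computes the same function of bit 0, so sharing leaves 3 nodes: two leaves and one test.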

On Thu, 2010-01-28 at 14:07 -0500, Steve Schafer wrote:
> I'm looking for some algorithmic suggestions:
> I have a set of several hundred key/value pairs. The keys are 32-bit integers, and are all distinct. The values are also integers, but the number of values is small (only six in my current problem). So, obviously, several keys map to the same value.
Instead of mapping keys to values, map keys to sets of values, where each set of values is represented by a small bit string. In your present case, one byte would be enough.
> For some subsets of keys, examining only a small portion of the key's bits is enough to determine the associated value. For example, there may be 250 keys that all have the same most-significant byte, and all 250 map to the same value. There are also keys at the other extreme, where two keys that differ in only one bit position map to different values.
On today's machines, "several hundred" pairs counts as trivial. Start by using a Data.IntMap of bytes and look for something else only if that doesn't pay off. This already takes advantage of the bit-string nature of your keys, by the way.
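Richard's suggestion needs nothing beyond the containers package. A minimal sketch, assuming the six values are numbered 0 to 5 so that a set of them fits in the bits of a `Word8`; `ValueSet`, `singletonSet`, `members`, `fromPairs` and `lookupValues` are hypothetical names. `Data.IntMap` is implemented as a big-endian Patricia tree, which is the sense in which it already exploits the bit-string structure of the keys.

```haskell
import Data.Bits (setBit, testBit, (.|.))
import Data.Int (Int32)
import qualified Data.IntMap.Strict as IM
import Data.Word (Word8)

-- Assumption: values are numbered 0..5, and a set of them is a byte with
-- bit i set when value i is present.
type ValueSet = Word8

singletonSet :: Int -> ValueSet
singletonSet i = setBit 0 i

members :: ValueSet -> [Int]
members s = [i | i <- [0 .. 7], testBit s i]

-- Int32 keys widen losslessly to GHC's Int (at least 32 bits).
-- Pairs that share a key are merged into one set with (.|.).
fromPairs :: [(Int32, Int)] -> IM.IntMap ValueSet
fromPairs kvs =
  IM.fromListWith (.|.) [(fromIntegral k, singletonSet v) | (k, v) <- kvs]

-- Missing keys give the empty set.
lookupValues :: IM.IntMap ValueSet -> Int32 -> [Int]
lookupValues m k = maybe [] members (IM.lookup (fromIntegral k) m)
```

Because `IM.fromListWith (.|.)` merges pairs with the same key, `lookupValues (fromPairs [(5, 2), (5, 3)]) 5` gives `[2, 3]`.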

On 28 Jan 2010, at 20:07, Steve Schafer wrote:
> The data are currently in a large lookup table. To save space, I'd like to convert that into a sort of hash function:
> hash :: key -> value
> My question is this: Is there any kind of generic approach that can make use of the knowledge about the internal redundancy of the keys to come up with an efficient function?
There are minimal perfect hash functions; some libraries are listed here, though they are not written in Haskell: http://en.wikipedia.org/wiki/Perfect_hash_function
This is suitable when you do many lookups and few key updates. An alternative might be Data.Map, where a lookup takes O(log n) time, n being the size of the map.

Hans
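For a static key set like Steve's, a perfect hash can be found by search. A minimal sketch, not from the thread, of the classic two-level scheme of Fredman, Komlós and Szemerédi (FKS): it is perfect but not minimal, using at most 4n value slots for n keys. The seeded mixer uses the SplitMix64 finalizer constants; `PHash`, `build`, `lookupP` and the other names are hypothetical. Only values are stored, not keys, so this fits only under the assumption that every key looked up was in the original table.

```haskell
import Data.Array (Array, accumArray, elems, listArray, (!))
import Data.Bits (shiftR, xor)
import Data.Int (Int32)
import Data.List (nub)
import Data.Word (Word64)

-- Seeded 64-bit mixer (SplitMix-style finalizer); each seed selects a
-- different hash function from the family.
mix :: Word64 -> Int32 -> Word64
mix seed k = z2 `xor` (z2 `shiftR` 31)
  where
    z0 = fromIntegral k + (seed + 1) * 0x9E3779B97F4A7C15
    z1 = (z0 `xor` (z0 `shiftR` 30)) * 0xBF58476D1CE4E5B9
    z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94D049BB133111EB

-- Hash a key into [0, m); m must be positive.
slotOf :: Word64 -> Int -> Int32 -> Int
slotOf seed m k = fromIntegral (mix seed k `mod` fromIntegral m)

-- Level 1 sends a key to one of n buckets; a bucket holding b keys owns
-- b^2 slots and its own seed, chosen so that those keys do not collide.
data PHash v = PHash
  { topSeed  :: Word64
  , nBuckets :: Int
  , buckets  :: Array Int (Word64, Int, Int) -- (seed, first slot, slot count)
  , slots    :: Array Int (Maybe v)
  }

-- Smallest seed 0, 1, 2, ... accepted by the predicate.
firstSeed :: (Word64 -> Bool) -> Word64
firstSeed ok = until ok (+ 1) 0

-- Assumes a non-empty list of distinct keys.
build :: [(Int32, v)] -> PHash v
build kvs = PHash s0 n (listArray (0, n - 1) meta) table
  where
    n = length kvs
    groupsFor s = accumArray (flip (:)) [] (0, n - 1)
                    [(slotOf s n k, kv) | kv@(k, _) <- kvs]
    sumSq g = sum [length b ^ (2 :: Int) | b <- elems g]
    -- FKS condition: retry the level-1 seed until total space is <= 4n.
    s0 = firstSeed (\s -> sumSq (groupsFor s) <= 4 * n)
    groups = elems (groupsFor s0)
    sizes = [length b ^ (2 :: Int) | b <- groups]
    offsets = scanl (+) 0 sizes
    total = sum sizes
    bucketSeed [] = 0
    bucketSeed b = firstSeed injective
      where
        q = length b ^ (2 :: Int)
        injective s = let hs = [slotOf s q k | (k, _) <- b]
                      in length (nub hs) == length hs
    meta = zip3 (map bucketSeed groups) offsets sizes
    table = accumArray (\_ v -> Just v) Nothing (0, total - 1)
              [ (off + slotOf s q k, v)
              | ((s, off, q), b) <- zip meta groups, (k, v) <- b ]

-- Only reliable for keys in the build set: a foreign key may land on an
-- occupied slot and return an unrelated value.
lookupP :: PHash v -> Int32 -> Maybe v
lookupP p k
  | q == 0    = Nothing
  | otherwise = slots p ! (off + slotOf s q k)
  where
    (s, off, q) = buckets p ! slotOf (topSeed p) (nBuckets p) k
```

A real implementation would pack the per-bucket seeds and the values far more tightly; the `Maybe` wrapper here only marks empty slots.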
participants (5)
- Hans Aberg
- Holger Siegel
- Maciej Piechotka
- Richard O'Keefe
- Steve Schafer