Re: [Haskell-cafe] Very imperfect hash function

From: Hans Aberg
On 28 Jan 2010, at 20:07, Steve Schafer wrote:
> The data are currently in a large lookup table. To save space, I'd like to convert that into a sort of hash function:
> hash :: key -> value
> My question is this: Is there any kind of generic approach that can make use of the knowledge about the internal redundancy of the keys to come up with an efficient function?
There are minimal perfect hash functions; there are some libraries mentioned here, though they are not in Haskell code: http://en.wikipedia.org/wiki/Perfect_hash_function
This is suitable when you do a lot of lookups with few key updates. An alternative might be Data.Map, where lookups have time complexity O(log n), n = size of map.

From: John Lato
The minimal perfect hash function looks like the real algorithmic solution, but I would just use Data.IntMap for this. Cheers, John
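John's Data.IntMap suggestion can be sketched as below. The keys and values here are made up for illustration; the real table would be built once from the poster's full data set.

```haskell
import qualified Data.IntMap as IM

-- Hypothetical lookup table built once from the full data set.
table :: IM.IntMap Int
table = IM.fromList [(17, 100), (42, 200), (99, 300)]

-- Lookup walks a patricia trie: O(min(n, W)), W = word size in bits.
lookupValue :: Int -> Maybe Int
lookupValue k = IM.lookup k table
```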

On 29 Jan 2010, at 12:52, John Lato wrote:
>> There are minimal perfect hash functions; there are some libraries mentioned here, though they are not in Haskell code: http://en.wikipedia.org/wiki/Perfect_hash_function
>> This is suitable when you do a lot of lookups with few key updates. An alternative might be Data.Map, where lookups have time complexity O(log n), n = size of map.
> The minimal perfect hash function looks like the real algorithmic solution, but I would just use Data.IntMap for this.
That looks interesting too. Yet another idea: use arrays: http://haskell.org/haskellwiki/Arrays. Then build a hash table, say just taking mod k n, and have the values in some lookup map. If n > the number of keys, the average time complexity is O(1), and arrays should be very fast. Hans

On Fri, Jan 29, 2010 at 12:46 PM, Hans Aberg wrote:
> On 29 Jan 2010, at 12:52, John Lato wrote:
>>> There are minimal perfect hash functions; there are some libraries mentioned here, though they are not in Haskell code: http://en.wikipedia.org/wiki/Perfect_hash_function
>>> This is suitable when you do a lot of lookups with few key updates. An alternative might be Data.Map, where lookups have time complexity O(log n), n = size of map.
>> The minimal perfect hash function looks like the real algorithmic solution, but I would just use Data.IntMap for this.
> That looks interesting too. Yet another idea: use arrays: http://haskell.org/haskellwiki/Arrays. Then build a hash table, say just taking mod k n, and have the values in some lookup map. If n > the number of keys, the average time complexity is O(1), and arrays should be very fast.
I just want to be sure I understand this. For this plan, you aren't intending to use a perfect hash function at all, correct? Are you basically just suggesting to stick everything in an array with the key as an index? Or are you suggesting an actual hash table? If it's the latter, I'm not certain where the array fits into the picture. I'm pretty sure I'm missing something here.

In either case, I don't think this would be as good as what I thought was your original suggestion, i.e. using a minimal perfect hash function mphf that hashes keys to a value in 0..v-1. With v = number of values and valArr = array of all values, the lookup is then

  lookup k = valArr ! mphf k

Cheers, John
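John's mphf-plus-array scheme might look like the sketch below. The hash function here is a hypothetical hand-written one for a three-key set; a real minimal perfect hash would be generated by a tool (e.g. the CMPH library mentioned via the Wikipedia page).

```haskell
import Data.Array

-- Hypothetical minimal perfect hash for the key set {17, 42, 99};
-- in practice this function would be machine-generated.
mphf :: Int -> Int
mphf 17 = 0
mphf 42 = 1
mphf 99 = 2
mphf _  = error "key not in table"

-- One slot per value, indices 0..v-1 with v = 3 here.
valArr :: Array Int Int
valArr = listArray (0, 2) [100, 200, 300]

-- John's lookup: hash to an index, then a single array read.
lookupVal :: Int -> Int
lookupVal k = valArr ! mphf k
```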

On 29 Jan 2010, at 15:57, John Lato wrote:
>> That looks interesting too. Yet another idea: use arrays: http://haskell.org/haskellwiki/Arrays. Then build a hash table, say just taking mod k n, and have the values in some lookup map. If n > the number of keys, the average time complexity is O(1), and arrays should be very fast.
> I just want to be sure I understand this. For this plan, you aren't intending to use a perfect hash function at all, correct?
Yes, this is another idea. In a hash table, it is not important to have different indices, only that the table entries are spread as flatly as possible.
> Are you basically just suggesting to stick everything in an array with the key as an index?
You still need to fold the key values onto some interval.
> Or are you suggesting an actual hash table?
The hash function folds the keys onto an interval. Since you have Int values k, you might just use a mod k n function for that.
> If it's the latter, I'm not certain where the array fits into the picture. I'm pretty sure I'm missing something here.
There is a module Data.HashTable. The array is just to make lookup fast. Like in: ST s (STArray s HashKey (Map Key Value))
> In either case, I don't think this would be as good as what I thought was your original suggestion, i.e. using a minimal perfect hash function mphf that hashes keys to a value in 0..v-1. With v = number of values and valArr = array of all values, the lookup is then
> lookup k = valArr ! mphf k
Right, but you may want to avoid implementing the perfect hash function yourself if it is only available in C. :-) There is the FFI, though. Hans
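Hans' array-of-buckets idea could be sketched with an immutable Array of Maps, indexed by mod k n. The bucket count n and the sample keys below are hypothetical; Hans suggests choosing n larger than the key set so the buckets stay small.

```haskell
import Data.Array
import qualified Data.Map as M

-- Hypothetical bucket count; ideally larger than the number of keys.
n :: Int
n = 7

-- Each array slot holds a small Map of the keys hashing to it via
-- mod k n; with few collisions, the average lookup is close to O(1).
buckets :: Array Int (M.Map Int String)
buckets = accumArray (\m (k, v) -> M.insert k v m) M.empty (0, n - 1)
  [ (k `mod` n, (k, v)) | (k, v) <- [(17, "a"), (42, "b"), (99, "c")] ]

lookupBucketed :: Int -> Maybe String
lookupBucketed k = M.lookup k (buckets ! (k `mod` n))
```

The thread also mentions Data.HashTable and a mutable STArray variant; the immutable version above is the simplest form of the same idea when the table is built once and then only read.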

On Fri, Jan 29, 2010 at 9:26 AM, Hans Aberg wrote:
> On 29 Jan 2010, at 15:57, John Lato wrote:
>> Are you basically just suggesting to stick everything in an array with the key as an index?
> You still need to fold the key values onto some interval.
Not with a suitably large array ;-)
>> Or are you suggesting an actual hash table?
> The hash function folds the keys onto an interval. Since you have Int values k, you might just use a mod k n function for that.
>> If it's the latter, I'm not certain where the array fits into the picture. I'm pretty sure I'm missing something here.
> There is a module Data.HashTable. The array is just to make lookup fast. Like in: ST s (STArray s HashKey (Map Key Value))
I was misunderstanding your intention here. This is an interesting approach, but I would still try an IntMap first. Cheers, John

On 31 Jan 2010, at 20:07, John Lato wrote:
>>> Or are you suggesting an actual hash table?
>> The hash function folds the keys onto an interval. Since you have Int values k, you might just use a mod k n function for that.
>>> If it's the latter, I'm not certain where the array fits into the picture. I'm pretty sure I'm missing something here.
>> There is a module Data.HashTable. The array is just to make lookup fast. Like in: ST s (STArray s HashKey (Map Key Value))
> I was misunderstanding your intention here. This is an interesting approach, but I would still try an IntMap first.
A simple hash function for strings is to exclusive-or the bytes and then reduce modulo a prime number, which will be the table size. The average lookup time complexity is then that of the hash table. You can compare that with the IntMap, which may have time complexity O(4), the number of bytes in an Int32. There might be a big difference in the constant factor, though: an array seems to be much faster.

I first implemented multivariate polynomial division using lists, because they are easy to work with, and an example built on top of it took tens of seconds to run in Hugs. When reworking it using STArray (mutable) and Array (immutable), it has no noticeable delay. Though I am using Map and Set too, the choice of container otherwise seems to be dictated by what is most convenient to program. Hans
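Hans' string hash can be written in a couple of lines; the table size 101 below is an arbitrary prime chosen only for illustration.

```haskell
import Data.Bits (xor)
import Data.Char (ord)

-- Hypothetical prime table size.
tableSize :: Int
tableSize = 101

-- XOR all the character codes together, then reduce mod the table size.
xorHash :: String -> Int
xorHash s = foldl (\h c -> h `xor` ord c) 0 s `mod` tableSize
```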

On Feb 1, 2010, at 9:04 AM, Hans Aberg wrote:
> A simple hash function for strings is to exclusive-or the bytes and then reduce modulo a prime number,
Simply exclusive-oring the bytes will give you at most 256 distinct results. (For an ASCII source, 128 distinct results.) After that, there hardly seems to be any point in reduction modulo a prime. This approach can't tell a CAT from an ACT or a DOG from a GOD, which is another strike against it. (It also can't tell a TITTLE from a TILE, or a BOTTLE from a BOLE, for obvious reasons.)
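Richard's anagram objection follows from XOR being commutative and self-inverse; a quick check, with the XOR fold written inline:

```haskell
import Data.Bits (xor)
import Data.Char (ord)

-- XOR of the character codes; order-insensitive, so anagrams collide.
xorBytes :: String -> Int
xorBytes = foldl (\h c -> h `xor` ord c) 0

anagramsCollide :: Bool
anagramsCollide = xorBytes "CAT" == xorBytes "ACT"

-- Paired letters cancel (x `xor` x == 0), so TITTLE and TILE collide too:
-- the three T's in TITTLE reduce to a single T.
doublesCancel :: Bool
doublesCancel = xorBytes "TITTLE" == xorBytes "TILE"
```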

On 2 Feb 2010, at 03:05, Richard O'Keefe wrote:
>> A simple hash function for strings is to exclusive-or the bytes and then reduce modulo a prime number,
> Simply exclusive-oring the bytes will give you at most 256 distinct results. (For an ASCII source, 128 distinct results.) After that, there hardly seems to be any point in reduction modulo a prime.
Right - I just gave an example of how simple hash functions may be created. The original question deals with Int32s, not strings. As mentioned before, there are more advanced libraries here: http://en.wikipedia.org/wiki/Perfect_hash_function but they are not written in Haskell. So you have to rewrite them or use the FFI.
> This approach can't tell a CAT from an ACT or a DOG from a GOD, which is another strike against it. (It also can't tell a TITTLE from a TILE, or a BOTTLE from a BOLE, for obvious reasons.)
The hash function just tries to produce random lookup values, which are used to flatten the average depth of the lookup table at the entries of the array. So there is a tradeoff between a fast hash function and one that does a good job. Hans
participants (3)
- Hans Aberg
- John Lato
- Richard O'Keefe