
Bulat Ziganshin wrote:
Hello Peter,
Thursday, January 31, 2008, 8:01:36 PM, you wrote:
files with different content generating the same hash)... My intuition told me that the odds of two cryptographic hashes (on meaningful content) colliding was much less than the earth being destroyed by an asteroid... But this is just intuition... What does computer science tell us about this?
you may be interested to know that the widely used rsync algorithm relies on 128-bit hashes, and its author speculated about its reliability:
Interesting paper. Thank you. I had a quick read of the bit relating to hashes, and my understanding is this: he uses a weak, quick and simple hash to deal with 99.99% (I invented that percentage) of cases. If the weak hash differs, we know the files are definitely different; if it collides, he then computes a strong, slow, secure hash and relies on that as the answer.

But he relies on the strong hash rather than doing a byte-by-byte comparison because the proper byte-by-byte comparison has a major cost (a network transmission of the file). If you had both files accessible at low cost, it might be better to compare them byte by byte when you get a collision rather than trust the strong hash. The right approach may be to assume that collisions will occur and cater for them properly in the program somehow.

A good tip for testing this sort of thing is to make the length of the hash (or the maximum size of the array, or whatever you want to test) a parameter that you can turn down to a very low value to induce collisions (overflows, etc.) and see whether the program still works, then turn it back up for live use. There is a sketch of both ideas below.

A cryptographic hash appears to be a completely random function of its input, so the likelihood that two given files collide is simply 1 in 2^(bits in hash). With many files you also have to account for the birthday effect: across n files there are n(n-1)/2 pairs, each with that collision probability. A worked example follows the sketch.
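To make that concrete, here is a minimal Python sketch of the two-tier comparison with a tunable hash width. weak_hash, strong_hash, same_file and HASH_BITS are all names invented for illustration, not anything taken from the rsync paper:

import hashlib

HASH_BITS = 128   # turn down to, say, 8 to force collisions while testing

def weak_hash(data: bytes) -> int:
    # cheap stand-in for rsync's rolling checksum: a 16-bit byte sum
    return sum(data) & 0xFFFF

def strong_hash(data: bytes) -> bytes:
    # MD5 is the 128-bit hash rsync used; truncated here to HASH_BITS
    # (assumed to be a multiple of 8 for simplicity)
    return hashlib.md5(data).digest()[: HASH_BITS // 8]

def same_file(a: bytes, b: bytes) -> bool:
    if weak_hash(a) != weak_hash(b):
        return False   # weak hashes differ: definitely different
    if strong_hash(a) != strong_hash(b):
        return False   # strong hashes differ: definitely different
    # both hashes agree; since both files are local and cheap to read,
    # settle it with the byte-by-byte comparison
    return a == b

With HASH_BITS turned down to 8 you can feed in random data and watch the final comparison rescue genuine collisions, which is exactly the "assume collisions will occur" discipline above.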
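And a back-of-the-envelope calculation of the odds, assuming the hash really does behave as a random function (n = 10^9 is just an arbitrary example):

BITS = 128

# chance that two particular files share a 128-bit hash value
print(1.0 / 2**BITS)                  # about 2.9e-39

# birthday bound: with n files there are n*(n-1)/2 pairs, so the
# chance of any collision at all is at most that many times larger
n = 10**9
print((n * (n - 1) // 2) / 2**BITS)   # about 1.5e-21

Even with a billion files, the any-collision probability stays vanishingly small, so the intuition about the asteroid looks sound.

Richard.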