
Bulat Ziganshin wrote:
Hello Peter,
Thursday, January 31, 2008, 8:01:36 PM, you wrote:
files with different content generating the same hash)... My intuition told me that the odds of two cryptographic hashes (on meaningful content) colliding was much less than the earth being destroyed by an asteroid... But this is just intuition... What does computer science tell us about this?
you may be interested to know that the widely used rsync algorithm relies on 128-bit hashes, and its author speculated about its reliability:
Interesting paper. Thank you. I had a quick read of the bit relating to hashes, and my understanding is this: he uses a weak, quick and simple hash to deal with 99.99% (I invented that percentage) of cases. If the weak hash differs, we know the files are definitely different; if it collides, he then computes a strong, slow, secure hash and relies on that as the answer.

But he relies on the strong hash rather than doing a byte-by-byte comparison because the proper byte-by-byte comparison has a major cost (a network transmission of the file). If you had both files accessible at low cost, it might be better to compare them byte by byte when you get a collision rather than trust the strong hash. The right approach may be to assume that collisions will occur and cater for them properly in the program somehow.

A good tip for testing this sort of thing is to make the length of the hash (or the maximum size of the array, or whatever you want to test) a parameter that you can turn down to a very low value to induce collisions (overflows, etc.) and see whether the program still works, then turn it back up for live use. There is a sketch of both ideas below.

A cryptographic hash appears to be a completely random function of its input, so the likelihood that two given files collide is simply 1 in 2^(bits in hash). With many files you also have to account for the birthday effect: across n files there are n(n-1)/2 pairs, each with that collision probability. A worked example follows the sketch.
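To make that concrete, here is a minimal Python sketch of the two-tier comparison with a tunable hash width. weak_hash, strong_hash, same_file and HASH_BITS are all names invented for illustration, not anything taken from the rsync paper:

import hashlib

HASH_BITS = 128   # turn down to, say, 8 to force collisions while testing

def weak_hash(data: bytes) -> int:
    # cheap stand-in for rsync's rolling checksum: a 16-bit byte sum
    return sum(data) & 0xFFFF

def strong_hash(data: bytes) -> bytes:
    # MD5 is the 128-bit hash rsync used; truncated here to HASH_BITS
    # (assumed to be a multiple of 8 for simplicity)
    return hashlib.md5(data).digest()[: HASH_BITS // 8]

def same_file(a: bytes, b: bytes) -> bool:
    if weak_hash(a) != weak_hash(b):
        return False   # weak hashes differ: definitely different
    if strong_hash(a) != strong_hash(b):
        return False   # strong hashes differ: definitely different
    # both hashes agree; since both files are local and cheap to read,
    # settle it with the byte-by-byte comparison
    return a == b

With HASH_BITS turned down to 8 you can feed in random data and watch the final comparison rescue genuine collisions, which is exactly the "assume collisions will occur" discipline above.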
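And a back-of-the-envelope calculation of the odds, assuming the hash really does behave as a random function (n = 10^9 is just an arbitrary example):

BITS = 128

# chance that two particular files share a 128-bit hash value
print(1.0 / 2**BITS)                  # about 2.9e-39

# birthday bound: with n files there are n*(n-1)/2 pairs, so the
# chance of any collision at all is at most that many times larger
n = 10**9
print((n * (n - 1) // 2) / 2**BITS)   # about 1.5e-21

Even with a billion files, the any-collision probability stays vanishingly small, so the intuition about the asteroid looks sound.

Richard.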