
Data integrity checking is a well-known problem. A common solution is the use of 'checksums'. Most of them, however, are built in a quite obfuscated manner (like MD5), which results in ugly and error-prone implementations (see the reference implementation of MD5 itself). So, the question is: is there a checksum that is easy to implement over a stream of bytes, works well as a checksum, and is good in the sense that creating a message with the same checksum as a given message is a very hard problem (at least 2^128 tries)? The reason is that I wish a good checksum to be implemented in my stream-oriented library.

2010/12/3 Permjacov Evgeniy
Is there a checksum that is easy to implement over a stream of bytes, works well as a checksum, and makes creating a message with the same checksum as a given message a very hard problem (at least 2^128 tries)?
2^128 tries are needed for a hash size of 256 bits. See http://en.wikipedia.org/wiki/Birthday_attack
Most of the time you can get away with usual block ciphers (even with weaker parameters). There is a scheme that transforms a block cipher into a hash function: http://en.wikipedia.org/wiki/CRHF#Hash_functions_based_on_block_ciphers
RC5, for example, is parametrized by the number of encryption rounds. RC5 with 12 rounds has sufficiently good avalanche (spread of bit change) that you can use 12-round RC5 instead of the full, death-proof 20-round version.
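One well-known block-cipher-to-hash scheme of the kind linked above is Davies-Meyer, where each message block is used as the cipher key and the previous hash value is encrypted and folded back in. A minimal Haskell sketch, with an arbitrary toy mixing function standing in for the real cipher (the round count and constant here are made up; this is NOT secure, it only shows the shape of the construction):

```haskell
import Data.Bits (xor, rotateL)
import Data.List (foldl')
import Data.Word (Word64)

-- Toy keyed mixing function standing in for a real block cipher E.
-- Constants and round count are arbitrary; for illustration only.
toyEncrypt :: Word64 -> Word64 -> Word64
toyEncrypt key block = foldl' step block [1 .. 12]
  where
    step s i = ((s `xor` key) `rotateL` i) + 0x9E3779B97F4A7C15

-- Davies-Meyer: H_i = E_{m_i}(H_{i-1}) `xor` H_{i-1}
-- (the message block plays the role of the cipher key).
daviesMeyer :: Word64 -> [Word64] -> Word64
daviesMeyer = foldl' (\h m -> toyEncrypt m h `xor` h)
```

A real instantiation would substitute an actual block cipher such as RC5 for `toyEncrypt`, and use a state wide enough (256 bits) to resist the birthday attack.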

On 12/03/2010 12:33 AM, Serguey Zefirov wrote:
2^128 tries are needed for a hash size of 256 bits. See http://en.wikipedia.org/wiki/Birthday_attack
Ok, so I have to use at least a 256-bit resulting value. That is four Word64 or 32 Word8 ... Well, maybe it will work.
Most of the time you can get away with usual block ciphers (even with weaker parameters). There is a scheme that transforms a block cipher into a hash function: http://en.wikipedia.org/wiki/CRHF#Hash_functions_based_on_block_ciphers
*/me wrote it into the to_read list. The problem, however, is that block ciphers are quite unfriendly to plain Word8 streams. It is not a deadly problem, but I'd like to avoid block collections.

2010/12/3 Permjacov Evgeniy
*/me wrote it into the to_read list. The problem, however, is that block ciphers are quite unfriendly to plain Word8 streams. It is not a deadly problem, but I'd like to avoid block collections.
All one-way hashes do block collections. This is unavoidable.

On 12/03/2010 10:48 AM, Serguey Zefirov wrote:
All one-way hashes do block collections. This is unavoidable.
Why? Is there some math behind this proposition?

2010/12/3 Permjacov Evgeniy
Why? Is there some math behind this proposition?
This is hard: adding a single byte into the sum with cryptographic (or near-cryptographic) security. To quote Wikipedia again: "The avalanche effect is evident if, when an input is changed slightly (for example, flipping a single bit) the output changes significantly (e.g., half the output bits flip)." http://en.wikipedia.org/wiki/Avalanche_effect
This is true for hashes too: a hash should change about half of the output bits when a single bit of input changes, especially if you aim for tamper-proof hashes. You have to have that property after every round of hashing, because you don't know when the input will stop. For bytes, you have to guarantee an avalanche effect for every byte - which means you have to transform your entire block plus the input byte in an expensive way. MD5 and all other hashes have internal state of various sizes; they all collect input into blocks to make the hashing transform less expensive. Fast methods like sum-modulo-poly (CRC variants) or linear congruential generators do not have good avalanche properties when used for stream hashing or encryption. Even their combination (the one in ZIP encryption) wasn't strong enough.
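The avalanche property discussed here can be measured directly: apply the hash to two inputs differing in one bit and count the output bits that change. A small Haskell sketch; the `mix` function reuses the finalizer constants of the public splitmix64 design purely as a convenient example of a well-mixed function, not as anything endorsed in this thread:

```haskell
import Data.Bits (popCount, shiftR, xor)
import Data.Word (Word64)

-- Number of output bits that differ between f a and f b.
avalanche :: (Word64 -> Word64) -> Word64 -> Word64 -> Int
avalanche f a b = popCount (f a `xor` f b)

-- Multiply-shift mixer using the splitmix64 finalizer constants.
mix :: Word64 -> Word64
mix z0 =
  let z1 = (z0 `xor` (z0 `shiftR` 30)) * 0xBF58476D1CE4E5B9
      z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94D049BB133111EB
  in  z2 `xor` (z2 `shiftR` 31)
```

For a well-mixed 64-bit function, `avalanche mix 0 1` should land near 32 (half the output width), whereas a function with no mixing at all, such as `id`, gives exactly 1.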

On 12/03/2010 11:40 AM, Serguey Zefirov wrote:
This is hard: adding a single byte into the sum with cryptographic (or near-cryptographic) security. To quote Wikipedia again: "The avalanche effect is evident if, when an input is changed slightly (for example, flipping a single bit) the output changes significantly (e.g., half the output bits flip)."
This simply means that the active set of bits must be at least the size of the final value, and the value being added must somehow be mixed into every byte of the active set. The simplest way to do that is to multiply the vector [active-state-bits ++ current-byte] by some matrix of size [resulting byte count | resulting byte count + 1] (of course, not in floating-point math, but, for example, using modulo-256 arithmetic or even hand-coded tables for "mul" and "sum"). This, of course, means that byte-streaming hashes need some initial seed (which must be paired with the resulting value to check) and that every byte triggers many operations, resulting in poor performance. So, the conclusion is: byte-streaming hashes are possible, but because of these requirements they will definitely have poor performance, much worse than block-based ones. Am I correct?
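The byte-at-a-time scheme discussed in this exchange (a 256-bit active state, with every input byte mixed into the whole state) could look roughly like this in Haskell. The rotation amounts and the constant are placeholders with none of the analysis a real hash needs; the point is only to make the per-byte cost visible: a full pass over all four state words for every single input byte.

```haskell
import Data.Bits (rotateL, xor)
import Data.List (foldl')
import Data.Word (Word64, Word8)

-- 256-bit state = four Word64, as suggested above.
data S = S !Word64 !Word64 !Word64 !Word64
  deriving (Eq, Show)

-- Absorb one byte, touching the entire state (the expensive part):
-- every byte triggers a full-state mixing pass.
absorb :: S -> Word8 -> S
absorb (S a b c d) w =
  let x  = fromIntegral w * 0x9E3779B97F4A7C15  -- spread the byte
      a' = (a `xor` x) `rotateL` 7
      b' = (b + a') `rotateL` 13
      c' = (c `xor` b') `rotateL` 17
      d' = (d + c') `rotateL` 31
  in  S a' b' c' d'

-- Seeded stream hash over a byte stream: the seed is the initial
-- state that must be paired with the result to check integrity.
streamHash :: S -> [Word8] -> S
streamHash = foldl' absorb
```

Even this toy illustrates the performance argument: a block-based hash amortizes one expensive transform over 64 input bytes, while this design pays a full-state transform per byte.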

2010/12/4 Permjacov Evgeniy
So, the conclusion is: byte-streaming hashes are possible, but because of these requirements they will definitely have poor performance, much worse than block-based ones. Am I correct?
I think you are correct. PS The note about matrices is an interesting one. The total matrix should be dense, but we could factor it. For example, by multiplying an N-wide and an M-wide band matrix we get an (N+M)-wide band matrix. You are free to choose the multiplication and addition operations: addition could be XOR and multiplication could be ROTATE_LEFT (as in RC5). I did a little experiment, just to demonstrate the interesting properties of your suggestion: http://thesz.mskhug.ru/svn/cryptomatrix/
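The band-matrix factoring claim is easy to check concretely over GF(2), where addition is XOR and multiplication is AND: multiplying a band matrix of half-bandwidth N by one of half-bandwidth M yields a product of half-bandwidth at most N+M. A small Haskell sketch with dense Bool lists (purely illustrative, unrelated to the linked experiment's code):

```haskell
import Data.List (transpose)

-- Matrices over GF(2): addition is XOR (here: /=), multiplication is AND.
type M = [[Bool]]

-- Matrix product; the XOR-fold computes the parity of matching ones.
mmul :: M -> M -> M
mmul a b = [ [ foldr (/=) False (zipWith (&&) row col) | col <- cols ] | row <- a ]
  where cols = transpose b

-- Half-bandwidth: largest |i - j| over nonzero entries.
bandwidth :: M -> Int
bandwidth m = maximum (0 : [ abs (i - j) | (i, row) <- zip [0 ..] m
                                         , (j, True) <- zip [0 ..] row ])

-- n x n band matrix with ones wherever |i - j| <= w.
band :: Int -> Int -> M
band n w = [ [ abs (i - j) <= w | j <- [0 .. n - 1] ] | i <- [0 .. n - 1] ]
```

For example, `bandwidth (mmul (band 8 1) (band 8 2))` comes out as 3, matching the N+M bound; chaining enough narrow band factors eventually produces the dense matrix the hash needs.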

I may be missing something, but it is not clear to me whether you want cryptographic security. If you do, then the only safe choice is to use a standard algorithm (or perhaps a block cipher construction). Sorry if that's already what you are discussing - I don't know whether there are any established algorithms that mix in a byte at a time (though the argument that they are aiming for avalanche properties is pretty strong). The history of the submissions to the SHA-3 contest http://csrc.nist.gov/groups/ST/hash/sha-3/index.html shows it's not easy for even the experts to get it right, and that it can take a long time for problems to be noticed, even if you can convince tons of other experts to look over an algorithm. If you don't want cryptographic security, there are probably cheap things you could consider.
Brandon

On 02/12/2010 09:17 PM, Permjacov Evgeniy wrote:
Is there a checksum that is easy to implement over a stream of bytes, works well as a checksum, and makes creating a message with the same checksum as a given message a very hard problem (at least 2^128 tries)? The reason is that I wish a good checksum to be implemented in my stream-oriented library.
Designing something that detects accidental alterations reliably is quite easy. Designing something that detects malicious alterations reliably is absurdly hard. (Last time I checked, MD5, SHA-1 and SHA-256 are all fairly similar in design, and all have either had serious weaknesses found or actually been broken.) Cryptographic hash functions are like ciphers; their designs are almost always quite complicated, in order to make it harder to analyse (and thereby crack) the algorithm. So, depending on exactly which properties you do or don't need, the problem is either quite easy or absurdly hard.
participants (4)
- Andrew Coppin
- Brandon Moore
- Permjacov Evgeniy
- Serguey Zefirov