
Vincent,

Due to spam-like comments on -cafe I hadn't been subscribed for a while and missed your cryptohash discussion! In particular:
The main reason for this library is the lack of an incremental API in current digest libraries, and filling the gap left by some missing digest algorithms; the speed also comes as a nice bonus.
I've been working on a new crypto library specifically to provide a unified API for packages implementing cryptographic algorithms. You can see the discussions on libraries@haskell.org [1] [2]. Please feel free to take a look, comment, contribute, and hopefully move to the interface. I should be finishing up BlockCipher modes and adding hash tests soon.

Cheers, Thomas

[1] http://www.haskell.org/pipermail/libraries/2010-May/013688.html
[2] http://www.haskell.org/pipermail/libraries/2010-June/013782.html

On Mon, Jul 12, 2010 at 02:52:10PM -0700, Thomas M. DuBuisson wrote:
I've been working on a new crypto library specifically to provide a unified API for packages implementing cryptographic algorithms. You can see the discussions on libraries@haskell.org [1] [2]. Please feel free to take a look, comment, contribute, and hopefully move to the interface. I should be finishing up BlockCipher modes and adding hash tests soon.
Hi Thomas,

First, I think that's a great effort to standardize the crypto API! A couple of comments around the hash interface:

* updateCtx works on blockLength instead of working on arbitrary sizes. While this does represent what the underlying algorithm does, letting the implementation process any size is, I think, better: chunking the bytestring can have a significant cost (a rope-based implementation would not suffer from this), and in my case, processing as much as possible at each update call avoids paying the marshalling/unmarshalling cost of the mutable state on every block (a sketch of this follows below).

* hash is a generic operation based on the Hash class. In my case it improves performance to not run the pure init/update/finalize functions that are exposed, but to use the hidden impure function instead. I realized yesterday the gain is not as large as I thought, since I had a bug in my benchmark, but it's still there (100ms for 500MB of data).

* Why is the digest a specific type? I like representing different things with different types, but I'm not sure what you gain with digests.

* Is strength really useful in the Hash class? It might be accurate when the algorithm is first implemented, but I'm not sure what happens over time as flaws are discovered. Would people actually update it?

The BlockCipher class should expose the chaining modes as overridable typeclass functions, with default generic implementations that use encryptBlocks. For example, the Haskell AES package has a different C implementation for each chaining mode (e.g. CBC, ECB), and I suspect that using a generic chaining implementation would slow things down. What about something like:

    -- each plaintext bytestring needs to be a multiple of blockSize
    class (Binary k, Serialize k) => BlockCipher k where
        blockSize        :: Tagged k BitLength
        encryptBlocks    :: k -> ByteString -> ByteString
        decryptBlocks    :: k -> ByteString -> ByteString
        encryptBlocksCBC :: k -> ByteString -> (k, ByteString)
        encryptBlocksCBC = genericCBC encryptBlocks
        decryptBlocksCBC :: k -> ByteString -> (k, ByteString)
        -- ... same for ECB, ...
        buildKey         :: ByteString -> Maybe k
        keyLength        :: k -> BitLength  -- ^ keyLength may inspect...

And my last comment: I don't understand the stream cipher interface you're proposing. I've got an (inefficient) RC4 implementation with this interface:

    stream     :: Ctx -> B.ByteString -> (Ctx, B.ByteString)
    streamlazy :: Ctx -> L.ByteString -> (Ctx, L.ByteString)

I'm not sure how it would fit this interface (some kind of state monad?):

    encryptStream :: k -> B.ByteString -> B.ByteString

I hope these are useful comments,

-- Vincent Hanquez
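To make the first bullet concrete: an implementation that accepts arbitrary-size input typically buffers the partial block internally, roughly as in this untested sketch (Ctx, InternalState, and compressBlocks are illustrative stand-ins, not names from any of the packages discussed):

    import qualified Data.ByteString as B

    data InternalState = InternalState          -- stand-in for the real hash state
    data Ctx = Ctx { ctxBuffer :: B.ByteString  -- pending bytes, always < blockBytes
                   , ctxState  :: InternalState }

    blockBytes :: Int
    blockBytes = 64  -- e.g. SHA-256's 512-bit block

    -- Stand-in for the single marshalling call into the C implementation.
    compressBlocks :: InternalState -> B.ByteString -> InternalState
    compressBlocks st _ = st

    -- Accept input of any size: prepend the buffered remainder, hand all
    -- whole blocks to the implementation in one call, buffer the rest.
    update :: Ctx -> B.ByteString -> Ctx
    update (Ctx buf st) bs =
        let input         = B.append buf bs
            n             = B.length input `div` blockBytes
            (whole, rest) = B.splitAt (n * blockBytes) input
        in Ctx rest (compressBlocks st whole)

This is why one big update call can beat many block-sized ones: the marshalling of the mutable state happens once per call, not once per block.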

Vincent said:
A couple of comments around the hash interface:
* updateCtx works on blockLength instead of working on arbitrary sizes...
So for performance reasons you seem to prefer Semantics 1.2?

    """
    1.2 Multiple of blockSize bytes
    Implementations are encouraged to consume data (continue updating,
    encrypting, or decrypting) until there is less than blockSize bits
    available.
    """

Also, I'll amend 1.2 and say the hashUpdate/encrypt/decrypt functions should consume only n * blockSize bytes; tracking the remainder will be done at the higher level. Also, the higher-level default implementations should only pass n * blockSize inputs to these functions.

I can see how that's reasonable and am strongly considering using these semantics instead of 1.1.
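Concretely, the higher-level default might look roughly like this untested sketch, where only whole blocks ever reach the low-level update and the leftover goes to finalize (all names here are hypothetical, not the actual crypto-api signatures):

    import qualified Data.ByteString as B
    import qualified Data.ByteString.Lazy as L

    -- Feed `update` only n * blockBytes at a time; whatever is left over
    -- (< blockBytes) goes to `final`, which owns the padding logic.
    hashBlocks :: Int                              -- block size in bytes
               -> ctx                              -- initial context
               -> (ctx -> B.ByteString -> ctx)     -- update, whole blocks only
               -> (ctx -> B.ByteString -> d)       -- finalize, takes remainder
               -> L.ByteString -> d
    hashBlocks blockBytes ctx0 update final = go ctx0 B.empty . L.toChunks
      where
        go ctx rest []     = final ctx rest
        go ctx rest (c:cs) =
            let input          = B.append rest c
                n              = B.length input `div` blockBytes
                (whole, rest') = B.splitAt (n * blockBytes) input
            in go (update ctx whole) rest' cs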
* hash is a generic operation based on the Hash class. In my case it improves performance to not run the pure init/update/finalize functions that are exposed, but to use the hidden impure function instead. I realized yesterday the gain is not as large as I thought, since I had a bug in my benchmark, but it's still there (100ms for 500MB of data).
Humm, 0.2 seconds/GB is significant, so again I can be swayed - it isn't as if I can't have a default definition of hash (and others) when it's part of the class.
* Why is the digest a specific type? I like representing different things with different types, but I'm not sure what you gain with digests.
This I am less flexible on. My thought on how people will use this library centers on instantiating the classes on the key and digest types. Anyone wanting ByteString results can simply use Data.[Serialize,Binary].encode. Here is a user getting a SHA-256 hash:

    let h = hash contents :: SHA256

or the type could be implicit due to context (not shown):

    let h = hash contents
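A toy version of the idea, just to show the annotation doing the algorithm selection (the class here is deliberately minimal and the "digests" are fakes; the real class has many more methods):

    import qualified Data.ByteString as B

    class Hash d where
        hash :: B.ByteString -> d

    -- Placeholder digest types; real instances would run the actual
    -- algorithms rather than truncating the input.
    newtype MD5    = MD5    B.ByteString
    newtype SHA256 = SHA256 B.ByteString

    instance Hash MD5    where hash = MD5    . B.take 16  -- fake 128-bit digest
    instance Hash SHA256 where hash = SHA256 . B.take 32  -- fake 256-bit digest

    -- The annotation alone picks the algorithm; no string or enum needed,
    -- and a Serialize/Binary encode would turn the digest into a ByteString.
    h :: SHA256
    h = hash (B.replicate 100 0)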
* Is strength really useful in the Hash class? It might be accurate when the algorithm is first implemented, but I'm not sure what happens over time as flaws are discovered. Would people actually update it?
Will people actually update it? I hope so, but if they don't, are we really worse off than having no strength numbers at all? People who care about strength will likely keep track of the algorithms on which they depend. I added strength largely because the Hash class came from DRBG (NIST SP 800-90), and that needed strength values. If we don't have strength, then applications like DRBG need a way to know which algorithm each data type represents, and then to look that algorithm up in their own table of algorithm strengths - very messy. I'd imagine crypto-api would have to look something like:

\begin{code}
data HashAlgorithm = MD5 | SHA1 | SHA256 | SHA512 | ...

class Hash d c | d -> c, c -> d where
    ...
    algorithm :: Tagged d HashAlgorithm
    ...
\end{code}

I don't consider this a win - crypto-api would now be enumerating every hash algorithm that wants a Hash instance.
The BlockCipher class should expose the chaining modes as overridable typeclass functions, with default generic implementations that use encryptBlocks. For example, the Haskell AES package has a different C implementation for each chaining mode (e.g. CBC, ECB), and I suspect that using a generic chaining implementation would slow things down.
As with "hash" being part of the hash typeclass, I don't have a strong objection here. It allows particular implementations to be slightly higher performance and does not preclude default definitions. This is rather messier than I wanted, but the reasoning seems sound. WRT your specific examples: encryptBlocksCBC :: k -> ByteString -> (k, ByteString) decryptBlocksCBC :: k -> ByteString -> (k, ByteString) These I do object to. The key does not change as the CBC algorithm progresses, but contextual information does. My initial mode implementations have types like: cbc :: (BlockCipher k) => k -> IV k -> ByteString -> (ByteString, IV k) In other words, initialization vectors are explicit and separate from the key. The type parameter on IV allows us to build an IV of proper size, something like: buildIV :: (BlockCipher k, MonadRandom m) => m (IV k) and it is always true that iv :: IV k iv <- buildIV B.length (encode iv) == blockSize `for` (undefined :: k)
And my last comment: I don't understand the stream cipher interface you're proposing. I've got an (inefficient) RC4 implementation with this interface:
    stream     :: Ctx -> B.ByteString -> (Ctx, B.ByteString)
    streamlazy :: Ctx -> L.ByteString -> (Ctx, L.ByteString)
My interface was just a quick hack, made with the understanding that it would likely change - I didn't know there was a Haskell RC4 binding or implementation, and I'll happily follow your lead here. Is this implementation on Hackage?

Cheers, Thomas
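As for the "some kind of state monad?" aside: the explicit-context shape threads directly through Control.Monad.State, as in this untested sketch (the stubs stand in for the RC4 package's actual Ctx and stream):

    import Control.Monad.State
    import qualified Data.ByteString as B

    -- Stubs for the RC4 implementation's actual definitions.
    data Ctx = Ctx
    stream :: Ctx -> B.ByteString -> (Ctx, B.ByteString)
    stream ctx bs = (ctx, bs)  -- placeholder; real RC4 xors the keystream in

    -- (Ctx, output) is exactly the shape State wants, modulo tuple order.
    encryptM :: B.ByteString -> State Ctx B.ByteString
    encryptM bs = state $ \ctx -> let (ctx', out) = stream ctx bs
                                  in (out, ctx')

    -- Usage: encrypt two chunks, threading the context automatically.
    twoChunks :: B.ByteString -> B.ByteString
              -> State Ctx (B.ByteString, B.ByteString)
    twoChunks a b = (,) <$> encryptM a <*> encryptM b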

On Wed, Jul 14, 2010 at 11:43:45AM -0700, Thomas DuBuisson wrote:
Vincent said:
A couple of comments around the hash interface:
* updateCtx works on blockLength instead of working on arbitrary sizes...
So for performance reasons you seem to prefer Semantics 1.2?
""" 1.2 Multiple of blockSize bytes Implementations are encouraged to consume data (continue updating, encrypting, or decrypting) until there is less than blockSize bits available. """
Also, I'll amend 1.2 and say the hashUpdate/encrypt/decrypt functions should consume only n * blockSize bytes; tracking the remainder will be done at the higher level. Also, the higher-level default implementations should only pass n * blockSize inputs to these functions.
I can see how that's reasonable and am strongly considering using these semantics instead of 1.1.
I'm not sure which document you are referring to here. While thinking about it, I'm not sure tracking the remainder should be left at the higher level; the change will trickle into the finalize function, which might not be very practical.
* Why is the digest a specific type? I like representing different things with different types, but I'm not sure what you gain with digests.
This I am less flexible on. My thought on how people will use this library centers on instantiating the classes on the key and digest types. Anyone wanting ByteString results can simply use Data.[Serialize,Binary].encode.
Here is a user getting a SHA-256 hash:

    let h = hash contents :: SHA256

or the type could be implicit due to context (not shown):

    let h = hash contents
That's fine; I wasn't objecting but just wondering about it.
* Is strength really useful in the Hash class? It might be accurate when the algorithm is first implemented, but I'm not sure what happens over time as flaws are discovered. Would people actually update it?
Will people actually update it? I hope so, but if they don't, are we really worse off than having no strength numbers at all? People who [snip] I don't consider this a win - crypto-api would now be enumerating every hash algorithm that wants a Hash instance.
Indeed; it's just that strength looks like a warm-fuzzy-feeling value that might be wrongly used to choose a hash algorithm automatically, or something like that. Otherwise I really don't mind.
The BlockCipher class should expose the chaining modes as overridable typeclass functions, with default generic implementations that use encryptBlocks. For example, the Haskell AES package has a different C implementation for each chaining mode (e.g. CBC, ECB), and I suspect that using a generic chaining implementation would slow things down. [snip]
These I do object to. The key does not change as the CBC algorithm progresses, but contextual information does. My initial mode implementations have types like:
    cbc :: (BlockCipher k) => k -> IV k -> ByteString -> (ByteString, IV k)
Sorry, I got confused. Your type definition is clearly what should be done here.
In other words, initialization vectors are explicit and separate from the key. The type parameter on IV allows us to build an IV of the proper size, something like:
    buildIV :: (BlockCipher k, MonadRandom m) => m (IV k)
I'm not sure I understand this, however. In particular, why is MonadRandom there? Is this pulling bits from a random generator to construct an IV?
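If it is, I'd guess it amounts to something like this untested sketch, with getRandomBytes as a hypothetical stand-in for whatever method MonadRandom actually exposes:

    import qualified Data.ByteString as B

    newtype IV k = IV B.ByteString   -- assumed representation

    -- Build an IV for cipher k by pulling one block's worth of randomness
    -- from a monadic byte source; the source is passed explicitly here
    -- where a MonadRandom class would supply it implicitly.
    buildIV :: Monad m
            => Int                        -- k's block size in bytes
            -> (Int -> m B.ByteString)    -- hypothetical getRandomBytes
            -> m (IV k)
    buildIV blockBytes getRandomBytes = IV <$> getRandomBytes blockBytes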
My interface was just a quick hack, made with the understanding that it would likely change - I didn't know there was a Haskell RC4 binding or implementation, and I'll happily follow your lead here. Is this implementation on Hackage?
No, it's not on Hackage yet; I'll clean it up and put it somewhere accessible in the next couple of days. I'm not sure I can be considered a stream cipher expert yet, though. I'll try to gather implementations of some different algorithms to see how the interface can be generalized.

-- Vincent