
I'm not clear what strength means. Wouldn't that be something the user should know before using a particular algorithm? And what about when it gets broken, e.g. MD5 and SHA1?
A number (specified in bits) that represents the amount of work needed to break the algorithm. This number will be (manually) degraded as newer attacks are published. For SHA-{256,384,512} the numbers are all still 256, 384, 512 afaik (corrections welcome, I've not been following crypto news). This is motivated by NIST SP 800-90, where the DRBG (aka RNG) user can specify the desired strength and the RNG can select an appropriate hash algorithm.
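To make the idea concrete, here is a minimal sketch of how an RNG could use such a strength number to pick a hash. `HashAlg`, `algStrength` and `selectHash` are hypothetical names, not part of any existing package, and the policy (take the lowest strength that still meets the request) is just one plausible choice. The strength values themselves would be maintained by hand as described above, so none are hard-coded here.

```haskell
import Data.List (sortOn)

-- | Hypothetical descriptor: an algorithm name plus its current security
-- strength in bits (the manually maintained number described above).
data HashAlg = HashAlg
  { algName     :: String
  , algStrength :: Int
  } deriving (Eq, Show)

-- | Pick the candidate with the lowest strength that still meets the
-- requested strength, in the spirit of an SP 800-90 DRBG instantiation.
-- Returns Nothing if no candidate is strong enough.
selectHash :: Int -> [HashAlg] -> Maybe HashAlg
selectHash wanted candidates =
  case sortOn algStrength (filter ((>= wanted) . algStrength) candidates) of
    []      -> Nothing
    (h : _) -> Just h
```

For example, given candidates rated 128 and 256 bits, a request for 192 bits selects the 256-bit one, and a request no candidate can meet yields `Nothing`. Degrading an algorithm after a new attack would just mean lowering its entry.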
Instead I'm thinking of just forcing all ciphertext to be a strict ByteString (below). Higher-level operations, such as modes, can then use these and produce results :: lazy ByteString. If someone wants to build a Cipher instance for an algorithm that is fundamentally a stream cipher (vs. a block cipher + stream mode) then this won't suffice, but I don't see any such algorithm in common use.
RC4? Maybe you should have separate classes for stream and block ciphers.
I should have said "I don't see any such algorithm in common use in Haskell". But you're right, a StreamCipher class might be wise even though it probably won't be instantiated any time soon. I'm tempted to delay defining the class until there is some implementation that would instantiate it, just so we have a sanity check on the class definition.
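A rough sketch of what a separate stream-cipher class might look like, with RC4 as the instance. The class name `StreamCipher`, the method `streamCrypt` and `rc4Init` are hypothetical; the state is threaded explicitly so a message can be processed in strict chunks. RC4 is shown only because it came up: it has known weaknesses, and this pure `IntMap`-based version is not tuned for speed.

```haskell
import Data.Bits (xor)
import qualified Data.ByteString as B
import qualified Data.IntMap.Strict as IM
import Data.Word (Word8)

-- | Hypothetical class: a stream cipher threads state from one chunk to the next.
class StreamCipher s where
  -- | Encrypt or decrypt one strict chunk, returning the updated state.
  streamCrypt :: s -> B.ByteString -> (B.ByteString, s)

-- | RC4 state represented as the (lazy, infinite) remaining keystream.
newtype RC4 = RC4 [Word8]

-- | Run the key-scheduling algorithm (KSA). Assumes a non-empty key.
rc4Init :: B.ByteString -> RC4
rc4Init key = RC4 (prga (ksa 0 0 s0))
  where
    s0 = IM.fromList [(i, fromIntegral i) | i <- [0 .. 255]]
    klen = B.length key
    ksa :: Int -> Int -> IM.IntMap Word8 -> IM.IntMap Word8
    ksa i j s
      | i > 255 = s
      | otherwise =
          let si = s IM.! i
              j' = (j + fromIntegral si + fromIntegral (B.index key (i `mod` klen))) `mod` 256
              sj = s IM.! j'
          in ksa (i + 1) j' (IM.insert i sj (IM.insert j' si s))  -- swap S[i], S[j]
    -- Pseudo-random generation algorithm (PRGA): an infinite keystream.
    prga :: IM.IntMap Word8 -> [Word8]
    prga = go 0 0
      where
        go i j s =
          let i' = (i + 1) `mod` 256
              si = s IM.! i'
              j' = (j + fromIntegral si) `mod` 256
              sj = s IM.! j'
              s' = IM.insert i' sj (IM.insert j' si s)            -- swap S[i], S[j]
              k  = s' IM.! ((fromIntegral si + fromIntegral sj) `mod` 256)
          in k : go i' j' s'

instance StreamCipher RC4 where
  -- XOR the chunk with the keystream and keep the unused remainder.
  streamCrypt (RC4 ks) msg =
    ( B.pack (zipWith xor (B.unpack msg) ks)
    , RC4 (drop (B.length msg) ks) )
```

Encrypting "Plaintext" under the key "Key" should give the commonly quoted test vector `BB F3 16 E8 D9 40 AF 0A D3`, and splitting the input into chunks gives the same output because the remaining keystream is carried in the state.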
Crypto Types (type Salt = Nonce, newtype Nonce, BitLength...)
Classes (Digest, Cipher, Asymmetric?)

What would Asymmetric look like?
I think some examples would be very helpful, e.g. what would MD5, DES, RC4 and RSA look like?
Agreed, I'll work on examples. I notice comments on the interface are still trickling in (e.g. from Adam) and warrant consideration.
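On the question of what Asymmetric might look like, one possibility is to give the public and private halves separate types and classes, so that decrypting with a public key is a type error. The sketch below uses hypothetical names (`PublicKey`, `PrivateKey`, `encryptAsym`, `decryptAsym`, `RSAPublic`, `RSAPrivate`) and textbook RSA over `Integer` with the classic toy parameters n = 3233, e = 17, d = 2753. Unpadded RSA is insecure; a real instance would work on `ByteString`s and apply a padding scheme such as OAEP.

```haskell
-- | Hypothetical asymmetric interface: public and private halves are
-- separate types, so decryption cannot be called with a public key.
class PublicKey k where
  encryptAsym :: k -> Integer -> Integer

class PrivateKey k where
  decryptAsym :: k -> Integer -> Integer

-- | Toy textbook RSA keys: modulus n and public exponent e / private exponent d.
data RSAPublic  = RSAPublic Integer Integer   -- n, e
data RSAPrivate = RSAPrivate Integer Integer  -- n, d

-- | Square-and-multiply modular exponentiation: b^e mod m, for e >= 0.
powMod :: Integer -> Integer -> Integer -> Integer
powMod _ 0 m = 1 `mod` m
powMod b e m
  | even e    = let h = powMod b (e `div` 2) m in (h * h) `mod` m
  | otherwise = (b * powMod b (e - 1) m) `mod` m

instance PublicKey RSAPublic where
  encryptAsym (RSAPublic n e) msg = powMod msg e n

instance PrivateKey RSAPrivate where
  decryptAsym (RSAPrivate n d) c = powMod c d n
```

With these parameters, `encryptAsym (RSAPublic 3233 17) 65` gives 2790, and decrypting 2790 with `RSAPrivate 3233 2753` gives 65 back.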
HMAC (hmac)
Modes (cbc, ofb, cfb)
Presumably ECB as well.
Yes, yes.
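Here is a sketch of the split proposed earlier in the thread: single blocks are strict `ByteString`s and the modes (ECB and CBC here; OFB and CFB would follow the same pattern) produce lazy `ByteString`s. `BlockCipher`, `encryptBlock`, `decryptBlock` and the `ToyXor` instance are hypothetical and for illustration only; `ToyXor` is not a real cipher, and the modes assume the input is already padded to a whole number of blocks.

```haskell
import Data.Bits (xor)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L

-- | Hypothetical block-cipher class: single blocks are strict ByteStrings.
class BlockCipher k where
  blockSize    :: k -> Int                          -- in bytes, assumed > 0
  encryptBlock :: k -> B.ByteString -> B.ByteString
  decryptBlock :: k -> B.ByteString -> B.ByteString

-- | Split into blocks; assumes the length is a multiple of the block size.
chunks :: Int -> B.ByteString -> [B.ByteString]
chunks n bs
  | B.null bs = []
  | otherwise = let (h, t) = B.splitAt n bs in h : chunks n t

xorBS :: B.ByteString -> B.ByteString -> B.ByteString
xorBS a b = B.pack (B.zipWith xor a b)

-- | ECB: every block is encrypted independently.
ecbEncrypt :: BlockCipher k => k -> B.ByteString -> L.ByteString
ecbEncrypt k = L.fromChunks . map (encryptBlock k) . chunks (blockSize k)

-- | CBC: C_i = E(P_i xor C_{i-1}), with C_0 = IV.
cbcEncrypt :: BlockCipher k => k -> B.ByteString -> B.ByteString -> L.ByteString
cbcEncrypt k iv = L.fromChunks . go iv . chunks (blockSize k)
  where
    go _    []       = []
    go prev (p : ps) = let c = encryptBlock k (xorBS p prev) in c : go c ps

-- | CBC decryption: P_i = D(C_i) xor C_{i-1}.
cbcDecrypt :: BlockCipher k => k -> B.ByteString -> B.ByteString -> L.ByteString
cbcDecrypt k iv ct = L.fromChunks (zipWith step (iv : cs) cs)
  where
    cs = chunks (blockSize k) ct
    step prev c = xorBS (decryptBlock k c) prev

-- | Toy "cipher" for exercising the modes only: XOR with the key. NOT secure.
newtype ToyXor = ToyXor B.ByteString

instance BlockCipher ToyXor where
  blockSize (ToyXor key)    = B.length key
  encryptBlock (ToyXor key) = xorBS key
  decryptBlock (ToyXor key) = xorBS key
```

Encrypting two identical plaintext blocks shows the usual difference between the modes: ECB yields two identical ciphertext blocks, while CBC does not.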
There are quite a few tests in the crypto package which ought to be re-used. I might be tempted to do the statistical tests on the RNG.
Agreed. There are also tests worth stealing in the algorithm-specific packages.
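As one example of a statistical test on RNG output, this is a sketch of the frequency (monobit) test from NIST SP 800-22. The function names are hypothetical, and the fixed threshold 2.5758 is the two-sided normal critical value corresponding to a p-value of 0.01, used here to avoid needing `erfc`. SP 800-22 recommends at least 100 bits of input, and passing this one test says very little on its own.

```haskell
import Data.Bits (popCount)
import qualified Data.ByteString as B

-- | Frequency (monobit) test statistic from NIST SP 800-22:
-- s_obs = |S_n| / sqrt n, where S_n = (#ones - #zeros) over all n bits.
monobitStatistic :: B.ByteString -> Double
monobitStatistic bs = abs (fromIntegral sn) / sqrt (fromIntegral n)
  where
    n    = 8 * B.length bs                   -- number of bits
    ones = sum (map popCount (B.unpack bs))  -- number of 1 bits
    sn   = 2 * ones - n                      -- (#ones) - (#zeros)

-- | Pass at significance level 0.01: p = erfc(s_obs / sqrt 2) >= 0.01
-- is equivalent to s_obs <= ~2.5758. Empty input is rejected.
monobitPasses :: B.ByteString -> Bool
monobitPasses bs = B.length bs > 0 && monobitStatistic bs <= 2.5758
```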
* For Crypto-Algs, pure Haskell code may be preferred, but it should not be used when it costs more than ~20% in performance compared to other available implementations.
It's always been a disappointment to me that Haskell crypto performs so badly compared to C given that it should be possible to generate a nice loop from a Haskell fold. It would be great if this could be achieved.
In my experience it hasn't been the numeric computation but the coercing of words from the ByteString that costs the extra time. If you know of a zero-copy way to access an (aligned) ByteString as an array of unboxed words, that would be great. Unaligned data could probably just drop down to using Binary or Cereal without bothering too many users.
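One possible zero-copy approach, sketched under assumptions: `Data.ByteString.Unsafe.unsafeUseAsCString` exposes the buffer's pointer, which can be cast to `Ptr Word32` and read with `peekElemOff` when it happens to be 4-byte aligned. The names `withWord32s` and `sumWord32s` are hypothetical. Words come out in host byte order, so a big-endian algorithm such as SHA-256 would still need a byte swap on little-endian machines, and the unaligned case returns `Nothing` so the caller can fall back to Binary or Cereal as suggested above.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word32, Word64)
import Foreign.Ptr (Ptr, alignPtr, castPtr)
import Foreign.Storable (alignment, peekElemOff)

-- | View a strict ByteString as host-order Word32s without copying, but only
-- if its buffer is suitably aligned; otherwise return Nothing so the caller
-- can fall back to a Binary/Cereal-style decoder. Trailing bytes beyond a
-- multiple of 4 are ignored. The pointer must not escape the callback and
-- must never be written through.
withWord32s :: B.ByteString -> (Ptr Word32 -> Int -> IO a) -> IO (Maybe a)
withWord32s bs act =
  BU.unsafeUseAsCString bs $ \cp -> do
    let p = castPtr cp :: Ptr Word32
    if alignPtr p (alignment (0 :: Word32)) == p
      then Just <$> act p (B.length bs `div` 4)
      else pure Nothing

-- | Example consumer: a strict loop summing the words (widened to Word64).
sumWord32s :: Ptr Word32 -> Int -> IO Word64
sumWord32s p n = go 0 0
  where
    go acc i
      | i >= n    = pure acc
      | otherwise = do
          w <- peekElemOff p i
          let acc' = acc + fromIntegral w
          acc' `seq` go acc' (i + 1)
```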
* Keys should receive their own data type declarations (no cryptographic information floating around in a naked ByteString)
How will you handle algorithms that allow different key sizes e.g. AES?
I envision key data types being specific to each algorithm, so each algorithm can decide for itself how to handle things. A stab at AES might look like:

    data AESKey = Key { raw :: B.ByteString, expanded :: B.ByteString }

In this case the difference between AES128, 192 and 256 is rather hidden unless you call the "keyLength" routine. The interface doesn't have to be realized this way; it's just one possibility. Another option would be:

    data AESKey n = Key { raw :: B.ByteString, expanded :: B.ByteString }

and have the constructors enforce the property "n = keyLength k ==> k :: AESKey Tn", where Tn is some empty data declaration ("data T128", "data T192", "data T256"). Perhaps non-ideal, but I'm just saying it's possible with the interface. Obviously having three data types is also possible:

    data AES128 = K128 ...
    data AES192 = K192 ...
    data AES256 = K256 ...

but this implies three different cipher instances, unless we drop back to the concept of instantiating the cipher for an empty data declaration and use functional dependencies to state that the key implies the instance (and not the other way around). Exactly which method Crypto-Algs would agree on is up in the air, but I'm leaning toward the first one (key length is immaterial to the user / not available in the type system once you have the key).
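A minimal sketch of the first two options, with the key schedule (`expanded`) left out and the constructors renamed so both options fit in one module. `AESKey`, `AESKeyN`, `mkAESKey`, `mkKey128` and friends are hypothetical names; in a real module the raw constructors would not be exported, so the smart constructors are the only way to build a key and the length invariant holds.

```haskell
import qualified Data.ByteString as B

-- Option 1: one key type; the size is only visible at run time.
newtype AESKey = AESKey { rawKey :: B.ByteString }

-- | Smart constructor accepting 128-, 192- or 256-bit keys.
mkAESKey :: B.ByteString -> Maybe AESKey
mkAESKey bs
  | B.length bs `elem` [16, 24, 32] = Just (AESKey bs)
  | otherwise                       = Nothing

-- | Key length in bits.
keyLength :: AESKey -> Int
keyLength = (* 8) . B.length . rawKey

-- Option 2: the size as a phantom type index, enforced by the constructors.
data T128
data T192
data T256

newtype AESKeyN n = AESKeyN { rawKeyN :: B.ByteString }

mkKeyOfSize :: Int -> B.ByteString -> Maybe (AESKeyN n)
mkKeyOfSize bytes bs
  | B.length bs == bytes = Just (AESKeyN bs)
  | otherwise            = Nothing

mkKey128 :: B.ByteString -> Maybe (AESKeyN T128)
mkKey128 = mkKeyOfSize 16

mkKey192 :: B.ByteString -> Maybe (AESKeyN T192)
mkKey192 = mkKeyOfSize 24

mkKey256 :: B.ByteString -> Maybe (AESKeyN T256)
mkKey256 = mkKeyOfSize 32
```

With the first option `keyLength` reports 128, 192 or 256 at run time; with the second, `mkKey128` simply refuses a 32-byte input, and a function that takes an `AESKeyN T256` cannot be handed a 128-bit key.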
* Data types should have Binary and Serialize instances.
I'm not clear why you would want this. I suspect some specifications will want data serialized in a specific way and then encrypted. It's not clear to me that this would necessarily be the way Binary had serialized it. Or have I missed the point?
By "data types" I was meaning digests (and thinking Keys too, in the back of my mind). I believe it should be a requirement of the type class that all digests are serializable because that is needed so often.
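A sketch of making serializability a requirement of the class by putting `Binary` in its superclass context. `Digest`, `Digest128`, `outputLength` and `roundTrips` are hypothetical, and the hashing methods themselves are omitted. The `Binary` instance writes the raw 16 bytes rather than reusing the binary package's `ByteString` instance, which prepends a length; that also bears on the concern above that a specification may dictate the exact byte layout. A `Serialize` instance for cereal would follow the same pattern.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getByteString)
import Data.Binary.Put (putByteString)
import qualified Data.ByteString as B

-- | Hypothetical digest class: Binary is a superclass, so every digest is
-- serializable by construction. (Hashing methods omitted from this sketch.)
class (Eq d, Binary d) => Digest d where
  -- | Output size in bits; the argument is never evaluated.
  outputLength :: d -> Int

-- | A 128-bit digest stored as its raw bytes (invariant: exactly 16 bytes).
newtype Digest128 = Digest128 B.ByteString
  deriving (Eq, Show)

instance Binary Digest128 where
  -- Fixed-size digests need no length prefix: write the raw bytes.
  put (Digest128 bs) = putByteString bs
  get = Digest128 <$> getByteString 16

instance Digest Digest128 where
  outputLength _ = 128

-- | Any digest can be encoded and decoded back to itself.
roundTrips :: Digest d => d -> Bool
roundTrips d = decode (encode d) == d
```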
Having mentioned padding, I didn't see anything about it in your proposal.
Because I haven't figured out what I/we would want there yet. Feel free to help figure that out.
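As a starting point for the padding discussion, here is a sketch of one common choice, PKCS#7 padding as defined in RFC 5652, section 6.3: always append between 1 and k bytes, each holding the number of bytes added. The function names are hypothetical. Note that after CBC decryption, an unpadding failure that an attacker can observe can leak plaintext (a padding-oracle attack), so how errors are reported matters as much as the padding itself.

```haskell
import qualified Data.ByteString as B

-- | PKCS#7 padding: append n = k - (len mod k) bytes, each with value n,
-- so there is always at least one padding byte. Assumes 1 <= k <= 255.
pkcs7Pad :: Int -> B.ByteString -> B.ByteString
pkcs7Pad k bs = B.append bs (B.replicate n (fromIntegral n))
  where
    n = k - B.length bs `mod` k

-- | Remove PKCS#7 padding, rejecting malformed input.
pkcs7Unpad :: Int -> B.ByteString -> Maybe B.ByteString
pkcs7Unpad k bs
  | len == 0 || len `mod` k /= 0 = Nothing   -- not a whole number of blocks
  | n < 1 || n > k               = Nothing   -- impossible pad length
  | B.all (== B.last bs) padding = Just body
  | otherwise                    = Nothing   -- inconsistent pad bytes
  where
    len = B.length bs
    n = fromIntegral (B.last bs)
    (body, padding) = B.splitAt (len - n) bs
```

For example, padding the 6-byte string "YELLOW" to an 8-byte block appends two bytes of value 2, and a 4-byte input with k = 4 gains a whole extra block of 4s.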
As an aside, I would have some concerns about using a Haskell crypto package for applications where failure could have serious consequences. For example, you probably want keys to be wiped from memory after they have been used, and it's not clear what guarantees you would have that this had been done in a Haskell implementation. I'm hoping that FIPS certification of OpenSSL would mean concerns like that had been addressed. I think aiming for that in a Haskell implementation would be further than anyone would want to go, but there certainly ought to be a warning on the package about its limitations in this respect.
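To illustrate the limits being described, here is a sketch of roughly the best a Haskell library can easily do: copy the key into a pinned buffer from `mallocForeignPtrBytes`, use it, and zero it with `fillBytes` afterwards, even if the action throws. `withScrubbedKey` is a hypothetical name. It only wipes that private copy: the original `ByteString`, anything derived from it, and any copies the runtime or the action makes are untouched, which is exactly why a warning on the package seems warranted.

```haskell
import Control.Exception (finally)
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.ForeignPtr (mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Utils (copyBytes, fillBytes)
import Foreign.Ptr (Ptr, castPtr)

-- | Copy key material into a private pinned buffer, run the action on it,
-- then overwrite the buffer with zeros (also on exceptions).
-- The input ByteString itself is NOT wiped.
withScrubbedKey :: B.ByteString -> (Ptr Word8 -> Int -> IO a) -> IO a
withScrubbedKey key act = do
  let n = B.length key
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp $ \p -> do
    BU.unsafeUseAsCString key $ \src -> copyBytes p (castPtr src) n
    act p n `finally` fillBytes p 0 n
```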
Sounds reasonable. I have some ideas for another iteration (thanks to Adam's comment for prompting some changes) and will post once I get time; I'm in the final throes of the Spring term, so I have no time this coming week. I should have ported DRBG to the new class before the next e-mail too.

Cheers,
Thomas