DRBG pre-announce and a discussion on RNG/Crypto infrastructure

The DRBG package [1] is a partial implementation of NIST SP 800-90 section 10, specifically the hash and HMAC based cryptographically secure random number generators are implemented. Before doing further work I feel the state of affairs of cryptography on hackage needs to change. As has became too common, DRBG implements (or should implement) several features that could be done once for all time in a quality Crypto package. Features such as type classes for hashes and ciphers, an hmac definition, and a MonadRandom or something similar for random number generators that can fail. What existing code implements one or more of these features that should be able to use a central Crypto package? Happstack, SHA2, DRBG, SHA, hS3 (actually uses crypto despite [Word8] where ByteString should be used), hoauth, and a few partial-bindings to OpenSSL satisfy several needs. Future Crypto Library Principles I propose two "new" libraries be built and receive some higher level of community attention. One is "Crypto" which will define type classes and a limited set of common generic algorithms (ex: HMAC, cipher modes of operation). The other is "crypto-algs" which implements (and instantiates type classes for) as many common algorithms as we can reasonably group together with similar interfaces. I feel the breakdown is important - if the community-accepted interface is separated from any algorithms then maintainers of alternate implementations are more likely to accept the interface. Enumerating principles I support: * Lazy ByteStrings should be used for all input data * While "over-typeclassification" should be avoided, two or three basic classes of "Digest", "Cipher", and "CryptoRandom" make good sense. * CryptoRandom, and related infrastructure would closely follow Random and MonadRandom except for allowing clean failure. * Digest result types should be unique to the digest (a la pureMD5, contrary to SHAs use of "Digest" for 4 different hashs and contrary to SHA2s use of ByteString for all digests) * For Crypto-Algs pure Haskell code may be preferred but should not be used when costing over ~20% performance degradation when compared to other available implementations. * No dependence on external libraries (e.g. no OpenSSL bindings) * Keys should receive their own data type declarations (no cryptographic information floating around in a naked ByteString) * Data types should have Binary and Serialize instances. * All code should be licensed BSD3 or equivalent Open Questions on Class Design What should be made instances of our classes. Should an empty data declaration have an instance, and the class have a way to "extract" the actual digest [2]? Perhaps it should be the data type containing the digest that is the instance, so the hash algorithm can be inferred based on the result type. Required methods: I'd propose we follow [3] (see [2] for instances) except removing the empty data declarations, using the digest/key types for the instance. In brief, this includes encrypt/decrypt (or 'hash), strength, outputLength, blockLength, What algorithms are generic enough to include in Crypto? I'd say anything that works for all instances of {Cipher, Digest}. Minimum quality for inclusion: A suite of KATs and some straight forward QuickCheck tests. Anticipating Concerns 1) (Re-) Implementing Crypto is hard / dangerous / a waste of time Most the benefit I aim for is in the common interface, not the algorithms. That said, the -Algs package can largely be a gathering of best-of-hackage with sanitized interface and continually-updated as higher-performance implementations come available. This might have been how the current "Crypto" package started, I don't know, but it is no longer in a sane or useful state. 2) OpenSSL is fine For some people, yes - absolutely. Not just fine but insane to replace, seeing as it's (partly) FIPS certified. For other people, such as basically all the packages I listed above, it's obviously lacking something be it portability, footprint, or interface. 3) The Monad class / MonadRandom can fail - it's called "error" and "fail". This is a matter of taste. I personally wish Haskell didn't have exceptions or had an explicitly exceptionless type. At any rate, the random work would probably come last, especially if I'm doing the implementation. I intend to tackle this once I get some personal time freed up (around mid June). If you're interested then please feel free to lay claim to some part of this problem or comment on the proposal. Cheers, Thomas [1] http://code.haskell.org/~tommd/DRBG/ Remaining tasks include optimization, test suite (NIST provides a large body of KATs), self test procedures, and the higher level interface (section 9). [2] http://code.haskell.org/~tommd/DRBG/Test/KAT.hs [3] http://code.haskell.org/~tommd/DRBG/Data/Crypto/Classes.hs

All,
So I've done some initial work on the type classes and module layout.
I feel the digest class is in good shape. DRGB hasn't been ported to
this interface, which would be decent validation, but I don't see any
major issues with either making instances or using the interface for
any generic algorithm that can operate with any hash function. I've
previously required "(Pretty d) =>", but that's probably excessive and
adds another dependency. Still, Digest authors __should__ have Pretty
instances for the digests!
===============
class (Binary d, Serialize d)
=> Hash ctx d | d -> ctx, ctx -> d where
outputLength :: d -> BitLength -- outputLength must ignore its argument
blockLength :: d -> BitLength -- blockLength must ignore its argument
hash :: ByteString -> d
initialCtx :: ctx
updateCtx :: ctx -> ByteString -> ctx
finalize :: ctx -> d
strength :: d -> Int -- strength must ignore its argument
===============
The Cipher instance is in somewhat worse shape. Aside from me using
the English term of 'Cipher' for the class then dropping to the
American terms of 'encrypt' and 'decrypt' for the functions, there is
also the issue of how to represent the cipher text type. Using the
type system to enforce (or aiding analysis of) information flow has
always been tempting, but I've never found a convenient way to express
this. Below is one attempt that falls apart if you A) want to receive
information (ByteString) and decrypt it or B) want your higher level
cryptographic operation implementations (cbc mode, etc) to look sane
(as I found out).
===============
class (Monoid ct, Binary ct, Serialize ct) => Cipher ct k | k -> ct where
blockSize :: k -> BitLength
encrypt :: k -> B.ByteString -> ct
decrypt :: k -> ct -> B.ByteString
buildKey :: B.ByteString -> Maybe k
keyLength :: k -> BitLength
================
Instead I'm thinking of just forcing all cipher text to be a strict
ByteString (below). Higher level operations, such as modes, then can
use these and produce results :: lazy ByteString. If someone wants to
build a Cipher instance for an algorithm that is fundamentally a
stream cipher (vs a block cipher + stream mode) then this won't
suffice, but I don't see any such algorithm in common use.
================
class Cipher k where
blockSize :: k -> BitLength -- ^ blockSize must ignore its argument
encrypt :: k -> B.ByteString -> B.ByteString
decrypt :: k -> B.ByteString -> B.ByteString
buildKey :: B.ByteString -> Maybe k
keyLength :: k -> BitLength -- ^ keyLength may inspect its
argument to return the length
================
I'm still debating how to implement "strength" for both Digest and
Cipher. Two obvious choices are A) use different names in each class
"cipherStrength" and "digestStrength" B) Have another type class that
defines "strength". I'm ambivalent here - if someone has an argument
one way or another please speak up.
The Module and Package breakdown:
PACKAGE: crypto
Data
Crypto
Types (type Salt = Nonce, newtype Nonce, BitLength...)
Classes (Digest, Cipher, Asymmetric?) -- what would
Asymmetric look like?
HMAC (hmac)
Modes (cbc, ofb, cfb)
Random? Still considering
System
Crypto
Random (getEntropy, getTrueEntropy)
-- urandom vs random. On Windows just use
cryptGenRandom for both
Test
Crypto
Hash (KATs, Props)
Cipher (KATs, Props)
Asymmetric (KATs, Props)
-- statistical tests for RNG if someone wants to do the work
PACKAGE: crypto-algs
Data.Crypto.Algs (all algorithms are exported from the base module)
.PBKDF2 -- from pbkdf2
.AES -- from simpleAES
.DES -- from ???
.SHA -- from SHA2 pkg
.MD5 -- perhaps from pureMD5, improved with FFI for accumulator
.arc4 -- ???
.DRBG.HMAC -- from DRBG
.DRBG.Hash -- from DRBG
PACKAGE: Cryptography
Data.Crypto exports all of crypto-algs and crypto. It's a metapackage.
Cheers,
Thomas
On Tue, May 25, 2010 at 3:11 PM, Thomas DuBuisson
The DRBG package [1] is a partial implementation of NIST SP 800-90 section 10, specifically the hash and HMAC based cryptographically secure random number generators are implemented. Before doing further work I feel the state of affairs of cryptography on hackage needs to change.
As has became too common, DRBG implements (or should implement) several features that could be done once for all time in a quality Crypto package. Features such as type classes for hashes and ciphers, an hmac definition, and a MonadRandom or something similar for random number generators that can fail. What existing code implements one or more of these features that should be able to use a central Crypto package? Happstack, SHA2, DRBG, SHA, hS3 (actually uses crypto despite [Word8] where ByteString should be used), hoauth, and a few partial-bindings to OpenSSL satisfy several needs.
Future Crypto Library Principles I propose two "new" libraries be built and receive some higher level of community attention. One is "Crypto" which will define type classes and a limited set of common generic algorithms (ex: HMAC, cipher modes of operation). The other is "crypto-algs" which implements (and instantiates type classes for) as many common algorithms as we can reasonably group together with similar interfaces. I feel the breakdown is important - if the community-accepted interface is separated from any algorithms then maintainers of alternate implementations are more likely to accept the interface.
Enumerating principles I support: * Lazy ByteStrings should be used for all input data * While "over-typeclassification" should be avoided, two or three basic classes of "Digest", "Cipher", and "CryptoRandom" make good sense. * CryptoRandom, and related infrastructure would closely follow Random and MonadRandom except for allowing clean failure. * Digest result types should be unique to the digest (a la pureMD5, contrary to SHAs use of "Digest" for 4 different hashs and contrary to SHA2s use of ByteString for all digests) * For Crypto-Algs pure Haskell code may be preferred but should not be used when costing over ~20% performance degradation when compared to other available implementations. * No dependence on external libraries (e.g. no OpenSSL bindings) * Keys should receive their own data type declarations (no cryptographic information floating around in a naked ByteString) * Data types should have Binary and Serialize instances. * All code should be licensed BSD3 or equivalent
Open Questions on Class Design What should be made instances of our classes. Should an empty data declaration have an instance, and the class have a way to "extract" the actual digest [2]? Perhaps it should be the data type containing the digest that is the instance, so the hash algorithm can be inferred based on the result type.
Required methods: I'd propose we follow [3] (see [2] for instances) except removing the empty data declarations, using the digest/key types for the instance. In brief, this includes encrypt/decrypt (or 'hash), strength, outputLength, blockLength,
What algorithms are generic enough to include in Crypto? I'd say anything that works for all instances of {Cipher, Digest}.
Minimum quality for inclusion: A suite of KATs and some straight forward QuickCheck tests.
Anticipating Concerns 1) (Re-) Implementing Crypto is hard / dangerous / a waste of time Most the benefit I aim for is in the common interface, not the algorithms. That said, the -Algs package can largely be a gathering of best-of-hackage with sanitized interface and continually-updated as higher-performance implementations come available. This might have been how the current "Crypto" package started, I don't know, but it is no longer in a sane or useful state.
2) OpenSSL is fine For some people, yes - absolutely. Not just fine but insane to replace, seeing as it's (partly) FIPS certified. For other people, such as basically all the packages I listed above, it's obviously lacking something be it portability, footprint, or interface.
3) The Monad class / MonadRandom can fail - it's called "error" and "fail". This is a matter of taste. I personally wish Haskell didn't have exceptions or had an explicitly exceptionless type. At any rate, the random work would probably come last, especially if I'm doing the implementation.
I intend to tackle this once I get some personal time freed up (around mid June). If you're interested then please feel free to lay claim to some part of this problem or comment on the proposal.
Cheers, Thomas
[1] http://code.haskell.org/~tommd/DRBG/ Remaining tasks include optimization, test suite (NIST provides a large body of KATs), self test procedures, and the higher level interface (section 9). [2] http://code.haskell.org/~tommd/DRBG/Test/KAT.hs [3] http://code.haskell.org/~tommd/DRBG/Data/Crypto/Classes.hs

On Mon, May 31, 2010 at 1:06 AM, Thomas DuBuisson
... outputLength :: d -> BitLength -- outputLength must ignore its argument blockLength :: d -> BitLength -- blockLength must ignore its argument strength :: d -> Int -- strength must ignore its argument blockSize :: k -> BitLength -- ^ blockSize must ignore its argument keyLength :: k -> BitLength -- ^ keyLength may inspect its ...
Why not use the Edward Kmett's 'tagged'[1] package for these methods? As in: outputLength :: Tagged d BitLength This way users don't have to write the dangerous 'undefined :: d'. Regards, Bas [1] http://hackage.haskell.org/package/tagged

Why not use the Edward Kmett's 'tagged'[1] package for these methods? As in:
outputLength :: Tagged d BitLength
This way users don't have to write the dangerous 'undefined :: d'.
Because I had not heard of that package. The package (and general method) look good, I'll probably switch to that. Thanks for the pointer. Cheers, Thomas

This is excellent news. It's been a while since I was involved with cryptography but here are a few comments.
=============== class (Binary d, Serialize d) => Hash ctx d | d -> ctx, ctx -> d where outputLength :: d -> BitLength -- outputLength must ignore its argument blockLength :: d -> BitLength -- blockLength must ignore its argument hash :: ByteString -> d initialCtx :: ctx updateCtx :: ctx -> ByteString -> ctx finalize :: ctx -> d strength :: d -> Int -- strength must ignore its argument ===============
I'm not clear what strength means. Wouldn't that be something the user should know before using a particular algorithm? And what about when it gets broken e.g. MD5 and SHA1?
Instead I'm thinking of just forcing all cipher text to be a strict ByteString (below). Higher level operations, such as modes, then can use these and produce results :: lazy ByteString. If someone wants to build a Cipher instance for an algorithm that is fundamentally a stream cipher (vs a block cipher + stream mode) then this won't suffice, but I don't see any such algorithm in common use.
RC4? Maybe you should have separate classes for stream and block ciphers.
Crypto Types (type Salt = Nonce, newtype Nonce, BitLength...) Classes (Digest, Cipher, Asymmetric?) -- what would Asymmetric look like?
I think an examples would be very helpful. E.g. what would MD5 look like, what would DES look like, what would RC4 look like and what would RSA look like.
HMAC (hmac) Modes (cbc, ofb, cfb)
Presumably ECB as well.
Test Crypto Hash (KATs, Props) Cipher (KATs, Props) Asymmetric (KATs, Props) -- statistical tests for RNG if someone wants to do the work
There are quite a few tests in the crypto package which ought to be re-used. I might be tempted to do the statistical tests on RNG.
As has became too common, DRBG implements (or should implement) several features that could be done once for all time in a quality Crypto package. Features such as type classes for hashes and ciphers, an hmac definition, and a MonadRandom or something similar for random number generators that can fail. What existing code implements one or more of these features that should be able to use a central Crypto package? Happstack, SHA2, DRBG, SHA, hS3 (actually uses crypto despite [Word8] where ByteString should be used), hoauth, and a few partial-bindings to OpenSSL satisfy several needs.
I couldn't agree more. For information, the crypto package predates ByteString so really didn't have much option.
* Digest result types should be unique to the digest (a la pureMD5, contrary to SHAs use of "Digest" for 4 different hashs and contrary to SHA2s use of ByteString for all digests)
Agree. After all there is really no relation between an MD5 hash and a SHA1 hash.
* For Crypto-Algs pure Haskell code may be preferred but should not be used when costing over ~20% performance degradation when compared to other available implementations.
It's always been a disappointment to me that Haskell crypto performs so badly compared to C given that it should be possible to generate a nice loop from a Haskell fold. It would be great if this could be achieved.
* Keys should receive their own data type declarations (no cryptographic information floating around in a naked ByteString)
How will you handle algorithms that allow different key sizes e.g. AES?
* Data types should have Binary and Serialize instances.
I'm not clear why you would want this. I suspect some specifications will want data serialized in a specific way and then encrypted. It's not clear to me that this would necessarily be the way Binary had serialized it. Or have I missed the point?
* All code should be licensed BSD3 or equivalent
Agreed. I recall with the crypto package one of the authors wanted GPL for their particular alrgorithm implementation which may have put some people off using it.
Minimum quality for inclusion: A suite of KATs and some straight forward QuickCheck tests.
Definitely some published test vectors and at least encrypt . decrypt = id and decrypt . encrypt = id although these may have to be modulo padding. Having mentioned padding, I didn't see anything about it in your proposal.
Anticipating Concerns 1) (Re-) Implementing Crypto is hard / dangerous / a waste of time Most the benefit I aim for is in the common interface, not the algorithms. That said, the -Algs package can largely be a gathering of best-of-hackage with sanitized interface and continually-updated as higher-performance implementations come available. This might have been how the current "Crypto" package started, I don't know, but it is no longer in a sane or useful state.
The current crypto package was my attempt to take the existing (at that time) cryptographic algorithms and put them in one package with some semblance of a common interface. It is now woefully past its sell-by date. I think the benefit of a Haskell only implementation is a) you can see how they work b) no dependencies.
2) OpenSSL is fine For some people, yes - absolutely. Not just fine but insane to replace, seeing as it's (partly) FIPS certified. For other people, such as basically all the packages I listed above, it's obviously lacking something be it portability, footprint, or interface.
As an aside, I would have some concerns about using a Haskell crypto package for applications where failure could have serious consequences. For example, you probably want keys to be flushed from memory after they have been used. It's not clear what guarantees you would have that this had been done in a Haskell implementation. I'm hoping that FIPS certification of openssl would mean concerns like that had been addressed. I think aiming for this for a Haskell implementation would be further that anyone would want to go but there certainly ought to be a warning on the package about it's limitations in this respect. Once again, I'd like to say it's great news that you are taking this on. Dominic.

I'm not clear what strength means. Wouldn't that be something the user should know before using a particular algorithm? And what about when it gets broken e.g. MD5 and SHA1?
A number (specified in bits) that represents the amount of work needed to break the algorithm. This number will be (manually) degraded as newer attacks are published. For SHA-{256,384,512} the numbers are all still 256, 384, 512 afaik (corrections welcome, I've not been following crypto news). This is motivated by NIST SP 800-90, where the DRBG (aka RNG) user can specify the desired strength and the RNG can select an appropriate hash algorithm.
Instead I'm thinking of just forcing all cipher text to be a strict ByteString (below). Higher level operations, such as modes, then can use these and produce results :: lazy ByteString. If someone wants to build a Cipher instance for an algorithm that is fundamentally a stream cipher (vs a block cipher + stream mode) then this won't suffice, but I don't see any such algorithm in common use.
RC4? Maybe you should have separate classes for stream and block ciphers.
I should have said "I don't see any such algorithm in common use in Haskell". But you're right, a StreamCipher class might be wise even though it probably won't be instantiated any time soon. I'm tempted to delay defining the class until there is some implementation that would instantiate it, just so we have a sanity check on the class definition.
Crypto Types (type Salt = Nonce, newtype Nonce, BitLength...) Classes (Digest, Cipher, Asymmetric?) -- what would Asymmetric look like?
I think an examples would be very helpful. E.g. what would MD5 look like, what would DES look like, what would RC4 look like and what would RSA look like.
Agreed - I'll work on examples. I notice comments on interface are still trickling in (ex: Adam) and warrants consideration.
HMAC (hmac) Modes (cbc, ofb, cfb)
Presumably ECB as well.
Yes, yes.
There are quite a few tests in the crypto package which ought to be re-used. I might be tempted to do the statistical tests on RNG.
Agreed. Also in algorithm specific packages there are tests worth stealing.
* For Crypto-Algs pure Haskell code may be preferred but should not be used when costing over ~20% performance degradation when compared to other available implementations.
It's always been a disappointment to me that Haskell crypto performs so badly compared to C given that it should be possible to generate a nice loop from a Haskell fold. It would be great if this could be achieved.
In my experience it hasn't been the numeric computation, but the coercing of words from the bytestring that costs the extra time. If you know of a zero copy way to access an (aligned) bytestring as an array of unboxed words then that would be great. Unaligned could probably just drop down to using Binary or Cereal without bothering too many users.
* Keys should receive their own data type declarations (no cryptographic information floating around in a naked ByteString)
How will you handle algorithms that allow different key sizes e.g. AES?
I envision key data types being specific to each algorithm so each algorithm can decide from itself how to handle things. A stab at AES might look like data AESKey = Key { raw :: B.ByteString, expanded :: B.ByteString } So in this cases the difference between AES128, 192, and 256 is rather hidden unless you call the "keyLength" routine. The interface doesn't have to be realized this way, its just one possibility. Another option would be data AESKey n = Key { raw :: B.ByteString, expanded :: B.ByteString } And have the constructors enforce the concept that "n = keyLength k ==> k :: AESKey Tn" where Tn is some empty data declaration "data T128", "data T192", "data T256". Perhaps non-ideal, but I'm just saying it's possible with the interface. Obviously having three data types is also possible, data AES128 = K128 ... data AES192 = K192 ... data AES258 = K258 ... but this implies three different cipher instances unless we drop back to the concept of instantiating the cipher for an empty data declaration and use functional dependencies to state the key implies the instance (and not the other way around). Exactly which method -Algs would agree on is up in the air, but I'm leaning toward the first one (key length is immaterial to the user / not available in the type system once you have the key).
* Data types should have Binary and Serialize instances.
I'm not clear why you would want this. I suspect some specifications will want data serialized in a specific way and then encrypted. It's not clear to me that this would necessarily be the way Binary had serialized it. Or have I missed the point?
By "data types" I was meaning digests (and thinking Keys too, in the back of my mind). I believe it should be a requirement of the type class that all digests are serializable because that is needed so often.
Having mentioned padding, I didn't see anything about it in your proposal.
Because I haven't figured out what I/we would want there yet. Feel free to help figure that out.
As an aside, I would have some concerns about using a Haskell crypto package for applications where failure could have serious consequences. For example, you probably want keys to be flushed from memory after they have been used. It's not clear what guarantees you would have that this had been done in a Haskell implementation. I'm hoping that FIPS certification of openssl would mean concerns like that had been addressed. I think aiming for this for a Haskell implementation would be further that anyone would want to go but there certainly ought to be a warning on the package about it's limitations in this respect.
Sounds reasonable. I have some ideas for another iteration (thanks to Adams comment for prompting some changes) and will post once I get time - in the final throws of the Spring term so I have no time this coming week. I should have ported DRBG to the new class before the next e-mail too. Cheers, Thomas

On 05/25/2010 03:11 PM, Thomas DuBuisson wrote:
Future Crypto Library Principles I propose two "new" libraries be built and receive some higher level of community attention. One is "Crypto" which will define type classes and a limited set of common generic algorithms (ex: HMAC, cipher modes of operation). The other is "crypto-algs" which implements (and instantiates type classes for) as many common algorithms as we can reasonably group together with similar interfaces. I feel the breakdown is important - if the community-accepted interface is separated from any algorithms then maintainers of alternate implementations are more likely to accept the interface.
Why two libraries instead of n+1? Wouldn't it make sense to just have one library (what you call "Crypto") define the interface as one package, and then have a number of packages that implement that interface as a series of other modules? This has the advantage that: (1) You only need to download the packages (and dependencies of said) for those crypto routines you need. (2) It removes the need for a separate authority to approve inclusion of newer / faster / better crypto implementations. (Thus speeding up the process and lowering the barrier to entry.)
Enumerating principles I support: * Lazy ByteStrings should be used for all input data
Really? Why? I've actually been considering going back to both the SHA and RSA packages and redoing them using strict ByteStrings. Recent experience has suggested that strict ByteStrings are almost always what I want, and building a fast lazy ByteString interface over strict ByteString routines seems like a pretty trivial task. - Adam

All, A new pair of typeclasses are below and in the repo [1]. Mostly this is just me tweaking the Hash class and updating DRBG [2] to use the new interface (tests not yet run I might have broken something, but that wouldn't be the interfaces fault). The classes include: Note L. is ByteString.Lazy while B. is strict bytestrings. ===== class (Binary d, Serialize d) => Hash ctx d | d -> ctx, ctx -> d where outputLength :: Tagged d BitLength blockLength :: Tagged d BitLength initialCtx :: ctx updateCtx :: ctx -> B.ByteString -> ctx finalize :: ctx -> B.ByteString -> d strength :: Tagged d BitLength ===== I was considering having a 'needAlignment :: Tagged d ByteLength' value for Hashes. The reasoning was [3]. ==== class BlockCipher k where blockSize :: Tagged k BitLength encryptBlock :: k -> B.ByteString -> B.ByteString decryptBlock :: k -> B.ByteString -> B.ByteString buildKey :: B.ByteString -> Maybe k keyLength :: k -> BitLength -- ^ keyLength may inspect its argument to return the length ==== Other helper functions exist that build on the class primitives to provide operations such as hash and hash'. The TODO list includes: - Look harder at the other classes including "BlockCipher", "AsymCipher", "StreamCipher" - example instances of each class - example uses of each class - Collecting tests, building a test framework - Move "for" and (.::.) into the Tagged library (?) - Decide what we want on padding - Decide what we want with crypto-related items that aren't directly a cipher or hash (ex: pbkdf2). - Decide on package name (replace "Crypto" or select a new name? Goes with another recent threads' topic) - Implement modes Individual responses: Bas said:
Why not use the Edward Kmett's 'tagged'[1] package for these methods? As in: outputLength :: Tagged d BitLength
Done. I like it.
Adam Wick
Why two libraries instead of n+1? Wouldn't it make sense to just have one library (what you call "Crypto") define the interface as one package, and then have a number of packages that implement that interface as a series of other modules?
It will start as just 1 (crypto) then I'm leaning toward targeting n+2 where n is the number of packages that have the desired interface and testing (currently zero). -Algs can simply re-export from alg specific packages (i.e. is a meta package) when such package exists and is maintained. I feel there is value in a well supported algorithm collection, namely uniform inclusion policy and maintenance; this doesn't stop algorithm specific packages from targeting the Crypto API, that is the whole point of having Crypto and Crypto-Algs separate.
Enumerating principles I support: * Lazy ByteStrings should be used for all input data
Really? Why? I've actually been considering going back to both the SHA and RSA packages and redoing them using strict ByteStrings. Recent experience has suggested that strict ByteStrings are almost always what I want, and building a fast lazy ByteString interface over strict ByteString routines seems like a pretty trivial task.
It was this comment that caused me to realize the class interface should all be strict bytestrings performing component operations (matches crypto definitions better anyway) and have helper functions that use these component functions to provide strict and lazy operations. For example, the Hash class defines initialContext, update, and finalize while helper functions use these to provide hash and hash'. Such design was already the idea behind cipher, just didn't consciously realize it. Cheers, Thomas [1] http://code.haskell.org/~tommd/crypto/ [2] http://code.haskell.org/~tommd/DRBG/ [3] Reasoning behind the currently excluded 'neededAlignment' value The 'needAlignment' value is the byte alignment assumed by the Hash for input data (presumably 1, 2, 4, or 8). The 'hash' helper function (or any users of 'finialize' or 'update') checks the alignment of the input data - if it is not aligned then it's copied into a newly allocated bytestring, allowing the implementation to assume 64 bit alignment (new allocation rule in Haskell 2010). Implementations that use alignment-safe word extraction (ex: Cereal) can just specify 1 while other implementations (ex: for performance reasons pureMD5 used to use an unsafePerformIO ... peekElem ...) can request proper alignment. But this is a hack job, we need to get a high performance way to extract unboxed words from a bytestring that will fall back to a safe method when the alignment isn't correct (Cereal is measurably lower performance than unsafePerformIO with peekElem).

On Wed, May 26, 2010 at 12:11 AM, Thomas DuBuisson
* For Crypto-Algs pure Haskell code may be preferred but should not be used when costing over ~20% performance degradation when compared to other available implementations.
20% is quite a large performance hit, but within the bounds of reason. Something to keep in mind, however: The newer Intel processors have some hardware support for AES. As I should be acquiring one of those in the near future, I intend to rewrite the AES package to take advantage; of course, preferably by exploiting existing working code. I don't believe any software implementation will get within 20% of the resulting performance. In other words, you will need to do the same. Though I wouldn't mind if you use mine.. -- Svein Ove Aas
participants (6)
-
Adam Wick
-
Bas van Dijk
-
Dominic Steinitz
-
Edward Kmett
-
Svein Ove Aas
-
Thomas DuBuisson