DRBG pre-announce and a discussion on RNG/Crypto infrastructure

25 May 2010

      The DRBG package [1] is a partial implementation of NIST SP 800-90
section 10, specifically the hash and HMAC based cryptographically
secure random number generators are implemented.  Before doing further
work I feel the state of affairs of cryptography on hackage needs to
change.

As has became too common, DRBG implements (or should implement)
several features that could be done once for all time in a quality
Crypto package.  Features such as type classes for hashes and ciphers,
an hmac definition, and a MonadRandom or something similar for random
number generators that can fail.  What existing code implements one or
more of these features that should be able to use a central Crypto
package?  Happstack, SHA2, DRBG, SHA, hS3 (actually uses crypto
despite [Word8] where ByteString should be used), hoauth, and a few
partial-bindings to OpenSSL satisfy several needs.

Future Crypto Library Principles
I propose two "new" libraries be built and receive some higher level
of community attention.  One is "Crypto" which will define type
classes and a limited set of common generic algorithms (ex: HMAC,
cipher modes of operation).  The other is "crypto-algs" which
implements (and instantiates type classes for) as many common
algorithms as we can reasonably group together with similar
interfaces.  I feel the breakdown is important - if the
community-accepted interface is separated from any algorithms then
maintainers of alternate implementations are more likely to accept the
interface.

Enumerating principles I support:
* Lazy ByteStrings should be used for all input data
* While "over-typeclassification" should be avoided, two or three
basic classes of "Digest", "Cipher", and "CryptoRandom" make good
sense.
* CryptoRandom, and related infrastructure would closely follow Random
and MonadRandom except for allowing clean failure.
* Digest result types should be unique to the digest (a la pureMD5,
contrary to SHAs use of "Digest" for 4 different hashs and contrary to
SHA2s use of ByteString for all digests)
* For Crypto-Algs pure Haskell code may be preferred but should not be
used when costing over ~20% performance degradation when compared to
other available implementations.
* No dependence on external libraries (e.g. no OpenSSL bindings)
* Keys should receive their own data type declarations (no
cryptographic information floating around in a naked ByteString)
* Data types should have Binary and Serialize instances.
* All code should be licensed BSD3 or equivalent

Open Questions on Class Design
What should be made instances of our classes.  Should an empty data
declaration have an instance, and the class have a way to "extract"
the actual digest [2]?  Perhaps it should be the data type containing
the digest that is the instance, so the hash algorithm can be inferred
based on the result type.

Required methods:  I'd propose we follow [3] (see [2] for instances)
except removing the empty data declarations, using the digest/key
types for the instance.  In brief, this includes encrypt/decrypt (or
'hash), strength, outputLength, blockLength,

What algorithms are generic enough to include in Crypto?  I'd say
anything that works for all instances of {Cipher, Digest}.

Minimum quality for inclusion: A suite of KATs and some straight
forward QuickCheck tests.

Anticipating Concerns
1) (Re-) Implementing Crypto is hard / dangerous / a waste of time
Most the benefit I aim for is in the common interface, not the
algorithms.  That said, the -Algs package can largely be a gathering
of best-of-hackage with sanitized interface and continually-updated as
higher-performance implementations come available.  This might have
been how the current "Crypto" package started, I don't know, but it is
no longer in a sane or useful state.

2) OpenSSL is fine
For some people, yes - absolutely.  Not just fine but insane to
replace, seeing as it's (partly) FIPS certified.  For other people,
such as basically all the packages I listed above, it's obviously
lacking something be it portability, footprint, or interface.

3) The Monad class /  MonadRandom can fail - it's called "error" and "fail".
This is a matter of taste.  I personally wish Haskell didn't have
exceptions or had an explicitly exceptionless type.  At any rate, the
random work would probably come last, especially if I'm doing the
implementation.

I intend to tackle this once I get some personal time freed up (around
mid June).  If you're interested then please feel free to lay claim to
some part of this problem or comment on the proposal.

Cheers,
Thomas

[1] http://code.haskell.org/~tommd/DRBG/
Remaining tasks include optimization, test suite (NIST provides a
large body of KATs), self test procedures, and the higher level
interface (section 9).
[2] http://code.haskell.org/~tommd/DRBG/Test/KAT.hs
[3] http://code.haskell.org/~tommd/DRBG/Data/Crypto/Classes.hs

DRBG pre-announce and a discussion on RNG/Crypto infrastructure

Thomas DuBuisson