Thomas, could we flesh out the performance argument for bulk random bit generation? It seems to me that if we do a good enough job with Data.Bits that an interface that produces a word at a time is not that bad in terms of throughput. (On the bit producer side, producing in bulk and then doling out word-at-a-time seemed to work pretty well for me when I was testing it in intel-aes.) The biggest problem I see is that a kind of redundant buffering could occur if both the bit-producer and the bit-consumer want bulk randomness but have to communicate it through individual words.

It seems that with newGen we could provide a default implementation of split and keep it in RandomGen. Yes, it would be poor. But it would mean that things like Mersenne don't have to throw an error. Further, if stdGen were to, say, provide cryptographically strong RNG with sound splitting -- would the quality of the default split be as much of a concern? Someone who chooses anything other than stdGen in that case would have to have a good reason for doing so, and I think can be saddled with the consequences.

On Tue, Jun 28, 2011 at 6:01 PM, Thomas DuBuisson <thomas.dubuisson@gmail.com> wrote:

Ryan Newton <rrnewton@gmail.com> wrote:
> There implementation has two major pieces:
> (1) Sources of random bits (class RandomGen)
> (2) Instances of class Random that map use random bits to create Haskell
> types

I'd say there are three pieces, initial entropy (clock, external seed,
system crypto random generator), deterministic generator interface
(PRNG, the RandomGen class), and the instances of that class.

I tried to answer the first item in System.Crypto.Random. The 'random' package
never really had an answer for entropy and I'm not sure what the
community thinks about that. Perhaps answering this problem in slightly
obscure packages is OK.

> The issue is
> that the legacy RandomGen API isn't very good at generating random bits.

Agreed. This is why CryptoRandomGen uses ByteString (so we can
generate any number of BYTES of random values).

As I tried to argue when I make RandomGen more polymorphic, this is an
issue with hardcoding the type as Int. Unfortunately, people didn't
accept that alteration and I feel continuing to generating Int while
allowing devs to discover the amount of entropy isn't taking things
far enough.

> I'd also be happy to hear other proposals for API changes and additions.

As I say below, I don't understand why we can't use ByteString. There
are (or should be) rather fast decodings of all the popular primitive
types from bytestrings, so this takes care of our polymorphism issue.

I accept that, unlike the CryptoRandomGen, we don't want an explicit
failure for RandomGen. But how about a common interface for
instantiating generators ('newGen')? Is there a reason not to have that?

--- CODE ---
class RandomGen g where
next :: g -> (Int, g)
nextBits :: g -> BitLength -> (ByteString, g)
genBits :: g -> Int
newGen :: ByteString -> g
--- END CODE ---

Also, perhaps we could still have an explicit reseed?

--- CODE ---
class Reseedable g where
reseed :: g -> Bytestring -> g
--- END CODE ---

I'll stop here before I entirely recreate CryptoRandomGen, but without
the explicit errors and in more classes (the next step would be a method
for querying how much entropy is needed for instantiation).

> System.Random can't depend on bytestring, so I doubt we want block RNG in
> the RandomGen class.

Why can't it? System.Ranomd isn't part of H2010 and H98 already needs
its own random module.

I'll try to make time to review the code, you'll hear if I do.

Cheers,
Thomas