Well, my main motivation in the work over the last few days was to fix correctness problems in System.Random (and performance as secondary). I think this matters whether or not it is shipped only for Haskell 98 or not. I've been focused mainly on the Random instances. Does anyone know good points of comparison for these? I've seen lots of PRNGs on Hackage but not lots of code that maps that randomness onto the usual types (Double, Integer, Char, etc).
My hope is to improve random to the point that it provides a good, simple default library. From a pedagogic perspective I imagine many new users want randomness but don't want to scour hackage and learn the more complex interfaces. So as long as 'random' is a clear default I am happy, whether or not it's part of the GHC distro or Haskell Platform.
In my mind a good default implementation has decent-to-good performance, generates uniform distributions (#5278, #5280), and has statistically sound splitting. If we had a those three things maybe there would not have been as much motivation to provide new RNGs!
Re "nextBits":
Thomas, could we flesh out the performance argument for bulk random bit generation? It seems to me that if we do a good enough job with Data.Bits that an interface that produces a word at a time is not that bad in terms of throughput. (On the bit producer side, producing in bulk and then doling out word-at-a-time seemed to work pretty well for me when I was testing it in intel-aes.) The biggest problem I see is that a kind of redundant buffering could occur if both the bit-producer and the bit-consumer want bulk randomness but have to communicate it through individual words.
That said, if bytestring dependency is ok -- why not? It then becomes a backwards-compatible change that may open an extra opportunity to some RNGs.
Re "newGen":
I like Thomas's interface and I've been thinking about the "newGen" issue as well:
newGen :: ByteString -> g
Here ByteString does seem most appropriate, anything else -- Int or [Int] -- seems unsatisfactory.
Just for arguments sake I'd like to reexamine the "SplittableGen" topic in light of "newGen":
It seems that with newGen we could provide a default implementation of split and keep it in RandomGen. Yes, it would be poor. But it would mean that things like Mersenne don't have to throw an error. Further, if stdGen were to, say, provide cryptographically strong RNG with sound splitting -- would the quality of the default split be as much of a concern? Someone who chooses anything other than stdGen in that case would have to have a good reason for doing so, and I think can be saddled with the consequences.
Adding newGen instead of subtracting split would not be Haskell-98 backwards compatible. However, it would create a different kind of breakage than subtracting split. Subtracting split breaks the client code, whereas adding newGen just breaks the instances of RandomGen (hopefully few).
Don't we want to be pushing splittable tree RNG more in the future? GHC for parallelism, right!? Let me give an example of the feature's value -- the Cilk folks are making significant changes to their runtime just to get deterministic parallel RNG:
My feeling is that if we could get the best of what's out there into one library with a simple interface and made it default that it would have a positive impact and avoid unnecessary balkanization and do-it-yourself in the random number department. In order to accomplish this maybe we would need two stdGens. One maximally fast and one maximally sound -- perhaps Mersenne Twister and crypto-based splittable RNG?
By the way, perhaps that "genBits" method should probably be rename "genRangeBits"?
-Ryan
On Tue, Jun 28, 2011 at 6:01 PM, Thomas DuBuisson
<thomas.dubuisson@gmail.com> wrote:
Ryan Newton <
rrnewton@gmail.com> wrote:
> There implementation has two major pieces:
> (1) Sources of random bits (class RandomGen)
> (2) Instances of class Random that map use random bits to create Haskell
> types
I'd say there are three pieces, initial entropy (clock, external seed,
system crypto random generator), deterministic generator interface
(PRNG, the RandomGen class), and the instances of that class.
I tried to answer the first item in System.Crypto.Random. The 'random' package
never really had an answer for entropy and I'm not sure what the
community thinks about that. Perhaps answering this problem in slightly
obscure packages is OK.
> The issue is
> that the legacy RandomGen API isn't very good at generating random bits.
Agreed. This is why CryptoRandomGen uses ByteString (so we can
generate any number of BYTES of random values).
As I tried to argue when I make RandomGen more polymorphic, this is an
issue with hardcoding the type as Int. Unfortunately, people didn't
accept that alteration and I feel continuing to generating Int while
allowing devs to discover the amount of entropy isn't taking things
far enough.
> I'd also be happy to hear other proposals for API changes and additions.
As I say below, I don't understand why we can't use ByteString. There
are (or should be) rather fast decodings of all the popular primitive
types from bytestrings, so this takes care of our polymorphism issue.
I accept that, unlike the CryptoRandomGen, we don't want an explicit
failure for RandomGen. But how about a common interface for
instantiating generators ('newGen')? Is there a reason not to have that?
--- CODE ---
class RandomGen g where
next :: g -> (Int, g)
nextBits :: g -> BitLength -> (ByteString, g)
genBits :: g -> Int
newGen :: ByteString -> g
--- END CODE ---
Also, perhaps we could still have an explicit reseed?
--- CODE ---
class Reseedable g where
reseed :: g -> Bytestring -> g
--- END CODE ---
I'll stop here before I entirely recreate CryptoRandomGen, but without
the explicit errors and in more classes (the next step would be a method
for querying how much entropy is needed for instantiation).
> System.Random can't depend on bytestring, so I doubt we want block RNG in
> the RandomGen class.
Why can't it? System.Ranomd isn't part of H2010 and H98 already needs
its own random module.
I'll try to make time to review the code, you'll hear if I do.
Cheers,
Thomas