
I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms? I'm not sure where I would expect Random to go, perhaps Math or maybe the toplevel, but under System seems, well, random... I notice random-fu puts it under Data, which is also not where I'd look, except that you always look in Data because everything goes into Data... but algorithms dealing with random numbers aren't really data structures either, are they?

I think of it as natural for exactly the reason you stated (the data
comes from the OS). It seems even more natural to me in the entropy
package module 'System.Entropy' as I am accustom to the phrase system
entropy. Equally, I would fine a 'Network.Entropy' module acceptable
under the assumption it connects to one of the public random number
servers for it's data.
Perhaps a top level "Random." should be used, but that too can be
questioned. For example, when I import the module "Random" or perhaps
"Random.Generators" would I get fast prngs? Cryptographic prngs?
Both? Something else (both slow and weak, like what we have now ;-)
)?
Cheers,
Thomas
On Tue, Aug 16, 2011 at 1:04 PM, Evan Laforge
I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms?
I'm not sure where I would expect Random to go, perhaps Math or maybe the toplevel, but under System seems, well, random...
I notice random-fu puts it under Data, which is also not where I'd look, except that you always look in Data because everything goes into Data... but algorithms dealing with random numbers aren't really data structures either, are they?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Aug 16, 2011, at 4:04 PM, Evan Laforge wrote:
I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms?
I'm not sure where I would expect Random to go, perhaps Math or maybe the toplevel, but under System seems, well, random...
I notice random-fu puts it under Data, which is also not where I'd look, except that you always look in Data because everything goes into Data... but algorithms dealing with random numbers aren't really data structures either, are they?
System definitely does seem like an odd choice. In most cases the only interaction any PRNG, even when accessed via the FFI, has with the "system" is - as you say - to get an initial seed value for a global instance. When I wrote random-fu I chose to use Data.Random based on the perspective is that a random variable or process _is_ just a mathematical object, and can be represented by an abstract data structure. I'm sure there's a case to be made against that view too, and if someone were to present a good argument for something better I'd even consider changing it. It seems to me, though, that the line between data and not-data is pretty fuzzy in a functional language (which is one of the many things that makes them great), and for me it seems quite natural to think of random variables as data. At least, in practice it "feels" a lot more like manipulating data than anything else. But then I'm one of those weirdos who thinks of "IO t" as "just a data structure" too. -- James

On Tue, Aug 16, 2011 at 17:07, James Cook
On Aug 16, 2011, at 4:04 PM, Evan Laforge wrote:
I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms?
System definitely does seem like an odd choice. In most cases the only interaction any PRNG, even when accessed via the FFI, has with the "system" is - as you say - to get an initial seed value for a global instance.
I'd be tempted to guess that the whole reason it's under System is the IO component. -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

Yeah, fair enough about getting the seed. I think I like the idea of breaking them into System.Entropy and then Random or Data.Random. It feels odd to stick pure algorithm packages, which simply accept a random seed or stream from elsewhere, under System.Random. There are a fair number of alternate implementations on hackage which suggests people aren't satisfied with the stdlib one but haven't quite settled down on a standard alternative. James Cook, good point about Data. I suppose that's also why it seems to be the catch all for everything in haskell.

I don't like the idea of Data.Random because random numbers use ordinary number types, and the generator itself is not the object of interest, the numbers are. I'd much prefer Math.Random. As the Math prefix isn't used in the core libraries maybe Control.Random is the least unpalatable alternative. Regards, Niklas

Brandon Allbery
I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms?
System definitely does seem like an odd choice. In most cases the only interaction any PRNG, even when accessed via the FFI, has with the "system" is - as you say - to get an initial seed value for a global instance.
I'd be tempted to guess that the whole reason it's under System is the IO component.
That's not really valid, is it? After all the new 'time' package is also stationed under the Data tree, and it has a similarly large IO component. I have to say, it seems very intuitive to me to look for it under Data, even though I'm not sure why. Probably I'm just used to it. Time has a strong connection to the operating system and the hardware, so it could just as well go into the System tree. For (non-cryptographic) randomness however we are dealing with numerical data, for which the connection to the system is mere convenience, so I wouldn't mind finding it under Data at all. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Hi all,
I'm the maintainer of random. If people could decide on what the
alternative name would be we could put it through the library proposal
process. It seems that one problem at this moment is the lack of a single,
clear "right" answer. Replacing one debatable not-quite-right choice with
another may not be satisfying ;-).
Also, what Thomas says is right. The current implementation is SLOW and
WEAK, which would not seem to make a good default implementation. The goal
is to replace it with something better so that the default random package is
strong in at least one dimension. I think this is important because I
imagine many people use the default package, for example because they don't
want to scour hackage and try all the alternatives.
My proposal for this has been to use AES based crypto-prng. I think that is
fast enough (i.e. faster than what's currently there), very strong, and
splittable. New Intel and AMD hardware has hardware support for AES which
makes it even faster. The intel-aes package provides this functionality,
with and without hardware support. But there's work left to do in terms of
testing, making sure its cross platform, etc. Anyone who's interested in
helping (especially with Windows support) would be warmly welcomed!
Cheers,
-Ryan
On Wed, Aug 17, 2011 at 4:46 AM, Ertugrul Soeylemez
Brandon Allbery
wrote: I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms?
System definitely does seem like an odd choice. In most cases the only interaction any PRNG, even when accessed via the FFI, has with the "system" is - as you say - to get an initial seed value for a global instance.
I'd be tempted to guess that the whole reason it's under System is the IO component.
That's not really valid, is it? After all the new 'time' package is also stationed under the Data tree, and it has a similarly large IO component. I have to say, it seems very intuitive to me to look for it under Data, even though I'm not sure why. Probably I'm just used to it. Time has a strong connection to the operating system and the hardware, so it could just as well go into the System tree. For (non-cryptographic) randomness however we are dealing with numerical data, for which the connection to the system is mere convenience, so I wouldn't mind finding it under Data at all.
Greets, Ertugrul
-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Ryan Newton
I'm the maintainer of random. If people could decide on what the alternative name would be we could put it through the library proposal process. It seems that one problem at this moment is the lack of a single, clear "right" answer. Replacing one debatable not-quite-right choice with another may not be satisfying ;-).
Also, what Thomas says is right. The current implementation is SLOW and WEAK, which would not seem to make a good default implementation. The goal is to replace it with something better so that the default random package is strong in at least one dimension. I think this is important because I imagine many people use the default package, for example because they don't want to scour hackage and try all the alternatives.
My proposal for this has been to use AES based crypto-prng. I think that is fast enough (i.e. faster than what's currently there), very strong, and splittable. New Intel and AMD hardware has hardware support for AES which makes it even faster. The intel-aes package provides this functionality, with and without hardware support. But there's work left to do in terms of testing, making sure its cross platform, etc. Anyone who's interested in helping (especially with Windows support) would be warmly welcomed!
Using a cryptographically strong random number generator here is probably a very bad idea. Two reasons: Firstly while being faster than the current implementation an AES-based implementation will still be considerably slower than the Mersenne Twister algorithm. This may or may not be true, if hardware AES support is there, but don't just assume that everybody has AES instructions now. For example I don't have them. Secondly there is no standard requiring that the default random number generator is cryptographically safe. Changing this particular implementation, which is the one most people use, to a CSPRNG will make people take for granted that System.Random is safe to use in security-related products, because it would be very convenient. This will render strong security products trivially weak, when compiled with the wrong Haskell distribution, and you will find packages with statements like: "We assume that you use Ryan Newton's distribution of the random package." I would rather propose the Mersenne Twister as the default random number generator. You could add AES as a secondary generator for people requiring cryptographic strength, but then do it properly, i.e. impure, because most people, when reading about a PRNG with "AES" anywhere in its name, will just assume that it's a CSPRNG. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

The problem with Mersenne twister is that it doesn't split well. The main
reason for crypto prng in this package would not be to advertise to people
that "System.Random can be used for security-related apps" *but to make
splitting reasonably safe*. It's not good enough to have a known-bad
generator under splitting provided as the default. And I think we need
splitting, especially as more Haskell programs become parallel. Would it
address your concerns to not mention the crypto nature of the standard
implementation in the System.Random documentation?
I think there's also a reasonable argument to lean towards *correctness* over
performance in Haskell's defaults. For example, unconstrained Num bindings
default to Integer and likewise random numbers could be as strong as
possible by default, and those looking for raw rands/sec throughput could
make other informed choices.
I had thought that maybe we could bifurcate the "stdgen" concept into a fast
and a strong version, which could be say Mersenne Twister and AES
respectively. But, again, the problem comes if the fast version is expected
to be splittable as well.
With "SplittableGen" factored out from "RandomGen" I suppose it would be
possible for the fast version to NOT offer splitting. However, Mersenne
Twister is best used with an imperative interface, you can see the tension
in the pure version of the mersenne package on hackage:
http://hackage.haskell.org/package/mersenne-random-pure64-0.2.0.3
Please also see Burton Smith's comments below in response to my proposal to
offer a MT + AES combination.
Best,
-Ryan
---------- Forwarded message ----------
From: Burton Smith
Ryan Newton
wrote:
Using a cryptographically strong random number generator here is probably a very bad idea. Two reasons:
Firstly while being faster than the current implementation an AES-based implementation will still be considerably slower than the Mersenne Twister algorithm. This may or may not be true, if hardware AES support is there, but don't just assume that everybody has AES instructions now. For example I don't have them.
Secondly there is no standard requiring that the default random number generator is cryptographically safe. Changing this particular implementation, which is the one most people use, to a CSPRNG will make people take for granted that System.Random is safe to use in security-related products, because it would be very convenient. This will render strong security products trivially weak, when compiled with the wrong Haskell distribution, and you will find packages with statements like: "We assume that you use Ryan Newton's distribution of the random package."
I would rather propose the Mersenne Twister as the default random number generator. You could add AES as a secondary generator for people requiring cryptographic strength, but then do it properly, i.e. impure, because most people, when reading about a PRNG with "AES" anywhere in its name, will just assume that it's a CSPRNG.
Greets, Ertugrul
-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Wed, Aug 17, 2011 at 11:10 AM, Ryan Newton
The problem with Mersenne twister is that it doesn't split well. The main reason for crypto prng in this package would not be to advertise to people that "System.Random can be used for security-related apps" *but to make splitting reasonably safe*.
The more fundamental problem is that splitting is neither well understood nor generally safe, and as such it should not be in the basic Random class. A more sensible API would have a Random class that lacks a split operation, and a SplittableRandom class that permits it, as you mention later in your message. Most current PRNGs could then be instances of Random, but not SplittableRandom. And I think we need splitting, especially as more Haskell programs become
parallel.
I do not agree here, I'm afraid. By the way, my mwc-random package is at least as fast as mersenne-twister, has smaller state, and is pure Haskell.

The more fundamental problem is that splitting is neither well understood nor generally safe, and as such it should not be in the basic Random class.
Would you mind elaborating? Splitting was not well-understood by the original authors of System.Random; that much is in the comments. Nor is it well understood by me. But I am under the impression that it is well understood by Burton Smith and others who have worked on the topic, and that they assure us that using AES, RNG's under any series of splits are as strong as those generated in a linear sequence. (And if you show otherwise, you have a crypto paper and quite a name for yourself.)
And I think we need splitting, especially as more Haskell programs become
parallel.
I do not agree here, I'm afraid.
Could you expound on this also? The people I know in the parallelism community seem to care a lot about deterministic PRNG in parallel programs. For example, the Cilk folks at MIT and Intel who I work with are *modifying their runtime system *just to get deterministic parallel PRNG. For example our in our "Monad Par" package splittable RNG will allow us to add a variant of the monad that provides randomness and transparently routes the state through the "forks" in the parallel computation, retaining the model's determinism. -Ryan

On Wed, Aug 17, 2011 at 12:27 PM, Ryan Newton
The more fundamental problem is that splitting is neither well understood
nor generally safe, and as such it should not be in the basic Random class.
Would you mind elaborating?
Certainly. The purpose of splitting a PRNG is to create two new PRNGs, with the following properties: - Long periods - The streams generated by each PRNG must be independent - Splitting must be fast, while preserving the above two properties It's very easy to write a split function that gets at least one of these considerations wrong.
But I am under the impression that it is well understood by Burton Smith and others who have worked on the topic, and that they assure us that using AES, RNG's under any series of splits are as strong as those generated in a linear sequence.
The trouble is that very few people have worked on this topic - there's almost no published literature on it. And Burton and his colleagues readily admit that the technique they suggest is a matter of folk wisdom in the crypto community. A typical technical application of a PRNG is for Monte Carlo processes, where you want to (a) have a very fast PRNG and (b) understand its properties. Burton's off-the-cuff suggestion is all very well, but we don't know important things like what the performance or period of that PRNG would be. For instance, if we don't know a PRNG's period, we don't know when we're going to start seeing autocorrelations in its output, or if it supports splitting, how many splits are safe to perform before we start seeing *cross *-stream correlation. This in turn means that we don't know whether, when, or why the consumers of our pseudo-random numbers are going to themselves start producing garbage results, and that's troubling to me. Could you expound on this also? The people I know in the parallelism
community seem to care a lot about deterministic PRNG in parallel programs.
Yep, but don't conflate determinism with splitting. In the imperative world, you normally know how many CPUs you have, so you initialize one PRNG per CPU, and simply go from there; there's no need for splitting. In the parallel community, people are going out of their way to *avoid* splitting. The only good treatments of this subject that I know are published by the SPRNG authors at FSU.

Yep, but don't conflate determinism with splitting. In the imperative world, you normally know how many CPUs you have, so you initialize one PRNG per CPU, and simply go from there; there's no need for splitting. In the parallel community, people are going out of their way to *avoid* splitting.
I'm having trouble thinking of scenarios where per-CPU does the trick. Do you mean one per pthread rather than one per CPU? In the Cilk case, you've got to deal with work stealing of course. So you want rand() to generate a result that is determined by the current stack-frame's position in the binary-tree of spawns. In the work I was referring to: http://groups.csail.mit.edu/sct/wiki/index.php?title=Other_Projects#Determin... ... they try a bunch of different methods and I can't remember if any of them split the RNG "eagerly" as they go down the spawn tree or if they just record the tree-index on the way down and then read it out when they generated randoms. (I think the latter.) Cheers, -Ryan

On Wed, Aug 17, 2011 at 8:56 AM, Ryan Newton
I'm the maintainer of random. If people could decide on what the alternative name would be we could put it through the library proposal process. It seems that one problem at this moment is the lack of a single, clear "right" answer. Replacing one debatable not-quite-right choice with another may not be satisfying ;-).
The entire premise of that discussion seems quite ridiculous. All that renaming the modules will achieve is the breakage of several hundred packages. The roots of the module naming hierarchy - Control, Data, System, and so on - are so muddled and inconsistently used that the best advice you could give to people who raise this topic is to pretend those roots are simply not there. My proposal for this has been to use AES based crypto-prng.
We'd be better off if you could seek consensus from PRNG maintainers on a fixed-up Random class before attacking this problem, so that we'd have a better chance of achieving cross-package compatibility.
participants (8)
-
Brandon Allbery
-
Bryan O'Sullivan
-
Ertugrul Soeylemez
-
Evan Laforge
-
James Cook
-
Niklas Larsson
-
Ryan Newton
-
Thomas DuBuisson