Mersenne-random and standard random API

newer
Error in installing dph-examples...

Yves Parès

9 Feb 2012 9 Feb '12

3:26 a.m.

Hi, I've been in the past told that mersenne-random was much better than the standard random package. However, System.Random.Mersenne doesn't follow the general API described in System.Random, MTGen is not an instance of RandomGen. But a sample on System.Random.Mersenne.getStdRandom documentation ( http://hackage.haskell.org/packages/archive/mersenne-random/1.0.0.1/doc/html...) indicates this: rollDice :: IO Int rollDice = getMTRandom (randomR (1,6)) It looks like wrong documentation (checking of older versions show that getMTRandom have never lived or been exposed by this module), but suggests MTRandom should be an instance of RandomGen (if I assume well that randomR is that of System.Random). So is it possible to use the fast and efficient mersenne generator with the convenient and general random API?

Attachments:

attachment.html (text/html — 1013 bytes)

Show replies by date

Ertugrul Söylemez

9 Feb 9 Feb

1:58 p.m.

Yves Parès wrote:

...

I've been in the past told that mersenne-random was much better than the standard random package.

This is relative. The Mersenne Twister algorithm has a large memory footprint, but in relation to the size of the average Haskell program, this is probably not much of an issue.

...

So is it possible to use the fast and efficient mersenne generator with the convenient and general random API?

No, because mersenne-random provides an impure generator with a global state, as is also pointed out by the Haddock documentation. As an alternative consider the mersenne-random-pure64 and the cprng-aes packages. Both provide the interface you want. Personally I prefer the cprng-aes package, because it is reasonably fast and convenient for non-cryptographic applications, but also provides a strong Crypto-API interface, if you need cryptographically strong random numbers. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Aleksey Khudyakov

4:29 p.m.

On 09.02.2012 01:56, Yves Parès wrote:

...

Hi, I've been in the past told that mersenne-random was much better than the standard random package.

...

So is it possible to use the fast and efficient mersenne generator with the convenient and general random API?

I think design of Random type class basically precludes efficient generators with large periods and consequently large state. Look at next function:

...

next :: g -> (Int, g)

It means that state has to be copied but for efficiency we want to mutate it in place. I consider Random type class a failure and ignore it. P.S. For monte-carlo and thing like that I'd recommend mwc-random it more featureful than mersenne-random and don't rely on global state

Jerzy Karczmarczuk

5:02 p.m.

Aleksey Khudyakov :

...

I think design of Random type class basically precludes efficient generators with large periods and consequently large state. Look at next function:

...
next :: g -> (Int, g)

It means that state has to be copied but for efficiency we want to mutate it in place. I consider Random type class a failure and ignore it.

P.S. For monte-carlo and thing like that I'd recommend mwc-random it more featureful than mersenne-random and don't rely on global state I am afraid that Aleksey Khudakov confuses a few things.

1. Mersenne Twister, AND congruential generators AND the Marsaglia stuff, all use some kind of "seed", all are stateful. There are no miracles. Just look the agressive monadization, the form of defaultSeed, etc. within MWC.hs, before saying that this generator doesn't rely on some global state. 2. In the standard generator stuff the state is stored in IoRefs and is not "copied". Did Aleksey ever look inside the sources of these generators? In any case, the seed changes after each generation, and must be stored somewhere. 3. The API question is a conventional one. People who are really unhappy, may make their own interfaces, or give such a mini-project as a students' assignment, instead of weeping that the library is lousy and should be ignored. E. g., I wanted random numers in some purely functional, lazy context, and I didn't want the existing interface ; I manufactured a lazy stream interface, and that was all. "Look Ma!: no global state..." 4. L'Ecuyer signalled some 15 years ago that MWC generators introduce some bias on the most significant bits (complementary MWC are safer). This is less annoying that the last bits periodicity of simple congruential generators, but for SOME Monte-Carlo it may be harmful. ======= In general, I believe that saying publicly that some part of the available library is a failure, should be avoided, unless accompanied by some SERIOUS analysis, and - if possible - some constructive suggestions. With my thanks to all people who made those generators however imperfect they are. Only Mister Nobody is perfect. Jerzy Karczmarczuk

Aleksey Khudyakov

5:39 p.m.

On 09.02.2012 15:32, Jerzy Karczmarczuk wrote:

...

Aleksey Khudyakov : 1. Mersenne Twister, AND congruential generators AND the Marsaglia stuff, all use some kind of "seed", all are stateful. There are no miracles. Just look the agressive monadization, the form of defaultSeed, etc. within MWC.hs, before saying that this generator doesn't rely on some global state.

I think you are missing the point here. Surely all PRNG carry some state around. But both StdGen and mwc-random (and likely many others) allow to have many generators at once (for use in different threads) mersenne-random is just wrapper around vastly impure library (as documentation says) and allows only one genrator per program. This is why I said it uses *global* state

...

2. In the standard generator stuff the state is stored in IoRefs and is not "copied". Did Aleksey ever look inside the sources of these generators? In any case, the seed changes after each generation, and must be stored somewhere.

No. It doesn't and cannot

...

data StdGen = StdGen Int32 Int32

If generator state is stored in IORef it's not possible to implement `next :: g → (Int,g)'. To do something useful with it one have to go to IO monad but we can't. So state have to be copied.

...

3. The API question is a conventional one. People who are really unhappy, may make their own interfaces, or give such a mini-project as a students' assignment, instead of weeping that the library is lousy and should be ignored. E. g., I wanted random numers in some purely functional, lazy context, and I didn't want the existing interface ; I manufactured a lazy stream interface, and that was all. "Look Ma!: no global state..."

Usually. But sometimes it's not possible to implement some algorithms for particular API. For example if PRNG rely on in-place array updates it cannot implement Random type class.

...

4. L'Ecuyer signalled some 15 years ago that MWC generators introduce some bias on the most significant bits (complementary MWC are safer). This is less annoying that the last bits periodicity of simple congruential generators, but for SOME Monte-Carlo it may be harmful.

Thank you for reminder. I wanted to read their paper for some time.

Jerzy Karczmarczuk

6:58 p.m.

Aleksey Khudyakov:

...

On 09.02.2012 15:32, Jerzy Karczmarczuk wrote:

...
1. Mersenne Twister, AND congruential generators AND the Marsaglia stuff, all use some kind of "seed", all are stateful. There are no miracles. Just look the agressive monadization, the form of defaultSeed, etc. within MWC.hs, before saying that this generator doesn't rely on some global state.

I think you are missing the point here. Surely all PRNG carry some state around. But both StdGen and mwc-random (and likely many others) allow to have many generators at once (for use in different threads) This is irrelevant. I believe that there is a misunderstanding in terminology. When I say "global state" it means not a local variable in some function. You seem to say "one object per programme". This is confirmed by:

...

mersenne-random is just wrapper around vastly impure library (as documentation says) and allows only one genrator per program. This is why I said it uses *global* state

...

...
In any case, the seed changes after each generation, and must be stored somewhere.

No. It doesn't and cannot

...
data StdGen = StdGen Int32 Int32

If generator state is stored in IORef it's not possible to implement `next :: g → (Int,g)'. To do something useful with it one have to go to IO monad but we can't. So state have to be copied.

We can't WHAT? Look, all data that change or are created MUST be stored somewhere, don't say dubious things. Your next function is a "threading generator", which makes another StdGen, OK, but this is not a "copy". This is a creation of a new seed. When I spoke about IORefs, I thought about the global generator, which USES the l'Ecuyer stuff, newStdGen and its friends. The threading becomes implicit. Try, say r=newStdGen r >>= return . next and you will see, it works, and you don't keep explicitly your seed. From the efficiency point of view all this is comparable. With IOrefs you do not "pollute" the memory which has to be garbage-collected, but its administration is heavier than the standard heap operations. StdGen with l'Ecuyer two-number seed, or some 600 of the Mersenne, I don't see a conceptual difference. The Marsaglia generator has a global seed quite voluminous as well.

...

...sometimes it's not possible to implement some algorithms for particular API. For example if PRNG rely on in-place array updates it cannot implement Random type class.

No idea what do you mean. In the Random library you will find the generators using IORefs, AND the class Random, with the member random (or randomR, etc.) and you may write getStdRandom random getStdRandom random ... as you wish, getting different results. What's wrong with that? Jerzy Karczmarczuk

Aleksey Khudyakov

10:39 p.m.

On 09.02.2012 17:28, Jerzy Karczmarczuk wrote:

...

Aleksey Khudyakov:

...
On 09.02.2012 15:32, Jerzy Karczmarczuk wrote:

...
1. Mersenne Twister, AND congruential generators AND the Marsaglia stuff, all use some kind of "seed", all are stateful. There are no miracles. Just look the agressive monadization, the form of defaultSeed, etc. within MWC.hs, before saying that this generator doesn't rely on some global state.

I think you are missing the point here. Surely all PRNG carry some state around. But both StdGen and mwc-random (and likely many others) allow to have many generators at once (for use in different threads) This is irrelevant. I believe that there is a misunderstanding in terminology. When I say "global state" it means not a local variable in some function. You seem to say "one object per programme". This is confirmed by:

...
mersenne-random is just wrapper around vastly impure library (as documentation says) and allows only one genrator per program. This is why I said it uses *global* state

...
...
In any case, the seed changes after each generation, and must be stored somewhere.

No. It doesn't and cannot

...
data StdGen = StdGen Int32 Int32

If generator state is stored in IORef it's not possible to implement `next :: g → (Int,g)'. To do something useful with it one have to go to IO monad but we can't. So state have to be copied.

We can't WHAT? Look, all data that change or are created MUST be stored somewhere, don't say dubious things. Your next function is a "threading generator", which makes another StdGen, OK, but this is not a "copy". This is a creation of a new seed. When I spoke about IORefs, I thought about the global generator, which USES the l'Ecuyer stuff, newStdGen and its friends.

The threading becomes implicit. Try, say r=newStdGen r >>= return . next

and you will see, it works, and you don't keep explicitly your seed. From the efficiency point of view all this is comparable. With IOrefs you do not "pollute" the memory which has to be garbage-collected, but its administration is heavier than the standard heap operations. StdGen with l'Ecuyer two-number seed, or some 600 of the Mersenne, I don't see a conceptual difference. The Marsaglia generator has a global seed quite voluminous as well.

I didn't quite understand you In order to implement RandomGen one have to implement `next' function

...

next :: g → (Int,g)

Lets assume that g stores internal generator state (In case of NWC256 it's 258 Word32s). We cannot modify it in place. Someone may hold this g and changing it behind the scenes will violate referential transparence. We have to create new array and this is expensive. There still way out as Duncan Coutts pointed out. We may generate stream of random numbers in lazy ST monad and use them as random generator.

...

No idea what do you mean. In the Random library you will find the generators using IORefs, AND the class Random, with the member random (or randomR, etc.) and you may write getStdRandom random getStdRandom random ... as you wish, getting different results. What's wrong with that?

Nothing. I was talking about problems with `next' function. One always can use IORefs to create global generator but that's irrelevant

Yves Parès

10 Feb 10 Feb

8:08 p.m.

I just thought about something: basically all these APIs provides a "IO [a]" (where a is a randomly generable type) function. Is there a problem with the approach that is to rely on lazy evaluation to pass to pure code (either explicitely or through State) the infinite list generated in IO and consume its head each time we need a random value? Because there is no issue such as resource holding, like in the case of file reading and enumerators, which would make lazy IO a not so good approach... 2012/2/9 Aleksey Khudyakov

...

On 09.02.2012 17:28, Jerzy Karczmarczuk wrote:

...
Aleksey Khudyakov:

...
On 09.02.2012 15:32, Jerzy Karczmarczuk wrote:

...
1. Mersenne Twister, AND congruential generators AND the Marsaglia stuff, all use some kind of "seed", all are stateful. There are no miracles. Just look the agressive monadization, the form of defaultSeed, etc. within MWC.hs, before saying that this generator doesn't rely on some global state.

I think you are missing the point here. Surely all PRNG carry some state around. But both StdGen and mwc-random (and likely many others) allow to have many generators at once (for use in different threads)

This is irrelevant. I believe that there is a misunderstanding in terminology. When I say "global state" it means not a local variable in some function. You seem to say "one object per programme". This is confirmed by:

mersenne-random is just wrapper around vastly impure library (as

...
documentation says) and allows only one genrator per program. This is why I said it uses *global* state

In any case, the seed changes after each generation, and

...
...
must be stored somewhere.

No. It doesn't and cannot

...
data StdGen = StdGen Int32 Int32

If generator state is stored in IORef it's not possible to implement `next :: g → (Int,g)'. To do something useful with it one have to go to IO monad but we can't. So state have to be copied.

We can't WHAT? Look, all data that change or are created MUST be stored somewhere, don't say dubious things. Your next function is a "threading generator", which makes another StdGen, OK, but this is not a "copy". This is a creation of a new seed. When I spoke about IORefs, I thought about the global generator, which USES the l'Ecuyer stuff, newStdGen and its friends.

The threading becomes implicit. Try, say r=newStdGen r >>= return . next

and you will see, it works, and you don't keep explicitly your seed. From the efficiency point of view all this is comparable. With IOrefs you do not "pollute" the memory which has to be garbage-collected, but its administration is heavier than the standard heap operations. StdGen with l'Ecuyer two-number seed, or some 600 of the Mersenne, I don't see a conceptual difference. The Marsaglia generator has a global seed quite voluminous as well.

I didn't quite understand you

In order to implement RandomGen one have to implement `next' function

...
next :: g → (Int,g)

Lets assume that g stores internal generator state (In case of NWC256 it's 258 Word32s). We cannot modify it in place. Someone may hold this g and changing it behind the scenes will violate referential transparence. We have to create new array and this is expensive.

There still way out as Duncan Coutts pointed out. We may generate stream of random numbers in lazy ST monad and use them as random generator.

No idea what do you mean. In the Random library you will find the

...
generators using IORefs, AND the class Random, with the member random (or randomR, etc.) and you may write getStdRandom random getStdRandom random ... as you wish, getting different results. What's wrong with that?

Nothing. I was talking about problems with `next' function. One always can use IORefs to create global generator but that's irrelevant

______________________________**_________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafe http://www.haskell.org/mailman/listinfo/haskell-cafe

Aleksey Khudyakov

10:35 p.m.

On 10.02.2012 18:38, Yves Parès wrote:

...

I just thought about something: basically all these APIs provides a "IO [a]" (where a is a randomly generable type) function. Is there a problem with the approach that is to rely on lazy evaluation to pass to pure code (either explicitely or through State) the infinite list generated in IO and consume its head each time we need a random value? Because there is no issue such as resource holding, like in the case of file reading and enumerators, which would make lazy IO a not so good approach...

It's more like "Seed → [a]". IO is not really needed here. I can see following problems. None are real showstoppers (except maybe 2) 1. One may want to generate values of different types and/or distibutions. This is easily solved by state monad. But I can't see advantage over wrapped ST. 2. No way to make snapshot of generator state. 3. Laziness overhead. It may be significant if you try to sqeeze last bit of performance.

Yves Parès

11:15 p.m.

...

It's more like "Seed → [a]". IO is not really needed here. You're absolutely right. I stand corrected.

...

1. One may want to generate values of different types and/or distibutions. Yes, I saw this limitation before. But you can just generate one infinite list (and then one seed) per distribution, right? Or could you generate something homomorphic to [(Random a) => a]? (Each element would then basically be a closure containing the current Seed)

...

2. No way to make snapshot of generator state. You mean the current seed? You don't need to access that if you only consume PRNs from the infinite list(s), and that during the whole program.

...

3. Laziness overhead. It may be significant if you try to squeeze last bit of performance. I'm afraid that if you're frightened by the laziness overhead then you won't use Haskell. I'm kidding, but it's just I don't picture the overhead being so huge in that case. What so big would the runtime need to keep in memory except for the closure that generates the infinite list?

2012/2/10 Aleksey Khudyakov

...

On 10.02.2012 18:38, Yves Parès wrote:

...
I just thought about something: basically all these APIs provides a "IO [a]" (where a is a randomly generable type) function. Is there a problem with the approach that is to rely on lazy evaluation to pass to pure code (either explicitely or through State) the infinite list generated in IO and consume its head each time we need a random value? Because there is no issue such as resource holding, like in the case of file reading and enumerators, which would make lazy IO a not so good approach...

It's more like "Seed → [a]". IO is not really needed here. I can see following problems. None are real showstoppers (except maybe 2)

1. One may want to generate values of different types and/or distibutions. This is easily solved by state monad. But I can't see advantage over wrapped ST.

2. No way to make snapshot of generator state.

3. Laziness overhead. It may be significant if you try to sqeeze last bit of performance.

Duncan Coutts

9 Feb 9 Feb

7:57 p.m.

On 9 February 2012 10:59, Aleksey Khudyakov wrote:

...

...
So is it possible to use the fast and efficient mersenne generator with the convenient and general random API?

I think design of Random type class basically precludes efficient generators with large periods and consequently large state. Look at next function:

...
next :: g -> (Int, g)

It means that state has to be copied but for efficiency we want to mutate it in place. I consider Random type class a failure and ignore it.

Actually it is not true that the state has to be copied. Using the lazy ST monad we can implement this interface and internally use mutable ST arrays. See for example http://web.archive.org/web/20090108050217/http://www.augustsson.net/Darcs/MT... It ends up with this function that generates the infinite lazy sequence of words from a seed word. This can be then made to fit the next :: g -> (Int, g) interface, with g = [W]. mersenneTwister :: W -> [W] mersenneTwister s = L.runST (mersenneTwisterST s) For reference: import Control.Monad.ST.Lazy as L type W = Word32 mersenneTwisterST :: W -> L.ST s [W] So yes this looses a small constant factor because of the extra lazy list (which could be reduced somewhat), so it's not suitable for the absolutely max performance interface, but it certainly allows the use of traditional mutable PRNGs with the nice interface with decent performance. Certainly none of these PRNGs support split, but that's a separate issue. Overall, I think the Random type class is salvageable given a few changes. In the end, the Random module probably needs both an ST-monadic interface and a pure one. People can use the ST one if it happens to be more convenient or if they need to squeeze the last drop of performance out. Duncan

Aleksey Khudyakov

10:41 p.m.

On 09.02.2012 18:27, Duncan Coutts wrote:

...

Actually it is not true that the state has to be copied. Using the lazy ST monad we can implement this interface and internally use mutable ST arrays.

See for example http://web.archive.org/web/20090108050217/http://www.augustsson.net/Darcs/MT...

It ends up with this function that generates the infinite lazy sequence of words from a seed word. This can be then made to fit the next :: g -> (Int, g) interface, with g = [W].

This is very elegant trick! Ability to snapshot and restore generator is lost.

...

So yes this looses a small constant factor because of the extra lazy list (which could be reduced somewhat), so it's not suitable for the absolutely max performance interface, but it certainly allows the use of traditional mutable PRNGs with the nice interface with decent performance.

Certainly none of these PRNGs support split, but that's a separate issue. Overall, I think the Random type class is salvageable given a few changes. In the end, the Random module probably needs both an ST-monadic interface and a pure one. People can use the ST one if it happens to be more convenient or if they need to squeeze the last drop of performance out.

There is another problem. Functions like `next' aren't meant to be used by humans. One have to thread state manually and this is tedious and error-prone. So PRNG should be wrapped into monad. Are `next'-like function needed at all? I'd expect that creation of generic and useful type class would be very nontrivial.

5082

Age (days ago)

5084

Last active (days ago)

List overview

Download

11 comments

5 participants

participants (5)

Aleksey Khudyakov
Duncan Coutts
Ertugrul Söylemez
Jerzy Karczmarczuk
Yves Parès