Splittable random numbers

Hi Cafe A while back there was a threadhttp://www.mail-archive.com/haskell-cafe@haskell.org/msg79633.html about a good implementation of a (pseudo) random number generator with a good "split" operation. There's lots of material on generators that generate a linear sequence of random numbers, but much less on how to generate a tree of random numbers, which is what Haskell's System.Random API requires. I happened to meet Burton Smith recently, who wrote some early papers about this stuff (eg "Pseudo random trees in Monte-Carlohttp://portal.acm.org/citation.cfm?id=1746034"), so I asked him. His reply is below, along with some follow-up comments from his colleagues Tolga Acar and Gideon Yuval. The generator uses crypto functions, so it's probably more computationally expensive than common linear-sequence generators, but in exchange you get robust splitting. Does anyone feel like taking the idea and turning it into a Haskell library? (Or even a Haskell Wiki page?) I'm taking the liberty of cross-posting to the libraries list. Simon From: Burton Smith Sent: Tuesday, November 02, 2010 3:58 PM To: Simon Peyton-Jones Cc: Gideon Yuval (Gideon Yuval); Tolga Acar Subject: Random number generation With some help from Gideon and Tolga, I think the solution to the "arbitrary tree of random numbers problem" is as follows: The generator G is a pair comprising a crypto key G.k and an integer counter (the "message") G.c. The (next G) operation returns a pair: 1. a random integer r obtained by encrypting G.c with G.k, and 2. a new generator G' such that G'.k = G.k and G'.c = G.c + 1. The (split G) operation is similar, returning the same G', except that instead of returning a random integer r it returns a third generator G'' such that G''.k = r and G''.c = 0. A suitable block cipher system might be 128-bit AES (Rijndael). Unencumbered implementations exist in a variety of languages, and performance is pretty good and will improve dramatically as hardware support improves. I'd pick both crypto key size and the size of the result r to be 128 bits, and employ a 64 bit counter c. Other crypto options exist. From: Simon Peyton-Jones Sent: Wednesday, November 03, 2010 3:11 AM To: Burton Smith; Gideon Yuval (Gideon Yuval) Cc: Tolga Acar; Simon Peyton-Jones Subject: RE: Random number generation Burton, Gideon, Tolga Aha, that's interesting. I'd never seen a random number generator based on crypto, but it seems like an attractive idea. As I understand it, successive calls to 'next' will give you encrypt(0), encrypt(1), encrypt(2), encrypt(3),.... Is this standard? Does it have provably good randomness properties, (cycle length, what else?) like other RNGs? Or does it simply seem very plausible? Can I send it round to the Haskell mailing list, in the hope that someone will turn the idea into a library? (Ideally I'd like to make claims about the randomness properties in doing so, hence my qns above.) From: Gideon Yuval (Gideon Yuval) Sent: Wednesday, November 03, 2010 7:15 AM To: Simon Peyton-Jones; Burton Smith Cc: Tolga Acar Subject: RE: Random number generation As long as the key, and the non-counting part of the counter, are kept" secret", anyone who can distinguish these pseudorandoms from real random, in less than 2^128 steps, has a nice paper for crypto-2011 (this is known as "provable security") concerning a weakness in AES128. One exception: real randoms have a birthday paradox; the pseudorandoms suggested do not. If you care, you can: (1) Limit the counter to 2^32 steps (paradox has 2^-64 probability) or even 2^16 (2^-96), then rekey; or (2) XOR 2 such encrypted counters, with different keys; or (3) XOR 3 successive values for the same counter (just possibly cheaper; top-of-head idea). More hard-core: swap the position of key & message: encrypting a constant "secret" with 1,2,3,4.... Gives pseudorandoms with no birthday paradox. From: Tolga Acar Sent: 03 November 2010 15:50 To: Gideon Yuval (Gideon Yuval); Simon Peyton-Jones; Burton Smith Subject: RE: Random number generation Simon, The general idea is not really that new in the crypto area with constraints Gideon describes, of course. That is typically called a PRNG - Pseudo Random Number Generator, or in another parlance, Deterministic Random Bit Generators (DRBG). The DRBG constructions based on hash functions and block ciphers are even standardized in NIST publication SP800-90 (even though I may not recommend every one of them). As for the construction below, that is based on the AES block cipher, that essentially takes advantage of the PRP (Pseudo Random Permutation) property of the AES block cipher, as each block cipher ought to be. So, as Gideon outlines below, if you fix the key, the PRP gives you a random-looking (or, in other terms, indistinguishable from random) output that no one without the secret key and the "state" can generate or easily predict. Assuming an ideal cipher (and AES is a close approximation to it), the probability is believed to be the birthday paradox - no counterexample or a proof exists without assumptions; so we stick to the birthday bound. There ought to be papers on this somewhere. If not, that condensed information is spread across many papers and is part of the crypto folklore, I'd say. From: Burton Smith Sent: 03 November 2010 19:03 To: Simon Peyton-Jones Cc: Gideon Yuval (Gideon Yuval); Tolga Acar Subject: RE: Random number generation Just two points of further clarification: 1. All PRNGs used in the technical computing space fail the birthday paradox criterion (i.e. have full period), so we really need not worry about this. Also, there are other mitigating factors, e.g. suppose we are using the PRNG to generate pseudorandom residues mod n << 2^128; the paradox is happily present there. 2. The big innovation in this scheme is that the rekeying operation done by split creates a new generator with independence guaranteed by "provable security" in the sense Gideon mentioned - if someone can find something nonrandom in the correlation between G' and G'', say, then it amounts to a weakness in AES128 and is publishable. So it's yet another example of reducibility, common in our field: "if you can easily transform a known/famous hard problem P into this other problem Q, Q must be hard".

On Thu, Nov 04, 2010 at 05:38:12PM +0000, Simon Peyton-Jones wrote:
The generator uses crypto functions,
Does that mean it couldn't be used in some countries?
so it's probably more computationally expensive than common linear-sequence generators, but in exchange you get robust splitting.
I wonder if you can make a splittable generator that uses crypto functions when you split it, but is a common linear-sequence generator otherwise? Thanks Ian

I might have a use for this, so I could give it a go. I'll have a look through this post in detail tomorrow morning. RS On 04/11/10 17:38, Simon Peyton-Jones wrote:
Hi Cafe
A while back there was a thread http://www.mail-archive.com/haskell-cafe@haskell.org/msg79633.html about a good implementation of a (pseudo) random number generator with a good "split" operation. There's lots of material on generators that generate a linear *sequence* of random numbers, but much less on how to generate a *tree* of random numbers, which is what Haskell's System.Random API requires.
I happened to meet Burton Smith recently, who wrote some early papers about this stuff (eg "Pseudo random trees in Monte-Carlo http://portal.acm.org/citation.cfm?id=1746034"), so I asked him.
His reply is below, along with some follow-up comments from his colleagues Tolga Acar and Gideon Yuval. The generator uses crypto functions, so it's probably more computationally expensive than common linear-sequence generators, but in exchange you get robust splitting.
Does anyone feel like taking the idea and turning it into a Haskell library? (Or even a Haskell Wiki page?) I'm taking the liberty of cross-posting to the libraries list.
Simon
*From:* Burton Smith *Sent:* Tuesday, November 02, 2010 3:58 PM *To:* Simon Peyton-Jones *Cc:* Gideon Yuval (Gideon Yuval); Tolga Acar *Subject:* Random number generation
With some help from Gideon and Tolga, I think the solution to the "arbitrary tree of random numbers problem" is as follows:
The generator G is a pair comprising a crypto key G.k and an integer counter (the "message") G.c. The (next G) operation returns a pair: 1. a random integer r obtained by encrypting G.c with G.k, and 2. a new generator G' such that G'.k = G.k and G'.c = G.c + 1. The (split G) operation is similar, returning the same G', except that instead of returning a random integer r it returns a third generator G'' such that G''.k = r and G''.c = 0.
A suitable block cipher system might be 128-bit AES (Rijndael). Unencumbered implementations exist in a variety of languages, and performance is pretty good and will improve dramatically as hardware support improves. I'd pick both crypto key size and the size of the result r to be 128 bits, and employ a 64 bit counter c. Other crypto options exist.
*From:* Simon Peyton-Jones *Sent:* Wednesday, November 03, 2010 3:11 AM *To:* Burton Smith; Gideon Yuval (Gideon Yuval) *Cc:* Tolga Acar; Simon Peyton-Jones *Subject:* RE: Random number generation
Burton, Gideon, Tolga
Aha, that's interesting. I'd never seen a random number generator based on crypto, but it seems like an attractive idea. As I understand it, successive calls to 'next' will give you
encrypt(0), encrypt(1), encrypt(2), encrypt(3),....
Is this standard? Does it have provably good randomness properties, (cycle length, what else?) like other RNGs? Or does it simply seem very plausible?
Can I send it round to the Haskell mailing list, in the hope that someone will turn the idea into a library? (Ideally I'd like to make claims about the randomness properties in doing so, hence my qns above.)
*From:* Gideon Yuval (Gideon Yuval) *Sent:* Wednesday, November 03, 2010 7:15 AM *To:* Simon Peyton-Jones; Burton Smith *Cc:* Tolga Acar *Subject:* RE: Random number generation
As long as the key, and the non-counting part of the counter, are kept" secret", anyone who can distinguish these pseudorandoms from real random, in less than 2^128 steps, has a nice paper for crypto-2011 (this is known as "provable security") concerning a weakness in AES128.
One exception: real randoms have a birthday paradox; the pseudorandoms suggested do not. If you care, you can:
(1) Limit the counter to 2^32 steps (paradox has 2^-64 probability) or even 2^16 (2^-96), then rekey; or
(2) XOR 2 such encrypted counters, with different keys; or
(3) XOR *_3_* successive values for the same counter (just possibly cheaper; top-of-head idea).
More hard-core: swap the position of key & message: encrypting a constant "secret" with 1,2,3,4.... Gives pseudorandoms with *_no_* birthday paradox.
*From:* Tolga Acar *Sent:* 03 November 2010 15:50 *To:* Gideon Yuval (Gideon Yuval); Simon Peyton-Jones; Burton Smith *Subject:* RE: Random number generation
Simon,
The general idea is not really that new in the crypto area with constraints Gideon describes, of course. That is typically called a PRNG -- Pseudo Random Number Generator, or in another parlance, Deterministic Random Bit Generators (DRBG). The DRBG constructions based on hash functions and block ciphers are even standardized in NIST publication SP800-90 (even though I may not recommend every one of them).
As for the construction below, that is based on the AES block cipher, that essentially takes advantage of the PRP (Pseudo Random Permutation) property of the AES block cipher, as each block cipher ought to be. So, as Gideon outlines below, if you fix the key, the PRP gives you a random-looking (or, in other terms, indistinguishable from random) output that no one without the secret key and the "state" can generate or easily predict. Assuming an ideal cipher (and AES is a close approximation to it), the probability is believed to be the birthday paradox - no counterexample or a proof exists without assumptions; so we stick to the birthday bound.
There ought to be papers on this somewhere. If not, that condensed information is spread across many papers and is part of the crypto folklore, I'd say.
*From:* Burton Smith *Sent:* 03 November 2010 19:03 *To:* Simon Peyton-Jones *Cc:* Gideon Yuval (Gideon Yuval); Tolga Acar *Subject:* RE: Random number generation
Just two points of further clarification:
1. All PRNGs used in the technical computing space fail the birthday paradox criterion (/i.e. /have full period), so we really need not worry about this. Also, there are other mitigating factors, /e.g./ suppose we are using the PRNG to generate pseudorandom residues mod n << 2^128; the paradox is happily present there.
2. The big innovation in this scheme is that the rekeying operation done by split creates a new generator with independence guaranteed by "provable security" in the sense Gideon mentioned -- if someone can find something nonrandom in the correlation between G' and G'', say, then it amounts to a weakness in AES128 and is publishable. So it's yet another example of reducibility, common in our field: "if you can easily transform a known/famous hard problem P into this other problem Q, Q must be hard".
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On November 4, 2010 13:38:12 Simon Peyton-Jones wrote:
From: Gideon Yuval (Gideon Yuval) Sent: Wednesday, November 03, 2010 7:15 AM To: Simon Peyton-Jones; Burton Smith Cc: Tolga Acar Subject: RE: Random number generation
As long as the key, and the non-counting part of the counter, are kept" secret", anyone who can distinguish these pseudorandoms from real random, in less than 2^128 steps, has a nice paper for crypto-2011 (this is known as "provable security") concerning a weakness in AES128.
One exception: real randoms have a birthday paradox; the pseudorandoms suggested do not. If you care, you can:
(1) Limit the counter to 2^32 steps (paradox has 2^-64 probability) or even 2^16 (2^-96), then rekey; or
(2) XOR 2 such encrypted counters, with different keys; or
(3) XOR 3 successive values for the same counter (just possibly cheaper; top-of-head idea).
More hard-core: swap the position of key & message: encrypting a constant "secret" with 1,2,3,4.... Gives pseudorandoms with no birthday paradox.
Am I correct in understanding that the birthday paradox reference is that a pseudo random permutation (which this must be as the block crypto function has to be one-to-one) can't repeat numbers, unlike a random sequence. I would think this is quite related to a lot of what is discussed on Wikipedia under "block cipher modes of operation" http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation and is presumably well researched. In particular, I see there are some common solutions (e.g., cipher-block chaining -- xor your previous ciphertext [random value] with your next bit of plain text [the incrementing counter]). Cheers! -Tyson

On Thu, Nov 04, 2010 at 05:38:12PM +0000, Simon Peyton-Jones wrote:
There's lots of material on generators that generate a linear sequence of random numbers, but much less on how to generate a tree of random numbers, which is what Haskell's System.Random API requires.
I'd like to understand what the fundamental difficulty is. I thought that e.g. cryptographic PRNGs have the requirement that the output of the PRNG cannot be used to guess its internal state (and thus the future output). Hence one would think that a sufficiently strong PRNG can be used to generate the seed for a new PRNG, which then shouldn't have any observable similarity to the old PRNG. So a naive implementation of split would be: split g = (mkGen seed, g') where (seed, g') = random g (Where mkGen creates a new state from some sufficiently big seed data.) So what is the problem here? What kinds of observable interdependencies between split streams would come up with the above definition using common PRNGs? Are my assumptions about the security of cryptographic PRNGs incorrect, or is the issue simply that they are too expensive for "ordinary" random number generation? Lauri

On Wed, Nov 10, 2010 at 11:33 AM, Lauri Alanko
So a naive implementation of split would be:
split g = (mkGen seed, g') where (seed, g') = random g
(Where mkGen creates a new state from some sufficiently big seed data.)
So what is the problem here? What kinds of observable interdependencies between split streams would come up with the above definition using common PRNGs? Are my assumptions about the security of cryptographic PRNGs incorrect, or is the issue simply that they are too expensive for "ordinary" random number generation?
Yeah, I was thinking for any "good" PRNG this should be fine. We probably want to pull as much internal state as we can from one generator to the other so we may want to use a specialized seed routine that is optimized for a specific PRNG rather than using an Int or something. John

On Wed, Nov 10, 2010 at 11:33 AM, Lauri Alanko
So a naive implementation of split would be:
split g = (mkGen seed, g') where (seed, g') = random g
Just to be clear, that is the same as Burton Smith's original proposal that Simon mentioned at the outset, right? Specifically, Smith's proposal is yours instantiated with a crypto based PRNG? So, except for the silliness of generating 128 random bits to make an Int, the following would implement the strategy using the "crypto" package on Hackage, correct? -------------------------------------------------- import Codec.Encryption.AES import Data.LargeWord import System.Random data RNG = RNG Word128 Word128 deriving Show next128 (RNG k c) = (encrypt k c, RNG k (c+1)) instance RandomGen RNG where next g = (fromIntegral n, g') where (n,g') = next128 g split g@(RNG k c) = (g', RNG n 0) where (n,g') = next128 g -------------------------------------------------- The reason I brought up AES-NI was that doing AES in hardware would allow about an 8X improvement over the best software implementation (~2 cycles per byte encrypted). Comparison would be warranted, but perhaps it could make the crypto based PRNG efficient enough for day-to-day use. Best, -Ryan

On Sat, Jan 22, 2011 at 12:40:04AM -0500, Ryan Newton wrote:
On Wed, Nov 10, 2010 at 11:33 AM, Lauri Alanko
wrote: So a naive implementation of split would be:
split g = (mkGen seed, g') where (seed, g') = random g
Just to be clear, that is the same as Burton Smith's original proposal that Simon mentioned at the outset, right? Specifically, Smith's proposal is yours instantiated with a crypto based PRNG?
Yes, I realized this in hindsight. I hadn't read Smith's proposal fully and didn't expect it to be so simple. My own interest in this discussion actually came from an OCaml project, where I needed an operation to generate new RNGs: val split : Random.State.t -> Random.State.t The obvious idea was just to create a new RNG by using a randomly generated seed: let split s = Random.State.make (Array.init 55 (fun _ -> Random.State.bits s)) However, I remembered years ago reading in the source of the Haskell Random module that splitting RNGs robustly is supposed to be an open problem so I wondered what the issue is. Since random numbers came up on the Haskell mailing list at the same time, I decided to ask. But since Burton apparently came up with the same solution, using AES in counter mode as the RNG, maybe there is no issue after all. Except perhaps performance? Is this approach less robust with a faster, non-cryptographic RNG?
So, except for the silliness of generating 128 random bits to make an Int, the following would implement the strategy using the "crypto" package on Hackage, correct?
next128 (RNG k c) = (encrypt k c, RNG k (c+1))
To my understandind that's indeed using AES in counter mode, but I'm no crypto expert. Lauri

perhaps performance? Is this approach less robust with a faster, non-cryptographic RNG?
Yes, I don't understand that either. Is there a reason that using a weaker PRNG in this case is WORSE than using it in the non-splitting case? Is that why there is more of an impetus to use the cryptographic approach in this case? Anyway, taking for granted that the Burton approach is a useful thing to have implemented, I started developing a package for this stuff -- AES based RNG including both a haskell implementation and wrapping an AESNI-based C one . I haven't posted it to Hackage yet, but you can find the git repository here: https://github.com/rrnewton/intel-aes If you build with cabal and run the benchmark-intel-aes-rng executable, it will give you a breakdown like this: How many random numbers can we generate in a second on one thread? Cost of rdtsc (ffi call): 83 Approx getCPUTime calls per second: 205,640 Approx clock frequency: 3,306,891,339 First, timing with System.Random interface: 193,178,901 random ints generated [constant zero gen] 14,530,358 random ints generated [System.Random stdGen] 16,346 random ints generated [BurtonGenSlow/reference] 32,965 random ints generated [BurtonGenSlow] Comparison to C's rand(): 118,766,285 random ints generated [rand/store in C loop] 114,668,028 random ints generated [rand / Haskell loop] 114,675,116 random ints generated [rand/store Haskell] At the moment this is Haskell-only, I haven't included the wrapped Intel assembly code library yet. As you can see, the pure-Haskell AES based RNG (BurtonGenSlow) is pretty slow. Would anyone else be interested in running those RNG testing tools (diehard, big crush) on this to make sure it is working correctly? Also I'd be happy if anyone with a performance-oriented-eye would like to take a look at what's going on. Both for the sake of the serial performance (above) and because the parallel performance is currently *mysterious* (see below). I figure one of the main reasons for splittable RNG is deterministic parallel computations. Thus it's important that all threads be able to run the RNG efficiently. Right now, if you look at SimpleRNGBench.hs I'm just running the same RNG on numCapabilities threads. Yet with that simple test I'm running into problems, summarized thus: * I get substantial (3X) variance in program performance on consecutive runs. * I see a minor hit in performance adding -threaded, but going from -N1 to -N4 (even with a single-thread of work) yields a big hit in performance and increase in variance. * -N4 with four actual threads of work is actually pretty good for the pure haskell version. All four threads on my nehalem 3.33ghz can maintain 93% of their throughput in the serial case. BUT the variance problem persists. * I run a busy-wait loop that measures cpu frequency... and this seems to get messed up in threaded mode (even with -qm -qa). I don't know why. * I cannot killThread a haskell thread (forkIO or forkOS) that is currently running a divergent FFI call (safe or unsafe). (See "time_c".) You can find the details in the DEVLOG here: https://github.com/rrnewton/intel-aes/blob/master/CHANGELOG Let me know if you have any ideas. I'm going to leave the Haskell version how it is and focus on wrapping the Intel asm (which has a permissive license). Cheers, -Ryan P.S. Regarding this benchmarking -- would it be appropriate to use Criterion for this? Or is it sufficient to measure aggregate throughput as I've been doing?

Hi Cafe,
I've included Gladman's efficient, portable C implementation of AES
generating random numbers now, and the hardware-accelerated version is
working too. I'm currently seeing higher throughput for even the software
based version than the builtin stdGen:
First, timing with System.Random interface:
13,051,964 random ints generated [System.Random stdGen] ~ 252
cycles/int
15,635 random ints generated [PureHaskell/reference] ~ 210,763
cycles/int
31,159 random ints generated [PureHaskell] ~ 105,757
cycles/int
2,180,488 random ints generated [Gladman inefficient] ~ 1,511
cycles/int
15,015,095 random ints generated [Gladman] ~ 219
cycles/int
That seems like a good argument for cryptographic RNGs to me!
I'm having a lot of trouble getting cabal to build/install it successfully.
You can see what I've got there now. I'd be interested to know if anyone
else can build it successfully. It should work -- but only by building the
assembly code into a .so and assuming the build directory is /opt/intel-aes
;-).
I don't have real numbers for the hardware version yet because the Westmere
machine I'm logged into is redhat 5.4 and is giving me "GLIBC_2.7 not found"
errors. You can run it for correctness purposes using an emulation tool
called sde (software development
emulator)http://software.intel.com/en-us/articles/intel-software-development-emulator...that's
based on dynamic binary translation.
-Ryan
P.S. Checkout command:
git clone git://github.com/rrnewton/intel-aes.git
On Sat, Jan 29, 2011 at 8:52 AM, Ryan Newton
perhaps performance? Is this approach less robust with a faster,
non-cryptographic RNG?
Yes, I don't understand that either. Is there a reason that using a weaker PRNG in this case is WORSE than using it in the non-splitting case? Is that why there is more of an impetus to use the cryptographic approach in this case?
Anyway, taking for granted that the Burton approach is a useful thing to have implemented, I started developing a package for this stuff -- AES based RNG including both a haskell implementation and wrapping an AESNI-based C one . I haven't posted it to Hackage yet, but you can find the git repository here:
https://github.com/rrnewton/intel-aes
If you build with cabal and run the benchmark-intel-aes-rng executable, it will give you a breakdown like this:
How many random numbers can we generate in a second on one thread? Cost of rdtsc (ffi call): 83 Approx getCPUTime calls per second: 205,640 Approx clock frequency: 3,306,891,339 First, timing with System.Random interface: 193,178,901 random ints generated [constant zero gen] 14,530,358 random ints generated [System.Random stdGen] 16,346 random ints generated [BurtonGenSlow/reference] 32,965 random ints generated [BurtonGenSlow] Comparison to C's rand(): 118,766,285 random ints generated [rand/store in C loop] 114,668,028 random ints generated [rand / Haskell loop] 114,675,116 random ints generated [rand/store Haskell]
At the moment this is Haskell-only, I haven't included the wrapped Intel assembly code library yet. As you can see, the pure-Haskell AES based RNG (BurtonGenSlow) is pretty slow.
Would anyone else be interested in running those RNG testing tools (diehard, big crush) on this to make sure it is working correctly?
Also I'd be happy if anyone with a performance-oriented-eye would like to take a look at what's going on. Both for the sake of the serial performance (above) and because the parallel performance is currently *mysterious*(see below).
I figure one of the main reasons for splittable RNG is deterministic parallel computations. Thus it's important that all threads be able to run the RNG efficiently. Right now, if you look at SimpleRNGBench.hs I'm just running the same RNG on numCapabilities threads. Yet with that simple test I'm running into problems, summarized thus:
* I get substantial (3X) variance in program performance on consecutive runs. * I see a minor hit in performance adding -threaded, but going from -N1 to -N4 (even with a single-thread of work) yields a big hit in performance and increase in variance. * -N4 with four actual threads of work is actually pretty good for the pure haskell version. All four threads on my nehalem 3.33ghz can maintain 93% of their throughput in the serial case. BUT the variance problem persists. * I run a busy-wait loop that measures cpu frequency... and this seems to get messed up in threaded mode (even with -qm -qa). I don't know why. * I cannot killThread a haskell thread (forkIO or forkOS) that is currently running a divergent FFI call (safe or unsafe). (See "time_c".)
You can find the details in the DEVLOG here:
https://github.com/rrnewton/intel-aes/blob/master/CHANGELOG
Let me know if you have any ideas. I'm going to leave the Haskell version how it is and focus on wrapping the Intel asm (which has a permissive license).
Cheers, -Ryan
P.S. Regarding this benchmarking -- would it be appropriate to use Criterion for this? Or is it sufficient to measure aggregate throughput as I've been doing?

Small update:
I got the first results from the hardware accelerated version on a 3.33 ghz
westmere machine. Right now it does twice as well as the Gladman software
version, which is also twice as well as the System.Random stdgen, and 1000
times faster than a the Haskell implementation of AES that I got from the
Crypto package:
How many random numbers can we generate in a second on one thread?
Cost of rdtsc (ffi call): 84
Approx getCPUTime calls per second: 209,798
Approx clock frequency: 3,331,093,772
First, timing with System.Random interface:
76,811,104 random ints generated [constant zero gen]
14,482,725 random ints generated [System.Random stdGen]
16,061 random ints generated [PureHaskell/reference]
32,309 random ints generated [PureHaskell]
2,401,893 random ints generated [Gladman inefficient]
15,980,625 random ints generated [Gladman]
2,329,500 random ints generated [IntelAES inefficient]
32,383,799 random ints generated [IntelAES]
Comparison to C's rand():
71,392,408 random ints generated [rand/store in C loop]
71,347,778 random ints generated [rand in Haskell loop]
71,324,158 random ints generated [rand/store in Haskell loop]
This is what Burton Smith originally thought, that AES based RNG would be
pretty fast and even faster with hardware acceleration.
-Ryan
On Mon, Jan 31, 2011 at 1:25 AM, Ryan Newton
Hi Cafe,
I've included Gladman's efficient, portable C implementation of AES generating random numbers now, and the hardware-accelerated version is working too. I'm currently seeing higher throughput for even the software based version than the builtin stdGen:
First, timing with System.Random interface: 13,051,964 random ints generated [System.Random stdGen] ~ 252 cycles/int 15,635 random ints generated [PureHaskell/reference] ~ 210,763 cycles/int 31,159 random ints generated [PureHaskell] ~ 105,757 cycles/int 2,180,488 random ints generated [Gladman inefficient] ~ 1,511 cycles/int 15,015,095 random ints generated [Gladman] ~ 219 cycles/int
That seems like a good argument for cryptographic RNGs to me!
I'm having a lot of trouble getting cabal to build/install it successfully. You can see what I've got there now. I'd be interested to know if anyone else can build it successfully. It should work -- but only by building the assembly code into a .so and assuming the build directory is /opt/intel-aes ;-).
I don't have real numbers for the hardware version yet because the Westmere machine I'm logged into is redhat 5.4 and is giving me "GLIBC_2.7 not found" errors. You can run it for correctness purposes using an emulation tool called sde (software development emulator)http://software.intel.com/en-us/articles/intel-software-development-emulator...that's based on dynamic binary translation.
-Ryan
P.S. Checkout command: git clone git://github.com/rrnewton/intel-aes.git
On Sat, Jan 29, 2011 at 8:52 AM, Ryan Newton
wrote: perhaps performance? Is this approach less robust with a faster,
non-cryptographic RNG?
Yes, I don't understand that either. Is there a reason that using a weaker PRNG in this case is WORSE than using it in the non-splitting case? Is that why there is more of an impetus to use the cryptographic approach in this case?
Anyway, taking for granted that the Burton approach is a useful thing to have implemented, I started developing a package for this stuff -- AES based RNG including both a haskell implementation and wrapping an AESNI-based C one . I haven't posted it to Hackage yet, but you can find the git repository here:
https://github.com/rrnewton/intel-aes
If you build with cabal and run the benchmark-intel-aes-rng executable, it will give you a breakdown like this:
How many random numbers can we generate in a second on one thread? Cost of rdtsc (ffi call): 83 Approx getCPUTime calls per second: 205,640 Approx clock frequency: 3,306,891,339 First, timing with System.Random interface: 193,178,901 random ints generated [constant zero gen] 14,530,358 random ints generated [System.Random stdGen] 16,346 random ints generated [BurtonGenSlow/reference] 32,965 random ints generated [BurtonGenSlow] Comparison to C's rand(): 118,766,285 random ints generated [rand/store in C loop] 114,668,028 random ints generated [rand / Haskell loop] 114,675,116 random ints generated [rand/store Haskell]
At the moment this is Haskell-only, I haven't included the wrapped Intel assembly code library yet. As you can see, the pure-Haskell AES based RNG (BurtonGenSlow) is pretty slow.
Would anyone else be interested in running those RNG testing tools (diehard, big crush) on this to make sure it is working correctly?
Also I'd be happy if anyone with a performance-oriented-eye would like to take a look at what's going on. Both for the sake of the serial performance (above) and because the parallel performance is currently *mysterious*(see below).
I figure one of the main reasons for splittable RNG is deterministic parallel computations. Thus it's important that all threads be able to run the RNG efficiently. Right now, if you look at SimpleRNGBench.hs I'm just running the same RNG on numCapabilities threads. Yet with that simple test I'm running into problems, summarized thus:
* I get substantial (3X) variance in program performance on consecutive runs. * I see a minor hit in performance adding -threaded, but going from -N1 to -N4 (even with a single-thread of work) yields a big hit in performance and increase in variance. * -N4 with four actual threads of work is actually pretty good for the pure haskell version. All four threads on my nehalem 3.33ghz can maintain 93% of their throughput in the serial case. BUT the variance problem persists. * I run a busy-wait loop that measures cpu frequency... and this seems to get messed up in threaded mode (even with -qm -qa). I don't know why. * I cannot killThread a haskell thread (forkIO or forkOS) that is currently running a divergent FFI call (safe or unsafe). (See "time_c".)
You can find the details in the DEVLOG here:
https://github.com/rrnewton/intel-aes/blob/master/CHANGELOG
Let me know if you have any ideas. I'm going to leave the Haskell version how it is and focus on wrapping the Intel asm (which has a permissive license).
Cheers, -Ryan
P.S. Regarding this benchmarking -- would it be appropriate to use Criterion for this? Or is it sufficient to measure aggregate throughput as I've been doing?

I got hold of, and looked through the paper suggested in the root of this thread "Pseudo random trees in Monte-Carlo http://portal.acm.org/citation.cfm?id=1746034", and based upon this I have thrown together a version of the binary tree based random number generator suggested. I would like to point out that I do not know very much about random number generators, the underlying mathematics or any subsequent papers on this subject, this is just a very naive implementation based upon this one paper. As a question, the following code actually generates a stream of numbers that is more random than I was expecting, if anyone can explain why I would be very interested. import System.Random data LehmerTree = LehmerTree {nextInt :: Int, leftBranch :: LehmerTree, rightBranch :: LehmerTree} instance Show LehmerTree where show g = "LehmerTree, current root = "++(show $ nextInt g) mkLehmerTree :: Int->Int->Int->Int->Int->Int->LehmerTree mkLehmerTree aL aR cL cR m x0 = innerMkTree x0 where mkLeft x = (aL * x + cL) `mod` m mkRight x = (aR * x + cR) `mod` m innerMkTree x = let l = innerMkTree (mkLeft x) r = innerMkTree (mkRight x) in LehmerTree x l r mkLehmerTreeFromRandom :: IO LehmerTree mkLehmerTreeFromRandom = do gen<-getStdGen let a:b:c:d:e:f:_ = randoms gen return $ mkLehmerTree a b c d e f instance RandomGen LehmerTree where next g = (fromIntegral.nextInt $ g, leftBranch g) split g = (leftBranch g, rightBranch g) genRange _ = (0, 2147483562) -- duplicate of stdRange test :: IO() test = do gen<-mkLehmerTreeFromRandom print gen let (g1,g2) = split gen let p = take 10 $ randoms gen :: [Int] let p' = take 10 $ randoms g1 :: [Int] -- let p'' = take 10 $ randoms g2 :: [Float] print p print p' -- print p''

On Thu, Nov 11, 2010 at 3:13 AM, Richard Senington
I got hold of, and looked through the paper suggested in the root of this thread “Pseudo random trees in Monte-Carlo", and based upon this I have thrown together a version of the binary tree based random number generator suggested.
I would like to point out that I do not know very much about random number generators, the underlying mathematics or any subsequent papers on this subject, this is just a very naive implementation based upon this one paper.
As a question, the following code actually generates a stream of numbers that is more random than I was expecting, if anyone can explain why I would be very interested.
What do you mean more random than you were expecting? Shouldn't they be "maximally random"? BTW, nice module. Do you want to hackage it up? If not, I will.
import System.Random
data LehmerTree = LehmerTree {nextInt :: Int, leftBranch :: LehmerTree, rightBranch :: LehmerTree}
instance Show LehmerTree where show g = "LehmerTree, current root = "++(show $ nextInt g)
mkLehmerTree :: Int->Int->Int->Int->Int->Int->LehmerTree mkLehmerTree aL aR cL cR m x0 = innerMkTree x0 where mkLeft x = (aL * x + cL) `mod` m mkRight x = (aR * x + cR) `mod` m innerMkTree x = let l = innerMkTree (mkLeft x) r = innerMkTree (mkRight x) in LehmerTree x l r
mkLehmerTreeFromRandom :: IO LehmerTree mkLehmerTreeFromRandom = do gen<-getStdGen let a:b:c:d:e:f:_ = randoms gen return $ mkLehmerTree a b c d e f
This can be pure: mkLehmerTreeFromRandom :: (RandomGen g) => g -> LehmerTree
instance RandomGen LehmerTree where next g = (fromIntegral.nextInt $ g, leftBranch g) split g = (leftBranch g, rightBranch g) genRange _ = (0, 2147483562) -- duplicate of stdRange
test :: IO() test = do gen<-mkLehmerTreeFromRandom print gen let (g1,g2) = split gen let p = take 10 $ randoms gen :: [Int] let p' = take 10 $ randoms g1 :: [Int] -- let p'' = take 10 $ randoms g2 :: [Float] print p print p' -- print p''
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 11/11/10 21:34, Luke Palmer wrote:
On Thu, Nov 11, 2010 at 3:13 AM, Richard Senington
wrote: I got hold of, and looked through the paper suggested in the root of this thread “Pseudo random trees in Monte-Carlo", and based upon this I have thrown together a version of the binary tree based random number generator suggested.
I would like to point out that I do not know very much about random number generators, the underlying mathematics or any subsequent papers on this subject, this is just a very naive implementation based upon this one paper.
As a question, the following code actually generates a stream of numbers that is more random than I was expecting, if anyone can explain why I would be very interested.
What do you mean more random than you were expecting? Shouldn't they be "maximally random"?
My issue is how it should react, given how the underlying data structure works. If you just use this tree to generate numbers, you are taking Left,Left,Left ..... If you split the tree, you get Left and Right. So, in my test code at the bottom I have taken a generator and then generated 10 numbers from it. Then we split. The left hand branch (g1 in the test) should generate numbers that are just "tail.randoms $ gen" but this is not what happens, at least not for raw integers. I have been doing some more testing, and if you limit the range (0-1000 seems to be stable) then it works as described above, however if you use wider ranges, or even the maximum range, then the sequences do not match as expected. This worries me, as one advantage of PRNGs (see paper as I am paraphrasing) is repeatability, or certain expected properties. The underlying system is working, so I probably have the range or data type conversion wrong somewhere. You can test the underlying tree like so. rawTest :: LehmerTree->IO() rawTest t = do print $ myTake 10 t print $ myTake 10 $ leftBranch t print $ myTake 10 $ rightBranch t where myTake 0 _ = [] myTake x t = nextInt t : (myTake (x-1) (leftBranch t))
BTW, nice module. Do you want to hackage it up? If not, I will.
I would be happy to hackage it up, but I think this is a bit premature. I started to read a bit more about PRNGs, and I came across tests for randomness. It seemed that a library of test systems for RandomGens would be quite cool, so I started coding last night. That is far too premature to even post up here, but in short, this system gives some very odd results. For example, mean averages (I tried medians but that did not tell me much, I am going to look at modals some time this weekend). mean :: [Float]->Double mean [] = error "empty list has no mean?" mean xs = ((sum.(map realToFrac)) xs) / (fromIntegral.length $ xs) rangedMeanTest :: RandomGen g=>g->Int->(Int,Int)->Double rangedMeanTest g count range = let p = take count $ randomRs range g in mean (map fromIntegral p) So, I am testing discrete randomness, ints. We take a range we wish to generate (0-3 for example), and generate some number of test values (I used 1000). This list is converted into floats, and averaged. We can then predict what we think the average should be, given that this is an unbiased uniform (or nearly uniform) system. It does not give the results you would want. This may have something to do with picking "good" parameters for the mkLehmerTree function. For example, using a random setup, I just got these results result expected range 16.814 expected = 16.0 (1,31) 16.191 expected = 16.5 (1,32) 16.576 expected = 17.0 (1,33) 17.081 expected = 17.5 (1,34) 17.543 expected = 18.0 (1,35) In short, I am worried by the properties of this random number generator. I propose improving the testing system, and then posting both the test suite and this random generator to Hackage, unless you really want it up now in a very very preliminary form. RS
import System.Random
data LehmerTree = LehmerTree {nextInt :: Int, leftBranch :: LehmerTree, rightBranch :: LehmerTree}
instance Show LehmerTree where show g = "LehmerTree, current root = "++(show $ nextInt g)
mkLehmerTree :: Int->Int->Int->Int->Int->Int->LehmerTree mkLehmerTree aL aR cL cR m x0 = innerMkTree x0 where mkLeft x = (aL * x + cL) `mod` m mkRight x = (aR * x + cR) `mod` m innerMkTree x = let l = innerMkTree (mkLeft x) r = innerMkTree (mkRight x) in LehmerTree x l r
mkLehmerTreeFromRandom :: IO LehmerTree mkLehmerTreeFromRandom = do gen<-getStdGen let a:b:c:d:e:f:_ = randoms gen return $ mkLehmerTree a b c d e f
This can be pure:
mkLehmerTreeFromRandom :: (RandomGen g) => g -> LehmerTree
instance RandomGen LehmerTree where next g = (fromIntegral.nextInt $ g, leftBranch g) split g = (leftBranch g, rightBranch g) genRange _ = (0, 2147483562) -- duplicate of stdRange
test :: IO() test = do gen<-mkLehmerTreeFromRandom print gen let (g1,g2) = split gen let p = take 10 $ randoms gen :: [Int] let p' = take 10 $ randoms g1 :: [Int] -- let p'' = take 10 $ randoms g2 :: [Float] print p print p' -- print p''
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Fri, Nov 12, 2010 at 3:33 AM, Richard Senington
In short, I am worried by the properties of this random number generator. I propose improving the testing system, and then posting both the test suite and this random generator to Hackage, unless you really want it up now in a very very preliminary form.
Yeah I think a package of randomness tests could be really useful. Cool :-)
RS
import System.Random
data LehmerTree = LehmerTree {nextInt :: Int, leftBranch :: LehmerTree, rightBranch :: LehmerTree}
instance Show LehmerTree where show g = "LehmerTree, current root = "++(show $ nextInt g)
mkLehmerTree :: Int->Int->Int->Int->Int->Int->LehmerTree mkLehmerTree aL aR cL cR m x0 = innerMkTree x0 where mkLeft x = (aL * x + cL) `mod` m mkRight x = (aR * x + cR) `mod` m innerMkTree x = let l = innerMkTree (mkLeft x) r = innerMkTree (mkRight x) in LehmerTree x l r
mkLehmerTreeFromRandom :: IO LehmerTree mkLehmerTreeFromRandom = do gen<-getStdGen let a:b:c:d:e:f:_ = randoms gen return $ mkLehmerTree a b c d e f
This can be pure:
mkLehmerTreeFromRandom :: (RandomGen g) => g -> LehmerTree
instance RandomGen LehmerTree where next g = (fromIntegral.nextInt $ g, leftBranch g) split g = (leftBranch g, rightBranch g) genRange _ = (0, 2147483562) -- duplicate of stdRange
test :: IO() test = do gen<-mkLehmerTreeFromRandom print gen let (g1,g2) = split gen let p = take 10 $ randoms gen :: [Int] let p' = take 10 $ randoms g1 :: [Int] -- let p'' = take 10 $ randoms g2 :: [Float] print p print p' -- print p''
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 12/11/10 20:56, Bryan O'Sullivan wrote:
On Fri, Nov 12, 2010 at 12:34 PM, Luke Palmer
mailto:lrpalmer@gmail.com> wrote: Yeah I think a package of randomness tests could be really useful. Cool :-)
There are already well-established suites of very thorough PRNG tests, such as Diehard and Big Crush. Please don't invent another. Thankyou for the advice, but since I am just learning about some of this stuff, how about I have ago at implementing some of their tests?

On Fri, Nov 12, 2010 at 1:28 PM, Richard Senington
Thankyou for the advice, but since I am just learning about some of this stuff, how about I have ago at implementing some of their tests?
Sure. See http://www.iro.umontreal.ca/~simardr/testu01/tu01.html for the current state of the art.

On 11/12/10 5:33 AM, Richard Senington wrote:
It does not give the results you would want. This may have something to do with picking "good" parameters for the mkLehmerTree function. For example, using a random setup, I just got these results result expected range 16.814 expected = 16.0 (1,31) 16.191 expected = 16.5 (1,32) 16.576 expected = 17.0 (1,33) 17.081 expected = 17.5 (1,34) 17.543 expected = 18.0 (1,35)
Have you run any significance tests? I wouldn't be surprised to see +/-0.5 as within the bounds of expected randomness. I'm more worried about it seeming to be consistently on the -0.5 end of things, than I am about it not matching expectation (how many samples did you take again?). For small ranges like this, a consistent -0.5 (or +0.5) tends to indicate off-by-one errors in the generator. -- Live well, ~wren

Hi cafe,
I want to add the ability to use AES-NI instructions on Intel architectures
to GHC. Mainly I'd like to do splittable random number generators based on
AES as was suggested at the outset of this email. (I met Burton Smith last
week and this topic came up.)
I was just reading the below thread about the plugin architecture which got
me thinking about what the right way to add AES-NI is. (Disregarding for a
moment portability and the issue of where to test cpuid...)
http://www.haskell.org/pipermail/glasgow-haskell-users/2011-January/019874.h...
The FFI is always an option. But after reading the first N pages I could
come across from google I'm still not totally clear on whether "unsafe"
foreign calls can happen simultaneously from separate Haskell threads (and
with sufficiently low overhead for this purpose).
I also ran across the phrase "compiler primitive" somewhere wrt GHC:
http://hackage.haskell.org/trac/ghc/wiki/AddingNewPrimitiveOperations
Is that the right way to go? Or would the compiler plugin mechanism
possibly allowing doing this without modifying mainline GHC?
Thanks,
-Ryan
On Fri, Nov 12, 2010 at 6:26 PM, wren ng thornton
On 11/12/10 5:33 AM, Richard Senington wrote:
It does not give the results you would want. This may have something to do with picking "good" parameters for the mkLehmerTree function. For example, using a random setup, I just got these results result expected range 16.814 expected = 16.0 (1,31) 16.191 expected = 16.5 (1,32) 16.576 expected = 17.0 (1,33) 17.081 expected = 17.5 (1,34) 17.543 expected = 18.0 (1,35)
Have you run any significance tests? I wouldn't be surprised to see +/-0.5 as within the bounds of expected randomness. I'm more worried about it seeming to be consistently on the -0.5 end of things, than I am about it not matching expectation (how many samples did you take again?). For small ranges like this, a consistent -0.5 (or +0.5) tends to indicate off-by-one errors in the generator.
-- Live well, ~wren
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (11)
-
Bryan O'Sullivan
-
Ian Lynagh
-
John Meacham
-
Lauri Alanko
-
Luke Palmer
-
Richard Senington
-
Ryan Newton
-
Ryan Newton
-
Simon Peyton-Jones
-
Tyson Whitehead
-
wren ng thornton