Splittable random numbers

Hi Cafe A while back there was a threadhttp://www.mail-archive.com/haskell-cafe@haskell.org/msg79633.html about a good implementation of a (pseudo) random number generator with a good "split" operation. There's lots of material on generators that generate a linear sequence of random numbers, but much less on how to generate a tree of random numbers, which is what Haskell's System.Random API requires. I happened to meet Burton Smith recently, who wrote some early papers about this stuff (eg "Pseudo random trees in Monte-Carlohttp://portal.acm.org/citation.cfm?id=1746034"), so I asked him. His reply is below, along with some follow-up comments from his colleagues Tolga Acar and Gideon Yuval. The generator uses crypto functions, so it's probably more computationally expensive than common linear-sequence generators, but in exchange you get robust splitting. Does anyone feel like taking the idea and turning it into a Haskell library? (Or even a Haskell Wiki page?) I'm taking the liberty of cross-posting to the libraries list. Simon From: Burton Smith Sent: Tuesday, November 02, 2010 3:58 PM To: Simon Peyton-Jones Cc: Gideon Yuval (Gideon Yuval); Tolga Acar Subject: Random number generation With some help from Gideon and Tolga, I think the solution to the "arbitrary tree of random numbers problem" is as follows: The generator G is a pair comprising a crypto key G.k and an integer counter (the "message") G.c. The (next G) operation returns a pair: 1. a random integer r obtained by encrypting G.c with G.k, and 2. a new generator G' such that G'.k = G.k and G'.c = G.c + 1. The (split G) operation is similar, returning the same G', except that instead of returning a random integer r it returns a third generator G'' such that G''.k = r and G''.c = 0. A suitable block cipher system might be 128-bit AES (Rijndael). Unencumbered implementations exist in a variety of languages, and performance is pretty good and will improve dramatically as hardware support improves. I'd pick both crypto key size and the size of the result r to be 128 bits, and employ a 64 bit counter c. Other crypto options exist. From: Simon Peyton-Jones Sent: Wednesday, November 03, 2010 3:11 AM To: Burton Smith; Gideon Yuval (Gideon Yuval) Cc: Tolga Acar; Simon Peyton-Jones Subject: RE: Random number generation Burton, Gideon, Tolga Aha, that's interesting. I'd never seen a random number generator based on crypto, but it seems like an attractive idea. As I understand it, successive calls to 'next' will give you encrypt(0), encrypt(1), encrypt(2), encrypt(3),.... Is this standard? Does it have provably good randomness properties, (cycle length, what else?) like other RNGs? Or does it simply seem very plausible? Can I send it round to the Haskell mailing list, in the hope that someone will turn the idea into a library? (Ideally I'd like to make claims about the randomness properties in doing so, hence my qns above.) From: Gideon Yuval (Gideon Yuval) Sent: Wednesday, November 03, 2010 7:15 AM To: Simon Peyton-Jones; Burton Smith Cc: Tolga Acar Subject: RE: Random number generation As long as the key, and the non-counting part of the counter, are kept" secret", anyone who can distinguish these pseudorandoms from real random, in less than 2^128 steps, has a nice paper for crypto-2011 (this is known as "provable security") concerning a weakness in AES128. One exception: real randoms have a birthday paradox; the pseudorandoms suggested do not. If you care, you can: (1) Limit the counter to 2^32 steps (paradox has 2^-64 probability) or even 2^16 (2^-96), then rekey; or (2) XOR 2 such encrypted counters, with different keys; or (3) XOR 3 successive values for the same counter (just possibly cheaper; top-of-head idea). More hard-core: swap the position of key & message: encrypting a constant "secret" with 1,2,3,4.... Gives pseudorandoms with no birthday paradox. From: Tolga Acar Sent: 03 November 2010 15:50 To: Gideon Yuval (Gideon Yuval); Simon Peyton-Jones; Burton Smith Subject: RE: Random number generation Simon, The general idea is not really that new in the crypto area with constraints Gideon describes, of course. That is typically called a PRNG - Pseudo Random Number Generator, or in another parlance, Deterministic Random Bit Generators (DRBG). The DRBG constructions based on hash functions and block ciphers are even standardized in NIST publication SP800-90 (even though I may not recommend every one of them). As for the construction below, that is based on the AES block cipher, that essentially takes advantage of the PRP (Pseudo Random Permutation) property of the AES block cipher, as each block cipher ought to be. So, as Gideon outlines below, if you fix the key, the PRP gives you a random-looking (or, in other terms, indistinguishable from random) output that no one without the secret key and the "state" can generate or easily predict. Assuming an ideal cipher (and AES is a close approximation to it), the probability is believed to be the birthday paradox - no counterexample or a proof exists without assumptions; so we stick to the birthday bound. There ought to be papers on this somewhere. If not, that condensed information is spread across many papers and is part of the crypto folklore, I'd say. From: Burton Smith Sent: 03 November 2010 19:03 To: Simon Peyton-Jones Cc: Gideon Yuval (Gideon Yuval); Tolga Acar Subject: RE: Random number generation Just two points of further clarification: 1. All PRNGs used in the technical computing space fail the birthday paradox criterion (i.e. have full period), so we really need not worry about this. Also, there are other mitigating factors, e.g. suppose we are using the PRNG to generate pseudorandom residues mod n << 2^128; the paradox is happily present there. 2. The big innovation in this scheme is that the rekeying operation done by split creates a new generator with independence guaranteed by "provable security" in the sense Gideon mentioned - if someone can find something nonrandom in the correlation between G' and G'', say, then it amounts to a weakness in AES128 and is publishable. So it's yet another example of reducibility, common in our field: "if you can easily transform a known/famous hard problem P into this other problem Q, Q must be hard".

Does anyone feel like taking the idea and turning it into a Haskell library? (Or even a Haskell Wiki page?) I’m taking the liberty of cross-posting to the libraries list.
If no one else does, and the pseudo-code is made sufficiently concrete, I will probably do this in December.
The generator G is a pair comprising a crypto key G.k and an integer counter (the “message”) G.c. The (next G) operation returns a pair: 1. a random integer r obtained by encrypting G.c with G.k, and 2. a new generator G’ such that G’.k = G.k and G’.c = G.c + 1. The (split G) operation is similar, returning the same G’, except that instead of returning a random integer r it returns a third generator G’’ such that G’’.k = r and G’’.c = 0.
In short (pseudo Haskell): ------- type G g = (g, Integer) gen k (G (g,c)) = let (bytes, g') = genBytes k g in (bytes, G g' (c+1)) split gen@(G (g,c)) = let (bytes, g') = genBytes (genSeedLength `for` gen) componentGen = G (newGen bytes, 0) in (componentGen, G g' (c+1)) ------- At first blush, this could probably be done in an afternoon in much the same manner as BufferedGen, GenXor and GenAutoReseed from the DRBG package I released earlier this week. It would take a bit longer to decide on a good API.
A suitable block cipher system might be 128-bit AES (Rijndael).
DRBG only support hash based CPRNGs right now, but I guess that's not really important in this discussion.
Other crypto options exist.
Right.
Is this standard?
NIST SP 800-90 section 10.2... but I see he said that later.
(1) Limit the counter to 2^32 steps (paradox has 2^-64 probability) or even 2^16 (2^-96), then rekey; or
The CryptoRandomGen class allows for failure (Either GenError g) so this is doable.
(2) XOR 2 such encrypted counters, with different keys; or
This already exists for any generator that is an instance of CryptoRandomGen. See GenXor in Crypto.Random.DRBG in the DRBG package.
(3) XOR 3 successive values for the same counter (just possibly cheaper; top-of-head idea).
And composition of GenXor is possible. Tolga said:
The DRBG constructions based on hash functions and block ciphers are even standardized in NIST publication SP800-90 (even though I may not recommend every one of them).
This intrigues me, was there any more discussion here? Cheers, Thomas

On Thu, Nov 04, 2010 at 05:38:12PM +0000, Simon Peyton-Jones wrote:
The generator uses crypto functions,
Does that mean it couldn't be used in some countries?
so it's probably more computationally expensive than common linear-sequence generators, but in exchange you get robust splitting.
I wonder if you can make a splittable generator that uses crypto functions when you split it, but is a common linear-sequence generator otherwise? Thanks Ian

On Thu, Nov 4, 2010 at 11:25 AM, Ian Lynagh
On Thu, Nov 04, 2010 at 05:38:12PM +0000, Simon Peyton-Jones wrote:
The generator uses crypto functions,
Does that mean it couldn't be used in some countries?
IANAL, but the restrictions can be extremely specific. For example, some software used hash based mixing algorithms instead of cipher based ones which was sufficient to work around this issue in the important cases.
I wonder if you can make a splittable generator that uses crypto functions when you split it, but is a common linear-sequence generator otherwise?
Before we bother to do that I think it would be worth deciding what level of performance we are trying to achieve. On my laptop (Core2 2.5Ghz) I generate 4MB of random values in less than 900ms (HashDRBG). What is StdGen getting, which I know people consider slow? Also, how much do we care about the splitting speed? Cheers, Thomas

On Thu, Nov 4, 2010 at 11:39 AM, Thomas DuBuisson
Before we bother to do that I think it would be worth deciding what level of performance we are trying to achieve. On my laptop (Core2 2.5Ghz) I generate 4MB of random values in less than 900ms (HashDRBG). What is StdGen getting, which I know people consider slow?
I measured StdGen last year as generating around a million values per second. My mwc-random library is about 100 times faster when running in the ST monad.

On November 4, 2010 13:38:12 Simon Peyton-Jones wrote:
From: Gideon Yuval (Gideon Yuval) Sent: Wednesday, November 03, 2010 7:15 AM To: Simon Peyton-Jones; Burton Smith Cc: Tolga Acar Subject: RE: Random number generation
As long as the key, and the non-counting part of the counter, are kept" secret", anyone who can distinguish these pseudorandoms from real random, in less than 2^128 steps, has a nice paper for crypto-2011 (this is known as "provable security") concerning a weakness in AES128.
One exception: real randoms have a birthday paradox; the pseudorandoms suggested do not. If you care, you can:
(1) Limit the counter to 2^32 steps (paradox has 2^-64 probability) or even 2^16 (2^-96), then rekey; or
(2) XOR 2 such encrypted counters, with different keys; or
(3) XOR 3 successive values for the same counter (just possibly cheaper; top-of-head idea).
More hard-core: swap the position of key & message: encrypting a constant "secret" with 1,2,3,4.... Gives pseudorandoms with no birthday paradox.
Am I correct in understanding that the birthday paradox reference is that a pseudo random permutation (which this must be as the block crypto function has to be one-to-one) can't repeat numbers, unlike a random sequence. I would think this is quite related to a lot of what is discussed on Wikipedia under "block cipher modes of operation" http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation and is presumably well researched. In particular, I see there are some common solutions (e.g., cipher-block chaining -- xor your previous ciphertext [random value] with your next bit of plain text [the incrementing counter]). Cheers! -Tyson

For anyone still interested, I've implemented something very much like what Gideon et al suggested to SPJ. It's in (but not exported from) DRBG's Crypto.Random.DRBG module [1]: split :: CryptoRandomGen g => g -> Either GenError (g,g) split g = do (ent,g') <- genBytes (genSeedLength `for` g) g new <- newGen ent return (g', new) Their words were:
The generator G is a pair comprising a crypto key G.k and an integer counter (the “message”) G.c.
That is embodied in the CryptoRandomGen, typically in the manner recommended by NIST SP 800-90 though more instances are possible.
The (next G) operation returns a pair: 1. a random integer r obtained by encrypting G.c with G.k, and 2. a new generator G’ such that G’.k = G.k and G’.c = G.c + 1.
Specifically, this is the CTR DRBG. Someone noted "other cryptographic options exist" which I take to mean the hash or HMAC based generators could be acceptable.
The (split G) operation is similar, returning the same G’, except that instead of returning a random integer r it returns a third generator G’’ such that G’’.k = r and G’’.c = 0.
The exact starting value of the counter is arbitrary (NIST uses 1, I think), but the use of the output of one generator to seed the new generator is what's important. This is exactly what the above 'split' does. Some questions: 1) Do we want to go down this path and replace StdGen with something that has stronger theoretical backing for splits? Or perhaps we should just let things be and users looking for something stronger can use a hackage package? 2) What level of performance are we be looking for? 3) What deps do we consider acceptable for a replacement? 'random' has very few deps, DRBG has too many (for a core library). The proposed solution necessitates a crypto algorithm which is either slow (Crypto) or requries bytestring and maybe FFI+C (SHA, SHA2, cryptohash, AES). Also, Ian asked if we could split a non-cryptographic generator using a cryptographic one. I don't see why not (pretend the tuple is a fitting data structure): split2 :: RandomGen g, CrytpoRandomGen c => (g,c) -> ((g,c), (g,c)) split2 (g1,c1) = do (c2,c3) <- split c1 (int,c4) <- crandom c2 let g2 = newStdGen int return ( (g1, c3), (g2, c4) ) Cheers, Thomas [1] http://hackage.haskell.org/packages/archive/DRBG/0.1.1/doc/html/src/Crypto-R...
A suitable block cipher system might be 128-bit AES (Rijndael). Unencumbered implementations exist in a variety of languages, and performance is pretty good and will improve dramatically as hardware support improves. I’d pick both crypto key size and the size of the result r to be 128 bits, and employ a 64 bit counter c. Other crypto options exist.
From: Simon Peyton-Jones Sent: Wednesday, November 03, 2010 3:11 AM To: Burton Smith; Gideon Yuval (Gideon Yuval) Cc: Tolga Acar; Simon Peyton-Jones Subject: RE: Random number generation
Burton, Gideon, Tolga
Aha, that’s interesting. I’d never seen a random number generator based on crypto, but it seems like an attractive idea. As I understand it, successive calls to ‘next’ will give you
encrypt(0), encrypt(1), encrypt(2), encrypt(3),....
Is this standard? Does it have provably good randomness properties, (cycle length, what else?) like other RNGs? Or does it simply seem very plausible?
Can I send it round to the Haskell mailing list, in the hope that someone will turn the idea into a library? (Ideally I’d like to make claims about the randomness properties in doing so, hence my qns above.)
From: Gideon Yuval (Gideon Yuval) Sent: Wednesday, November 03, 2010 7:15 AM To: Simon Peyton-Jones; Burton Smith Cc: Tolga Acar Subject: RE: Random number generation
As long as the key, and the non-counting part of the counter, are kept” secret”, anyone who can distinguish these pseudorandoms from real random, in less than 2^128 steps, has a nice paper for crypto-2011 (this is known as “provable security”) concerning a weakness in AES128.
One exception: real randoms have a birthday paradox; the pseudorandoms suggested do not. If you care, you can:
(1) Limit the counter to 2^32 steps (paradox has 2^-64 probability) or even 2^16 (2^-96), then rekey; or
(2) XOR 2 such encrypted counters, with different keys; or
(3) XOR 3 successive values for the same counter (just possibly cheaper; top-of-head idea).
More hard-core: swap the position of key & message: encrypting a constant “secret” with 1,2,3,4…. Gives pseudorandoms with no birthday paradox.
From: Tolga Acar Sent: 03 November 2010 15:50 To: Gideon Yuval (Gideon Yuval); Simon Peyton-Jones; Burton Smith Subject: RE: Random number generation
Simon,
The general idea is not really that new in the crypto area with constraints Gideon describes, of course. That is typically called a PRNG – Pseudo Random Number Generator, or in another parlance, Deterministic Random Bit Generators (DRBG). The DRBG constructions based on hash functions and block ciphers are even standardized in NIST publication SP800-90 (even though I may not recommend every one of them).
As for the construction below, that is based on the AES block cipher, that essentially takes advantage of the PRP (Pseudo Random Permutation) property of the AES block cipher, as each block cipher ought to be. So, as Gideon outlines below, if you fix the key, the PRP gives you a random-looking (or, in other terms, indistinguishable from random) output that no one without the secret key and the “state” can generate or easily predict. Assuming an ideal cipher (and AES is a close approximation to it), the probability is believed to be the birthday paradox - no counterexample or a proof exists without assumptions; so we stick to the birthday bound.
There ought to be papers on this somewhere. If not, that condensed information is spread across many papers and is part of the crypto folklore, I’d say.
From: Burton Smith Sent: 03 November 2010 19:03 To: Simon Peyton-Jones Cc: Gideon Yuval (Gideon Yuval); Tolga Acar Subject: RE: Random number generation
Just two points of further clarification:
1. All PRNGs used in the technical computing space fail the birthday paradox criterion (i.e. have full period), so we really need not worry about this. Also, there are other mitigating factors, e.g. suppose we are using the PRNG to generate pseudorandom residues mod n << 2^128; the paradox is happily present there.
2. The big innovation in this scheme is that the rekeying operation done by split creates a new generator with independence guaranteed by “provable security” in the sense Gideon mentioned – if someone can find something nonrandom in the correlation between G’ and G’’, say, then it amounts to a weakness in AES128 and is publishable. So it’s yet another example of reducibility, common in our field: “if you can easily transform a known/famous hard problem P into this other problem Q, Q must be hard”.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
participants (5)
-
Bryan O'Sullivan
-
Ian Lynagh
-
Simon Peyton-Jones
-
Thomas DuBuisson
-
Tyson Whitehead