Infelicity in StdGen?

Either I am misunderstanding something or there is an infelicity in the implementation of StdGen. The documentation on mkStdGen says that distinct arguments should be likely to produce distinct generators. This made me think that I would get a reasonable pseudo-random function to simulate n rolls of a die by using n to seed the random number generator: import System.Random roll :: Int -> String roll n = take n . randomRs ('1', '6') . mkStdGen $ n However, this produces a string beginning with a '6' for 0 <= n <= 53667. In fact the dependency of the first value on the seed seems to be far from random: map (\l -> (head l, length l)) . group . map (fst . randomR (1, 6) . mkStdGen) $ [0..25*53668+6] returns: [(6,53668),(5,53668),(4,53668),(3,53669),(2,53668),(1,53668),(6,53669),(5,53668),(4,53668),(3,53669),(2,53668),(1,53668),(6,53668),(5,53669),(4,53668),(3,53668),(2,53669),(1,53668),(6,53668),(5,53669),(4,53668),(3,53668),(2,53669),(1,53668),(6,53668)] The behaviour seems to be related to the length of the range. You get similar behaviour for ranges of length 2, 3, 6 and 9 for example, but not for 4, 5, 7 or 8. If it is relevant, I am using ghc version 7.6.3. Regards, Rob.

I think the confusion may be come from the understanding of "distinct". The documentation is right that the generators are not equal which is easily checked e.g. using their Show instance. They *will* produce different random numbers. The user of the library might OTOH assume that "distinct" mean "producing uncorrelated output". This is harder and may simply not hold, especially that it doesn't mention sequentially increasing integers or any other kinds of sequences. The property you seem to be looking for is "have vastly different output for similar numbers". Sounds a lot like a hash function to me.
import Data.Hashable -- hashable import System.Random -- random hGen :: (Hashable a) => a -> StdGen hGen = mkStdGen . hash
GHCi:
mkStdGen <$> [1..10] [2 1,3 1,4 1,5 1,6 1,7 1,8 1,9 1,10 1,11 1] hGen <$> [1..10] [2113803271 1,707778093 1,377439146 1,354368904 1,1689773631 1,1515981814 1,1419492475 1,1232077631 1,2037530173 1,1078099554 1]
You can also ask System.Random for a required set of random numbers for use as seeds.
map mkStdGen <$> replicateM 10 randomIO [817657009 1,491059977 1,1962205061 1,375413047 1,626395891 1,1694342924 1,1145131839 1,441215930 1,1278080790 1,1524285256 1]
Finally, StdGen provides 'split' since it implements RandomGen typeclass. Using Data.List.unfoldr:
take 10 $ unfoldr (Just . split) (mkStdGen 1) [3 40692,80029 2147442707,1054756830 2147402015,545291968 2147361323,879767459 2147320631,1464499717 2147279939,2107652444 2147239247,1777851630 2147198555,1414574869 2147157863,1574498162 2147117171]
The question is what really are your needs here. Different applications
will require different properties. I hope the above will give some hints.
On Fri, Jan 3, 2014 at 5:28 PM, Rob Arthan
Either I am misunderstanding something or there is an infelicity in the implementation of StdGen. The documentation on mkStdGen says that distinct arguments should be likely to produce distinct generators. This made me think that I would get a reasonable pseudo-random function to simulate n rolls of a die by using n to seed the random number generator:
import System.Random roll :: Int -> String roll n = take n . randomRs ('1', '6') . mkStdGen $ n
However, this produces a string beginning with a '6' for 0 <= n <= 53667. In fact the dependency of the first value on the seed seems to be far from random:
map (\l -> (head l, length l)) . group . map (fst . randomR (1, 6) . mkStdGen) $ [0..25*53668+6]
returns:
[(6,53668),(5,53668),(4,53668),(3,53669),(2,53668),(1,53668),(6,53669),(5,53668),(4,53668),(3,53669),(2,53668),(1,53668),(6,53668),(5,53669),(4,53668),(3,53668),(2,53669),(1,53668),(6,53668),(5,53669),(4,53668),(3,53668),(2,53669),(1,53668),(6,53668)]
The behaviour seems to be related to the length of the range. You get similar behaviour for ranges of length 2, 3, 6 and 9 for example, but not for 4, 5, 7 or 8.
If it is relevant, I am using ghc version 7.6.3.
Regards,
Rob.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 3 Jan 2014, at 19:22, Krzysztof Skrzętnicki wrote:
I think the confusion may be come from the understanding of "distinct". The documentation is right that the generators are not equal which is easily checked e.g. using their Show instance. They will produce different random numbers. The user of the library might OTOH assume that "distinct" mean "producing uncorrelated output".
I agree that my example doesn't refute the precise meaning of the words. May I suggest that the statement that the generators are likely to be distinct on distinct inputs isn't really that useful to someone using StdGen.
This is harder and may simply not hold, especially that it doesn't mention sequentially increasing integers or any other kinds of sequences.
The property you seem to be looking for is "have vastly different output for similar numbers". Sounds a lot like a hash function to me.
I think most people would expect the function that maps the seed of a pseudo-random number generator to the first (or second or third or ...) value it generates to be a reasonably good hash function. As this turns out not to be the case for the algorithm used by StdGen for certain lengths of the range, the statement about distinct generators is somewhat misleading. I had a look at the StdGen source and don't know enough about the algorithm it is using to comment on why it has this surprising behaviour for some lengths of the range. It really is surprising in my view: one way of describing the behaviour is that if you use StdGen to generate a random boolean with seeds n and n+1, the probability that the two values are different is less than 1/50,000. Thanks for suggesting the useful work-arounds. Regards, Rob.

On 14-01-03 11:28 AM, Rob Arthan wrote:
roll n = take n . randomRs ('1', '6') . mkStdGen $ n
However, this produces a string beginning with a '6' for 0 <= n <= 53667.
It seems to me such small numbers do not have enough entropy to be worthy seeds to begin with. Say, in the 64-bit binary form of 53667, how many consecutive 0's are there?

On 3 Jan 2014, at 22:57, Albert Y. C. Lai wrote:
On 14-01-03 11:28 AM, Rob Arthan wrote:
roll n = take n . randomRs ('1', '6') . mkStdGen $ n
However, this produces a string beginning with a '6' for 0 <= n <= 53667.
It seems to me such small numbers do not have enough entropy to be worthy seeds to begin with. Say, in the 64-bit binary form of 53667, how many consecutive 0's are there?
I don't think the entropy of the number considered as a string of bits is relevant. The later part of my post strongly suggests that there is a pattern. And in fact this pattern turns out to repeat indefinitely. The following calculates about 10^7+1 values starting with a large integer that I obtained from /dev/urandom: map (\l -> (head l, length l)) . group . map (fst . randomR (1, 6) . mkStdGen) $ [0x383b0d54718ac75f..0x383b0d54718ac75f+1000000] The result is: [(2,36076),(1,53669),(6,53668),(5,53668),(4,53669),(3,53668),(2,53668),(1,53669),(6,53668),(5,53668),(4,53668),(3,53669),(2,53668),(1,53668),(6,53669),(5,53668),(4,53668),(3,53669),(2,51563)] So we keep on getting long runs of seed values that produce the same value on the first call of randomR even when we start with a seed that will have around 60 bits of entropy (if /dev/urandom on my heavily used iMac is doing its job). Regards, Rob.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (3)
-
Albert Y. C. Lai
-
Krzysztof Skrzętnicki
-
Rob Arthan