Proposal: Add getStdGenState and stdGenFromState

Dear Ryan, all,

To quote the documentation of System.Random: "The Show and Read instances of StdGen provide a primitive way to save the state of a random number generator."

Primitive it is indeed. Currently the only way to serialize (using cereal or binary) a StdGen is to convert it to a String and then serialize this String. This takes up many more bytes than are actually needed.

I would like to propose a more efficient way of serializing the state of the random number generator by providing the following two functions:

    getStdGenState :: StdGen -> (Int32, Int32)
    getStdGenState (StdGen s1 s2) = (s1, s2)

    stdGenFromState :: (Int32, Int32) -> StdGen
    stdGenFromState (s1, s2) = StdGen s1 s2

Of course this satisfies:

    (stdGenFromState . getStdGenState) g == g

Alternatively, instead of using (Int32, Int32) we can use an Int64:

    getStdGenState :: StdGen -> Int64
    getStdGenState (StdGen s1 s2) =
        fromIntegral s1 `unsafeShiftL` 32 .|. fromIntegral s2
    #if !MIN_VERSION_base(4,5,0)
      where
        unsafeShiftL = shiftL
    #endif

    stdGenFromState :: Int64 -> StdGen
    stdGenFromState s = StdGen (fromIntegral (s `unsafeShiftR` 32))
                               (fromIntegral s)
    #if !MIN_VERSION_base(4,5,0)
      where
        unsafeShiftR = shiftR
    #endif

I don't really care which one we use but I think the latter is a bit easier to use.

Oh, I don't really care about the names either.

Regards,

Bas
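The Int64 packing above can be tried without access to the StdGen constructor. The following is a minimal sketch, assuming the state really is a pair of Int32 values as in the proposal; packState and unpackState are hypothetical names used only for illustration. It differs from the proposed code in one respect: the low half is masked before being OR-ed in, so that a negative second component cannot sign-extend into the high half.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Int (Int32, Int64)

-- Hypothetical helper: pack two 32-bit halves into one Int64.
-- The low half is masked to 32 bits so that a negative s2 does not
-- sign-extend over the bits that hold s1.
packState :: (Int32, Int32) -> Int64
packState (s1, s2) =
  (fromIntegral s1 `shiftL` 32) .|. (fromIntegral s2 .&. 0xFFFFFFFF)

-- Hypothetical helper: split an Int64 back into its two halves.
-- fromIntegral to Int32 keeps only the low 32 bits.
unpackState :: Int64 -> (Int32, Int32)
unpackState s = (fromIntegral (s `shiftR` 32), fromIntegral s)
```

With the mask in place, unpackState . packState is the identity for every pair of Int32 values, not only for non-negative ones.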

Maybe the original intent was to avoid exposing too many internals? That being said, +1. -- Felipe.

64 bits of entropy doesn't seem like much at all for a RNG. I thought
the idea was StdGen could be fully abstract.
John
On Mon, Feb 6, 2012 at 9:07 AM, Bas van Dijk wrote:
> [...]

On 6 February 2012 18:58, John Meacham wrote:
> 64 bits of entropy doesn't seem like much at all for a RNG. I thought the idea was StdGen could be fully abstract.

Well it's still sort of abstract since its constructor is not exported. My idea was that since StdGen already has a Read and Show instance we can just as well provide a more efficient serialization method.

But I agree that the API should accommodate changing the number of bits. What about the following API:

    getStdGenState :: StdGen -> [Int32]
    getStdGenState (StdGen s1 s2) = [s1, s2]

    stdGenFromState :: [Int32] -> Maybe StdGen
    stdGenFromState [s1, s2] = Just (StdGen s1 s2)
    stdGenFromState _        = Nothing

Bas

On February 6, 2012 13:14:41 Bas van Dijk wrote:
> But I agree that the API should accommodate for changing the amount of bits. What about the following API:
>
>     getStdGenState :: StdGen -> [Int32]
>     getStdGenState (StdGen s1 s2) = [s1, s2]
>
>     stdGenFromState :: [Int32] -> Maybe StdGen
>     stdGenFromState [s1, s2] = Just (StdGen s1 s2)
>     stdGenFromState _ = Nothing

Jumping in without having paid much attention to the thread or API, but I expect [Word8] (ByteString?) would fit the general case better than [Int32].

Cheers! -Tyson

On 6 February 2012 19:36, Tyson Whitehead wrote:
> Jumping in without having paid much attention to the thread or API, but I expect [Word8] (ByteString?) would fit the general case better than [Int32]

I like ByteString. It does add a dependency on bytestring, but since bytestring is a boot package I guess that doesn't really matter. [Word8] would be a good second alternative.

Bas

On 2/6/12 1:36 PM, Tyson Whitehead wrote:
> Jumping in without having paid much attention to the thread or API, but I expect [Word8] (ByteString?) would fit the general case better than [Int32]

I'd second the vote for ByteString. If the goal is to dump a bunch of bits without declaring how many there are beforehand, then ByteString is exactly the right interface for that. If the goal is to serialize things efficiently, then ByteString is one of the better target representations for that[1]. I don't know what exactly StdGen is storing under the hood, but I don't imagine that it really has the semantic structure of a list of Int32.

[1] Depending on the details of what all you need exactly, it may not be the best. For example, if the semantic interpretation of the bits is non-uniform, then you'll probably want an intermediate structure which expresses the non-uniform chunks; and from there you'll almost surely need to start worrying about things like versioning of the format etc.

-- Live well, ~wren
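To make the ByteString variant concrete, here is a rough sketch of what such an encoding could look like. It assumes the current two-Int32 state from the original proposal and a big-endian byte order; both are illustrative choices, not something the thread settles. encodeState and decodeState are hypothetical names that work on a plain pair because the StdGen constructor is not exported.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Int (Int32)
import Data.Word (Word8, Word32)

-- Split one Int32 into four bytes, most significant byte first.
encodeInt32 :: Int32 -> [Word8]
encodeInt32 n = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]
  where
    w = fromIntegral n :: Word32

-- Reassemble an Int32 from big-endian bytes.
decodeInt32 :: [Word8] -> Int32
decodeInt32 bytes = fromIntegral (foldl step (0 :: Word32) bytes)
  where
    step acc b = (acc `shiftL` 8) .|. fromIntegral b

-- Hypothetical encoder: 8 bytes, s1 followed by s2.
encodeState :: (Int32, Int32) -> BS.ByteString
encodeState (s1, s2) = BS.pack (encodeInt32 s1 ++ encodeInt32 s2)

-- Hypothetical decoder: rejects any input that is not exactly 8 bytes.
decodeState :: BS.ByteString -> Maybe (Int32, Int32)
decodeState bs
  | BS.length bs == 8 = Just (decodeInt32 hi, decodeInt32 lo)
  | otherwise         = Nothing
  where
    (hi, lo) = splitAt 4 (BS.unpack bs)
```

The caller only ever sees an opaque ByteString, so the number of state bytes could change in a later version without changing the type of the API; only the length check in the decoder would move.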

Yes, ByteString makes sense. Just include a note that when presented with a
ByteString not generated by stdGenToState, stdGenFromState will use
the data in the ByteString to seed the RNG in an unspecified,
implementation-defined fashion. That way we can use it to feed arbitrary
seed data into the RNG as well as to re-create a specific previous state.
So if the input isn't of just the right size, feed it to a hash to get an
appropriately sized one. I'd recommend the 64 bit mix function from this page:
http://www.concentric.net/~ttwang/tech/inthash.htm. It is nice and simple,
and reversible. Quite handy.
John
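The fallback described above could be sketched roughly as follows. This assumes a 64-bit state, as in the original proposal. Note that it uses 64-bit FNV-1a to reduce arbitrary input to 64 bits, which is a stand-in chosen for illustration and is not the mix function from the linked page. seedFromBytes is a hypothetical name.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftL, xor, (.|.))
import Data.Word (Word64)

-- Read the bytes as one big-endian Word64 (meaningful for 8-byte input).
decodeWord64 :: BS.ByteString -> Word64
decodeWord64 = BS.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- 64-bit FNV-1a over the bytes: a stand-in hash, NOT the linked mix function.
fnv1a64 :: BS.ByteString -> Word64
fnv1a64 = BS.foldl' step 14695981039346656037
  where
    step h b = (h `xor` fromIntegral b) * 1099511628211

-- Hypothetical seeding rule: input of exactly the right size is taken
-- verbatim as a saved state; anything else is hashed down to 64 bits.
seedFromBytes :: BS.ByteString -> Word64
seedFromBytes bs
  | BS.length bs == 8 = decodeWord64 bs
  | otherwise         = fnv1a64 bs
```

One consequence of this design is that the decoder never fails: every ByteString yields some generator state, at the cost of no longer being able to detect a truncated or corrupted saved state.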
On Tue, Feb 7, 2012 at 2:04 PM, wren ng thornton wrote:
> [...]
participants (5): Bas van Dijk, Felipe Almeida Lessa, John Meacham, Tyson Whitehead, wren ng thornton