Generating repeatable arbitrary values with QuickCheck 2

I would like to generate an arbitrary (large) value to benchmark the performance of constructing that value with isomorphic types. It seems like QuickCheck might be useful in this regards. Has anyone done something similar? In versions 1.*, there was a generate function: generate :: Int -> StdGen -> Gen a -> a
generate n rnd (Gen m) = m size rnd' where (size, rnd') = randomR (0, n) rnd
That seems to have disappeared in versions 2.*, and I didn't find a clear replacement. I came up with using the destructor for Gen: unGen :: Gen a -> StdGen -> Int -> a
The function generate seems to have a little something extra, though I'm not sure if it's necessary. Is this true, or should I write an equivalent generate function? As an aside, it would be nice to have a generate function in the library, even if it is only a wrapper for unGen. In the end, I would write something like the following: unGen arbitrary (mkStdGen 11) 5 :: [Int]
This produces, for example, [5,1,-2,-4,2]. I also want to generate the same value for a type isomorphic to [Int]. unGen arbitrary (mkStdGen 11) 5 :: List Int
Unfortunately, this produces Cons 4 (Cons 3 (Cons (-2) (Cons 0 (Cons (-1) Nil)))): same length but different values. The Arbitrary instances are the same. I had similar results with generate from QC 1. Any suggestions on how to do this? With another library perhaps? Thanks, Sean

Correction about the latter part...
In the end, I would write something like the following:
unGen arbitrary (mkStdGen 11) 5 :: [Int]
This produces, for example, [5,1,-2,-4,2]. I also want to generate the same value for a type isomorphic to [Int].
unGen arbitrary (mkStdGen 11) 5 :: List Int
Unfortunately, this produces Cons 4 (Cons 3 (Cons (-2) (Cons 0 (Cons (-1) Nil)))): same length but different values. The Arbitrary instances are the same.
The Arbitrary instance were _slightly_ different, but different enough. ;) Now, the values are isomorphic. Thankfully, purity is restored. Sean

Gen slightly breaks the monad laws:
arbitrary >>= return is not the same as return () >>= const arbitrary because each bind splits the generator, so you end up with a different seed passed to arbitrary in these two cases.
If the observable value is "some random object" this is a safe fudge,
but if you want repeatable, it doesn't quite work. You need your
instances to be exactly identical, down to the associativity of binds,
in order to get the same results.
-- ryan
On Tue, Feb 2, 2010 at 4:34 AM, Sean Leather
Correction about the latter part...
In the end, I would write something like the following:
unGen arbitrary (mkStdGen 11) 5 :: [Int]
This produces, for example, [5,1,-2,-4,2]. I also want to generate the same value for a type isomorphic to [Int].
unGen arbitrary (mkStdGen 11) 5 :: List Int
Unfortunately, this produces Cons 4 (Cons 3 (Cons (-2) (Cons 0 (Cons (-1) Nil)))): same length but different values. The Arbitrary instances are the same.
The Arbitrary instance were _slightly_ different, but different enough. ;) Now, the values are isomorphic. Thankfully, purity is restored.
Sean
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Tue, Feb 2, 2010 at 1:48 PM, Ryan Ingram
Gen slightly breaks the monad laws:
arbitrary >>= return is not the same as return () >>= const arbitrary because each bind splits the generator, so you end up with a different seed passed to arbitrary in these two cases.
If the observable value is "some random object" this is a safe fudge, but if you want repeatable, it doesn't quite work. You need your instances to be exactly identical, down to the associativity of binds, in order to get the same results.
We could avoid that problem by redefining Gen as a state transformer monad.
newtype Gen a = MkGen { unGen :: StdGen -> Int -> (a, StdGen) }
instance Monad Gen where
return a = MkGen $ \r _ -> (a,r)
MkGen m >>= k = MkGen $ \r n -> let (a,r') = m r n in unGen (k a) r' n
I'm pretty sure all the Gen primitives can be similarly redefined.
--
Dave Menendez

On Tue, Feb 2, 2010 at 20:25, David Menendez wrote:
On Tue, Feb 2, 2010 at 1:48 PM, Ryan Ingram wrote:
Gen slightly breaks the monad laws:
arbitrary >>= return is not the same as return () >>= const arbitrary because each bind splits the generator, so you end up with a different seed passed to arbitrary in these two cases.
Ah yes, that was exactly the problem.
If the observable value is "some random object" this is a safe fudge, but if you want repeatable, it doesn't quite work. You need your instances to be exactly identical, down to the associativity of binds, in order to get the same results.
We could avoid that problem by redefining Gen as a state transformer monad.
newtype Gen a = MkGen { unGen :: StdGen -> Int -> (a, StdGen) }
instance Monad Gen where return a = MkGen $ \r _ -> (a,r) MkGen m >>= k = MkGen $ \r n -> let (a,r') = m r n in unGen (k a) r' n
I'm pretty sure all the Gen primitives can be similarly redefined.
And I'm guessing I haven't been or won't be the only one running into this issue. Sean

On Tue, Feb 2, 2010 at 11:25 AM, David Menendez
We could avoid that problem by redefining Gen as a state transformer monad.
newtype Gen a = MkGen { unGen :: StdGen -> Int -> (a, StdGen) }
Unfortunately, this makes things like
infinite_xs <- sequence (repeat arbitrary) no longer work, since the state never comes out the other side.
Which is a pretty significant change. -- ryan

Although now that I think about it, most of these could be pretty easily fixed by a new primitive:
splitGen :: Gen a -> Gen a splitGen m = MkGen spg where spg g n = (a, g2) where (g1, g2) = split g (a, _) = unGen m g1 n
Then you could do
infinite_xs <- splitGen $ sequence (repeat arbitrary)
-- ryan
On Tue, Feb 2, 2010 at 2:04 PM, Ryan Ingram
On Tue, Feb 2, 2010 at 11:25 AM, David Menendez
wrote: We could avoid that problem by redefining Gen as a state transformer monad.
newtype Gen a = MkGen { unGen :: StdGen -> Int -> (a, StdGen) }
Unfortunately, this makes things like
infinite_xs <- sequence (repeat arbitrary) no longer work, since the state never comes out the other side.
Which is a pretty significant change.
-- ryan

Ryan Ingram wrote:
Unfortunately, this makes things like
infinite_xs <- sequence (repeat arbitrary) no longer work, since the state never comes out the other side.
You're asking to execute an infinite number of monadic actions. How can this ever terminate at all? Martijn.

Martijn, Ryan wrote:
Unfortunately, this makes things like
infinite_xs <- sequence (repeat arbitrary) no longer work, since the state never comes out the other side.
You replied:
You're asking to execute an infinite number of monadic actions. How can this ever terminate at all?
There is this thing called lazy evaluation, you know. ;-) Try for yourself: import System.Random import Test.QuickCheck foo :: Gen [Int] foo = do ns <- sequence (repeat arbitrary) return (take 5 ns) main :: IO () main = do stdGen <- newStdGen print (generate 42 stdGen foo) Cheers, Stefan

On Fri, Feb 5, 2010 at 5:19 AM, Martijn van Steenbergen
Ryan Ingram wrote:
Unfortunately, this makes things like
infinite_xs <- sequence (repeat arbitrary)
no longer work, since the state never comes out the other side.
You're asking to execute an infinite number of monadic actions. How can this ever terminate at all?
Stefan already gave an example, but to explain slightly further -- There's nothing "magical" about monadic actions. It's just another function call. In the case of QuickCheck, Gen is a reader monad with a "broken" >>= that changes the state of the generator passed to each side:
newtype Gen a = Gen (Int -> StdGen -> a) generate n g (Gen f) = f n g
return x = Gen (\_ _ -> x) m >>= f = Gen mbindf where mbindf n g = b where (g1,g2) = split g a = generate n g1 m b = generate n g2 (f a)
Now, to see how this generates data for an infinite list, just consider
sequence [arbitrary, ... which we can represent as sequence (arbitrary:undefined)
Recall the definition of sequence:
sequence [] = return [] sequence (a:as) = do x <- a xs <- sequence as return (x:xs)
If we are ever required to evaluate the rest of the list, we'll get undefined and computation will fail. The goal is to get something out of the computation without needing to do so; if that works, then it will work for (arbitrary:arbitrary:undefined) and so on up to an infinite list of actions. Let's try it! generate 42 g $ sequence (aribtrary : undefined) = generate 42 sg $ do x <- arbitrary xs <- sequence undefined return (x:xs) = generate 42 sg ( arbitrary >>= \x -> sequence undefined >>= \xs -> return (x:xs) ) = let m = arbitrary f = \x -> sequence undefined >>= \xs -> return (x:xs) mbindf n g = b where (g1,g2) = split g a = generate n g m b = generate n g (f a) in generate 42 sg (Gen mbindf) = let ... in mbindf 42 sg = let m = arbitrary f = \x -> sequence undefined >>= \xs -> return (x:xs) n = 42 g = sg (g1,g2) = split g a = generate n g1 m b = generate n g2 (f a) in b = let ... in generate n g2 (f a) = let ... in generate n g2 (sequence undefined >>= \xs -> return (a:xs) = let m = arbitrary n = 42 g = sg (g1,g2) = split g a = generate n g1 m m1 = sequence undefined f = \xs -> return (a:xs) mbindf n1 g3 = b where (g4,g5) = split g3 a1 = generate n1 g4 m1 b = generate n1 g5 (f a1) in generate n g2 (Gen mbindf) = let ... in mbindf n g2 = let m = arbitrary n = 42 g = sg (g1,g2) = split g a = generate n g1 m m1 = sequence undefined f = \xs -> return (a:xs) (g4,g5) = split g2 a1 = generate n g4 m1 b = generate n g5 (f a1) in generate n g5 (f a1) = let ... in generate n g5 (return (a:a1)) = let ... in generate n g5 (Gen (\_ _ -> (a:a1))) = let ... in (\_ _ -> (a:a1)) n g5 = let ... in (a:a1) = let ... in (generate n g1 m : a1) = let ... in (generate n g1 arbitrary : a1) = let ... in (<arbitrary> : a1) We have now output a cons cell with an arbitrary value without even evaluating the rest of the input to sequence (which is undefined; could have been 'repeat aribtrary' or anything else) Lazy evaluation is pretty neat :) -- ryan

On Fri, Feb 5, 2010 at 3:39 PM, Ryan Ingram
On Fri, Feb 5, 2010 at 5:19 AM, Martijn van Steenbergen
wrote: Ryan Ingram wrote:
Unfortunately, this makes things like
infinite_xs <- sequence (repeat arbitrary)
no longer work, since the state never comes out the other side.
You're asking to execute an infinite number of monadic actions. How can this ever terminate at all?
Stefan already gave an example, but to explain slightly further --
There's nothing "magical" about monadic actions. It's just another function call.
In the case of QuickCheck, Gen is a reader monad with a "broken" >>= that changes the state of the generator passed to each side:
Incidentally, the alternative Gen I suggested also works for infinite
lists. (It's equivalent to StateT StdGen (Reader Int), using the
StateT from Control.Monad.State.Lazy.)
The problem, as Ryan pointed out, is that you can't access the state
after the infinite computation, so you can't create two infinite
streams or an infinite tree, which the current definition of Gen
allows.
More concretely, this works fine:
stream = do
x <- arbitrary
xs <- stream
return (x:xs)
but you can't call arbitrary after you call stream
broken = do
xs <- stream
y <- arbitrary -- can't evaluate until stream is fully evaluated
(i.e., never)
The present definition of Gen avoids this by splitting the StdGen at
every >>=, but that creates the situation where two expressions which
should be equivalent produce different results in some contexts.
It isn't clear to me which implementation is best. I lean towards the
StateT-like implementation, on the theory that it's limitations are
easier to explain, but I guess it comes down to whether we want to
make life easier for (a) people creating infinite structures or (b)
people who need reproducible results.
--
Dave Menendez
participants (5)
-
David Menendez
-
Martijn van Steenbergen
-
Ryan Ingram
-
Sean Leather
-
Stefan Holdermans