Generating repeatable arbitrary values with QuickCheck 2

older
online Real World Haskell - tool...

Sean Leather

1 Feb 2010 1 Feb '10

9:54 p.m.

I would like to generate an arbitrary (large) value to benchmark the performance of constructing that value with isomorphic types. It seems like QuickCheck might be useful in this regards. Has anyone done something similar? In versions 1.*, there was a generate function: generate :: Int -> StdGen -> Gen a -> a

...

generate n rnd (Gen m) = m size rnd' where (size, rnd') = randomR (0, n) rnd

That seems to have disappeared in versions 2.*, and I didn't find a clear replacement. I came up with using the destructor for Gen: unGen :: Gen a -> StdGen -> Int -> a

...

The function generate seems to have a little something extra, though I'm not sure if it's necessary. Is this true, or should I write an equivalent generate function? As an aside, it would be nice to have a generate function in the library, even if it is only a wrapper for unGen. In the end, I would write something like the following: unGen arbitrary (mkStdGen 11) 5 :: [Int]

...

This produces, for example, [5,1,-2,-4,2]. I also want to generate the same value for a type isomorphic to [Int]. unGen arbitrary (mkStdGen 11) 5 :: List Int

...

Unfortunately, this produces Cons 4 (Cons 3 (Cons (-2) (Cons 0 (Cons (-1) Nil)))): same length but different values. The Arbitrary instances are the same. I had similar results with generate from QC 1. Any suggestions on how to do this? With another library perhaps? Thanks, Sean

Attachments:

attachment.html (text/html — 2.0 KB)

Show replies by date

Sean Leather

2 Feb 2 Feb

12:34 p.m.

Correction about the latter part...

...

In the end, I would write something like the following:

unGen arbitrary (mkStdGen 11) 5 :: [Int]

...
This produces, for example, [5,1,-2,-4,2]. I also want to generate the same value for a type isomorphic to [Int].

unGen arbitrary (mkStdGen 11) 5 :: List Int

...
Unfortunately, this produces Cons 4 (Cons 3 (Cons (-2) (Cons 0 (Cons (-1) Nil)))): same length but different values. The Arbitrary instances are the same.

The Arbitrary instance were _slightly_ different, but different enough. ;) Now, the values are isomorphic. Thankfully, purity is restored. Sean

Ryan Ingram

6:48 p.m.

Gen slightly breaks the monad laws:

...

arbitrary >>= return is not the same as return () >>= const arbitrary because each bind splits the generator, so you end up with a different seed passed to arbitrary in these two cases.

If the observable value is "some random object" this is a safe fudge, but if you want repeatable, it doesn't quite work. You need your instances to be exactly identical, down to the associativity of binds, in order to get the same results. -- ryan On Tue, Feb 2, 2010 at 4:34 AM, Sean Leather wrote:

...

Correction about the latter part...

...
In the end, I would write something like the following:

...
unGen arbitrary (mkStdGen 11) 5 :: [Int]

This produces, for example, [5,1,-2,-4,2]. I also want to generate the same value for a type isomorphic to [Int].

...
unGen arbitrary (mkStdGen 11) 5 :: List Int

Unfortunately, this produces Cons 4 (Cons 3 (Cons (-2) (Cons 0 (Cons (-1) Nil)))): same length but different values. The Arbitrary instances are the same.

The Arbitrary instance were _slightly_ different, but different enough. ;) Now, the values are isomorphic. Thankfully, purity is restored.

Sean

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

David Menendez

7:25 p.m.

On Tue, Feb 2, 2010 at 1:48 PM, Ryan Ingram wrote:

...

Gen slightly breaks the monad laws:

...
arbitrary >>= return is not the same as return () >>= const arbitrary because each bind splits the generator, so you end up with a different seed passed to arbitrary in these two cases.

If the observable value is "some random object" this is a safe fudge, but if you want repeatable, it doesn't quite work. You need your instances to be exactly identical, down to the associativity of binds, in order to get the same results.

We could avoid that problem by redefining Gen as a state transformer monad. newtype Gen a = MkGen { unGen :: StdGen -> Int -> (a, StdGen) } instance Monad Gen where return a = MkGen $ \r _ -> (a,r) MkGen m >>= k = MkGen $ \r n -> let (a,r') = m r n in unGen (k a) r' n I'm pretty sure all the Gen primitives can be similarly redefined. -- Dave Menendez http://www.eyrie.org/~zednenem/

Sean Leather

8:30 p.m.

On Tue, Feb 2, 2010 at 20:25, David Menendez wrote:

...

On Tue, Feb 2, 2010 at 1:48 PM, Ryan Ingram wrote:

...
Gen slightly breaks the monad laws:

...
arbitrary >>= return is not the same as return () >>= const arbitrary because each bind splits the generator, so you end up with a different seed passed to arbitrary in these two cases.

Ah yes, that was exactly the problem.

...

...
If the observable value is "some random object" this is a safe fudge, but if you want repeatable, it doesn't quite work. You need your instances to be exactly identical, down to the associativity of binds, in order to get the same results.

We could avoid that problem by redefining Gen as a state transformer monad.

newtype Gen a = MkGen { unGen :: StdGen -> Int -> (a, StdGen) }

instance Monad Gen where return a = MkGen $ \r _ -> (a,r) MkGen m >>= k = MkGen $ \r n -> let (a,r') = m r n in unGen (k a) r' n

I'm pretty sure all the Gen primitives can be similarly redefined.

And I'm guessing I haven't been or won't be the only one running into this issue. Sean

Ryan Ingram

10:04 p.m.

On Tue, Feb 2, 2010 at 11:25 AM, David Menendez wrote:

...

We could avoid that problem by redefining Gen as a state transformer monad.

newtype Gen a = MkGen { unGen :: StdGen -> Int -> (a, StdGen) }

Unfortunately, this makes things like

...

infinite_xs <- sequence (repeat arbitrary) no longer work, since the state never comes out the other side.

Which is a pretty significant change. -- ryan

Ryan Ingram

10:09 p.m.

Although now that I think about it, most of these could be pretty easily fixed by a new primitive:

...

splitGen :: Gen a -> Gen a splitGen m = MkGen spg where spg g n = (a, g2) where (g1, g2) = split g (a, _) = unGen m g1 n

Then you could do

...

infinite_xs <- splitGen $ sequence (repeat arbitrary)

-- ryan On Tue, Feb 2, 2010 at 2:04 PM, Ryan Ingram wrote:

...

On Tue, Feb 2, 2010 at 11:25 AM, David Menendez wrote:

...
We could avoid that problem by redefining Gen as a state transformer monad.

newtype Gen a = MkGen { unGen :: StdGen -> Int -> (a, StdGen) }

Unfortunately, this makes things like

...
infinite_xs <- sequence (repeat arbitrary) no longer work, since the state never comes out the other side.

Which is a pretty significant change.

-- ryan

Martijn van Steenbergen

5 Feb 5 Feb

1:19 p.m.

Ryan Ingram wrote:

...

Unfortunately, this makes things like

...
infinite_xs <- sequence (repeat arbitrary) no longer work, since the state never comes out the other side.

You're asking to execute an infinite number of monadic actions. How can this ever terminate at all? Martijn.

Stefan Holdermans

1:39 p.m.

Martijn, Ryan wrote:

...

...
Unfortunately, this makes things like

...
infinite_xs <- sequence (repeat arbitrary) no longer work, since the state never comes out the other side.

You replied:

...

You're asking to execute an infinite number of monadic actions. How can this ever terminate at all?

There is this thing called lazy evaluation, you know. ;-) Try for yourself: import System.Random import Test.QuickCheck foo :: Gen [Int] foo = do ns <- sequence (repeat arbitrary) return (take 5 ns) main :: IO () main = do stdGen <- newStdGen print (generate 42 stdGen foo) Cheers, Stefan

Ryan Ingram

8:39 p.m.

On Fri, Feb 5, 2010 at 5:19 AM, Martijn van Steenbergen wrote:

...

Ryan Ingram wrote:

...
Unfortunately, this makes things like

...
infinite_xs <- sequence (repeat arbitrary)

no longer work, since the state never comes out the other side.

You're asking to execute an infinite number of monadic actions. How can this ever terminate at all?

Stefan already gave an example, but to explain slightly further -- There's nothing "magical" about monadic actions. It's just another function call. In the case of QuickCheck, Gen is a reader monad with a "broken" >>= that changes the state of the generator passed to each side:

...

newtype Gen a = Gen (Int -> StdGen -> a) generate n g (Gen f) = f n g

return x = Gen (\_ _ -> x) m >>= f = Gen mbindf where mbindf n g = b where (g1,g2) = split g a = generate n g1 m b = generate n g2 (f a)

Now, to see how this generates data for an infinite list, just consider

...

sequence [arbitrary, ... which we can represent as sequence (arbitrary:undefined)

Recall the definition of sequence:

...

sequence [] = return [] sequence (a:as) = do x <- a xs <- sequence as return (x:xs)

If we are ever required to evaluate the rest of the list, we'll get undefined and computation will fail. The goal is to get something out of the computation without needing to do so; if that works, then it will work for (arbitrary:arbitrary:undefined) and so on up to an infinite list of actions. Let's try it! generate 42 g $ sequence (aribtrary : undefined) = generate 42 sg $ do x <- arbitrary xs <- sequence undefined return (x:xs) = generate 42 sg ( arbitrary >>= \x -> sequence undefined >>= \xs -> return (x:xs) ) = let m = arbitrary f = \x -> sequence undefined >>= \xs -> return (x:xs) mbindf n g = b where (g1,g2) = split g a = generate n g m b = generate n g (f a) in generate 42 sg (Gen mbindf) = let ... in mbindf 42 sg = let m = arbitrary f = \x -> sequence undefined >>= \xs -> return (x:xs) n = 42 g = sg (g1,g2) = split g a = generate n g1 m b = generate n g2 (f a) in b = let ... in generate n g2 (f a) = let ... in generate n g2 (sequence undefined >>= \xs -> return (a:xs) = let m = arbitrary n = 42 g = sg (g1,g2) = split g a = generate n g1 m m1 = sequence undefined f = \xs -> return (a:xs) mbindf n1 g3 = b where (g4,g5) = split g3 a1 = generate n1 g4 m1 b = generate n1 g5 (f a1) in generate n g2 (Gen mbindf) = let ... in mbindf n g2 = let m = arbitrary n = 42 g = sg (g1,g2) = split g a = generate n g1 m m1 = sequence undefined f = \xs -> return (a:xs) (g4,g5) = split g2 a1 = generate n g4 m1 b = generate n g5 (f a1) in generate n g5 (f a1) = let ... in generate n g5 (return (a:a1)) = let ... in generate n g5 (Gen (\_ _ -> (a:a1))) = let ... in (\_ _ -> (a:a1)) n g5 = let ... in (a:a1) = let ... in (generate n g1 m : a1) = let ... in (generate n g1 arbitrary : a1) = let ... in (<arbitrary> : a1) We have now output a cons cell with an arbitrary value without even evaluating the rest of the input to sequence (which is undefined; could have been 'repeat aribtrary' or anything else) Lazy evaluation is pretty neat :) -- ryan

David Menendez

10:13 p.m.

On Fri, Feb 5, 2010 at 3:39 PM, Ryan Ingram wrote:

...

On Fri, Feb 5, 2010 at 5:19 AM, Martijn van Steenbergen wrote:

...
Ryan Ingram wrote:

...
Unfortunately, this makes things like

...
infinite_xs <- sequence (repeat arbitrary)

no longer work, since the state never comes out the other side.

You're asking to execute an infinite number of monadic actions. How can this ever terminate at all?

Stefan already gave an example, but to explain slightly further --

There's nothing "magical" about monadic actions. It's just another function call.

In the case of QuickCheck, Gen is a reader monad with a "broken" >>= that changes the state of the generator passed to each side:

Incidentally, the alternative Gen I suggested also works for infinite lists. (It's equivalent to StateT StdGen (Reader Int), using the StateT from Control.Monad.State.Lazy.) The problem, as Ryan pointed out, is that you can't access the state after the infinite computation, so you can't create two infinite streams or an infinite tree, which the current definition of Gen allows. More concretely, this works fine: stream = do x <- arbitrary xs <- stream return (x:xs) but you can't call arbitrary after you call stream broken = do xs <- stream y <- arbitrary -- can't evaluate until stream is fully evaluated (i.e., never) The present definition of Gen avoids this by splitting the StdGen at every >>=, but that creates the situation where two expressions which should be equivalent produce different results in some contexts. It isn't clear to me which implementation is best. I lean towards the StateT-like implementation, on the theory that it's limitations are easier to explain, but I guess it comes down to whether we want to make life easier for (a) people creating infinite structures or (b) people who need reproducible results. -- Dave Menendez http://www.eyrie.org/~zednenem/

5639

Age (days ago)

5643

Last active (days ago)

List overview

Download

10 comments

5 participants

participants (5)

David Menendez
Martijn van Steenbergen
Ryan Ingram
Sean Leather
Stefan Holdermans