proper way to generate a random data in criterion

Hello, I'm measuring performance of the insertion operation of red-black trees. For input, three kinds of [Int] are prepared: the increasing the order, decreasing order, and random. The random case is 4 or 5 times slower than the others. I'm afraid that my program also measured the cost of random Int generation. My benchmark code can be found: https://github.com/kazu-yamamoto/llrbtree/blob/master/bench/insert/Bench.hs Does anyone kindly take a look and tell me whether or not my criterion code measures the cost of random Int generation? If so, would you suggest how to avoid it? Background infromation can be found: Purely Functional Left-Leaning Red-Black Trees http://www.mew.org/~kazu/proj/red-black-tree/ --Kazu

On Wed, Oct 19, 2011 at 10:13 AM, Kazu Yamamoto
Hello,
I'm measuring performance of the insertion operation of red-black trees. For input, three kinds of [Int] are prepared: the increasing the order, decreasing order, and random.
The random case is 4 or 5 times slower than the others. I'm afraid that my program also measured the cost of random Int generation.
My benchmark code can be found:
https://github.com/kazu-yamamoto/llrbtree/blob/master/bench/insert/Bench.hs
Does anyone kindly take a look and tell me whether or not my criterion code measures the cost of random Int generation? If so, would you suggest how to avoid it?
The code looks ok to me -- you've deepseq'ed the list, and forcing it
to whnf should force the deepseq. Also, criterion runs your benchmark
many times, if your code was measuring the RNG time it would only
happen once. This would show up in the criterion output as an
unusually large outlier.
G
--
Gregory Collins

On 19 October 2011 19:21, Gregory Collins
On Wed, Oct 19, 2011 at 10:13 AM, Kazu Yamamoto
wrote: Hello,
I'm measuring performance of the insertion operation of red-black trees. For input, three kinds of [Int] are prepared: the increasing the order, decreasing order, and random.
The random case is 4 or 5 times slower than the others. I'm afraid that my program also measured the cost of random Int generation.
My benchmark code can be found:
https://github.com/kazu-yamamoto/llrbtree/blob/master/bench/insert/Bench.hs
Does anyone kindly take a look and tell me whether or not my criterion code measures the cost of random Int generation? If so, would you suggest how to avoid it?
The code looks ok to me -- you've deepseq'ed the list, and forcing it to whnf should force the deepseq. Also, criterion runs your benchmark many times, if your code was measuring the RNG time it would only happen once. This would show up in the criterion output as an unusually large outlier.
Wouldn't it be better to write a sensible NFData instance than to explicitly use deepseq? However, you could try to fix this by using an actual seed rather than mkStdGen; this would _really_ give you random values to let you see whether different runtimes affect matters.
G -- Gregory Collins
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

Greg,
The code looks ok to me -- you've deepseq'ed the list, and forcing it to whnf should force the deepseq. Also, criterion runs your benchmark many times, if your code was measuring the RNG time it would only happen once. This would show up in the criterion output as an unusually large outlier.
Thank you for your review. Because my benchmark seems OK, I need to think why the algorithm works slower against random data. :) --Kazu

Hi,
On Wed, Oct 19, 2011 at 1:13 AM, Kazu Yamamoto
Hello,
I'm measuring performance of the insertion operation of red-black trees. For input, three kinds of [Int] are prepared: the increasing the order, decreasing order, and random.
The random case is 4 or 5 times slower than the others. I'm afraid that my program also measured the cost of random Int generation.
My benchmark code can be found:
https://github.com/kazu-yamamoto/llrbtree/blob/master/bench/insert/Bench.hs
Does anyone kindly take a look and tell me whether or not my criterion code measures the cost of random Int generation?
It does. You need to use evaluate to have ensure actually be evaluated.
If so, would you suggest how to avoid it?
Have a look at: https://github.com/tibbe/unordered-containers/blob/master/benchmarks/Benchma... -- Johan

On Wed, Oct 19, 2011 at 5:03 PM, Johan Tibell
It does. You need to use evaluate to have ensure actually be evaluated.
I'm almost certain you're wrong about this. The bang pattern on the
return from ensure (!r1 <- ensure $ ...) forces r1 to WHNF, which goes
through deepseq, and thus the whole list is forced. See
https://gist.github.com/1299380 for a short counterexample.
G
--
Gregory Collins

On Wed, Oct 19, 2011 at 12:21 PM, Gregory Collins
On Wed, Oct 19, 2011 at 5:03 PM, Johan Tibell
wrote: It does. You need to use evaluate to have ensure actually be evaluated.
I'm almost certain you're wrong about this. The bang pattern on the return from ensure (!r1 <- ensure $ ...) forces r1 to WHNF, which goes through deepseq, and thus the whole list is forced. See https://gist.github.com/1299380 for a short counterexample.
I should have paid more attention; I missed the bangs on the bindings. I still recommend the pattern I linked in my previous email. If you want to do it they way you currently do use let !foo = xs `deepseq` xs no return needed.

On Wed, 19 Oct 2011 21:21:48 +0200, Gregory Collins
On Wed, Oct 19, 2011 at 5:03 PM, Johan Tibell
wrote: It does. You need to use evaluate to have ensure actually be evaluated.
I'm almost certain you're wrong about this. The bang pattern on the return from ensure (!r1 <- ensure $ ...) forces r1 to WHNF, which goes through deepseq, and thus the whole list is forced. See https://gist.github.com/1299380 for a short counterexample.
G
You don't need the ensure function or the bang patterns if you do it like this: main = deepseq testList $ do putStrLn "we survived without evaluating testList" let x = foldl' (+) 0 $ take 50 testList putStrLn $ "x was " ++ show x Regards, Henk-Jan van Tuyl -- http://Van.Tuyl.eu/ http://members.chello.nl/hjgtuyl/tourdemonad.html --

On 19 October 2011 17:03, Johan Tibell
Have a look at:
https://github.com/tibbe/unordered-containers/blob/master/benchmarks/Benchma...
I see you use the (evaluate . rnf) composition. I also used it in: https://github.com/basvandijk/vector-bytestring/blob/master/bench.hs#L118 and called it: deepEvaluate :: NFData a => a -> IO () deepEvaluate = evaluate . rnf I'm not sure about the name but I think it would be nice if this was added to Control.DeepSeq. Bas
participants (6)
-
Bas van Dijk
-
Gregory Collins
-
Henk-Jan van Tuyl
-
Ivan Lazar Miljenovic
-
Johan Tibell
-
Kazu Yamamoto