
Hello, I'm writing a library for dealing with binders and I want to benchmark it against DeBruijn, Locally Nameless, HOAS, etc. One on my benchmark consists in 1. generating a big term \x.t 2. substituting u fox in t The part I want to benchmark is 2. In particular I would like that: a. \x.t is already evaluated when I run 2 (I don't want to measure the performances of the generator) b. the action of substituting u for x in t were measured as if I had to fully evaluate the result (by printing the resulting term for instance). After looking at what was available on hackage, i set my mind on strictbench, which basically calls (rnf x `seq` print "") and then uses benchpress to measure the pure computation x. Since I wanted (a), my strategy was (schematically): let t = genterm rnf t `seq` print "" bench (subst u t) I got numbers I didn't expect so I ran the following program: let t = genterm print t bench (subst u t) and then I got other numbers! Which were closer to what I think they should be, so I may be happy with them, but all of this seems to indicate that rnf doesn't behave as intended. Then I did something different: I wrote two programs. One that prints the result of (subst u t): let t = genterm let x = (subst u t) print x bench (print x) recorded the numbers of that one and then ran the program: let t = genterm bench (print (subst u t)) got the numbers, and substracted the first ones to them. By doing so, I'm sure that I get realistic numbers at least: since I print the whole resulting term, I've got a visual "proof" that it's been evaluated. But this is not very satisfactory. Does anyone have an idea why calling rnf before the bench doesn't seem to "cache" the result as calling show does? (my instances of NFData follow the scheme described in strictbench documentation). If not, do you think that measuring (computation + pretty printing time - pretty printing time) is ok? Regards, Paul

Excerpts from Paul Brauner's message of Wed Mar 31 03:17:02 -0400 2010:
The part I want to benchmark is 2. In particular I would like that:
a. \x.t is already evaluated when I run 2 (I don't want to measure the performances of the generator) b. the action of substituting u for x in t were measured as if I had to fully evaluate the result (by printing the resulting term for instance).
Criterion uses Control.DeepSeq; perhaps this is what you are looking for? Cheers, Edward

On Wed, Mar 31, 2010 at 9:17 AM, Paul Brauner
Does anyone have an idea why calling rnf before the bench doesn't seem to "cache" the result as calling show does? (my instances of NFData follow the scheme described in strictbench documentation).
Is it possible you could show us your term type and your NFData instance? regards, Bas

Hello, actually I don't know if I can. I totally wouldn't mind but this is mainly my co-author work and I don't know if he would (I suppose not but since he is sleeping right now I can't check). However let's assume it's a deBruijn representation for instance, I can tell you the scheme I used: data Term = Lam Term | App Term Term | Var Int instance NFData where rnf (Lam t) = rnf t rnf (App t1 t2) = rnf t1 `seq` rnf t2 rnf (Var x) = rnf x the actual datatype doesn't have fancy stuff like higher-order types for constructors, it's really similar. The only difference is that it is a GADT, but this souldn't change anything right? Did I make some mistake in instancing NFData ? Regards, Paul On Wed, Mar 31, 2010 at 09:32:29AM +0200, Bas van Dijk wrote:
On Wed, Mar 31, 2010 at 9:17 AM, Paul Brauner
wrote: Does anyone have an idea why calling rnf before the bench doesn't seem to "cache" the result as calling show does? (my instances of NFData follow the scheme described in strictbench documentation).
Is it possible you could show us your term type and your NFData instance?
regards,
Bas

On Wed, Mar 31, 2010 at 11:06 AM, Paul Brauner
data Term = Lam Term | App Term Term | Var Int
instance NFData where rnf (Lam t) = rnf t rnf (App t1 t2) = rnf t1 `seq` rnf t2 rnf (Var x) = rnf x
the actual datatype doesn't have fancy stuff like higher-order types for constructors, it's really similar. The only difference is that it is a GADT, but this souldn't change anything right?
Did I make some mistake in instancing NFData ?
No, your NFData instance is correct. You first pattern match on the term followed by recursively calling rnf on the sub-terms. This will correctly force the entire term. Maybe you could try using criterion[1] for your benchmark and see if that makes any difference. Something like: {-# LANGUAGE BangPatterns #-} import Criterion.Main main :: IO () main = let !t = genterm in defaultMain [bench "subst" $ nf (subst u) t] regards, Bas [1] http://hackage.haskell.org/package/criterion

Thank you, I will look at that. But it seems that criterion uses NFData no? Paul On Wed, Mar 31, 2010 at 12:57:20PM +0200, Bas van Dijk wrote:
On Wed, Mar 31, 2010 at 11:06 AM, Paul Brauner
wrote: data Term = Lam Term | App Term Term | Var Int
instance NFData where rnf (Lam t) = rnf t rnf (App t1 t2) = rnf t1 `seq` rnf t2 rnf (Var x) = rnf x
the actual datatype doesn't have fancy stuff like higher-order types for constructors, it's really similar. The only difference is that it is a GADT, but this souldn't change anything right?
Did I make some mistake in instancing NFData ?
No, your NFData instance is correct. You first pattern match on the term followed by recursively calling rnf on the sub-terms. This will correctly force the entire term.
Maybe you could try using criterion[1] for your benchmark and see if that makes any difference. Something like:
{-# LANGUAGE BangPatterns #-}
import Criterion.Main
main :: IO () main = let !t = genterm in defaultMain [bench "subst" $ nf (subst u) t]
regards,
Bas

On Wed, Mar 31, 2010 at 4:12 AM, Paul Brauner
Thank you, I will look at that. But it seems that criterion uses NFData no?
I do not know of anything wrong with NFData. What you're seeing is much more likely to be a bug in either the benchmarking library you're using, or in your use of it. Most of the benchmarking frameworks on Hackage are extremely dodgy, which was why I wrote criterion.

Ok, thank you for all your answers. I'm going to use NFData as advised by everyone. Paul On Wed, Mar 31, 2010 at 10:38:50AM -0700, Bryan O'Sullivan wrote:
On Wed, Mar 31, 2010 at 4:12 AM, Paul Brauner
wrote: Thank you, I will look at that. But it seems that criterion uses NFData no?
I do not know of anything wrong with NFData. What you're seeing is much more likely to be a bug in either the benchmarking library you're using, or in your use of it. Most of the benchmarking frameworks on Hackage are extremely dodgy, which was why I wrote criterion.
participants (4)
-
Bas van Dijk
-
Bryan O'Sullivan
-
Edward Z. Yang
-
Paul Brauner