
Hello,

I'm writing a library for dealing with binders and I want to benchmark it against de Bruijn indices, locally nameless, HOAS, etc. One of my benchmarks consists of

  1. generating a big term \x.t
  2. substituting u for x in t

The part I want to benchmark is 2. In particular, I would like that:

  a. \x.t is already evaluated when I run 2 (I don't want to measure the performance of the generator);
  b. the action of substituting u for x in t is measured as if I had to fully evaluate the result (by printing the resulting term, for instance).

After looking at what was available on Hackage, I set my mind on strictbench, which basically calls (rnf x `seq` print "") and then uses benchpress to measure the pure computation x. Since I wanted (a), my strategy was (schematically):

  let t = genterm
  rnf t `seq` print ""
  bench (subst u t)

I got numbers I didn't expect, so I ran the following program:

  let t = genterm
  print t
  bench (subst u t)

and then I got different numbers! They were closer to what I think they should be, so I may be happy with them, but all of this seems to indicate that rnf doesn't behave as intended.

Then I tried something different: I wrote two programs (the second sketch in the P.S. spells this out). The first one prints the result of (subst u t) and then benchmarks printing the already-evaluated result:

  let t = genterm
  let x = subst u t
  print x
  bench (print x)

I recorded its numbers and then ran the second program:

  let t = genterm
  bench (print (subst u t))

got its numbers, and subtracted the former from the latter. By doing so, I'm sure that I get realistic numbers, at least: since I print the whole resulting term, I have a visual "proof" that it has been evaluated. But this is not very satisfactory.

Does anyone have an idea why calling rnf before the bench doesn't seem to "cache" the result the way calling show does? (My NFData instances follow the scheme described in the strictbench documentation; see the first sketch in the P.S.) If not, do you think that measuring (computation + pretty-printing time) - (pretty-printing time) is OK?

Regards,
Paul
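
P.S. To make the first set-up concrete, here is a minimal, self-contained sketch. Term, genterm and the NFData instance are placeholder stand-ins, not my actual library; the instance just forces every field component-wise, following the scheme the strictbench documentation describes, and I use evaluate from Control.Exception to tie the forcing to the IO sequence:

  import Control.DeepSeq (NFData (..))
  import Control.Exception (evaluate)

  -- Placeholder term type standing in for the library's representation.
  data Term = Var String
            | Lam String Term
            | App Term Term
    deriving Show

  -- Component-wise instance: rnf forces every field, so rnf walks the
  -- whole term and leaves it in normal form.
  instance NFData Term where
    rnf (Var v)   = rnf v
    rnf (Lam v b) = rnf v `seq` rnf b
    rnf (App f a) = rnf f `seq` rnf a

  -- Stand-in generator: \x. ((...(x x)...) x), a big left spine of
  -- applications.
  genterm :: Term
  genterm = Lam "x" (go (100000 :: Int))
    where
      go 0 = Var "x"
      go n = App (go (n - 1)) (Var "x")

  main :: IO ()
  main = do
    let t = genterm
    -- Force t to normal form before any timing starts; evaluate makes
    -- the forcing an IO action of its own, instead of hanging it off an
    -- unrelated print with seq.
    _ <- evaluate (rnf t)
    putStrLn "t forced"
    -- ... benchmark of (subst u t) would go here ...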
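
And here is a sketch of the two-program subtraction measurement, reusing Term and genterm from the sketch above. subst is a naive capture-unaware substitution that merely stands in for the operation under test, and I time with getCPUTime instead of benchpress only to keep the sketch dependency-free; main1 and main2 are built as two separate executables (e.g. with ghc -O -main-is main1) so that GHC cannot share the substituted term between the two measurements:

  import System.CPUTime (getCPUTime)
  import Text.Printf (printf)

  -- Naive, capture-unaware substitution of u for the variable v; a
  -- stand-in for the real operation being benchmarked.
  subst :: Term -> String -> Term -> Term
  subst u v (Var w)   | w == v    = u
                      | otherwise = Var w
  subst u v (Lam w b) | w == v    = Lam w b
                      | otherwise = Lam w (subst u v b)
  subst u v (App f a) = App (subst u v f) (subst u v a)

  -- CPU time of an IO action, in seconds (getCPUTime is in picoseconds).
  timeIO :: IO a -> IO Double
  timeIO act = do
    start <- getCPUTime
    _     <- act
    end   <- getCPUTime
    return (fromIntegral (end - start) / 1e12)

  -- Program 1: print the fully evaluated result once, then time a second
  -- print; this measures pretty-printing alone.
  main1 :: IO ()
  main1 = do
    let t = genterm
        x = subst (Var "y") "x" t
    print x
    d <- timeIO (print x)
    printf "printing alone: %.3fs\n" d

  -- Program 2: time substitution and printing together; subtracting
  -- main1's number from this one estimates the substitution itself.
  main2 :: IO ()
  main2 = do
    let t = genterm
    d <- timeIO (print (subst (Var "y") "x" t))
    printf "subst + printing: %.3fs\n" d

(A single run and CPU time are simplifications of the sketch; the real measurements go through benchpress.)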