
Hi all,
On 07 Feb 2013, at 10:44, Simon Marlow wrote:
On 06/02/13 22:26, Andy Georges wrote:
Quantifying performance changes with effect size confidence intervals - Tomas Kalibera and Richard Jones, 2012 (tech report)
This is a good one - it was actually a talk by Richard Jones that highlighted to me the problems with averaging over benchmarks (aside from the problem with GM, which he didn't mention).
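To make the averaging pitfall concrete, here is a small illustrative sketch (the benchmark names and numbers are made up, not from the talk): the geometric mean of per-benchmark speedup ratios can report "no change" even when the overall picture clearly moved.

```python
# Sketch of why a geometric mean over per-benchmark speedups can mislead.
# Hypothetical data: one benchmark got 2x faster, another 2x slower.
import math

# (old_time, new_time) in seconds -- invented values for illustration
times = {"bench_a": (10.0, 5.0),   # 2x faster
         "bench_b": (1.0, 2.0)}    # 2x slower

speedups = [old / new for old, new in times.values()]
gm = math.prod(speedups) ** (1 / len(speedups))
print(f"geometric mean speedup: {gm:.2f}")   # reports 1.00, i.e. "no change"

# ...yet the total time spent actually dropped from 11s to 7s:
total_old = sum(old for old, _ in times.values())
total_new = sum(new for _, new in times.values())
print(f"total time: {total_old:.1f}s -> {total_new:.1f}s")
```

The single summary number hides that the improvement landed on the benchmark that dominates total runtime, which is exactly the kind of information a practitioner usually cares about.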
The paper has a guide for practitioners that improves on what I did in part of my PhD. I think it would be fairly easy to wrap that around Criterion for comparing runs. I should note that a number of people I know who are involved in performance measurement think it is a bit too detailed, but if you can implement it in your testing framework, it could be a cool feature that other people start using too.
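For flavour, the core idea of reporting a performance change as an effect size with a confidence interval can be sketched with a plain percentile bootstrap on the ratio of mean runtimes. This is a deliberate simplification (the Kalibera-Jones report uses a more careful hierarchical scheme accounting for multiple levels of variation); the timing vectors below are invented:

```python
# Simplified sketch: bootstrap a 95% confidence interval for the
# speedup (ratio of mean runtimes) between two builds.
# NOT the full Kalibera-Jones method -- just the basic flavour.
import random

random.seed(42)  # reproducible resampling

# Hypothetical timings in seconds for the same benchmark, two builds
old = [10.1, 9.9, 10.3, 10.0, 9.8, 10.2]
new = [9.0, 9.2, 8.9, 9.1, 9.3, 8.8]

def mean(xs):
    return sum(xs) / len(xs)

# Resample each set with replacement and record the speedup ratio
ratios = sorted(
    mean(random.choices(old, k=len(old))) / mean(random.choices(new, k=len(new)))
    for _ in range(10000)
)

lo, hi = ratios[249], ratios[9750]  # ~2.5th and ~97.5th percentiles
print(f"speedup: {mean(old)/mean(new):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

If the whole interval sits above 1.0, the speedup is distinguishable from noise at that confidence level; an interval straddling 1.0 says the data cannot support a claim of improvement, which is far more honest than a bare ratio.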
This paper mentions Criterion, incidentally.
Yes :-) I mentioned it several times when we discussed performance measuring in the Evaluate workshops. Since I changed jobs, I am no longer very actively involved here, but some people seem to have picked things up, I guess.
• [1] J. E. Smith. Characterizing computer performance with a single number. CACM 31(10), 1988.
And I wish I'd read this a long time ago :) Thanks. No more geometric means for me!
You are very welcome. Regards, -- Andy