
Hi,
Following up on the threads on haskell and haskell-cafe, I'd like to gather ideas, comments and suggestions for a standardized Haskell Benchmark Suite.
The idea is to gather a set of programs written in Haskell that are representative of how the language is used in practice (applications, libraries, ...). Following the example of SPEC (apart from the fact that the SPEC benchmarks aren't available for free), we would like to build a database of performance measurements for the various benchmarks in the suite, to which users can submit their own results. This will hopefully encourage people to take performance into account when writing a Haskell program or library, and it will also serve as a valuable tool for further optimizing both applications written in Haskell and the various Haskell compilers out there (GHC, jhc, nhc, ...).
This thread is meant to gather people's thoughts on the subject. Which programs should we consider for the first version of the Haskell benchmark suite? How should we standardize them and make them produce reliable performance measurements? Should we rely only on hardware performance counters, or also do more thorough analyses such as data locality studies? Are there any papers available on this subject? (I know about the paper that is being written for ICFP as we speak, which uses PAPI as a tool.)
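
To make this a bit more concrete, here is a minimal sketch of the kind of per-benchmark harness we could standardize on. This is only an illustration, not an existing tool: the name runBenchmark is invented, and it assumes the deepseq and time packages. It simply forces the result to normal form (so laziness doesn't hide the work) and reports wall-clock time; repeated runs, statistics and hardware performance counters would have to be layered on top of something like this.

import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.Time.Clock (diffUTCTime, getCurrentTime)

-- Run one benchmark: force the result to normal form so that laziness
-- does not hide the work, and report the elapsed wall-clock time.
runBenchmark :: NFData b => String -> (a -> b) -> a -> IO ()
runBenchmark name f input = do
  start <- getCurrentTime
  _     <- evaluate (force (f input))
  end   <- getCurrentTime
  putStrLn (name ++ ": " ++ show (diffUTCTime end start))

main :: IO ()
main = runBenchmark "sum-1e6" (sum :: [Int] -> Int) [1 .. 1000000]

Whether we measure wall-clock time, CPU time, or counter events per benchmark is exactly one of the things to standardize.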
I agree with David Roundy's point that we should restrict ourselves to code that is actually used frequently. However, I think we should distinguish between micro-benchmarks, which test one specific item, and real-life benchmarks. Micro-benchmarks can lead to the wrong conclusions because, e.g., the code or data may be completely cached, there are no TLB misses after startup, and so on (a tiny example of what I mean is sketched below).

If somebody wants to know how Haskell performs, and whether they should use it for their development, it is nice to know that, e.g., Data.ByteString performs as well as C, but it would be even nicer to see that large, real-life applications can reach the same performance. There is more to the Haskell runtime than simply executing application code, and those parts should also be taken into account.

I also think that running the benchmark set on several compilers is a good idea because, afaik, they come with different runtime systems. We know that in Java the VM can have a significant impact on behaviour at the microprocessor level, and Haskell may have similar issues. Similar to SPEC CPU, it would be nice to have input sets for each benchmark that gets included in the suite. Furthermore, I think we should provide a rigorous analysis of the benchmarks on as many platforms as is feasible; see, e.g., the analysis done for the DaCapo Java benchmark suite, published at OOPSLA 2006.
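
As an illustration only (assuming the bytestring package; countNewlines is just an invented name), a micro-benchmark in this style might look like:

import qualified Data.ByteString as B
import System.Environment (getArgs)

-- Count newline bytes (ASCII 10) in a strict ByteString.
countNewlines :: B.ByteString -> Int
countNewlines = B.count 10

main :: IO ()
main = do
  [path] <- getArgs
  bytes  <- B.readFile path
  print (countNewlines bytes)

The inner loop and its working set here are tiny, so after the first pass everything sits in cache; numbers from something like this say little about a full application, which is exactly why the suite needs both kinds of benchmarks.

-- Andy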