ANNOUNCE: The Fibon benchmark suite (v0.2.0)

I'm pleased to announce the release of the Fibon benchmark tools and suite. Fibon is a set of tools for running and analyzing benchmark programs in Haskell. Most importantly, it includes an optional set of benchmark programs including many programs taken from the Hackage open source repository. The source code for the tools and benchmarks are available on github https://github.com/dmpots/fibon http://github.com/dmpots/fibon-benchmarks The Fibon tools (without the benchmarks) are available on hackage. http://hackage.haskell.org/package/fibon The package needs to be unpacked and built in place to be able to run any benchmarks. It can be used with the official Fibon benchmarks or you can create your own suite and just use Fibon to run and analyze your benchmark programs. Some more documentation is available on the fibon wiki https://github.com/dmpots/fibon/wiki Fibon Tools =================================================================== Fibon is a pure Haskell framework for running and analyzing benchmark programs. Cabal is used for building the benchmarks. The benchmark harness, configuration files, and benchmark descriptions are all written in Haskell. The benchmark descriptions and run configurations are all statically compiled into the benchmark runner to ensure that configuration errors are found at compile time. The Fibon tools are not tied to any compiler infrastructure and can build benchmarks using any compiler supported by cabal. However, there are some extra features available when using GHC to build the benchmarks: * Support in config files for using an inplace GHC HEAD build * Support in `fibon-run` for collecting GC stats from GHC compiled programs * Support in `fibon-analyse` for reading GC stats from Fibon result files The Fibon Benchmark Suite =================================================================== The Fibon benchmark suite currently contains 34 benchmarks from a variety of sources. The individual benchmarks and lines of code are given below. Dph _DphLib 316 Dotp 308 Qsort 236 QuickHull 680 Sumsq 72 ------------------------------ TOTAL 1612 Hackage Agum 786 Bzlib 432 Cpsa 11582 Crypto 4486 Fgl 3834 Fst 4532 Funsat 16085 Gf 23970 HaLeX 4035 Happy 5833 Hgalib 819 Palindromes 496 Pappy 7313 QuickCheck 4495 Regex 6873 Simgi 5134 TernaryTrees 722 Xsact 2783 ------------------------------ TOTAL 104210 Repa _RepaLib 8775 Blur 77 FFT2d 89 FFT3d 103 Laplace 274 MMult 133 ------------------------------ TOTAL 9451 Shootout BinaryTrees 63 ChameneosRedux 96 Fannkuch 27 Mandelbrot 68 Nbody 192 Pidigits 26 SpectralNorm 97 ------------------------------ TOTAL 569

Congrats on the release! It looks like you've invested a lot of time and put in some hard work. I have a few questions:

* What differentiates Fibon from criterion? I see both use the statistics package.
* Does it track memory statistics? I glanced at the FAQ but didn't see anything about it.
* Are the numbers in the sample output seconds or milliseconds? What is the stddev (e.g., what does the distribution of run-times look like)?

Thanks,
Jason

On Nov 9, 2010, at 3:45 PM, Jason Dagit wrote:
* What differentiates Fibon from criterion? I see both use the statistics package.
I think the two packages have different benchmarking targets. Criterion lets you easily benchmark individual functions and gives some help with benchmarking in the presence of lazy evaluation: if some code does not run for long enough, criterion executes it multiple times to get sensible timings. Criterion also does a much more sophisticated statistical analysis of the results; I hope to incorporate that into the Fibon analysis in the future.

Fibon is a more traditional benchmark suite in the style of SPEC or nofib. My interest is in using it to test compiler optimizations. It benchmarks only at the whole-program level, by running an executable. It checks that the program produces the correct output, can collect extra metrics generated by the program, separates collecting results from analyzing results, and generates tables directly comparing the results from different benchmark runs.
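For contrast, a criterion benchmark targets an individual function. A minimal example using criterion's `defaultMain`, `bench`, and `whnf` looks roughly like this (the `fib` function is just a stand-in workload):

    import Criterion.Main (defaultMain, bench, whnf)

    -- A stand-in workload to benchmark.
    fib :: Int -> Integer
    fib n = if n < 2 then fromIntegral n else fib (n - 1) + fib (n - 2)

    main :: IO ()
    main = defaultMain
      [ bench "fib 25" (whnf fib 25)  -- forced to WHNF on each run
      ]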
* Does it track memory statistics? I glanced at the FAQ but didn't see anything about it.
Yes, it can read memory statistics dumped by the GHC runtime. It has built-in support for reading the stats dumped by `+RTS -t --machine-readable`, which include things like bytes allocated and time spent in GC.
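GHC prints those machine-readable stats as a Haskell-readable association list of key/value strings, so a minimal reader can just use `read`. A sketch (the exact key names, like "bytes allocated", are my recollection of GHC's output, so treat them as an assumption):

    -- Minimal sketch: parse the list printed by
    -- `+RTS -t --machine-readable`, which looks like
    --   [("bytes allocated", "51024")
    --   ,("num_GCs", "1")
    --   ...]
    type RtsStats = [(String, String)]

    parseRtsStats :: String -> RtsStats
    parseRtsStats = read

    -- Total allocation in bytes; the key name is my assumption
    -- about GHC's output, not taken from Fibon.
    bytesAllocated :: RtsStats -> Maybe Integer
    bytesAllocated stats = fmap read (lookup "bytes allocated" stats)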
* Are the numbers in the sample output seconds or milliseconds? What is the stddev (e.g., what does the distribution of run-times look like)?
I'm not sure exactly which results you're referring to (the numbers in the announcement were lines of code). I picked benchmarks that all ran for at least a second (and hopefully longer) with compiler optimizations enabled. On an 8-core Xeon, the median time over all benchmarks is 8.43 seconds, the mean is 12.57 seconds, and the standard deviation is 14.56 seconds.

-David
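P.S. For anyone who wants to reproduce such summary numbers from their own run times, the computations are straightforward; here is a plain sketch (it uses the population standard deviation, and I don't know offhand which variant Fibon's analysis uses):

    import Data.List (sort)

    -- Arithmetic mean of a list of run times.
    mean :: [Double] -> Double
    mean xs = sum xs / fromIntegral (length xs)

    -- Population standard deviation (divides by n, not n-1).
    stddev :: [Double] -> Double
    stddev xs = sqrt (mean [(x - m) * (x - m) | x <- xs])
      where m = mean xs

    -- Median: middle element, or mean of the two middle elements.
    median :: [Double] -> Double
    median xs
      | odd n     = s !! mid
      | otherwise = (s !! (mid - 1) + s !! mid) / 2
      where
        s   = sort xs
        n   = length s
        mid = n `div` 2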

On Tue, Nov 9, 2010 at 5:47 PM, David Peixotto wrote:
* Does it track memory statistics? I glanced at the FAQ but didn't see anything about it.
Yes, it can read memory statistics dumped by the GHC runtime. It has built-in support for reading the stats dumped by `+RTS -t --machine-readable`, which include things like bytes allocated and time spent in GC.
Oh, I see. In that case, it's more similar to darcs-benchmark, except that darcs-benchmark is tailored specifically to benchmarking darcs. Where they overlap is in parsing the RTS statistics, running the whole program, and producing tabular reports. Darcs-benchmark adds to that an embedded DSL for specifying operations to perform on the repository between benchmarks (and for translating those operations into runnable shell snippets).

I wonder if Fibon and darcs-benchmark could share common infrastructure beyond the statistics package. It sure sounds like it to me. Perhaps some collaboration is in order.
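To make the DSL idea concrete, here is a hypothetical sketch of what such an embedded DSL could look like. The type and function names are invented; this is not darcs-benchmark's actual API:

    -- Hypothetical sketch of a repository-operation DSL like the
    -- one described above; names invented for illustration, not
    -- darcs-benchmark's actual API.
    data RepoOp
      = Record String   -- record all changes under a patch name
      | Pull String     -- pull all patches from a remote repository
      | Whatsnew        -- show unrecorded changes

    -- Translate each operation into a runnable shell snippet.
    toShell :: RepoOp -> String
    toShell (Record msg) = "darcs record -a -m " ++ show msg
    toShell (Pull url)   = "darcs pull -a " ++ url
    toShell Whatsnew     = "darcs whatsnew"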
* Are the numbers in the sample output seconds or milliseconds? What is the stddev (e.g., what does the distribution of run-times look like)?
I'm not sure exactly which results you're referring to (the numbers in the announcement were lines of code). I picked benchmarks that all ran for at least a second (and hopefully longer) with compiler optimizations enabled. On an 8-core Xeon, the median time over all benchmarks is 8.43 seconds, the mean is 12.57 seconds, and the standard deviation is 14.56 seconds.
I probably read your email too fast, sorry. Thanks for the clarification.

Thanks,
Jason

Hi Jason,
Sorry for the delayed response. Thanks for pointing out the darcs-benchmark
package. I had not seen it before, and there may be some room for sharing
infrastructure. Parsing the runtime stats is pretty easy, but comparing
different runs, computing statistics, and generating tables are common
tasks that the tools could share.
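For a flavor of that comparison step, here is a minimal sketch of turning two runs' timings into a comparison table (the layout is invented and is not Fibon's actual output format):

    import Text.Printf (printf)

    -- Minimal sketch: print each benchmark's baseline time, new
    -- time, and percent change. Layout invented for illustration;
    -- not Fibon's actual output format.
    compareRuns :: [(String, Double)] -> [(String, Double)] -> IO ()
    compareRuns baseline new = do
        printf "%-15s %10s %10s %8s\n" "Benchmark" "Base(s)" "New(s)" "Change"
        mapM_ row [ (name, b, t) | (name, b) <- baseline
                                 , Just t <- [lookup name new] ]
      where
        row (name, b, t) =
          printf "%-15s %10.2f %10.2f %+7.1f%%\n"
                 name b t ((t - b) / b * 100)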
On a related note, when I uploaded the fibon package, I put it in a new
"Benchmarking" category as opposed to the existing "Testing" category. In my
mind testing is more for correctness and benchmarking is for performance. I
think it would be useful to include other benchmarking packages
(darcs-benchmark, criterion) in that category.