ANNOUNCE: nobench: Haskell implementation benchmarks. GHC v Hugs v Yhc v NHC v ...

Following recent discussion about a cross-implementation performance benchmark suite, based on nofib, I've gone and combined nofib with the great language shootout programs, and rewritten the build system to support cross implementation measurements.

The result is: nobench

    http://www.cse.unsw.edu.au/~dons/nobench.html

The benchmark suite runs regularly, and currently reports the speed of each program in the suite, running under each system. The results are quite interesting. The most recent run is available:

    http://www.cse.unsw.edu.au/~dons/nobench/bench.results
    http://www.cse.unsw.edu.au/~dons/nobench/bench.log

The programs are a mixture of traditional nofib style Haskell, with more performance-tuned code from the shootout. More tweaking is required to help better support nhc and yhc (and jhc, and ...).

The entire benchmark set and framework is available via darcs:

    darcs get http://www.cse.unsw.edu.au/~dons/code/nobench

Still to do: porting the rest of nofib, pretty graphs of the results (and HTML), and memory use measurements. Patches welcome!

Cheers,
Don

Hi Dons,
Yhc is consistently half the speed of nhc, whereas in our tests it's typically 20% faster. Can you make sure you've built Yhi with -O (scons type=release should do it)? I opened a bug just a few days ago, because I realised all benchmarks would get run at no optimisation otherwise :)

If anyone wants a project, finding out which flags to build Yhi with to get the best performance here would be nice to see :)

Why does the integrate benchmark import both System and System.Environment? Yhc currently doesn't export getArgs from System, only System.Environment. (And yes, we really should fix that!)

Thanks

Neil
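The import question above can be illustrated with a minimal sketch. This is not the actual integrate benchmark source: the helper parseSize and the default size of 1000 are hypothetical. It assumes only that getArgs is available from the hierarchical module System.Environment, which is the one place Neil says Yhc exports it from, so the Haskell 98 System import is not needed at all.

```haskell
module Main (main) where

-- Import getArgs from System.Environment only. Per the message above,
-- Yhc exports getArgs from this module but not from the Haskell 98
-- module System, so a single import is the more portable choice.
import System.Environment (getArgs)

-- Hypothetical helper: take the problem size from the first
-- command-line argument, falling back to a default when none is given.
parseSize :: [String] -> Int
parseSize (a:_) = read a
parseSize []    = 1000

main :: IO ()
main = do
  args <- getArgs
  let n = parseSize args
  -- Stand-in workload so that the program does something measurable.
  print (sum [1 .. n])
```

A benchmark written this way should not depend on which of the two modules a given implementation happens to export getArgs from.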

Donald Bruce Stewart wrote:
Following recent discussion about a cross-implementation performance benchmark suite, based on nofib, I've gone and combined nofib with the great language shootout programs, and rewritten the build system to support cross implementation measurements.
Great work! ..but I wonder if the shootout is really the kind of code that is ideal for compiler benchmark. Typically (at least based on what I've seen of the submissions) they tend to be fairly heavily tuned, using optimizations that are a) obfuscating the code and b) tuned specifically for GHC. (Another potential issue that follows from this is how to resolve a modification to a benchmark that makes one compiler faster at the expense of another.) Wouldn't it be better to benchmark a more idiomatically correct codebase? -k

Quoth Ketil Malde, nevermore,
Wouldn't it be better to benchmark a more idiomatically correct codebase?
I suppose the ideal way to do it would be benchmarks for the (1) idiomatic and (2) the highly tuned implementations. Then the compiler writers can push 1 towards 2, while the pesky shootout implementers can move the goalposts of 2. ;-) In reality this may just foster a small set of horribly specialised optimisers in the compilers, with little benefit for real-world usage. :-( Cheers, D. -- Dougal Stanton
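As a rough illustration of the two categories discussed here, the sketch below computes the same value twice: once in an idiomatic style and once hand-tuned. Both functions are hypothetical examples and are not taken from nofib or the shootout. The tuned version uses seq rather than any compiler-specific extension, so it is not tied to GHC.

```haskell
module Main (main) where

-- (1) Idiomatic: clear, but traverses the list twice (sum, then length).
meanIdiomatic :: [Double] -> Double
meanIdiomatic xs = sum xs / fromIntegral (length xs)

-- (2) Tuned: a single pass with two accumulators that are forced at
-- every step, so no chain of unevaluated thunks builds up.
meanTuned :: [Double] -> Double
meanTuned = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go s n []     = s / fromIntegral n
    go s n (x:xs) =
      let s' = s + x
          n' = n + 1
      in s' `seq` n' `seq` go s' n' xs

main :: IO ()
main = do
  let xs = map fromIntegral [1 .. 100000 :: Int]
  print (meanIdiomatic xs)
  print (meanTuned xs)
```

A compiler that closes the gap between the two versions is doing what the "push 1 towards 2" remark describes.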

ithika:
Quoth Ketil Malde, nevermore,
Wouldn't it be better to benchmark a more idiomatically correct codebase?
I suppose the ideal way to do it would be benchmarks for the (1) idiomatic and (2) the highly tuned implementations. Then the compiler writers can push 1 towards 2, while the pesky shootout implementers can move the goalposts of 2. ;-)
In reality this may just foster a small set of horribly specialised optimisers in the compilers, with little benefit for real-world usage. :-(
I think more likely, and hopefully, we'll use this to check that things aren't getting worse from release to release. -- Don

Hello Dougal, Monday, February 19, 2007, 3:02:30 PM, you wrote:
I suppose the ideal way to do it would be benchmarks for the (1) idiomatic and (2) the highly tuned implementations. Then the compiler writers can push 1 towards 2, while the pesky shootout implementers can move the goalposts of 2. ;-)
In reality this may just foster a small set of horribly specialised optimisers in the compilers, with little benefit for real-world usage. :-(
I disagree. When I write a general-purpose library, I use these optimization tricks to make the library as fast as possible, and a wide audience of library users benefits from such low-level optimizations. A great example of such a low-level optimized library is ByteString, which provides close-to-C speed with a very high-level interface. -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Ketil.Malde:
Donald Bruce Stewart wrote:
Following recent discussion about a cross-implementation performance benchmark suite, based on nofib, I've gone and combined nofib with the great language shootout programs, and rewritten the build system to support cross implementation measurements.
Great work!
..but I wonder if the shootout is really the kind of code that is ideal for compiler benchmark. Typically (at least based on what I've seen of the submissions) they tend to be fairly heavily tuned, using optimizations that are a) obfuscating the code and b) tuned specifically for GHC.
They exercise the pointy end of things. Specifically, mutable arrays, double precision math and bytestrings. Stuff we don't have tests for in nofib, that has performed poorly in the past (till we noticed it on the shootout..). This kind of code does get written in practice (and when it is written, it is usually because it needs to be fast). So I think the few that were added are useful. More category 'real' programs could be contributed, though. -- Don

On 2/19/07, Donald Bruce Stewart wrote:
results are quite interesting. The most recent run is available:
http://www.cse.unsw.edu.au/~dons/nobench/bench.results http://www.cse.unsw.edu.au/~dons/nobench/bench.log
Maybe I'm missing something, but how can ghci beat ghc (on pidigits)? BTW, nice compilation of tests =). -- Felipe.

felipe.lessa:
On 2/19/07, Donald Bruce Stewart wrote:
results are quite interesting. The most recent run is available:
http://www.cse.unsw.edu.au/~dons/nobench/bench.results http://www.cse.unsw.edu.au/~dons/nobench/bench.log
Maybe I'm missing something, but how can ghci beat ghc (on pidigits)?
BTW, nice compilation of tests =).
As far as I can see, this benchmark relies solely on how fast gmp is. There's very little overhead other than that. More investigation required though. -- Don
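To make the point about gmp concrete, here is a minimal sketch of a pi digit spigot. It is not the pidigits program from nobench or the shootout; it is the well-known streaming formulation due to Gibbons, used here as an assumption about what such a program looks like. Nearly every operation is multiplication, subtraction or division on Integer values that keep growing, so the running time is spent in the arbitrary-precision arithmetic library (gmp, in GHC's case) rather than in compiled or interpreted Haskell code.

```haskell
module Main (main) where

-- Unbounded stream of the decimal digits of pi. All state is held in
-- Integer values that grow without bound as more digits are produced.
piDigits :: [Integer]
piDigits = go 1 0 1 1 3 3
  where
    go :: Integer -> Integer -> Integer -> Integer -> Integer -> Integer
       -> [Integer]
    go q r t k n l
      -- The next digit n is certain: emit it and rescale the state.
      | 4 * q + r - t < n * t =
          n : go (10 * q) (10 * (r - n * t)) t k
                 ((10 * (3 * q + r)) `div` t - 10 * n) l
      -- Otherwise consume one more term of the series.
      | otherwise =
          go (q * k) ((2 * q + r) * l) (t * l) (k + 1)
             ((q * (7 * k + 2) + r * l) `div` (t * l)) (l + 2)

main :: IO ()
main = putStrLn (concatMap show (take 30 piDigits))
```

With so little work outside the bignum library, small differences between ghc and ghci on this one program would not be surprising.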

Hi all,
GHC v Hugs v Yhc v NHC v ...
... Hacle & Clean!

I shoved 5 of the benchmarks that Donald used through Hacle, and compiled the outputs using version 2.1 of the Clean compiler. Results are below.

As for the other examples, Hacle doesn't like non-Haskell98 and translates arbitrary-precision integers to fixed-precision ones (!)

I'm not sure how well Hacle would work with nobench because input files must be unambiguously-typed assuming a "default ()" at the top. So some programs may require a little tweaking to go through. Mind, this was only a problem on 1 of the 5 programs I just tried...

Matt.

(Note: ignore the "65536" at the end of each Clean result -- my fault for not compiling with the right options)

===================================================================
binarytrees (GHC)
===================================================================
stretch tree of depth 17 check: -1
131072 trees of depth 4 check: -131072
32768 trees of depth 6 check: -32768
8192 trees of depth 8 check: -8192
2048 trees of depth 10 check: -2048
512 trees of depth 12 check: -512
128 trees of depth 14 check: -128
32 trees of depth 16 check: -32
long lived tree of depth 16 check: -1

real 0m3.301s
user 0m3.280s
sys 0m0.016s

===================================================================
binarytrees (Clean)
===================================================================
Execution: 2.34 Garbage collection: 0.25 Total: 2.59
stretch tree of depth 17 check: -1
131072 trees of depth 4 check: -131072
32768 trees of depth 6 check: -32768
8192 trees of depth 8 check: -8192
2048 trees of depth 10 check: -2048
512 trees of depth 12 check: -512
128 trees of depth 14 check: -128
32 trees of depth 16 check: -32
long lived tree of depth 16 check: -1
65536

real 0m2.691s
user 0m2.592s
sys 0m0.100s

===================================================================
partial sums (GHC)
===================================================================
2.9999999999999987 (2/3)^k
3160.817621887086 k^-0.5
0.9999996000002026 1/k(k+1)
30.31454150956248 Flint Hills
42.99523399808393 Cookson Hills
15.30901715473893 Harmonic
1.644933666848388 Riemann Zeta
0.6931469805600944 Alternating Harmonic
0.7853980633974358 Gregory

real 0m4.887s
user 0m4.888s
sys 0m0.000s

===================================================================
partial sums (Clean)
===================================================================
Execution: 4.41 Garbage collection: 0.05 Total: 4.46
3 (2/3)^k
3160.81762188709 k^-0.5
0.999999600000203 1/k(k+1)
30.3145415095625 Flint Hills
42.9952339980839 Cookson Hills
15.3090171547389 Harmonic
1.64493366684839 Riemann Zeta
0.693146980560094 Alternating Harmonic
0.785398063397435 Gregory
65536

real 0m4.545s
user 0m4.468s
sys 0m0.076s

===================================================================
queens (GHC)
===================================================================
14200

real 0m1.990s
user 0m1.980s
sys 0m0.012s

===================================================================
queens (Clean)
===================================================================
Execution: 6.58 Garbage collection: 1.07 Total: 7.65
14200
65536

real 0m7.921s
user 0m7.656s
sys 0m0.264s

===================================================================
recursive (GHC)
===================================================================
Ack(3,9): 4093
Fib(36.0): 2.4157817e7
Tak(24,16,8): 9
Fib(3): 3
Tak(3.0,2.0,1.0): 2.0

real 0m5.232s
user 0m5.224s
sys 0m0.008s

===================================================================
recursive (Clean)
===================================================================
Execution: 2.40 Garbage collection: 0.00 Total: 2.40
Ack(3,9): 4093
Fib(36): 24157817
Tak(24,16,8): 9
Fib(3): 3
Tak(3,2,1): 2
65536

real 0m2.403s
user 0m2.400s
sys 0m0.000s

===================================================================
loop (GHC)
===================================================================
3.3333333333333335

real 0m1.039s
user 0m1.036s
sys 0m0.004s

===================================================================
loop (Clean)
===================================================================
Execution: 1.26 Garbage collection: 0.00 Total: 1.26
3.33333333333333
65536

real 0m1.325s
user 0m1.260s
sys 0m0.068s

On Mon, Feb 19, 2007 at 08:12:14PM +0000, Matthew Naylor wrote:
Hi all,
GHC v Hugs v Yhc v NHC v ...
... Hacle & Clean!
I shoved 5 of the benchmarks that Donald used through Hacle, and compiled the outputs using version 2.1 of the Clean compiler. Results are below.
Submit a patch, it's easy! Took me <10 minutes to add YHC support and send it in. (The reason my name isn't in darcs changes is that dons' X crashed, killing darcs and irreparably corrupting _darcs, so he had to rm -r _darcs ; darcs init.) Just edit header.mk and footer.mk in the obvious way.
As for the other examples, Hacle doesn't like non-Haskell98 and translates arbitrary-precision integers to fixed-precision ones (!)
Don't worry, nobench is based on a testsuite and as such is prepared to diff output. (if that doesn't happen, I'd consider it a bug)
I'm not sure how well Hacle would work with nobench because input files must be unambiguously-typed assuming a "default ()" at the top. So some programs may require a little tweaking to go through. Mind, this was only a problem on 1 of the 5 programs I just tried...
Well, he was willing to make concessions for Yhc brokenness (wrt importing System.Environment - yhc's System doesn't export getArgs like the Report says it should (first tangible result of nofib: the Yhc team has fixed it)) And don't worry about adding dependencies - you can remove compilers you don't have by editing the COMPILERS = line in header.mk. Stefan

Hi
Well, he was willing to make concessions for Yhc brokenness (wrt importing System.Environment - yhc's System doesn't export getArgs like the Report says it should (first tangible result of nofib: the Yhc team has fixed it))
The second tangible result should be that Yhc runs faster than nhc. Our internal testing originally showed a 20% speedup over nhc - something seems to have gone wrong to slow down Yhc, so we are working to fix this. Hopefully in a few days Yhc will beat nhc - just in case anyone is drawing performance ideas from the current benchmark. Thanks Neil

On 19/02/07, Neil Mitchell
The second tangible result should be that Yhc runs faster than nhc. Our internal testing originally showed a 20% speedup over nhc - something seems to have gone wrong to slow down Yhc, so we are working to fix this. Hopefully in a few days Yhc will beat nhc - just in case anyone is drawing performance ideas from the current benchmark.
Great! Nothing like a bit of competition to spur coding into action! :) Nice work, dons. -- -David House, dmhouse@gmail.com

participants (9)
- Bulat Ziganshin
- David House
- dons@cse.unsw.edu.au
- Dougal Stanton
- Felipe Almeida Lessa
- Ketil Malde
- Matthew Naylor
- Neil Mitchell
- Stefan O'Rear