benchmarking c/c++ and haskell

Hi, I would like to benchmark C/C++ and Haskell code. The goal is to improve the Haskell port[0] of smallpt[1]. To make sure my approach was reliable, I got the code of two programs (one in C, the other in Haskell) from a post[2] by Don. The code is reproduced below. When timing the execution of both program, I have a
4x difference. It is said on the blog the programs should have similar performance.
I simply don't get the reason of such a difference. I've tried the
code on my Atom netbook and also on an older centrino machine. The
timing are similar (i.e. the C and Haskell program show >4x
difference). Both machines have GHC 6.12.1 on Linux.
Would you have an idea?
[0] http://hackage.haskell.org/package/smallpt-hs
[1] http://www.kevinbeason.com/smallpt/
[2] http://donsbot.wordpress.com/2008/06/04/haskell-as-fast-as-c-working-at-a-hi...
$ gcc -O2 mean.c -omean-c
$ ghc --make -O2 mean.hs -o mean-hs
$ time ./mean-c 1e8
50000000.500000
real 0m1.575s
user 0m1.513s
sys 0m0.000s
$ time ./mean-hs 1e8
50000000.5
real 0m6.997s
user 0m6.856s
sys 0m0.013s
-- file mean.hs
module Main where
import System.Environment
import Text.Printf
mean :: Double -> Double -> Double
mean n m = go 0 0 n
where
go :: Double -> Int -> Double -> Double
go s l x | x > m = s / fromIntegral l
| otherwise = go (s+x) (l+1) (x+1)
main = do
[d] <- map read `fmap` getArgs
printf "%f\n" (mean 1 d)
/* file mean.c */
#include

Vo Minh Thu wrote:
Hi,
I would like to benchmark C/C++ and Haskell code. The goal is to improve the Haskell port[0] of smallpt[1].
To make sure my approach was reliable, I got the code of two programs (one in C, the other in Haskell) from a post[2] by Don. The code is reproduced below. When timing the execution of both program, I have a
4x difference. It is said on the blog the programs should have
similar performance.
I simply don't get the reason of such a difference. I've tried the code on my Atom netbook and also on an older centrino machine. The timing are similar (i.e. the C and Haskell program show >4x difference). Both machines have GHC 6.12.1 on Linux.
Would you have an idea?
The function floor :: Double -> Int is surprisingly slow under GHC. (IIRC, it's implemented by converting Double -> (Int, Integer) -> Int, or something equally absurd.) Poking around GHC.Prim directly allows you to do the same operation much, much faster. I couldn't say exactly how much of a difference it makes, but I've had programs go from seconds to microseconds just by switching this. http://hackage.haskell.org/trac/ghc/ticket/2271 http://hackage.haskell.org/trac/ghc/ticket/1434 My solution was to do this: http://hackage.haskell.org/packages/archive/AC-Colour/1.1.3/doc/html/src/Dat... Uh... good luck! o_O (Of course, I could be speaking complete nonesense and this bug has since been fixed, or the cause of the slowness in smallpt is something else...)

2010/9/12 Andrew Coppin
Vo Minh Thu wrote:
Hi,
I would like to benchmark C/C++ and Haskell code. The goal is to improve the Haskell port[0] of smallpt[1].
To make sure my approach was reliable, I got the code of two programs (one in C, the other in Haskell) from a post[2] by Don. The code is reproduced below. When timing the execution of both program, I have a
4x difference. It is said on the blog the programs should have
similar performance.
I simply don't get the reason of such a difference. I've tried the code on my Atom netbook and also on an older centrino machine. The timing are similar (i.e. the C and Haskell program show >4x difference). Both machines have GHC 6.12.1 on Linux.
Would you have an idea?
The function floor :: Double -> Int is surprisingly slow under GHC. (IIRC, it's implemented by converting Double -> (Int, Integer) -> Int, or something equally absurd.) Poking around GHC.Prim directly allows you to do the same operation much, much faster. I couldn't say exactly how much of a difference it makes, but I've had programs go from seconds to microseconds just by switching this.
http://hackage.haskell.org/trac/ghc/ticket/2271 http://hackage.haskell.org/trac/ghc/ticket/1434
My solution was to do this:
http://hackage.haskell.org/packages/archive/AC-Colour/1.1.3/doc/html/src/Dat...
Uh... good luck! o_O
Thanks for the tip. But I would really get the two little programs from Don's blog to have the same performance as advertised so that I can build from there. I have to make sure things are different for a good reason, not because of a flaw in my setup (which I assume it is since I can't even get those two to run in the same time). Thanks, Thu

Does it help to compile with ghc --make -O2 -funbox-strict-fields ?? David.

On Monday 13 September 2010 11:50:14, Vo Minh Thu wrote:
2010/9/13 David Virebayre
: Does it help to compile with ghc --make -O2 -funbox-strict-fields ??
No, it doesn't. Can I assume you don't have the problem I described?
Currently, GHC's native code generator is not too good at optimising loops. It might help if you compile via C, ghc -O2 -fexcess-precision -fvia-C -optc-O3 On my box, that gives a > 2× speedup (unfortunately, that means it takes almost three times as long as the C version instead of > 6×). If you have the llvm backend for GHC, that is supposedly better for such code.
Thanks, Thu

2010/9/13 Daniel Fischer
On Monday 13 September 2010 11:50:14, Vo Minh Thu wrote:
2010/9/13 David Virebayre
: Does it help to compile with ghc --make -O2 -funbox-strict-fields ??
No, it doesn't. Can I assume you don't have the problem I described?
Currently, GHC's native code generator is not too good at optimising loops. It might help if you compile via C,
ghc -O2 -fexcess-precision -fvia-C -optc-O3
On my box, that gives a > 2× speedup (unfortunately, that means it takes almost three times as long as the C version instead of > 6×).
With your options, the Haskell code is only 1.5x slower. But still...
If you have the llvm backend for GHC, that is supposedly better for such code.
... the post is from 2008. No LLVM goodness. So I thought GHC 6.12.1 (not the latest and greatest HEAD) would be enough. Thanks, Thu

On 13 September 2010 20:41, Vo Minh Thu
... the post is from 2008. No LLVM goodness. So I thought GHC 6.12.1 (not the latest and greatest HEAD) would be enough.
I compiled the two programs myself out of curiosity and got the following times. Linux, 64bit, Ubuntu 10.10: 1e8 clang: 0.180s gcc: 0.179s ghc 6.12.1 (viac): 0.187s ghc 6.12.1 (fasm): 0.218s ghc HEAD (viac): 0.186s ghc HEAD (fasm): 0.179s ghc HEAD (llvm): 0.174s 1e9 clang: 1.657s gcc: 1.647s ghc 6.12.1 (viac): 1.653s ghc 6.12.1 (fasm): 1.975s ghc HEAD (viac): 1.648s ghc HEAD (fasm): 1.658s ghc HEAD (llvm): 1.646s So basically all have the same time except ghc 6.12.1 where fasm is a little slow. On windows xp 32bit I get quite different results which I trust less as the times are jumping around much more then they were on linux: 1e8 gcc: 0.365s ghc 6.12.1 (viac): 5.287s ghc 6.12.1 (fasm): 1.332s ghc HEAD (viac): 5.292s ghc HEAD (fasm): 0.875s ghc HEAD (llvm): 0.359s Not sure why the results on windows are so different. If anyone else wants to run the two programs on Windows and check that would be great. Cheers, David

On Tue, Sep 14, 2010 at 5:50 PM, David Terei
On 13 September 2010 20:41, Vo Minh Thu
wrote: ... the post is from 2008. No LLVM goodness. So I thought GHC 6.12.1 (not the latest and greatest HEAD) would be enough.
I compiled the two programs myself out of curiosity and got the following times.
Linux, 64bit, Ubuntu 10.10:
1e8 clang: 0.180s gcc: 0.179s ghc 6.12.1 (viac): 0.187s ghc 6.12.1 (fasm): 0.218s ghc HEAD (viac): 0.186s ghc HEAD (fasm): 0.179s ghc HEAD (llvm): 0.174s
1e9 clang: 1.657s gcc: 1.647s ghc 6.12.1 (viac): 1.653s ghc 6.12.1 (fasm): 1.975s ghc HEAD (viac): 1.648s ghc HEAD (fasm): 1.658s ghc HEAD (llvm): 1.646s
So basically all have the same time except ghc 6.12.1 where fasm is a little slow.
On windows xp 32bit I get quite different results which I trust less as the times are jumping around much more then they were on linux:
Thanks for your rather extensive effort to pin down the performance numbers. I just wanted to add a suggestion: I would highly recommend using criterion for this. It's easy, simple, and good at giving you statistically robust measures of the time it takes. http://hackage.haskell.org/package/criterion I resisted using it for a while, but once I took the plunge I was quite happy with the results. It's a library that is definitely worth the spin up time (for me at least). Jason

On Wednesday 15 September 2010 02:50:15, David Terei wrote:
On 13 September 2010 20:41, Vo Minh Thu
wrote: ... the post is from 2008. No LLVM goodness. So I thought GHC 6.12.1 (not the latest and greatest HEAD) would be enough.
I compiled the two programs myself out of curiosity and got the following times.
Linux, 64bit, Ubuntu 10.10:
1e8 clang: 0.180s gcc: 0.179s ghc 6.12.1 (viac): 0.187s ghc 6.12.1 (fasm): 0.218s ghc HEAD (viac): 0.186s ghc HEAD (fasm): 0.179s ghc HEAD (llvm): 0.174s
1e9 clang: 1.657s gcc: 1.647s ghc 6.12.1 (viac): 1.653s ghc 6.12.1 (fasm): 1.975s ghc HEAD (viac): 1.648s ghc HEAD (fasm): 1.658s ghc HEAD (llvm): 1.646s
So basically all have the same time except ghc 6.12.1 where fasm is a little slow.
On windows xp 32bit I get quite different results which I trust less as the times are jumping around much more then they were on linux:
1e8 gcc: 0.365s ghc 6.12.1 (viac): 5.287s ghc 6.12.1 (fasm): 1.332s ghc HEAD (viac): 5.292s ghc HEAD (fasm): 0.875s ghc HEAD (llvm): 0.359s
Not sure why the results on windows are so different.
I have no idea why, but I remember that on several occasions timings for via-C compiled programmes on Windows have been abysmal. Whether it's a general Windows/gcc mismatch or something GHC-specific, I've no idea (and can't find out, not having Windows). For what it's worth, the approximate timings on my 32-bit linux (openSUSE 11.1) box have been 1e8 gcc: ~0.4s 6.12.3 (viac): ~1.2s 6.12.3 (fasm): ~2.7s HEAD the same as 6.12.3 (I have deleted the programmes and I don't remember the exact timings, if you're interested, I could reconstruct them).
If anyone else wants to run the two programs on Windows and check that would be great.
Cheers, David

Thanks all for your answsers. I still wonder why some people get very
different results between gcc and ghc, and some others don't. A
difference in processor?
I guess I will crank up a little package using criterion and producing
two executables to make sure anyone who run the benchmark use the same
options, and ease the whole process.
Thanks,
Thu
2010/9/15 Daniel Fischer
On Wednesday 15 September 2010 02:50:15, David Terei wrote:
On 13 September 2010 20:41, Vo Minh Thu
wrote: ... the post is from 2008. No LLVM goodness. So I thought GHC 6.12.1 (not the latest and greatest HEAD) would be enough.
I compiled the two programs myself out of curiosity and got the following times.
Linux, 64bit, Ubuntu 10.10:
1e8 clang: 0.180s gcc: 0.179s ghc 6.12.1 (viac): 0.187s ghc 6.12.1 (fasm): 0.218s ghc HEAD (viac): 0.186s ghc HEAD (fasm): 0.179s ghc HEAD (llvm): 0.174s
1e9 clang: 1.657s gcc: 1.647s ghc 6.12.1 (viac): 1.653s ghc 6.12.1 (fasm): 1.975s ghc HEAD (viac): 1.648s ghc HEAD (fasm): 1.658s ghc HEAD (llvm): 1.646s
So basically all have the same time except ghc 6.12.1 where fasm is a little slow.
On windows xp 32bit I get quite different results which I trust less as the times are jumping around much more then they were on linux:
1e8 gcc: 0.365s ghc 6.12.1 (viac): 5.287s ghc 6.12.1 (fasm): 1.332s ghc HEAD (viac): 5.292s ghc HEAD (fasm): 0.875s ghc HEAD (llvm): 0.359s
Not sure why the results on windows are so different.
I have no idea why, but I remember that on several occasions timings for via-C compiled programmes on Windows have been abysmal. Whether it's a general Windows/gcc mismatch or something GHC-specific, I've no idea (and can't find out, not having Windows).
For what it's worth, the approximate timings on my 32-bit linux (openSUSE 11.1) box have been
1e8 gcc: ~0.4s 6.12.3 (viac): ~1.2s 6.12.3 (fasm): ~2.7s HEAD the same as 6.12.3
(I have deleted the programmes and I don't remember the exact timings, if you're interested, I could reconstruct them).
If anyone else wants to run the two programs on Windows and check that would be great.
Cheers, David

On Wednesday 15 September 2010 12:22:24, Vo Minh Thu wrote:
Thanks all for your answsers. I still wonder why some people get very different results between gcc and ghc, and some others don't. A difference in processor?
Architecture (32/64-bit, x86/..., ...), processor, gcc version, OS, all play a role. Since ghc fares pretty well on 64-bit linux (David Terei and Don), both, via the NCG and via C, it seems its 64-bit code generator is better than its 32-bit code generator :( And judging from David's 32-bit Windows timings, GHC's 32-bit code generator is better adapted to Windows than to linux, but its generated C code is horrible for Windows.
I guess I will crank up a little package using criterion and producing two executables to make sure anyone who run the benchmark use the same options, and ease the whole process.
Thanks, Thu

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 9/15/10 06:41 , Daniel Fischer wrote:
play a role. Since ghc fares pretty well on 64-bit linux (David Terei and Don), both, via the NCG and via C, it seems its 64-bit code generator is better than its 32-bit code generator :(
amd64 has more registers; IIRC ghc needs a *lot* of registers to play with to perform well (TBH *everything* does), so x86 is always going to be behind the curve. - -- brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.10 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkySRUMACgkQIn7hlCsL25XG5QCfWOqBnm/jwFXM9L5cDU33yJiO JYQAoLnytUMIZ2XHuWKcnin7fEWJk66u =0FBK -----END PGP SIGNATURE-----
participants (7)
-
Andrew Coppin
-
Brandon S Allbery KF8NH
-
Daniel Fischer
-
David Terei
-
David Virebayre
-
Jason Dagit
-
Vo Minh Thu