
xj2106:
Don Stewart writes:
Can you start by retrying with the flags from the spectral-norm benchmark:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=ghc&id=0
The interaction with gcc here is quite important, so forcing -fvia-C will matter.
Clearly things have changed since the release of ghc-6.8.1. I tried the flags on my laptop, and here are the results for N=3000.
C++ g++
=======
real 0m4.553s
user 0m4.551s
sys  0m0.002s

changed one option: -march=nocona

Haskell GHC
===========
real 0m34.392s
user 0m34.316s
sys  0m0.074s
I used `unsafePerformIO' with `INLINE', because I don't know where `inlinePerformIO' lives now. I also changed `-optc-march' to `nocona'.
Using unsafePerformIO here would break some crucial inlining (the same trick is used in Data.ByteString, by the way). You can find inlinePerformIO in Data.ByteString.Internal.

Comparing the two, n=5500, ghc 6.8:

    $ ghc -O -fglasgow-exts -fbang-patterns -optc-O3 -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2 -optc-ffast-math spec.hs -o spec_hs --make

With inlinePerformIO:

    $ time ./spec_hs 5500
    1.274224153
    ./spec_hs 5500  26.32s user 0.00s system 99% cpu 26.406 total

As expected, and comparable to the shootout result for the same N.

With unsafePerformIO, the whole thing falls apart:

    $ time ./spec_hs 5500
    ^Cspec_hs: interrupted
    ./spec_hs 5500  124.86s user 0.11s system 99% cpu 2:05.04 total

I gave up after 2 minutes. This FFI peek/poke code, acting as an ST monad under a pure interface, relies on inlinePerformIO.

And the C++ program, just for comparison:

    $ g++ -c -pipe -O3 -fomit-frame-pointer -march=pentium4 -mfpmath=sse -msse2 spec.c
    $ g++ spec.o -o spec-cpp
    $ time ./spec-cpp 5500
    1.274224153
    ./spec-cpp 5500  18.81s user 0.00s system 99% cpu 18.816 total

So we remain competitive after changing to 6.8. Again, optimised low-level array code is within 2x of optimised C/C++.

-- Don
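
For readers who want to see the shape of the trick, here is a minimal sketch of pure-looking reads from a raw buffer via inlinePerformIO. It is not the benchmark program: `index`, `sumPtr` and `demo` are hypothetical names made up for illustration, and inlinePerformIO is defined locally (assuming GHC, with the usual historical definition) so the sketch does not depend on which bytestring release exports it. Note that the function is only tolerable for cheap, non-allocating reads like these; it gives up the ordering guarantees that unsafePerformIO provides.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import Control.Exception (evaluate)
import Control.Monad (forM_)
import Foreign.Marshal.Array (allocaArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)
import GHC.Base (IO (..), realWorld#)

-- Local stand-in for the inlinePerformIO mentioned above (assumption:
-- this mirrors the classic definition; it is GHC-specific).
-- It runs an IO action against a fake world token and can be inlined
-- completely, unlike unsafePerformIO.
inlinePerformIO :: IO a -> a
inlinePerformIO (IO m) = case m realWorld# of (# _, r #) -> r
{-# INLINE inlinePerformIO #-}

-- Hypothetical helper: a pure-looking read of element i.
index :: Ptr Double -> Int -> Double
index p i = inlinePerformIO (peekElemOff p i)
{-# INLINE index #-}

-- Hypothetical helper: strict left-to-right sum of the first n elements.
sumPtr :: Ptr Double -> Int -> Double
sumPtr p n = go 0 0
  where
    go acc i
      | i >= n = acc
      | otherwise =
          let acc' = acc + index p i
          in acc' `seq` go acc' (i + 1)

-- Fill a temporary buffer with 0, 1, ..., n-1 and sum it.
-- 'evaluate' forces the sum while the buffer is still alive.
demo :: Int -> IO Double
demo n = allocaArray n $ \p -> do
  forM_ [0 .. n - 1] $ \i -> pokeElemOff p i (fromIntegral i)
  evaluate (sumPtr p n)

main :: IO ()
main = demo 1000 >>= print
```

Swapping the body of `index` to use unsafePerformIO from System.IO.Unsafe gives the same answer, but each read then goes through an opaque call in the inner loop, which is the kind of lost inlining described above.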