
Hello Don,

Friday, February 20, 2009, 7:41:33 PM, you wrote:
main = print $ sum[1..10^9::Int]
This won't be comparable to your loop below, as 'sum' is a left fold (which doesn't fuse under build/foldr).
You should use the list implementation from the stream-fusion package (or uvector) if you're expecting it to fuse to the following loop:
it was a comparison of native haskell, low-level haskell (which is harder to write than native C) and native C. stream-fusion and other packages provide libraries for some tasks, but they can't make maps faster, for example. so i used a plain list
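For readers following along, the two Haskell styles being compared can be sketched roughly as below. This is only an illustration: the names `sumNative` and `sumLoop` are hypothetical, the bound is a parameter instead of the 10^9 used in the thread, and the actual low-level source that was benchmarked is not shown in this message.

```haskell
{-# LANGUAGE BangPatterns #-}

-- "Native" style: the one-liner from the thread, with the upper bound
-- passed in as a parameter (the thread uses 10^9).
sumNative :: Int -> Int
sumNative n = sum [1 .. n]

-- "Low-level" style (hypothetical sketch): an explicit tail-recursive
-- loop with a strict accumulator, so no list needs to be built.
sumLoop :: Int -> Int
sumLoop n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = do
  print (sumNative 1000)
  print (sumLoop 1000)
```

Both functions compute the same value; the difference under discussion is only how much work the compiler has to do to turn each into a tight loop.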
Which seems ... OK.
really? :D
Well, that's a bit different. It doesn't print the result, and it returns a different result on 64 bit....
doesn't matter for testing speed
I don't get anything near the 0.062s which is interesting.
it was a beautiful gcc optimization - it added 8 values at once. with xor the results are:

xor.hs        12.605
xor-fast.hs    1.856
xor.cpp        0.339
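The sources of xor.hs and xor-fast.hs are not included in this message. As a rough sketch of what such a variant might look like, assuming it simply replaces addition with `xor` from `Data.Bits` (the names `xorNative` and `xorLoop` are hypothetical):

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Bits (xor)
import Data.List (foldl')

-- List-based version: fold xor over [1 .. n] with a strict left fold.
xorNative :: Int -> Int
xorNative n = foldl' xor 0 [1 .. n]

-- Hand-written loop version (hypothetical sketch): a strict accumulator
-- and an explicit counter instead of a list.
xorLoop :: Int -> Int
xorLoop n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc `xor` i) (i + 1)

main :: IO ()
main = do
  print (xorNative 1000)
  print (xorLoop 1000)
```

The idea of the swap is to keep the shape of the loop while changing the operation the C compiler had been adding 8 values at a time.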
The print statement slows things down, I guess...
do you really believe that printing one number takes so much time? :)
So we have:
ghc -fvia-C -O2            1.127
ghc -fasm                  1.677
gcc -O0                    4.500
gcc -O3 -funroll-loops     0.318
why not compare to ghc -O0? also you can disable loop unrolling in gcc and unroll loops manually in haskell. or you can generate asm code on the fly. there are plenty of tricks to "prove" that gcc generates bad code :D
So. some lessons. GHC is around 3-4x slower on this tight loop. (Which isn't as bad as it used to be).
really? what i see: low-level haskell code is usually 3 times harder to write and 3 times slower than gcc code. native haskell code is tens to thousands of times slower than C code (just recall that real programs use type classes and monads in addition to laziness)
That's actually a worse margin than any current shootout program, where we are no worse than 2.9x slower on larger things:
1) most benchmarks there depend on library speed. in one test, for example, php is the winner
2) for the sum program, ghc libs were modified to win the benchmark
3) the remaining 1 or 2 programs that measure the speed of ghc-generated code were heavily optimized using low-level code, so they have nothing in common with the real haskell code most of us write every day
Now, given GHC gets most of the way there -- I think this might make a good bug report against GHC head, so we can see if the new register allocator helps any.
you mean that 6.11 includes the new allocator? in that case you can test it too. i believe that ghc developers are able to test sum performance without my bug reports :D

--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com