Haskell performance question

I see lots of shootout examples where Haskell programs seem to perform
comparably with C programs, but I find it hard to reproduce anything
like those figures when testing with my own code. So here's a simple
case:
I have this C program:

#include
#define n 100000000
double a[n];
int main() { int i; for (i = 0; i

And this Haskell program:

import Data.Array.IO
import Control.Monad

n = 10000000

main = do
  a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
  forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a (i+1); writeArray a (i+1) (x+y) }
  x <- readArray a (n-1)
  print x

Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail.
What flags do I need to get at least within a factor of 2 or 3 of C? Am I using the wrong kind of array? Or is Haskell always going to be 15-20 times slower for this kind of numerical work?

Hello Dan, Thursday, November 8, 2007, 9:33:12 PM, you wrote:
main = do
  a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
  forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a (i+1); writeArray a (i+1) (x+y) }
  x <- readArray a (n-1)
  print x

1. ghc doesn't implement loop unrolling, even in -O2 mode
2. forM_ may have its own overheads, i'm not sure
3. add strictness annotations:

forM_ [0..n-2] $ \i -> do { return $! i
                          ; x <- readArray a i; return $! x
                          ; y <- readArray a (i+1); return $! y
                          ; writeArray a (i+1) (x+y) }

such cycle should be approx. 3-10 times slower than C one
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
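As an illustration of Bulat's points 2 and 3, one way to sidestep forM_ entirely is an explicit, strict tail-recursive loop over the same IOUArray. This is a sketch, not code from the thread:

{-# LANGUAGE BangPatterns #-}
import Data.Array.IO

n :: Int
n = 10000000

main :: IO ()
main = do
  a <- newArray (0, n-1) 1.0 :: IO (IOUArray Int Double)
  -- explicit tail recursion with a strict index, instead of forM_ over a list
  let go !i
        | i > n-2   = return ()
        | otherwise = do x <- readArray a i
                         y <- readArray a (i+1)
                         writeArray a (i+1) (x+y)
                         go (i+1)
  go 0
  readArray a (n-1) >>= print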

Bulat,
The strictness gave me something like a 10% performance increase
making the Haskell code more than 10 times slower than the C. Is this
the right type of array to use for performance?
--
Dan
On Nov 8, 2007 10:36 AM, Bulat Ziganshin
Hello Dan,
Thursday, November 8, 2007, 9:33:12 PM, you wrote:
main = do
  a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
  forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a (i+1); writeArray a (i+1) (x+y) }
  x <- readArray a (n-1)
  print x

1. ghc doesn't implement loop unrolling, even in -O2 mode
2. forM_ may have its own overheads, i'm not sure
3. add strictness annotations:

forM_ [0..n-2] $ \i -> do { return $! i
                          ; x <- readArray a i; return $! x
                          ; y <- readArray a (i+1); return $! y
                          ; writeArray a (i+1) (x+y) }
such cycle should be approx. 3-10 times slower than C one
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hello Dan, Thursday, November 8, 2007, 10:12:04 PM, you wrote:
The strictness gave me something like a 10% performance increase making the Haskell code more than 10 times slower than the C. Is this the right type of array to use for performance?
yes. but ghc is especially slow doing FP arithmetic. in my own simple test ( a[i]=b[i]+c[i] ) it's 20x slower than gcc, so your results aren't surprising
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin
Hello Dan,
Thursday, November 8, 2007, 9:33:12 PM, you wrote:
main = do
  a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
  forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a (i+1); writeArray a (i+1) (x+y) }
  x <- readArray a (n-1)
  print x

1. ghc doesn't implement loop unrolling, even in -O2 mode
2. forM_ may have its own overheads, i'm not sure
3. add strictness annotations:

forM_ [0..n-2] $ \i -> do { return $! i
                          ; x <- readArray a i; return $! x
                          ; y <- readArray a (i+1); return $! y
                          ; writeArray a (i+1) (x+y) }
such cycle should be approx. 3-10 times slower than C one
I didn't know you can do that. Anyway, I tried OP's C and Haskell versions and the one with your strictness annotations. It seems that the `$!' thing doesn't do much, or it's not doing anything.

OP's C code:
$ time ./test
100000000.000000
real 0m1.128s
user 0m0.572s
sys 0m0.555s

OP's Haskell code with n=1e8:
$ time ./test-ghc
1.0e8
real 0m2.065s
user 0m1.505s
sys 0m0.558s

OP's Haskell code with n=1e8 and your strictness annotations:
$ time ./test-ghc
1.0e8
real 0m2.064s
user 0m1.472s
sys 0m0.591s

However, bad performance observed by OP is not observed by me.

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.8.1
$ gcc --version
gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

-- c/* __o/* <\ * (__ */\ <

Hello Xiao-Yong, Thursday, November 8, 2007, 10:41:11 PM, you wrote:
forM_ [0..n-2] $ \i -> do { return $! i
                          ; x <- readArray a i; return $! x
                          ; y <- readArray a (i+1); return $! y
                          ; writeArray a (i+1) (x+y) }
such cycle should be approx. 3-10 times slower than C one
I didn't know you can do that. Anyway, I tried OP's C and Haskell versions and the one with your strictness annotations. It seems that the `$!' thing doesn't do much, or it's not doing anything.
i didn't say that it will help, just that it may help. to know exactly, one should either understand the ghc strictness analyser or check the asm code
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

"Dan Piponi"
Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail.
import Data.List

main = do print $ foldl' (+) 0 $ take 100000000 [1.0,1.0..]

works 10 times faster than your C version. You just need to adapt to the radically different style of programming.

Real Fortran programmers may write Fortran programs in any programming language :)
-- JID: dottedmag@jabber.dottedmag.net

On Fri, 2007-11-09 at 00:51 +0600, Mikhail Gusarov wrote:
"Dan Piponi"
writes: Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail.
import Data.List
main = do print $ foldl' (+) 0 $ take 100000000 [1.0,1.0..]
works 10 times faster than your C version. You just need to adapt to the radically different style of programming.
Real Fortran programmer may write Fortran programs on any programming language :)
Good point, but this is not true for the OP. Take a look at Dan's blog. It's linked on planet.haskell.org. BTW. I prefer: print 10000000.0 ;)

Mikhail,
main = do print $ foldl' (+) 0 $ take 100000000 [1.0,1.0..]
works 10 times faster than your C version. You just need to adapt to the radically different style of programming.
My example wasn't intended to represent the problem that I'm trying to solve, but the approach I want to take. The problems that I do want to solve don't lend themselves to this kind of approach.

My real situation is that I want to write code that has both a high-level component and a low-level number-crunching component that works on large dense and sparse arrays. Idiomatic Haskell is great for the high-level component. But the question is whether or not it's worth trying to write the low-level code in Haskell or whether I should just write that code in C and use the FFI.

There are still advantages to using Haskell even if the code is highly unidiomatic: you have the freedom of throwing in higher order functions from time to time in the low-level code, and writing mixed-language code is a pain.
--
Dan
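A minimal sketch of the C-plus-FFI route Dan mentions; `step_array` here is a hypothetical C kernel (void step_array(double *a, int n)), not anything that appears in the thread:

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CInt(..))
import Foreign.Marshal.Array (withArray, peekArray)
import Foreign.Ptr (Ptr)

-- Hypothetical C kernel doing the number crunching:
--   void step_array(double *a, int n);
foreign import ccall unsafe "step_array"
  cStepArray :: Ptr Double -> CInt -> IO ()

-- High-level Haskell hands a dense array to the C kernel and reads it back.
runKernel :: [Double] -> IO [Double]
runKernel xs = withArray xs $ \p -> do
  cStepArray p (fromIntegral (length xs))
  peekArray (length xs) p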

"Dan Piponi"
My example wasn't intended to represent the problem that I'm trying to solve, but the approach I want to take. The problems that I do want to solve don't lend themselves to this kind of approach.
My real situation is that I want to write code that has both a high-level component and a low-level number-crunching component that works on large dense and sparse arrays. Idiomatic Haskell is great for the high-level component. But the question is whether or not its worth trying to write the low-level code in Haskell or whether I should just write that code in C and use the FFI.
I've written a bit of number-crunching code in Haskell. My experience is that the Haskell code usually runs about 2 to 5 times slower -- it depends on whether I use the best optimisation scheme or not -- I do not know if I am doing the correct thing most of the time, since there is not much literature online about numerical computing in Haskell, or in FP in general. There is some for OCaml, though.
There are still advantages to using Haskell even if the code is highly unidiomatic: you have the freedom of throwing in higher order functions from time to time in the low-level code, and writing mixed-language code is a pain. -- Dan
I fully agree. Xiao-Yong -- c/* __o/* <\ * (__ */\ <

On Nov 8, 2007 11:34 AM, Jason Dusek
Can you show us your compilation options and timings?
I was simply using -O3. I tried a bunch of other flags (copied from the shootout examples) but they made no appreciable difference. I was getting about 1.5s for the Haskell program and about 0.08s for the C one with the same n=10,000,000. I now see that my ghc is in fact an i386 build whose binary was downloaded from the ghc binary web page. So I'm going to try to make a better comparison with a 64 bit 6.8.1. The batteries on my laptop will probably drop to zero before then however, so it'll be tomorrow at the earliest. -- Dan

dpiponi:
On Nov 8, 2007 11:34 AM, Jason Dusek wrote:
Can you show us your compilation options and timings?
I was simply using -O3. I tried a bunch of other flags (copied from the shootout examples) but they made no appreciable difference.
Argh, -O2 please. -O3 does nothing more, and sometimes does less, depending on the compiler :) Double math always needs the gcc backend, and -optc flags (see the spectral-norm post).
I was getting about 1.5s for the Haskell program and about 0.08s for the C one with the same n=10,000,000.
I'm sure we can do better than that!
I now see that my ghc is in fact an i386 build whose binary was downloaded from the ghc binary web page. So I'm going to try to make a better comparison with a 64 bit 6.8.1. The batteries on my laptop will probably drop to zero before then however, so it'll be tomorrow at the earliest.
If you can post the code somewhere, that would be great, with examples of how to reproduce your timings. -- don

On Nov 8, 2007 12:16 PM, Don Stewart
If you can post the code somewhere, that would be great, with examples of how to reproduce your timings.
The code is exactly what I posted originally (but note that n is 10 times larger in the C code). I compiled using ghc -O3 -o test test.hs. I timed the resulting executable with unix 'time'. I think I used the tiger Intel binary here: http://www.haskell.org/ghc/download_ghc_661.html (ghc --version reports only "The Glorious...version 6.6.1", file ~/lib/ghc-6.6.1/ghc-6.6.1 reports "Mach-O executable i386".) This was all run on a 2.4GHz MacBook Pro with Leopard and 2Gb RAM. Anything else you need?

I was investigating the plausibility of implementing something like a fluid dynamics solver completely in Haskell. I have some working C code so I thought I'd port it. But only if there's a chance of it being within a factor of 2 or 3 of the original speed.
--
Dan

dpiponi:
On Nov 8, 2007 12:16 PM, Don Stewart wrote:
If you can post the code somewhere, that would be great, with examples of how to reproduce your timings.
The code is exactly what I posted originally (but note that n is 10 times larger in the C code). I compiled using ghc -O3 -o test test.hs. I timed the resulting executable with unix 'time'. I think I used the tiger Intel binary here: http://www.haskell.org/ghc/download_ghc_661.html (ghc --version reports only "The Glorious...version 6.6.1", file ~/lib/ghc-6.6.1/ghc-6.6.1 reports "Mach-O executable i386".) This was all run on a 2.4GHz MacBook Pro with Leopard and 2Gb RAM. Anything else you need?
I was investigating the plausibility of implementing something like a fluid dynamics solver completely in Haskell. I have some working C code so I thought I'd port it. But only if there's a chance of it being within a factor of 2 or 3 of the original speed.
Can you start by retrying with flags from the spectral-norm benchmark: http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=ghc&id=0 The interaction with gcc here is quite important, so forcing -fvia-C will matter. -- Don

On Nov 8, 2007 12:36 PM, Don Stewart
dpiponi: Can you start by retrying with flags from the spectral-norm benchmark: http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=ghc&id=0
Actually, that was my starting point for investigating how to optimise Haskell numerical code and I initially used all of those flags. -fvia-C isn't listed among those flags but I also tried adding it anyway. 1.45s every time, they make no difference. (And I made sure I touched the source each time to force recompilation.) It looks like my whole question might become moot with ghc 6.8.1, but so far I've been unable to build it due to the cyclic happy dependency. -- Dan

On Thu, 2007-11-08 at 13:00 -0800, Dan Piponi wrote:
It looks like my whole question might become moot with ghc 6.8.1, but so far I've been unable to build it due to the cyclic happy dependency.
You really do not need happy to build ghc. Just ignore the extralibs tarball. You can install any of those libs that you need later from hackage. Duncan

On Nov 8, 2007 2:48 PM, Duncan Coutts
You really do not need happy to build ghc. Just ignore the extralibs tarball.
Well that was the crucial fact I needed. 6.8.1 is now built. ghci doesn't work, it complains about an unknown symbol '_environ' in HSbase-3.0.0.0.o but I was able to rerun the timings for my code. With -O2 the run time went from about 1.5s to 0.2s. With unsafeRead and unsafeWrite that becomes 0.16s. I can start thinking about writing lots of Fortran in Haskell now. (Anyone have any idea what that missing symbol is about?) -- Dan
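A sketch of the unsafeRead/unsafeWrite variant Dan describes; the exact code isn't in the thread, so this just shows the shape, using the same array and loop as the original program:

{-# LANGUAGE BangPatterns #-}
import Data.Array.Base (unsafeRead, unsafeWrite)
import Data.Array.IO (IOUArray, newArray)

n :: Int
n = 10000000

main :: IO ()
main = do
  a <- newArray (0, n-1) 1.0 :: IO (IOUArray Int Double)
  -- unsafeRead/unsafeWrite skip the bounds checks that readArray/writeArray do
  let go !i
        | i > n-2   = return ()
        | otherwise = do x <- unsafeRead a i
                         y <- unsafeRead a (i+1)
                         unsafeWrite a (i+1) (x+y)
                         go (i+1)
  go 0
  unsafeRead a (n-1) >>= print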

Hello Dan, Friday, November 9, 2007, 3:58:42 AM, you wrote:
HSbase-3.0.0.0.o but I was able to rerun the timings for my code. With -O2 the run time went from about 1.5s to 0.2s. With unsafeRead and unsafeWrite that becomes 0.16s.
cool! seems that small loops now run in registers; in 6.6 each step of such a loop was about 50 instructions long, fetching everything from memory.

just for curiosity, can you try to manually unroll the loop and see the results?

btw, this still doesn't mean that ghc can be used for numeric-intensive code - it should be tried on larger code blocks. but definitely, it's a whole new era in low-level ghc programming
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Nov 8, 2007 10:28 PM, Bulat Ziganshin
just for curiosity, can you try to manually unroll loop and see results?
I don't get any noticeable performance change when I unroll the loop by 2 by hand. I suspect that on a modern CPU loop unrolling probably isn't that much of a win. I'm more interested in strength reduction, especially when computing things like indices into multidimensional arrays. Anyone know how well ghc performs in this area? -- Dan

Dan, On Nov 9, 2007, at 12:58 AM, Dan Piponi wrote:
Well that was the crucial fact I needed. 6.8.1 is now built. ghci doesn't work, it complains about an unknown symbol '_environ' in HSbase-3.0.0.0.o
The installation process strips the binaries which strips away _environ. You need to either find the place where "install-sh -s" is invoked and remove -s or copy ghc from compiler/stage2 into your installation directory. I think this should work. Thanks, Joel -- http://wagerlabs.com

Don Stewart
Can you start by retrying with flags from the spectral-norm benchmark:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=ghc&id=0
The interaction with gcc here is quite important, so forcing -fvia-C will matter.
Clearly things have changed since the release of ghc-6.8.1. I tried them with my laptop, and here are the results for N=3000.

C++ g++
=======
real 0m4.553s
user 0m4.551s
sys 0m0.002s

changed one option: -march=nocona

Haskell GHC
===========
real 0m34.392s
user 0m34.316s
sys 0m0.074s

I used `unsafePerformIO' with `INLINE', because I don't know where `inlinePerformIO' is now. And also the `-optc-march' is changed to `nocona'.

Haskell GHC #2
==============
real 0m20.069s
user 0m20.052s
sys 0m0.017s

The same label `Haskell GHC #2' as on that web page. This is the one without any optimisation. And it outperforms the previous heavily optimised one. Is it because I used `unsafePerformIO' instead of the original `inlinePerformIO'? `-optc-march' is also changed to `nocona'.

Is it only my laptop, or is ghc-6.8.1 indeed much faster than 6.6.1 and the internals have changed so much that the old optimisation no longer works?

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.8.1
$ gcc --version
gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz

Xiao-Yong
-- c/* __o/* <\ * (__ */\ <

xj2106:
Don Stewart writes:
Can you start by retrying with flags from the spectral-norm benchmark:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=ghc&id=0
The interaction with gcc here is quite important, so forcing -fvia-C will matter.
Clearly things have changed since the release of ghc-6.8.1. I tried them with my laptop, and here are the results for N=3000.
C++ g++ =======
real 0m4.553s user 0m4.551s sys 0m0.002s
changed one option: -march=nocona
Haskell GHC ===========
real 0m34.392s user 0m34.316s sys 0m0.074s
I used `unsafePerformIO' with `INLINE', because I don't know where `inlinePerformIO' is now. And also the `-optc-march' is changed to `nocona'.
Using unsafePerformIO here would break some crucial inlining (the same trick is used in Data.ByteString, by the way). You can find inlinePerformIO in Data.ByteString.Internal.

Comparing the two, n=5500, ghc 6.8:

$ ghc -O -fglasgow-exts -fbang-patterns -optc-O3 -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2 -optc-ffast-math spec.hs -o spec_hs --make

With inlinePerformIO:

$ time ./spec_hs 5500
1.274224153
./spec_hs 5500  26.32s user 0.00s system 99% cpu 26.406 total

As expected, and comparable to the shootout result for the same N. With unsafePerformIO, the whole thing falls apart:

$ time ./spec_hs 5500
^Cspec_hs: interrupted
./spec_hs 5500  124.86s user 0.11s system 99% cpu 2:05.04 total

I gave up after 2 minutes. This FFI peek/poke code, acting as an ST monad under a pure interface, relies on inlinePerformIO.

And the C++ program, just for comparison:

$ g++ -c -pipe -O3 -fomit-frame-pointer -march=pentium4 -mfpmath=sse -msse2 spec.c
$ g++ spec.o -o spec-cpp
$ time ./spec-cpp 5500
1.274224153
./spec-cpp 5500  18.81s user 0.00s system 99% cpu 18.816 total

So we remain competitive after changing to 6.8. Again, optimised low-level array code is within 2x of optimised C/C++.

-- Don
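For reference, the inlinePerformIO Don mentions was essentially the following. This is written from memory of the bytestring internals of that era, so treat it as a sketch rather than the exact library source:

{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Base (realWorld#)
import GHC.IO (IO (IO))

-- Runs the IO action "purely" and, crucially, is INLINEd so the optimiser
-- can fuse it into the surrounding loop.  It is only safe where the action
-- really is observationally pure, as in the peek/poke code discussed above.
{-# INLINE inlinePerformIO #-}
inlinePerformIO :: IO a -> a
inlinePerformIO (IO m) = case m realWorld# of (# _, r #) -> r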

Don Stewart
xj2106:
I used `unsafePerformIO' with `INLINE', because I don't know where `inlinePerformIO' is now. And also the `-optc-march' is changed to `nocona'.
Using unsafePerformIO here would break some crucial inlining. (the same trick is used in Data.ByteString, by the way).
You can find inlinePerformIO in Data.ByteString.Internal.
Comparing the two, n=5500, ghc 6.8:
$ ghc -O -fglasgow-exts -fbang-patterns -optc-O3 -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2 -optc-ffast-math spec.hs -o spec_hs --make
With inlinePerformIO:
$ time ./spec_hs 5500 1.274224153 ./spec_hs 5500 26.32s user 0.00s system 99% cpu 26.406 total
As expected, and comparable to the shootout result for the same N. With unsafePerformIO, the whole thing falls apart:
$ time ./spec_hs 5500 ^Cspec_hs: interrupted ./spec_hs 5500 124.86s user 0.11s system 99% cpu 2:05.04 total
I gave up after 2 minutes. This FFI peek/poke code, acting as an ST monad, under a pure interface relies on inlinePerformIO.
Thanks for pointing this out. Xiao-Yong -- c/* __o/* <\ * (__ */\ <

Don Stewart wrote:
dpiponi:
I was getting about 1.5s for the Haskell program and about 0.08s for the C one with the same n=10,000,000.
I'm sure we can do better than that!
That's the spirit! :-D

Speaking of which [yes, I'm going to totally hijack this thread now...], does anybody have a Haskell MD5 hash implementation that goes fast? IIRC, I found one in MissingH, and it worked great. Except that as soon as you feed it a 10 MB file, the standard Unix "md5sum" executable takes about 0.001s to do it, and the Haskell version goes crazy and starts eating virtual memory like candy. o_O (Although given a few minutes it *does* produce the correct answer. But given that I want to run it over an entire CD......)

Given the choice, I'd *like* to find a fast 100% Haskell implementation - but failing that, (nice) bindings to a fast C implementation will do I guess. (I *only* need to compute MD5 hashes for files on disk. I don't need to do anything more fancy than that...)

andrewcoppin:
Don Stewart wrote:
dpiponi:
I was getting about 1.5s for the Haskell program and about 0.08s for the C one with the same n=10,000,000.
I'm sure we can do better than that!
That's the spirit! :-D
Speaking of which [yes, I'm going to totally hijack this thread now...], does anybody have a Haskell MD5 hash implementation that goes fast? IIRC, I found one in MissingH, and it worked great. Except that as soon as you feed it a 10 MB file, the standard Unix "md5sum" executable takes about 0.001s to do it, and the Haskell version goes crazy and starts eating virtual memory like candy. o_O (Although given a few minutes it *does* produce the correct answer. But given that I want to run it over an entire CD......)
Given the choice, I'd *like* to find a fast 100% Haskell implementation - but failing that, (nice) bindings to a fast C implementation will do I guess. (I *only* need to compute MD5 hashes for files on disk. I don't need to do anything more fancy than that...)
Start with a fast C version, and translate that into code over ByteStrings. If it's not within 2x, call the bytestring hackers hotline, which is on the wiki. -- Don

Don Stewart wrote:
andrewcoppin:
Speaking of which [yes, I'm going to totally hijack this thread now...], does anybody have a Haskell MD5 hash implementation that goes fast?
Start with a fast C version, and translate that into code over ByteStrings. If its not within 2x, call the bytestring hackers hotline, which is on the wiki.
I'm liking your attitude more and more. :-D

Glad you asked!

http://sequence.complete.org/node/367

I just posted that last night! Once I get a community.haskell.org login I will put the code on darcs.

The short of it:
1) The code is still ugly, I haven't been motivated to clean it.
2) Manually unrolled, it is ~6 times slower than C.
3) When rolled it is still much slower than that.
4) There is some optimizer bug in GHC - this code could be 2x faster, I feel certain.
5) I benchmarked using a 200MB file, so I think it will handle whatever.

Thomas DuBuisson

On Thu, 2007-11-08 at 22:14 +0000, Andrew Coppin wrote:
Don Stewart wrote:
dpiponi:
I was getting about 1.5s for the Haskell program and about 0.08s for the C one with the same n=10,000,000.
I'm sure we can do better than that!
That's the spirit! :-D
Speaking of which [yes, I'm going to totally hijack this thread now...], does anybody have a Haskell MD5 hash implementation that goes fast? IIRC, I found one in MissingH, and it worked great. Except that as soon as you feed it a 10 MB file, the standard Unix "md5sum" executable takes about 0.001s to do it, and the Haskell version goes crazy and starts eating virtual memory like candy. o_O (Although given a few minutes it *does* produce the correct answer. But given that I want to run it over an entire CD......)
Given the choise, I'd *like* to find a fast 100% Haskell implementation - but failing that, (nice) bindings to a fast C implementation will do I guess. (I *only* need to compute MD5 hashes for files on disk. I don't need to do anything more fancy than that...)
-- "The philosophy behind your actions should never change, on the other hand, the practicality of them is never constant." - Thomas Main DuBuisson

thomas.dubuisson:
Glad you asked!
http://sequence.complete.org/node/367
I just posted that last night! Once I get a a community.haskell.org login I will put the code on darcs.
Cool. I'll look at this. You might like to test against, http://hackage.haskell.org/cgi-bin/hackage-scripts/package/nano-md5-0.1 which is a strict bytestring openssl binding. -- Don
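The nano-md5 approach is, roughly, a thin FFI wrapper over OpenSSL's one-shot MD5 entry point applied to a strict ByteString. A sketch from memory (check openssl/md5.h for the exact prototype on your system):

{-# LANGUAGE ForeignFunctionInterface #-}
import qualified Data.ByteString as S
import qualified Data.ByteString.Unsafe as U
import Data.Word (Word8)
import Foreign.C.Types (CSize(..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, castPtr)

-- unsigned char *MD5(const unsigned char *d, size_t n, unsigned char *md);
foreign import ccall unsafe "openssl/md5.h MD5"
  cMD5 :: Ptr Word8 -> CSize -> Ptr Word8 -> IO (Ptr Word8)

md5 :: S.ByteString -> IO S.ByteString
md5 bs =
  U.unsafeUseAsCStringLen bs $ \(p, len) ->   -- no copy of the input
    allocaBytes 16 $ \digest -> do            -- 16-byte MD5 output buffer
      _ <- cMD5 (castPtr p) (fromIntegral len) digest
      S.packCStringLen (castPtr digest, 16)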

Don Stewart wrote:
You might like to test against,
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/nano-md5-0.1
which is a strict bytestring openssl binding.
Is that likely to work on Windows? Much as I'd love a 100% Haskell implementation [that goes fast], I do also have a certain amount of time pressure here. (I.e., I'm attempting to use Haskell "for work purposes". ;-)

andrewcoppin:
Don Stewart wrote:
You might like to test against,
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/nano-md5-0.1
which is a strict bytestring openssl binding.
Is that likely to work on Windows?
Much as I'd love a 100% Haskell implementation [that goes fast], I do also have a certain amount of time pressure here. (I.e., I'm attempting to use Haskell "for work purposes". ;-)
do you have the OpenSSL library on windows?

Don Stewart wrote:
andrewcoppin:
Don Stewart wrote:
You might like to test against,
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/nano-md5-0.1
which is a strict bytestring openssl binding.
Is that likely to work on Windows?
Much as I'd love a 100% Haskell implementation [that goes fast], I do also have a certain amount of time pressure here. (I.e., I'm attempting to use Haskell "for work purposes". ;-)
do you have the OpenSSL library on windows?
Ah... no, but I understand it has been made to work. (E.g., PuTTY uses it AFAIK, and that works fine here.) Don't know how easy it is to get hold of source... Alternatively, would it be simpler for me to track down the source for the Unix "md5sum" program and do some FFI on that? (Bearing in mind, I don't know how FFI works and I don't know C.) The final alternative is that I just call MD5SUM.EXE from my Haskell program and try to parse the output. But that strikes me as rather messy.
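A sketch of the "call an external md5 tool and parse its output" route Andrew is considering, using System.Process; the executable name is just a placeholder and this is untested on Windows:

import qualified Data.ByteString.Lazy as L
import System.IO (hClose, hGetContents, hSetBinaryMode)
import System.Process (runInteractiveProcess, waitForProcess)

md5ViaExternal :: FilePath -> IO String
md5ViaExternal file = do
  (inH, outH, _errH, ph) <- runInteractiveProcess "md5sum.exe" [] Nothing Nothing
  hSetBinaryMode inH True               -- don't let line-end translation change the hash
  L.readFile file >>= L.hPut inH
  hClose inH
  out <- hGetContents outH
  let hash = takeWhile (/= ' ') out     -- md5sum prints "<hash>  -" for stdin input
  length hash `seq` waitForProcess ph >> return hash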

Hi
The final alternative is that I just call MD5SUM.EXE from my Haskell program and try to parse the output. But that strikes me as rather messy.
Messy, but I don't see any disadvantage to doing it this way - if you can ensure that the MD5SUM program is installed alongside your code.

Of course, there is the standard Crypto library for Haskell, http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Crypto-3.0.3 - either:

1) it goes fast enough
2) it goes too slow and someone should make it go faster

Either way, it should go fast enough as soon as someone needs it to go faster.

Thanks
Neil

Neil Mitchell wrote:
Hi
The final alternative is that I just call MD5SUM.EXE from my Haskell program and try to parse the output. But that strikes me as rather messy.
Messy, but I don't see any disadvantage to doing it this way - if you can control that the MD5SUM program is installed alongside your code.
Ooo... it's all coming back to me now... The MD5SUM.EXE file I have chokes if you ask it to hash a file in another directory. It will hash from stdin, or from a file in the current directory, but point-blank refuses to hash anything else. So I'd have to write my Haskell code to open the file I want and *pipe* it to stdin on MD5SUM.EXE (making sure to not translate line ends or anything else that will alter the hash code). Then I have to parse the output and change it to actually match the UNIX md5sum program. [Because I want to make it compatible with that.] That also involves some fun with line ends...
Of course, there is the standard Crypto library for Haskell, http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Crypto-3.0.3 - either:
1) it goes fast enough 2) it goes too slow and someone should make it go faster
Either way, it should go fast enough as soon as someone needs it to go faster.
I see. Well, I'll see if I can get that working, and then we'll see about making it faster! :-D

Hi
The MD5SUM.EXE file I have chokes if you ask it to hash a file in another directory. It will hash from stdin, or from a file in the current directory, but point-blank refuses to hash anything else.
Try http://www.cs.york.ac.uk/fp/yhc/dependencies/UnxUtils.zip - that has an MD5SUM program in it that seems to work fine on things in different directories. It also has many other great utilities in it. I'm trying to imagine what mistake the authors of your version of MD5SUM must have made to screw up files in different directories, but it eludes me... Thanks Neil

Neil Mitchell wrote:
Hi
The MD5SUM.EXE file I have chokes if you ask it to hash a file in another directory. It will hash from stdin, or from a file in the current directory, but point-blank refuses to hash anything else.
Try http://www.cs.york.ac.uk/fp/yhc/dependencies/UnxUtils.zip - that has an MD5SUM program in it that seems to work fine on things in different directories. It also has many other great utilities in it.
Negative. It gives strange output if the pathname contains any backslashes. (Each backslash appears twice, and an additional backslash appears just before the hash value. Very odd...) I spent a while playing with Google, and found many, many implementations of MD5. Every single one of them did *something* strange under certain conditions. Most frustrating! Well anyway, I eventually settled on a program MD5DEEP.EXE, which seems to work just about well enough to be useful.
I'm trying to imagine what mistake the authors of your version of MD5SUM must have made to screw up files in different directories, but it eludes me...
It seems typically Unix tools are compiled for Windows with the aid of a Unix emulator. These often do all sorts of strange path munging to make Windows look like Unix. That's probably the source of the problem...

BTW, while I'm here... I sat down and wrote my own MD5 implementation. It's now 95% working. (The padding algorithm goes wrong for certain message lengths.) I doubt it'll ever be fast, but I wanted to see how hard it would be to implement.

The hard part, ridiculously enough, wasn't MD5 itself. It's all the datatype conversions. Nowhere in the Haskell libraries can I find any of these functions:

pack8into16 :: [Word8] -> Word16
pack8into32 :: [Word8] -> Word32
unpack16into8 :: Word16 -> [Word8]
unpack32into8 :: Word32 -> [Word8]
pack8into16s :: [Word8] -> [Word16]
pack8into32s :: [Word8] -> [Word32]
etc.

I had to write all these myself, by hand, and then check that I got everything the right way round and so forth. (And every now and then I find an edge case where these functions go wrong.) Of course, on top of that, MD5 uses something really stupid called "little endian integers". In other words, to interpret the data, you have to read it partially backwards, partially forwards. Really awkward to get right!

But, after a few hours last night and a few more this morning, I was able to get the main program to work properly. If I can just straighten out the message padding code, I'll be all set... Then I can see about measuring just how slow it is. :-}

Most amusing moment: Trying to run the GHC debugger, and then realising that you have to actually install the new version of GHC first...
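One possible implementation of the little-endian packing helpers Andrew describes, using Data.Bits (a sketch, not Andrew's actual code; the Word16 versions follow the same pattern):

import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word32)

-- little-endian: the first byte becomes the least significant one, as MD5 wants
pack8into32 :: [Word8] -> Word32
pack8into32 = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

unpack32into8 :: Word32 -> [Word8]
unpack32into8 w = [ fromIntegral (w `shiftR` s) | s <- [0, 8, 16, 24] ]

pack8into32s :: [Word8] -> [Word32]
pack8into32s [] = []
pack8into32s bs = pack8into32 (take 4 bs) : pack8into32s (drop 4 bs)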

Hello Andrew, Saturday, November 17, 2007, 5:45:29 PM, you wrote:
wasn't MD5 itself. It's all the datatype conversions. Nowhere in the Haskell libraries can I find any of these functions:
I had to write all these myself, by hand, and then check that I got
it's a good case for making a useful library and putting it on hackage ;)

you may wonder, but there is no paid army of haskellers who wrote all the libs you require. everything you've seen was written by enthusiasts just because they needed it for their work.

while working on my own program, i've made bindings to aes, twofish, sha512, pkcs5, fortune and a lot of other C libs
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello Andrew,
Saturday, November 17, 2007, 5:45:29 PM, you wrote:
wasn't MD5 itself. It's all the datatype conversions. Nowhere in the Haskell libraries can I find any of these functions:
I had to write all these myself, by hand, and then check that I got
it's a good case for making useful library and put it to hackage ;)
you may wonder, but there is no paid army of haskellers which wrote all the libs you required. everything you've seen are written by enthusiasts just because they need it for their work
Just looked like something that would be frequently required, but anyway... Out of curiosity, what's hackage, and how do you put stuff on it?
while working on my own program, i've made bindings to aes, twofish, sha512, pkcs5, fortune and lot of other C libs
I don't know C, so I can't really write bindings. (I also don't have a C compiler for that matter... Presumably I'd have to move to Linux for that.)

Hello Andrew, Saturday, November 17, 2007, 7:13:23 PM, you wrote:
Out of curiosity, what's hackage, and how do you put stuff on it?
google for "haskell hackage". i never uploaded anything to it, but site should contain instructions
while working on my own program, i've made bindings to aes, twofish, sha512, pkcs5, fortune and lot of other C libs
I don't know C, so I can't really write bindings. (I also don't have a C compiler for that matter... Presumably I'd have to move to Linux for that.)
i don't think that C knowledge is really required, but you should understand the cpu memory model. there is also an "ffi packaging tool" which should generate bindings automatically. C and C++ compilers are shipped as part of the GHC/win32 distribution
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:
Hello Andrew,
Saturday, November 17, 2007, 5:45:29 PM, you wrote:
wasn't MD5 itself. It's all the datatype conversions. Nowhere in the Haskell libraries can I find any of these functions:
I had to write all these myself, by hand, and then check that I got
it's a good case for making useful library and put it to hackage ;)
you may wonder, but there is no paid army of haskellers which wrote all the libs you required. everything you've seen are written by enthusiasts just because they need it for their work
while working on my own program, i've made bindings to aes, twofish, sha512, pkcs5, fortune and lot of other C libs
Can you upload them to hackage Bulat? Under the Crypto category. Even just the raw FFI decls will be useful. -- Don

BTW, while I'm here... I sat down and wrote my own MD5 implementation.
How is the performance on this new MD5 routine? It looks like we have gone from just one Haskell MD5 implementation (that I know of) to three in a short period of time. This isn't counting the C bindings, of course. Also, I changed the license of my implementation to BSD3 a bit ago, so you can use that pretty much as you please.

Thomas DuBuisson wrote:
BTW, while I'm here... I sat down and wrote my own MD5 implementation.
How is the performance on this new MD5 routine?
Ask me *after* I modify it to give the correct answers. ;-)

Interesting question: How do you determine when an implementation of something as complex as MD5 is actually "correct"? I might get it so it passes all the tests I've tried, but there's some obscure edge case that makes it fail. How would you know that? Hmm, in fact... how do I know the implementation(s) I'm checking my program *against* are correct? Oh noes! I'm becoming a paranoid cryptographer! LOL.
It looks like we have gone from just one Haskell MD5 implementation (that I know of) to three in a short period of time. This isn't counting the C bindings, of course.
Also, I changed the license of my implementation to BSD3 a bit ago, so you can use that pretty much as you please.
Yeah, there seem to be a few different MD5 implementations floating around. As far as I know, mine is unique in that it's 100% Haskell and requires nothing aside from the libraries shipping with GHC in order to compile. (E.g., I downloaded somebody else's, and it just wouldn't compile. It was looking for modules that don't exist.)

On Sat, 2007-11-17 at 16:40 +0000, Andrew Coppin wrote:
Thomas DuBuisson wrote:
BTW, while I'm here... I sat down and wrote my own MD5 implementation.
How is the performance on this new MD5 routine?
Ask me *after* I modify it to give the correct answers. ;-)
Interesting question: How do you determine when an implementation of something as complex as MD5 is actually "correct"? I might get it so it passes all the tests I've tried, but there's some obscure edge case that makes it fail. How would you know that? Hmm, in fact... how do I know the implementation(s) in checking my program *against* are correct?
It's a small enough program that it should not be that hard to prove that it does the same thing as the reference implementation (although, that one may be broken...)
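One cheap sanity check is the test suite in the appendix of RFC 1321; a sketch, where md5hex is a placeholder for whichever implementation is being tested:

-- md5hex stands for the implementation under test (hypothetical name)
md5hex :: String -> String
md5hex = error "plug in the MD5 implementation to test"

-- RFC 1321 appendix test vectors
testVectors :: [(String, String)]
testVectors =
  [ (""   , "d41d8cd98f00b204e9800998ecf8427e")
  , ("a"  , "0cc175b9c0f1b6a831c399e269772661")
  , ("abc", "900150983cd24fb0d6963f7d28e17f72")
  ]

main :: IO ()
main = mapM_ check testVectors
  where
    check (input, expected)
      | md5hex input == expected = putStrLn ("ok:   " ++ show input)
      | otherwise                = putStrLn ("FAIL: " ++ show input)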

On Nov 17, 2007 11:40 AM, Andrew Coppin
As far as I know, mine is unique in that it's 100% Haskell and requires nothing aside from the libraries shipping with GHC in order to compile. (E.g., I downloaded somebody else's, and it just wouldn't compile. It was looking for modules that don't exist.)
Mine's also a pure Haskell MD5 routine, though you do need Data.Binary (which could be eliminated with minor effort). Let me know if that "somebody else's" MD5 routine that wouldn't compile was mine and what the issue was - I'd be happy to fix it up a little if it would help.

Thomas DuBuisson wrote:
BTW, while I'm here... I sat down and wrote my own MD5 implementation.
Huzzah! It works! :-D I had a silly bug where somewhere deep in the heart of the huge complex message padding algorithm, I forgot to add on the cumulative total to the message size count. This results in the number of bits in the *final* block being used, rather than the number of bits in the entire message. Oops!
How is the performance on this new MD5 routine?
Not good. (Surprised?)

I told it to hash a 1 MB file, and there was a noticeable split-second pause. I told it to hash a 400 MB file, and... well, after about 1 minute wall time and 200 MB RAM I killed the process. RAM usage seems to grow linearly with time, which is Not Good(tm). To me, this suggests that something somewhere isn't as lazy as it should be. Hmm.

Well whatever, next stop is the GHC profiler to see where my resources are going. Given the above bug, it's probably something easily fixable...
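The usual shape that avoids this kind of leak is to read the file lazily and fold the block function strictly over the chunks. A sketch, where Digest and compressChunk are hypothetical stand-ins for the real MD5 state and compression step:

import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.List (foldl')

-- Stand-in for the four 32-bit MD5 state words; in real code use a data
-- type with strict fields so foldl' fully forces the state each step.
type Digest = (Int, Int, Int, Int)

initialState :: Digest
initialState = (0, 0, 0, 0)

-- Placeholder for the MD5 block step over one chunk; a real version also
-- has to carry partial 64-byte blocks across chunk boundaries.
compressChunk :: Digest -> S.ByteString -> Digest
compressChunk (a, b, c, d) chunk = (a + S.length chunk, b, c, d)

hashFile :: FilePath -> IO Digest
hashFile path = do
  bytes <- L.readFile path                      -- lazy, chunked reading
  -- foldl' forces the state once per chunk instead of building a thunk chain
  return $! foldl' compressChunk initialState (L.toChunks bytes)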

On 2007-11-17, Andrew Coppin
pack8into16 :: [Word8] -> Word16
pack8into32 :: [Word8] -> Word32
unpack16into8 :: Word16 -> [Word8]
unpack32into8 :: Word32 -> [Word8]
pack8into16s :: [Word8] -> [Word16]
pack8into32s :: [Word8] -> [Word32]
etc.
I had to write all these myself, by hand, and then check that I got everything the right way round and so forth. (And every now and then I find an edge case where these functions go wrong.)
Well, you know, some of these are really the wrong signatures:

pack8into16 :: (Word8, Word8) -> Word16
pack8into32 :: (Word8, Word8, Word8, Word8) -> Word32
unpack16into8 :: Word16 -> (Word8, Word8)
unpack32into8 :: Word32 -> (Word8, Word8, Word8, Word8)

curry the above to taste.

-- Aaron Denney -><-

Aaron Denney wrote:
On 2007-11-17, Andrew Coppin wrote:

pack8into16 :: [Word8] -> Word16
pack8into32 :: [Word8] -> Word32
unpack16into8 :: Word16 -> [Word8]
unpack32into8 :: Word32 -> [Word8]
pack8into16s :: [Word8] -> [Word16]
pack8into32s :: [Word8] -> [Word32]
etc.
I had to write all these myself, by hand, and then check that I got everything the right way round and so forth. (And every now and then I find an edge case where these functions go wrong.)
Well, you know, some of these are really the wrong signatures:
pack8into16 :: (Word8, Word8) -> Word16 pack8into32 :: (Word8, Word8, Word8, Word8) -> Word32 unpack16into8 :: Word16 -> (Word8, Word8) unpack32into8 :: Word32 -> (Word8, Word8, Word8, Word8)
curry the above to taste.
Yeah, but

unpack16into8s = concatMap unpack16into8

;-) Now if you just define some function

splitN :: Int -> [x] -> [[x]]

(I'm sure we've debated why this isn't defined already...), we can even write

pack8into16s = map pack8into16 . splitN 16

And so we continue...
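The splitN Andrew wishes for is a two-liner; a sketch:

splitN :: Int -> [x] -> [[x]]
splitN _ [] = []
splitN k xs = take k xs : splitN k (drop k xs)

-- e.g. pack8into16s = map pack8into16 . splitN 2   (two Word8s per Word16)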

On Fri, Nov 09, 2007 at 09:05:58PM +0000, Andrew Coppin wrote:
The MD5SUM.EXE file I have chokes if you ask it to hash a file in another directory. It will hash from stdin, or from a file in the current directory, but point-blank refuses to hash anything else. So I'd have to write my Haskell code to open the file I want and *pipe* it to stdin on MD5SUM.EXE (making sure to not translate line ends or anything else that will alter the hash code). Then I have to parse the output and change it to actually match the UNIX md5sum program. [Because I want to make it compatible with that.] That also involves some fun with line ends...
I'm drawing a blank on the windows API, but at least in DOS it was possible to redirect standard input from a file without involving your program. Stefan

Neil Mitchell
Hi
The final alternative is that I just call MD5SUM.EXE from my Haskell program and try to parse the output. But that strikes me as rather messy.
Messy, but I don't see any disadvantage to doing it this way - if you can control that the MD5SUM program is installed alongside your code.
Of course, there is the standard Crypto library for Haskell, http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Crypto-3.0.3 - either:
1) it goes fast enough 2) it goes too slow and someone should make it go faster
Either way, it should go fast enough as soon as someone needs it to go faster.
Thanks
Neil
Of course the code in the Crypto library isn't fast enough. I maintain the package but at the moment I don't have any time to do anything with it. If anyone wants to send me patches, I will happily apply them.

I did have a quick look at Thomas DuBuisson's code and it looks like a change to the API. If there is going to be a change to this then I think it ought to go via the libraries change process. I recall that someone improved the performance of SHA-1 as well but again I haven't had time to do anything with it.

If anyone wants to take over the care and nurture of the Crypto library then they are most welcome and I would be very grateful.

Dominic.

I made minor changes, fixing up my chunking function (finally), thus eliminating the space leak. Performance is now under 3x that of C! Yay! Also, nano-md5 benched at 1.15x C (for files small enough for strict ByteStrings to do ok).

Get the code: darcs get http://code.haskell.org/~tommd/pureMD5

On the 2GB benchmark it is even more competitive (see my blog on sequence.complete). Let me know if you get significantly different results (and you will if your IO doesn't horribly bottleneck you like on my laptop).

-Tom
You might like to test against,
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/nano-md5-0.1
which is a strict bytestring openssl binding.
-- Don -- "The philosophy behind your actions should never change, on the other hand, the practicality of them is never constant." - Thomas Main DuBuisson

On Thu, Nov 08, 2007 at 06:14:20PM -0500, Thomas M. DuBuisson wrote:
Glad you asked!
http://sequence.complete.org/node/367
I just posted that last night! Once I get a a community.haskell.org login I will put the code on darcs.
The short of it it: 1) The code is still ugly, I haven't been modivated to clean. 2) Manually unrolled, it is ~ 6 times slower than C 3) When Rolled it is still much slower than that 4) There is some optimizer bug in GHC - this code could be 2x faster, I feel certain. 5) I benchmarked using a 200MB file, so I think it will handle whatever.
Why did you put yourself through all this pain when you could have just copied the code from md5sum(1), removed the main function, and foreign imported its buffer accumulator wrapping it as a function over lazy bytestrings? We have the best foreign function interface in the world. Reinventing wheels is stupid, especially if the existing wheels are this easy to use. Stefan

On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote:
I see lots of shootout examples where Haskell programs seem to perform comparably with C programs, but I find it hard to reproduce anything like those figures when testing with my own code. So here's a simple case:
I have this C program:
#include
#define n 100000000
double a[n];
int main() { int i; for (i = 0; i
And this Haskell program:
import Data.Array.IO
import Control.Monad

n = 10000000

main = do
  a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
  forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a (i+1); writeArray a (i+1) (x+y) }
  x <- readArray a (n-1)
  print x
Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail.
What flags do I need to get at least within a factor of 2 or 3 of C? Am I using the wrong kind of array? Or is Haskell always going to be 15-20 times slower for this kind of numerical work?
I tried both, but it would be really helpful to see your command line options. I used:

$ gcc -O3 ghc-bench.c -o ghcbC
ghc-bench.c: In function ‘main’:
ghc-bench.c:16: warning: incompatible implicit declaration of built-in function ‘printf’
$ ghc --make -O2 ghc-bench.hs

and got:

$ time ./ghc-bench
2.0e7
real 0m0.714s
user 0m0.576s
sys 0m0.132s

$ time ./ghcbC
20000000.000000
real 0m0.305s
user 0m0.164s
sys 0m0.132s

This is on a first-gen Macbook running Ubuntu. 1GB RAM. 1.83Ghz Core Duo

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.8.0.20071019
$ gcc --version
gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote:
$ ghc --make -O2 ghc-bench.hs
and got:
$ time ./ghc-bench 2.0e7
real 0m0.714s user 0m0.576s sys 0m0.132s
$ time ./ghcbC 20000000.000000
real 0m0.305s user 0m0.164s sys 0m0.132s
This is on a first-gen Macbook running Ubuntu. 1GB RAM. 1.83Ghz Core Duo
$ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.0.20071019
$ gcc --version gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -O3 ghc-bench.c -o ghcbC
ghc-bench.c: In function ‘main’:
ghc-bench.c:16: warning: incompatible implicit declaration of built-in function ‘printf’
-O3 is worse than -O0, DO NOT USE IT. I reported the bug several months ago. In the meantime, use -O2. (there are no high optimizations, it's just that the option parser misparses -O3) Stefan

On Thu, 2007-11-08 at 16:24 -0800, Stefan O'Rear wrote:
On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote:
$ ghc --make -O2 ghc-bench.hs
and got:
$ time ./ghc-bench 2.0e7
real 0m0.714s user 0m0.576s sys 0m0.132s
$ time ./ghcbC 20000000.000000
real 0m0.305s user 0m0.164s sys 0m0.132s
This is on a first-gen Macbook running Ubuntu. 1GB RAM. 1.83Ghz Core Duo
$ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.0.20071019
$ gcc --version gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -O3 ghc-bench.c -o ghcbC
ghc-bench.c: In function ‘main’:
ghc-bench.c:16: warning: incompatible implicit declaration of built-in function ‘printf’
-O3 is worse than -O0, DO NOT USE IT.
I reported the bug several months ago. In the meantime, use -O2.
(there are no high optimizations, it's just that the option parser misparses -O3)
Stefan
Even for GCC (/not/ G_H_C)? I know about the GHC bug, that's why I used -O2 for GHC/Haskell and -O3 for GCC/C. /Thomas

On Fri, Nov 09, 2007 at 01:39:55AM +0100, Thomas Schilling wrote:
On Thu, 2007-11-08 at 16:24 -0800, Stefan O'Rear wrote:
On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote:
$ ghc --make -O2 ghc-bench.hs
Even for GCC (/not/ G_H_C)?
No, GCC implements -Ox properly.
I know about the GHC bug, that's why I used -O2 for GHC/Haskell and -O3 for GCC/C.
You just contradicted the line I quoted; was it in error? Stefan

On Thu, Nov 08, 2007 at 05:03:54PM -0800, Stefan O'Rear wrote:
On Fri, Nov 09, 2007 at 01:39:55AM +0100, Thomas Schilling wrote:
On Thu, 2007-11-08 at 16:24 -0800, Stefan O'Rear wrote:
On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote:
$ ghc --make -O2 ghc-bench.hs
Even for GCC (/not/ G_H_C)?
No, GCC implements -Ox properly.
I know about the GHC bug, that's why I used -O2 for GHC/Haskell and -O3 for GCC/C.
You just contradicted the line I quoted; was it in error?
Nevermind, I'm just blind today. (thanks Shachaf for bringing this to my attention...) Stefan

On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote:
I see lots of shootout examples where Haskell programs seem to perform comparably with C programs, but I find it hard to reproduce anything like those figures when testing with my own code. So here's a simple case:
I have this C program:
#include
#define n 100000000
double a[n];
int main() { int i; for (i = 0; i
And this Haskell program:
import Data.Array.IO
import Control.Monad

n = 10000000

main = do
  a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
  forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a (i+1); writeArray a (i+1) (x+y) }
  x <- readArray a (n-1)
  print x
Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail.
What flags do I need to get at least within a factor of 2 or 3 of C? Am I using the wrong kind of array? Or is Haskell always going to be 15-20 times slower for this kind of numerical work?
Wow. You should *really* try using GHC 6.8.1:

GHC 6.6.1 (-O2)
real 0m7.062s
user 0m5.464s
sys 0m0.288s

GHC 6.8.0.20071019
real 0m0.723s
user 0m0.616s
sys 0m0.100s

result is "2.0e7" for both

On Nov 8, 2007 11:24 AM, Thomas Schilling
Wow. You should *really* try using GHC 6.8.1:
I was hoping you weren't going to say that :-) As soon as I find a suitable 64-bit Intel binary for MacOSX, or can bootstrap my way out of happy needing happy in my attempted source build, I'll report back on how my code performs. -- Dan

nominolo:
On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote:
I see lots of shootout examples where Haskell programs seem to perform comparably with C programs, but I find it hard to reproduce anything like those figures when testing with my own code. So here's a simple case:
I have this C program:
#include
#define n 100000000
double a[n];
int main() { int i; for (i = 0; i
And this Haskell program:
import Data.Array.IO
import Control.Monad

n = 10000000

main = do
  a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
  forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a (i+1); writeArray a (i+1) (x+y) }
  x <- readArray a (n-1)
  print x
Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail.
What flags do I need to get at least within a factor of 2 or 3 of C? Am I using the wrong kind of array? Or is Haskell always going to be 15-20 times slower for this kind of numerical work?
Wow. You should *really* try using GHC 6.8.1:
GHC 6.6.1 (-O2)
real 0m7.062s user 0m5.464s sys 0m0.288s
GHC 6.8.0.20071019
real 0m0.723s user 0m0.616s sys 0m0.100s
result is "2.0e7" for both
So that looks like a bug in the simplifier in 6.6.1 that has been fixed. This should go in the testsuite, perhaps, Simon? -- Don

dpiponi:
I see lots of shootout examples where Haskell programs seem to perform comparably with C programs, but I find it hard to reproduce anything like those figures when testing with my own code. So here's a simple case:
I have this C program:
#include
#define n 100000000
double a[n];
int main() { int i; for (i = 0; i
And this Haskell program:
import Data.Array.IO
import Control.Monad

n = 10000000

main = do
  a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
  forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a (i+1); writeArray a (i+1) (x+y) }
  x <- readArray a (n-1)
  print x
Be really careful with Doubles (i.e. check what Int arrays also do), since you need special flags to get good performance from unboxed Double arrays. The best effort I know of is:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=all

which took some time to work out the right path through the compiler. In essence:

* some gcc flags: -optc-O3 -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2 -optc-ffast-math
* Foreign.Marshal.Array
* type Reals = Ptr Double
* inlinePerformIO

In the end I'm stunned at how well we did there, since Double math has been a long term issue. Perhaps there are other lessons we can extract from that code?

-- Don
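For concreteness, here is the original benchmark written against the ingredients Don lists (a raw Ptr Double via Foreign.Marshal.Array). This is a sketch along those lines, not the shootout code itself:

{-# LANGUAGE BangPatterns #-}
import Foreign.Marshal.Array (mallocArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)

n :: Int
n = 10000000

main :: IO ()
main = do
  a <- mallocArray n :: IO (Ptr Double)
  let fill !i | i >= n    = return ()
              | otherwise = pokeElemOff a i 1.0 >> fill (i+1)
      go !i   | i > n-2   = return ()
              | otherwise = do x <- peekElemOff a i
                               y <- peekElemOff a (i+1)
                               pokeElemOff a (i+1) (x+y)
                               go (i+1)
  fill 0
  go 0
  peekElemOff a (n-1) >>= print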

Hello Don, Thursday, November 8, 2007, 10:53:28 PM, you wrote:
a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a (i+1); writeArray a (i+1) (x+y) }
oh, i was stupid. obviously, the first thing you need to do is to use the unsafeRead/unsafeWrite operations. i wonder how this code is only 20x slower than the C version; maybe you need to use gcc -O3 -funroll-loops and gcc3 (because gcc4 became slower)
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:
Hello Don,
Thursday, November 8, 2007, 10:53:28 PM, you wrote:
a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a (i+1); writeArray a (i+1) (x+y) }
oh, i was stupid. obviously, first thing you need to do is to use unsafeRead/unsafeWrite operations. i'm wonder how this code is only 20x slower than C version, may be you need to use gcc -O3 -funroll-loops and gcc3 (because gcc4 becomes slower)
I think you use a different computer to me. Out of the box, amd64, ghc 6.8, with the same n=100000000, Dan's original code runs:

$ ghc -O2 A.hs
$ time ./A_slow
1.0e8
./A_slow  1.60s user 0.50s system 95% cpu 2.193 total

and the C program:

$ gcc -O2 A.c -o A_c
$ time ./A_c
100000000.000000
./A_c  1.03s user 0.52s system 96% cpu 1.612 total

without using any special flags. A 1.3x slowdown such as this is more typical of low level array programs. If I saw a >5x slowdown it would be considered a bug, and it's been a /long/ time since I've seen a 20x slowdown.

Note that with ghc 6.6, we get:

$ time ./A_66
A_66: out of memory (requested 1048576 bytes)
./A_66  4.94s user 0.78s system 99% cpu 5.735 total

So I know which compiler I prefer.

-- Don
participants (17)

- Aaron Denney
- Andrew Coppin
- Bulat Ziganshin
- Dan Piponi
- Derek Elkins
- Dominic Steinitz
- Don Stewart
- Duncan Coutts
- Jason Dusek
- Joel Reymont
- Mikhail Gusarov
- Neil Mitchell
- Stefan O'Rear
- Thomas DuBuisson
- Thomas M. DuBuisson
- Thomas Schilling
- Xiao-Yong Jin