> Well, if it's "in many ways the same as C", then again it's probably not
> idiomatic Haskell.

It's just a recursive equation for Mandelbrot fractals.  I should have been more precise: I didn't mean that the source is literally the same as the C source (there's no for loop and no mutable variables); rather, it presents the compiler backend with the same advantages that the C backend would have with the equivalent loop.  That is, there's no control-flow obfuscation (dictionaries, calls to unknown targets), no problems with laziness, and the data representations are completely known.


{-# LANGUAGE BangPatterns #-}
import Data.Complex

mandel :: Int -> Complex Double -> Int
mandel max_depth c = loop 0 0
  where
   -- Iterate z := z*z + c until we hit the depth limit or |z| escapes.
   loop i !z
    | i == max_depth     = i
    | magnitude z >= 2.0 = i
    | otherwise          = loop (i+1) (z*z + c)
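For reference, a tiny driver along these lines is enough to exercise the loop.  The depth and grid parameters below are placeholders I've picked for illustration, not necessarily the ones used in the benchmark repository linked at the end; summing the iteration counts forces every call so the work can't be discarded.

main :: IO ()
main = print (sum [ mandel depth (x :+ y)
                  | x <- [-2.0, -2.0 + step .. 1.0]
                  , y <- [-1.0, -1.0 + step .. 1.0] ])
  where
    depth = 1000    -- placeholder iteration cap
    step  = 0.005   -- placeholder grid resolution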

It's also a loop that is readily recognized as a loop.  Now, to my knowledge, GHC doesn't explicitly recognize loops in any stage of the compiler (so as to perform loop optimizations), something which other functional compilers sometimes do.  

But, anyway, it turns out that my example above is easily transformed from a bad GHC performance story into a good one.  If you'll bear with me, I'll show how below.

First, Manuel makes a good point about the LLVM backend.  My "6X" anecdote was from a while ago and I didn't use LLVM [1].  I redid it just now with GHC 7.4.1 + LLVM; results below.  (The table should read correctly in a fixed-width font, but you can also see the data in the spreadsheet here.)

                   Time (ms)   Compiled File size   Compile+Runtime (ms)
GHC 7.4.1 O0       2444        1241K
GHC 7.4.1 O2       925         1132K                1561
GHC 7.4.1 O2 llvm  931         1133K
GHC 7.0.4 O2 via-C 684         974K

So LLVM didn't help [1].  And in fact the deprecated via-C backend did the best!  Compare with G++:

G++ O0             300         9K                   531
G++ O3             110         7K                   347
G++ O3 recursive   116         9K

Uh oh, the "6X" gap I mentioned is now closer to 9X.  And in fact, performing a mini "language shootout" on the above code reveals that GHC is doing worse than not only OCaml but also Chez Scheme, in spite of Scheme's dynamic type checks and necessarily boxed representation of complex numbers:

Chez Scheme 8.4    284         2.7K, not standalone 372
OCaml              166         180K                 301

At least Python does worse!

Python 2.6         1973        NA                   1973

So here's the catch:  if you check the Core and STG, GHC 7.4 is actually compiling the above loop very well.  This microbenchmark turns into just a "magnitude" microbenchmark.  The implementation in Data.Complex uses an EXTREMELY expensive method to avoid overflow [2].
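For the curious, the culprit is roughly the following definition from base's Data.Complex (quoted from memory, so treat it as a paraphrase rather than the exact source).  It rescales both components by a power of two, via exponent and scaleFloat, before squaring, and those primitives are far from free in the inner loop:

-- Paraphrase of base's overflow-avoiding magnitude (not an exact copy):
magnitude :: RealFloat a => Complex a -> a
magnitude (x :+ y) = scaleFloat k
                       (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
  where
    k     = max (exponent x) (exponent y)  -- largest binary exponent
    mk    = -k                             -- scale both parts down by 2^k
    sqr z = z * z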

Since OCaml did well above, I took a look at its standard library's implementation, on line 51 here.  It uses a nice little math trick (the extra division) that is also mentioned on Wikipedia.  By implementing the same trick in Haskell, replacing the standard "magnitude" function, we get the following results.
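Concretely, the two variants measured below look roughly like this (the names are mine, purely for illustration, and this assumes the Data.Complex import shown earlier; the actual patch may differ):

-- No overflow avoidance at all: just the textbook formula.
magnitudePlain :: Complex Double -> Double
magnitudePlain (x :+ y) = sqrt (x*x + y*y)

-- OCaml-style: divide by the larger component so the square can't overflow.
magnitudeOCamlStyle :: Complex Double -> Double
magnitudeOCamlStyle (x :+ y)
  | r == 0    = i
  | i == 0    = r
  | r >= i    = let q = i / r in r * sqrt (1 + q*q)
  | otherwise = let q = r / i in i * sqrt (1 + q*q)
  where
    r = abs x
    i = abs y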

GHC 7.4.1, no overflow avoidance            39          1127K       674
GHC 7.4.1, OCaml-style overflow avoidance   74          1127K

Wow, now not only is the Haskell version faster than OCaml/Scheme, but it is 48% faster than C++, which is appropriate to the title of this email!  Of course, this probably means that C++'s abs is also doing something overly expensive for overflow avoidance (or that its representation of complex numbers is not fully unpacked and stored in registers).  I haven't tracked it down yet.

But in any case, I'll be submitting a library patch.  The moral, I think, is that community members could do a great deal to help "Haskell" performance by simply microbenchmarking and optimizing library routines in base!

Cheers,
  -Ryan

P.S. You can check out the above benchmarks from here: https://github.com/rrnewton/MandelMicrobench

[1] P.P.S. Most concerning to me about Haskell/C++ comparisons are David Peixotto's findings that LLVM's optimizations are not very effective on the LLVM code GHC generates, compared with typical Clang-generated LLVM code.

[2]  P.P.P.S. It turns out there was already a ticket (http://hackage.haskell.org/trac/ghc/ticket/2450) regarding magnitude's performance.  But it still has bad performance even after a small refactoring was performed.