Well, if it's "in many ways the same as C", then again it's probably not
idiomatic Haskell.
It's just a recursive equation for mandelbrot fractals. I should have been precise, I didn't mean that the source is literally the same as the C source (i.e. there's no for loop, no mutable variables), rather that it presents the compiler backend with the same advantages as the C backend would have with the equivalent loop. That is, there's no control flow obfuscation (dictionaries, calls to unknown targets), no problems with laziness, and the data representations are completely known.
mandel :: Int -> Complex Double -> Int
mandel max_depth c = loop 0 0
where
loop i !z
| i == max_depth = i
| magnitude z >= 2.0 = i
| otherwise = loop (i+1) (z*z + c)
It's also a loop that is readily recognized as a loop. Now, to my knowledge, GHC doesn't explicitly recognize loops in any stage of the compiler (so as to perform loop optimizations), something which other functional compilers sometimes do.
But, anyway, it turns out that my example above is easily transformed from a bad GHC performance story into a good one. If you'll bear with me, I'll show how below.
First, Manuel makes a good point about the LLVM backend. My "6X" anecdote was from a while ago and I didn't use llvm [1]. I redid it just now with 7.4.1+LLVM, results below. (The below table should read correctly in fixed width font,
but you can also see the data in the spreadsheet here.)
Time (ms) Compiled File size Comple+Runtime (ms)
GHC 7.4.1 O0 2444 1241K
GHC 7.4.1 O2 925 1132K 1561
GHC 7.4.1 O2 llvm 931 1133K
GHC 7.0.4 O2 via-C 684 974K
So LLVM didn't help [1]. And in fact the deprecated via-C backend did the best! Compare with GCC:
G++ O0 300 9K 531
G++ O3 110 7K 347
G++ O3 recursive 116 9K
Uh oh, the "6X" gap I mentioned is now closer to 9X. And, in fact, performing a mini "language shootout" on the above code, reveals that GHC is doing worse than not only OCaml, but Chez Scheme, in spite of dynamic type checks, and a necessarily boxed representation of complex numbers:
Chez Scheme 8.4 284 2.7K notStandalone 372
OCaml 166 180K 301
At least Python does worse!
Python 2.6 1973 NA 1973
GHC 7.4.1 No
Overflow Avoidance 39 1127K 674
GHC 741, OCaml-style
Overflow avoidance 74 1127K
Wow, now not only is the Haskell version faster than OCaml/Scheme, but it is 48% faster than C++, which is appropriate to the title of this email! Of course, this probably means that C++'s abs is also doing something overly expensive for overflow avoidance (or that their representation of complex numbers is not fully unpacked and stored in registers) I haven't tracked it down yet.
But in any case, I'll be submitting a library patch. The moral, I think, is that community members could do a great deal to help "Haskell" performance by simply microbenchmarking and optimizing library routines in base!
Cheers,
-Ryan
[1] P.P.S. Most concerning to me about Haskell/C++ comparisons are David Peixotto's findings that LLVM optimizations are not very effective on Haskell-generated LLVM compared with typical clang-generated LLVM.