> Well, if it's "in many ways the same as C", then again it's probably not
> idiomatic Haskell.

It's just a recursive equation for Mandelbrot fractals.  I should have been more precise: I didn't mean that the source is literally the same as the C source (there's no for loop and no mutable variables); rather, it presents the compiler backend with the same advantages that the C backend would have with the equivalent loop.  That is, there's no control-flow obfuscation (dictionaries, calls to unknown targets), no problems with laziness, and the data representations are completely known.


{-# LANGUAGE BangPatterns #-}
import Data.Complex

mandel :: Int -> Complex Double -> Int
mandel max_depth c = loop 0 0
  where
   -- Iterate z := z*z + c until we hit the depth limit or |z| escapes.
   loop i !z
    | i == max_depth     = i
    | magnitude z >= 2.0 = i
    | otherwise          = loop (i+1) (z*z + c)
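For reference, a tiny driver along these lines is enough to exercise the loop.  The depth and grid parameters below are placeholders I've picked for illustration, not necessarily the ones used in the benchmark repository linked at the end; summing the iteration counts forces every call so the work can't be discarded.

main :: IO ()
main = print (sum [ mandel depth (x :+ y)
                  | x <- [-2.0, -2.0 + step .. 1.0]
                  , y <- [-1.0, -1.0 + step .. 1.0] ])
  where
    depth = 1000    -- placeholder iteration cap
    step  = 0.005   -- placeholder grid resolution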

It's also a loop that is readily recognized as a loop.  Now, to my knowledge, GHC doesn't explicitly recognize loops in any stage of the compiler (so as to perform loop optimizations), something which other functional compilers sometimes do.  

But, anyway, it turns out that my example above is easily transformed from a bad GHC performance story into a good one.  If you'll bear with me, I'll show how below.

First, Manuel makes a good point about the LLVM backend.  My "6X" anecdote was from a while ago and I didn't use LLVM [1].  I redid it just now with GHC 7.4.1 + LLVM; results below.  (The table should read correctly in a fixed-width font, but you can also see the data in the spreadsheet here.)

                   Time (ms)   Compiled File size   Compile+Runtime (ms)
GHC 7.4.1 O0       2444        1241K
GHC 7.4.1 O2       925         1132K                1561
GHC 7.4.1 O2 llvm  931         1133K
GHC 7.0.4 O2 via-C 684         974K

So LLVM didn't help [1].  And in fact the deprecated via-C backend did the best!  Compare with G++:

G++ O0             300         9K                   531
G++ O3             110         7K                   347
G++ O3 recursive   116         9K

Uh oh, the "6X" gap I mentioned is now closer to 9X.  And in fact, performing a mini "language shootout" on the above code reveals that GHC is doing worse than not only OCaml but also Chez Scheme, in spite of Scheme's dynamic type checks and necessarily boxed representation of complex numbers:

Chez Scheme 8.4    284         2.7K, not standalone 372
OCaml              166         180K                 301

At least Python does worse!

Python 2.6         1973        NA                   1973

So here's the catch:  if you check the Core and STG, GHC 7.4 is actually compiling the above loop very well.  This microbenchmark turns into just a "magnitude" microbenchmark.  The implementation in Data.Complex uses an EXTREMELY expensive method to avoid overflow [2].
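For the curious, the culprit is roughly the following definition from base's Data.Complex (quoted from memory, so treat it as a paraphrase rather than the exact source).  It rescales both components by a power of two, via exponent and scaleFloat, before squaring, and those primitives are far from free in the inner loop:

-- Paraphrase of base's overflow-avoiding magnitude (not an exact copy):
magnitude :: RealFloat a => Complex a -> a
magnitude (x :+ y) = scaleFloat k
                       (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
  where
    k     = max (exponent x) (exponent y)  -- largest binary exponent
    mk    = -k                             -- scale both parts down by 2^k
    sqr z = z * z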

Since OCaml did well above, I took a look at its standard library's implementation, on line 51 here.  It uses a nice little math trick (the extra division) that is also mentioned on Wikipedia.  By implementing the same trick in Haskell, replacing the standard "magnitude" function, we get the following results.
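Concretely, the two variants measured below look roughly like this (the names are mine, purely for illustration, and this assumes the Data.Complex import shown earlier; the actual patch may differ):

-- No overflow avoidance at all: just the textbook formula.
magnitudePlain :: Complex Double -> Double
magnitudePlain (x :+ y) = sqrt (x*x + y*y)

-- OCaml-style: divide by the larger component so the square can't overflow.
magnitudeOCamlStyle :: Complex Double -> Double
magnitudeOCamlStyle (x :+ y)
  | r == 0    = i
  | i == 0    = r
  | r >= i    = let q = i / r in r * sqrt (1 + q*q)
  | otherwise = let q = r / i in i * sqrt (1 + q*q)
  where
    r = abs x
    i = abs y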

GHC 7.4.1, no overflow avoidance            39          1127K       674
GHC 7.4.1, OCaml-style overflow avoidance   74          1127K

Wow, now not only is the Haskell version faster than OCaml/Scheme, but it is 48% faster than C++, which is appropriate to the title of this email!  Of course, this probably means that C++'s abs is also doing something overly expensive for overflow avoidance (or that its representation of complex numbers is not fully unpacked and stored in registers).  I haven't tracked it down yet.

But in any case, I'll be submitting a library patch.  The moral, I think, is that community members could do a great deal to help "Haskell" performance by simply microbenchmarking and optimizing library routines in base!

Cheers,
  -Ryan

P.S. You can check out the above benchmarks from here: https://github.com/rrnewton/MandelMicrobench

[1] P.P.S. Most concerning to me about Haskell/C++ comparisons are David Peixotto's findings that LLVM's optimizations are not very effective on the LLVM code GHC generates, compared with typical Clang-generated LLVM code.

[2]  P.P.P.S. It turns out there was already a ticket (http://hackage.haskell.org/trac/ghc/ticket/2450) regarding magnitude's performance.  But it still has bad performance even after a small refactoring was performed.