Thanks. Your version is much faster.
Yes, I have compiled with 
ghc --make -O2 -fllvm -optlo-O3 -optlo-constprop fannkuchredux4.hs
(there is a bug in GHC 7.4.2 with LLVM 3.1, which is worked around with -optlo-constprop)

results: 
yours:
bmaxa@maxa:~/shootout/fannkuchredux$ time ./fannkuchredux4 12
3968050
Pfannkuchen(12) = 65

real    0m39.200s
user    0m39.132s
sys     0m0.044s

mine:
bmaxa@maxa:~/shootout/fannkuchredux$ time ./fannkuchredux 12
3968050
Pfannkuchen(12) = 65

real    0m50.784s
user    0m50.660s
sys     0m0.092s

It seems that your machine is faster than mine, and somewhat better at running my version.
Thanks! Should I contribute your version to the shootout site?



Date: Mon, 3 Dec 2012 00:01:32 -0800
Subject: Re: [Haskell-cafe] Help optimize fannkuch program
From: bos@serpentine.com
To: bmaxa@hotmail.com
CC: haskell-cafe@haskell.org

On Sun, Dec 2, 2012 at 3:12 PM, Branimir Maksimovic <bmaxa@hotmail.com> wrote:
Well, playing with Haskell, I have literally translated my C++ program
http://shootout.alioth.debian.org/u64q/program.php?test=fannkuchredux&lang=gpp&id=3
and got decent performance, but not that good in comparison
with C++.
On my machine Haskell runs in 52 secs while C++ takes 30 secs.

Did you compile with -O2 -fllvm?

On my machine:

C++ 28 sec
Mine -O2 -fllvm 37 sec
Yours -O2 -fllvm 41 sec
Mine -O2 48 sec
Yours -O2 54 sec

My version of your Haskell code is here: http://hpaste.org/78705