Thanks Erik!
I was kind of hoping for such a benchmark. On my system, the C version is consistently twice as fast as the Haskell version. (I tried -fllvm, but it doesn't change perf)
Then I added bang patterns for the argument of only012 and in the tuple (p,q) and it now has the same runtime as C.
Just for fun, I tried to void writing explicit recursion and used only combinators and this Haskell code is still as competitive as C, so the real bottleneck really was the laziness in the tuple/argument.
In fact, this Haskell version takes 8 sec vs 10 sec for the C version on my laptop (input: 1000000000).
I tried using Data.List.Stream, I couldn't get any performance gain, probably because the code is very close to optimal.
So remember kids:
- Bang pattern the heck out of your code (not a silver bullet, but definitely worth trying).
- Don't rely on too much IO for benchmark.
- Don't be afraid of combinators even if you want performance.
- Ask on haskell-cafe, chaps here are very helpful :)