Switching to quotRem gave no measurable improvement.
After switching to ByteString, the code now runs in about 10 seconds (9.7 s of user time), which is faster than my C version. But honestly, I have no idea why.
New code:
$ ghc --make -O3 303only012.hs && time ./303only012 50000000 > /dev/null
./303only012 50000000 > /dev/null 9.72s user 0.21s system 90% cpu 10.961 total
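For readers who want to see the general shape of a ByteString-based version, here is a minimal sketch. It is not the program timed above: the predicate usesOnly012, the render function and the limit of 30 are hypothetical names and values chosen for illustration, and I am only assuming that the program filters numbers by their digits and prints the survivors. It peels digits off with quotRem and emits the output through Data.ByteString.Builder instead of String.

```haskell
import qualified Data.ByteString.Builder as B
import System.IO (stdout)

-- Hypothetical predicate (an assumption, not the original only012):
-- True when every decimal digit of a non-negative n is 0, 1 or 2.
usesOnly012 :: Int -> Bool
usesOnly012 n
  | n < 10    = n <= 2
  | otherwise =
      let (q, r) = n `quotRem` 10   -- q: remaining digits, r: last digit
      in  r <= 2 && usesOnly012 q

-- Accumulate all output lines in a single Builder, so the bytes are
-- written in large chunks rather than one Char at a time.
render :: Int -> B.Builder
render limit =
  mconcat [ B.intDec k <> B.char7 '\n' | k <- [1 .. limit], usesOnly012 k ]

main :: IO ()
main = B.hPutBuilder stdout (render 30)
```

With the limit of 30 this prints 1, 2, 10, 11, 12, 20, 21 and 22, one per line. One plausible (unverified) reason for a speedup of this kind is that String output goes through a linked list of Char and the handle's encoding layer, whereas a Builder fills byte buffers directly.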
@Alois, I'm not sure how criterion can help me compare my code with the C version, since in the C version I cannot measure the execution time of only012 in isolation. What did you have in mind?
Thanks everyone!