Hey,
I was looking at the reverse-complement benchmark on the Language Shootout, and among other things, I noticed that the Haskell implementation was using (filter (/= '\n')) on ByteStrings, and also using lists as queues.
I had a few improvements which using -fasm seem to yield about a 19% improvement in speed, and a 35% reduction in allocation, on my computer. (If both programs are compiled with -fllvm -- I'm not sure whether or not that's fair game on the Shootout -- my implementation is 35% faster, and does 10% less allocation.) I've checked my code on the Shootout's test input, as well.
Mostly, the improvement comes from a tightly specialized version of (filter (/= '\n')), although eliminating an intermediate list entirely (and one used in a queuelike fashion) didn't seem to hurt. I managed to cut the program to a point where the program size is about the same as before.
Let the arguing begin?