
After some evaluation, here come some results of my benchmarks, and the final function I've packaged: * It is tricky to write an Enum-based general loop function that is fast for all data types. The way overflow checks are implemented in different types makes e.g. Int and Word32 behave differently; Word32 is slower. * [a..b] is nice and can be just as fast as a manual loop - if you are lucky. Unfortunately, if you don't want to rely on luck for your program's performance, you can't have your [a..b] (at the time being). * Vector's equivalent of [a..b] *might* not be as luck dependent, but suffers from a factor 5 penalty, which is hopefully a but (see John's post). Current most appealing solution for fast loops ---------------------------------------------- Unfortunately, this seems to be the most robustly fast (across all types I have tested it with) loop: forLoop :: (Monad m) => a -> (a -> Bool) -> (a -> a) -> (a -> m ()) -> m () forLoop start cond inc f = go start where go !x | cond x = f x >> go (inc x) | otherwise = return () And you can use it as forLoop 0 (< n) (+1) $ \i -> ... Now, you probably don't like this, and neither do I, but it really looks like this is what you should use if you want to write high performance code :/ What also works is a `numLoop :: (Num a, Eq a, Monad m) => a -> a -> (a -> m ()) -> m ()` with start and end value like [a..b], but the `forLoop` aboves gives more flexibility and you can use either (+1) or `succ` with it depending on whether you want performance or overflow-safety. Does it really matter? ---------------------- One might think that in many cases the loop incrementing performance doesn't matter too much, since the loop body should dominate the time spent. That is true, but it is easy to get into cases where it doesn't. Some examples: * You want to write a test to make sure that reinterpret-casting a Word32 to Float and back gives you the same Word32 [1]. Using `forLoop` can make the difference if you have to wait 40 seconds for your test to pass or 3 seconds. (This is how I got into this topic.) * You want to implement something that does trivial operations on unboxed vectors, say dot product or matrix multiplication. `forLoop` can make a 5x performance difference. Shouldn't I write something this trivial ad-hoc? ------------------------------------------------ I've packed `forLoop` into https://github.com/nh2/loop and uploaded it to Hackage (http://hackage.haskell.org/package/loop). * Maintaining the fastest way to loop in one place frees one from thinking about it, and I plan to keep this updated with the fastest implementation known (contributions welcome). * It gives us a place to discuss alternatives and collect benchmarks for high-performance looping. Let's hope it becomes useless soon! [1]: http://stackoverflow.com/a/7002812/263061