@Niklas

You can use `foldM` from the foldl package to match the performance of the fastest loops in your current benchmark, while still expressing your loops at a high level of abstraction. See the following post by Gabriel Gonzalez for an example:
http://www.haskellforall.com/2013/08/foldl-100-composable-streaming-and.html
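Gabriel's post builds on a `Fold` type that packages a fold as a step function, an initial accumulator and an extraction function, and makes folds composable through `Applicative`. Below is a minimal, self-contained re-implementation of that idea using only `base`, so it runs without installing foldl. It is not the package's code, it covers only the pure `Fold` rather than the monadic `FoldM`, and the names `sumF`, `lengthF`, `mean` and `Pair` are hypothetical.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

-- Hypothetical stand-in for the foldl package's Fold type:
-- a step function, an initial accumulator and an extraction function.
-- The accumulator type x is hidden, so folds with different state compose.
data Fold a b = forall x. Fold (x -> a -> x) x (x -> b)

-- Strict pair so both accumulators are forced at every step.
data Pair a b = Pair !a !b

-- Run a Fold over a list with a strict left fold.
fold :: Fold a b -> [a] -> b
fold (Fold step begin done) xs = done (foldl' step begin xs)

instance Functor (Fold a) where
  fmap f (Fold step begin done) = Fold step begin (f . done)

instance Applicative (Fold a) where
  pure b = Fold (\() _ -> ()) () (\() -> b)
  (Fold stepL beginL doneL) <*> (Fold stepR beginR doneR) =
    let step (Pair xL xR) a = Pair (stepL xL a) (stepR xR a)
        begin = Pair beginL beginR
        done (Pair xL xR) = doneL xL (doneR xR)
    in Fold step begin done

sumF :: Num a => Fold a a
sumF = Fold (+) 0 id

lengthF :: Fold a Int
lengthF = Fold (\n _ -> n + 1) 0 id

-- Combined with <*>: one traversal computes both the sum and the length.
mean :: Fold Double Double
mean = (/) <$> sumF <*> (fromIntegral <$> lengthF)

main :: IO ()
main = do
  print (fold sumF [0 .. 100000 :: Int])
  print (fold mean [1, 2, 3, 4])
```

Because `mean` combines `sumF` and `lengthF` with `<*>`, the list is traversed only once, and the strict `Pair` keeps both running accumulators evaluated at each step.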

I have added the benchmark to my fork of your repo and made a pull request:

It looks like the only reason `foldM` does not perform well with `Word32` is the naive implementation of `enumFromTo` for `Word32`, as I explain in more detail in my other email.
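If the `Word32` slowdown really does come from `enumFromTo`, one way to check is to compare a list-based fold against a hand-rolled counter loop that never enumerates a list. This is a sketch, not code from either benchmark; `sumListW32` and `sumLoopW32` are hypothetical names, and any speed difference would need to be measured with Criterion rather than assumed.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')
import Data.Word (Word32)

-- List-based: depends on enumFromTo for Word32 to produce [0 .. n].
sumListW32 :: Word32 -> Word32
sumListW32 n = foldl' (+) 0 [0 .. n]

-- Hand-rolled: a strict counter from 0 to n, no list involved.
-- It stops after adding n rather than testing i <= n,
-- so it also terminates when n == maxBound.
sumLoopW32 :: Word32 -> Word32
sumLoopW32 n = go 0 0
  where
    go :: Word32 -> Word32 -> Word32
    go !acc !i
      | i == n    = acc + i
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = do
  print (sumListW32 100000)
  print (sumLoopW32 100000)
```

Both versions wrap around modulo 2^32, so they return the same value; only the way the counter is produced differs.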

Here is the Criterion report:

http://htmlpreview.github.io/?https://github.com/Davorak/loop/blob/master/results/bench.html

On Tue, Apr 29, 2014 at 11:31 AM, Niklas Hambüchen <mail@nh2.me> wrote:

This is just a short notice that using

    foldl' (+) 0 [0..100000::Int]

is over 10 times slower than using

    flip execState 0 $
      forLoop (0 :: Int) (< n) (+1) $ \i -> do
        x <- get
        put $! x + i

with `forLoop` as on https://github.com/nh2/loop.
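`forLoop` itself isn't quoted in this message. The sketch below defines a C-style loop with the same argument order as the call above (start, condition, step, body), assuming that matches `Control.Loop.forLoop` in the loop package; it is written out so the example runs with only `base` and `transformers`. `sumState` is a hypothetical wrapper, and `n` becomes its argument.

```haskell
{-# LANGUAGE BangPatterns #-}

import Control.Monad.Trans.State.Strict (execState, get, put)

-- C-style for loop: start value, continuation condition, step, body.
-- Assumed to mirror the type of forLoop in the loop package.
forLoop :: Monad m => a -> (a -> Bool) -> (a -> a) -> (a -> m ()) -> m ()
forLoop start cond inc body = go start
  where
    go !x
      | cond x    = body x >> go (inc x)
      | otherwise = return ()
{-# INLINE forLoop #-}

-- Sum of 0 .. n-1, accumulated in strict State; put $! forces each new total.
sumState :: Int -> Int
sumState n =
  flip execState 0 $
    forLoop (0 :: Int) (< n) (+ 1) $ \i -> do
      x <- get
      put $! x + i

main :: IO ()
main = print (sumState 100001)
```

Since `(< n)` excludes `n`, `sumState 100001` covers the same range as `[0..100000]`.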

Even using an IORef is twice as fast as the pure foldl' (but still 5
times slower than strict State).
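The IORef variant of the benchmark isn't shown in this message. A plausible shape, assuming a strict update with `modifyIORef'` over the same range, is sketched below; `sumIORef` is a hypothetical name.

```haskell
import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Sum 0 .. n with a mutable IORef accumulator.
-- modifyIORef' forces the new value at each step, so no chain of thunks builds up.
sumIORef :: Int -> IO Int
sumIORef n = do
  ref <- newIORef 0
  forM_ [0 .. n] $ \i -> modifyIORef' ref (+ i)
  readIORef ref

main :: IO ()
main = sumIORef 100000 >>= print
```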

The benchmark is at
http://htmlpreview.github.io/?https://github.com/nh2/loop/blob/master/results/bench-foldl-and-iorefs-are-slow.html.

(All benchmarks are linked from https://github.com/nh2/loop.)

You can see there that the problem is gone when using Vector.foldl', but
only for Int; for Word32 it persists.

It seems that manual looping is beneficial even when dealing with prime
examples of pure code that GHC ought to optimize well.

Niklas
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

--
Patrick Wheeler
Patrick.John.Wheeler@gmail.com
Patrick.J.Wheeler@rice.edu
Patrick.Wheeler@colorado.edu