
On Saturday 26 April 2014, 22:40:50, Niklas Hambüchen wrote:
As you probably now, `forM_ [1..n]` is incredibly slow in Haskell due to lack of list fusion [1];
It's not the lack of list fusion per se. If you have a forM_ [a .. b] $ \n -> do stuff where the type is Int, GHC is perfectly capable of eliminating the list and rewriting it to a loop (usually on par with a hand-written loop, although the core you get from a hand-written loop often is smaller and nicer to look at). The problem is that you use the same list multiple times, and GHC thinks "hey, let me re-use that", so it gets floated to the top-level, and the inner "loop" really uses the list, which apart from the `case`s on the list forces it to use boxed 'Int's instead of Int#. Then the boxed Ints need to be unboxed to be used, oops. If you manage to conceal the fact that the lists are the same from GHC, it eliminates the lists and you get fast code also with forM_. In your matmult example, using forM_ [k-k .. _SIZE-1] $ \j -> do ... does the trick for GHC-7.0 to 7.8. That is of course a brittle workaround, future GHCs might rewrite the k-k to 0 and share the list again.
a manually written loop is 10x faster.
How do you people work around this?
Do the idiomatic thing and write a loop ;)
Cheers, Daniel