
bf3:
Thanks Bulat, but now you shattered my hopes that GHC would magically do all these optimizations for me ;-)
I must say that although the performance of Haskell is not really a concern to me, I was a bit disappointed that even with all the tricks of the state monad, unboxing, and no-bounds-check, the matrix-vector multiplication was still 7 to 8 times slower than the C version. And at the end of the paper, it's only a factor 4 slower. Okay, going from 300x slower to 4x slower is impressive, but why is it *still* 4x slower? It would be interesting to compare the assembly code generated by the C compiler versus the GHC compiler; after all, we're just talking about a vector/matrix multiplication, which is just a couple of lines of assembly code... And now I'm again talking about performance, nooo! ;-)
Yeah, there are some known low-level issues in the code generator regarding heap and stack checks inside loops, and the use of registers on x86. But note this updated paper: http://www.cse.unsw.edu.au/~chak/papers/CLPKM07.html Add another core to your machine and it is no longer 4x slower :) Add 15 more cores and it's really no longer 4x slower :) -- Don