
droundy On Wed, Apr 23, 2008 at 05:24:20PM +0100, Sebastian Sylvan wrote:
On Wed, Apr 23, 2008 at 5:01 PM, Tillmann Vogt
wrote: Hi,
I am currently experimenting with parallelizing C-programs. I have therefore written a matrix vector multiplication example that needs 13 seconds to run (5 seconds with OpenMP). Because I like Haskell I did the same in this language, but it takes about 134 seconds. Why is it so slow? Does someone have an idea?
Yes, in the C version you use unboxed arrays, in the Haskell version you use a linked list of linked lists. Naturally the latter will take up more space, require more work to index, and will thrash the cache quite a bit.
In fact, I'm impressed that Haskell can come within a factor of 10, given that it's using such a challenging data type! (I wonder if this may be due to inefficiencies in the C code, although I haven't looked at it.)
I had a look at this code, using the (unboxed, strict, fused) data parallel
arrays library (http://darcs.haskell.org/packages/ndp).
We get rather nice code. The original C program (modified, to sum the result,
rather than just filling the array and throwing it out):
#include