The core benefit of nested data parallelism is that the sub-arrays in a nested array of arrays can be of varying size leading to irregular parallelism. However, that flexibility comes at a price, namely that it is a fairly inefficient representation for *rectangular arrays*, such as regular two-dimensional matrices (as in your example). It shouldn't be quite as inefficient as what you report, but it will never be competitive with a dedicated regular representation.

mblanco <blancomau@gmail.com>:

Hi, I'm trying to implement a matrix product example using DPH. This is the code:
-------------------------------------------------------------------------------------------------------------------
type MMultType = Double
type Matrix = [:[:MMultType:]:]
type MVector = [:MMultType:]
type Matrix_wrapper = PArray (PArray MMultType)

{-# NOINLINE matMult_wrapper #-}
matMult_wrapper :: Matrix_wrapper -> Matrix_wrapper -> Matrix_wrapper
matMult_wrapper mA mB = toPArrayP (mapP toPArrayP (matMult (fromNestedPArrayP mA) (fromNestedPArrayP mB)))

matMult :: Matrix -> Matrix -> Matrix
matMult mA mB = mapP (\row -> mapP (\col -> dotp row col) (transposeP mB)) mA

dotp :: MVector -> MVector -> MMultType
dotp row col = D.sumP (zipWithP (D.*) row col)

transposeP :: Matrix -> Matrix
transposeP m =
let
h = lengthP m
w = lengthP (m !: 0)
rh = I.enumFromToP 0 (h I.- 1)
rw = I.enumFromToP 0 (w I.- 1)
in
if h I.== 0 then [: :]
else mapP (\y -> mapP (\x -> m !: x !: y) rh) rw
-------------------------------------------------------------------------------------------------------------------

My problem is at execution time, on matrices of size 300*300 the program does finish (although it is very slow), but on 700*700 it consumes GBs of RAM until the process is aborted.

In the paper "Work Efficient Higher-Order Vectorisation" it is explained that a work complexity problem (wich involved unnecesary array replication) was recently treated. So at first I thought the code implementation related to the paper had not been uploaded to hackage. But as I understand it must have been, as that seems to be the motive of the "dph-lifted-vseg" package.

Does anybody notice the problem with the example or if the problem is related to the subject treated in the paper?

Thanks in advance!
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe