Hi, I'm trying to implement a matrix product example using DPH. This is the code:
-------------------------------------------------------------------------------------------------------------------
type MMultType = Double
type Matrix = [:[:MMultType:]:]
type MVector = [:MMultType:]
type Matrix_wrapper = PArray (PArray MMultType)

{-# NOINLINE matMult_wrapper #-}
matMult_wrapper :: Matrix_wrapper -> Matrix_wrapper -> Matrix_wrapper
matMult_wrapper mA mB = toPArrayP (mapP toPArrayP (matMult (fromNestedPArrayP mA) (fromNestedPArrayP mB)))

matMult :: Matrix -> Matrix -> Matrix
matMult mA mB = mapP (\row -> mapP (\col -> dotp row col) (transposeP mB)) mA

dotp :: MVector -> MVector -> MMultType
dotp row col = D.sumP (zipWithP (D.*) row col)

transposeP :: Matrix -> Matrix
transposeP m = 
    let
        h = lengthP m
        w = lengthP (m !: 0)
        rh = I.enumFromToP 0 (h I.- 1)
        rw = I.enumFromToP 0 (w I.- 1)
    in
        if h I.== 0 then [: :]
        else mapP (\y -> mapP (\x -> m !: x !: y) rh) rw
-------------------------------------------------------------------------------------------------------------------

My problem is at execution time, on matrices of size 300*300 the program does finish (although it is very slow), but on 700*700 it consumes GBs of RAM until the process is aborted.

In the paper "Work Efficient Higher-Order Vectorisation" it is explained that a work complexity problem (wich involved unnecesary array replication) was recently treated. So at first I thought the code implementation related to the paper had not been uploaded to hackage. But as I understand it must have been, as that seems to be the motive of the "dph-lifted-vseg" package.

Does anybody notice the problem with the example or if the problem is related to the subject treated in the paper?

Thanks in advance!