
Hello,

DPH seems to build parallel vectors at the level of scalar elements (doubles, say). Is this a design decision aimed at targeting GPUs?

If I am filtering an hour's worth of multichannel data (an array of (Vector Double)), then off the top of my head I would think that optimal efficiency would be achieved on n CPU cores, with each core filtering one channel, rather than trying to do anything fancy with processing vectors in parallel. I say this because filters (which could be assembled from arrow structures) feed back across (regions of) a vector.

Do GPUs have some sort of shift-operation optimisation? In other words, if I have a (constant) matrix A, my filter, and a datastream x, where x_i(t+1) = x_{i-1}(t), can a GPU perform Ax in O(length(x))? Otherwise, given the cost of moving data to and from the GPU, I would guess that one sequential algorithm per core (Concurrent Haskell) is faster, and that there is a granularity barrier. A rough sketch of the per-channel approach I have in mind is below the sig.

Cheers,
Vivian
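
P.S. For concreteness, this is roughly what I mean by "one sequential algorithm per core". It is only a sketch: the coefficients and channel data are made up, fir is a naive sequential FIR (y(t) = sum_k a_k * x(t-k), i.e. the Ax product for a constant banded A), and parMap from Control.Parallel.Strategies stands in here for whatever per-core scheduling (forkIO etc.) one would actually use; the only parallelism is one spark per channel.

    -- Compile with -threaded and run with +RTS -N so each channel can
    -- land on its own core.
    import qualified Data.Vector.Unboxed as V
    import           Control.Parallel.Strategies (parMap, rdeepseq)

    type Channel = V.Vector Double

    -- Plain sequential FIR filter: y(t) = sum_k a_k * x(t-k),
    -- zero-padded at the start of the stream.
    fir :: V.Vector Double -> Channel -> Channel
    fir as xs = V.generate (V.length xs) yAt
      where
        yAt t = V.ifoldl' step 0 as
          where
            step acc k a
              | t - k >= 0 = acc + a * xs V.! (t - k)
              | otherwise  = acc

    -- Filter each channel in its own spark; no parallelism inside a channel.
    filterAll :: V.Vector Double -> [Channel] -> [Channel]
    filterAll as = parMap rdeepseq (fir as)

    -- Toy data just to make the module runnable.
    main :: IO ()
    main = print (filterAll (V.fromList [0.25, 0.5, 0.25])
                            [V.enumFromN 0 10, V.enumFromN 10 10])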