Hi Clark,The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].import Data.Array.Accelerate as Atype Matrix a = Array DIM2 amatMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)matMul arr brr= A.fold (+) 0$ A.zipWith (*) arrRepl brrReplwhereZ :. rowsA :. _ = unlift (shape arr) :: Z :. Exp Int :. Exp IntZ :. _ :. colsB = unlift (shape brr) :: Z :. Exp Int :. Exp IntarrRepl = A.replicate (lift $ Z :. All :. colsB :. All) arrbrrRepl = A.replicate (lift $ Z :. rowsA :. All :. All) (A.transpose brr)
If you use github sources rather than the hackage package, those intermediate replicates will get fused away.Cheers,-TrevOn 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel@uwaterloo.ca> wrote:_______________________________________________Hello cafe,
I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.
Here's what I have so far:
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
, Acc
, Vector
, Scalar
, Elt
, fold
, slice
, constant
, Array
, Z(..), DIM1, DIM2
, fromList
, All(..)
, generate
, lift, unlift
, shape
)
import Data.Array.Accelerate.Interpreter ( run )
dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys
type Matrix a = Array DIM2 a
getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All
-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
\ix ->
let (Z :. i :. j) = unlift ix
in getRow i a `dotP` getRow j b
where
b = A.transpose b' -- I assume row indexing is faster than column indexing...
(Z :. nrows :. _ ) = unlift $ shape a
(Z :. _ :. ncols) = unlift $ shape b
This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.
So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?
Thanks for your help,
- Clark
[1] http://hackage.haskell.org/package/accelerate
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe