Re: [Haskell-cafe] Naive matrix multiplication with Accelerate

5 Dec 2012


      No. But that doesn't stop me from being curious with Accelerate. Might you
have a better explaination for what's happening here than Trevor's?

  - Clark


On Tue, Dec 4, 2012 at 7:08 PM, Alexander Solla wrote:
...
I don't mean to be blunt, but have you guys taken a course in linear
algebra?
On Mon, Dec 3, 2012 at 9:21 PM, Trevor L. McDonell <
tmcdonell@cse.unsw.edu.au> wrote:
...
As far as I am aware, the only description is in the Repa paper. I you
are right, it really should be explained properly somewhere…
At a simpler example, here is the outer product of two vectors [1].
vvProd :: (IsNum e, Elt e) => Acc (Vector e) -> Acc (Vector e) -> Acc
(Matrix e)
vvProd xs ys = A.zipWith (*) xsRepl ysRepl
  where
    n           = A.size xs
    m           = A.size ys
xsRepl      = A.replicate (lift (Z :. All :. m  )) xs
    ysRepl      = A.replicate (lift (Z :. n   :. All)) ys
If we then `A.fold (+) 0` the matrix, it would reduce along each row
producing a vector. So the first element of that vector is going to be
calculated as (xs[0] * ys[0] + xs[0] * ys[1] +  … xs[0] * ys[m-1]). That's
the idea we want for our matrix multiplication … but I agree, it is
difficult for me to visualise as well.
I do the same sort of trick with the n-body demo to get all n^2 particle
interactions.
-Trev
[1]: http://en.wikipedia.org/wiki/Outer_product#Vector_multiplication
On 04/12/2012, at 3:41 AM, Clark Gaebel  wrote:
Ah. I see now. Silly Haskell making inefficient algorithms hard to write
and efficient ones easy. It's actually kind of annoying when learning, but
probably for the best.
Is there a good write-up of the algorithm you're using somewhere? The
Repa paper was very brief in its explaination, and I'm having trouble
visualizing the mapping of the 2D matricies into 3 dimensions.
- Clark
On Mon, Dec 3, 2012 at 2:06 AM, Trevor L. McDonell <
tmcdonell@cse.unsw.edu.au> wrote:
...
Hi Clark,
The trick is that most accelerate operations work over multidimensional
arrays, so you can still get around the fact that we are limited to flat
data-parallelism only.
Here is matrix multiplication in Accelerate, lifted from the first Repa
paper [1].
import Data.Array.Accelerate as A
type Matrix a = Array DIM2 a
matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc
(Matrix e)
matMul arr brr
  = A.fold (+) 0
  $ A.zipWith (*) arrRepl brrRepl
  where
    Z :. rowsA :. _     = unlift (shape arr)    :: Z :. Exp Int :. Exp
Int
    Z :. _     :. colsB = unlift (shape brr)    :: Z :. Exp Int :. Exp
Int
arrRepl             = A.replicate (lift $ Z :. All   :. colsB :.
All) arr
    brrRepl             = A.replicate (lift $ Z :. rowsA :. All   :.
All) (A.transpose brr)
If you use github sources rather than the hackage package, those
intermediate replicates will get fused away.
Cheers,
-Trev
[1] http://www.cse.unsw.edu.au/~chak/papers/KCLPL10.html
On 03/12/2012, at 5:07 PM, Clark Gaebel  wrote:
Hello cafe,
I've recently started learning about cuda and hetrogenous programming,
and have been using accelerate [1] to help me out. Right now, I'm running
into trouble in that I can't call parallel code from sequential code. Turns
out GPUs aren't exactly like Repa =P.
Here's what I have so far:
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
                             , Acc
                             , Vector
                             , Scalar
                             , Elt
                             , fold
                             , slice
                             , constant
                             , Array
                             , Z(..), DIM1, DIM2
                             , fromList
                             , All(..)
                             , generate
                             , lift, unlift
                             , shape
                             )
import Data.Array.Accelerate.Interpreter ( run )
dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc
(Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys
type Matrix a = Array DIM2 a
getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All
-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc
(Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
                \ix ->
                  let (Z :. i :. j) = unlift ix
                   in getRow i a `dotP` getRow j b
    where
        b = A.transpose b' -- I assume row indexing is faster than
column indexing...
        (Z :. nrows :.   _  ) = unlift $ shape a
        (Z :.   _   :. ncols) = unlift $ shape b
This, of course, gives me errors right now because I'm calling getRow
and dotP from within the generation function, which expects Exp[ression]s,
not Acc[elerated computation]s.
So maybe I need to replace that line with an inner for loop? Is there an
easy way to do that with Accelerate?
Thanks for your help,
  - Clark
[1] http://hackage.haskell.org/package/accelerate
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe