Repa: traverse delayed array multiple times

13 Jan 2013

      Greetings to all!

Currently I'm trying to do physical simulations using finite difference
method. To accomplish this I need to repeat some manipulations on
2-dimensional grid several thousands or millions times, you know. Is it
possible to do using repa?

I wrote this rough sketch that shows what I am into. Apparently, program is
severely slow. I think reason is:
"Every time an element is requested from a delayed array it is calculated
anew, which means that delayed arrays are inefficient when the data is
needed multiple times"
(http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Repa_Tutorial).

I started diving into Guiding Parallel Array Fusion with Indexed
Typeshttp://www.cse.unsw.edu.au/%7Ebenl/papers/guiding/guiding-Haskell2012-sub.pd...mentioned
there. Is it a right place to find any answer to my question?

To summarize, I want to apply my function to Array several thousand times
as fast as possible (using multi-threading and GPGPU). It seems I need a
lot of memory and throughput. Do I have to call iterate only once at a time
or in some kind of batches. What are my possibilities?

Thanks in advance!

Example program:

import System.Environment
import Data.Array.Repa as Repa

h = 50
w = 50

grid = fromListUnboxed (Z :. h :. w) $ take (h * w) [1, 1..]

main = do
  args <- getArgs
  let n = read $ head args
  g <- computeP $ accumulate n grid :: IO (Array U DIM2 Int)
  putStrLn "Ok"

accumulate :: Int -> Array U DIM2 Int -> Array D DIM2 Int
accumulate n g = repaIterate step n $ delay g

step :: (DIM2 -> Int) -> DIM2 -> Int
step f (Z :. i :. j) = center + right + left + top + bottom

 where
  left   = f'  j      (i - 1)
  center = f'  j       i
  right  = f'  j      (i + 1)
  top    = f' (j + 1)  i
  bottom = f' (j - 1)  i
  f' j i   =
    if j<0 || i<0 || i>=h || j>=w
    then 0
    else f (Z :. j :. i)

repaIterate :: ((DIM2 -> Int) -> DIM2 -> Int) -> Int -> Array D DIM2 Int ->
Array D DIM2 Int
repaIterate f n = (!! n) . iterate (\array -> traverse array id f)

Compilation:

ghc -O2 -threaded -rtsopts --make repatest.hs

time output:

./repatest 3 +RTS -N2 -H  0,02s user 0,02s system 109% cpu 0,033 total
./repatest 4 +RTS -N2 -H  0,09s user 0,02s system 126% cpu 0,089 total
./repatest 5 +RTS -N2 -H  0,30s user 0,02s system 155% cpu 0,200 total
./repatest 6 +RTS -N2 -H  1,16s user 0,06s system 185% cpu 0,663 total
./repatest 7 +RTS -N2 -H  5,63s user 0,24s system 191% cpu 3,071 total
./repatest 8 +RTS -N2 -H  27,72s user 0,89s system 191% cpu 14,960 total
./repatest 8 +RTS -N1 -H  26,23s user 0,19s system 100% cpu 26,388 total

Andrey Yankin

Dominic Steinitz

Andrey Yankin

tags

participants (2)