Repa: traverse delayed array multiple times

Greetings to all! Currently I'm trying to do physical simulations using finite difference method. To accomplish this I need to repeat some manipulations on 2-dimensional grid several thousands or millions times, you know. Is it possible to do using repa? I wrote this rough sketch that shows what I am into. Apparently, program is severely slow. I think reason is: "Every time an element is requested from a delayed array it is calculated anew, which means that delayed arrays are inefficient when the data is needed multiple times" (http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Repa_Tutorial). I started diving into Guiding Parallel Array Fusion with Indexed Typeshttp://www.cse.unsw.edu.au/%7Ebenl/papers/guiding/guiding-Haskell2012-sub.pd...mentioned there. Is it a right place to find any answer to my question? To summarize, I want to apply my function to Array several thousand times as fast as possible (using multi-threading and GPGPU). It seems I need a lot of memory and throughput. Do I have to call iterate only once at a time or in some kind of batches. What are my possibilities? Thanks in advance! Example program: import System.Environment import Data.Array.Repa as Repa h = 50 w = 50 grid = fromListUnboxed (Z :. h :. w) $ take (h * w) [1, 1..] main = do args <- getArgs let n = read $ head args g <- computeP $ accumulate n grid :: IO (Array U DIM2 Int) putStrLn "Ok" accumulate :: Int -> Array U DIM2 Int -> Array D DIM2 Int accumulate n g = repaIterate step n $ delay g step :: (DIM2 -> Int) -> DIM2 -> Int step f (Z :. i :. j) = center + right + left + top + bottom where left = f' j (i - 1) center = f' j i right = f' j (i + 1) top = f' (j + 1) i bottom = f' (j - 1) i f' j i = if j<0 || i<0 || i>=h || j>=w then 0 else f (Z :. j :. i) repaIterate :: ((DIM2 -> Int) -> DIM2 -> Int) -> Int -> Array D DIM2 Int -> Array D DIM2 Int repaIterate f n = (!! n) . iterate (\array -> traverse array id f) Compilation: ghc -O2 -threaded -rtsopts --make repatest.hs time output: ./repatest 3 +RTS -N2 -H 0,02s user 0,02s system 109% cpu 0,033 total ./repatest 4 +RTS -N2 -H 0,09s user 0,02s system 126% cpu 0,089 total ./repatest 5 +RTS -N2 -H 0,30s user 0,02s system 155% cpu 0,200 total ./repatest 6 +RTS -N2 -H 1,16s user 0,06s system 185% cpu 0,663 total ./repatest 7 +RTS -N2 -H 5,63s user 0,24s system 191% cpu 3,071 total ./repatest 8 +RTS -N2 -H 27,72s user 0,89s system 191% cpu 14,960 total ./repatest 8 +RTS -N1 -H 26,23s user 0,19s system 100% cpu 26,388 total

Andrey Yankin
Greetings to all!
anew, which means that delayed arrays are inefficient when the data is needed multiple times"(http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Repa_Tutorial).I started diving into Guiding Parallel Array Fusion with Indexed Types mentioned
repa?I wrote this rough sketch that shows what I am into. Apparently, program is severely slow. I think reason is:"Every time an element is requested from a delayed array it is calculated there. Is it a right place to find any answer to my question?
It's certainly possible to do this in repa. As I understand it, that is pretty much what repa is for. I'm not sure about your explanation for the slowdown although it sounds plausible. I amended your code slightly and it runs pretty quickly for me using my 2 processors! updater :: Source r Int => Array r DIM2 Int -> Array D DIM2 Int updater a = traverse a id step updaterM :: Monad m => Int -> Array U DIM2 Int -> m (Array U DIM2 Int) updaterM n = foldr (>=>) return (replicate n (computeP . updater)) and replace g <- computeP $ accumulate n grid :: IO (Array U DIM2 Int) by g <- updaterM n grid BTW I think you can use stencils for what you are doing (I assume it is some sort of relaxation method) as the coefficients are constant. I think this should speed things up further as you won't be testing the boundary conditions on every loop. Dominic.

Hi, Dominic! Thanks for advice. I will definitely use it for now. When I wrote
Do I have to call iterate only once at a time
It was obscure and has wrong words. I was trying to ask if it is necessary to run computeP every time I want to modify(traverse) an array? I was contemplating similar solution as you advise, but there is a concern. I want to modify it thousands of times and do not want to allocate and garbage more memory than needed. Here is the implementation I dream of: There must be function traverseNTimes that takes also an arbitrary number (the number of times to repeat traversal). Then it allocates but two new blocks and uses them to iterate the process as needed. No superfluous memory consumption. Is it unusual case? Is it worth thinking about? Do it give any performance gain? Actually I started with repa only yesterday, so I was thinking of delayed or cursored arrays which are capable of magic. Now I am beginning to see that it is not the case, though I don't know it for sure yet... )
BTW I think you can use stencils for what you are doing (I assume it is some sort of relaxation method) as the coefficients are constant. I think this should speed things up further as you won't be testing the boundary conditions on every loop. Sadly, there are much more sophisticated formulae and more than one boundary conditions on each side (and these conditions are moving)...
2013/1/14 Dominic Steinitz
Andrey Yankin
writes: Greetings to all!
anew, which means that delayed arrays are inefficient when the data is needed multiple times"( http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Repa_Tutorial).I started diving into Guiding Parallel Array Fusion with Indexed Types mentioned
repa?I wrote this rough sketch that shows what I am into. Apparently, program is severely slow. I think reason is:"Every time an element is requested from a delayed array it is calculated there. Is it a right place to find any answer to my question?
It's certainly possible to do this in repa. As I understand it, that is pretty much what repa is for. I'm not sure about your explanation for the slowdown although it sounds plausible.
I amended your code slightly and it runs pretty quickly for me using my 2 processors!
updater :: Source r Int => Array r DIM2 Int -> Array D DIM2 Int updater a = traverse a id step
updaterM :: Monad m => Int -> Array U DIM2 Int -> m (Array U DIM2 Int) updaterM n = foldr (>=>) return (replicate n (computeP . updater))
and replace
g <- computeP $ accumulate n grid :: IO (Array U DIM2 Int)
by
g <- updaterM n grid
BTW I think you can use stencils for what you are doing (I assume it is some sort of relaxation method) as the coefficients are constant. I think this should speed things up further as you won't be testing the boundary conditions on every loop.
Dominic.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (2)
-
Andrey Yankin
-
Dominic Steinitz