
Greetings to all! Currently I'm trying to do physical simulations using finite difference method. To accomplish this I need to repeat some manipulations on 2-dimensional grid several thousands or millions times, you know. Is it possible to do using repa? I wrote this rough sketch that shows what I am into. Apparently, program is severely slow. I think reason is: "Every time an element is requested from a delayed array it is calculated anew, which means that delayed arrays are inefficient when the data is needed multiple times" (http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Repa_Tutorial). I started diving into Guiding Parallel Array Fusion with Indexed Typeshttp://www.cse.unsw.edu.au/%7Ebenl/papers/guiding/guiding-Haskell2012-sub.pd...mentioned there. Is it a right place to find any answer to my question? To summarize, I want to apply my function to Array several thousand times as fast as possible (using multi-threading and GPGPU). It seems I need a lot of memory and throughput. Do I have to call iterate only once at a time or in some kind of batches. What are my possibilities? Thanks in advance! Example program: import System.Environment import Data.Array.Repa as Repa h = 50 w = 50 grid = fromListUnboxed (Z :. h :. w) $ take (h * w) [1, 1..] main = do args <- getArgs let n = read $ head args g <- computeP $ accumulate n grid :: IO (Array U DIM2 Int) putStrLn "Ok" accumulate :: Int -> Array U DIM2 Int -> Array D DIM2 Int accumulate n g = repaIterate step n $ delay g step :: (DIM2 -> Int) -> DIM2 -> Int step f (Z :. i :. j) = center + right + left + top + bottom where left = f' j (i - 1) center = f' j i right = f' j (i + 1) top = f' (j + 1) i bottom = f' (j - 1) i f' j i = if j<0 || i<0 || i>=h || j>=w then 0 else f (Z :. j :. i) repaIterate :: ((DIM2 -> Int) -> DIM2 -> Int) -> Int -> Array D DIM2 Int -> Array D DIM2 Int repaIterate f n = (!! n) . iterate (\array -> traverse array id f) Compilation: ghc -O2 -threaded -rtsopts --make repatest.hs time output: ./repatest 3 +RTS -N2 -H 0,02s user 0,02s system 109% cpu 0,033 total ./repatest 4 +RTS -N2 -H 0,09s user 0,02s system 126% cpu 0,089 total ./repatest 5 +RTS -N2 -H 0,30s user 0,02s system 155% cpu 0,200 total ./repatest 6 +RTS -N2 -H 1,16s user 0,06s system 185% cpu 0,663 total ./repatest 7 +RTS -N2 -H 5,63s user 0,24s system 191% cpu 3,071 total ./repatest 8 +RTS -N2 -H 27,72s user 0,89s system 191% cpu 14,960 total ./repatest 8 +RTS -N1 -H 26,23s user 0,19s system 100% cpu 26,388 total