Re: [Haskell-cafe] Data.Array.Accelerate initialization timings

Ah, it seems that I see now what's going wrong way. I'm not using the 'run'
function from the CUDA backend, and so by default I guess the code is
interpreted (the test backend used for semantics check). However, it's not
perfectly clear how to use CUDA backend explicitly.
If you have any suggestions, it would be a great help!
On 20 February 2012 16:06, Alex Gremm
Hi Paul,
even though I just started reading about Accelerate, it seems to me that you didn't use the "use" method which according to [1] initiates asynchronous data transfer from host to GPU.
Cheers, Alex
[1]: http://www.cse.unsw.edu.au/%7Echak/papers/acc-cuda.pdf On 20/02/12 14:46, Paul Sujkov wrote:
Hi everyone,
since accelerate mail list seems to be defunct, I'm trying to ask specific questions here. The problem is: array initialization in Data.Array.Accelerate takes a 10x amount of time in contrast to both Data.Array and bare C++ CUDA array initialization. This can be due to Data.Array.Accelerate having two backends (however, it's own tests show that my nVidia card is CUDA-capable), but I'm not aware of how can I profile GPU to check whether it is used or not. Anyway, here's code:
both generateArray (DIM3) and generateArray1 (DIM1) take the same amount of time to initialize array. I'd say the problem is in GPU memory copying time, but here's bare C++ code:
which does exactly the same, but 10 times faster. I'm wandering what am I doing wrong and how to check if I really am. Thanks in advance if anyone can point me on my mistakes!
-- Regards, Paul Sujkov
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Regards, Paul Sujkov

On 20 February 2012 16:39, Paul Sujkov
Ah, it seems that I see now what's going wrong way. I'm not using the 'run' function from the CUDA backend, and so by default I guess the code is interpreted (the test backend used for semantics check). However, it's not perfectly clear how to use CUDA backend explicitly.
Neither the interpreter or the CUDA code are used in your example. Everything in Data.Array.Accelerate are front-end stuff, your arrays are allocated on the host, so it is here there is an inefficiency. The "use" method inserts a statement in the syntax tree generated by the front-end, which the back-end can use as a hint to transfer that array to the GPU, while compiling the rest of the program into CUDA code. The Data.Array.Accelerate.CUDA.run function is the one that actually moves the arrays to the GPU. I haven't tried executing your code and I'm not sure why the front-end is that slow. -- Martin Dybdal

Martin Dybdal:
On 20 February 2012 16:39, Paul Sujkov
wrote: Ah, it seems that I see now what's going wrong way. I'm not using the 'run' function from the CUDA backend, and so by default I guess the code is interpreted (the test backend used for semantics check). However, it's not perfectly clear how to use CUDA backend explicitly.
Neither the interpreter or the CUDA code are used in your example. Everything in Data.Array.Accelerate are front-end stuff, your arrays are allocated on the host, so it is here there is an inefficiency.
The "use" method inserts a statement in the syntax tree generated by the front-end, which the back-end can use as a hint to transfer that array to the GPU, while compiling the rest of the program into CUDA code. The Data.Array.Accelerate.CUDA.run function is the one that actually moves the arrays to the GPU.
I haven't tried executing your code and I'm not sure why the front-end is that slow.
The 'fromList' function is mostly meant for testing or to initialise small arrays. It is not particularly optimised. (Going via a vanilla list is just a bad idea if you want performance.) For efficient data marshalling have a look at the modules under Data.Array.Accelerate.IO. Manuel
participants (3)
-
Manuel M T Chakravarty
-
Martin Dybdal
-
Paul Sujkov