
Ah, I think I see now what's going wrong: I'm not using the 'run'
function from the CUDA backend, so by default I guess the code is being
interpreted (the reference backend used for semantics checking). However,
it's not entirely clear to me how to use the CUDA backend explicitly.
If you have any suggestions, that would be a great help!
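My current guess, from skimming the documentation, is that it's simply a matter of importing the CUDA backend's own 'run' in place of the default one, along these lines (untested on my side, and the example computation is just a placeholder):

    import Data.Array.Accelerate                as A
    import qualified Data.Array.Accelerate.CUDA as CUDA  -- from the accelerate-cuda package

    -- run the computation with the CUDA backend rather than the interpreter
    example :: Vector Float -> Vector Float
    example xs = CUDA.run (A.map (+ 1) (A.use xs))

Is that the intended way to select the backend?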
On 20 February 2012 16:06, Alex Gremm wrote:
Hi Paul,
even though I've only just started reading about Accelerate, it seems to me that you didn't use the "use" function, which according to [1] initiates an asynchronous data transfer from host to GPU.
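I haven't tried it myself, but from the paper I would expect it to look roughly like the following (the array and the computation are made-up placeholders, not your code):

    import Data.Array.Accelerate                as A
    import qualified Data.Array.Accelerate.CUDA as CUDA

    -- an ordinary host-side array
    xs :: Vector Float
    xs = A.fromList (Z :. 1000) [0 ..]

    -- 'use' embeds xs into the Acc computation and triggers the host-to-GPU transfer
    total :: Scalar Float
    total = CUDA.run (A.fold (+) 0 (A.use xs))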
Cheers, Alex
[1]: http://www.cse.unsw.edu.au/%7Echak/papers/acc-cuda.pdf

On 20/02/12 14:46, Paul Sujkov wrote:
Hi everyone,
since the accelerate mailing list seems to be defunct, I'm trying to ask specific questions here. The problem is: array initialization in Data.Array.Accelerate takes about 10x as long as both Data.Array and bare C++ CUDA array initialization. This could be because Data.Array.Accelerate has two backends and I may not be hitting the CUDA one (although its own tests show that my nVidia card is CUDA-capable), but I don't know how to profile the GPU to check whether it is actually being used. Anyway, here is the code in outline:
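(A simplified sketch: the dimensions, the fill values, and the choice of backend import below are placeholders standing in for the real program.)

    import Data.Array.Accelerate             as A
    import Data.Array.Accelerate.Interpreter as I  -- assumption: whichever backend's 'run' is in scope here is part of the question

    -- build a three-dimensional array of constants (placeholder dimensions)
    generateArray :: Acc (Array DIM3 Float)
    generateArray = A.generate (constant (Z :. 64 :. 64 :. 64)) (const 0)

    -- build a one-dimensional array the same way
    generateArray1 :: Acc (Array DIM1 Float)
    generateArray1 = A.generate (constant (Z :. 262144)) (const 0)

    main :: IO ()
    main = do
      print (I.run generateArray)
      print (I.run generateArray1)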
Both generateArray (DIM3) and generateArray1 (DIM1) take the same amount of time to initialize the array. I'd say the problem is in the GPU memory copying time, but bare C++ CUDA code that does exactly the same thing runs about 10 times faster. I'm wondering what I am doing wrong and how to check whether I really am doing something wrong. Thanks in advance if anyone can point out my mistakes!
-- Regards, Paul Sujkov
-- Regards, Paul Sujkov