Ah, I think I see now what's going wrong. I'm not using the 'run' function from the CUDA backend, so by default I guess the code is being interpreted (the reference backend used for semantics checking). However, it's not entirely clear to me how to use the CUDA backend explicitly.
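From the package documentation I'd guess it is something along these lines (untested, and the array contents and the fold are just placeholders):

import qualified Data.Array.Accelerate      as A
import qualified Data.Array.Accelerate.CUDA as CUDA
-- import qualified Data.Array.Accelerate.Interpreter as Interp  -- reference backend

main :: IO ()
main = do
  let xs  = A.fromList (A.Z A.:. (1000000 :: Int)) [0 ..] :: A.Vector Float
      acc = A.fold (+) 0 (A.use xs)   -- some trivial computation over the array
  -- CUDA.run compiles and executes the expression on the GPU; Interp.run
  -- would instead evaluate the same expression on the host
  print (CUDA.run acc)

Is that roughly the intended usage?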
Hi Paul,
even though I have only just started reading about Accelerate, it seems to
me that you didn't use the "use" function, which according to [1] initiates
an asynchronous data transfer from host to GPU.
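Just to illustrate what I mean, a rough sketch (I have not tested this, and the dot-product computation is just a made-up example):

import qualified Data.Array.Accelerate      as A
import qualified Data.Array.Accelerate.CUDA as CUDA

dotSelf :: A.Vector Float -> A.Scalar Float
dotSelf xs = CUDA.run (A.fold (+) 0 (A.zipWith (*) v v))
  where
    -- 'use' embeds the host-side array into the Acc computation; per the
    -- paper, this is what starts the (asynchronous) host-to-GPU transfer
    v = A.use xs

(CUDA.run here is the 'run' exported by Data.Array.Accelerate.CUDA.)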
Cheers,
Alex
[1]: http://www.cse.unsw.edu.au/%7Echak/papers/acc-cuda.pdf
> On 20/02/12 14:46, Paul Sujkov wrote:
> Hi everyone,
>
> since the accelerate mailing list seems to be defunct, I'm trying to ask
> specific questions here. The problem is: array initialization in
> Data.Array.Accelerate takes 10x as long as both Data.Array and bare C++
> CUDA array initialization. This could be because Data.Array.Accelerate
> has two backends (although its own tests show that my nVidia card is
> CUDA-capable), but I'm not aware of how I can profile the GPU to check
> whether it is used or not. Anyway, here's the code:
>
> http://hpaste.org/64036
>
> both generateArray (DIM3) and generateArray1 (DIM1) take the same amount
> of time to initialize the array. I'd say the problem is in the GPU memory
> copy time, but here's the bare C++ code:
>
> http://hpaste.org/64037
>
> which does exactly the same thing, but 10 times faster. I'm wondering
> what I am doing wrong and how to check whether I really am. Thanks in
> advance if anyone can point out my mistakes!
>
> --
> Regards, Paul Sujkov
>