
Martin Dybdal:
On 20 February 2012 16:39, Paul Sujkov wrote:
Ah, I think I see now what's going wrong. I'm not using the 'run' function from the CUDA backend, so by default I guess the code is interpreted (by the test backend used for semantics checking). However, it's not entirely clear how to use the CUDA backend explicitly.
Neither the interpreter nor the CUDA code is used in your example. Everything in Data.Array.Accelerate is front-end code; your arrays are allocated on the host, so that is where the inefficiency lies.
The 'use' function inserts a node into the syntax tree generated by the front-end, which the back-end can use as a hint to transfer that array to the GPU while compiling the rest of the program into CUDA code. The Data.Array.Accelerate.CUDA.run function is the one that actually moves the arrays to the GPU.
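To make the front-end/back-end split concrete, here is a minimal sketch, assuming the accelerate package. It uses the reference interpreter (Data.Array.Accelerate.Interpreter) so that it runs without a GPU; the comment marks where Data.Array.Accelerate.CUDA from accelerate-cuda would be imported instead. The names xs and doubled are made up for illustration.

```haskell
-- Minimal sketch; assumes the accelerate package (accelerate-cuda for the GPU).
import qualified Data.Array.Accelerate             as A
-- Reference interpreter backend. To run on the GPU, import
-- Data.Array.Accelerate.CUDA (from accelerate-cuda) under this alias instead.
import qualified Data.Array.Accelerate.Interpreter as Backend

-- Built by the front-end and allocated on the host; no GPU involved yet.
xs :: A.Vector Float
xs = A.fromList (A.Z A.:. 10) [0 .. 9]

-- 'use' only embeds xs in the expression tree, so 'doubled' is a
-- description of a computation, not its result.
doubled :: A.Acc (A.Vector Float)
doubled = A.map (* 2) (A.use xs)

-- Only 'run' evaluates the tree. With the CUDA backend this is where xs
-- is copied to the device and the generated code is executed.
main :: IO ()
main = print (A.toList (Backend.run doubled))
```

Nothing before the call to run touches a back-end, which is why timing code that never calls run only measures front-end work.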
I haven't tried executing your code and I'm not sure why the front-end is that slow.
The 'fromList' function is mostly meant for testing or for initialising small arrays; it is not particularly optimised. (Going via a vanilla list is just a bad idea if you want performance.) For efficient data marshalling, have a look at the modules under Data.Array.Accelerate.IO.
Manuel