
Hello,

Recently I tried to look for the status of Data Parallel Haskell with a CUDA backend. I found [1], which mentions [2] saying that this is difficult and that work was being done. That was almost two years ago. Has any progress been made since then, or is the work stalled?

About GSoC: I wonder if there is any part of the ticket that could be done in a summer's worth of time?

Thanks, guys! :)

[1] http://hackage.haskell.org/trac/summer-of-code/ticket/1537
[2] http://article.gmane.org/gmane.comp.lang.haskell.cafe/37538

--
Felipe.

IIRC, there has been work done by Manuel and his team; I'm sure he'll
chime in on that. One thing, though: CUDA is being supplanted by
OpenCL over the next few years, and OpenCL can handle data parallelism
on multicore CPUs as well as GPUs with the same code. It's a little
more flexible overall than CUDA, will eventually be portable across
ATI/nVidia/Intel/AMD/Sony Cell hardware, and is well supported on
Linux, Mac, and Windows systems.
There are two Haskell bindings for OpenCL right now: mine, which is
OpenCLRaw, and another that IIRC you can find on Google Code by
searching for "Haskell" and "OpenCL" together.
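(For a flavour of what a raw binding involves, here is a hypothetical,
minimal FFI sketch. It is for illustration only, not OpenCLRaw's actual
API; it merely counts the available platforms.)

    {-# LANGUAGE ForeignFunctionInterface #-}
    import Foreign
    import Foreign.C.Types

    -- cl_platform_id is an opaque handle in the OpenCL C API.
    type ClPlatformID = Ptr ()

    -- Direct import of clGetPlatformIDs from the OpenCL C API;
    -- link with -lOpenCL (or the OpenCL framework on Mac).
    foreign import ccall unsafe "clGetPlatformIDs"
      clGetPlatformIDs :: CUInt -> Ptr ClPlatformID -> Ptr CUInt -> IO CInt

    main :: IO ()
    main = alloca $ \nPtr -> do
      _ <- clGetPlatformIDs 0 nullPtr nPtr  -- first call only counts platforms
      n <- peek nPtr
      putStrLn ("OpenCL platforms found: " ++ show n)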
I would love to see a DPH backend in OpenCL or, failing that,
something like GPipe done in OpenCL, but for computational kernels
instead of graphics.
-- Jeff

On Wed, Feb 3, 2010 at 12:34 PM, Jeff Heard wrote:
[...] One thing, though: CUDA is being supplanted by OpenCL over the next few years, and OpenCL can handle data parallelism on multicore CPUs as well as GPUs with the same code. It's a little more flexible overall than CUDA, will eventually be portable across ATI/nVidia/Intel/AMD/Sony Cell hardware, and is well supported on Linux, Mac, and Windows systems. [...]
I have been told by NVIDIA folks that CUDA is not going to disappear. Its C++ API (as opposed to its "driver interface") is quite a bit higher level than OpenCL, suitable for (painful) application development, whereas OpenCL seems more targeted at compiler and framework writers.

Sincerely,
Brad

You're quite correct; I should have said it will supplant CUDA for
typical uses. NVIDIA's OpenCL drivers are built on top of CUDA, and
they intend CUDA to remain available, but OpenCL is more portable and
thus something we should probably target at some point.

I'm personally pessimistic about the STI Cell, but it's a reasonable example of the difficulty of writing the backends. Beyond the data and message orchestration, which also afflicts GPUs, there's the decision function for when to migrate a computation to the accelerator (GPU, SPU, ...). That's not much of a problem on symmetric multicore chips, but it gets pretty dicey for hybrid multicore.
I've been trying to wrap my head around that problem for a couple of years.
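(As a toy illustration of the shape of that decision function; all
constants are hypothetical, and a real scheduler would be far more
involved:)

    -- Offload only when estimated accelerator time, including transfers,
    -- beats the host time. Per-element costs here are made-up numbers.
    data Device = Host | Accelerator deriving Show

    choose :: Int -> Double -> Double -> Double -> Device
    choose n hostCost accCost xferCost
      | work * hostCost > work * (accCost + xferCost) = Accelerator
      | otherwise                                     = Host
      where work = fromIntegral n

    main :: IO ()
    main = print (choose 1000000 1.0e-8 5.0e-10 2.0e-9)  -- Accelerator wins here

The real trouble is that those per-element costs aren't known
statically, which is exactly what makes hybrid multicore dicey.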
-scooter

Hello Felipe,
I have copied this email to Sean Lee & Manuel M T Chakravarty, as they
worked on Haskell+CUDA; maybe they can comment on the current status?
Here's their paper...
GPU Kernels as Data-Parallel Array Computations in Haskell
http://www.cse.unsw.edu.au/~chak/papers/gpugen.pdf
Hope that helps.
--
Donnie

On Wed, Feb 03, 2010 at 11:37:09AM -0600, Donnie Jones wrote:
Hello Felipe,
I have copied this email to Sean Lee & Manuel M T Chakravarty, as they worked on Haskell+CUDA; maybe they can comment on the current status?
Here's their paper... GPU Kernels as Data-Parallel Array Computations in Haskell http://www.cse.unsw.edu.au/~chak/papers/gpugen.pdf
As far as I could tell from searching Hackage and Chakravarty's web site, the code from the paper isn't released. So now he has DPH, Data.Array.Accelerate, and a GPU monad for array processing :). I wonder if he has any plans of gluing these things together?

Thanks for the tip!

--
Felipe.

Felipe Lessa:
Here's their paper... GPU Kernels as Data-Parallel Array Computations in Haskell http://www.cse.unsw.edu.au/~chak/papers/gpugen.pdf
As far as I could tell from searching Hackage and Chakravarty's web site, the code from the paper isn't released. So now he has DPH, Data.Array.Accelerate, and a GPU monad for array processing :).
It's really only two things, as the GPU monad from the cited paper has been superseded by Data.Array.Accelerate; i.e., the latter is a revision of the former. So the code from the cited paper will eventually be released as a CUDA backend for Data.Array.Accelerate.
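(For readers who haven't seen it, a Data.Array.Accelerate computation
looks roughly like the dot product below. This is a sketch only:
combinator names and types may differ in the released version, and
running it requires a backend's run function.)

    import Data.Array.Accelerate as A

    -- Embedded dot product: describes a collective array computation
    -- that a backend (CUDA, OpenCL, ...) can execute.
    dotp :: Vector Float -> Vector Float -> Acc (Scalar Float)
    dotp xs ys = A.fold (+) 0
               $ A.zipWith (*) (A.use xs) (A.use ys)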
I wonder if he has any plans of gluing these things together?
Our intention is to bring the two together eventually, but at the moment, each project on its own is already rather challenging.

As far as http://hackage.haskell.org/trac/summer-of-code/ticket/1537 is concerned, I think it is a huge amount of work, well beyond what even a group GSoC project could achieve, especially as it is not just an implementation project, but requires a large amount of research. Things may get a bit easier with the recently announced Fermi architecture, but I don't think that is going to change the picture fundamentally.

I would suggest that any GSoC project in this space should be based on D.A.Accelerate (rather than DPH), simply because the code base is much smaller and more accessible. There is not much point in writing a CUDA backend, as we already have a partially working one that we are going to release in due course. However, I have repeatedly had people asking for an OpenCL backend, so there appears to be some demand for that (and it's the right thing to do, given that CUDA is tied to a single company). An OpenCL backend for D.A.Accelerate also appears to be within the scope of what a good coder can achieve in the timeframe of a GSoC project. (To be precise, I think a good coder can implement a working backend in that timeframe, but it will probably require more work to generate well-optimised code.)

Manuel
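(To give a feel for what "writing a backend" means here, a toy sketch,
nothing like Accelerate's real AST: the job is essentially to walk the
embedded program, emit OpenCL C for each kernel, and drive the result
through an OpenCL binding.)

    -- Toy expression type standing in for a real backend's AST.
    data Exp = Var String | Lit Float | Add Exp Exp | Mul Exp Exp

    -- Pretty-print an expression as OpenCL C.
    emit :: Exp -> String
    emit (Var x)   = x
    emit (Lit n)   = show n ++ "f"
    emit (Add a b) = "(" ++ emit a ++ " + " ++ emit b ++ ")"
    emit (Mul a b) = "(" ++ emit a ++ " * " ++ emit b ++ ")"

    -- Wrap an expression in a map-style OpenCL kernel applied elementwise.
    kernel :: Exp -> String
    kernel e = unlines
      [ "__kernel void f(__global const float *xs, __global float *out) {"
      , "  int i = get_global_id(0);"
      , "  float x = xs[i];"
      , "  out[i] = " ++ emit e ++ ";"
      , "}"
      ]

    main :: IO ()
    main = putStr (kernel (Add (Mul (Var "x") (Var "x")) (Lit 1)))  -- x*x + 1.0f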

On Thu, Feb 04, 2010 at 11:31:53AM +1100, Manuel M T Chakravarty wrote:
It's really only two things, as the GPU monad from the cited paper has been superseded by Data.Array.Accelerate; i.e., the latter is a revision of the former. So the code from the cited paper will eventually be released as a CUDA backend for Data.Array.Accelerate.
Oops, thanks for clearing things up!
[...] An OpenCL backend for D.A.Accelerate also appears to be within the scope of what a good coder can achieve in the timeframe of a GSoC project. (To be precise, I think a good coder can implement a working backend in that timeframe, but it will probably require more work to generate well-optimised code.)
Thanks, that's very interesting. What about an LLVM backend, would it be useful? Perhaps it would be possible to use its vector operations to exploit the SIMD instructions of modern CPUs (I think GHC isn't there yet, right?). This is just a thought :).

Cheers!

--
Felipe.

Felipe Lessa:
Thanks, that's very interesting. What about an LLVM backend, would it be useful? Perhaps it would be possible to use its vector operations to exploit the SIMD instructions of modern CPUs (I think GHC isn't there yet, right?). This is just a thought :).
I'm currently implementing an LLVM backend. I'm not planning on using SIMD instructions in the first version, but it is an interesting idea for when a basic LLVM backend works.

Manuel
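(For the curious, the LLVM vector operations Felipe mentions look like
the IR below: a hand-written 4-wide float add, printed from a trivial
Haskell snippet. A real backend would build IR through a proper binding
rather than strings.)

    -- Emit toy LLVM IR for a SIMD add of two <4 x float> values.
    simdAddIR :: String
    simdAddIR = unlines
      [ "define <4 x float> @vadd(<4 x float> %a, <4 x float> %b) {"
      , "entry:"
      , "  %sum = fadd <4 x float> %a, %b"
      , "  ret <4 x float> %sum"
      , "}"
      ]

    main :: IO ()
    main = putStr simdAddIR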

Are you also planning an LLVM backend for GHC in a general sense, or just
for the accelerated work you're doing? It seems to me that GHC itself could
be well served by an LLVM backend, especially if one relies on the JIT
mode. That could help identify code paths in the core and runtime that are
infrequently used, further optimizing GHC's overall performance.
That's the gist behind Lattner's work to accelerate the OpenGL stack on Mac
OS: the LLVM JIT determines which code paths are actually taken and
generates code for only those paths.
-scooter
PS: I do stand behind my assertion regarding the Cell. I'm the CellSPU
backend hacker for LLVM, and I've pretty much stopped work on it because
IBM has more or less killed the chip.

Scott Michel wrote:
Are you also planning an LLVM backend for GHC in a general sense, or just for the accelerated work you're doing? It seems to me that GHC itself could be well served by an LLVM backend, especially if one relies on the JIT mode. That could help identify code paths in the core and runtime that are infrequently used, further optimizing GHC's overall performance.
I had a student implement an LLVM backend for GHC last year. You can find the details at http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf

We are planning to merge this work into the main GHC repository. (At the moment, this is not using the JIT, but that would be another interesting project.)

Manuel

Hello Manuel,

On Monday, February 8, 2010, 4:21:59 AM, you wrote:
I had a student implement an LLVM backend for GHC last year. You can find the details at
What's the status of the port? Can it compile full-scale programs, like darcs or GHC?

--
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com
participants (8)
- Brad Larsen
- Bulat Ziganshin
- Donnie Jones
- Felipe Lessa
- Jeff Heard
- Manuel M T Chakravarty
- scooter.phd@gmail.com
- Scott Michel