
On 8/30/07, Hugh Perkins wrote:
On the whole, maps and folds may constitute the bulk of what we are trying to parallelize (certainly, SPJ's NDP focuses extensively on maps), so this is probably broadly compatible with the CUDA architecture?
Right. But the functions and data that we want to map and fold could be anything, so we would need the full functionality of Haskell running on the GPU, unless the compiler can work out for itself what should run on the GPU and what shouldn't. All in all, this could be a fairly ambitious project.

Another, more modest, approach would be to define a DSL, maybe along the lines of what Lennart Augustsson has been doing on his blog (http://augustss.blogspot.com/), and implement a compiler back end that generates GPU code from the DSL. Something similar for C++ is Michael McCool's Sh library (www.csee.umbc.edu/~olano/s2005c37/ch07.pdf), which has since developed into a more general-purpose commercial product.

It seems to me that this could be a killer application for Haskell that doesn't require a major rewrite of the Haskell compiler. What's more, the same DSL could have different back ends targeting GPUs, multiple cores, or even just single CPUs (where you'd still get the benefits of partial evaluation).

-- Dan
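
P.S. To make the DSL idea a bit more concrete, here is a minimal sketch of what I have in mind. Everything in it is hypothetical and made up for illustration (the Expr type, mapCPU, emit, kernel, simplify); it is not Lennart's design or the Sh API. It assumes a DSL restricted to element-wise arithmetic on doubles. One back end interprets the expression on the CPU; the other only emits CUDA-style source text as a string, which I have not compiled or run on a GPU. The simplify pass does plain constant folding, as a small stand-in for partial evaluation.

```haskell
module Main where

-- A tiny deep-embedded DSL for element-wise array computations.
-- All names here are illustrative assumptions, not an existing library.
data Expr
  = Var              -- the current array element
  | Lit Double       -- a constant
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Show, Eq)

-- Lets us write DSL terms with ordinary arithmetic syntax.
instance Num Expr where
  (+)         = Add
  (*)         = Mul
  fromInteger = Lit . fromInteger
  negate e    = Mul (Lit (-1)) e
  abs         = error "abs: not supported in this sketch"
  signum      = error "signum: not supported in this sketch"

-- Back end 1: interpret the expression on the CPU.
eval :: Expr -> Double -> Double
eval Var       x = x
eval (Lit c)   _ = c
eval (Add a b) x = eval a x + eval b x
eval (Mul a b) x = eval a x * eval b x

mapCPU :: Expr -> [Double] -> [Double]
mapCPU e = map (eval e)

-- Back end 2: emit C-like source text for the same expression.
emit :: Expr -> String
emit Var       = "x[i]"
emit (Lit c)   = show c
emit (Add a b) = "(" ++ emit a ++ " + " ++ emit b ++ ")"
emit (Mul a b) = "(" ++ emit a ++ " * " ++ emit b ++ ")"

-- Wrap the expression in a CUDA-style kernel (text only, untested on hardware).
kernel :: String -> Expr -> String
kernel name e = unlines
  [ "__global__ void " ++ name ++ "(const double *x, double *y, int n) {"
  , "  int i = blockIdx.x * blockDim.x + threadIdx.x;"
  , "  if (i < n) y[i] = " ++ emit e ++ ";"
  , "}"
  ]

-- Constant folding, shared by every back end.
simplify :: Expr -> Expr
simplify (Add a b) = case (simplify a, simplify b) of
  (Lit p, Lit q) -> Lit (p + q)
  (a', b')       -> Add a' b'
simplify (Mul a b) = case (simplify a, simplify b) of
  (Lit p, Lit q) -> Lit (p * q)
  (a', b')       -> Mul a' b'
simplify e = e

-- x*x + 2*3, written with the Num instance above.
example :: Expr
example = Var * Var + (2 * 3)

main :: IO ()
main = do
  print (mapCPU example [1, 2, 3])
  putStr (kernel "square_plus_six" (simplify example))
```

The point is that the program is just a data structure, so a GPU code generator, a multi-core scheduler and a plain interpreter can all consume the same term, and optimisations like simplify are written once rather than per target.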