Haskell on the Playstation 3? :-)

Game developers are really struggling to get performance out of the Playstation 3 console. The console has a single PowerPC CPU with 6 Cell SPU coprocessors, all running at 3.3GHz. Each SPU has 256KB of very high-speed local RAM, and data from the 512MB main memory can be streamed in and out of the SPUs via DMA. The problem is that with imperative approaches this is a nightmare to manage... It would be a very cool project to show that Haskell could run on such a platform, making it easier to take advantage of its awesome power :-) Oh well, I'm just brainstorming here. Cheers, Peter. BTW: The Cell platform is also being offered for scientific computing in standard workstation PCs, so this would not just be for the Playstation 3.

It would be a very cool project to show that Haskell could run on such a platform, making it easier to take advantage of its awesome power :-)
It's funny, but 5 minutes ago I was thinking: did anyone compile a Haskell application for Palm (m68k and/or ARM) that runs on Palm OS? I can literally quote you: "It would be a very cool project to show that Haskell could run on such a platform". Of course this is completely the other way around in terms of power and memory. :) But anyway, did anyone do it? Cheers, Radek. -- Codeside: http://codeside.org/ Przedszkole Miejskie nr 86 w Lodzi: http://www.pm86.pl/

At Sun, 26 Aug 2007 00:19:25 +0200, Radosław Grzanka wrote:
It's funny, but 5 minutes ago I was thinking: did anyone compile a Haskell application for Palm (m68k and/or ARM) that runs on Palm OS?
I have looked into doing this in the past. Historically speaking, the first obstacle was that all the Haskell compilers I looked at required libgmp -- which I could not get to compile for PalmOS. libgmp was used primarily (entirely?) for supporting the Integer data type. I believe there is now one (or more?) pure Haskell implementation of Integer -- so it should be easier to remove the libgmp dependency. I have also heard wild rumors that a future version of GHC might have a backend that generates pure ANSI C. In theory, that should make it much easier to target PalmOS. I have never written anything for PalmOS, aside from some Scheme programs using LispMe, but I think it might be a pretty unusual platform. For example, it seems like applications cannot access more than 4k of RAM without some hackery. Another option would be to port the yhi bytecode interpreter to run on PalmOS. I tried this, but I ran into three problems:
1. libgmp dependency
2. build system requires Python (scons).
3. I 'upgraded' to a Nokia 770, and suddenly did not care about PalmOS
There is also an old project to port nhc98 to PalmOS -- not sure if it is still active, or how far they got. AFAIK, nothing was ever released. If PalmOS is really un-POSIX compatible, it may be easier to write a custom compiler that compiles YHC or GHC Core to PalmOS. Well, the first time you try to write a compiler from Core -> ??? is difficult, but the second time around is a lot easier ;) j.
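To make the Core -> ??? idea a little more concrete, here is a minimal sketch of what the skeleton of such a translator might look like. The 'Core' type and the emitted C-ish text below are invented for illustration only -- the real Yhc and GHC Core ASTs are considerably richer, and a real back end would also need closures, a runtime, laziness and garbage collection.

-- A minimal sketch of a "Core to C" style translator.  The 'Core' type is a
-- made-up toy, not the real Yhc or GHC Core AST; it only shows the shape of
-- the problem described above.
module CoreToC where

data Core
  = Var String            -- variable reference
  | Lit Int               -- integer literal
  | App Core Core         -- function application
  | Lam String Core       -- lambda abstraction
  | Let String Core Core  -- non-recursive let binding

-- Emit (very naive) C-like text for an expression.
compile :: Core -> String
compile (Var x)     = x
compile (Lit n)     = show n
compile (App f a)   = "apply(" ++ compile f ++ ", " ++ compile a ++ ")"
compile (Lam x b)   = "make_closure(" ++ x ++ ", " ++ compile b ++ ")"
compile (Let x e b) = "({ value " ++ x ++ " = " ++ compile e ++ "; " ++ compile b ++ "; })"

main :: IO ()
main = putStrLn (compile (Let "y" (Lit 42) (App (Lam "x" (Var "x")) (Var "y"))))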

Hi
Another option would be to port the yhi bytecode interpreter to run on PalmOS. I tried this, but I ran into three problems:
1. libgmp dependency
This is no longer an issue; we now have a flag to not require libgmp, which makes type Integer = Int.
2. build system requires Python (scons).
Still the case, alas, but we'd like to fix it.
If PalmOS is really un-POSIX compatible, it may be easier to write a custom compiler that compiles YHC or GHC Core to PalmOS. Well, the first time you try to write a compiler from Core -> ??? is difficult, but the second time around is a lot easier ;)
We have compilers from Yhc Core to everything nowadays. One to Lisp shouldn't be too tricky, if someone wanted to take that direction. Thanks Neil

On 26/08/2007, at 10:07 AM, Jeremy Shaw wrote:
There is also an old project to port nhc98 to PalmOS -- not sure if it is still active, or how far they got. AFAIK, nothing was ever released.
Yes, we were working on this at Macquarie Uni. The project has suffered a bit from a lack of resources. A while back we shifted our attention from nhc to yhc, since that seems to be a more likely base for success. We have a preliminary port of the yhc runtime which a student developed. It runs some simple programs on Palm OS, but it needs work to be usable by others. I expect to find time to devote to this later in the year, although we will also be evaluating whether putting effort into Palm OS is worth it any more, given its uncertain future. cheers, Tony Sloane

2007/8/30, Tony Sloane
On 26/08/2007, at 10:07 AM, Jeremy Shaw wrote:
There is also an old project to port nhc98 to PalmOS -- not sure if it is still active, or how far they got. AFAIK, nothing was ever released.
Yes, we were working on this at Macquarie Uni. The project has suffered a bit from a lack of resources. A while back we shifted our attention from nhc to yhc, since that seems to be a more likely base for success.
We have a preliminary port of the yhc runtime which a student developed. It runs some simple programs on Palm OS, but it needs work to be usable by others.
This is really cool. I'd love to write some Haskell for Palm. :)
I expect to find time to devote to this later in the year, although we will also be evaluating whether putting effort into Palm OS is worth it any more, given its uncertain future.
Yeah. This is actually a problem. Palm OS seems rather doomed to become obsolete, and Pocket PC is probably a better target. Anyway, does anyone else get the feeling that at the time of buying yourself a new gadget you are already in the "deprecated zone"? ;) Cheers, Radek. -- Codeside: http://codeside.org/ Przedszkole Miejskie nr 86 w Lodzi: http://www.pm86.pl/

On Aug 30, 2007, at 2:34 , Radosław Grzanka wrote:
obsolete and Pocket PC is probably better target. Anyway, does anyone else experience a feeling that at the time of buying yourself new gadget you are already in "deprecated zone"? ;)
I've been feeling that way since 1982.... -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

On 8/26/07, Peter Verswyvelen
Game developers are really struggling to get performance out of the Playstation 3 console. The console has a single PowerPC CPU with 6 Cell SPU coprocessors, all running at 3.3GHz. Each SPU has 256KB of very high-speed local RAM, and data from the 512MB main memory can be streamed in and out of the SPUs via DMA. The problem is that with imperative approaches this is a nightmare to manage...
It would be a very cool project to show that Haskell could run on such a platform, making it easier to take advantage of its awesome power :-)
Cool idea :-) Hmmm, random thought along similar lines -- I know the answer to this thought is no, but I'm curious: could we get Haskell to run on a graphics card? I'm guessing the answer is no (not a difficult guess ;-) ), but I'm curious what it would take to make a graphics card able to run Haskell?

On Thu, 2007-08-30 at 11:34 +0800, Hugh Perkins wrote:
On 8/26/07, Peter Verswyvelen
wrote: Game developers are really struggling to get performance out of the Playstation 3 console. The console has a single PowerPC CPU with 6 Cell SPU coprocessors, all running at 3.3GHz. Each SPU has 256KB of very high-speed local RAM, and data from the 512MB main memory can be streamed in and out of the SPUs via DMA. The problem is that with imperative approaches this is a nightmare to manage...
It would be a very cool project to show that Haskell could run on such a platform, making it easier to take advantage of its awesome power :-)
Cool idea :-)
Hmmm, random thought along similar lines -- I know the answer to this thought is no, but I'm curious: could we get Haskell to run on a graphics card? I'm guessing the answer is no (not a difficult guess ;-) ), but I'm curious what it would take to make a graphics card able to run Haskell?
Either the language of the graphics card is Turing complete and the answer is yes or the language isn't and the answer is no.

On 8/30/07, Derek Elkins
Either the language of the graphics card is Turing complete and the answer is yes or the language isn't and the answer is no.
Well, a quick Google for "are graphics cards turing complete?" suggests that some nVidia cards are Turing complete, but I couldn't find any reliable source (e.g. nvidia.com) to confirm this, and I don't know enough either to figure it out for myself, or even to confirm that this is both a necessary and sufficient condition for running Haskell. Related question: assuming temporarily that it is theoretically possible to get Haskell to run on a graphics card, to what extent would this actually be useful?

On 8/29/07, Hugh Perkins
Well, a quick Google for "are graphics cards turing complete?" suggests that some nVidia cards are Turing complete
http://developer.nvidia.com/object/cuda.html It's a C compiler with multiprocessor streaming extensions that targets nvidia cards. But it's not that simple... -- Dan

Dan Piponi
http://developer.nvidia.com/object/cuda.html
It's a C compiler with multiprocessor streaming extensions that targets nvidia cards.
Whoa :-O Cool :-)
But it's not that simple...
Few things are ;-) What's the catch? Can we use a graphics card as an n-core machine, where n >= 1024?

Hello Hugh, Thursday, August 30, 2007, 11:01:02 AM, you wrote:
But it's not that simple...
Few things are ;-) What's the catch? Can we use a graphics card as an n-core machine, where n >= 1024?
no. it's more like 8-16 cores with 64-element SSE instructions http://developer.download.nvidia.com/compute/cuda/0_8/NVIDIA_CUDA_Programmin... -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

So, according to the blurb (and since this is product-specific, I don't know if this is allowed on the newsgroup, but it does seem to be a fairly unique product):
- this technology works on GeForce 8800 cards or better
- there's a dedicated processing unit available called the "Tesla", which comes in a standalone version that you can plug into a PCI Express slot, or in various pre-built 1U servers.
The Tesla standalone card:
- 128 thread processors
- 518 gigaflops
- 1.5 GB dedicated memory
- fits in one full-length, dual slot with one open PCI Express x16 slot
- retails for about 1500 USD (so, I can afford one, I guess)
For anyone who's counting, some sources put the computational capacity of the human brain at around 10 petaflops, so 20,000 of these processors ought to be approaching that. (At a cost of 30 million dollars ;-) ) Anyway, 128 threads, if it is true, sounds fun. It's not 1024 threads, but it's decent, and it's more than the 32 threads where automatic threading is rumoured to bottom out.

Hello Hugh, Thursday, August 30, 2007, 2:46:51 PM, you wrote:
- this technology works on GeForce 8800 cards or better
afaik, on any 8xxx cards - the only difference is the number of threads
- 128 thread processors
it's the same as the 8800GTX. please read the CUDA manual first. these 128 threads are not independent - each group of 8 or 16 threads executes the same code -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On 8/30/07, Bulat Ziganshin
it's the same as the 8800GTX. please read the CUDA manual first. these 128 threads are not independent - each group of 8 or 16 threads executes the same code
Hmmmm, yes, you are right. The GPU contains 8 "multiprocessors", where each multiprocessor contains multiple processors that execute the same code at the same time ("data parallel"). There are 8 processors in each multiprocessor unit, and they run at twice the clock speed, so in one clock cycle they can execute 16 threads, and in two clock cycles they can execute all 32 threads of a "warp" (a group of threads running the same code). Sooo... kind of an interesting architecture. To what extent do we think that this is representative of future "general-purpose" multi-core architectures? Looking at Haskell parallelization, things like maps, and folds of associative functions, can be split across the threads of a single warp. On the other hand, things like independent lets that we want to run in parallel would need to be assigned to independent warps. On the whole, maps and folds may constitute the bulk of what we are trying to parallelize (certainly, SPJ's NDP focuses extensively on maps), so this is probably broadly compatible with the CUDA architecture?
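To make the map/fold point concrete, here is a minimal sketch in ordinary SMP Haskell (using Control.Parallel.Strategies from the parallel package -- this targets multi-core CPUs, not a GPU) of the two patterns mentioned above: a data-parallel map, and a fold of an associative operation done as independent chunk sums followed by a sequential combine. The chunk size and helper names are arbitrary choices for illustration.

-- Sketch of the two parallel patterns discussed above, on a multi-core CPU.
import Control.Parallel.Strategies (parMap, parList, rseq, using)
import Data.List (foldl')

-- Data-parallel map: each element could in principle go to its own thread.
squares :: [Int] -> [Int]
squares = parMap rseq (\x -> x * x)

-- Fold of an associative operation (+): sum the chunks in parallel,
-- then combine the per-chunk results sequentially.
parSum :: [Int] -> Int
parSum xs = foldl' (+) 0 partials
  where
    chunked  = chunksOf 1024 xs
    partials = map (foldl' (+) 0) chunked `using` parList rseq

-- Split a list into fixed-size chunks.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = take n xs : chunksOf n (drop n xs)

main :: IO ()
main = print (parSum (squares [1 .. 100000]))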

On 8/30/07, Hugh Perkins
On the whole, maps and folds may constitute the bulk of what we are trying to parallelize (certainly, SPJ's NDP focuses extensively on maps), so this is probably broadly compatible with the CUDA architecture?
Right. But the functions and data that we are trying to map and fold could be anything, so we are required to have the full functionality of Haskell running on the GPU - unless the compiler can smartly figure out what should run on the GPU and what shouldn't. All in all, this could be a fairly ambitious project. Another, more modest, approach would be to define a DSL, maybe along the lines of what Lennart Augustsson has been doing on his blog (http://augustss.blogspot.com/), and implement a compiler back end that generates GPU code from the DSL. Something similar for C++ is Michael McCool's Sh library (www.csee.umbc.edu/~olano/s2005c37/ch07.pdf) which has now developed into a more general purpose commercial product. It seems to me that this could be a killer application for Haskell without a major rewrite of the Haskell compiler. What's more, the same DSL could have different back ends targeting GPUs, multiple cores or even just single CPUs (where you'd still get the benefits of partial evaluation). -- Dan
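As a rough illustration of the DSL idea (all names below are invented, and this is far simpler than Lennart's or McCool's work), here is a toy deep embedding: the expression is captured as a data structure, so one back end can evaluate it in plain Haskell while another emits GLSL-ish shader source from the same description.

-- A toy deep-embedded DSL: the program is data, so different back ends can
-- interpret it (CPU evaluation) or compile it (shader source).
module ToyDSL where

data Exp
  = Lit Float          -- a literal constant
  | Var String         -- a free input, e.g. a texel or vertex attribute
  | Add Exp Exp
  | Mul Exp Exp
  deriving Show

-- Back end 1: evaluate in Haskell, given values for the free variables.
eval :: [(String, Float)] -> Exp -> Float
eval _   (Lit x)   = x
eval env (Var v)   = maybe (error ("unbound: " ++ v)) id (lookup v env)
eval env (Add a b) = eval env a + eval env b
eval env (Mul a b) = eval env a * eval env b

-- Back end 2: emit GLSL-ish source text for the same expression.
toShader :: Exp -> String
toShader (Lit x)   = show x
toShader (Var v)   = v
toShader (Add a b) = "(" ++ toShader a ++ " + " ++ toShader b ++ ")"
toShader (Mul a b) = "(" ++ toShader a ++ " * " ++ toShader b ++ ")"

-- Example: scale a pixel value by 1.2.
brighten :: Exp
brighten = Mul (Var "pixel") (Lit 1.2)

main :: IO ()
main = do
  print (eval [("pixel", 0.5)] brighten)   -- run it on the CPU
  putStrLn (toShader brighten)             -- or generate shader source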

Although I'm sure a lot can be done on modern GPUs (especially the DirectX 10 cards = Nvidia 8x00, which can write back to main memory using "geometry shaders"), a Playstation 3 runs Linux, doesn't cost a lot, and it has 7 CPUs running at 3+ GHz, 6 of which have parallel vector processing capabilities. I think it should be easier (but far from easy) to implement Haskell on a PS3 than it is on a GPU. I have developed imperative software for both, though not in depth, and to me GPUs are still too much oriented towards graphics, whilst the PS3 CPUs are more general purpose (although they also expect to process streams of data).
See http://cell.scei.co.jp/e_download.html

On 8/31/07, Dan Piponi
Right. But the functions and data that we are trying to map and fold could be anything, so we are required to have the full functionality of Haskell running on the GPU - unless the compiler can smartly figure out what should run on the GPU and what shouldn't. All in all, this could be a fairly ambitious project.
Well, yes, I didn't say it would be easy ;-) It could be useful to make a ballpark estimate of what kind of performance we'd get running Haskell on a GPU, compared to running on a CPU. Any ideas on how to do this? By the way, is there some reason we couldn't give the entire Haskell program to every thread, and simply pass in something roughly equivalent to a program counter along with the data?

On Aug 29, 2007, at 23:34 , Hugh Perkins wrote:
Hmmm, random thought along similar lines -- I know the answer to this thought is no, but I'm curious: could we get Haskell to run on a graphics card?
I thought someone had done that recently as a graduate thesis. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
participants (10)
- Brandon S. Allbery KF8NH
- Bulat Ziganshin
- Dan Piponi
- Derek Elkins
- Hugh Perkins
- Jeremy Shaw
- Neil Mitchell
- Peter Verswyvelen
- Radosław Grzanka
- Tony Sloane