Re: Re[4]: [Haskell-cafe] Newbie question about tuples

Thanks Bulat, but now you shattered my hopes that GHC would magically do all these optimizations for me ;-) I must say that although the performance of Haskell is not really a concern to me, I was a bit disappointed that even with all the tricks of the state monad, unboxing, and no-bounds-check, the matrix-vector multiplication was still 7 to 8 times slower than the C version. And at the end of the paper, it's only a factor 4 slower. Okay, going from 300x slower to 4x slower is impressive, but why is it *still* 4x slower? It would be interesting to compare the assembly code generated by the C compiler versus the GHC compiler; after all, we're just talking about a vector/matrix multiplication, which is just a couple of lines of assembly code... And now I'm again talking about performance, nooo! ;-)
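For readers who want something concrete to look at, here is a minimal sketch of a matrix-vector multiplication over unboxed arrays. It is not the code from the paper being discussed; the name `mulMV` is made up for illustration, and it assumes the `array` package that ships with GHC. It keeps the bounds checks and makes no attempt at the tuning tricks mentioned above.

```haskell
import Data.Array.Unboxed (UArray, bounds, elems, listArray, (!))

-- | Multiply a dense matrix by a vector (hypothetical helper, for illustration).
-- The matrix is indexed by (row, column); the vector must share the column range.
mulMV :: UArray (Int, Int) Double -> UArray Int Double -> UArray Int Double
mulMV m v =
  listArray (r0, r1)
    [ sum [ m ! (i, j) * v ! j | j <- [c0 .. c1] ]  -- dot product of row i with v
    | i <- [r0 .. r1]
    ]
  where
    ((r0, c0), (r1, c1)) = bounds m

main :: IO ()
main = do
  let m = listArray ((0, 0), (1, 2)) [1, 2, 3, 4, 5, 6] :: UArray (Int, Int) Double
      v = listArray (0, 2) [1, 0, 2] :: UArray Int Double
  print (elems (mulMV m v))  -- prints [7.0,16.0]
```

Compiling this with `ghc -O2` and inspecting the output of `-ddump-asm` would be one way to do the assembly comparison asked about above.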

bf3:
Thanks Bulat, but now you shattered my hopes that GHC would magically do all these optimizations for me ;-)
I must say that although the performance of Haskell is not really a concern to me, I was a bit disappointed that even with all the tricks of the state monad, unboxing, and no-bounds-check, the matrix-vector multiplication was still 7 to 8 times slower than the C version. And at the end of the paper, it's only a factor 4 slower. Okay, going from 300x slower to 4x slower is impressive, but why is it *still* 4x slower? It would be interesting to compare the assembly code generated by the C compiler versus the GHC compiler; after all, we're just talking about a vector/matrix multiplication, which is just a couple of lines of assembly code... And now I'm again talking about performance, nooo! ;-)
Yeah, there's some known low level issues in the code generator regarding heap and stack checks inside loops, and the use of registers on x86.
But note this updated paper, http://www.cse.unsw.edu.au/~chak/papers/CLPKM07.html
Add another core to your machine and it is no longer 4x slower :) Add 15 more cores and it's really no longer 4x slower :)
-- Don

Donald:
Yeah, there's some known low level issues in the code generator regarding heap and stack checks inside loops, and the use of registers on x86.
But note this updated paper, http://www.cse.unsw.edu.au/~chak/papers/CLPKM07.html
Add another core to your machine and it is no longer 4x slower :) Add 15 more cores and it's really no longer 4x slower :)
Maybe this is yet another newbie stupid question, but do you mean that GHC does automatic multi-threading? (Haskell seems very suitable for that) Otherwise adding an extra core does not really help does it?

bf3:
Donald:
Yeah, there's some known low level issues in the code generator regarding heap and stack checks inside loops, and the use of registers on x86.
But note this updated paper, http://www.cse.unsw.edu.au/~chak/papers/CLPKM07.html
Add another core to your machine and it is no longer 4x slower :) Add 15 more cores and it's really no longer 4x slower :)
Maybe this is yet another newbie stupid question, but do you mean that GHC does automatic multi-threading? (Haskell seems very suitable for that) Otherwise adding an extra core does not really help does it?
No, though that would be nice! You do have to program in a parallel manner, either by using forkIO, Control.Parallel, or parallel arrays. When you do, you have the option of such code scaling up to more cores relatively easily.
My advice: start writing threaded code now, with *lots* of threads, so your program will have the ability to start using +RTS -N16 when you get a new machine :)
-- Don
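As a rough illustration of the forkIO route mentioned above, here is a minimal sketch that sums a list in chunks, one lightweight thread per chunk, and collects the results through MVars. The names `chunksOf` and `parSum` are made up for this example, it uses only `base`, and it assumes a positive chunk size.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- | Split a list into chunks of at most n elements (hypothetical helper).
-- Assumes n > 0.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- | Sum each chunk in its own lightweight thread, then add up the partial sums.
parSum :: Int -> [Int] -> IO Int
parSum chunkSize xs = do
  vars <- forM (chunksOf chunkSize xs) $ \chunk -> do
    var <- newEmptyMVar
    _ <- forkIO $ do
      let s = sum chunk
      s `seq` putMVar var s  -- force the sum in the worker, not in the reader
    return var
  sum <$> mapM takeMVar vars

main :: IO ()
main = parSum 1000 [1 .. 100000] >>= print  -- prints 5000050000
```

To actually use several cores, the program has to be linked with `-threaded` and run with something like `+RTS -N4`; recent GHC versions may also need `-rtsopts` at link time before they accept `-N`. Whether a workload this small gets any faster is another matter, so treat it as a skeleton rather than a benchmark.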

Donald Bruce Stewart wrote:
bf3:
Maybe this is yet another newbie stupid question, but do you mean that GHC does automatic multi-threading? (Haskell seems very suitable for that) Otherwise adding an extra core does not really help does it?
No, though that would be nice! You do have to program in a parallel manner, either by using forkIO, Control.Parallel, or parallel arrays. When you do, you have the option of such code scaling up to more cores relatively easily.
My advice: start writing threaded code now, with *lots* of threads, so your program will have the ability to start using +RTS -N16 when you get a new machine :)
I read somewhere that GHC's SMP support has been "tested up to 40 cores". Pray tell me, what the heck kind of machine has 40 cores? (And where can I buy mine from?? :-D LOL!) Writing parallel code is one of those things I keep meaning to try, but never actually get around to doing for real...

andrewcoppin:
Donald Bruce Stewart wrote:
bf3:
Maybe this is yet another newbie stupid question, but do you mean that GHC does automatic multi-threading? (Haskell seems very suitable for that) Otherwise adding an extra core does not really help does it?
No, though that would be nice! You do have to program in a parallel manner, either by using forkIO, Control.Parallel, or parallel arrays. When you do, you have the option of such code scaling up to more cores relatively easily.
My advice: start writing threaded code now, with *lots* of threads, so your program will have the ability to start using +RTS -N16 when you get a new machine :)
I read somewhere that GHC's SMP support has been "tested up to 40 cores".
Pray tell me, what the heck kind of machine has 40 cores? (And where can I buy mine from?? :-D LOL!)
40 cpus.
It's a midrange Sun Fire server, something like this one
http://www.sun.com/servers/midrange/sunfire_e6900/index.xml
You'll need more than spare change to get started though.
*However* 8 core amd64 machines are practically commodity boxes now. Go get one.
-- Don

Donald Bruce Stewart wrote:
andrewcoppin:
I read somewhere that GHC's SMP support has been "tested up to 40 cores".
Pray tell me, what the heck kind of machine has 40 cores? (And where can I buy mine from?? :-D LOL!)
40 cpus.
It's a midrange Sun Fire server, something like this one
http://www.sun.com/servers/midrange/sunfire_e6900/index.xml
You'll need more than spare change to get started though.
o_O *dies* ...which begs the question "where did *you* get one?!"
*However* 8 core amd64 machines are practically commodity boxes now. Go get one.
I'm currently sitting here typing on a 2-core AMD64 box. ;-) However, it seems socket-939 is history now, so... Besides, all the benchmarks seem to say Intel's Core 2 Duo is the faster product. Currently. But then, if I could figure out how to use my GPU...
Not fantastically relevant, but... the makers of the Persistence of Vision Ray Tracer are currently working on a new multi-threaded beta. It's taken them *months*. AFAIK, it's written in C, and they had to spend forever removing global variables and whatnot. Huge internal restructuring to make it work properly.
I almost want to sit down and code something in Haskell and see how many times slower it is... ;-) [The issue with that being 1. I can't figure out a really good set of abstractions, and 2. the type system hates me. Oh, and 3. it would have to save files in PPM format, because I can't figure out how to do bitmapped graphics or PNG writing in Haskell...]
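On the PPM point, the plain-text variant of the format (P3) is simple enough to write with nothing but the Prelude. What follows is only a sketch: the names `Pixel` and `ppm` and the output file name are made up for illustration, and it assumes colour channels in the range 0 to 255.

```haskell
-- | One pixel as (red, green, blue), each channel assumed to be in 0..255.
type Pixel = (Int, Int, Int)

-- | Render a width x height image as plain-text PPM (P3).
-- The function argument gives the colour at column x, row y.
ppm :: Int -> Int -> (Int -> Int -> Pixel) -> String
ppm w h f =
  unlines $
    [ "P3"                     -- magic number for plain-text PPM
    , show w ++ " " ++ show h  -- image dimensions
    , "255"                    -- maximum channel value
    ]
    ++ [ unwords [show r, show g, show b]
       | y <- [0 .. h - 1]
       , x <- [0 .. w - 1]
       , let (r, g, b) = f x y
       ]

main :: IO ()
main = writeFile "gradient.ppm" (ppm 256 256 (\x y -> (x, y, 128)))
```

A ray tracer would only need to supply the per-pixel function; building the whole image as a `String` is slow for large pictures, but it keeps the example short.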

Add another core to your machine and it is no longer 4x slower :) Add 15 more cores and it's really no longer 4x slower :)
Maybe this is yet another newbie stupid question, but do you mean that GHC does automatic multi-threading? (Haskell seems very suitable for
No, though that would be nice! You do have to program in a parallel
Yes, of course, I've been writing multi-threaded and SMP programs all the time using C++, but then a C/C++ implementation would still beat the Haskell one regarding raw performance :) It might be easier to do in Haskell though, although I guess the "easier" (like software transactional memory) also comes with a price.
Peter

Donald Bruce Stewart wrote:
Yeah, there's some known low level issues in the code generator regarding heap and stack checks inside loops, and the use of registers on x86.
But note this updated paper, http://www.cse.unsw.edu.au/~chak/papers/CLPKM07.html
Add another core to your machine and it is no longer 4x slower :) Add 15 more cores and it's really no longer 4x slower :)
I really love reading all these Haskell performance papers. :-D I love knowing about all the cool and clever stuff that gets put into GHC to make the code it produces ever faster. But... when your program runs really slowly, it's still not much comfort. :-( I can't wait to see what stream fusion is going to do to all my list and string processing code...
participants (3): Andrew Coppin, dons@cse.unsw.edu.au, Peter Verswyvelen