Re: [Haskell] Google Summer of Code

Hi Bulat, I know you've talked about performance in the past, and I don't want to start a huge argument, but do you have recent data to back this up? IIRC you're using ghc 6.6, yes? I haven't looked at H.264 (and I realize it's compressed, so the situation is different from my work), however ghc can generate very fast code for binary I/O. Check out (shameless self-promotion) http://johnlato.blogspot.com for my recent writeup on creating a high-performance, pure-Haskell, Iteratee-based WAVE file reader. Sincerely, John Lato
anyway it's impossible due to slow code generated by ghc
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hello John, Wednesday, February 11, 2009, 8:54:35 PM, you wrote:
I know you've talked about performance in the past, and I don't want to start a huge argument, but do you have recent data to back this up? IIRC you're using ghc 6.6, yes?
I haven't seen examples of high-performance code written by anyone else that are as simple and as fast as the C analogue. AFAIK GHC can now generate good code for tight loops, but gcc's optimizations go far beyond that. If you know of specialists writing HP code in Haskell, and the results of their work, please point us to that code.
I haven't looked at H.264 (and I realize it's compressed, so the situation is different from my work), however ghc can generate very fast code for binary I/O
It's exactly an example of a tight loop. So let's compare HP code written for this task with analogous code written in C; I expect the Haskell code to be much more complex.
. Check out (shameless self-promotion) http://johnlato.blogspot.com for my recent writeup on creating a high-performance, pure-Haskell, Iteratee-based WAVE file reader.
AFAIU, it's a 20-line equivalent of 2-line C code:

for (i=...) a[i] = b[i]

Does this need any more comment? Unfortunately, your post doesn't contain the equivalent C code or its performance measurements, so I can't take it as an argument in the ghc/gcc comparison. All I see there is large, complex code, which proves that HP Haskell programming is much more complex than its C counterpart.
Sincerely, John Lato
anyway it's impossible due to slow code generated by ghc
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Wed, Feb 11, 2009 at 8:05 PM, Bulat Ziganshin
Hello John,
Wednesday, February 11, 2009, 8:54:35 PM, you wrote:
I know you've talked about performance in the past, and I don't want to start a huge argument, but do you have recent data to back this up? IIRC you're using ghc 6.6, yes?
I haven't seen examples of high-performance code written by anyone else that are as simple and as fast as the C analogue. AFAIK GHC can now generate good code for tight loops, but gcc's optimizations go far beyond that. If you know of specialists writing HP code in Haskell, and the results of their work, please point us to that code.
I haven't looked at H.264 (and I realize it's compressed, so the situation is different from my work), however ghc can generate very fast code for binary I/O
It's exactly an example of a tight loop. So let's compare HP code written for this task with analogous code written in C; I expect the Haskell code to be much more complex.
I think it's fair to point out that tight loops are nearly always the biggest bottlenecks of code, so generating good code for tight loops is pretty important. And ghc is still making large improvements with each release, whereas gcc isn't likely to get significantly better.
. Check out (shameless self-promotion) http://johnlato.blogspot.com for my recent writeup on creating a high-performance, pure-Haskell, Iteratee-based WAVE file reader.
AFAIU, it's a 20-line equivalent of 2-line C code:
for (i=...) a[i] = b[i]
Does this need any more comment?
I think you've misunderstood my code. Look at Oleg's IterateeM and see if you think that's really all it's doing.
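For readers who haven't seen it, the core of the iteratee idea looks roughly like this - a drastically simplified sketch for illustration, not Oleg's actual IterateeM definitions:

  -- Sketch only: a much-simplified version of the iteratee idea.
  -- A stream consumer is either done, carrying its result and any
  -- unconsumed input, or it asks for the next chunk of input.
  data Stream el
    = EOF
    | Chunk [el]

  data Iteratee el a
    = Done a (Stream el)
    | Cont (Stream el -> Iteratee el a)

  -- Example consumer: return the first element of the stream, if any.
  headI :: Iteratee el (Maybe el)
  headI = Cont step
    where
      step (Chunk (x:_)) = Done (Just x) (Chunk [])
      step (Chunk [])    = Cont step      -- empty chunk: keep waiting
      step EOF           = Done Nothing EOF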
Unfortunately, your post doesn't contain the equivalent C code or its performance measurements, so I can't take it as an argument in the ghc/gcc comparison. All I see there is large, complex code, which proves that HP Haskell programming is much more complex than its C counterpart.
Use libsndfile for comparison: http://www.mega-nerd.com/libsndfile/. I actually haven't looked at the code, although it's very highly regarded in the audio community (and I've seen the author post on this list on occasion). Using libsndfile-1.0.18:

wc wav.c
  1786  7833 57922 wav.c

compared to my source:

wc Wave.hs
   412  2215 15472 Wave.hs

And there you are. I will admit that I haven't implemented the entire wave spec, but that's only because of lack of time. Cheers, John

Hello John, Wednesday, February 11, 2009, 11:55:47 PM, you wrote:
It's exactly an example of a tight loop. So let's compare HP code written for this task with analogous code written in C; I expect the Haskell code to be much more complex.
I think it's fair to point out that tight loops are nearly always the biggest bottlenecks of code, so generating good code for tight loops is pretty important.
It's important, but it isn't the whole game. And while I said that GHC has improved the compilation of tight loops, that doesn't mean it has become as good as gcc; it has just started to keep loop variables in registers.
And ghc is still making large improvements with each release, whereas gcc isn't likely to get significantly better.
yes, it's close to perfect
AFAIU, it's a 20-line equivalent of 2-line C code:
for (i=...) a[i] = b[i]
Does this need any more comment?
I think you've misunderstood my code. Look at Oleg's IterateeM and see if you think that's really all it's doing.
What else does the code you've cited do? You wrote that it just copies 16-bit words into doubles.
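For concreteness, the loop being discussed - copying 16-bit samples into doubles - can be sketched in Haskell roughly like this (illustrative only; toDoubles is a made-up name, not a function from Wave.hs):

  import Data.Int (Int16)
  import qualified Data.Array.Unboxed as A

  -- Illustrative sketch: normalise 16-bit PCM samples to Doubles.
  -- This is the kind of "tight loop" under discussion, not code
  -- taken from John's Wave.hs.
  toDoubles :: A.UArray Int Int16 -> A.UArray Int Double
  toDoubles a =
    A.listArray (A.bounds a)
                [ fromIntegral s / 32768 | s <- A.elems a ]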
Use libsndfile for comparison. http://www.mega-nerd.com/libsndfile/.
That's one way of mis-comparing Haskell to C - comparing hand-tuned Haskell code against C code that may simply not be optimal. If you want to make a fair comparison, you should write the best possible code in both languages.
I actually haven't looked at the code, although it's very highly regarded in the audio community (and I've seen the author post on this list on occasion). Using libsndfile-1.0.18:

wc wav.c
  1786  7833 57922 wav.c

compared to my source:

wc Wave.hs
   412  2215 15472 Wave.hs

And there you are. I will admit that I haven't implemented the entire wave spec, but that's only because of lack of time.
When you don't need speed, you can write more compact code in Haskell than in C. So the best approach is to split your task into the speed-critical part and the rest, and use C++ for the first and Haskell for the second. -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
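The split described here usually means pushing the hot kernel behind the FFI. A minimal sketch, where copy_samples is a hypothetical C function rather than part of any existing library:

  {-# LANGUAGE ForeignFunctionInterface #-}
  -- Sketch of the "C for the hot loop, Haskell for the rest" split.
  -- copy_samples is a hypothetical C function, not from a real library:
  --   void copy_samples(const short *src, double *dst, int n);
  import Foreign.Ptr (Ptr)
  import Foreign.C.Types (CInt)
  import Data.Int (Int16)

  foreign import ccall unsafe "copy_samples"
    c_copySamples :: Ptr Int16 -> Ptr Double -> CInt -> IO ()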

bulat.ziganshin:
Hello John,
Wednesday, February 11, 2009, 11:55:47 PM, you wrote:
It's exactly an example of a tight loop. So let's compare HP code written for this task with analogous code written in C; I expect the Haskell code to be much more complex.
I think it's fair to point out that tight loops are nearly always the biggest bottlenecks of code, so generating good code for tight loops is pretty important.
It's important, but it isn't the whole game. And while I said that GHC has improved the compilation of tight loops, that doesn't mean it has become as good as gcc; it has just started to keep loop variables in registers.
And ghc is still making large improvements with each release, whereas gcc isn't likely to get significantly better.
yes, it's close to perfect
AFAIU, it's a 20-line equivalent of 2-line C code:
for (i=...) a[i] = b[i]
Does this need any more comment?
I think you've misunderstood my code. Look at Oleg's IterateeM and see if you think that's really all it's doing.
What else does the code you've cited do? You wrote that it just copies 16-bit words into doubles.
Use libsndfile for comparison. http://www.mega-nerd.com/libsndfile/.
That's one way of mis-comparing Haskell to C - comparing hand-tuned Haskell code against C code that may simply not be optimal. If you want to make a fair comparison, you should write the best possible code in both languages.
I actually haven't looked at the code, although it's very highly regarded in the audio community (and I've seen the author post on this list on occasion). Using libsndfile-1.0.18:

wc wav.c
  1786  7833 57922 wav.c

compared to my source:

wc Wave.hs
   412  2215 15472 Wave.hs

And there you are. I will admit that I haven't implemented the entire wave spec, but that's only because of lack of time.
When you don't need speed, you can write more compact code in Haskell than in C. So the best approach is to split your task into the speed-critical part and the rest, and use C++ for the first and Haskell for the second.
I think what's frustrating about this continued dialogue with Bulat re. performance is that:

a) the experience he bases his remarks upon is several years old
b) he's making blanket generic statements, using that old data
c) a lot of people have written a lot of fast code without trouble
d) he's not acknowledging the great improvements over this time

So it's very difficult to have these conversations. They're stuck in the same old pattern. Meanwhile, GHC keeps getting smarter and smarter. Bulat: time to update your results! Check out what GHC is doing these days, and come back with an analysis of what still needs to be improved. We can't wait to hear! -- Don

Hello Don, Thursday, February 12, 2009, 12:23:16 AM, you wrote:
Check out what GHC is doing these days, and come back with an analysis of what still needs to be improved. We can't wait to hear!
Can you point me to any Haskell code that is as fast as its C equivalent? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:
Hello Don,
Thursday, February 12, 2009, 12:23:16 AM, you wrote:
Check out what GHC is doing these days, and come back with an analysis of what still needs to be improved. We can't wait to hear!
Can you point me to any Haskell code that is as fast as its C equivalent?
You should do your own benchmarking! -- Don

Hello Don, Thursday, February 12, 2009, 3:45:36 AM, you wrote:
You should do your own benchmarking!
Well, when you say that GHC can generate code as fast as gcc's, I expect that you can supply some arguments. Is your only argument that GHC has improved in recent years? :) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

These seem to be good starting points:
http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploitin...
http://donsbot.wordpress.com/2008/06/04/haskell-as-fast-as-c-working-at-a-hi...
http://haskell.org/haskellwiki/Wc
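For a flavour of the kind of program benchmarked on the Wc page, a minimal lazy-bytestring line counter (a sketch, not the exact code from the wiki) is:

  -- Sketch: count the lines on stdin using lazy ByteStrings.
  -- Not the exact code from the haskellwiki Wc page.
  import qualified Data.ByteString.Lazy.Char8 as L

  main :: IO ()
  main = L.getContents >>= print . L.count '\n'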
On Wed, Feb 11, 2009 at 8:15 PM, Bulat Ziganshin
Hello Don,
Thursday, February 12, 2009, 3:45:36 AM, you wrote:
You should do your own benchmarking!
Well, when you say that GHC can generate code as fast as gcc's, I expect that you can supply some arguments. Is your only argument that GHC has improved in recent years? :)
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

A: "X has some problems with runtime performance." B: "My work solves all your problems. There is no problem." "Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy" - Alan Perlis. can /= can be bothered. :) Ben. On 12/02/2009, at 5:26 PM, Daniel Peebles wrote:
These seem to be good starting points:
http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploitin... http://donsbot.wordpress.com/2008/06/04/haskell-as-fast-as-c-working-at-a-hi... http://haskell.org/haskellwiki/Wc
On Wed, Feb 11, 2009 at 8:15 PM, Bulat Ziganshin
wrote:
Hello Don,
Thursday, February 12, 2009, 3:45:36 AM, you wrote:
You should do your own benchmarking!
Well, when you say that GHC can generate code as fast as gcc's, I expect that you can supply some arguments. Is your only argument that GHC has improved in recent years? :)

Check out what GHC is doing these days, and come back with an analysis of what still needs to be improved. We can't wait to hear!
Can you point me to any Haskell code that is as fast as its C equivalent?
You should do your own benchmarking!
Please, folks! This is hardly helpful. It isn't difficult to find examples of fast Haskell code by Don, nor is it difficult to find benchmarking and analyses of slow GHC-generated code by Bulat. But expecting "your" readers to search for such fragments instead of providing direct references weakens your cases.

In fact, I'm not aware of any documents that would make a coherent argument either way:

- most of Don's examples focus on small bits of code where (a) it is still possible to associate core output with parts of the input (making the dire state of tool support less of an issue) and (b) few real-life awkwardnesses interfere with the optimization of pure algorithms (by which I mean code artifacts such as error handling etc. confusing GHC's optimization phases).

- most of Bulat's analyses refer to the via-C path, which used to be the fastest path, but appears to have an uncertain future (not that I'm aware of any claims, let alone documentation, showing that the now-standard compilation path in GHC is at least as good as via-C used to be).

That summary is of course only based on the fragments I have chosen to find - if you want a fairer summary, you need to provide concrete and up-to-date references!-) And if you don't want to be asked to repeat your argument every time this thread comes up, you need to present it in easily referenced form outside this mailing list. In particular, I would suggest that:

1. performance issues are recorded as GHC trac tickets, with test cases (there is an explicit ticket type for this), so that we can check easily whether any general GHC improvements have had an effect on the known bad cases. Note that performance isn't a high-priority issue per se at GHC HQ (due to manpower), unless it is a show-stopper or many users report it as important to them. Nevertheless, the issues should be recorded, so that we know where we stand.

2. performance wisdom is recorded on the Haskell/GHC wikis, or in tutorials/papers. There is a lot of information on small tricks already, but very little on what to do when those small tricks no longer suffice (e.g., after one has found out that profiling lies, that core isn't easy to link back to source, or when the code in question is not just a pure and simple algorithm but is riddled with error handling imported from libraries).

Personally, I find the situation wrt performance of GHC-generated code very unsatisfactory, not because it isn't good (more often than not, it is), but because there is very little help when it isn't. And the first step towards improving that situation (e.g., with tools helping to analyse core, better-understood profiling tools, or support for by-hand application of optimization transformations, undo/modify/retarget/replay..) is to acknowledge that there is a problem. Pointing out that some experienced hackers have been through this, and are now able to make do for themselves, is a non-solution (unless you want to secure consulting fees;-).

Just as declarative languages allow us to ignore operational details until/unless they really are needed, offering a clear path on how to specify the declarative aspects of our code, so there should be a clear path on how to specify operational details if that becomes necessary. A path that any mildly experienced Haskeller should be able to follow through good documentation, examples, and tools.

Claus

PS. Talking about "high-performance computing" seems slightly misleading in this context, if the only goal is to have code as fast as typical C. From what I've heard from HPC folks, generic compilers or generic C code are not tough benchmarks from their point of view; platform vendor compilers and platform-tuned Fortran code are. Of course, that info is second-hand, and somewhat dated, but I think it would help to state our goals explicitly, to avoid confusion or unreasonable expectations.

On Thu, Feb 12, 2009 at 8:11 AM, Bulat Ziganshin
Wednesday, February 11, 2009, 11:55:47 PM, you wrote:
And ghc is still making large improvements with each release, whereas gcc isn't likely to get significantly better.
yes, it's close to perfect
LOL!
participants (7)
- Ben Lippmeier
- Bulat Ziganshin
- Claus Reinke
- Daniel Peebles
- Don Stewart
- John Lato
- Toby Hutton