Re[2]: [Haskell-cafe] Re: speed: ghc vs gcc

I am no longer sure that this conversation is producing useful information or a learning experience for any involved party, and suggest it ends. In the meantime, a brief summary:

- Straightforward and simple Haskell code, written by an individual aware of issues with tail recursion and stream fusion, is frequently within 3x the speed of GCC code when compiled with appropriate optimizations in GHC.

- When performance is an absolute necessity, Haskell code can sometimes be manually modified (e.g. with manual loop unrolls) to equal GCC in performance.

- The amount of effort required for manually unrolling loops is greater than simply using standard GCC optimizations, and in practice few programmers will want to make this extra effort. For your everyday programmer, achieving that sort of performance is typically easier when writing in C.

- If anybody wants a Haskell compiler to achieve GCC-level speeds while writing short and concise Haskell, it is preferable to file a ticket with examples already written out and exhaustively examined, to minimize the effort Simon needs to make in working on the problem. GHC has had considerably fewer person-hours devoted to it than GCC, and only additional person-hours can improve it.

- Even though it might not match the amount of effort an average Haskell programmer would put into their code, it is *still* constructive to attempt to optimize Haskell code exhaustively and at a low level, if only to help us learn what sorts of transformations it would be useful for GHC to do automatically. This sort of information, I imagine, would be of significant use to Simon when attempting to improve GHC.

Nobody is claiming that everyday Haskell code will compile to code as speedy as GCC output. People *are* attempting to find out how Haskell code can be hacked at a low level to run as fast as GCC code, but not because they're claiming GHC output is as fast as GCC output. Rather, they're attempting to find out a) what sorts of approaches a really determined Haskell programmer can use to improve speed, b) the sorts of optimizations a later version of GHC might be able to make, and c) what sort of code its optimizer might produce. This is not being reported as a "fair Haskell vs C++ comparison" -- this is a collaborative effort to *improve* Haskell that was motivated by Bulat's original comparison.

Can we move on?

Louis Wasserman
wasserman.louis@gmail.com
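[Editor's illustration] To make the "manual loop unrolls" point above concrete, here is a minimal sketch (my own illustration, not the benchmark code from the thread; the names and the 4-way unroll factor are arbitrary): a straightforward tail-recursive accumulator loop, followed by the same loop unrolled by hand.

{-# LANGUAGE BangPatterns #-}

-- Straightforward tail-recursive sum of 1..n: what most Haskellers would write.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

-- The same loop unrolled by hand, 4 iterations per recursive call.
sumToUnrolled :: Int -> Int
sumToUnrolled n = go 0 1
  where
    go !acc i
      | i + 3 <= n = go (acc + i + (i + 1) + (i + 2) + (i + 3)) (i + 4)
      | i <= n     = go (acc + i) (i + 1)
      | otherwise  = acc

main :: IO ()
main = print (sumTo 1000000 == sumToUnrolled 1000000)  -- True

The unrolled version does the same arithmetic with fewer loop-control tests per element, which is the effect the hand-optimized versions in this thread were chasing -- at the cost of visibly more source lines.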

Hello Louis, Saturday, February 21, 2009, 4:16:10 AM, you wrote:
In the meantime, a brief summary:
a minor correction: the best gcc result shown in the thread was 50x faster than Don's one, so you need to multiply all ratios by a factor of 50
Straightforward and simple Haskell code, written by an individual aware of issues with tail recursion and stream fusion, is frequently within 3x the speed of GCC code when compiled with appropriate optimizations in GHC.
yes, within 150x margin
When performance is an absolute necessity, Haskell code can sometimes be manually modified (e.g. with manual loop unrolls) to equal GCC in performance.
yes, to make it only 50x slower while being only 7 times larger (i mean source lines)
Can we move on?
yes, we can! :) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Observation:
The best gcc result shown in the thread, if I recall, precomputed the result
of the full computation at compile time and simply output it, when we
looked at the assembly.
While I will accept that this could be seen as an optimization GHC should
have made, I do not accept that this will be the case with most everyday
code a programmer writes, as most code is not used to simply compute
arithmetic constants.
For code that actively requires computation at runtime, I have seen no
examples of an instance where well-optimized GHC is actually dozens or
hundreds of times slower than GCC output.
Louis Wasserman
wasserman.louis@gmail.com
On Sat, Feb 21, 2009 at 5:21 PM, Bulat Ziganshin
Hello Louis,
Saturday, February 21, 2009, 4:16:10 AM, you wrote:
In the meantime, a brief summary:
a minor correction: the best gcc result shown in the thread was 50x faster than Don's one, so you need to multiply all ratios by a factor of 50
Straightforward and simple Haskell code, written by an individual aware of issues with tail recursion and stream fusion, is frequently within 3x the speed of GCC code when compiled with appropriate optimizations in GHC.
yes, within 150x margin
When performance is an absolute necessity, Haskell code can sometimes be manually modified (e.g. with manual loop unrolls) to equal GCC in performance.
yes, to make it only 50x slower while being only 7 times larger (i mean source lines)
Can we move on?
yes, we can! :)
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hello Louis, Sunday, February 22, 2009, 2:30:23 AM, you wrote: yes, you are right. Don also compared results of the 64x-reduced computation with the full one. do you think that these results are more fair?
Observation:
The best gcc result shown in the thread, if I recall, precomputed the result of the full computation at compile time and simply output it, when we looked at the assembly.
While I will accept that this could be seen as an optimization GHC should have made, I do not accept that this will be the case with most everyday code a programmer writes, as most code is not used to simply compute arithmetic constants.
For code that actively requires computation at runtime, I have seen no examples of an instance where well-optimized GHC is actually dozens or hundreds of times slower than GCC output.
Louis Wasserman wasserman.louis@gmail.com
On Sat, Feb 21, 2009 at 5:21 PM, Bulat Ziganshin wrote:
Hello Louis,
Saturday, February 21, 2009, 4:16:10 AM, you wrote:
In the meantime, a brief summary:
a minor correction: the best gcc result shown in the thread was 50x faster than Don's one, so you need to multiply all ratios by a factor of 50
Straightforward and simple Haskell code, written by an individual aware of issues with tail recursion and stream fusion, is frequently within 3x the speed of GCC code when compiled with appropriate optimizations in GHC.
yes, within 150x margin
When performance is an absolute necessity, Haskell code can sometimes be manually modified (e.g. with manual loop unrolls) to equal GCC in performance.
yes, to make it only 50x slower while being only 7 times larger (i mean source lines)
Can we move on?
yes, we can! :)
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

I said nothing about fairness, and *never at any point said I thought Don's results were more useful or fair.* What makes you think that's what I meant to imply? You have not responded to my separate concern that
For code that actively requires computation at runtime, I have seen no examples of an instance where well-optimized GHC is actually dozens or hundreds of times slower than GCC output.
Rather than accusing me of taking sides: if you take an actual apples-to-apples comparison, citing the best Haskell results and the best GCC results -- without using examples from either language that performed, at compile time, computation that would not be possible in everyday programs -- you will find my original claims were true: GHC code is frequently within 3x the speed of GCC code, and hacked-up GHC code can reach and match GCC performance -- though I agree those hacks require an impractical blowup in code size. (Depending on your individual interpretation of what an average Haskell program looks like, I concede that 3x might be off by a factor of 2 or so -- but not the factor of 50 you claimed.)
Don's "-D64" results, while *not* a useful gcc-vs-ghc comparison, are
relevant if really determined Haskellers are interested in learning how to
obtain the absolute optimal perfection from their code. Don's results *are*
useful, but not in the way you say we're claiming.
Louis Wasserman
wasserman.louis@gmail.com
On Sat, Feb 21, 2009 at 5:35 PM, Bulat Ziganshin
Hello Louis,
Sunday, February 22, 2009, 2:30:23 AM, you wrote:
yes, you are right. Don also compared results of the 64x-reduced computation with the full one. do you think that these results are more fair?
Observation:
The best gcc result shown in the thread, if I recall, precomputed the result of the full computation at compile time and simply output it, when we looked at the assembly.
While I will accept that this could be seen as an optimization GHC should have made, I do not accept that this will be the case with most everyday code a programmer writes, as most code is not used to simply compute arithmetic constants.
For code that actively requires computation at runtime, I have seen no examples of an instance where well-optimized GHC is actually dozens or hundreds of times slower than GCC output.
Louis Wasserman wasserman.louis@gmail.com

On Sat, Feb 21, 2009 at 11:35 PM, Bulat Ziganshin wrote:
Hello Louis,
Sunday, February 22, 2009, 2:30:23 AM, you wrote:
yes, you are right. Don also compared results of the 64x-reduced computation with the full one. do you think that these results are more fair?
Yes. Clearly so.
It still computes the result from scratch - it just uses a trick which
generates better code. This is clearly a useful and worthwhile exercise as
it shows A) A neat trick with TH, B) A reasonably practical way to produce
fast code for the critical parts of a Haskell app, C) a motivating example
for implementing a compiler optimization to do it automatically.
Just outputting the precomputed result means nothing.
--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
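[Editor's illustration] As a concrete example of the "neat trick with TH" Sebastian refers to in point A, here is a minimal sketch; this is illustrative only and not Don's actual code -- the module and function names are made up. The splice generates an explicitly unrolled block of n additions at compile time, so the inner loop body becomes a flat chain of (+) rather than a counted loop.

{-# LANGUAGE TemplateHaskell #-}
module Unroll (unrolledBlock) where

import Language.Haskell.TH

-- | @$(unrolledBlock n)@ expands to @\i -> (i + 0) + (i + 1) + ... + (i + (n-1))@
-- (assumes n >= 1).
unrolledBlock :: Int -> Q Exp
unrolledBlock n = do
  i <- newName "i"
  let terms = [ [| $(varE i) + $(litE (integerL (fromIntegral k))) |]
              | k <- [0 .. n - 1] ]
  body <- foldl1 (\a b -> [| $a + $b |]) terms
  lamE [varP i] (return body)

-- Usage, from a separate module (TH's stage restriction), with BangPatterns:
--
--   step :: Int -> Int
--   step = $(unrolledBlock 64)
--
--   sumTo :: Int -> Int          -- sum of [0 .. limit - 1]
--   sumTo limit = go 0 0
--     where
--       go !acc i
--         | i + 64 <= limit = go (acc + step i) (i + 64)
--         | otherwise       = acc + sum [i .. limit - 1]

Whether this counts as "reasonably practical" or "monstrously more complicated than C" is exactly the disagreement later in the thread.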

Sebastian, that's not Bulat's point. He's saying that if we make that optimization in Haskell, we should at least make the same optimization in GCC for a fair comparison. (Though I'm not entirely sure that that optimization would be of any use to GCC, but that's a linguistic concern, no more.)

His point is valid. But Don's results *not* obtained by optimizing in this fashion are valid comparisons, and the results obtained with this optimization are useful for other reasons.

Louis Wasserman
wasserman.louis@gmail.com

On Sat, Feb 21, 2009 at 5:55 PM, Sebastian Sylvan <sylvan@student.chalmers.se> wrote:
On Sat, Feb 21, 2009 at 11:35 PM, Bulat Ziganshin <bulat.ziganshin@gmail.com> wrote:
Hello Louis,
Sunday, February 22, 2009, 2:30:23 AM, you wrote:
yes, you are right. Don also compared results of the 64x-reduced computation with the full one. do you think that these results are more fair?
Yes. Clearly so. It still computes the result from scratch - it just uses a trick which generates better code. This is clearly a useful and worthwhile exercise as it shows A) A neat trick with TH, B) A reasonably practical way to produce fast code for the critical parts of a Haskell app, C) a motivating example for implementing a compiler optimization to do it automatically.
Just outputting the precomputed result means nothing.
-- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862

No, he asked if comparing the D64 version with the straight gcc one was "more fair" than comparing a version that precomputes the result with one that doesn't. That's what I responded to.
On Sat, Feb 21, 2009 at 11:59 PM, Louis Wasserman wrote:
Sebastian, that's not Bulat's point. He's saying that if we make that optimization in Haskell, we should at least make the same optimization in GCC for a fair comparison. (Though I'm not entirely sure that that optimization would be of any use to GCC, but that's a linguistic concern, no more.)
His point is valid. But Don's results *not* obtained by optimizing in this fashion are valid comparisons, and the results obtained with this optimization are useful for other reasons.
Louis Wasserman
wasserman.louis@gmail.com
On Sat, Feb 21, 2009 at 5:55 PM, Sebastian Sylvan <sylvan@student.chalmers.se> wrote:
On Sat, Feb 21, 2009 at 11:35 PM, Bulat Ziganshin <bulat.ziganshin@gmail.com> wrote:
Hello Louis,
Sunday, February 22, 2009, 2:30:23 AM, you wrote:
yes, you are right. Don also compared results of the 64x-reduced computation with the full one. do you think that these results are more fair?
Yes. Clearly so.
It still computes the result from scratch - it just uses a trick which generates better code. This is clearly a useful and worthwhile exercise as it shows A) A neat trick with TH, B) A reasonably practical way to produce fast code for the critical parts of a Haskell app, C) a motivating example for implementing a compiler optimization to do it automatically.
Just outputting the precomputed result means nothing.
--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862

Hello Louis, Sunday, February 22, 2009, 2:59:05 AM, you wrote:
Sebastian, that's not Bulat's point. He's saying that if we make that optimization in Haskell, we should at least make the same optimization in GCC for fair comparison. (Though I'm not entirely sure that that optimization would be of any use to GCC, but that's a linguistic concern, no more.)
:))) it was *ghc* who replaced 64 additions with just one: sum += [n..n+k] ==> sum += n*k+const. my first program had this optimization by mistake. i've found a way to avoid it completely; Don found a way to use it only in Haskell :)
His point is valid. But Don's results *not* obtained by optimizing in this fashion are valid comparisons, and the results obtained with this optimization are useful for other reasons.
of course these results are useful! my own goal was just to make a fair comparison. i'm bothered when people say that ghc should be used for something like video codecs based on those "let's optimize only for haskell" pseudo-benchmarks. if Don had omitted the unoptimized gcc results from his chart and you hadn't written those "conclusions" based on the chart, i would never have made this comment
Louis Wasserman wasserman.louis@gmail.com
On Sat, Feb 21, 2009 at 5:55 PM, Sebastian Sylvan
wrote:
On Sat, Feb 21, 2009 at 11:35 PM, Bulat Ziganshin wrote:
Hello Louis,
Sunday, February 22, 2009, 2:30:23 AM, you wrote:
yes, you are right. Don also compared results of the 64x-reduced computation with the full one. do you think that these results are more fair?
Yes. Clearly so. It still computes the result from scratch - it just uses a trick which generates better code. This is clearly a useful and worthwhile exercise as it shows A) A neat trick with TH, B) A reasonably practical way to produce fast code for the critical parts of a Haskell app, C) a motivating example for implementing a compiler optimization to do it automatically.
Just outputting the precomputed result means nothing.
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat, you've some serious lessons to learn on how to interact with online communities. First,

1. Stop posting replies to every post on this thread.

2. Read some of the fine literature on how to be a productive, contributing member of a mailing list community, http://haskell.org/haskellwiki/Protect_the_community

3. Then see if you can rephrase your concerns in a form that will be useful. Claus (as always) has made a fine suggestion: http://www.haskell.org/pipermail/haskell-cafe/2009-February/056241.html

4. Come back with some analysis, or a ticket, and authentically try to collaborate with people here to improve or fix the problems you see.

I'm setting your moderation bit now, and in public so we all know what is going on, so your posts will bounce until you do something constructive. This will likely expire in a few days - just enough to calm things down.

-- Don (on jerk police patrol today)

P.S. if anyone strongly objects, let's talk offline about how better to manage things.

I think this thread has stopped being useful and started going round in circles, so I've blocked all messages to it and...
On Sat, Feb 21, 2009 at 04:28:21PM -0800, Donald Bruce Stewart wrote:
I'm setting your moderation bit now
...reverted this. Thanks Ian

Bulat,

Thank you for being productive. =)

of course these results are useful! my own goal was just to make a fair comparison. i'm bothered when people say that ghc should be used for something like video codecs based on those "let's optimize only for haskell" pseudo-benchmarks. if Don had omitted the unoptimized gcc results from his chart and you hadn't written those "conclusions" based on the chart, i would never have made this comment

Haskellers do need a baseline, y'know. What I mean by that is, attempting to write Haskell code that is as fast as gcc-optimized "typical" code is a useful enterprise. Of course it's possible to write a faster gcc program than the one that Don compared to, but it's still a useful benchmark, and of course it's not fair to optimize the heck out of Haskell code, leave gcc code untouched, and then say a comparison between *those* pieces of code is fair.

I think we all agree on my original point now, that

- Straightforward and simple Haskell code, written by an individual aware of issues with tail recursion and stream fusion, is frequently within 3x the speed of *straightforward and simple* GCC code when compiled with appropriate optimizations in GHC.

Sebastian, yes, there's still useful conversation to be had about more super-heavily-optimized code. (Bulat, if you'd like to continue posting examples of more heavily optimized C and the assembly it produces, I think that'd still provide a useful comparison in a conversation that can be continued civilly on all sides.)

Louis Wasserman
wasserman.louis@gmail.com

Hello Sebastian, Sunday, February 22, 2009, 2:55:38 AM, you wrote:
yes, you are right. Don also compared results of the 64x-reduced computation with the full one. do you think that these results are more fair?
Yes. Clearly so. It still computes the result from scratch - it just uses a trick which generates better code. This is clearly a useful and worthwhile exercise as it shows A) A neat trick with TH, B) A reasonably practical way to produce fast code for the critical parts of a Haskell app, C) a motivating example for implementing a compiler optimization to do it automatically.
yes, but do you know why his last program is 64x faster than the simple code? it's because *gcc* optimizes it this way. the first program i published there did it by mistake, then i fixed it by using 'xor' instead of (+) and published here what i considered the most fair comparison.

OTOH Don used this gcc optimization to generate faster code for haskell. he didn't use this trick for C++ and didn't omit the unoptimized gcc results from the chart. as a result, people who didn't analyze the details concluded that ghc outperformed gcc here.

so i made an experiment with cheating the same way, but in a more obvious manner, and i got 3 angry answers in 5 minutes. so what is the difference? did you not catch the details of Don's comparison, or are you bothered only by gcc-related cheating?

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
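[Editor's illustration] To spell out the arithmetic Bulat is describing, here is a minimal Haskell sketch (my own illustration, not code from the thread): a block of k consecutive additions collapses to a multiply plus a compile-time constant, which is the reduction the C compiler applies to the unrolled loop body, while no comparable shortcut is applied when the accumulator operation is xor.

import Data.Bits (xor)

-- What an unrolled block of k consecutive additions computes ...
blockSum :: Int -> Int -> Int
blockSum k i = sum [i .. i + k - 1]

-- ... and the closed form it reduces to: k*i plus a compile-time constant.
blockSumClosed :: Int -> Int -> Int
blockSumClosed k i = k * i + k * (k - 1) `div` 2

-- The same block with xor as the accumulator; there is no such multiply-add
-- shortcut for the compiler to apply here.
blockXor :: Int -> Int -> Int
blockXor k i = foldl xor 0 [i .. i + k - 1]

main :: IO ()
main = print (blockSum 64 1000 == blockSumClosed 64 1000)  -- True

This is why switching the C benchmark's accumulator from (+) to xor, as Bulat describes, keeps the compiler from folding the unrolled block away.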

On Sun, Feb 22, 2009 at 12:10 AM, Bulat Ziganshin wrote:
Hello Sebastian,
Sunday, February 22, 2009, 2:55:38 AM, you wrote:
yes, you are right. Don also compared results of the 64x-reduced computation with the full one. do you think that these results are more fair?
Yes. Clearly so. It still computes the result from scratch - it just uses a trick which generates better code. This is clearly a useful and worthwhile exercise as it shows A) A neat trick with TH, B) A reasonably practical way to produce fast code for the critical parts of a Haskell app, C) a motivating example for implementing a compiler optimization to do it automatically.
yes, but do you know why his last program is 64x faster than the simple code? it's because *gcc* optimizes it this way. the first program i published there did it by mistake, then i fixed it by using 'xor' instead of (+) and published here what i considered the most fair comparison.
OTOH Don used this gcc optimization to generate faster code for haskell. he didn't use this trick for C++ and didn't omit the unoptimized gcc results from the chart. as a result, people who didn't analyze the details concluded that ghc outperformed gcc here.
so i made an experiment with cheating the same way, but in a more obvious manner, and i got 3 angry answers in 5 minutes. so what is the difference? did you not catch the details of Don's comparison, or are you bothered only by gcc-related cheating?
Bulat, please stop insulting everyone whenever you discuss something. Was that sentence really necessary? You really think it's productive to insinuate that I'm intellectually dishonest?
I'm afraid I don't understand what you're talking about, could you try being
a bit clearer?
As I understand it you compared a gcc version which printed the precomputed
result, with a GHC version which computed the result at runtime, and got the
"150x" figure from that. Is this incorrect? If so, say so.
Don't accuse everyone who disagrees with you of being dishonest. NOBODY in
this thread has said anything to even remotely suggest that they think it's
okay to "cheat" in favour of Haskell and consider it fair, yet you jump to
this conclusion every single time. Please, give people the benefit of the
doubt? Just because someone disagrees with you does not make them stupid or
dishonest. Maybe they're actually right, or maybe you didn't make your point
clear enough, or maybe they just misunderstood you. Either way, please try
to be civil.
The only argument anyone made towards "cheating" on the gcc side is that ANY
program which just prints a precomputed result is worthless for comparison
(regardless of which side is doing it).
--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862

It's not practical at all. It's monstrously more complicated than C. It would be much simpler to do it in C and use FFI.

Regards,

John A. De Goes
N-BRAIN, Inc.
The Evolution of Collaboration
http://www.n-brain.net | 877-376-2724 x 101

On Feb 21, 2009, at 4:55 PM, Sebastian Sylvan wrote:
B) A reasonably practical way to produce fast code for the critical parts of a Haskell app,
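[Editor's illustration] Here is a minimal sketch of the FFI route John is pointing at (my own illustration; the file csum.c and the function csum are made-up names, not code from the thread): keep the hot loop in plain C, let gcc optimize it, and call it from Haskell.

{-# LANGUAGE ForeignFunctionInterface #-}
-- Haskell side: bind the C loop and call it.  Build with something like:
--   ghc -O2 Main.hs csum.c
module Main where

import Foreign.C.Types (CLLong)

-- C side, in csum.c:
--   long long csum(long long n) {
--       long long acc = 0;
--       for (long long i = 1; i <= n; ++i) acc += i;
--       return acc;
--   }
foreign import ccall unsafe "csum"
  c_csum :: CLLong -> CLLong

main :: IO ()
main = print (c_csum 1000000000)

The "unsafe" import keeps the per-call overhead minimal, which is fine here because csum never calls back into Haskell.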

On Sunday, 22 February 2009 00:21, Bulat Ziganshin wrote:
Hello Louis,
Saturday, February 21, 2009, 4:16:10 AM, you wrote:
In the meantime, a brief summary:
a minor correction: the best gcc result shown in the thread was 50x faster than Don's one, so you need to multiply all ratios by a factor of 50
You're referring to the freak result of Dan Doel? Come on, be serious, please. I have a Haskell result that runs in 7ms, too. Just use a rewrite rule and hey presto :)
Straightforward and simple Haskell code, written by an individual aware of issues with tail recursion and stream fusion, is frequently within 3x the speed of GCC code when compiled with appropriate optimizations in GHC.
yes, within 150x margin
Bulat, your obsession has become obnoxious and ridiculous.
When performance is an absolute necessity, Haskell code can sometimes be manually modified (e.g. with manual loop unrolls) to equal GCC in performance.
yes, to make it only 50x slower while being only 7 times larger (i mean source lines)
Can we move on?
yes, we can! :)
Apparently not :(

Hello Daniel, Sunday, February 22, 2009, 2:36:57 AM, you wrote:
You're referring to the freak result of Dan Doel? Come on, be serious, please. I have a Haskell result that runs in 7ms, too. Just use a rewrite rule and hey presto :)
Dan, why have you not said the same about the test where ghc becomes 4x faster than gcc?
Bulat, your obsession has become obnoxious and ridiculous.
i really, really wonder why cheating on the gcc side is an "obsession that has become obnoxious and ridiculous" while cheating on the ghc side is ok for you. can you explain? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Sunday, 22 February 2009 00:50, Bulat Ziganshin wrote:
Hello Daniel,
Sunday, February 22, 2009, 2:36:57 AM, you wrote:
You're referring to the freak result of Dan Doel? Come on, be serious, please. I have a Haskell result that runs in 7ms, too. Just use a rewrite rule and hey presto :)
Dan, why have you not said the same about the test where ghc becomes 4x faster than gcc?
Because, as has been explained several times to you, it was not a test of ghc's code generation vs. gcc's, and it was NOT meant to be. It was explicitly an investigation of how effective an improvement loop unrolling could be.
Bulat, your obsession has become obnoxious and ridiculous.
i really, really wonder why cheating on the gcc side is an "obsession that has become obnoxious and ridiculous" while cheating on the ghc side is ok for you. can you explain?
Seriously???? You have one case where gcc calculated the result for an extremely simple programme during compilation, and you use that to say that gcc's code is routinely 50-150 times faster than ghc's. Very convincing.
participants (7)
- Bulat Ziganshin
- Daniel Fischer
- Don Stewart
- Ian Lynagh
- John A. De Goes
- Louis Wasserman
- Sebastian Sylvan