
Hi, a couple of times I've encountered the statement that Haskell programs can have performance comparable to programs in C/C++. I've even read that, thanks to the functional nature of Haskell, the compiler can reason about and make guarantees for the code, and use that knowledge to automatically parallelize the program without any explicit parallelizing commands in the code. I haven't seen any sort of evidence that would support such claims. Can anyone provide code in Haskell that performs better in terms of execution speed than a well-written C/C++ program? Both the Haskell and C programs should implement the same algorithm (merge sort in Haskell outperforming bubble sort in C doesn't count), though I guess that using Haskell-specific idioms and optimizations is of course allowed. Jan

On 6 May 2012 16:40, Janek S. wrote:
[...]
How about http://donsbot.wordpress.com/2007/11/29/use-those-extra-cores-and-beat-c-tod... ?
-- Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
http://IvanMiljenovic.wordpress.com

On 06.05.2012 10:44, Ivan Lazar Miljenovic wrote:
On 6 May 2012 16:40, Janek S. wrote:
[...]
How about http://donsbot.wordpress.com/2007/11/29/use-those-extra-cores-and-beat-c-tod... ?
Hi, isn't that particular Haskell code outperforming C (22 seconds vs. 33) just because the author uses recursion in C? I surely love Haskell, and the way its code is easily parallelized, but that example seems not fair.

I do not agree: the fib function is tail-recursive, and any good C compiler is able to optimize away the calls and reduce it to a mere loop. At least that's what I learnt about tail recursion in C with GCC.
2012/5/6 Artur wrote:
[...]

It is not tail-recursive.
-- Roman I. Cheplyaka :: http://ro-che.info/
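(For reference, a minimal Haskell sketch of the distinction under discussion - illustrative code, not taken from the thread or the linked post:)

    -- The naive fib: the addition happens after both recursive calls
    -- return, so neither call is in tail position.
    fib :: Int -> Int
    fib n | n < 2     = 1
          | otherwise = fib (n - 1) + fib (n - 2)

    -- A genuinely tail-recursive variant, threading two accumulators;
    -- this is the shape a compiler can turn into a plain loop.
    fibTR :: Int -> Int
    fibTR n = go n 1 1
      where
        go 0 a _ = a
        go k a b = go (k - 1) b (a + b)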

Sorry sorry sorry ^^
I read too fast and realized I was wrong before you sent this.
Okay, so then it would be nice if someone with higher skills told us where Haskell's recursive calls stand when compared to C's.
2012/5/6 Roman Cheplyaka
It is not tail-recursive.

In this case it doesn't matter; while it isn't technically tail-recursive, GCC is very capable of transforming it into a direct loop, likely because it knows about the associative/commutative properties of "+", so it's able to re-arrange the body as it sees fit since, combined, both calls are in 'tail position.'
With GCC 4.6.3 at the -O3 optimization level on my Ubuntu 12.04 machine, the example in the linked article executes in 9s (Intel Core i5-2520M CPU @ 2.50GHz, dual core with 4 total logical cores) and the body of fib does not contain any call instructions - in fact, GCC seems to not only transform the body into a loop, but unroll a very good portion of it, leading to very, very fast code.
Technically, if you want to be a pedant, the benchmarks aren't even equivalent anyway; GCC is able to use the native 'long long' datatype while GHC promotes the results of 'fib' to Integer, and thus is backed by GMP and a lot of extra code. GMP is fast, but it's not as fast as a native data type. It should use Int64. That change alone speeds up the benchmark by approximately 30% for me using GHC 7.4.1 (~30s at -N4 vs ~19s at -N4.)
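(For concreteness, a minimal sketch of the suggested change - this follows the shape of the parallel fib from the linked post, with the Int64 annotation suggested above; it is an illustration, not the post's exact code:)

    import Control.Parallel (par, pseq)
    import Data.Int (Int64)

    -- Naive parallel fib over a native-sized type; a realistic version
    -- would stop sparking below some depth threshold.
    fib :: Int64 -> Int64
    fib n
      | n < 2     = 1
      | otherwise = r `par` (l `pseq` l + r)
      where
        l = fib (n - 1)
        r = fib (n - 2)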
I don't think the point of that post was to outwit the C compiler, but to show how easy it is to add parallelism to a Haskell program (ironically, pseq/par has proven quite difficult to leverage for nice wall-clock speedups in practice, hence the advent of libraries like strategies and, more recently, monad-par, that make speedups easier to get.) I do think it's much easier to add parallelism to Haskell programs - even if they are not as fast, it's far easier to write, understand, and safer to do so. And ultimately all this code is very old (5+ years) and not reflective of either toolchain anymore. GHC has had immeasurable amounts of overhauls in its parallelism support as well as the libraries supporting it, and both GHC and GCC have far better optimisation capabilities.
On Sun, May 6, 2012 at 4:09 AM, Roman Cheplyaka wrote:
It is not tail-recursive.
[...]
-- Regards, Austin

An equivalent parallel version in C++11 would look something like this:

    long long fib(long long n) {
        if (n < 2) { return 1; }
        std::future<long long> r = std::async(fib, n - 2);
        long long l = fib(n - 1);
        return r.get() + l;
    }

My bet though is that it would perform worse than the serial version because of the much higher overheads of starting threads in C++.
[:Bartosz:]
On 5/6/2012 02:51, Austin Seipp wrote:
[...]

* Artur
isn't that particular Haskell code outperforming C (22 seconds vs. 33) just because the author uses recursion in C? I surely love Haskell, and the way its code is easily parallelized, but that example seems not fair.
I think the point was to take two programs of similar code complexity (or, rather, simplicity) implementing the same algorithm (modulo parallelism). So I'm not sure what exactly you're objecting to.

If you're saying that there are better algorithms to compute Fibonacci numbers, it's not relevant - the important thing is that the two programs are computing the same thing in the same way.

If you're saying that in C an explicit stack should have been used instead of recursion, then it would increase the code complexity while having non-obvious performance benefits.

In any case, assuming that on this particular task Haskell is x times slower than C, taking sufficiently many cores and a large enough N would outweigh that.

Then again, I don't think that article was meant as a fair comparison between Haskell and C on an average task. (The chosen example consists of one loop which is easily parallelisable.) All it demonstrates is that it is very easy to exploit parallelism in Haskell when there's an opportunity.

-- Roman I. Cheplyaka :: http://ro-che.info/

Well, I believe that the comparison is about solving a task, not writing code in some particular way. I get your point, but I think that when comparing language speed, you should use the best techniques each one has to offer; it makes no sense otherwise.

If there was some kind of comparison made recently between Haskell and C++, I'd be really glad to read a report. I like Haskell much more than C++, and it'd be good to know how to write code in Haskell of speed equal or greater to that of C++ (at least in some typical functional-language tasks, like math and data processing).
On Sun, May 6, 2012 at 12:23 PM, Roman Cheplyaka wrote:
[...]

Roman Cheplyaka:
If you're saying that in C an explicit stack should have been used instead of recursion, then it would increase the code complexity while having non-obvious performance benefits.
This is a fragment of a bigger message I won't comment on.
But THIS is a little dubious. You may accuse anything of anything with such vacuous statements as "non-obvious performance benefits". If the stack frame allocation policy is lousy (not because of the incompetence of the compiler writers, but because of its "universalism"), then the gain may be quite visible. My favourite examples are the classical "flood fill" algorithms for filling closed regions in computer graphics, and also some models of percolation (finding paths in rather big graphs). Everything depends on the language used, but I have seen acceleration by a factor of 5 after replacing the recursion with explicit stacks + loops. Jerzy Karczmarczuk
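(To make the technique concrete, here is a minimal Haskell sketch of flood fill driven by an explicit stack - a work list - instead of recursion; the names and the Set-based representation are illustrative assumptions, not Jerzy's code:)

    import qualified Data.Set as Set

    -- Flood fill with an explicit stack (work list) instead of recursion.
    -- 'inside' tests whether a pixel belongs to the region being filled.
    floodFill :: (Int, Int) -> ((Int, Int) -> Bool) -> Set.Set (Int, Int)
    floodFill start inside = go [start] Set.empty
      where
        go []     filled = filled
        go (p:ps) filled
          | p `Set.member` filled || not (inside p) = go ps filled
          | otherwise = go (neighbours p ++ ps) (Set.insert p filled)
        neighbours (x, y) = [(x+1, y), (x-1, y), (x, y+1), (x, y-1)]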

Roman Cheplyaka
isn't that particular Haskell code outperforming C (22 seconds vs. 33) just because the author uses recursion in C? I surely love Haskell, and the way its code is easily parallelized, but that example seems not fair.
I think the point was to take two programs of similar code complexity (or, rather, simplicity) implementing the same algorithm (modulo parallelism).
So I'm not sure what exactly you're objecting to.
I think while this is a valid argument it is not very convincing. You have to acknowledge that C and C++ can give better performance at number crunching. The gain is small enough that personally I wouldn't go through the trouble of writing my algorithms in C/C++, but simple-minded people often have a desire to get the best performance possible, in which case you really want to use C, C++, Fortran or whatever high level assembler language you like.

In other words: Writing Haskell in C is just as wrong as writing C in Haskell. A comparison of algorithm /implementation styles/ does not have much semantic content. It just shows that Haskell rocks at high level optimization while C rocks at low level optimization. By the same experiment you can see that Haskell is bad at low level optimization while C is bad at high level optimization.

In order to compare code performance you have to implement the same algorithm in the idiomatic style in both languages. This experiment would show that Haskell, even though it gets close, is still slower than C. It would however show that interpreted Python and Ruby are considerably slower than both.

Greets,
Ertugrul

-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

through the trouble of writing my algorithms in C/C++, but simple-minded people often have a desire to get the best performance possible, in which case you really want to use C, C++, Fortran or whatever high level assembler language you like.
I think this is a bit of an unfair accusation ("simple-minded"). Performance is only relevant to certain domains, sure. But program performance is an entire *industry*. And I'd argue it's of massive importance to the world at large. Intel has an army of "AE"s (application engineers) that go out to other companies to make applications perform better. There are plenty of compute-bound industries -- i.e. movie companies are limited by what kind of frame they can render in ~24 hrs; sequencing centers are limited by certain very slow bioinformatics algorithms; you can look almost anywhere you like for examples.

As a community I think we have to face the fact that writing the hot inner loop of your application as idiomatic Haskell is not [yet] going to give you C/Fortran performance off the bat. Though in some cases there's not really anything stopping us but more backend/codegen work (I'm thinking of arithmetically intensive loops with scalars only). For example, the following Mandel kernel is in many ways the *same* as the C version:

https://github.com/simonmar/monad-par/blob/662fa05b2839c8a0a6473dc490ead8dd5...

We have the types; we've got strictness (for this loop); but the C version was 6X faster when I tested it.

-Ryan

Ryan Newton:
As a community I think we have to face the fact that writing the hot inner loop of your application as idiomatic Haskell is not [yet] going to give you C/Fortran performance off the bat. Though in some cases there's not really anything stopping us but more backend/codegen work (I'm thinking of arithmetically intensive loops with scalars only). For example, the following Mandel kernel is in many ways the *same* as the C version:
https://github.com/simonmar/monad-par/blob/662fa05b2839c8a0a6473dc490ead8dd5...
We have the types; we've got strictness (for this loop); but the C version was 6X faster when I tested it.
Did you use the LLVM backend? I am asking because I have seen dramatic differences between the code generators in a similar example. Have a look at https://wiki.cse.unsw.edu.au/cs3141/12s1/Parallelism%20notes

With GHC's native code generator, the C version is much faster than the Haskell version (by a factor of 2 or 3 IIRC), but using GHC's LLVM backend, the Haskell version is even a few percent faster than the C version on my machine. (This is on OS X using llvm-gcc to compile the C code - that is, the code generator for the C and Haskell version goes via LLVM.)

I think that is an interesting example, because it shows (a) just how much of a difference a good code generator can make and (b) that using the same code generator, a Haskell compiler has no inherent disadvantage compared to a C compiler. (Nevertheless, especially for small examples, it is much easier to write a slow Haskell program than a slow C program.)

Manuel
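(For anyone reproducing such comparisons: the backend is selected per compilation with a GHC flag; the module name below is an assumption:)

    ghc -O2 Mandel.hs          # native code generator
    ghc -O2 -fllvm Mandel.hs   # LLVM backend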

Ryan Newton
I think this is a bit of an unfair accusation ("simple-minded"). Performance is only relevant to certain domains, sure. But program performance is an entire *industry*. And I'd argue it's of massive importance to the world at large. Intel has an army of "AE"s (application engineers) that go out to other companies to make applications perform better. There are plenty of compute bound industries -- i.e. movie companies are limited by what kind of frame they can render in ~24 hrs; sequencing centers are limited by certain very slow bioinformatics algorithms; you can look almost anywhere you like for examples.
I'm sorry for the rough words. I didn't mean to overgeneralize. Of course in certain domains a reasonable performance is desirable, but the first question you should ask is whether you want high level or low level performance. Haskell performs amazingly at the high level and suboptimally at the low level.

My point is: If you need C-like performance at a certain spot there is really no excuse for writing the entire application in C. Haskell has a working, powerful enough FFI. Also idiomatic Haskell code nowadays performs close to C. If your code doesn't, chances are that it's not even idiomatic. Sorting a list is easy and beautiful in code. But it's far from idiomatic. Using ST with destructive update is also not idiomatic.

One of my performance masterpieces is the "instinct" AI library (see Hackage). It uses only immutable vectors and performs very heavy Double calculations, yet performs better than the same code with mutable arrays did. With a few years of Haskell experience in my backpack I know how to utilize laziness to get amazing performance for code that most people would feel must be written with a destructively updating loop. And the resulting code is just beautiful to read and watch at work, a great demonstration of the wonders the GHC developers have created.
As a community I think we have to face the fact that writing the hot inner loop of your application as idiomatic Haskell is not [yet] going to give you C/Fortran performance off the bat. Though in some cases there's not really anything stopping us but more backend/codegen work (I'm thinking of arithmetically intensive loops with scalars only). For example, the following Mandel kernel is in many ways the *same* as the C version:
https://github.com/simonmar/monad-par/blob/662fa05b2839c8a0a6473dc490ead8dd5...
We have the types; we've got strictness (for this loop); but the C version was 6X faster when I tested it.
Well, if it's "in many ways the same as C", then again it's probably not
idiomatic Haskell. I don't know what a Mandel kernel is, so I can't
really tell you how I would write the code without further study.
However, I'm doing a lot of number crunching and my code usually gets
really close to C, and especially for Integer-heavy calculations
actually outperforms C. I have done the comparison. Using GMP with all
the features it provides in the mpz_* family of functions is in fact
slower than the same in Haskell (GHC 7.4.1 on Intel i5 64 bits here),
and from my C times I have experience using GMP properly including smart
allocation, etc.
If you want high performance Haskell, writing C in Haskell is /not/ the
solution. It /will/ result in suboptimal code, likely considerably
slower than the same code in C.
Greets,
Ertugrul
--
Key-ID: E5DD8D11 "Ertugrul Soeylemez

On Sat, May 12, 2012 at 12:41 AM, Gregg Lebovitz
I would find it useful to pull all this information together into a single document that discusses all the performance issues in one place and shares real-life experience in dealing with each issue. I see this as a best-practices paper rather than a research document.
Does such a document exist? If not, I am willing try and start one.

Chris,

Thank you for indulging me. I will eventually get used to the idea that there is a wiki page for almost every topic :-)

Gregg

On 5/12/2012 1:02 AM, Chris Wong wrote:
On Sat, May 12, 2012 at 12:41 AM, Gregg Lebovitz
wrote: I would find it useful to pull all this information together into a single document that discusses all the performance issues in one place and shares real-life experience in dealing with each issue. I see this as a best-practices paper rather than a research document.
Does such a document exist? If not, I am willing to try and start one.
http://www.haskell.org/haskellwiki/Performance
;)

Yet this resource seems a little outdated:
http://www.haskell.org/haskellwiki/Performance/Strings does not speak about Text
http://www.haskell.org/haskellwiki/Performance/Arrays does not speak about Vector
And what's the status of the Streams IO approach detailed in http://www.haskell.org/haskellwiki/Performance/IO ? Has it evolved into enumerators/conduits, or was that an unrelated approach?
2012/5/12 Chris Wong wrote:
[...]

Gregg Lebovitz
Ryan writes:
With a few years of Haskell experience in my backpack I know how to utilize laziness to get amazing performance for code that most people would feel must be written with destructively updating loop.
That was me actually. Just a minor side note that some people may consider a nitpick, but I'd like to remain the author of my own statements. Thanks.

Greets,
Ertugrul

-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

On 05/11/2012 07:53 AM, Ertugrul Söylemez wrote:
My point is: If you need C-like performance at a certain spot there is really no excuse for writing the entire application in C. Haskell has a working, powerful enough FFI. Also idiomatic Haskell code nowadays performs close to C. If your code doesn't, chances are that it's not even idiomatic. Sorting a list is easy and beautiful in code. But it's far from idiomatic. Using ST with destructive update is also not idiomatic. One of my performance masterpieces is the "instinct" AI library (see Hackage). It uses only immutable vectors and performs very heavy Double calculations, yet performs better than the same code with mutable arrays did. With a few years of Haskell experience in my backpack I know how to utilize laziness to get amazing performance for code that most people would feel must be written with a destructively updating loop. And the resulting code is just beautiful to read and watch at work, a great demonstration of the wonders the GHC developers have created.

Hello Ertugrul,
Out of curiosity, did you compare the performance of Instinct with implementations in languages associated with numerical computations, like Octave?

Dmitry Vyal
[...]
Out of curiosity, did you compare the performance of Instinct with implementations in languages associated with numerical computations, like Octave?
No, sorry.
Greets,
Ertugrul
--
Key-ID: E5DD8D11 "Ertugrul Soeylemez

Well, if it's "in many ways the same as C", then again it's probably not idiomatic Haskell.
It's just a recursive equation for mandelbrot fractals. I should have been precise, I didn't mean that the source is literally the *same* as the C source (i.e. there's no for loop, no mutable variables), rather that it presents the compiler backend with the same advantages as the C backend would have with the equivalent loop. That is, there's no control flow obfuscation (dictionaries, calls to unknown targets), no problems with laziness, and the data representations are completely known.

    mandel :: Int -> Complex Double -> Int
    mandel max_depth c = loop 0 0
      where
        loop i !z
          | i == max_depth     = i
          | magnitude z >= 2.0 = i
          | otherwise          = loop (i+1) (z*z + c)

It's also a loop that is readily recognized as a loop. Now, to my knowledge, GHC doesn't explicitly recognize loops in any stage of the compiler (so as to perform loop optimizations), something which other functional compilers sometimes do.

But, anyway, it turns out that my example above *is easily transformed from a bad GHC performance story into a good one*. If you'll bear with me, I'll show how below.

First, Manuel makes a good point about the LLVM backend. My "6X" anecdote was from a while ago and I didn't use llvm [1]. I redid it just now with 7.4.1+LLVM, results below. (The below table should read correctly in fixed width font, but you can also see the data in the spreadsheet here: https://docs.google.com/spreadsheet/ccc?key=0AvzAHqQmHo87dHU0T0lCb1I4MFJmM2s...)

                          Time (ms)   Compiled file size   Compile+Runtime (ms)
    GHC 7.4.1 O0          2444        1241K
    GHC 7.4.1 O2          925         1132K                1561
    GHC 7.4.1 O2 llvm     931         1133K
    GHC 7.0.4 O2 via-C    684         974K

So LLVM didn't help [1]. And in fact the deprecated via-C backend did the best! Compare with GCC:

    G++ O0                300         9K                   531
    G++ O3                110         7K                   347
    G++ O3 recursive      116         9K

Uh oh, the "6X" gap I mentioned is now closer to 9X. And, in fact, performing a mini "language shootout" on the above code reveals that GHC is doing worse than not only OCaml, but Chez Scheme, in spite of dynamic type checks and a necessarily boxed representation of complex numbers:

    Chez Scheme 8.4       284         2.7K (notStandalone) 372
    OCaml                 166         180K                 301

At least Python does worse!

    Python 2.6            1973        NA                   1973

*So here's the catch:* If you check the Core and STG, GHC 7.4 is actually compiling the above loop very well. This microbenchmark turns into just a "magnitude" microbenchmark. The implementation of Data.Complex uses an EXTREMELY expensive method to avoid overflow (https://github.com/ghc/packages-base/blob/master/Data/Complex.hs#L115) [2].

Since OCaml did well above, I took a look at their standard library's implementation, on line 51 here: http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/stdlib/complex.ml?revision=11156&view=markup. They use a nice little math trick (the extra division) that is also mentioned on Wikipedia. By implementing the same trick in Haskell, replacing the standard "magnitude" function (https://github.com/rrnewton/MandelMicrobench/blob/97c3275ad94cbce57a68881733...), we get the following results:

    GHC 7.4.1 no overflow avoidance            39          1127K     674
    GHC 7.4.1 OCaml-style overflow avoidance   74          1127K

Wow, now not only is the Haskell version faster than OCaml/Scheme, *but it is 48% faster than C++*, which is appropriate to the title of this email! Of course, this probably means that C++'s abs is also doing something overly expensive for overflow avoidance (or that their representation of complex numbers is not fully unpacked and stored in registers). I haven't tracked it down yet. But in any case, I'll be submitting a library patch.
*The moral, I think, is that community members could do a great deal to help "Haskell" performance by simply microbenchmarking and optimizing library routines in base!*

Cheers,
-Ryan

P.S. You can check out the above benchmarks from here: https://github.com/rrnewton/MandelMicrobench

[1] P.P.S. Most concerning to me about Haskell/C++ comparisons are David Peixotto's findings that LLVM optimizations are not very effective on Haskell-generated LLVM compared with typical clang-generated LLVM.

[2] P.P.P.S. It turns out there was already a ticket (http://hackage.haskell.org/trac/ghc/ticket/2450) regarding magnitude's performance. But it still has bad performance even after a small refactoring was performed.
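(For readers who want the flavor of the trick without chasing links: the idea is to scale by the larger absolute component before squaring, as OCaml's stdlib does. A minimal sketch - magnitude' is a hypothetical stand-in, not the actual patched library code:)

    -- Overflow-avoiding magnitude via scaling: factor out the larger
    -- absolute component so the squared ratio stays at most 1.
    magnitude' :: Double -> Double -> Double
    magnitude' re im
      | r == 0    = i
      | i == 0    = r
      | r >= i    = let q = i / r in r * sqrt (1 + q * q)
      | otherwise = let q = r / i in i * sqrt (1 + q * q)
      where
        r = abs re
        i = abs im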

Ryan Newton:
But, anyway, it turns out that my example above is easily transformed from a bad GHC performance story into a good one. If you'll bear with me, I'll show how below. First, Manuel makes a good point about the LLVM backend. My "6X" anecdote was from a while ago and I didn't use llvm [1]. I redid it just now with 7.4.1+LLVM, results below. (The below table should read correctly in fixed width font, but you can also see the data in the spreadsheet here.)
                          Time (ms)   Compiled file size   Compile+Runtime (ms)
    GHC 7.4.1 O0          2444        1241K
    GHC 7.4.1 O2          925         1132K                1561
    GHC 7.4.1 O2 llvm     931         1133K
    GHC 7.0.4 O2 via-C    684         974K
So LLVM didn't help [1]. And in fact the deprecated via-C backend did the best!
I would classify that as a bug.
[1] P.P.S. Most concerning to me about Haskell/C++ comparisons are David Peixotto's findings that LLVM optimizations are not very effective on Haskell-generated LLVM compared with typical clang-generated LLVM.
There is some work underway to improve the situation, but I am sure more could be done. Manuel

On 5/10/12 8:44 PM, Ryan Newton wrote:
through the trouble of writing my algorithms in C/C++, but simple-minded people often have a desire to get the best performance possible, in which case you really want to use C, C++, Fortran or whatever high level assembler language you like.
I think this is a bit of an unfair accusation ("simple-minded").
Well, yes and no.

On the one hand, characterizing those who desire the best performance possible as "simple-minded" is, at best, a gross over-generalization. Like you, I work in a field where optimization is king (e.g., in machine translation, program runtimes are measured in days).

But on the other hand, there are quite a lot of people who focus excessively on "optimization" when the actual differences for the code they're writing are either negligible (e.g., because of I/O bottlenecks) or uninteresting (e.g., a 2x slowdown is a nuisance rather than a crisis). For those who focus on optimization when the benefits are negligible or uninteresting, I think it's fair to characterize that focus as "simple-minded". This focus seems especially common when people start talking about "which language is faster"--- as opposed to, say, which library is faster, or which implementation of a given algorithm is faster. In some cases the question of language speed is legitimate, but IME it's far more often used to prop up prejudices about which language is "better"--- i.e., which group of human beings (defined by their choice of programming language) is "better".

I agree that, as a community, we need to come to a realistic understanding of the performance characteristics of Haskell compared to C etc. I don't like prejudice and propaganda for Haskell any more than I like prejudice and propaganda for C etc. In some cases it's true that Haskell out-performs C (or C++, or Java); but in many cases it does not, and we need to acknowledge that. The real problem, I feel, is that it's not always clear which category any given program falls into. In some cases the "idiomatic" Haskell is the fast one, in some cases it's the slow one; but in all cases, what is meant by "idiomatic Haskell" is a moving target with poorly specified meaning.

The advent of ByteStrings was a major coup in figuring out exactly where and why Haskell is slow and how to fix it. Iteratees are another one, though the details are still being worked out. However, things like strictness annotations on (non-accumulator) function arguments, the choice whether to use ST or not, the choice whether to CPS-transform or not, etc, are still very much a black art. Even with a thorough understanding of the STG-machine and the GHC compilation pipeline, it's not always clear whether they will help or hurt. ST in particular is especially finicky, to the point where I wonder if it's ever a win (outside of in-place updates on arrays).

So while I think most language performance comparisons are dubious to begin with, I don't think the comparison of Haskell to C++ (or C, or Java,...) can even be construed as something with a meaningful answer. Haskell is just too different, and its in-the-large cost model is too poorly understood. I'm not even aware of any helpful generalizations over comparing two specific programs. Even when restricting to things like parsing or compute-intensive numeric code, the comparison comes back with mixed answers.

-- Live well, ~wren

On the one hand, characterizing those who desire the best performance possible as "simple-minded" is, at best, a gross over-generalization. Like you, I work in a field where optimization is king (e.g., in machine translation, program runtimes are measured in days).

You misread the logical implication. Ertugrul said: "simple-minded people often have a desire to get the best performance possible"

Which means:
A is simple-minded ==> A will (most probably) want to get the best perf.
You interpreted it as:
A wants to get the best perf. ==> A is simple-minded
I agree with you; the second implication is wrong, but that's not what was meant.
(Moral: if people used more logical operators and fewer words, we would understand each other better.)
2012/5/16 wren ng thornton wrote:
[...]

On 5/16/12 7:43 AM, Yves Parès wrote:
On the one hand, characterizing those who desire the best performance possible as "simple-minded" is, at best, a gross over-generalization. Like you, I work in a field where optimization is king (e.g., in machine translation, program runtimes are measured in days).
You misread the logical implication, Ertugrul said:
simple-minded people often have a desire to get the best performance possible
For the record, I was replying to Ryan's response to Ertugrul, rather than to Ertugrul directly. I make no claims about what Ertugrul said or the (in)validity thereof.

However, while the "logical" interpretation of Ertugrul's words may be that simple-mindedness implies performance-desire, that interpretation is not the only one available to the standard interpretation of his words, nor IMO the dominant interpretation. It is equally valid to interpret them as saying that the people under discussion are simpletons, and that those people desire the best performance possible. (I.e., an attributive rather than restrictive reading of the adjective.) This latter interpretation is perfectly valid ---as the semantics of the utterance---, but is pejorative of the people under discussion; and that pejoration is what Ryan was (fairly) calling Ertugrul out on.

While I think it was fair for Ryan to call Ertugrul out on the matter, I also thought the subject warranted further discussion, since the pejorative claim comes from somewhere, and so dismissing it entirely fails to address the underlying issue. Not that I think Ryan was dismissing it outright, just that it would be easy to interpret his words as doing so.

-- Live well, ~wren

wren ng thornton
However, while the "logical" interpretation of Ertugrul's words may be that simple-mindedness implies performance-desire, that interpretation is not the only one available to the standard interpretation of his words, nor IMO the dominant interpretation. It is equally valid to interpret them as saying that the people under discussion are simpletons, and that those people desire the best performance possible. (I.e., an attributive rather than restrictive reading of the adjective.) This latter interpretation is perfectly valid ---as the semantics of the utterance---, but is pejorative of the people under discussion; and that pejoration is what Ryan was (fairly) calling Ertugrul out on.
Well, the meaning was the logical one: simple-mindedness implies a desire for the maximum performance possible. Even that is an overgeneralization. People can be simple-minded in many ways. I hoped that my point would come across. Ryan got it right that it was indeed an accusation. I did not specify to what group it was directed, which was probably the reason for the confusion.

The unconditional desire for the maximum possible object code performance is usually very stupid, not to mention impossible to reach with any high level language and any multi-tasking operating system. To get the maximum possible performance you must write a ring 0 application in assembly language and boot directly into it.

Haskell delivers reasonable performance for almost all non-embedded applications, and for the extreme edge cases one can still switch to C or assembly using the FFI. Haskell's average penalty compared to C is no reason to write the entire application in C. That was my main point.

Greets,
Ertugrul

-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

The unconditional desire for maximum possible object code performance is usually very stupid, not to mention impossible to reach with any high level language and any multi-tasking operating system.
Definitely. I don't know if we have a catchy term for "kneejerk optimization" or if it falls under the broader umbrella of "premature optimization" [including misdirected or unneeded optimization]. I do think we have the opposite problem, however, in much Haskell code -- people are using the clean, obviously correct, but inefficient code even in standard library functions that really should be optimized like crazy!
Haskell's average penalty compared to C is no reason to write the entire application in C.
Yes, this seems to be a separate disease. Not just using low-level langs, per se, but using them for *everything*. I have worked at places in industry where teams automatically use C++ for everything. For example, they use it for building all complete GUI applications, which surprises me a little bit. I would have thought in recent years that almost everyone was using *something* else (Java, Python, whatever) to do much of the performance-non-critical portions of their application logic. -Ryan

Ryan Newton
I do think we have the opposite problem, however, in much Haskell code -- people are using the clean, obviously correct, but inefficient code even in standard library functions that really should be optimized like crazy!
Not necessarily. For example the 'nub' function from Data.List could be
much faster. Unfortunately this would also change its type. O(n²)
complexity is really the best you can get with the Eq constraint. You
have to change to Ord for better performance.
In other words: Some optimizations change the semantics, and semantics
is taken very seriously in Haskell, for which I'm grateful.
Greets,
Ertugrul
--
Key-ID: E5DD8D11 "Ertugrul Soeylemez
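(A minimal sketch of the Ord-based variant Ertugrul describes - nubOrd here is an illustrative name, not the actual library patch:)

    import qualified Data.Set as Set

    -- O(n log n) duplicate removal, preserving first occurrences;
    -- needs Ord rather than nub's Eq, hence the changed type.
    nubOrd :: Ord a => [a] -> [a]
    nubOrd = go Set.empty
      where
        go _ [] = []
        go seen (x:xs)
          | x `Set.member` seen = go seen xs
          | otherwise           = x : go (Set.insert x seen) xs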

Not necessarily. For example the 'nub' function from Data.List could be much faster. Unfortunately this would also change its type. O(n²) complexity is really the best you can get with the Eq constraint.
Why not, in that kind of case, provide a second function (named differently) together with the original function, and specify their differences (e.g. wrt performance) in the doc?
It seems like a pretty quick and honest trade-off to me.
2012/5/21 Ertugrul Söylemez wrote:
[...]

On Mon, May 21, 2012 at 7:53 AM, Yves Parès
Not necessarily. For example the 'nub' function from Data.List could be much faster. Unfortunately this would also change its type. O(n²) complexity is really the best you can get with the Eq constraint.
Why not, in that kind of case, provide a second function (named differently) together with the original function, and specify their differences (e.g. wrt performance) in the doc? It seems like a pretty quick and honest trade-off to me.
WRT nub, Bart Massey did exactly this in his "nubOrd" proposal. He obtained consensus, then failed to finish the ticket [1]. If this particular case is of interest to you or anyone else, then I suggest you take the patches, re-propose, and see it finished. If you are interested in this general category of issue, I think this is a case study in how costly even our seemingly lightweight proposals process is in terms of proposer time investment. Cheers, Thomas [1] http://hackage.haskell.org/trac/ghc/ticket/2717

I do think we have the opposite problem, however, in much Haskell code -- people are using the clean, obviously correct, but inefficient code even in standard library functions that really should be optimized like crazy!
And even before optimizing "like crazy", I think the functions that are "more often" good should be emphasized: Prelude should export foldl' together with (or instead of) foldl, sum should have its sum' counterpart (or even be rewritten to use foldl'), and so on... It baffles me that functions that are said to be more efficient in the majority of cases are not the default.
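(A minimal sketch of what that counterpart would be - sum' is the hypothetical name used above, not a current Prelude export:)

    import Data.List (foldl')

    -- A strict left-fold sum: the accumulator is forced at each step,
    -- avoiding the thunk build-up of the Report's lazy foldl version.
    sum' :: Num a => [a] -> a
    sum' = foldl' (+) 0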
I have worked at places in industry where teams automatically use C++ for everything.
Did they look at you as if you were an alien (or even say you were not a serious developer) when you raised the possibility of evaluating the feasibility of using or building a more expressive language for a specific task?
2012/5/21 Ryan Newton
The unconditional desire for maximum possible object code
performance is usually very stupid, not to mention impossible to reach with any high level language and any multi-tasking operating system.
Definitely. I don't know if we have a catchy term for "kneejerk optimization" or if it falls under the broader umbrella of "premature optimization" [including misdirected or unneeded optimization].
I do think we have the opposite problem, however, in much Haskell code -- people are using the clean, obviously correct, but inefficient code even in standard library functions that really should be optimized like crazy!
Haskell's average penalty compared to C is no reason to write the entire application in C.
Yes, this seems to be a separate disease. Not just using low-level langs, per se, but using them for *everything*. I have worked at places in industry where teams automatically use C++ for everything. For example, they use it for building all complete GUI applications, which surprises me a little bit. I would have thought in recent years that almost everyone was using *something* else (Java,Python, whatever) to do much of the performance-non-critical portions of their application logic.
-Ryan

On 5/21/12 10:51 AM, Yves Parès wrote:
I do think we have the opposite problem, however, in much Haskell code -- people are using the clean, obviously correct, but inefficient code even in standard library functions that really should be optimized like crazy!
And even before optimizing "like crazy", I think the functions that are "more often" good should be emphasized: Prelude should export foldl' together with/instead of foldl, sum should have its sum' counterpart (or even be rewritten to use foldl'), and so on... It baffles me that functions that are said to be more efficient in the majority of cases are not the default.
Part of this is due to legacy and the Haskell Report. For example, the definition of sum is specified by the Report. And unfortunately there exist (rare) cases where the semantics of sum and sum' differ: namely when the type supports lazy addition (e.g., Peano numbers) and therefore can return a partial answer to an infinite summation.

That said, the GHC philosophy in most of these cases has been to: (a) use their own definitions with a CPP flag USE_REPORT_PRELUDE in case you *really* want the Report's definition[1], and (b) to "secretly" replace foldl by foldl' in cases where it can determine the function argument to be strict (and therefore that the change doesn't alter the semantics). That doesn't fix everything, and it's no justification for not having the newer Reports give more sane definitions, but it's better than nothing.

Part of the issue re fixing the Report is that there's a tension between correctness and expediency, as well as between different notions of "correctness". By and large, Haskell has managed to stay out of theoretical arguments about choosing what "correct" means when it is still controversial in mathematics. (Whence many of the complaints about the non-mathematical nature of the numeric type classes.) But which of the foldr vs foldl vs foldl' definitions of summation is the "correct" definition? Mathematically, infinite summations should be allowed, and summations of infinite quantities should be allowed. However, pragmatically, infinite summations are of a different nature than finite summations; e.g., when they converge it'd be nice to evaluate them in finite time. So on the one hand, a naive view of "correctness" suggests that we ought to allow these summations as just more of the same; but on the other hand, an alternative notion of "correctness" suggests that while infinite sums and sums of infinites should be handled, they should be handled by different means than traditional finite sums of finite values.

The foldl' definition of summation only allows finite summations of finite[2] values; the foldl definition only allows finite summations, but they can be summations of infinite values; the foldr definition allows both infinite values and infinitely many summands, though it's not especially useful for handling infinite summations[3]. We can probably exclude foldr with all fairness, and tell folks to handle infinite summations in another way. But between foldl and foldl' there's (room for) a lot more debate about which should be blessed by the Prelude. Do we want to support lazy numbers out of the box, or do we want to support performance for strict numbers out of the box? While there's been a growing strictification of Haskell for performance reasons, do recall that laziness was one of the initial reasons for defining Haskell in the first place. Hence the old Report's definition. The role of Haskell as a research engine has moved on since then, but the decision to codify that is still non-trivial.

[1] http://hackage.haskell.org/packages/archive/base/4.5.0.0/doc/html/src/Data-List.html#sum

[2] For the purpose of discussion, I'm considering the Float and Double values for infinities to be "finite", because they are identical to the finite values with respect to strictness and size of representation.

[3] It could take infinitely long to return, even for types which allow lazily returning partial values. For example,

    data Peano = Z | S Peano
    instance Num Peano where ...

    zeros = Z : zeros

    -- Has to traverse the whole list, just in case there's a (S _) in
    -- there somewhere, before it knows whether to return Z or (S _).
    main = print (foldr (+) Z zeros)

-- Live well, ~wren

I understand your concerns about modifying the current ideology; fair enough. Actually I'm myself more in favor of adding strict counterparts and exporting them jointly, to support both the mathematical roots and the performance of the operations that are done most of the time. (Which kinds of numbers do developers work with more often: regular Ints and Doubles, or lazy Peano numbers?)
2012/5/23 wren ng thornton
[snip]

Yes, this seems to be a separate disease. Not just using low-level langs, per se, but using them for *everything*. I have worked at places in industry where teams automatically use C++ for everything. For example, they use it for building all complete GUI applications, which surprises me a little bit. I would have thought in recent years that almost everyone was using *something* else (Java,Python, whatever) to do much of the performance-non-critical portions of their application logic.
I think "disease" might be overstating this somewhat :) In defence of using C++ for everything: Interfacing different languages is not exactly trivial, and in some cases, impossible. If you are writing a program or system that has significant performance requirements, you might just be better off doing the whole thing in C/C++ and living with the annoyance of doing GUIs, or whatever, in C++. The overhead of pulling in another language may simply outweigh the convenience. In addition, it's pretty much impossible for me to use Haskell to write portions (or all of) a game that would include a console version. This is largely for build system and platform support issues. Doing everything in C++ is nearly the only option here. This situation could be improved though, by making it far easier to embed Haskell within C/C++. It's not difficult by design, but there are large engineering obstacles in the way, and it's hard to see where the effort to remove these could come from. But then it would be more plausible to argue that people are missing out by not using a mix of C++ and Haskell. Cheers, Sam

If you are writing a program or system that has significant performance requirements, you might just be better off doing the whole thing in C/C++ and living with the annoyance of doing GUIs

I fail to see how the GUI part would suffer from lack of performance if the rest of the system is fine. I don't want to be too bold, but to me this case sounds a little like "MVC done wrong" if breaking the GUI apart from the rest of the software is really that impossible. Anyway, only benchmarks could tell whether, in such a case, coding the GUI in Haskell and integrating it with the rest would burn performance to the ground.

In addition, it's pretty much impossible for me to use Haskell to write portions (or all of) a game that would include a console version. This is largely for build system and platform support issues. Doing everything in C++ is nearly the only option here.

I worked in a video game company too (I referred to it when I answered Ryan about companies automatically using C++), and I agree: the first unbreakable obstacle that comes to mind to implementing some parts of the application in Haskell (or in anything other than C/C++) is that in the end it must run not only on personal computers. The main issue is that those systems are way too closed by their manufacturers. A second issue may be RAM (far scarcer than on PCs: e.g. 512 MB in total for the PS3), but, again, benchmarks would be more enlightening on that than my own opinion.
2012/5/21 Sam Martin
[snip]

On 21 May 2012 17:27, Yves Parès
I fail to see how the GUI part would suffer from lack of performance if the rest of the system is fine. I don't want to be too bold, but to me this case sounds a little like "MVC done wrong" if breaking the GUI apart from the rest of the software is really that impossible.
A few years ago one of the architects at Adobe published some slides on the software engineering aspects of PhotoShop. Unfortunately I couldn't find them on the web when I tried recently, but I believe they stated that the codebase was well over 1 million lines of C++, with the GUI (including Adobe's own frameworks) accounting for more than half of that. GUIs often *are* the program rather than a way in to use it.

From: Stephen Tetley
Sent: Monday, May 21, 2012 10:08 AM Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
[snip]
What about a more recent code base? This is from a 2008 presentation on Adobe Lightroom 2:

- "63% of the main Lightroom-team authored code is Lua"
- "16% C++"
- "12% ObjC"
- "9% C"

http://www.troygaul.com/LrExposedC4.html

Wren,

I see at least three different issues being discussed here. I think it is important to delineate them:

1) Does Haskell and its libraries need performance improvements? Probably yes. Some of the performance issues seem to be related to the way the language is implemented and others to how it is defined. Developers really do run into performance issues with Haskell and either learn to work around them or try to fix the offending implementation. The wiki performance page gives insight into some of the performance issues and how to address them.

2) Are language performance comparisons useful? They can be, if the comparison focuses on what each language does best. They aren't useful if they focus on benchmarks that aren't relevant to the target domain of each language. I think the problem with current comparisons is that they are designed to favor imperative languages.

3) Are the negative perceptions generated by popular performance comparisons important? Only if you care about the broader adoption of the language. If you don't care, then the point is moot. If you do care, then yes, the comparisons are unfair and simple-minded, but they do have an effect on the perception of the language. It is not uncommon for management to raise roadblocks to the introduction of new technologies due to flawed reports published in popular magazines.

If Haskell were a product and I were responsible for its success, then I would be using common marketing techniques to combat the negative perceptions. One is a technique called "changing the playing field", which means producing documents that reset the criteria for the evaluations and comparisons. This is not "deception", but merely making sure that potential users understand where and how to make the proper comparisons, or, more importantly, that my "champion" is well armed with documentation to counter popular but flawed comparisons.

On 5/16/2012 6:54 AM, wren ng thornton wrote:
Haskell is just too different, and its in-the-large cost model is too poorly understood. I'm not even aware of any helpful generalizations over comparing two specific programs. Even when restricting to things like parsing or compute-intensive numeric code, the comparison comes back with mixed answers.

On Wed, May 16, 2012 at 8:40 AM, Gregg Lebovitz
1) Does Haskell and its libraries need performance improvements? Probably yes. Some of the performance issues seem to be related to the way the language is implemented and others to how it is defined. Developers really do run into performance issues with Haskell and either learn to work around them or try to fix the offending implementation. The wiki performance page gives insight into some of the performance issues and how to address them.
I think there is a closely-related issue: you'll need to learn how to debug subtle performance problems *early* in your Haskell programming career. In particular:

- you will have space leaks and related performance problems in naively-written programs
- you will consequently need to learn how to use strictness annotations and other related language features, and how to use the profiler to determine where they're needed

For example, imagine you're new to the language, and as an exercise decide to write a program that counts the characters on standard input and writes the count to standard output. A naive program in, say, Python will probably use constant space and be fairly fast. A naive program in Haskell stands a good chance of having a space leak, building a long chain of thunks that isn't forced until it needs to write the final answer. On small inputs, you won't notice. The nasty surprise comes when your co-worker says "cool, let's run it on this 100 MB log file!" and your program dies a horrible death. If your friend is a sceptic, she'll arch an eyebrow and secretly think your program -- and Haskell -- are a bit lame.

The example is contrived, but it's a very common task to consume some big stream of input and produce a much smaller summary of it. I think it's fair to say that you have to be a slightly sophisticated Haskell programmer to do those kinds of things correctly, at least compared to more mainstream languages.

My experience is that this is perhaps the most important 'problem' with Haskell performance. Not so much that it's typically two or three or ten times slower than language X, but that it's easy to have a bad experience *early on*, one that seems inconsistent with the language shoot-out and other performance comparisons.

Kevin -- Kevin Charter kevin.charter@acm.org
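To make that scenario concrete, a minimal sketch of the two counters (hypothetical code, not from the thread); running each on a large input with +RTS -s shows the difference in residency:

    import Data.List (foldl')

    -- Naive count: foldl builds a chain of (+1) thunks as long as the
    -- input; nothing is forced until the final print, so memory use is
    -- proportional to the size of the log file.
    countLazy :: String -> Int
    countLazy = foldl (\n _ -> n + 1) 0

    -- Strict count: foldl' forces the accumulator at each step, so
    -- this version runs in constant space even on a 100 MB input.
    countStrict :: String -> Int
    countStrict = foldl' (\n _ -> n + 1) 0

    main :: IO ()
    main = getContents >>= print . countStrict  -- swap in countLazy to see the leak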

On Wed, May 16, 2012 at 1:17 PM, Gregg Lebovitz
Also interesting is that in all my interviews, GHC performance was never raised. No one said "I have to drop into C to solve that performance problem."
That's been my experience too. I've so far been able to get acceptable performance with GHC when I've made good use of the tools for finding space and time leaks. And it hasn't been hard, just something I had to learn to do.
Kevin -- Kevin Charter kevin.charter@acm.org

Kevin Charter
For example, imagine you're new to the language, and as an exercise decide to write a program that counts the characters on standard input and writes the count to standard output. [snip]
I, for one, can say that my initial impressions of Haskell were quite scarred by exactly this issue. Being in experimental physics, I often find myself writing data analysis code doing relatively simple manipulations on large(ish) data sets. My first attempt at tackling a problem in Haskell took nearly three days to get running with reasonable performance. I can only thank the wonderful folks in #haskell profusely for all of their help getting through that period. That being said, it was quite frustrating at the time, and I often wondered how I could tackle a reasonably large project if I couldn't solve even a simple problem with halfway decent performance. If it weren't for #haskell, I probably would have given up on Haskell right then and there, much to my deficit.

While things have gotten easier, even now, nearly a year after writing my first line of Haskell, I still occasionally find a performance issue^H^H^H^H surprise. I'm honestly not really sure what technical measures could be taken to ease this learning curve. The amazing community helps quite a bit, but I feel that this may not be a sustainable approach if Haskell gains greater acceptance. The profiler is certainly useful (and much better with GHC 7.4), but there are still cases where the performance hit incurred by the profiling instrumentation precludes this route of investigation without time-consuming guessing at how to pare down my test case. It's certainly not an easy problem.

Cheers,

- Ben

Ben,

This is precisely the kind of problem I am currently thinking about, and why I asked for pointers to documents on best practices. The current model for gaining the necessary experience to use Haskell effectively does not scale well. Here are some ideas on how to address the knowledge issue:

1) Outstanding best-practices documents that capture this knowledge and provide useful answers. Organizing this information in an online document that can be searched by keyword or index would be really helpful. The hard part will be maintaining it. As someone already pointed out, the wiki entry on performance is already dated.

2) Training presentations derived from #1 that provide lots of examples.

3) Online training videos based on #2 above.

4) Profiling tools that use the captured knowledge above, highlight problem areas, and provide guidance on correcting them.

As I mentioned in a previous email, I am willing to take on organizing #1, but I will need help from people on this list. I can start with the performance wiki, but we will still need examples of where people get into trouble and the ways around them.

Gregg

On 5/16/2012 4:00 PM, Ben Gamari wrote:
[snip]

On 5/16/12 4:37 PM, Gregg Lebovitz wrote:
1) Outstanding best-practices documents that capture this knowledge and provide useful answers. Organizing this information in an online document that can be searched by keyword or index would be really helpful. The hard part will be maintaining it. As someone already pointed out, the wiki entry on performance is already dated.
In light of that fact, and the general high pace of innovation in Haskell, I think that rather than trying to keep a wiki up to date, it would probably be better to create a series of time-stamped "formal" documents --- like how HCAR is published. The benefit of such an approach is two-fold. On the one hand, there's a date right there which shows when the contents were relevant (e.g., on which version of GHC one particular style performed better than another). On the other hand, when putting out the next issue/version the authors are forced to revisit the old content and ask whether it's still relevant or whether the tides have shifted; and if they're not willing to commit one way or another, the content can be dropped without eliding the fact (written in the previous issue) that it used to be an optimization.

Another thing I think is important is to give the actual reasons why some particular thing is an optimization, and more particularly why it is no longer an optimization once things have changed. A big issue I've run into with blog posts etc. on performance in Haskell is that there's a lot of folklorism and just-so stories which tell people "just do X", without explaining how X differs from Y, why X is better or worse than Y, or the circumstances under which you may prefer Y to X. Without providing the details about why something is an optimization/pessimization, we don't provide users with the tools to figure things out for themselves in this ever-changing world.

-- Live well, ~wren

The profiler is certainly useful (and much better with GHC 7.4)

What are the improvements in that area? (I just noticed that some GHC flags related to profiling have been renamed.)
2012/5/16 Ben Gamari
[snip]

Yves Parès
The profiler is certainly useful (and much better with GHC 7.4)
What are the improvements in that area? (I just noticed that some GHC flags related to profiling have been renamed.)
The executive summary can be found in the release notes [1]. There was also a talk I remember watching a while ago which gave a pretty nice overview; I can't recall exactly, but it might have been this [2]. Lastly, profiling now works with multiple capabilities [3].

Cheers,

- Ben

[1] http://www.haskell.org/ghc/docs/7.4.1/html/users_guide/release-7-4-1.html
[2] http://www.youtube.com/watch?v=QBFtnkb2Erg
[3] https://plus.google.com/107890464054636586545/posts/hdJAVufhKrD
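For anyone following along, a typical GHC 7.4-era profiling session looks roughly like this (a sketch; -fprof-auto is the renamed flag, formerly -auto-all):

    $ ghc -O2 -prof -fprof-auto -rtsopts Main.hs
    $ ./Main +RTS -p -RTS     # time/allocation profile written to Main.prof
    $ ./Main +RTS -hc -RTS    # heap profile data written to Main.hp
    $ hp2ps -c Main.hp        # render the heap profile as Main.ps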

"Janek S."
Can anyone provide a code in Haskell that performs better in terms of execution speed than a well-written C/C++ program? [snip]
While Haskell usually gets close to C/C++ speed for number-crunching algorithms, it seldom beats them. Of course the execution model gives it the potential to do so, but currently GHC doesn't do enough low-level optimization. It is amazing at high-level optimization, so you often get similar speed for much less code that stays close to the problem domain. Combined with the fact that it's more difficult to write incorrect programs in Haskell, there is an overall saving in the time it takes to get to the result, and that is amplified whenever you have to change the program later.

Where Haskell really gets beyond C and C++ is in concurrency. Multithreaded network servers written in Haskell are not only a lot easier to write, they also perform better than well-written server programs in C/C++. This is because Haskell's concurrency goes well with its parallel execution model, where in machine code you don't really have a notion of procedures. You have thunks, and they can not only be executed in parallel, they can also be moved between threads freely. Imagine that in C/C++ you would move a client to another thread in the middle of a function: the Haskell run-time system does this all the time to distribute computing time evenly between all CPUs. Of course this preemptive execution model can be replicated in C/C++, but that is an amazingly difficult task, so difficult that in practice you would be writing very verbose Haskell in C/C++.

Greets, Ertugrul

-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/
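As an illustration of how cheap that concurrency is to use, a minimal sketch of a forkIO-per-connection echo server (assuming the 'network' package with its current API, which differs from the 2012-era one; handleClient is made up for this example):

    import Control.Concurrent (forkIO)
    import Control.Monad (forever, unless, void)
    import qualified Data.ByteString as B
    import Network.Socket
    import Network.Socket.ByteString (recv, sendAll)

    main :: IO ()
    main = do
      sock <- socket AF_INET Stream defaultProtocol
      setSocketOption sock ReuseAddr 1
      bind sock (SockAddrInet 8080 0)   -- 0 = listen on all interfaces
      listen sock 128
      forever $ do
        (conn, _peer) <- accept sock
        -- One lightweight (green) thread per client; the RTS multiplexes
        -- these over OS threads and can migrate them between CPUs.
        void (forkIO (handleClient conn))

    handleClient :: Socket -> IO ()
    handleClient conn = loop >> close conn
      where
        loop = do
          msg <- recv conn 4096           -- empty result means EOF
          unless (B.null msg) (sendAll conn msg >> loop)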

On 5/6/12 2:40 AM, Janek S. wrote:
Can anyone provide a code in Haskell that performs better in terms of execution speed than a well-written C/C++ program? [snip]
For a quantitative comparison in a very narrow domain, check out:

Duncan Coutts, Don Stewart, Roman Leshchinskiy (2007). Rewriting Haskell Strings. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.3166

Fundamentally, neither language can be faster than the other, since it's possible to hack around in assembly code with both of them. Therefore, asking which is faster when implementing identical algorithms is not a meaningful question. In reality, most people don't spend time micro-optimizing assembly code these days; instead, people implement whatever seems good enough given the constraints on development time and execution time. Thus, the real question should be: which algorithms are made easy to implement by the language?--- either in terms of raw syntax (hence maintainability), or in terms of man-hours worked (hence development time).

The ease of parallelism in Haskell is a prime example of the fact that, while all of it is technically possible to re-implement in C with identical performance, Haskell is simply better, because the parallel code is already implemented and programs using it are statically checked for type safety, which reduces development time significantly. But for the most part, that isn't a comparison of languages; it's a comparison of libraries (for which the language of Haskell facilitates making good libraries).

Once you approach the real question and try to quantify it, you're going to run into the issues that arise any time you try to quantify human populations. What works well for one project or company is not necessarily going to generalize to a different set of developers; thus, in order to ensure ecological validity in your comparison, you'll need to compare lots of different kinds of groups. But of course, your ability to do so is biased by the social environment; e.g., imperative languages are more mainstream and therefore afford different developer communities than functional languages do. Thus, not only can you not come up with a simple metric that does justice to the question, but the answer to the question is going to evolve and change over time ---even if the underlying languages and compilers do not change--- because society changes and evolves over time.

For as often as this question is asked, and for as much as it could be interesting to actually get an answer, this isn't the sort of research that computer scientists are (generally) trained to do. Which is why it isn't generally done.

-- Live well, ~wren

Can anyone provide a code in Haskell that performs better in terms of execution speed than a well-written C/C++ program?
http://shootout.alioth.debian.org/ compares a lot of programming languages on a lot of problems. Haskell is never as fast as C++, but for some problems it comes close.

Also, I challenge anyone to improve one of the Haskell programs there. It'd be cool if we could make Haskell get a higher rank. I recently managed to improve the Fasta program, but not by much. Also, I think the benchmarks don't use the llvm flag: it says somewhere that they don't measure LLVM because they don't have time, but I think they are referring to Clang. So maybe someone should tell them to turn on the llvm flag, since it makes a lot of Haskell programs faster.

Silvio

One thing that baffles me is the comparison of Haskell vs. Java: http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=ghc&lang2=java

I would have expected consistently shorter code and better performance on average.
2012/5/8 Silvio Frischknecht
[snip]

On May 8, 2012, at 12:18 PM, Yves Parès wrote:
I would have expected consistently shorter code
It's not so surprising if you consider that some of the programs are practically imperative programs in Haskell. To give an example: http://shootout.alioth.debian.org/u32/program.php?test=fannkuchredux&lang=ghc&id=3 -- Daniël
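For readers unfamiliar with that style, a tiny hypothetical sketch of imperative-flavoured Haskell (the real shootout entries also use unboxed mutable arrays and GHC-specific pragmas):

    import Control.Monad (forM_)
    import Data.IORef

    -- Sum 1..1000000 with an explicitly mutable accumulator,
    -- the way one would write the loop in C.
    main :: IO ()
    main = do
      acc <- newIORef (0 :: Int)
      forM_ [1 .. 1000000] $ \i -> do
        x <- readIORef acc
        writeIORef acc $! (x + i)  -- $! keeps the accumulator evaluated
      print =<< readIORef acc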

On 06/05/2012 07:40, Janek S. wrote:
I've even read that thanks to the functional nature of Haskell, the compiler can reason and make guarantees about the code and use that knowledge to automatically parallelize the program without any explicit parallelizing commands in the code. [snip]
On the subject of parallelism in particular, there is no fully implicit parallelism in Haskell at the moment. When people ask about this I typically point to this paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.145.8183 which shows that it is possible to get some gains doing this, but it is hard and the gains were not great.

However, what Haskell does give you is a guarantee that you can safely call any function in parallel (with itself or with something else), as long as the function does not have an IO type. This is about as close to automatic parallelisation as you can practically get. Take any pure function from a library that someone else wrote, and you can use it with parMap or call it from multiple threads, and reasonably expect to get a speedup. Furthermore, with parMap you're guaranteed that the result will be deterministic, and there are no race conditions or deadlocks to worry about.

Cheers, Simon
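A minimal sketch of that pattern, assuming the 'parallel' package for Control.Parallel.Strategies:

    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- A deliberately expensive pure function; since it does no IO,
    -- calling it in parallel is safe and deterministic.
    nfib :: Int -> Integer
    nfib n | n < 2     = 1
           | otherwise = nfib (n - 1) + nfib (n - 2)

    main :: IO ()
    main = print (sum (parMap rdeepseq nfib [28 .. 34]))

    -- Build with:  ghc -O2 -threaded parmap.hs
    -- Run with:    ./parmap +RTS -N
    -- to spread the seven calls across the available cores.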

Hello,

I would like to add some questions to yours. I'm not sure that C++ programs actually have the same performance as C, so my reasoning may be off.

How hard is it to port a Haskell program to C? If it keeps getting harder (e.g. because of parallelization), then it's fair to choose Haskell for performance; but if it doesn't, I think it's hard to believe that such a high-level language could ever compile down to something running faster than its port to C. Many of the abstractions used for nice Haskell code are based on boxed values, so we have to take away all the boxing to make it run faster, and in doing so we are effectively porting to C.

Will hardware really go to hundreds of cores? If so, then all languages supporting simple parallelization will be used to write code, but this will not make C slower, just more difficult to use in contests.

A leading point for Haskell is also how long it takes to write correct code that performs well in the wild. So it makes sense to me to choose Haskell for performance, but not to try to compete on sequential algorithmic performance, except for those who do it to learn what is behind the abstractions that make their code slow and not compile down to perfect C.

paolino

P.S. The real performance problems are actually learning Haskell's programming abstractions and developing the intuition for when to use each.
2012/5/6 Janek S.
[snip]

How hard is it to port a Haskell program to C? If it keeps getting harder (e.g. because of parallelization), then it's fair to choose Haskell for performance; but if it doesn't, I think it's hard to believe that such a high-level language could ever compile down to something running faster than its port to C.
There is a logic programming language called Mercury; it has strict polymorphic types and strict modes, and it supports functional syntax as well as Horn-clause syntax. You could think of it as 'strict Clean with unification'. In the early days, they had a list-processing benchmark where the idiomatic Mercury version of the program was faster than the idiomatic C version, despite the fact that at the time Mercury was compiling via C. The answer was that the kind of C code being generated by Mercury was not the kind of code any sane programmer would ever have written by hand. It really does depend on how you write it.
Will hardware really go to hundreds of cores?
You can already buy a >700 core machine (I have _no_ idea how many chips are involved in that) for Java.
participants (27)

- Artur
- Austin Seipp
- Axel
- Bartosz Milewski
- Ben Gamari
- Chris Wong
- Daniël de Kok
- Dmitry Vyal
- Ertugrul Söylemez
- Gregg Lebovitz
- Isaac Gouy
- Ivan Lazar Miljenovic
- Janek S.
- Jerzy Karczmarczuk
- Kevin Charter
- Manuel M T Chakravarty
- Paolino
- Richard O'Keefe
- Roman Cheplyaka
- Ryan Newton
- Sam Martin
- Silvio Frischknecht
- Simon Marlow
- Stephen Tetley
- Thomas DuBuisson
- wren ng thornton
- Yves Parès