Re: [Haskell-cafe] Can Haskell outperform C++?

From: "Richard O'Keefe"
Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
To: Haskell Cafe

On 17/05/2012, at 2:04 PM, Gregg Lebovitz wrote:
Richard,
Thank you. This is an example of what I had in mind when I talked about changing the playing field. Do you have a slide deck for this lecture that you would be willing to share with me? I am very interested in learning more.
No slide deck required. The task is "generating alternating permutations".
Method 1: generate permutations using a backtracking search; when a permutation is generated, check if it is alternating.
Method 2: use the same backtracking search, but only allow extensions that preserve the property of being an alternating permutation.
Gregg, what makes Method 2 so much harder than Method 1 to implement in C or C++? Regards, RW

Roman, I think this question is for Richard. I haven't had a chance to play with these methods. I will try to do that today. Gregg

On 5/17/2012 6:07 AM, Roman Werpachowski wrote:
what makes Method 2 so much harder than Method 1 to implement in C or C++?

On 17/05/2012, at 10:07 PM, Roman Werpachowski wrote:
No slide deck required. The task is "generating alternating permutations".
Method 1: generate permutations using a backtracking search; when a permutation is generated, check if it is alternating.
Method 2: use the same backtracking search, but only allow extensions that preserve the property of being an alternating permutation.
Gregg,
what makes Method 2 so much harder than Method 1 to implement in C or C++?
It was me, not Gregg. There was and is no claim that method 2 is "much harder to implement in C or C++". In fact both methods *were* implemented easily in C.

The claim was and remains solely that THE TIME DIFFERENCE BETWEEN *ALGORITHMS* can be bigger than THE TIME DIFFERENCE BETWEEN *LANGUAGES* and is in this particular case. (And that's despite the fact that the C version kept the set of unused elements as a one-native-word bit mask, while the Prolog version kept it as a linked list.)

There is also a second claim, which I thought was uncontentious: the relative difficulty of tasks varies with language.
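For concreteness, here is a rough Haskell sketch of the two methods (not the C or Prolog programs discussed above; the function names and the zigzag convention a1 < a2 > a3 < ... are choices made purely for illustration). Method 1 filters complete permutations; Method 2 is the same backtracking search, but it only extends a prefix with elements that keep it alternating, so failing branches are cut off early:

    import Data.List (delete, permutations)

    -- Method 1: generate every permutation, then keep the alternating ones.
    isAlternating :: Ord a => [a] -> Bool
    isAlternating xs = and (zipWith3 ok [(0 :: Int) ..] xs (tail xs))
      where ok i a b = if even i then a < b else a > b

    method1 :: Ord a => [a] -> [[a]]
    method1 = filter isAlternating . permutations

    -- Method 2: the same backtracking search, but an element is only added
    -- when it keeps the prefix alternating, so dead branches are pruned early.
    method2 :: Ord a => [a] -> [[a]]
    method2 = go (0 :: Int) Nothing
      where
        go _ _    []     = [[]]
        go i prev unused =
          [ x : rest
          | x <- unused
          , fits i prev x
          , rest <- go (i + 1) (Just x) (delete x unused)
          ]
        fits _ Nothing  _ = True
        fits i (Just p) x = if odd i then p < x else p > x

The point of the example survives the translation: the pruned search does far less work than generate-and-test as the input grows, whichever language either is written in.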

Date: Fri, 18 May 2012 15:30:09 +1200
From: "Richard O'Keefe"
Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
To: Roman Werpachowski
Cc: haskell-cafe@haskell.org

On 17/05/2012, at 10:07 PM, Roman Werpachowski wrote:
No slide deck required. The task is "generating alternating permutations".
Method 1: generate permutations using a backtracking search; when a permutation is generated, check if it is alternating.
Method 2: use the same backtracking search, but only allow extensions that preserve the property of being an alternating permutation.
Gregg,
what makes Method 2 so much harder than Method 1 to implement in C or C++?
It was me, not Gregg.
My apologies to you and Gregg.
There was and is no claim that method 2 is "much harder to implement in C or C++". In fact both methods *were* implemented easily in C.
OK, got that now. So Haskell doesn't have a *big* advantage over C w/r to the ease of implementation of both algorithms?
The claim was and remains solely that THE TIME DIFFERENCE BETWEEN *ALGORITHMS* can be bigger than THE TIME DIFFERENCE BETWEEN *LANGUAGES* and is in this particular case.
Yes, but aren't the differences in the same ballpark, and if we want to compare *languages*, we should use identical algorithms to make the comparison fair.
(And that's despite the fact that the C version kept the set of unused elements as a one-native-word bit mask, while the Prolog version kept it as a linked list.)
There is also a second claim, which I thought was uncontentious: the relative difficulty of tasks varies with language.
It matters much less if you write the code to be run multiple times. Regards, RW

There was and is no claim that method 2 is "much harder to implement in C or C++". In fact both methods *were* implemented easily in C.
OK, got that now. So Haskell doesn't have a *big* advantage over C w/r to the ease of implementation of both algorithms?
In the case of these specific algorithms, no. In the case of other backtracking algorithms, it could have quite a big advantage.
The claim was and remains solely that THE TIME DIFFERENCE BETWEEN *ALGORITHMS* can be bigger than THE TIME DIFFERENCE BETWEEN *LANGUAGES* and is in this particular case.
Yes, but aren't the differences in the same ballpark,
(a) only to the degree that you are willing to call all language differences "in the same ballpark".
(b) no, because the speed difference between the algorithms grows without bound as the problem size increases, while the speed difference between the languages does not.

Here's another example: the UNIX 'tsort' command. I have versions in Java, Smalltalk, and C. The Java and Smalltalk versions are about the same speed, linear in the size of the graph. The C version appears to be the original UNIX code, and it is *quadratic* in the number of nodes. Result: Smalltalk running faster than C by a ratio that increases without bound as the input grows. This is actually a case where it was easier to write fast code than slow code in Smalltalk, easier to write slow code than fast code in C. (Built in hash tables, what else?)
and if we want to compare *languages*, we should use identical algorithms to make the comparison fair.
In the permutation generation example, I was talking about four programs:

                Language X    Language Y
    Method 1    Program X1    Program Y1    -- identical algorithms
    Method 2    Program X2    Program Y2    -- identical algorithms

However, "identical" isn't clearly defined. For example, is a Java program that uses HashMap running the "same algorithm" as a program in another language that uses a different dictionary implementation?
There is also a second claim, which I thought was uncontentious: the relative difficulty of tasks varies with language.
It matters much less if you write the code to be run multiple times.
You misunderstand. The issue is whether you bother to write the more efficient code *at all*. The tsort code in C I was looking at was real production code from a UNIX system that is still on the market, and in the 20 or so years since it was written, nobody had ever bothered to fix the blatant efficiency bug. In other languages, it would have been easier to write the program without the bug in the first place.
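To make that last point concrete for readers of this list, here is a hedged Haskell sketch of the "easy" tsort (the names are invented; it assumes distinct edges and does no cycle reporting). The symbol table and the in-degree bookkeeping come straight from the containers library, so the version that is easiest to write is already roughly linear in the size of the graph (up to a log factor from the Map) rather than quadratic in the number of nodes:

    import qualified Data.Map.Strict as M
    import qualified Data.Set as S

    -- Kahn's algorithm over (before, after) pairs, as in tsort(1) input.
    tsort :: [(String, String)] -> [String]
    tsort edges = go ready0 indeg
      where
        nodes  = S.toList (S.fromList [ n | (a, b) <- edges, n <- [a, b] ])
        succs  = M.fromListWith (++) [ (a, [b]) | (a, b) <- edges ]
        indeg  = foldr bump (M.fromList [ (n, 0 :: Int) | n <- nodes ]) edges
        bump (_, b) = M.adjust (+ 1) b
        ready0 = [ n | n <- nodes, M.findWithDefault 0 n indeg == 0 ]

        go []          _   = []
        go (n : ready) deg = n : go (newlyReady ++ ready) deg'
          where
            out        = M.findWithDefault [] n succs
            deg'       = foldr (M.adjust (subtract 1)) deg out
            newlyReady = [ m | m <- out, M.findWithDefault 1 m deg' == 0 ]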

----- Original Message -----
From: "ok@cs.otago.ac.nz"
Sent: Friday, May 18, 2012 9:38 AM
Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
-snip-
and if we want to compare *languages*, we should use identical algorithms to make the comparison fair.
In the permutation generation example, I was talking about four programs:

                Language X    Language Y
    Method 1    Program X1    Program Y1    -- identical algorithms
    Method 2    Program X2    Program Y2    -- identical algorithms
However, "identical" isn't clearly defined.
Moreover, being absolutely sure that the algorithms are in some sense "identical" might make comparison pointless - for example, when the same assembly is generated by gcc from a C program and gnat from an Ada program. -snip-
In the 'tsort' case, it turns out that the Java and Smalltalk versions are I/O bound with over 90% of the time spent just reading the data.
My guess is that they could be written to do better than that - but it's idiotic of me to say so without understanding the specifics, please forgive me ;-)
They have I/O libraries with very different structures, so what does "identical algorithms" mean? If you are using dictionaries/hashmaps, and the two languages have implementations that compute different hash functions for strings, is _that_ using the same implementation?
Of course, to some degree, user defined hash functions remedy that specific problem. I agree with the thrust of your comments - even programming languages (and implementations) that seem similar, are often so different (when we get down to specifics) that comparison between programs written in different languages is a matter of making the best of a bad job. But we're still going to ask - Will my program be faster if I write it in language X? - and we're still going to wish for a simpler answer than - It depends how you write it!

I've been following the topic in both threads. Very nice discussion.
On 18 May 2012 18:51, Isaac Gouy
Moreover, being absolutely sure that the algorithms are in some sense "identical" might make comparison pointless - for example, when the same assembly is generated by gcc from a C program and gnat from an Ada program.
Suppose we take the most efficient algorithm, in terms of time and space. The demonstrations given before this post indicate the language can be more important than the algorithm. There are two different takes on the situation. On one hand we have time and space. On the other, man-hours and reliability. Both languages deal with this ratio.
But we're still going to ask - Will my program be faster if I write it in language X? - and we're still going to wish for a simpler answer than - It depends how you write it!
Change that question to: will my language be faster if I write program X?

Good C++ implementations are easy to find. However, everything is exposed. Debugging is taken for granted. Unless everyone in a team follows some strict practice, productivity may be at risk. It is more difficult to provide reliability (i.e. programs as proofs [1]). C++ is everywhere - that only increases the probability of finding more optimized algorithms in C++ libraries.

The Haskell implementation enforces an interface to the programmer (the low level time-space management, i.e. GC, IO monads, parallelism). Improvements in the implementation increase the efficiency of the program (i.e. optimization in standard libraries [2]). The risk of damaging productivity is low.

Given that you could derive one efficient algorithm down to assembly, the fastest language is a matter of implementation. Then, on equal grounds, it's a rather easy choice.

[1] http://homepages.inf.ed.ac.uk/wadler/papers/frege/frege.pdf
[2] https://groups.google.com/d/msg/haskell-cafe/KIxGd4babKE/J7LV3EGtutsJ

On 19/05/2012, at 5:51 AM, Isaac Gouy wrote:
In the 'tsort' case, it turns out that the Java and Smalltalk versions are I/O bound with over 90% of the time spent just reading the data.
My guess is that they could be written to do better than that - but it's idiotic of me to say so without understanding the specifics, please forgive me ;-)
Actually, I/O bound is *good*. Here's the times from the C version, which has been hacked hard in order to be as fast as I could make it.

        N     total         input       process     output
     1000   0.004618  =  0.004107  +  0.000438  +  0.000073
     2000   0.014467  =  0.012722  +  0.001609  +  0.000136
     5000   0.059810  =  0.051308  +  0.008199  +  0.000303
    10000   0.204111  =  0.150638  +  0.052800  +  0.000673
    20000   0.717362  =  0.518343  +  0.197655  +  0.001364
    50000   3.517340  =  2.628550  +  0.885456  +  0.003331

N here is the number of nodes in the graph; the number of edges is floor(N**1.5 * 0.75). Input is the read-word + look up in hash table time. Process is the compute-the-transitive-closure time. Output is the time to write the node names in order. All node names had the form x## with ## being 1..10000. This is with my own low level code; using scanf(%...s) pushed the input time up by 40%. The Mac OS X version of the tsort command took 31.65 CPU seconds for N=10,000, of which 28.74 CPU seconds was 'system'.

Like I said, the languages I used in this test
... have I/O libraries with very different structures, so what does "identical algorithms" mean? If you are using dictionaries/hashmaps, and the two languages have implementations that compute different hash functions for strings, is _that_ using the same implementation?
Of course, to some degree, user defined hash functions remedy that specific problem.
While creating other, and perhaps worse, ones. For example, in the Smalltalk code, if you use a Dictionary of Strings, you're getting Robert Jenkin's hash function in optimised C. If you supply your own, you're getting a very probably worse hash function and it's going to run rather slower. And above all, the stuff you are benchmarking is no longer code that people are actually likely to write.
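The same trade-off exists on the Haskell side, for what it's worth. With unordered-containers plus hashable (an assumption of this aside, not something from the measurements above), a HashMap key gets the library's tuned hash by default, and supplying your own is easy to do and easy to do badly:

    import Data.Hashable (Hashable (..))
    import qualified Data.HashMap.Strict as HM

    -- Wrapping keys in a newtype lets you supply a home-made hash;
    -- this one is almost certainly worse than the library default.
    newtype MyKey = MyKey String
      deriving (Eq, Show)

    instance Hashable MyKey where
      hashWithSalt salt (MyKey s) = foldl (\h c -> 31 * h + fromEnum c) salt s

    -- Counting occurrences works the same way with either kind of key;
    -- only the hash function, and hence the constant factors, change.
    countOccurrences :: (Eq k, Hashable k) => [k] -> HM.HashMap k Int
    countOccurrences ks = HM.fromListWith (+) [ (k, 1) | k <- ks ]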
But we're still going to ask - Will my program be faster if I write it in language X? - and we're still going to wish for a simpler answer than - It depends how you write it!
Here's another little example. I had a use for the Singular Value Decomposition in a Java program. Should I use pure Java or native C?

Pure Java taken straight off the CD-ROM that came with a large book of numerical algorithms in Java: T seconds.

After noticing that the code was just warmed over Fortran, and was varying the leftmost subscript fastest (which is good for Fortran, bad for most other languages) and swapping all the matrix dimensions: T/2 seconds.

After rewriting in C: T/4 seconds.

After rewriting the C code to call the appropriate BLAS and thereby using tuned code for the hardware, T/7 seconds.

Since this was going to take hundreds of seconds per run, the answer was easy. A simple little thing like using a[i][j] vs a[j][i] made a factor of 2 difference to the overall speed. "It depends" is the second best answer we can get. The best answer is one that says _what_ it depends on.
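The a[i][j] vs a[j][i] effect is easy to reproduce from Haskell as well. A sketch using the vector package (the names here are invented for illustration): with the matrix stored row-major in one flat unboxed vector, the first traversal walks memory sequentially while the second strides across it, and on large matrices that stride typically shows up as a constant-factor slowdown from cache misses.

    import qualified Data.Vector.Unboxed as U

    -- Sum a rows-by-cols matrix stored row-major in a flat vector.
    sumRowMajor :: Int -> Int -> U.Vector Double -> Double
    sumRowMajor rows cols a =
      sum [ a U.! (i * cols + j) | i <- [0 .. rows - 1], j <- [0 .. cols - 1] ]

    -- Same elements, visited column by column: a strided, cache-unfriendly walk.
    sumColMajor :: Int -> Int -> U.Vector Double -> Double
    sumColMajor rows cols a =
      sum [ a U.! (i * cols + j) | j <- [0 .. cols - 1], i <- [0 .. rows - 1] ]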

From: Richard O'Keefe
Sent: Sunday, May 20, 2012 3:41 PM
On 19/05/2012, at 5:51 AM, Isaac Gouy wrote:
In the 'tsort' case, it turns out that the Java and Smalltalk versions are I/O bound with over 90% of the time spent just reading the data.
My guess is that they could be written to do better than that - but it's idiotic of me to say so without understanding the specifics, please forgive me ;-)
Actually, I/O bound is *good*.
Why would that be good or bad? I suppose you're just suggesting that, in this case, the performance characteristics of the Java and Smalltalk programs are similar to the C program; but, for whatever reason, you're leaving us guessing about the timings for those other programs. -snip-
Of course, to some degree, user defined hash functions remedy that specific problem.
While creating other, and perhaps worse, ones.
For example, in the Smalltalk code, if you use a Dictionary of Strings, you're getting Robert Jenkin's hash function in optimised C. If you supply your own, you're getting a very probably worse hash function and it's going to run rather slower. And above all, the stuff you are benchmarking is no longer code that people are actually likely to write.
I think you're being a touch quarrelsome :-)

Do *all* Smalltalk language implementations provide that same hash function in optimised C?

Have Smalltalk language implementations *always* provided that same hash function in optimised C?

How might that hash function be used when the (not necessarily Smalltalk) language implementation does not provide it?

Have you researched what code people are actually likely to write, or are you just speculating? ;-)

-snip-
But we're still going to ask - Will my program be faster if I write it in language X? - and we're still going to wish for a simpler answer than - It depends how you write it!
Here's another little example. I had a use for the Singular Value Decomposition in a Java program. Should I use pure Java or native C?
Pure Java taken straight off the CD-ROM that came with a large book of numerical algorithms in Java: T seconds.
After noticing that the code was just warmed over Fortran, and was varying the leftmost subscript fastest (which is good for Fortran, bad for most other languages) and swapping all the matrix dimensions: T/2 seconds.
After rewriting in C: T/4 seconds.
After rewriting the C code to call the appropriate BLAS and thereby using tuned code for the hardware, T/7 seconds.
Since this was going to take hundreds of seconds per run, the answer was easy.
Maybe the more interesting question was - Should I use a scripting language + BLAS.
"It depends" is the second best answer we can get. The best answer is one that says _what_ it depends on.
That may be the best answer to some other question - but for the stated question I think we're looking for a Yes or a No :-)

On 22/05/2012, at 4:15 AM, Isaac Gouy wrote:
Actually, I/O bound is *good*.
Why would that be good or bad?
The context here is a UNIX-style topological sorting program. Being I/O bound means that the program is limited by how fast it can read the data. If 90% of the time goes into reading the data, that means that the *algorithmic* part is fast enough. There may be very little one can do about the I/O part.
I suppose you're just suggesting that, in this case, the performance characteristics of the Java and Smalltalk programs are similar to the C program; but, for whatever reason, you're leaving us guessing about the timings for those other programs.
I didn't _think_ I'd omitted anything important. Oh well.

For 25,000 nodes and 2,989,635 edges,
    Java (first version)     4.83 seconds   -- uses ArrayList, HashMap
    Java (2nd version)       4.17 seconds   -- uses own IntList
    Java (3rd version)       4.09 seconds   -- different approach
    Smalltalk                4.40 seconds
    C                        0.92 seconds   -- my code
    Mac OS X tsort(1)      150.35 seconds   -- 11.84 user + 138.51 system

For 50,000 nodes and 8,385,254 edges,
    Java (first version)   ran out of memory after 89.54 seconds (default heap)
    Java (2nd version)      13.31 seconds   -- avoid Integer boxing!
    Java (3rd version)      15.07 seconds
    Smalltalk               14.52 seconds
    C                        2.36 seconds
    Mac OS X tsort(1)      482.32 seconds   -- 41.55 user + 440.77 system

The Mac OS X tsort(1) is presumably written in C. Symbols have been stripped, so even with Instruments I can't see where the time is going. Judging from its memory behaviour, I believe that most of the time is going in the input phase, which suggests a spectacularly inefficient symbol table, which suggests that it might be using a not-very-modified version of the Unix V7 code, which used a linear search...
Of course, to some degree, user defined hash functions remedy that specific problem.
While creating other, and perhaps worse, ones.
For example, in the Smalltalk code, if you use a Dictionary of Strings, you're getting Robert Jenkin's hash function in optimised C. If you supply your own, you're getting a very probably worse hash function and it's going to run rather slower. And above all, the stuff you are benchmarking is no longer code that people are actually likely to write.
I think you're being a touch quarrelsome :-)
That upset me. All I was saying is that surely the only *point* of talking about the performance of *languages* is to get some idea of how well programs are likely to do, and not any old specially crafted code, but the kind of code that is likely to be written for purposes other than benchmarking. So the more you bash on a particular example to get the time down, the less representative it is of the kind of code people are likely to write *for purposes other than benchmarking*.

Just today I marked a student program where their program took 15 minutes and my model answer took a touch under 4 milliseconds. I explained to them _that_ their program was spectacularly slow. They already knew _why_ it was. I also explained the one trick (lazy initialisation) that could have hugely improved the time. I also told them that they had made the right decision, to optimise *their* time, not the computer's, in their particular context.
Do *all* Smalltalk language implementations provide that same hash function in optimised C?
*That* function, no. *Some* hash function as a "primitive" (meaning optimised C), well, I don't have every Smalltalk, but the ones I do have, I've checked, and yes they do.
Have Smalltalk language implementations *always* provided that same hash function in optimised C?
Obviously not: Smalltalk-80 ran on Xerox D-machines, which didn't *have* C. "Primitives" were implemented in microcode.
Have you researched what code people are actually likely to write, or are you just speculating? ;-)
I'm in my 6th decade. I've seen a lot of code in a lot of languages.
"It depends" is the second best answer we can get. The best answer is one that says _what_ it depends on.
That may be the best answer to some other question - but for the stated question I think we're looking for a Yes or a No :-)
The subject line has the obvious and boring answer "yes, of course".

I recall one time when I wanted to analyse some data and typed up Fortran code for a suitable technique from a book. It didn't work. After struggling to debug it for a while, I implemented the whole thing from scratch in Prolog. Then I went back to the Fortran code, spotted my typing mistake, and fixed it. Result, two working programs. The Prolog program (compiled to a VM which was emulated; the compiler was not very clever) ran faster than the Fortran program (compiled to native code by a seriously optimising compiler from a major computer manufacturer). Reason? Same algorithm, different data structure. The data structure was one that was easy to express in Prolog (or Haskell!) but would have been insanely difficult in Fortran 77. This century's Fortran is of course another matter.

The point here of course is that there are data structures and algorithms that are easy to express in some languages but hard in others, so that in code written *for purposes other than benchmarking*, they just would not be _used_ in one of those languages.

From: Richard O'Keefe
Sent: Monday, May 21, 2012 6:54 PM
On 22/05/2012, at 4:15 AM, Isaac Gouy wrote:
Actually, I/O bound is *good*.
Why would that be good or bad?
The context here is a UNIX-style topological sorting program. Being I/O bound means that the program is limited by how fast it can read the data. If 90% of the time goes into reading the data, that means that the *algorithmic* part is fast enough.
There may be very little one can do about the I/O part.
Maybe you could say how the Java I/O is being done.
I didn't _think_ I'd omitted anything important. Oh well.
-snip-
For 50,000 nodes and 8,385,254 edges,
    Java (first version)   ran out of memory after 89.54 seconds (default heap)
    Java (2nd version)      13.31 seconds   -- avoid Integer boxing!
    Java (3rd version)      15.07 seconds
    Smalltalk               14.52 seconds
    C                        2.36 seconds
fwiw my expectation is that Java would not be that much slower than C, and would be considerably faster than Smalltalk. fwiw my expectation is that it should be possible to make the Java program considerably faster. Of course, what I expect and what happens are often very different ;-)
Of course, to some degree, user defined hash functions remedy that specific problem.
While creating other, and perhaps worse, ones.
For example, in the Smalltalk code, if you use a Dictionary of Strings, you're getting Robert Jenkin's hash function in optimised C. If you supply your own, you're getting a very probably worse hash function and it's going to run rather slower. And above all, the stuff you are benchmarking is no longer code that people are actually likely to write.
I think you're being a touch quarrelsome :-)
That upset me.
I'm sorry that gentle comment upset you.
All I was saying is that surely the only *point* of talking about the performance of *languages* is to get some idea of how well programs are likely to do, and not any old specially crafted code, but the kind of code that is likely to be written for purposes other than benchmarking. So the more you bash on a particular example to get the time down, the less representative it is of the kind of code people are likely to write *for purposes other than benchmarking*.
Certainly less representative of the kind of code students are likely to write for course credits, and the kind of code people are likely to write to complete some project task before they hand it off to someone else, and the kind of code people are likely to write until their program is actually put into use and suddenly they are working the weekend to make their program faster. A more positive way to think of - code written for purposes of benchmarking - is that it's like code written to address a performance hot spot.
Just today I marked a student program where their program took 15 minutes and my model answer took a touch under 4 milliseconds. I explained to them _that_ their program was spectacularly slow. They already knew _why_ it was. I also explained the one trick (lazy initialisation) that could have hugely improved the time. I also told them that they had made the right decision, to optimise *their* time, not the computer's, in their particular context.
The whole point is carried by your final assertion. Here's another anecdote - in my first week on site, I overheard a group of engineers arguing that Smalltalk should be abandoned because calculation times were incredibly slow. I peeked at the code, saw that it was littered with unnecessary numeric conversions (plus unnecessary arbitrary precision arithmetic), fixed and timed a sample, and left them with a lot of rework to do all across their library code. The "kind of code people are likely to write" is sometimes best described as bad. That can have consequences which spiral out of proportion -- an engineer writes some bad Smalltalk, an app performs so slowly the business group loses money, the manager of the business group notices and is shown that the slow app is the problem (and it really is the problem), the manager now "knows" that "Smalltalk is slow", the manager moves the business group away from Smalltalk, the manager is promoted and moves the whole organization away from Smalltalk. That's also an anecdote.
*That* function, no. *Some* hash function as a "primitive" (meaning optimised C), well, I don't have every Smalltalk, but the ones I do have, I've checked, and yes they do.
Maybe I misunderstood the thrust of your original comment - Were you trying to make a point about writing in C and calling that code from Smalltalk as a user written primitive; or were you trying to make a point about the importance of using a better hash function?
Have you researched what code people are actually likely to write, or are you just speculating? ;-)
I'm in my 6th decade. I've seen a lot of code in a lot of languages.
So just speculating.
The subject line has the obvious and boring answer "yes, of course".
I agree, because of the implicit qualification - "if we write the C++ badly enough" :-) -snip-
The point here of course is that there are data structures and algorithms that are easy to express in some languages but hard in others, so that in code written *for purposes other than benchmarking*, they just would not be _used_ in one of those languages.
Let's say that's true - until we show how much better programs using other data structures and algorithms perform those specific tasks, it's just an excuse. (And, depending on how hard it is to express those different data structures and algorithms in other languages, it might not even be a good excuse.)

On 23/05/2012, at 4:54 AM, Isaac Gouy wrote:
There may be very little one can do about the I/O part.
Maybe you could say how the Java I/O is being done.
For 50,000 nodes and 8,385,254 edges, Java (first version) ran out of memory after 89.54 seconds (default heap) Java (2nd version) 13.31 seconds -- avoid Integer boxing! Java (3rd version) 15.07 seconds Smalltalk 14.52 seconds C 2.36 seconds
fwiw my expectation is that Java would not be that much slower than C, and would be considerably faster than Smalltalk.
My own experience is that Java is anywhere between 2 times slower and 150 times slower than C. This is not for number crunching; one _would_ expect Java to be about the same as C for that because it is working at the same level as C. But string processing and text I/O using the java.io.* classes aren't brilliant.

    Reader r = new BufferedReader(new InputStreamReader(System.in));

This "layers of wrappers" approach to text I/O adds overheads of its own, but less than I'd thought. Using System.in directly takes the time down from 15.07 seconds to 11.11 seconds. The code used Character.isWhitespace(c) to test for a white space character; replacing that by c <= ' ' brought the time down further to 10.87 seconds. With both of these changes, we are moving away from recommended good practice; the faster code is the kind of code people are not supposed to write any more.

Characters are read one at a time using r.read(). There are no fast "skip characters in this set" or "take characters in this set and return them as a string" methods in the Reader or InputStream interfaces, and StreamTokenizer reads characters one at a time using .read() also, so would be slower.

As for Smalltalk, you must be thinking of free Smalltalk systems that lacked a JIT. Commercial Smalltalks have excellent JITs (many HotSpot ideas were copied from them) and there is now a pretty good one for the free systems Squeak and Pharo. These particular measurements were made using my own Smalltalk compiler which is an oddity amongst Smalltalks: a whole program compiler that compiles via C. Yes, most of the good ideas came from INRIA, although ST/X does something not entirely dissimilar.
fwiw my expectation is that it should be possible to make the Java program considerably faster.
I have used profiling to find where the time was going. Approximately 70% is spent in the one method that reads a "word":
  - a loop skipping white space characters
  - a loop reading other characters and adding them to a StringBuilder
  - [looking the string up in a HashMap and creating and entering a new Node if necessary -- this time is not part of that 70%].

Any reasonable person would expect file reading to be buffered and for the read-one-byte method to usually just fiddle a few integers and fetch an element of an array. In fact the method is native, so every character requires switching from the Java universe to the C one and back. (The Smalltalk equivalent is pretty close to fgetc().) The only way to make this Java program 'considerably faster' is to not read characters (or bytes) in the natural Java way. (The way, in fact, that java.io.StreamTokenizer does.)

And it's not INTERESTING, and it's not about LANGUAGES. There is NOTHING about the Java language that makes code like this necessarily slow. It's the LIBRARY. The java.io library was designed for flexibility, not speed. That's why there is a java.nio library.

Haskell is in pretty much the same boat. I've always liked the I/O model in the Haskell report. I've expected simplicity from it, not speed, and that's what I get. But as all the new approaches to I/O (like iteratees, which make my head hurt) make perfectly clear, the limitations of the IO module are not limitations of Haskell, or of JHC, but of a particular library.

And that's the point I was making with this example. Why does Smalltalk come out in the middle of the Java results? A balance between a language penalty (tagged integer arithmetic is a lot slower than native integer arithmetic) and a library bonus (a leaner meaner I/O design where there are wrappers if you want them but you very seldom need them). It's the great advantage of using libraries rather than syntax: libraries can be changed.
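The same "library, not language" observation can be sketched from the Haskell side (names invented; it assumes whitespace-separated node names on stdin and uses a strict Map where the Java code uses a HashMap). Both variants run the same word-interning loop; the only difference is which I/O library hands them the input:

    import qualified Data.ByteString.Char8 as B
    import Data.List (foldl')
    import qualified Data.Map.Strict as M

    -- Assign each distinct word a small integer, first come first numbered.
    internAll :: Ord k => [k] -> M.Map k Int
    internAll = foldl' step M.empty
      where step m w = M.insertWith (\_new old -> old) w (M.size m) m

    -- Variant 1: Haskell 98 String I/O straight from the Prelude.
    internFromStdin :: IO (M.Map String Int)
    internFromStdin = fmap (internAll . words) getContents

    -- Variant 2: the same loop over strict ByteStrings; usually much faster,
    -- for library reasons rather than language reasons.
    internFromStdinBS :: IO (M.Map B.ByteString Int)
    internFromStdinBS = fmap (internAll . B.words) B.getContents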
All I was saying is that surely the only *point* of talking about the performance of *languages* is to get some idea of how well programs are likely to do, and not any old specially crafted code, but the kind of code that is likely to be written for purposes other than benchmarking. So the more you bash on a particular example to get the time down, the less representative it is of the kind of code people are likely to write *for purposes other than benchmarking*.
Certainly less representative of the kind of code students
Leave the students out of it. It's less representative of the kind of code I see written by Sun or in Apache stuff.
The "kind of code people are likely to write" is sometimes best described as bad.
At last, something we can agree about!
*That* function, no. *Some* hash function as a "primitive" (meaning optimised C), well, I don't have every Smalltalk, but the ones I do have, I've checked, and yes they do.
Maybe I misunderstood the thrust of your original comment - Were you trying to make a point about writing in C and calling that code from Smalltalk as a user written primitive; or were you trying to make a point about the importance of using a better hash function?
Neither. I am making the point that many benchmarks benchmark library code rather than languages or compilers per se, and that the concept of "same algorithm" is as a result very fuzzy.
Have you researched what code people are actually likely to write, or are you just speculating? ;-)
I'm in my 6th decade. I've seen a lot of code in a lot of languages.
So just speculating.
How is "I have seen a lot of code" construed as "just speculating"? I know what code people are likely to write because I have seen the code that many people DID write. I know what code people are likely to write in Java because I have seen more Java (written by people who really should know what they are doing) than I ever wanted to see AND because I know how they are TOLD to write.
The subject line has the obvious and boring answer "yes, of course".
I agree, because of the implicit qualification - "if we write the C++ badly enough" :-)
Otherwise construed, "the way C++ is usually written". C++ *can* be extremely fast. One of my colleagues is very good at it. But one of his key rules is "never use the standard template library, including C++ strings."
The point here of course is that there are data structures and algorithms that are easy to express in some languages but hard in others, so that in code written *for purposes other than benchmarking*, they just would not be _used_ in one of those languages.
Let's say that's true - until we show how much better programs using other data structures and algorithms perform those specific tasks, it's just an excuse.
(And, depending on how hard it is to express those different data structures and algorithms in other languages, it might not even be a good excuse.)
My evidence may be anecdotal, but it is better than arguing with NO evidence. The example you are commenting on here is one where I reported an emulated Prolog program running faster than native Fortran 77. Recall that Fortran 77
  - did not include recursion
  - did not include pointers
  - did not include user defined data types
The kind of dynamic tree construction and walking that is so trivial and every day in Haskell was extremely difficult to express in Fortran 77. Not necessarily impossible. A parser for English was once written in Fortran, and I think that was before Fortran 77 even. But hard.

A programming language that supports concurrency, or backtracking, or constraint programming, or type level arithmetic, or dependent types, or ... will make some things thinkable and straightforward to express that would be neither in a language without such support.
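For readers who never met Fortran 77, the kind of "trivial and everyday" dynamic tree being referred to looks like this in Haskell (an illustrative sketch, not code from the Prolog or Fortran programs discussed):

    -- An ordered binary tree, built and walked with ordinary recursion;
    -- no pointers, no storage management, no user-visible indices.
    data Tree a = Leaf | Node (Tree a) a (Tree a)

    insert :: Ord a => a -> Tree a -> Tree a
    insert x Leaf = Node Leaf x Leaf
    insert x t@(Node l y r)
      | x < y     = Node (insert x l) y r
      | x > y     = Node l y (insert x r)
      | otherwise = t

    -- In-order walk: the elements come out sorted.
    toSortedList :: Tree a -> [a]
    toSortedList Leaf         = []
    toSortedList (Node l x r) = toSortedList l ++ x : toSortedList r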

From: Richard O'Keefe
Sent: Tuesday, May 22, 2012 7:59 PM
But string processing and text I/O using the java.io.* classes aren't brilliant.
Wait just a moment - Are you comparing text I/O for C programs that process bytes against Java programs that process double-byte unicode? -snip-
Using System.in directly takes the time down from 15.07 seconds to 11.11 seconds. -snip- With both of these changes, we are moving away from recommended good practice; the faster code is the kind of code people are not supposed to write any more.
Says who? Is that on your own authority or some other source you can point us to? -snip-
As for Smalltalk, you must be thinking of free Smalltalk systems that lacked a JIT. Commercial Smalltalks have excellent JITs (many HotSpot ideas were copied from them) ...
As for Smalltalk, I earned my crust with commercial Smalltalks for a decade.
These particular measurements were made using my own Smalltalk compiler which is an oddity amongst Smalltalks: a whole program compiler that compiles via C. Yes, most of the good ideas came from INRIA, although ST/X does something not entirely dissimilar.
Wait just a moment - you wrote "I didn't _think_ I'd omitted anything important" and now it turns out that the measurements were made using your personal Smalltalk implementation! You have got to be joking.
fwiw my expectation is that it should be possible to make the Java program considerably faster.
-snip-
Any reasonable person would expect ...
I'm happy to hear what *you* would expect. -snip-
And it's not INTERESTING, and it's not about LANGUAGES. There is NOTHING about the Java language that makes code like this necessarily slow. It's the LIBRARY. The java.io library was designed for flexibility, not speed. That's why there is a java.nio library.
Here's the gorilla in the room question - So why doesn't your program use java.nio?
And that's the point I was making with this example. Why does Smalltalk come out in the middle of the Java results? A balance between a language penalty (tagged integer arithmetic is a lot slower than native integer arithmetic) and a library bonus (a leaner meaner I/O design where there are wrappers if you want them but you very seldom need them). It's the great advantage of using libraries rather than syntax: libraries can be changed.
No, that doesn't seem to be the case - if I'm misunderstanding what you've done then please correct me, but it seems that Smalltalk comes out in the middle of the Java results because you chose to use a Java library "designed for flexibility, not speed" and you chose to use that library in a way that slows the program down. -snip-
Neither. I am making the point that many benchmarks benchmark library code rather than languages or compilers per se, and that the concept of "same algorithm" is as a result very fuzzy.
Thank you, for stating your point clearly. -snip-
How is "I have seen a lot of code" construed as "just speculating"?
You seem to be generalizing from what you recollect without any way to control for the particularities of the situations you observed, or the particularities of your recollection. You don't seem to have data - just memories. -snip-
My evidence may be anecdotal, but it is better than arguing with NO evidence.
imo It would be better to "show how much better programs using other data structures and algorithms perform those specific tasks" than brandish anecdotes from a past century.

On 24/05/2012, at 4:39 AM, Isaac Gouy wrote:
From: Richard O'Keefe
Sent: Tuesday, May 22, 2012 7:59 PM

But string processing and text I/O using the java.io.* classes aren't brilliant.
Wait just a moment - Are you comparing text I/O for C programs that process bytes against Java programs that process double-byte unicode?
No. Amongst other things, I have my own ByteString and ByteStringBuilder classes that are basically clones of String and StringBuilder, and using them makes surprisingly little direct difference; the point is saving memory. I have obtained large speedups in Java using Java by dodging around the Java libraries. Other people have reported the same to me.
With both of these changes, we are moving away from recommended good practice; the faster code is the kind of code people are not supposed to write any more.
Says who? Is that on your own authority or some other source you can point us to?
It looks increasingly as though there is no point in this discussion. Is there ANY conceivable criticism of Java that will not elicit ad hominem attacks from you? I have read more Java textbooks than I wished to. I was on Sun's Java techniques and tips mailing list for years. I could go on, but is there, *really*, any point?
These particular measurements were made using my own Smalltalk compiler which is an oddity amongst Smalltalks: a whole program compiler that compiles via C. Yes, most of the good ideas came from INRIA, although ST/X does something not entirely dissimilar.
Wait just a moment - you wrote "I didn't _think_ I'd omitted anything important" and now it turns out that the measurements were made using your personal Smalltalk implementation!
You have got to be joking.
Why? On various benchmarks, sometimes VisualWorks is better, sometimes my system is better. My system is utterly naive, incorporating almost none of the classic Smalltalk optimisations.

I redid the test using VisualWorks NonCommercial. It took about twice as long as my Smalltalk did. According to 'TimeProfiler profile: [...]', 98% of the time is in the load phase; half of that is down to the hash table. A surprisingly small part of the rest is due to actual input (ExternalReadStream>>next); quite a bit goes into building strings and testing characters.

Why the difference? With all due respect, VisualWorks still has the classic Smalltalk implementation of hash tables. Mine is different. This is a library issue, not a language issue. One of the tasks in reading is skipping separators. Since it's used a lot in parsing input, my library pushes that right down to the bottom level of ReadStream and ChannelInputStream. VisualWorks uses a single generic implementation that doesn't get up to the low level dodges mine does. And so on. All *library* issues, not *compiler* or *language* issues.

Which is the whole point of this thread, as far as I am concerned. C, Java, Smalltalk: this real example is dominated by *library* level issues, not language issues or compiler issues.
And it's not INTERESTING, and it's not about LANGUAGES. There is NOTHING about the Java language that makes code like this necessarily slow. It's the LIBRARY. The java.io library was designed for flexibility, not speed. That's why there is a java.nio library.
Here's the gorilla in the room question - So why doesn't your program use java.nio?
Because that would be insane. This is a program I originally whipped up in less than an hour for two reasons:
  (A) I wanted to provide some students with an example of a "work-list" algorithm that had some realism to it. For that purpose, the program had to be READABLE.
  (B) To my astonishment, the tsort(1) programs in OpenSolaris and Mac OS X 10.6.8 turned out to be grotesquely slow for non-toy graphs.
I was expecting to have a use for the program myself, so as it stood, the Java version was already quite fast enough to be useful. (As in, a LOT faster than the system version, even though the system version was written in C.) The one issue I had with the first version was not time, but space, so I explored two ways of making it take less space.

There is no NEED to rewrite the program to use java.nio; having replaced the system version of the command the Java version was no longer the bottleneck in my intended use. For me personally, having no experience with java.nio, it was *easier* to rewrite the program from scratch in C than to overcome the java.nio learning curve. And in any case, I knew very well that I could get near enough to the same order of improvement using InputStream and wrapping my own buffering code over that (I've done that before). Above all, since the students were even less familiar with nio than I am, using nio would have destroyed the program's utility for purpose (A).

As for the Smalltalk version, I often rewrite small things into Smalltalk in order to find out what I'm doing wrong in my implementation.
And that's the point I was making with this example. Why does Smalltalk come out in the middle of the Java results? A balance between a language penalty (tagged integer arithmetic is a lot slower than native integer arithmetic) and a library bonus (a leaner meaner I/O design where there are wrappers if you want them but you very seldom need them). It's the great advantage of using libraries rather than syntax: libraries can be changed.
No, that doesn't seem to be the case - if I'm misunderstanding what you've done then please correct me, but it seems that Smalltalk comes out in the middle of the Java results because you chose to use a Java library "designed for flexibility, not speed" and you chose to use that library in a way that slows the program down.
No, I chose to
  - use the official Java plain text I/O library
  - the way the official Java series books and tutorials say it should be used
  - with a MINIMUM of wrapper layers.
And it was FAST ENOUGH TO BE USEFUL. No, I chose to use that library THE WAY IT IS INTENDED TO BE USED. It is the simplest most straightforward way to go. It's the *same* "algorithm" that the C and Smalltalk versions use.
imo It would be better to "show how much better programs using other data structures and algorithms perform those specific tasks" than brandish anecdotes from a past century.
"Past century"? Insults, is it? As for "how much better programs using other data structures and algorithms perform", this whole thread is about how well programs using the SAME data structures and algorithms perform, and whether we can assign much meaning to that. How could it possibly be better to do something irrelevant to the topic?

Sorry Bryan, there are a couple of comments I should make a final reply to - I'll ignore the rest.
From: Richard O'Keefe
Sent: Wednesday, May 23, 2012 10:52 PM
-snip-
Says who? Is that on your own authority or some other source you can point us to?
It looks increasingly as though there is no point in this discussion. Is there ANY conceivable criticism of Java that will not elicit ad hominem attacks from you?
It isn't an ad hominem attack to ask you who's the authority that made some recommendation. -snip-
Wait just a moment - you wrote "I didn't _think_ I'd omitted anything important" and now it turns out that the measurements were made using your personal Smalltalk implementation!
You have got to be joking.
Why?
Because you omitted basic information about the measurements you presented. -snip-
imo It would be better to "show how much better programs using other data structures and algorithms perform those specific tasks" than brandish anecdotes from a past century.
"Past century"? Insults, is it?
No, it's an echo of the words you used - "...insanely difficult in Fortran 77. This century's Fortran is of course another matter."

On 5/22/12 12:54 PM, Isaac Gouy wrote:
On 5/21/2012 6:54 PM, Richard O'Keefe wrote:
For 50,000 nodes and 8,385,254 edges,
    Java (first version)   ran out of memory after 89.54 seconds (default heap)
    Java (2nd version)      13.31 seconds   -- avoid Integer boxing!
    Java (3rd version)      15.07 seconds
    Smalltalk               14.52 seconds
    C                        2.36 seconds
fwiw my expectation is that Java would not be that much slower than C, and would be considerably faster than Smalltalk.
FWIW, that matches my expectations pretty well. Naive/standard Java performing slower than Smalltalk; highly tweaked Java using non-standard data types performing on-par with or somewhat faster than Smalltalk. That C is 7x faster is a bit on the high end, but for something like tsort I could imagine it'd be possible.

Do bear in mind that Java doesn't optimize ---that's the JIT's job--- and that the standard datatypes for Java are only good enough to suffice for casual use and to be better than *naively* implementing them yourself. They are extremely mediocre if you have any sort of "specialized" needs. If you know what you're doing in designing data structures, you can get amazing mileage out of writing a hand-tuned implementation that (1) avoids boxing native types, (2) uses a non-generic structure which has good complexities for the kinds of operations you'll actually be using, and (3) doesn't bother trying to implement the ridiculous APIs of the standard data types.

But even still, in my experience of using Smalltalk, the standard data structures are much better done and so they will be on-par with what you'd get from hand-tuning for Java. I've spent a lot of time trying to get decent performance out of Java, not so much with Smalltalk; but the performance with Smalltalk was sufficient that it wasn't needed so badly.

-- Live well, ~wren
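For comparison on the Haskell side (a sketch with invented names, using the vector package), the "avoid boxing, skip the generic API" idea corresponds to keeping the data in a growable buffer of unboxed Ints rather than in a list or container of boxed values:

    import Data.IORef
    import qualified Data.Vector.Unboxed as U
    import qualified Data.Vector.Unboxed.Mutable as VM

    -- A minimal growable buffer of unboxed Ints; capacity doubles when full.
    data IntBuffer = IntBuffer (IORef (VM.IOVector Int)) (IORef Int)

    newIntBuffer :: IO IntBuffer
    newIntBuffer = IntBuffer <$> (VM.new 16 >>= newIORef) <*> newIORef 0

    pushInt :: IntBuffer -> Int -> IO ()
    pushInt (IntBuffer vecRef lenRef) x = do
      vec <- readIORef vecRef
      len <- readIORef lenRef
      vec' <- if len < VM.length vec
                then return vec
                else do bigger <- VM.grow vec (VM.length vec)
                        writeIORef vecRef bigger
                        return bigger
      VM.write vec' len x
      writeIORef lenRef (len + 1)

    -- Freeze the filled prefix into an immutable unboxed vector.
    toVector :: IntBuffer -> IO (U.Vector Int)
    toVector (IntBuffer vecRef lenRef) = do
      vec <- readIORef vecRef
      len <- readIORef lenRef
      U.freeze (VM.take len vec)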

From: wren ng thornton
Sent: Tuesday, May 22, 2012 9:30 PM
-snip-
FWIW, that matches my expectations pretty well. Naive/standard Java performing slower than Smalltalk; highly tweaked Java using non-standard data types performing on-par with or somewhat faster than Smalltalk.
I have no difficulty believing that if you are talking about a 1996 Java reference implementation and a 1996 Smalltalk JIT VM. I could believe that if you are comparing a naive Java program with a highly tweaked Smalltalk program.
That C is 7x faster is a bit on the high end, but for something like tsort I could imagine it'd be possible.
It's possible because it's possible to write a Java program to be slower than it need be :-)
Do bear in mind that Java doesn't optimize ---that's the JIT's job
What are we supposed to make of that? Why write that and not -- Do bear in mind that Smalltalk doesn't optimize that's the JIT's job -- or -- Do bear in mind that C doesn't optimize that's the compiler's job. -snip-
But even still, in my experience of using Smalltalk, the standard data structures are much better done and so they will be on-par with what you'd get from hand-tuning for Java. I've spent a lot of time trying to get decent performance out of Java, not so much with Smalltalk; but the performance with Smalltalk was sufficient that it wasn't needed so badly.
Do you have a specific example that you can share?

Oops, forgot to reply-to-all. This was a minor clarification on Wren's behalf (he can correct me if I'm wrong). But I agree with Bryan that it's time for the thread to die:
Do bear in mind that Java doesn't optimize ---that's the JIT's job
What are we supposed to make of that?
Why write that and not -- Do bear in mind that Smalltalk doesn't optimize that's the JIT's job -- or -- Do bear in mind that C doesn't optimize that's the compiler's job.
I believe this was referring to the fact that javac isn't an aggressive optimizing compiler on the way from source to bytecode, i.e. it's the bytecode->asm leg where the optimization effort is focused.

As an outsider to things Java that's something I've had trouble understanding, actually. It doesn't seem like an either-or choice to me...
-Ryan

Date: Sat, 19 May 2012 08:57:38 +0200
From: Ertugrul Söylemez
Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
To: haskell-cafe@haskell.org
Haskell delivers reasonable performance for almost all non-embedded applications, and for the extreme edge cases one can still switch to C or assembly using the FFI.
At the cost of introducing another dimension of complexity (two codebases, two toolchains, need for programmers specialising in two languages). There is a reason why many businesses go to great pains to ensure their codebase is homogeneous (and it's not "managers are dumb"). RW
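For completeness, the FFI escape hatch mentioned above looks like this (a minimal sketch; the C routine "dot", the module name, and the use of the vector package for pointer access are all hypothetical choices for illustration):

    {-# LANGUAGE ForeignFunctionInterface #-}
    module Dot (dotProduct) where

    import Foreign.C.Types (CInt (..))
    import Foreign.Ptr (Ptr)
    import qualified Data.Vector.Storable as VS

    -- Binding to a hypothetical C routine:
    --   double dot(const double *xs, const double *ys, int n);
    foreign import ccall unsafe "dot"
      c_dot :: Ptr Double -> Ptr Double -> CInt -> IO Double

    dotProduct :: VS.Vector Double -> VS.Vector Double -> IO Double
    dotProduct xs ys =
      VS.unsafeWith xs $ \px ->
        VS.unsafeWith ys $ \py ->
          c_dot px py (fromIntegral (min (VS.length xs) (VS.length ys)))

Whether the extra toolchain is worth it is exactly the trade-off Roman describes.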

On 5/18/12 7:45 AM, Roman Werpachowski wrote:
On Fri, 18 May 2012 15:30:09 +1200, "Richard O'Keefe"
wrote: The claim was and remains solely that THE TIME DIFFERENCE BETWEEN *ALGORITHMS* can be bigger than THE TIME DIFFERENCE BETWEEN *LANGUAGES* and is in this particular case.
Yes, but aren't the differences in the same ballpark, and if we want to compare *languages*, we should use identical algorithms to make the comparison fair.
"Fair" in what sense? That is, what _exactly_ are you hoping to compare? If the goal is to benchmark the implementation of the runtime, VM, or built-in types, then requiring the same algorithm makes sense--- because the algorithm is irrelevant other than to provide a bunch of calls to the runtime/vm/etc. However, benchmarking a language's implementation in this way is rarely that helpful. It's great for comparing CPython to PyPy (or any other in-language cross-compiler comparison), but what would it tell you about Haskell vs C++? If the goal is to compare, say, production costs for a given level of performance, then fixing the algorithm is not at all fair. The fact of the matter is that different languages make different algorithms easier to (a) implement, and (b) discover/identify/generalize. Thus, when it comes to real-world software, the language that makes it easy to implement good algorithms has a major advantage--- an advantage which is being specifically ignored by fixing the algorithm aforehand. In order for "fair" to have any meaning whatsoever, we must first specify what is being compared, so that we know what it is that things are supposed to be fair with regard to. -- Live well, ~wren

From: wren ng thornton
Sent: Saturday, May 19, 2012 12:49 AM
Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
-snip-
"Fair" in what sense? That is, what _exactly_ are you hoping to compare?
If the goal is to benchmark the implementation of the runtime, VM, or built-in types, then requiring the same algorithm makes sense--- because the algorithm is irrelevant other than to provide a bunch of calls to the runtime/vm/etc. However, benchmarking a language's implementation in this way is rarely that helpful. It's great for comparing CPython to PyPy (or any other in-language cross-compiler comparison), but what would it tell you about Haskell vs C++?
The PyPy crowd won't like it if you take programs written for CPython and measure how they run with PyPy - that's "not fair". But it might take a couple of years before they contribute programs optimised for PyPy :-( (You already said what it would tell you, but questioned how helpful that would be.)
If the goal is to compare, say, production costs for a given level of performance, then fixing the algorithm is not at all fair. The fact of the matter is that different languages make different algorithms easier to (a) implement, and (b) discover/identify/generalize. Thus, when it comes to real-world software, the language that makes it easy to implement good algorithms has a major advantage--- an advantage which is being specifically ignored by fixing the algorithm aforehand.
Let's just say that's true - Is it useful? What would we need to do to make the comparison? We could do something like - "Plat_Forms: Is there a single best web development technology? A professional programming contest" http://page.mi.fu-berlin.de/prechelt/Biblio/platforms07-cacm-2010.pdf But that was just once, with very few teams, and only one problem -- seems like it would need to be repeated more often than is affordable, and with more teams than can be persuaded to donate their time. Maybe your point is true but practically useless? :-(
In order for "fair" to have any meaning whatsoever, we must first specify what is being compared, so that we know what it is that things are supposed to be fair with regard to.
'What does "not fair" mean? (A fable)' http://stackoverflow.com/a/6380299

----- Original Message -----
From: Richard O'Keefe
Sent: Thursday, May 17, 2012 8:30 PM
Subject: Re: [Haskell-cafe] Can Haskell outperform C++?

-snip-
The claim was and remains solely that THE TIME DIFFERENCE BETWEEN *ALGORITHMS* can be bigger than THE TIME DIFFERENCE BETWEEN *LANGUAGES* and is in this particular case.
That seems like a modest and uncontentious claim.
There is also a second claim, which I thought was uncontentious: the relative difficulty of tasks varies with language.
That doesn't seem unlikely; but I think you'd have to spell out just what you mean by language, and it also doesn't seem unlikely that for the same task the relative difficulty might flip when other details change (the people doing the task, the language implementation, the libraries available for the language implementation...) It all seems so very particular ;-)
participants (10)
  - Bryan O'Sullivan
  - Chris Dornan
  - Gregg Lebovitz
  - Isaac Gouy
  - ok@cs.otago.ac.nz
  - Paulo Pocinho
  - Richard O'Keefe
  - Roman Werpachowski
  - Ryan Newton
  - wren ng thornton