
On 22/05/2012, at 4:15 AM, Isaac Gouy wrote:
Actually, I/O bound is *good*.
Why would that be good or bad?
The context here is a UNIX-style topological sorting program. Being I/O bound means that the program is limited by how fast it can read the data. If 90% of the time goes into reading the data, that means that the *algorithmic* part is fast enough. There may be very little one can do about the I/O part.
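To make that concrete, here is a minimal sketch (not any of the programs timed below, and assuming one "from to" edge per line, with the file name taken from the command line) of how one might check where the time goes: read the edges in one phase, run the sort in another, and time each separately.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class Phases {
        public static void main(String[] args) throws IOException {
            long t0 = System.nanoTime();
            List<String[]> edges = new ArrayList<>();
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    edges.add(line.split("\\s+"));   // "from to" pair, tsort-style
                }
            }
            long t1 = System.nanoTime();
            // ... run the topological sort over the edges here ...
            long t2 = System.nanoTime();
            System.out.printf("read %.2fs, sort %.2fs%n",
                              (t1 - t0) / 1e9, (t2 - t1) / 1e9);
        }
    }

If the "read" figure dwarfs the "sort" figure, improving the algorithm buys you very little.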
I suppose you're just suggesting that, in this case, the performance characteristics of the Java and Smalltalk programs are similar to the C program; but, for whatever reason, you're leaving us guessing about the timings for those other programs.
I didn't _think_ I'd omitted anything important. Oh well.

For 25,000 nodes and 2,989,635 edges,
    Java (first version)    4.83 seconds  -- uses ArrayList, HashMap
    Java (2nd version)      4.17 seconds  -- uses own IntList
    Java (3rd version)      4.09 seconds  -- different approach
    Smalltalk               4.40 seconds
    C                       0.92 seconds  -- my code
    Mac OS X tsort(1)     150.35 seconds  -- 11.84 user + 138.51 system

For 50,000 nodes and 8,385,254 edges,
    Java (first version)    ran out of memory after 89.54 seconds (default heap)
    Java (2nd version)     13.31 seconds  -- avoid Integer boxing!
    Java (3rd version)     15.07 seconds
    Smalltalk              14.52 seconds
    C                       2.36 seconds
    Mac OS X tsort(1)     482.32 seconds  -- 41.55 user + 440.77 system

The Mac OS X tsort(1) is presumably written in C. Symbols have been stripped, so even with Instruments I can't see where the time is going. Judging from its memory behaviour, I believe that most of the time is going in the input phase, which suggests a spectacularly inefficient symbol table, which suggests that it might be using a not-very-modified version of the Unix V7 code, which used a linear search...
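For what it's worth, the "own IntList" in the second Java version is presumably something along these lines: a growable array of primitive ints, so adjacency lists never box node numbers as Integer objects. The sketch below is mine, not the benchmarked code.

    // Hypothetical sketch of the "own IntList" idea: a growable array of
    // primitive ints, so edges are never boxed the way an ArrayList<Integer>
    // would box them.
    final class IntList {
        private int[] items = new int[4];
        private int size = 0;

        void add(int x) {
            if (size == items.length) {
                items = java.util.Arrays.copyOf(items, size * 2);
            }
            items[size++] = x;
        }

        int get(int i) { return items[i]; }

        int size() { return size; }
    }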
Of course, to some degree, user defined hash functions remedy that specific problem.
While creating other, and perhaps worse, ones.
For example, in the Smalltalk code, if you use a Dictionary of Strings, you're getting Robert Jenkins's hash function in optimised C. If you supply your own, you're getting a very probably worse hash function and it's going to run rather slower. And above all, the stuff you are benchmarking is no longer code that people are actually likely to write.
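A Java analogue of the same point, purely as an illustration (the class and the hash choice are invented): wrap keys so a HashMap uses a home-grown hash instead of String's built-in one. Hashing on length alone is legal but poor, so most keys collide and lookups degrade towards a linear scan of one bucket.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical key wrapper with a deliberately weak, user-supplied hash:
    // hashing on length only means thousands of distinct strings share a bucket.
    final class LengthHashedKey {
        final String s;
        LengthHashedKey(String s) { this.s = s; }

        @Override public int hashCode() { return s.length(); }   // poor hash
        @Override public boolean equals(Object o) {
            return o instanceof LengthHashedKey && ((LengthHashedKey) o).s.equals(s);
        }
    }

    class Demo {
        public static void main(String[] args) {
            Map<LengthHashedKey, Integer> ids = new HashMap<>();
            ids.put(new LengthHashedKey("node1"), 1);
            ids.put(new LengthHashedKey("node2"), 2);   // same length: collides
            System.out.println(ids.get(new LengthHashedKey("node2")));  // prints 2
        }
    }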
I think you're being a touch quarrelsome :-)
That upset me. All I was saying is that surely the only *point* of talking about the performance of *languages* is to get some idea of how well programs are likely to do, and not any old specially crafted code, but the kind of code that is likely to be written for purposes other than benchmarking. So the more you bash on a particular example to get the time down, the less representative it is of the kind of code people are likely to write *for purposes other than benchmarking*.

Just today I marked a student assignment where their program took 15 minutes and my model answer took a touch under 4 milliseconds. I explained to them _that_ their program was spectacularly slow. They already knew _why_ it was. I also explained the one trick (lazy initialisation) that could have hugely improved the time. I also told them that they had made the right decision, to optimise *their* time, not the computer's, in their particular context.
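For readers unfamiliar with the trick: lazy initialisation just means deferring expensive setup until something actually asks for it. The sketch below is a generic illustration, not the student's program; all names are invented.

    // Hypothetical sketch of lazy initialisation: the table is only built on
    // first demand, so a run that never needs it never pays for it.
    final class ExpensiveTable {
        private int[] table;              // stays null until first use

        private int[] table() {
            if (table == null) {          // built at most once
                table = buildTable();     // the costly step being deferred
            }
            return table;
        }

        int lookup(int i) { return table()[i]; }

        private static int[] buildTable() {
            int[] t = new int[1 << 20];
            for (int i = 0; i < t.length; i++) {
                t[i] = i;                 // stand-in for an expensive computation
            }
            return t;
        }
    }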
Do *all* Smalltalk language implementations provide that same hash function in optimised C?
*That* function, no. *Some* hash function as a "primitive" (meaning optimised C), well, I don't have every Smalltalk, but the ones I do have, I've checked, and yes they do.
Have Smalltalk language implementations *always* provided that same hash function in optimised C?
Obviously not: Smalltalk-80 ran on Xerox D-machines, which didn't *have* C. "Primitives" were implemented in microcode.
Have you researched what code people are actually likely to write, or are you just speculating? ;-)
I'm in my 6th decade. I've seen a lot of code in a lot of languages.
"It depends" is the second best answer we can get. The best answer is one that says _what_ it depends on.
That may be the best answer to some other question - but for the stated question I think we're looking for a Yes or a No :-)
The subject line has the obvious and boring answer "yes, of course".

I recall one time when I wanted to analyse some data and typed up Fortran code for a suitable technique from a book. It didn't work. After struggling to debug it for a while, I implemented the whole thing from scratch in Prolog. Then I went back to the Fortran code, spotted my typing mistake, and fixed it. Result: two working programs. The Prolog program (compiled to a VM which was emulated; the compiler was not very clever) ran faster than the Fortran program (compiled to native code by a seriously optimising compiler from a major computer manufacturer). Reason? Same algorithm, different data structure. The data structure was one that was easy to express in Prolog (or Haskell!) but would have been insanely difficult in Fortran 77. This century's Fortran is of course another matter.

The point here of course is that there are data structures and algorithms that are easy to express in some languages but hard in others, so that in code written *for purposes other than benchmarking*, they just would not be _used_ in one of those languages.