
On 24/05/2012, at 4:39 AM, Isaac Gouy wrote:
From: Richard O'Keefe
Sent: Tuesday, May 22, 2012 7:59 PM

But string processing and text I/O using the java.io.* classes aren't brilliant.
Wait just a moment - are you comparing text I/O for C programs that process bytes against Java programs that process double-byte Unicode?
No. Amongst other things, I have my own ByteString and ByteStringBuilder classes that are basically clones of String and StringBuilder, and using them makes surprisingly little direct difference; the point is saving memory. I have obtained large speedups in Java by dodging around the Java libraries. Other people have reported the same to me.
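For concreteness, here is a minimal sketch of the idea behind a byte-backed string builder. The names and details are illustrative only, not the actual ByteString/ByteStringBuilder classes mentioned above: keeping 8-bit text in a byte[] instead of Java's UTF-16 char[] roughly halves the memory for ASCII-heavy data.

```java
// Illustrative sketch only: a StringBuilder-like class backed by byte[]
// rather than char[]. For Latin-1/ASCII text this halves the storage
// compared to Java's two-bytes-per-char String representation.
final class ByteStringBuilder {
    private byte[] buf = new byte[16];
    private int len = 0;

    ByteStringBuilder append(byte b) {
        // Grow geometrically, like StringBuilder does.
        if (len == buf.length) buf = java.util.Arrays.copyOf(buf, len * 2);
        buf[len++] = b;
        return this;
    }

    ByteStringBuilder append(String s) {
        // Assumes the input is 8-bit text; high code points would be truncated.
        for (int i = 0; i < s.length(); i++) append((byte) s.charAt(i));
        return this;
    }

    int length() { return len; }

    @Override public String toString() {
        return new String(buf, 0, len,
                          java.nio.charset.StandardCharsets.ISO_8859_1);
    }
}
```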
With both of these changes, we are moving away from recommended good practice; the faster code is the kind of code people are not supposed to write any more.
Says who? Is that on your own authority or some other source you can point us to?
It looks increasingly as though there is no point in this discussion. Is there ANY conceivable criticism of Java that will not elicit ad hominem attacks from you? I have read more Java textbooks than I wished to. I was on Sun's Java techniques and tips mailing list for years. I could go on, but is there, *really*, any point?
These particular measurements were made using my own Smalltalk compiler which is an oddity amongst Smalltalks: a whole program compiler that compiles via C. Yes, most of the good ideas came from INRIA, although ST/X does something not entirely dissimilar.
Wait just a moment - you wrote "I didn't _think_ I'd omitted anything important" and now it turns out that the measurements were made using your personal Smalltalk implementation!
You have got to be joking.
Why? On various benchmarks, sometimes VisualWorks is better, sometimes my system is better. My system is utterly naive, incorporating almost none of the classic Smalltalk optimisations.

I redid the test using VisualWorks NonCommercial. It took about twice as long as my Smalltalk did. According to 'TimeProfiler profile: [...]', 98% of the time is in the load phase; half of that is down to the hash table. A surprisingly small part of the rest is due to actual input (ExternalReadStream>>next); quite a bit goes into building strings and testing characters.

Why the difference? With all due respect, VisualWorks still has the classic Smalltalk implementation of hash tables. Mine is different. This is a library issue, not a language issue.

One of the tasks in reading is skipping separators. Since it's used a lot in parsing input, my library pushes that right down to the bottom level of ReadStream and ChannelInputStream. VisualWorks uses a single generic implementation that doesn't get up to the low-level dodges mine does. And so on.

All *library* issues, not *compiler* or *language* issues. Which is the whole point of this thread, as far as I am concerned. C, Java, Smalltalk: this real example is dominated by *library*-level issues, not language issues or compiler issues.
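To make the "push it down to the bottom level" point concrete, here is a hedged Java sketch of that kind of dodge (the real ReadStream/ChannelInputStream code is Smalltalk, and this class and its names are mine, purely for illustration): skipping separators by scanning the raw buffer in a tight loop, instead of going through a generic per-character stream protocol.

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative only: a reader that implements skipSeparators() directly
// over its own byte buffer, avoiding a virtual next()/peek() call per
// character, in the spirit of the bottom-level dodge described above.
final class LeanInput {
    private final InputStream in;
    private final byte[] buf = new byte[8192];
    private int pos = 0, end = 0;

    LeanInput(InputStream in) { this.in = in; }

    private boolean fill() throws IOException {
        end = in.read(buf);
        pos = 0;
        return end > 0;
    }

    // Skip runs of blanks, tabs, and newlines with a tight loop over the buffer.
    void skipSeparators() throws IOException {
        for (;;) {
            while (pos < end) {
                byte b = buf[pos];
                if (b != ' ' && b != '\t' && b != '\n' && b != '\r') return;
                pos++;
            }
            if (!fill()) return;   // end of input
        }
    }

    // Next raw byte, or -1 at end of input.
    int next() throws IOException {
        if (pos >= end && !fill()) return -1;
        return buf[pos++];
    }
}
```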
And it's not INTERESTING, and it's not about LANGUAGES. There is NOTHING about the Java language that makes code like this necessarily slow. It's the LIBRARY. The java.io library was designed for flexibility, not speed. That's why there is a java.nio library.
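For reference, the flexibility-first design being described is the familiar layered java.io stack; a generic sketch (not the tsort program itself) looks like this: bytes are decoded to chars by an InputStreamReader, and a BufferedReader adds buffering and line assembly on top. Any charset and any stream can be plugged in, but every line travels byte -> char -> String, which is where much of the overhead lives.

```java
import java.io.*;

// The textbook java.io stack for plain text input: flexible, but it pays
// a per-character decoding cost and builds a String object for every line.
final class LineCount {
    static long countLines(InputStream raw) throws IOException {
        BufferedReader r = new BufferedReader(
            new InputStreamReader(raw, "US-ASCII"));
        long n = 0;
        while (r.readLine() != null) n++;
        return n;
    }
}
```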
Here's the gorilla in the room question - So why doesn't your program use java.nio?
Because that would be insane. This is a program I originally whipped up in less than an hour, for two reasons:

(A) I wanted to provide some students with an example of a "work-list" algorithm that had some realism to it. For that purpose, the program had to be READABLE.

(B) To my astonishment, the tsort(1) programs in OpenSolaris and Mac OS X 10.6.8 turned out to be grotesquely slow for non-toy graphs. I was expecting to have a use for the program myself, so as it stood, the Java version was already quite fast enough to be useful. (As in, a LOT faster than the system version, even though the system version was written in C.)

The one issue I had with the first version was not time, but space, so I explored two ways of making it take less space. There is no NEED to rewrite the program to use java.nio; having replaced the system version of the command, the Java version was no longer the bottleneck in my intended use. For me personally, having no experience with java.nio, it was *easier* to rewrite the program from scratch in C than to overcome the java.nio learning curve. And in any case, I knew very well that I could get near enough to the same order of improvement using InputStream and wrapping my own buffering code over that (I've done that before). Above all, since the students were even less familiar with nio than I am, using nio would have destroyed the program's utility for purpose (A).

As for the Smalltalk version, I often rewrite small things into Smalltalk in order to find out what I'm doing wrong in my implementation.
And that's the point I was making with this example. Why does Smalltalk come out in the middle of the Java results? A balance between a language penalty (tagged integer arithmetic is a lot slower than native integer arithmetic) and a library bonus (a leaner, meaner I/O design where there are wrappers if you want them, but you very seldom need them). It's the great advantage of using libraries rather than syntax: libraries can be changed.
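To illustrate the "language penalty" half of that balance, here is a toy Java model of Smalltalk-style tagged small integers. Tag conventions vary between VMs; the one assumed here (low bit set to 1, value in the upper bits) is purely illustrative. Every arithmetic operation pays for untagging and retagging (a real VM also needs an overflow check), where native arithmetic is a single machine add.

```java
// Toy model of tagged integer arithmetic, for illustration only.
// Assumed convention: a small integer x is represented as (x << 1) | 1.
final class Tagged {
    static long tag(long v)   { return (v << 1) | 1; }
    static long untag(long t) { return t >> 1; }

    // Naive tagged add: untag both operands, add natively, retag.
    static long add(long a, long b) { return tag(untag(a) + untag(b)); }

    // Classic VM shortcut: (2x+1) + (2y+1) - 1 == 2(x+y) + 1,
    // so the tags can be folded into one correction instead of three shifts.
    static long addFast(long a, long b) { return a + b - 1; }
}
```

Even with the shortcut, a real VM still has to check the tag bits and detect overflow before trusting the fast path, which is the per-operation cost the text calls the language penalty.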
No, that doesn't seem to be the case - if I'm misunderstanding what you've done then please correct me, but it seems that Smalltalk comes out in the middle of the Java results because you chose to use a Java library "designed for flexibility, not speed" and you chose to use that library in a way that slows the program down.
No, I chose to
- use the official Java plain text I/O library
- the way the official Java series books and tutorials say it should be used
- with a MINIMUM of wrapper layers.

And it was FAST ENOUGH TO BE USEFUL. I chose to use that library THE WAY IT IS INTENDED TO BE USED. It is the simplest, most straightforward way to go. It's the *same* "algorithm" that the C and Smalltalk versions use.
IMO it would be better to "show how much better programs using other data structures and algorithms perform those specific tasks" than to brandish anecdotes from a past century.
"Past century"? Insults, is it? As for "how much better programs using other data structures and algorithms perform", this whole thread is about how well programs using the SAME data structures and algorithms perform, and whether we can assign much meaning to that. How could it possibly be better to do something irrelevant to the topic?