
Sebastian Sylvan wrote:
On 11/10/06, Henk-Jan van Tuyl wrote:
On Fri, 10 Nov 2006 01:44:15 +0100, Donald Bruce Stewart wrote:
So back in January we had lots of fun tuning up Haskell code for the Great Language Shootout[1]. We did quite well at the time, at one point ranking overall first[2]. [...]
Haskell suddenly dropped several places in the overall score when the size measurement changed from line count to the number of bytes after gzipping. Maybe it's worth studying why. Haskell programs are often much more compact than programs in other languages, but after gzipping, other languages do much better. One reason I can think of is that for very short programs, the import statements weigh heavily.
I think the main factor is that languages with a lot of syntactic redundancy get that redundancy compressed away. E.g. if you write:
MyVeryLongAndConvolutedClassName myVeryLargeAndConvolutedObject = new MyVeryLongAndConvolutedClassName( someOtherLongVariableName );
Or something like that: it makes the code clumsy and difficult to read, but it won't affect the gzipped byte count very much. Their current way of measuring is pretty much pointless, since the main thing the gzipping does is remove the impact of clunky syntax. Measuring lines of code is certainly not perfect, but IMO it's a much more useful metric than gzipped bytes.
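A minimal sketch of the effect described above, assuming Python's standard `gzip` module at its default settings is a fair stand-in for the Shootout's measurement (the exact gzip settings the Shootout used are not stated here). The helpers `sizes` and `program` and the repeated-declaration snippet are hypothetical, made up for illustration: the same declaration is generated once with a long class name and once with a one-letter name, and the raw and gzipped sizes are compared.

```python
import gzip


def sizes(source: str) -> tuple[int, int]:
    """Return (raw byte count, gzipped byte count) for a source snippet."""
    raw = source.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


def program(class_name: str, lines: int = 20) -> str:
    """Hypothetical boilerplate: the same class name repeated on every line."""
    return "\n".join(
        f"{class_name} v{i} = new {class_name}({i});" for i in range(lines)
    )


verbose_raw, verbose_gz = sizes(program("MyVeryLongAndConvolutedClassName"))
terse_raw, terse_gz = sizes(program("C"))

# Before compression the long name costs 31 extra bytes twice per line.
# After gzip, its repeats become short back-references, so most of that
# difference disappears.
print("raw difference:    ", verbose_raw - terse_raw)
print("gzipped difference:", verbose_gz - terse_gz)
```

Both versions are 20 lines long, so a line count rates them equally, while the raw byte count penalises the long name heavily; the gzipped byte count ends up much closer to the line count.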
It may not be useful on its own, but it is not entirely meaningless. By using a lossless compression algorithm, you might infer something meaningful about the code. Where it fails, though, is that if the algorithm were ideal (preferring low space at the expense of time), then the resulting bytes for programs that do the same thing should be exactly the same. If they are not, then the samples did not do the exact same thing in the first place and so are not comparable! So, assuming gzip is ideal, a higher compressed output is considered a win! It is not that the method is pointless; the problem is the extrapolation and interpretation of the results.

You could argue that the gzipped output is just the same thing written in a new programming language. Of course, it is not very readable (at least not to me, since I do not have gunzip installed in my brain, though I do have a Haskell interpreter of some sort).

Achieving minimum expressiveness at the source-code level is entirely subjective and depends on the observer's interpretation. Using gzip attempts to minimise this subjectivity; whether or not it succeeds is not entirely decidable, but it is at least better. Unfortunately, the results have been misinterpreted. Just smile and nod, I do :)