
So back in January we had lots of fun tuning up Haskell code for the Great Language Shootout[1]. We did quite well at the time, at one point ranking overall first[2]. After doing all we could with GHC 6.4.2, the Haskell entries have been left alone for the last 10 months, while we worked on new libraries (bytestring, regex-*).

    1. http://shootout.alioth.debian.org/
    2. http://www.cse.unsw.edu.au/~dons/data/haskell_1.html

Now the time has come to reload the shootout for another round! GHC 6.6 is on the 'sandbox' debian machine, and will soon be on the other shootout boxes[3], which means we can use:

    * Data.ByteString
    * regex-* libraries

    3. http://shootout.alioth.debian.org/sandbox/benchmark.php?test=all&lang=ghc&lang2=javaxint#about

And thus greatly improve:

    fannkuch
    fasta
    k-nucleotide
    regex-dna
    reverse-complement
    sum-file

While we're here we should fix:

    chameneos

and anything else you want to take a look at. A community page has been set up to which you can submit improved entries:

    http://www.haskell.org/haskellwiki/Great_language_shootout

So: install GHC 6.6, read up on Data.ByteString and the new regex libraries, and submit faster code to the wiki! Our shootout-interface officer, musasabi, can then commit the entries to shootout CVS once consensus is reached on the best code to submit.

Let's take back first place! :)

-- Don
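As a taste of what Data.ByteString buys on these benchmarks, here is a minimal sketch (not an actual shootout entry) of the sum-file task: read integers from stdin, one per line, and print their sum, using strict ByteStrings instead of String:

    import qualified Data.ByteString.Char8 as B

    -- Sum the integers on stdin, one per line.
    main :: IO ()
    main = do
        s <- B.getContents
        print (sum (map parse (B.lines s)))
      where
        -- readInt returns Nothing on a malformed line; count such lines as 0
        parse l = maybe 0 fst (B.readInt l)

The win comes from B.getContents reading the input as packed bytes and B.readInt parsing in place, avoiding the per-character cons cells of the String version.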

On Fri, 10 Nov 2006 01:44:15 +0100, Donald Bruce Stewart wrote:
So back in January we had lots of fun tuning up Haskell code for the Great Language Shootout[1]. We did quite well at the time, at one point ranking overall first[2]. [...]
Haskell suddenly dropped several places in the overall score when the size measurement changed from line count to number of bytes after gzipping. Maybe it's worth studying why this is: Haskell programs are often much more compact than programs in other languages, but after gzipping, other languages do much better. One reason I can think of is that for very short programs, the import statements weigh heavily.

--
With kind regards,
Henk-Jan van Tuyl
--
http://Van.Tuyl.eu/
--
Using Opera's revolutionary e-mail client: https://secure.bmtmicro.com/opera/buy-opera.html?AID=789433

On 11/10/06, Henk-Jan van Tuyl wrote:
On Fri, 10 Nov 2006 01:44:15 +0100, Donald Bruce Stewart wrote:
So back in January we had lots of fun tuning up Haskell code for the Great Language Shootout[1]. We did quite well at the time, at one point ranking overall first[2]. [...]
Haskell suddenly dropped several places in the overall score when the size measurement changed from line count to number of bytes after gzipping. Maybe it's worth studying why this is: Haskell programs are often much more compact than programs in other languages, but after gzipping, other languages do much better. One reason I can think of is that for very short programs, the import statements weigh heavily.
I think the main factor is that languages with large syntactic redundancy get that compressed away. I.e. if you write:

    MyVeryLongAndConvolutedClassName myVeryLargeAndConvolutedObject =
        new MyVeryLongAndConvolutedClassName( someOtherLongVariableName );

or something like that, it makes the code clumsy and difficult to read, but it won't affect the gzipped byte count very much. Their current way of measuring is pretty much pointless, since the main thing the gzipping does is remove the impact of clunky syntax. Measuring lines of code is certainly not perfect, but IMO it's a lot more useful as a metric than gzipped bytes.

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862
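A rough way to check this claim, as a sketch assuming the zlib package is available: after gzip, each extra occurrence of a long identifier costs only a few bytes, because gzip replaces the repeat with a short back-reference.

    import Data.Int (Int64)
    import qualified Data.ByteString.Lazy.Char8 as L
    import Codec.Compression.GZip (compress)

    -- gzipped size of a string, in bytes
    gzLen :: String -> Int64
    gzLen = L.length . compress . L.pack

    main :: IO ()
    main = do
        let name  = "MyVeryLongAndConvolutedClassName "
            rep n = concat (replicate n name)
        -- raw size grows linearly with the repetition count;
        -- gzipped size barely moves
        mapM_ (\n -> putStrLn (show n ++ "x: raw " ++ show (length (rep n))
                               ++ ", gzipped " ++ show (gzLen (rep n))))
              [1, 5, 25]

So a verbose, redundant style is nearly free under the gzipped-bytes metric, which is exactly the objection.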

Sebastian Sylvan wrote:
On 11/10/06, Henk-Jan van Tuyl wrote:
Haskell suddenly dropped several places in the overall score when the size measurement changed from line count to number of bytes after gzipping. [...]
I think the main factor is that languages with large syntactic redundancy get that compressed away. I.e. if you write:

    MyVeryLongAndConvolutedClassName myVeryLargeAndConvolutedObject =
        new MyVeryLongAndConvolutedClassName( someOtherLongVariableName );

or something like that, it makes the code clumsy and difficult to read, but it won't affect the gzipped byte count very much. Their current way of measuring is pretty much pointless, since the main thing the gzipping does is remove the impact of clunky syntax. Measuring lines of code is certainly not perfect, but IMO it's a lot more useful as a metric than gzipped bytes.
It may not be useful on its own, but it is not entirely meaningless. By using a lossless compression algorithm, you might infer some meaning about the code. Where it fails, though, is that if the algorithm were ideal (preferring low space at the expense of time), then two programs computing the same thing should compress to exactly the same bytes. If they do not, then the samples did not do the exact same thing in the first place, and so are not comparable! So, assuming gzip is ideal, a larger compressed output would just mean the program is doing more, and could as well be counted a win!

It is not that the method is pointless; it is the extrapolation and interpretation of the results. You could argue that the gzipped output is just the same program written in a new programming language - of course, it is not very readable (at least not to me, since I do not have gunzip installed in my brain, though I do have a Haskell interpreter of some sort). Judging expressiveness at the source code level is entirely subjective, based on an interpretation by the observer. Using gzip attempts to minimise this subjectivity - whether or not it succeeds is not entirely decidable, but it is at least better. Unfortunately, the results have been misinterpreted. Just smile and nod, I do :)

Hello Sebastian,

Saturday, November 11, 2006, 3:51:09 AM, you wrote:
Measuring lines of code is certainly not perfect, but IMO it's a lot more useful as a metric than gzipped bytes.
Why don't they use word count?

--
Best regards,
Bulat
mailto:Bulat.Ziganshin@gmail.com

On 11/11/06, Bulat Ziganshin wrote:
Hello Sebastian,
Saturday, November 11, 2006, 3:51:09 AM, you wrote:
Measuring lines of code is certainly not perfect, but IMO it's a lot more useful as a metric than gzipped bytes.
Why don't they use word count?
I don't know. I suppose it's quite difficult to define a useful meaning of "word". I mean:

    foo=\xs->[y*2|y<-xs]

would count as one word, I suppose, though the number of tokens is far greater. Though I do think a word count, as well, would hold more intuitive meaning than gzipped bytes.

/S
--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862
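For concreteness, here is a small sketch of how the three competing metrics could each be computed for an entry, assuming the zlib package; Entry.hs is a hypothetical file name:

    import qualified Data.ByteString.Lazy.Char8 as L
    import Codec.Compression.GZip (compress)

    main :: IO ()
    main = do
        src <- L.readFile "Entry.hs"  -- hypothetical entry to measure
        putStrLn ("lines:         " ++ show (length (L.lines src)))
        putStrLn ("words:         " ++ show (length (L.words src)))
        putStrLn ("gzipped bytes: " ++ show (L.length (compress src)))

Note that L.words splits on whitespace only, so the one-liner above really does count as a single word, which is exactly the difficulty with defining "word".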

Sebastian Sylvan wrote:
On 11/10/06, Henk-Jan van Tuyl wrote:
Haskell suddenly dropped several places in the overall score when the size measurement changed from line count to number of bytes after gzipping. [...]
I think the main factor is that languages with large syntactic redundancy get that compressed away. [...] Their current way of measuring is pretty much pointless, since the main thing the gzipping does is remove the impact of clunky syntax. Measuring lines of code is certainly not perfect, but IMO it's a lot more useful as a metric than gzipped bytes.
Sure, since gzip is the metric, we can optimise for that. For example, instead of writing a higher-order function, just copy it out N times, instantiating the higher-order argument differently each time. There should be no gzipped-code-size penalty for doing that, and it'll be faster :-)

Cheers,
Simon
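A toy illustration of the trick, as a sketch with made-up functions rather than code from any actual entry: the three first-order copies below gzip to little more than the size of one, since their text is nearly identical, yet each avoids a call through a higher-order argument.

    -- The generic version: one higher-order fold.
    sumBy :: (Int -> Int) -> [Int] -> Int
    sumBy f = foldr (\x acc -> f x + acc) 0

    -- "Optimised for gzip": copy the loop out once per instantiation.
    sumSquares :: [Int] -> Int
    sumSquares []     = 0
    sumSquares (x:xs) = x*x + sumSquares xs

    sumDoubles :: [Int] -> Int
    sumDoubles []     = 0
    sumDoubles (x:xs) = x+x + sumDoubles xs

    sumCubes :: [Int] -> Int
    sumCubes []     = 0
    sumCubes (x:xs) = x*x*x + sumCubes xs

    main :: IO ()
    main = print ( sumBy (\x -> x*x) [1..10]  -- generic
                 , sumSquares [1..10] )       -- hand-copied, same result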

Hello Simon,

Tuesday, November 14, 2006, 4:31:27 PM, you wrote:
Sure, since gzip is the metric, we can optimise for that. For example, instead of writing a higher-order function, just copy it out N times, instantiating the higher-order argument differently each time. There should be no gzipped-code-size penalty for doing that, and it'll be faster :-)
I hope that GHC 6.8 will include an -Ogzip switch which transforms the program so it becomes more compressible by gzip! Also, jhc's recent Haskell-to-GHC translation mode may be optimised in this way, so jhc will finally outperform GHC as the best shootout platform :)

--
Best regards,
Bulat
mailto:Bulat.Ziganshin@gmail.com
participants (6):
- Bulat Ziganshin
- dons@cse.unsw.edu.au
- Henk-Jan van Tuyl
- Sebastian Sylvan
- Simon Marlow
- Tony Morris