
On 11/2/07, Sterling Clover wrote:
As I understand it, the question is what you want to measure. gzip is actually pretty good for this, precisely because it removes boilerplate, reducing programs to something approximating their inherent complexity. So a larger gzipped size means, at some level, a more complicated algorithm (or, in the case of lower-level languages, complexity that hasn't been lifted into the compiler). LOC per language has, as I understand it, been somewhat called into question as a measure of productivity, but there is still a correlation between programmer effort and LOC across languages, even if it isn't as strong as once thought -- on the other hand, the claim that bugs per LOC is roughly constant across languages seems to have been fairly strongly debunked. If you want a measure of the language as a language, then LOC per gzipped byte is a decent ratio for how much "noise" it introduces -- but if you want to measure pure speed across similar algorithmic implementations, which, as I understand it, is what the shootout is all about, then comparing gzipped sizes actually makes some sense.
--S
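A minimal sketch of the LOC-versus-gzipped-size ratio described above, assuming the zlib package's Codec.Compression.GZip; whether blank or comment lines should count as LOC is left as a judgment call:

import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Codec.Compression.GZip (compress)
import System.Environment (getArgs)
import Text.Printf (printf)

main :: IO ()
main = do
  [path] <- getArgs
  src <- BL.readFile path
  let raw  = BL.length src                   -- raw size in bytes
      gz   = BL.length (compress src)        -- gzipped size in bytes
      locs = length (BLC.lines src)          -- lines of code (incl. blanks)
  printf "%s: %d LOC, %d bytes raw, %d bytes gzipped, LOC/gz = %.3f\n"
         path locs raw gz (fromIntegral locs / fromIntegral gz :: Double)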
Lossless file compression, a.k.a. entropy coding, attempts to maximize the amount of information per bit (or byte), getting as close to the entropy of the source as possible. Basically, gzip is measuring (well, approximating) the amount of "information" contained in the code. I think it would be interesting to compare the ratio between a file's raw size and its entropy (we can come up with a precise metric later). That would show us how concise the language and the code actually are. --ryan
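One rough way to approximate that raw-size-versus-entropy comparison is a zeroth-order Shannon entropy estimate over byte frequencies. This is only a crude sketch: it ignores higher-order structure (repetition, context) that gzip exploits, so a "precise metric" would need more than this:

import qualified Data.ByteString as BS
import qualified Data.Map.Strict as M
import System.Environment (getArgs)
import Text.Printf (printf)

-- Shannon entropy in bits per symbol, from a frequency map.
entropyBits :: M.Map a Int -> Double
entropyBits freqs = negate (sum [ p * logBase 2 p | p <- ps, p > 0 ])
  where
    total = fromIntegral (sum (M.elems freqs))
    ps    = [ fromIntegral n / total | n <- M.elems freqs ]

main :: IO ()
main = do
  [path] <- getArgs
  bytes <- BS.readFile path
  let freqs = M.fromListWith (+) [ (b, 1 :: Int) | b <- BS.unpack bytes ]
      hBits = entropyBits freqs                    -- bits per byte
      raw   = BS.length bytes                      -- raw size in bytes
      infoB = hBits * fromIntegral raw / 8         -- estimated information, in bytes
  printf "%s: %d bytes raw, %.3f bits/byte, ~%.0f bytes of information\n"
         path raw hBits infoB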