On 11/2/07, Sterling Clover <s.clover@gmail.com> wrote:
As I understand it, the question is what you want to measure for. gzip is actually pretty good at this, precisely because it strips out boilerplate, reducing programs to something approximating their intrinsic complexity. So a higher gzipped size means, at some level, a more complicated algorithm (in the case, perhaps, of lower-level languages, because there is complexity that hasn't been lifted into the compiler).

LOC per language, as I understand it, has been somewhat called into question as a measure of productivity, but there is still a correlation between programmer output and LOC across languages, even if it isn't as strong as once thought -- on the other hand, bugs per LOC seems to have been fairly strongly debunked as something constant across languages. If you want a measure of the language as a language, I guess LOC divided by gzipped size is a reasonable ratio for how much "noise" it introduces -- but if you want to measure just pure speed across similar algorithmic implementations, which, as I understand it, is what the shootout is all about, then gzipped size actually tends to make some sense.

--S
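
For concreteness, here is a minimal sketch of the LOC-versus-gzipped-size ratio described above. It assumes the zlib package (Codec.Compression.GZip); the program and helper names (loc, gzippedSize) are illustrative only, not anything the shootout itself uses.

-- loc-per-gzip.hs (hypothetical): lines of code versus gzip-compressed size.
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Int (Int64)
import System.Environment (getArgs)
import Text.Printf (printf)

-- Raw line count (no attempt to skip blank or comment lines).
loc :: BL.ByteString -> Int
loc = length . BLC.lines

-- gzip-compressed size, a rough proxy for the "information" in the source.
gzippedSize :: BL.ByteString -> Int64
gzippedSize = BL.length . GZip.compress

report :: FilePath -> IO ()
report path = do
  src <- BL.readFile path
  let n = loc src
      g = gzippedSize src
  printf "%s: %d LOC, %d gzipped bytes, %.3f LOC per gzipped byte\n"
         path n g (fromIntegral n / fromIntegral g :: Double)

main :: IO ()
main = getArgs >>= mapM_ report

Running it over the shootout sources for one benchmark in several languages would give the "noise" ratio directly, one line per file.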

Lossless file compression, a.k.a. entropy coding, tries to pack as much information into each bit (or byte) as possible, so that the compressed size approaches the entropy of the input. Basically, gzip is measuring (approximating) the amount of "information" contained in the code.

I think it would be interesting to compare the ratio between a file's raw size and its entropy (we can come up with a precise metric later). That would show how concise the language and the code actually are.

--ryan
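
One way to approximate the raw-size-versus-entropy comparison proposed above is a zeroth-order byte-entropy estimate. The sketch below assumes the bytestring and containers packages; the program and helper names (byteEntropy, etc.) are illustrative, not an established metric, and edge cases such as empty or single-byte-valued files are not handled.

-- entropy-ratio.hs (hypothetical): raw file size versus a zeroth-order
-- byte-entropy estimate of the same file.
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as M
import System.Environment (getArgs)
import Text.Printf (printf)

-- Shannon entropy of the byte distribution, in bits per byte.
byteEntropy :: BS.ByteString -> Double
byteEntropy bs
  | BS.null bs = 0
  | otherwise  = negate (sum [p * logBase 2 p | p <- probs])
  where
    bump m b = M.insertWith (+) b (1 :: Int) m
    counts   = M.elems (BS.foldl' bump M.empty bs)
    total    = fromIntegral (BS.length bs) :: Double
    probs    = [fromIntegral c / total | c <- counts]

report :: FilePath -> IO ()
report path = do
  bs <- BS.readFile path
  let raw     = BS.length bs
      -- Size the file would have if every byte carried byteEntropy bits.
      minimal = byteEntropy bs * fromIntegral raw / 8
  printf "%s: %d bytes raw, ~%.0f bytes at the entropy limit (ratio %.2f)\n"
         path raw minimal (fromIntegral raw / minimal)

main :: IO ()
main = getArgs >>= mapM_ report

Note that a zeroth-order estimate ignores repeated identifiers and longer-range structure, which is exactly what gzip's dictionary coding picks up, so gzipped size will usually be the tighter (smaller) bound of the two.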