
Joachim Breitner schrieb:
Hi,
although semantically it could, ghc does not do common subexpression elimination (CSE), at least not in every case. The canonical example where it would be bad is the function (with div to avoid numerical casting):
avg :: [Int] -> Int avg n = (sum [0..n] `div` length [0..n])
The reason why [0..n] is not shared is because then it would be kept in memory completely after the evaluation of sum, because length will still need it, possibly overflowing the memory. In the non-shared form, though, the garbage collector can clean up directly behind sum and length, and the program runs in constant space.
Note that the shared expression would be very large compared to the thunks.
Now consider the program:
avg' :: [Int] -> (Int, Int) avg' n = (sum [0..n] `div` length [0..n], length [0..n])
It think this is not typecorrect, since 'n' denotes a list and is used as upper bound in [0..n]. Should it be
avg' ns = (sum ns `div` length ns, length ns)
?
avg n = (sum [0..n] `div` l, l) where l = length [0..n] is safe, because the shared expression (an Int) is smaller than the
I’d say that CSE to thunks and the compiler can tell.
If this is meant to be
avg n = (sum ns `div` l, l) where l = length ns
according to my substitution above, then 'ns' still has to be shared and stored completely. I think the general problem is more complicated than common subexpression elimination, because lazy evaluation sometimes forces an inefficient order of computation. Other examples are
viewR xs = (init xs, last xs)
and an elegant Haskell implementation of the Unix command 'wc' like
wc text = (length text, length (words text), length (lines text))