Where/Let clauses and subexpression repetition/memoization

Dear Haskell Community, I have been bedeviled by a persistent confusion about one aspect of where/let clauses, which I think I can illustrate more easily with code than with words. Given code like this: averageR pxls = map (pcast . setR ravg) pxls' where (ravg,_,_) = averages pxls' pxls' = map pcast pxls What happens if I substitute pxls' for its right-hand side, averageR pxls = map (pcast . setR ravg) (map pcast pxls) where (ravg,_,_) = averages (map pcast pxls) My "superstition" here has been that the former only evaluates pxls' once, whereas the latter computes it twice. This seems like a basic issue which must have been confirmed or debunked somewhere in my readings, but it hasn't sunk in with me.

On Thu, Aug 18, 2011 at 21:21, Michael Serra
My "superstition" here has been that the former only evaluates pxls' once, whereas the latter computes it twice. This seems like a basic issue which must have been confirmed or debunked somewhere in my readings, but it hasn't sunk in with me.
As I understand it, common subexpression elimination in lazy languages is difficult at best because the shared subexpressions may thereby become ineligible for fusion, so you're expected to do it explicitly by means of where/let clauses. So your superstition is actually the truth. -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

On Friday 19 August 2011, 04:22:22, Brandon Allbery wrote:
On Thu, Aug 18, 2011 at 21:21, Michael Serra
wrote: My "superstition" here has been that the former only evaluates pxls' once, whereas the latter computes it twice. This seems like a basic issue which must have been confirmed or debunked somewhere in my readings, but it hasn't sunk in with me.
As I understand it, common subexpression elimination in lazy languages is difficult at best because the shared subexpressions may thereby become ineligible for fusion, so you're expected to do it explicitly by means of where/let clauses. So your superstition is actually the truth.
With a small grain of salt or two. The language allows both, sharing and recomputation, in both cases. But an implementation that did not share if the value is bound to a name in a where/let would grossly violate the users' expectations, so per principle of least surprise you can pretty much rely on sharing in the first case. The latter is harder to predict. In most cases, you'll get recomputation, but GHC does a bit of CSE, so in some cases it will compute only once and share the result (which may be a bad thing - that's a further reason for not doing too much CSE, sometimes sharing has catastrophic results).

On Fri, Aug 19, 2011 at 2:52 AM, Daniel Fischer < daniel.is.fischer@googlemail.com> wrote:
[...] In most cases, you'll get recomputation, but GHC does a bit of CSE, so in some cases it will compute only once and share the result (which may be a bad thing - that's a further reason for not doing too much CSE, sometimes sharing has catastrophic results).
This is a beginner's list, so I'll ask: what catastrophic result could it have, if it's computing pure functions?

On Fri, Aug 19, 2011 at 09:43:13AM -0700, Tom Murphy wrote:
On Fri, Aug 19, 2011 at 2:52 AM, Daniel Fischer < daniel.is.fischer@googlemail.com> wrote:
[...] In most cases, you'll get recomputation, but GHC does a bit of CSE, so in some cases it will compute only once and share the result (which may be a bad thing - that's a further reason for not doing too much CSE, sometimes sharing has catastrophic results).
This is a beginner's list, so I'll ask: what catastrophic result could it have, if it's computing pure functions?
It cannot change the meaning of a program. But it can drastically change the memory usage. For example, consider (length [1..1000000], sum [1..1000000]) vs. let x = [1..1000000] in (length x, sum x) The first expression needs only a constant amount of memory, since each list can be incrementally generated and garbage collected while accumulating the length and sum. In the second program, however, x cannot be garbage collected while length is being computed, since x is still being referred to by sum. So at the point just after evaluating length and before starting in on evaluation of sum, the entire million-element list is in memory. -Brent
participants (5)
-
Brandon Allbery
-
Brent Yorgey
-
Daniel Fischer
-
Michael Serra
-
Tom Murphy