Hello,

> I offer up the following example:
>

This is an instructive example.

>   mean xs = sum xs / length xs
>

In order to type-check, I actually need to write something like:

    mean xs = sum xs / fromIntegral (length xs)

There are other ways of get the numeric types to match correctly, but this is fairly general.

Then, I immediately blow my stack if I try something like:

    mean [1..1000000000].

The culprit is actually sum which is defined in the base libraries as either a foldl or a direct recursion depending on a compiler flag. In either case, the code is not strict enough; just trying to compute:

     sum [1..10000000]

blows the stack. This can be easily fixed by defining a suitable strict sum:

    sum' = foldl' (+) 0

and now sum' has constant space. We could try to redefine mean using sum':

    mean1 xs = sum' xs / fromIntegral (length xs)

but this still gobbles up memory. The reason is that xs is used twice and cannot be discarded as it is generated. So we must move to a direct fold, as you did, to get a space efficient mean.

> If we now rearrange this to
>
>   mean = (\(s,n) -> s / n) . foldr (\x (s,n) -> let s' = s+x; n' = n+1
> in s' `seq` n' `seq` (s', n')) (0,0)
>
> and run the same example, and watch it run in constant space.
>
This code actually blows the stack on my machine just like the first naive mean. Foldl is perhaps more intuitive  to use here, since we are summing the numbers as we encounter them while walking down the list, and there is a strict version, foldl', provided in the base libraries.


    mean2 = uncurry (/) . foldl' (\(s,n) x -> (s+x, n+1)) (0,0)

However, this still gobbles up memory... the reason is that pairs are lazy. So we need a way to force the (s+x) and (n+1). An easy, and unobtrusive way to do this is to use strict pattern matching:

    mean2 = uncurry (/) . foldl' (\(!s, !n) x -> (s+x, n+1)) (0,0)

Now we can run:

    mean2 [1..1000000000]

in constant space.

While using an explicit foldl is less readable than the direct version (especially to a beginner), it is a standard functional idiom. Furthermore, a good understanding of lazy evaluation should be enough to guide you to using the strict foldl' and then then strict patterns. Is this a reasonable analysis?

Also, we've made no attempt to address speed. However, I would argue that the code's performance time is predictable-- it grows linearly with the size of the list. Improving the performance time is another matter that requires knowing about the internal representation of the various types being used.

-Jeff


---

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.