Hello,
> I offer up the following example:
>
This is an instructive example.
> mean xs = sum xs / length xs
>
In order to type-check, I actually need to write something
like:
mean xs = sum xs / fromIntegral (length
xs)
There are other ways of get the numeric types to match
correctly, but this is fairly general.
Then, I immediately blow my stack if I try something
like:
mean [1..1000000000].
The culprit is actually sum which is defined in the
base libraries as either a foldl or a direct recursion depending on a compiler
flag. In either case, the code is not strict enough; just trying to compute:
sum [1..10000000]
blows the stack. This can be easily fixed by defining
a suitable strict sum:
sum' = foldl' (+) 0
and now sum' has constant space. We could try to redefine
mean using sum':
mean1 xs = sum' xs / fromIntegral (length
xs)
but this still gobbles up memory. The reason is that
xs is used twice and cannot be discarded as it is generated. So we must
move to a direct fold, as you did, to get a space efficient mean.
> If we now rearrange this to
>
> mean = (\(s,n) -> s / n) . foldr (\x (s,n) -> let s'
= s+x; n' = n+1
> in s' `seq` n' `seq` (s', n')) (0,0)
>
> and run the same example, and watch it run in constant space.
>
This code actually blows the stack on my machine just like the first naive
mean. Foldl is perhaps more intuitive to use here, since we are summing
the numbers as we encounter them while walking down the list, and there
is a strict version, foldl', provided in the base libraries.
mean2 = uncurry (/) . foldl' (\(s,n)
x -> (s+x, n+1)) (0,0)
However, this still gobbles up memory... the reason
is that pairs are lazy. So we need a way to force the (s+x) and (n+1).
An easy, and unobtrusive way to do this is to use strict pattern matching:
mean2 = uncurry (/) . foldl' (\(!s,
!n) x -> (s+x, n+1)) (0,0)
Now we can run:
mean2 [1..1000000000]
in constant space.
While using an explicit foldl is less readable than
the direct version (especially to a beginner), it is a standard functional
idiom. Furthermore, a good understanding of lazy evaluation should be enough
to guide you to using the strict foldl' and then then strict patterns.
Is this a reasonable analysis?
Also, we've made no attempt to address speed. However,
I would argue that the code's performance time is predictable-- it grows
linearly with the size of the list. Improving the performance time is another
matter that requires knowing about the internal representation of the various
types being used.
-Jeff
---
This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.