Re: [Haskell-cafe] Two-iteration optimisation (was: GHC Predictability)

14 May 2008

      paul:
...
We've had a big debate over
...
mean xs = foldl' (+) 0 xs / fromIntegral (length xs)
For anyone who didn't follow it, the problem is that "mean" needs to 
traverse its argument twice, so the entire list has to be held in 
memory.  So if "xs = [1..1000000000]" then "mean xs" uses all your 
memory, but "foldl' (+) 0 xs" and "length xs" by themselves do not.
The solution is for the programmer to rewrite "mean" to accumulate a 
pair containing the running total and count together, then do the 
division.  This makes me wonder: could there be a compiler optimisation 
rule for this, collapsing two iterations over a list into one.  I've 
never tried writing GHC rules, but something like:
...
f (foldl' g x) (foldl' h x) = (uncurry f) (foldl' (\i (gt,ht) -> (g 
i gt, h i ht)))
The first problem that occurs to me is the number of permutations of 
fold and map functions.
You'd want a general fusion framework for this, and other accumulators,
that's let's you treat 'f' as a zip of some kind that pulls items from
its two arguments. And then require only a single rewrite rule for all
your loop forms.

Stream fusion (see "Lists to Streams to Nothing at All") at least does this for zips,

    zip (foldl .. )
        (foldl .. )

Will end up with a single loop, but for an arbitrary 'f' instead of zip,
seems harder.

-- Don

Re: [Haskell-cafe] Two-iteration optimisation (was: GHC Predictability)

Don Stewart