Re: [Haskell-cafe] Two-iteration optimisation (was: GHC Predictability)

19 May 2008


      On Wed, May 14, 2008 at 06:08:28PM +0100, Paul Johnson wrote:
...
We've had a big debate over
...
mean xs = foldl' (+) 0 xs / fromIntegral (length xs)
For anyone who didn't follow it, the problem is that "mean" needs to  
traverse its argument twice, so the entire list has to be held in  
memory.  So if "xs = [1..1000000000]" then "mean xs" uses all your  
memory, but "foldl' (+) 0 xs" and "length xs" by themselves do not.
The solution is for the programmer to rewrite "mean" to accumulate a  
pair containing the running total and count together, then do the  
division.  This makes me wonder: could there be a compiler optimisation  
rule for this, collapsing two iterations over a list into one.  I've  
never tried writing GHC rules, but something like:
...
f (foldl' g x) (foldl' h x) = (uncurry f) (foldl' (\i (gt,ht) -> (g i 
gt, h i ht)))
The first problem that occurs to me is the number of permutations of  
fold and map functions.
Paul.
There are two problems with this rule.  The first is that the function is not
strict in 'gt' and 'ht' -- this is easily fixed with a little bit of seq.  The
other problem is that 'f' must be strict in both parameters for this rule to
hold.  As far as I know, there is no access to strictness information in rule
pragmas.


Cheers,
Spencer Janssen