Two-iteration optimisation (was: GHC Predictability)

We've had a big debate over
mean xs = foldl' (+) 0 xs / fromIntegral (length xs)
For anyone who didn't follow it, the problem is that "mean" needs to traverse its argument twice, so the entire list has to be held in memory. So if "xs = [1..1000000000]" then "mean xs" uses all your memory, but "foldl' (+) 0 xs" and "length xs" by themselves do not.

The solution is for the programmer to rewrite "mean" to accumulate a pair containing the running total and count together, then do the division. This makes me wonder: could there be a compiler optimisation rule for this, collapsing two iterations over a list into one? I've never tried writing GHC rules, but something like:
f (foldl' g a xs) (foldl' h b xs) = uncurry f (foldl' (\(gt, ht) i -> (g gt i, h ht i)) (a, b) xs)
The first problem that occurs to me is the number of permutations of fold and map functions. Paul.
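For concreteness, the hand-fused rewrite Paul describes might look like this (a sketch of mine, not code from the thread; this `mean` replaces the two-pass version above):

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- One traversal: accumulate a strict (total, count) pair, divide at the end.
mean :: [Double] -> Double
mean xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    -- Bang patterns force both components each step, so no thunks
    -- pile up inside the accumulator pair.
    step (!t, !c) x = (t + x, c + 1)
```

Because foldl' forces the pair to weak head normal form and the bang patterns force its components, this runs in constant space over a lazily generated list.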

paul:
We've had a big debate over
mean xs = foldl' (+) 0 xs / fromIntegral (length xs)
... could there be a compiler optimisation rule for this, collapsing two iterations over a list into one?
You'd want a general fusion framework for this, and other accumulators, that lets you treat 'f' as a zip of some kind that pulls items from its two arguments, and then require only a single rewrite rule for all your loop forms. Stream fusion (see "Stream Fusion: From Lists to Streams to Nothing at All") at least does this for zips:
zip (foldl .. ) (foldl .. )
will end up with a single loop, but for an arbitrary 'f' instead of zip it seems harder. -- Don

Don Stewart wrote:
You'd want a general fusion framework for this, and other accumulators, that lets you treat 'f' as a zip of some kind that pulls items from its two arguments, and then require only a single rewrite rule for all your loop forms.
Stream fusion (see "Stream Fusion: From Lists to Streams to Nothing at All") at least does this for zips:
zip (foldl .. ) (foldl .. )
will end up with a single loop, but for an arbitrary 'f' instead of zip it seems harder.
It seems the thing to do would be to expose a top-level function for explicitly specifying that this is what you want to do:

foo :: (a -> b -> c) -> ([x] -> a) -> ([x] -> b) -> [x] -> c

(Both accumulators consume the same list, so it takes a single list argument.) You could then say

mean = foo (/) sum length

which is pretty declarative, and gives the compiler an easy time with optimising. Of course, if nobody chooses to *use* this function... it won't help you any.
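A generic `foo` cannot magically fuse two opaque list consumers, but if the accumulators are handed over as (step, start) pairs, the single pass is easy to write. A hypothetical sketch (`foo2` and `mean2` are my names, not from the thread):

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Combine two left folds, described as (step, start) pairs,
-- into one strict traversal, then combine their results with f.
foo2 :: (a -> b -> c)
     -> (a -> x -> a, a)   -- first accumulator: step function and start value
     -> (b -> x -> b, b)   -- second accumulator: step function and start value
     -> [x] -> c
foo2 f (g, a0) (h, b0) xs = f a b
  where
    (a, b) = foldl' (\(!ga, !ha) i -> (g ga i, h ha i)) (a0, b0) xs

-- "mean" declared in this style: divide the running sum by the running count.
mean2 :: [Double] -> Double
mean2 = foo2 (/) ((+), 0) (\c _ -> c + 1, 0)
```

The design cost is exactly the one discussed in this thread: callers must expose their folds as step functions rather than as ordinary list functions like `sum` and `length`.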

Don Stewart wrote:
You'd want a general fusion framework for this... Stream fusion... at least does this for zips... but for an arbitrary 'f' instead of zip, seems harder.
And of course, you wouldn't want that:
f xs = xs : map expensiveCalculation xs
Please don't fuse those two loops into one.

In the case of "mean", the outer function in question is (/), and that is a good candidate for fusion because it is strict in both arguments. I think Don is right: there is a lot of room for expanding the scope of fusion, but it needs to be done one function at a time until our strictness analysis gets better.

In the general case, I like the way things work now. The original complaint about "unpredictability" is, in my opinion, a "bug in the documentation". This is yet another concept whose basics need to be made much more accessible to newbies. -Yitz

Yitzchak Gale wrote:
And of course, you wouldn't want that:
f xs = xs : map expensiveCalculation xs
Please don't fuse those two loops into one.
...doesn't type check. Did you mean (++)?
In the case of "mean", the outer function in question is /, and that is a good candidate for fusion because it is strict in both arguments.
I think Don is right. There is a lot of room for expanding the scope of fusion, but it needs to be done one function at a time until our strictness analysis gets better.
Mmm, strictness analysis. That sounds hard...
In the general case, I like the way things work now. The original complaint about "unpredictability" is, in my opinion, a "bug in the documentation". This is yet another concept whose basics need to be made much more accessible to newbies.
...which is really the point I was trying to get across, yes. :-)

Certainly there are experts [like Don] who seemingly have no trouble knocking out blindingly fast Haskell code. The problem is that only the hallowed experts can do this; it's currently rather inaccessible to beginners and intermediates.

For example, back when I really *was* a beginner, I found several fleeting references to "laziness can be bad sometimes", but I couldn't find anywhere that explains *why*. The concept that laziness could be something you *don't* want seemed highly counter-intuitive. [After all, laziness is this wonderful thing that stops you doing extra work you don't need to!] Indeed, it wasn't until I wrote a Haskell interpreter that I discovered the [main] problem - huge unevaluated expressions being needlessly dragged around.

The fact that laziness is bad sometimes is a pretty basic piece of information for anybody remotely serious about performance programming to be made aware of. I think the information is out there, it's just not terribly "findable" right now. I'm not 100% sure what the best way to improve this situation would be, but I think it involves somebody somewhere sitting down to write a single, coherent account from beginning to end explaining what the issues are, in language that people who aren't compiler designers can understand.

On Thu, May 15, 2008 at 8:45 PM, Andrew Coppin wrote:
Yitzchak Gale wrote:
And of course, you wouldn't want that:
f xs = xs : map expensiveCalculation xs
Please don't fuse those two loops into one.
...doesn't type check. Did you mean (++)?
Hmm? While the name might be misleading...
expensiveCalculation = (:[])
Luke

Luke Palmer wrote:
On Thu, May 15, 2008 at 8:45 PM, Andrew Coppin wrote:
Yitzchak Gale wrote:
And of course, you wouldn't want that:
f xs = xs : map expensiveCalculation xs
Please don't fuse those two loops into one.
...doesn't type check. Did you mean (++)?
Hmm? While the name might be misleading...
expensiveCalculation = (:[])
Ah. So f :: [x] -> [[x]]. Devious...
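Spelled out, Luke's trick does make the definition type check (a small demonstration of mine, not code from the thread):

```haskell
-- With this concrete type, the element type and the mapped type line up:
-- xs :: [a] and map expensiveCalculation xs :: [[a]], so consing xs onto
-- the mapped list is well typed.
expensiveCalculation :: a -> [a]
expensiveCalculation = (: [])

f :: [a] -> [[a]]
f xs = xs : map expensiveCalculation xs
```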

On Wed, 14 May 2008, Paul Johnson wrote:
We've had a big debate over
mean xs = foldl' (+) 0 xs / fromIntegral (length xs)
For anyone who didn't follow it, the problem is that "mean" needs to traverse its argument twice, so the entire list has to be held in memory. So if "xs = [1..1000000000]" then "mean xs" uses all your memory, but "foldl' (+) 0 xs" and "length xs" by themselves do not.
I also wondered how to solve that problem. It also occurs in implementations of "wc", that is counting lines, words and characters of a text in parallel.
The solution is for the programmer to rewrite "mean" to accumulate a pair containing the running total and count together, then do the division. This makes me wonder: could there be a compiler optimisation rule for this, collapsing two iterations over a list into one. I've never tried writing GHC rules, but something like:
f (foldl' g a xs) (foldl' h b xs) = uncurry f (foldl' (\(gt, ht) i -> (g gt i, h ht i)) (a, b) xs)
The left-hand sides of GHC rules must have a definite function as the top-level construct, whereas 'f' is a variable. You may define a function which points the optimizer to the places where something shall be replaced. Say

fuseFold :: (a -> b -> c) -> a -> b -> c
fuseFold f x y = f x y

{-# RULES
"fuseFold" forall f g h x y.
  fuseFold f (foldl' g x) (foldl' h x) = ...
  #-}

However, instead of writing library functions which use foldl', it might be better to implement and export the accumulating functions separately. They can be used with the 'foldl's of arbitrary data structures, and they can be fused explicitly without GHC rules. However, I suspect you cannot rewrite, say, (length (words str)) or (length (lines str)) in that manner in an elegant way.
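Henning's suggestion of exporting the accumulating functions separately, and fusing explicitly, might look like the following sketch (the `both` combinator and `meanB` are my names, an assumption rather than code from the thread):

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Run two accumulator step functions in lock-step over one traversal.
-- Bang patterns keep both running results evaluated.
both :: (a -> x -> a) -> (b -> x -> b) -> (a, b) -> x -> (a, b)
both g h (!ga, !ha) i = (g ga i, h ha i)

-- "mean" fused by hand from the exported step functions, no RULES needed.
meanB :: [Double] -> Double
meanB xs = s / n
  where (s, n) = foldl' (both (+) (\c _ -> c + 1)) (0, 0) xs
```

Because `both` only talks about step functions, the same combinator works with the fold of any container, not just lists.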
The first problem that occurs to me is the number of permutations of fold and map functions.
A foldl' could be merged with a preceding 'map' into a single foldl'; however, the existing fusion frameworks use different primitives, like foldr/build.
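Henning's "wc" example from earlier in this message can be hand-fused in the same style; a sketch of mine, counting lines, words and characters in a single foldl':

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Char (isSpace)
import Data.List (foldl')

-- One pass over the text; the Bool tracks whether we are between words,
-- so a word is counted at each space-to-non-space transition.
wc :: String -> (Int, Int, Int)   -- (lines, words, characters)
wc = done . foldl' step (0, 0, 0, True)
  where
    step (!l, !w, !c, !inSpace) ch =
      ( l + fromEnum (ch == '\n')
      , w + fromEnum (inSpace && not (isSpace ch))
      , c + 1
      , isSpace ch )
    done (l, w, c, _) = (l, w, c)
```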

Paul Johnson wrote:
The solution is for the programmer to rewrite "mean" to accumulate a pair containing the running total and count together, then do the division. This makes me wonder: could there be a compiler optimisation rule for this, collapsing two iterations over a list into one.
Do not forget: http://www.haskell.org/tmrwiki/WhyAttributeGrammarsMatter
Though not a compiler optimisation, it's a thinker optimisation.

I have another thought. It is now time to investigate the viability of

mean x = s `par` l `par` (s/l)
  where s = sum x
        l = length x

If you worry that the sum thread and the length thread are not synchronized and therefore there is still no bound on the list prefix kept in memory, I'm sure you can improve it by one of the chunking strategies.

As our hardware grows more cores but not more gigahertz, it may become detrimental to fuse. Fusion implies entangling, an antithesis to parallelism. One day we may call this an optimisation: unfuse code hand-fused by programmers, then parallelize. Optimisation people on the imperative side are already doing this (they have more hand-fused code than we do), though targeting older hardware like vector machines and SIMD, not yet multiple cores.

Albert Y. C. Lai wrote:
If you worry that the sum thread and the length thread are not synchronized and therefore there is still no bound on the list prefix kept in memory, I'm sure you can improve it by one of the chunking strategies.
I'm more worried that data now has to go through tiny CPU <--> RAM buses rather than zipping along at the CPU's core frequency as it would if only 1 CPU were involved. I would think the cache issues make the sequential version a fair bit faster. (Not to mention that no GC is required.)
As our hardware grows more cores but not more gigahertz, it may become detrimental to fuse. Fusion implies entangling, an antithesis to parallelism. One day we may call this an optimisation: unfuse code hand-fused by programmers, then parallelize. Optimisation people on the imperative side are already doing this (they have more hand-fused code than we do), though targeting older hardware like vector machines and SIMD, not yet multiple cores.
The key to parallel processing is splitting a task into *independent* tasks so you can do them in parallel. If they're still interdependent [like the above], I suspect you're not gaining much.

On Wed, May 14, 2008 at 06:08:28PM +0100, Paul Johnson wrote:
We've had a big debate over
mean xs = foldl' (+) 0 xs / fromIntegral (length xs)
... could there be a compiler optimisation rule for this, collapsing two iterations over a list into one? I've never tried writing GHC rules, but something like:
f (foldl' g a xs) (foldl' h b xs) = uncurry f (foldl' (\(gt, ht) i -> (g gt i, h ht i)) (a, b) xs)
There are two problems with this rule. The first is that the function is not strict in 'gt' and 'ht' -- this is easily fixed with a little bit of seq. The other problem is that 'f' must be strict in both parameters for this rule to hold. As far as I know, there is no access to strictness information in rule pragmas.

Cheers,
Spencer Janssen
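Spencer's first objection is mechanically fixable: the right-hand side of the rule would have to seq both components of the pair before rebuilding it. A sketch of such a strict fused fold (names are mine, not from the thread):

```haskell
import Data.List (foldl')

-- Fuse two left folds into one, forcing both running results each step
-- so the accumulator pair never holds unevaluated thunks.
fused :: (a -> x -> a) -> a -> (b -> x -> b) -> b -> [x] -> (a, b)
fused g a0 h b0 = foldl' step (a0, b0)
  where
    step (gt, ht) i =
      let gt' = g gt i
          ht' = h ht i
      in gt' `seq` ht' `seq` (gt', ht')
```

The second objection stands: applying the rule is only sound when the surrounding 'f' is strict in both arguments, and RULES pragmas have no way to demand that.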
participants (8)
- Albert Y. C. Lai
- Andrew Coppin
- Don Stewart
- Henning Thielemann
- Luke Palmer
- Paul Johnson
- Spencer Janssen
- Yitzchak Gale