
I wrote up the second part of the tour of understanding low-level performance in GHC here:

http://reddit.com/r/programming/info/6lx36/comments/

It follows on from last week's discussion about various performance-related things. Enjoy.

-- Don

I think it's hard to overstate how awesome the RULES pragma is.
Thanks for the example.
-- ryan
On 6/3/08, Don Stewart
I wrote up the second part of the tour of understanding low level performance in GHC here,
http://reddit.com/r/programming/info/6lx36/comments/
Follows on from the discussion last week about various performance related things.
Enjoy.
-- Don

On Tue, 3 Jun 2008, Don Stewart wrote:
I wrote up the second part of the tour of understanding low level performance in GHC here,
http://reddit.com/r/programming/info/6lx36/comments/
Follows on from the discussion last week about various performance related things.
Now the difficult question: How to write the 'mean' function in terms of 'sum' and 'length' while getting the same performance?
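(For reference, the naive definition that comes up later in the thread; it traverses the list twice, and because 'length' still holds a reference to the start of the list while 'sum' runs, the whole list is retained in memory:)

    mean :: [Double] -> Double
    mean xs = sum xs / fromIntegral (length xs)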

On Wed, 2008-06-04 at 09:32 +0200, Henning Thielemann wrote:
On Tue, 3 Jun 2008, Don Stewart wrote:
I wrote up the second part of the tour of understanding low level performance in GHC here,
http://reddit.com/r/programming/info/6lx36/comments/
Follows on from the discussion last week about various performance related things.
Now the difficult question: How to write the 'mean' function in terms of 'sum' and 'length' while getting the same performance?
There's another, rather harder, fusion transformation that notices when two left folds are demanded in the same strictness context and fold over the same input list; then they can be performed together.

sum = foldl (\s x -> s + x) 0
length = foldl (\l x -> l + 1) 0

mean xs = sum xs / length xs

So we must note that sum and length are demanded at the same time and, since they are both foldl's, they will consume the whole of xs. So we can merge the two foldl's into one just by tupling them up:

sumlength = foldl (\(s, l) x -> (s + x, l + 1)) (0, 0)

mean xs = s / l where (s, l) = sumlength xs

The Fortran people have been doing this kind of loop fusion for some years. What makes it a bit harder for us is that we cannot do it with rules, because it's not a simple local transformation. It could probably be done with a special compiler pass, though it'd need strictness analysis to be done much earlier.

Duncan
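(A compilable version of the tupled fold, as a sketch only: the helper names here are made up, and foldl' with a strict pair is used instead of plain foldl so the two accumulators do not pile up as thunks:)

    import Data.List (foldl')

    -- Strict pair accumulator: both the running sum and the running count
    -- are forced at each step, so the fold runs in constant space.
    data P = P !Double !Int

    sumLength :: [Double] -> (Double, Int)
    sumLength xs = case foldl' step (P 0 0) xs of P s l -> (s, l)
      where step (P s l) x = P (s + x) (l + 1)

    mean :: [Double] -> Double
    mean xs = s / fromIntegral l
      where (s, l) = sumLength xs

With plain foldl and an ordinary lazy pair, as in the sketch above, the merged fold still uses linear heap because the sums and counts accumulate as thunks.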

[Forgot to post to the list, sorry]
2008/6/4 Duncan Coutts
On Wed, 2008-06-04 at 09:32 +0200, Henning Thielemann wrote:
On Tue, 3 Jun 2008, Don Stewart wrote:
I wrote up the second part of the tour of understanding low level performance in GHC here,
http://reddit.com/r/programming/info/6lx36/comments/
Follows on from the discussion last week about various performance related things.
Now the difficult question: How to write the 'mean' function in terms of 'sum' and 'length' while getting the same performance?
There's another rather harder fusion transformation that notices when two left folds are demanded in the same strictness context and they fold over the same input list then they can be performed together.
sum = foldl (\s x -> s + x) 0
length = foldl (\l x -> l + 1) 0
mean xs = sum xs / length xs
So we must note that sum and length are demanded at the same time and, since they are both foldl's, they will consume the whole of xs.
So we can merge the two foldl's into one just by tupling them up:
sumlength = foldl (\(s, l) x -> (s + x, l + 1)) (0, 0)
mean xs = s / l where (s, l) = sumlength xs
I see a problem with this particular fusion, though: it changes the space complexity of the program from linear to constant. Therefore, with some programs, relying on this "optimization" can be a matter of correctness, not just performance. So if it is not clear when the compiler will perform it, I'd rather not rely on it. (A counter-example is tail calls, which are rather easily deducible, at least in a strict context.)

At least with the simpler fusions, the difference was between stressing the GC or not. The space and time complexities of the program didn't change at all; only the constants did.

Loup

On Wed, Jun 4, 2008 at 9:48 AM, Loup Vaillant
I see a problem with this particular fusion, though: It changes the space complexity of the program, from linear to constant. Therefore, with some programs, relying on this "optimization" can be a matter of correctness, not just performance. Therefore, if it is not clear when the compiler will optimize this, I'd rather not use it. (A counter example is tail calls, which are rather easily deducible, at least in a strict context)
Haskell is not required to be lazy, only non-strict. That is, Haskell as a language is free to re-evaluate expressions bound in let clauses, and this can change the time and space complexity of a program.

To me, time and space complexity is not about correctness but performance. Given unbounded time and space, you will still arrive at the same result regardless of the complexity. What makes the asymptotic class more blessed than the associated constants?

However, I still see your point. If optimizations cannot be guaranteed -- if the conditions under which they fire are brittle -- then the language becomes yet harder to predict (which is not something Haskell needs!). It's hard to turn down an optimization which will accidentally improve your program asymptotically, however.

I wonder what can be said about "stable" optimizations which are insensitive to their environments in some sense.

Luke
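(A hypothetical illustration of that point, not from the thread: under the usual call-by-need implementation the let binding below is evaluated once and shared, which keeps the whole intermediate list alive between the two sums; an implementation that re-evaluated the binding at each use would do twice the work but could stream in constant space -- same answer, different complexity:)

    -- 'ys' is computed once and shared under call-by-need, so the full list
    -- is retained until the second sum has consumed it.  Re-evaluating the
    -- binding at each use would traverse 'xs' twice instead.
    doubleSum :: [Int] -> Int
    doubleSum xs = let ys = map (* 2) xs
                   in sum ys + sum ys

    main :: IO ()
    main = print (doubleSum [1 .. 10])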

I wonder what can be said about "stable" optimizations which are insensitive to their environments in some sense.
http://citeseer.ist.psu.edu/veldhuizen02guaranteed.html

Ganesh

On Wednesday 04 June 2008 11:05:52 Luke Palmer wrote:
To me, time and space complexity is not about correctness but performance.
IRL the specification often dictates the complexity. If your code fails to satisfy the spec then it is wrong. Are you saying that Haskell code can never satisfy any such specification?
Given unbounded time and space, you will still arrive at the same result regardless of the complexity.
Given that the set of computers with unbounded time and space is empty, is it not fruitless to discuss its properties?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Jon Harrop wrote:
IRL the specification often dictates the complexity. If your code fails to satisfy the spec then it is wrong. Are you saying that Haskell code can never satisfy any such specification?
In addition to RL, it should and it can in theory too: http://www.cs.toronto.edu/~hehner/aPToP/ in particular chapter 4, sections 4.2 and 4.3. I also admire that Musser and Saini give asymptotic costs as an indispensable part of the STL specification. Everyone should at least do that much.

Jon Harrop wrote:
On Wednesday 04 June 2008 11:05:52 Luke Palmer wrote:
Given unbounded time and space, you will still arrive at the same result regardless of the complexity.
Given that the set of computers with unbounded time and space is empty, is it not fruitless to discuss its properties?
The set of omnipotent people is also empty. That has not stopped homo sapiens from discussing /its/ properties. All history is ever only such a discussion, for nothing fires up the imagination or provokes acts of brilliant foolhardiness like the unattainable.

-- Kim-Ee

On Wed, 4 Jun 2008, Luke Palmer wrote:
On Wed, Jun 4, 2008 at 9:48 AM, Loup Vaillant wrote:
I see a problem with this particular fusion, though: it changes the space complexity of the program from linear to constant. Therefore, with some programs, relying on this "optimization" can be a matter of correctness, not just performance. So if it is not clear when the compiler will perform it, I'd rather not rely on it. (A counter-example is tail calls, which are rather easily deducible, at least in a strict context.)
Haskell is not required to be lazy, only non-strict. That is, Haskell as a language is free to re-evaluate expressions bound in let clauses. This can change the time and space complexity of a program.
To me, time and space complexity is not about correctness but performance. Given unbounded time and space, you will still arrive at the same result regardless of the complexity. What makes the asymptotic class more blessed than the associated constants?
Is it possible to extend the garbage collector so that it not only checks whether references to a piece of data exist, but also tries to eliminate those references by further evaluation? Consider again

mean xs = sum xs / fromIntegral (length xs)

If 'sum' forces construction of xs, the unevaluated 'length' still points to the first element of xs. Could the garbage collector start counting for 'length' in order to free the first elements of xs?

On Wed, 4 Jun 2008, Duncan Coutts wrote:
On Wed, 2008-06-04 at 09:32 +0200, Henning Thielemann wrote:
Now the difficult question: How to write the 'mean' function in terms of 'sum' and 'length' while getting the same performance?
There's another rather harder fusion transformation that notices when two left folds are demanded in the same strictness context and they fold over the same input list then they can be performed together.
sum = foldl (\s x -> s + x) 0
length = foldl (\l x -> l + 1) 0
mean xs = sum xs / length xs
So we must note that sum and length are demanded at the same time and, since they are both foldl's, they will consume the whole of xs.
So we can merge the two foldl's into one just by tupling them up:
sumlength = foldl (\(s, l) x -> (s + x, l + 1)) (0, 0)
mean xs = s / l where (s, l) = sumlength xs
How about assisting the compiler with a helper function named 'parallel'?

parallel :: ([a] -> b, [a] -> c) -> [a] -> (b, c)
parallel (f,g) xs = (f xs, g xs)

mean xs = uncurry (/) $ parallel (sum, length) xs

We could state RULES in terms of 'parallel'. By calling 'parallel', the user indicates that he wants fusion here. Say

"parallel/foldl/foldl" forall f g x0 y0.
  parallel (foldl f x0, foldl g y0) = foldl (\(x,y) z -> (f x z, g y z)) (x0,y0)
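(A minimal compilable sketch of this proposal; the module name, the NOINLINE pragma, and the fromIntegral in 'mean' are additions here so the rule has a chance to fire and the example type-checks:)

    module ParallelFusion (parallel, mean) where

    -- Run two list consumers over the same input (the proposed helper).
    parallel :: ([a] -> b, [a] -> c) -> [a] -> (b, c)
    parallel (f, g) xs = (f xs, g xs)
    {-# NOINLINE parallel #-}  -- keep the rule's target from inlining away

    {-# RULES
    "parallel/foldl/foldl" forall f g x0 y0.
        parallel (foldl f x0, foldl g y0)
          = foldl (\(x, y) z -> (f x z, g y z)) (x0, y0)
      #-}

    mean :: [Double] -> Double
    mean xs = s / fromIntegral n
      where
        (s, n) = parallel (foldl (+) 0, foldl (\l _ -> l + 1) (0 :: Int)) xs

Note this only demonstrates the rule: the fused fold still accumulates a lazy pair, so in practice one would combine it with foldl' and a strict accumulator pair.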

On Jun 4, 2008, at 5:51 AM, Henning Thielemann wrote:

How about assisting the compiler with a helper function named 'parallel'?

parallel :: ([a] -> b, [a] -> c) -> [a] -> (b, c)
parallel (f,g) xs = (f xs, g xs)

mean xs = uncurry (/) $ parallel (sum, length) xs

We could state RULES in terms of 'parallel'. By calling 'parallel', the user indicates that he wants fusion here. Say

"parallel/foldl/foldl" forall f g x0 y0.
  parallel (foldl f x0, foldl g y0) = foldl (\(x,y) z -> (f x z, g y z)) (x0,y0)
Well, we already have &&&. Would a sufficiently specialized rule over that be a useful addition to Control.Arrow?

Regards,
Sterl.
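(For reference, a small sketch, not from the thread: the (->) Arrow instance makes &&& behave exactly like the 'parallel' helper on a pair of list consumers, so a rule in the same spirit could target it. Only the unfused form is shown here:)

    import Control.Arrow ((&&&))

    -- (sum &&& f) xs == (sum xs, f xs) for functions, via the (->) Arrow.
    mean :: [Double] -> Double
    mean = uncurry (/) . (sum &&& (fromIntegral . length))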

Henning Thielemann
Now the difficult question: How to write the 'mean' function in terms of 'sum' and 'length' while getting the same performance?
Write a RULE pragma converting

  \xs -> (foldl' f y0 xs, foldl' g z0 xs)

into

  \xs -> foldl' (\(y,z) x -> (f y x, g z x)) (y0,z0) xs

? To actually work, it'd have to work for an arbitrary top-level function, not just the (,) constructor. No idea if it's feasible at all, of course :-)

-k

--
If I haven't seen further, it is by standing in the footprints of giants
participants (12)
- Albert Y. C. Lai
- Don Stewart
- Duncan Coutts
- Henning Thielemann
- Jon Harrop
- Ketil Malde
- Kim-Ee Yeoh
- Loup Vaillant
- Luke Palmer
- Ryan Ingram
- Sittampalam, Ganesh
- Sterling Clover