
If one thinks about Haskell data structures as ordinary data structures, fusion seems like a great deal -- instead of producing intermediate lists and possibly running out of memory, we just run a loop and use a constant amount of space. But Haskell data structures are quite different -- they are produced on demand. Consider the example from the Stream Fusion paper [1]:

  f :: Int -> Int
  f n = sum [ k * m | k <- [1..n], m <- [1..k] ]

Assuming sum is a strict left fold, it consumes the elements of the list one by one and runs in constant space.

The list part can be transformed to

  foldr (++) [] $ map (\k -> map (\m -> k*m) [1..k]) [1..n]

which is capable of producing elements one by one. So the whole thing probably should run in constant space as well.

Of course I don't claim that fusion is useless -- I'm just trying to understand the problem it solves. Are we saving a few closures and cons cells here?

[1] Stream Fusion: From Lists to Streams to Nothing at All. Duncan Coutts, Roman Leshchinskiy, Don Stewart.

--
Roman I. Cheplyaka :: http://ro-che.info/
Don't worry what people think, they don't do it very often.
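(For concreteness, here is a hand-written sketch of the single loop that stream fusion aims to turn f into. The function name and structure are illustrative, not GHC's actual output:)

```haskell
-- Illustrative only: a manually fused version of f, with the two list
-- producers and the strict sum collapsed into one accumulator loop.
fFused :: Int -> Int
fFused n = outer 1 0
  where
    outer k acc
      | k > n     = acc                        -- outer range [1..n] exhausted
      | otherwise = outer (k + 1) (inner 1 acc)
      where
        inner m acc'
          | m > k     = acc'                   -- inner range [1..k] exhausted
          | otherwise = inner (m + 1) (acc' + k * m)
```

No cons cell is ever allocated here; the whole computation is two counters and an accumulator.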

Yes, the goal isn't so much to improve complexity (both are O(1)) but to
reduce the constant factor on that O(1).
In an inner loop like that, allocator/gc calls by far dominate the cost of
the program. If you can remove them, you've improved the performance of the
program by 10-100x.
In the case where everything is Int, you can even unbox and get entirely in
registers, which gives you comparable performance to a hand-tuned C or
assembly language loop.
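(As a hand-written illustration of that kind of loop -- not actual GHC output -- a strict counting loop like the one below is what the strictness analyser and worker/wrapper transform can compile down to unboxed Int# arithmetic in registers:)

```haskell
{-# LANGUAGE BangPatterns #-}

-- Illustrative: with strict Int arguments, GHC can keep k and acc
-- unboxed for the whole loop, so each iteration is plain arithmetic
-- with no allocation, comparable to a hand-written C loop.
sumSquares :: Int -> Int
sumSquares n = go 1 0
  where
    go !k !acc
      | k > n     = acc
      | otherwise = go (k + 1) (acc + k * k)
```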
-- ryan
On Tue, May 17, 2011 at 10:55 PM, Roman Cheplyaka wrote:
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

On Wed, May 18, 2011 at 2:04 AM, Ryan Ingram wrote:
Yes, this is important. Fusion is an obvious win on strict structures, because it can make the space usage asymptotically better. However, even if this doesn't happen for lazy structures, fusion can save a lot of _time_. It won't make the time complexity asymptotically better, but it can save an amount of work proportional to the complexity of the algorithm.

For instance, I was playing around with uvector a while back, and needed foldr or something similar that wasn't available. So I wrote something like:

  foldr f z s = if null s then z else f (head s) (foldr f z (tail s))

This all ran in constant space, but tail re-unfolds the entire stream every time, so this function has time complexity O(n^2). The nth element chugs through n allocations-followed-by-deallocations before it becomes usable, which can all be done in constant space, but takes linear time.

Fusion won't save you in this example (to my knowledge). But if you compose k functions from lists to lists, you'll have k allocations-followed-by-deallocations on every element that makes it all the way through the pipeline. You'll see O(1) space usage, but your time usage has a c*k*n term simply from being expressed as a composition pipeline, where c is the cost of the unnecessary boxing. Fusion eliminates this term.

-- Dan
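(A toy model of the quadratic pattern described above: an unfold-based stream whose tail wraps the producer rather than advancing a pointer. The names and representation are made up for illustration; this is not uvector's actual code:)

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A minimal unfold-style stream: a step function plus a hidden state.
data Stream a = forall s. Stream (s -> Maybe (a, s)) s

enumS :: Int -> Int -> Stream Int
enumS lo hi = Stream (\i -> if i > hi then Nothing else Just (i, i + 1)) lo

headS :: Stream a -> Maybe a
headS (Stream step s) = fmap fst (step s)

-- tailS layers a "skip one element" wrapper over the old stepper, so
-- after k tails every element access pays for k wrappers: a foldr
-- written via headS/tailS therefore does O(n^2) total work, even
-- though the space usage stays constant.
tailS :: Stream a -> Stream a
tailS (Stream step s0) = Stream step' (False, s0)
  where
    step' (dropped, s) = case step s of
      Nothing      -> Nothing
      Just (x, s')
        | dropped   -> Just (x, (True, s'))
        | otherwise -> step' (True, s')

foldrS :: (a -> b -> b) -> b -> Stream a -> b
foldrS f z s = case headS s of
  Nothing -> z
  Just x  -> f x (foldrS f z (tailS s))
```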

On 18/05/2011, at 15:55, Roman Cheplyaka wrote:
Of course I don't claim that fusion is useless -- just trying to understand the problem it solves. Are we saving a few closures and cons cells here?
And thunk allocations, and thunk entries. Entering a thunk costs upwards of 20 cycles, while performing a single addition should only cost one. Imagine every thunk entry is a function call. You don't want to call a whole function just to add two numbers together.

Those "few closures and cons cells" can be surprisingly expensive when compared to native ALU instructions on a modern machine.

Ben.
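(The classic sum-by-lazy-foldl example makes that cost concrete: the lazy version does nothing per element but allocate and later enter a thunk, while the strict version is plain additions. The definitions are standard; the comparison is illustrative:)

```haskell
import Data.List (foldl')

-- Lazy foldl builds a chain of n unevaluated (+) thunks and only
-- enters them when the result is demanded: one allocation plus one
-- thunk entry per element, just to add two numbers.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- foldl' forces the accumulator at each step: no thunks, so each step
-- is essentially a single ALU addition.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```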

Also, we do fusion on strict structures (e.g. vectors), where you save O(n) space at each fusion point. Obviously, fusion is less of a win on lazy structures than in the (pathological) case of strict data, but it is still a win.

-- Don
On Tue, May 17, 2011 at 11:07 PM, Ben Lippmeier wrote:

Roman Cheplyaka wrote:
Of course I don't claim that fusion is useless -- just trying to understand the problem it solves. Are we saving a few closures and cons cells here?
In addition to what everyone else said, fusion can be a big win when it allows further optimisations. For instance, fusing

  map (+1) . map (+2)

can eliminate one addition per iteration. Even without taking allocation into account, most of the reasons why loop fusion is a worthwhile optimisation in C apply to Haskell, too!

Roman
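(The collapse described above can be written as a rewrite rule. GHC's shipped map fusion actually goes through foldr/build and a mapFB rule in base rather than this literal rule, so take this as an illustration of the mechanism:)

```haskell
-- Illustrative RULES pragma: rewrite two map passes into one, after
-- which the simplifier can combine (+1) . (+2) into a single addition.
-- (Only fires with optimisation on; GHC's real rules achieve the same
-- effect via foldr/build fusion.)
{-# RULES
"map/map (illustration)" forall f g xs.
    map f (map g xs) = map (f . g) xs
  #-}

bumpTwice :: [Int] -> [Int]
bumpTwice = map (+1) . map (+2)
```

With or without the rule the results agree; the rule only removes the intermediate list and one of the two additions per element.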
participants (6)
- Ben Lippmeier
- Dan Doel
- Don Stewart
- Roman Cheplyaka
- Roman Leshchinskiy
- Ryan Ingram