Would it be a good or bad idea to make dropWhile try to fuse?

As Graham Hutton describes in "A tutorial on the universality and
expressiveness of fold",
    dropWhile p = fst . foldr go ([], [])
      where
        go x (ys, xs) = (if p x then ys else x:xs, x:xs)
We can write this for real:
    {-# LANGUAGE RankNTypes, ScopedTypeVariables #-}
    import GHC.Exts (build)

    dropWhile :: forall a . (a -> Bool) -> [a] -> [a]
    dropWhile p list = build dropWhile'
      where
        dropWhile' :: forall b . (a -> b -> b) -> b -> b
        dropWhile' c n = fst $ foldr go (n, n) list
          where
            go x (ys, xs) = let xs' = x `c` xs in (if p x then ys else xs', xs')
This is generally awful, because it could copy a large portion of the
list with no benefit. However, if it fuses, it will sometimes give
good results. For example, compiling
    f m n = foldr (+) 0 $ dropWhile (

yields Core that does not produce an intermediate list. Does anyone have any thoughts about whether this one is worth pursuing?
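(The expression above is truncated in the archive; everything after the "(" was presumably lost to HTML-escaping of a "<". Purely as an illustration of the shape meant, with a guessed predicate and list:

    -- Hypothetical completion; the predicate (< m) and the list [n .. n * 2]
    -- are guesses, not the original expression.
    f :: Int -> Int -> Int
    f m n = foldr (+) 0 $ dropWhile (< m) [n .. n * 2]

Here both sides fuse: the enumeration fuses with dropWhile's inner foldr, and dropWhile's build fuses with the outer foldr (+) 0.)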

On Fri, Jul 25, 2014 at 12:35 AM, David Feuer wrote:
> This is generally awful, because it could copy a large portion of the list with no benefit.
This is a general example of why paramorphisms are more powerful than catamorphisms. That is, we have the following two notions of "folding", where I will uncurry things to make the pattern clearer:

    cata :: (b, a -> b -> b) -> [a] -> b
    cata (nil, cons) = go
      where
        go []     = nil
        go (x:xs) = cons x (go xs)

    para :: (b, a -> ([a], b) -> b) -> [a] -> b
    para (nil, cons) = go
      where
        go []     = nil
        go (x:xs) = cons x (xs, go xs)

That is, we pass the algebra both the recursive structure and the result of the recursive call, instead of only passing the latter. In terms of the definability of (mathematical) functions, cata and para have the same power; however, in terms of algorithms, paramorphisms are strictly more powerful. For example, for Peano-style natural numbers, with paramorphisms we can implement an O(1)-time predecessor function; but with only catamorphisms the best we can do is O(n)-time. Analogously for lists, paramorphisms give us an O(1)-time tail function (with structure sharing), whereas catamorphisms only admit O(n)-time (without structure sharing). The dropWhile function is a generalization of tail, so it'll run into exactly the same problem.

The trick, of course, is getting fusion to work for build/para (or dually: apo/destroy). The problem is para doesn't make linear use of the recursive substructure, so it isn't as amenable to eliminating intermediate structures. In fact, there's a lot of interesting room for research here.
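To make the tail/dropWhile point concrete, here is a small sketch (mine, not from wren's message) of both functions written as paramorphisms; in each case the undropped suffix is the substructure xs itself, so it is shared rather than copied:

    -- Sketch: uses the uncurried para defined above.
    safeTail :: [a] -> [a]
    safeTail = para ([], \_ (xs, _) -> xs)  -- O(1): hands back the substructure

    dropWhilePara :: (a -> Bool) -> [a] -> [a]
    dropWhilePara p = para ([], \x (xs, ys) -> if p x then ys else x : xs)
    -- When p x fails, the result is x : xs, sharing xs with the input;
    -- by laziness the recursive result ys is never forced past that point.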
> Does anyone have any thoughts about whether this one is worth pursuing?

The fact that it can result in massive copying is disturbing. Before adopting it in the core libraries you'd want to make sure you can guide the rewrite rules so that this definition of dropWhile is only ever used when it will be fused away. But, before taking the time to figure out how to do that, you'll probably want to find a bunch of examples in the wild so that you can quantify the benefits and compare that to the expected development cost. For example, if you have a program that makes extensive use of dropWhile, you could manually inline the fusing version and see how it compares.

-- 
Live well,
~wren

Thanks for the info. I think I'm going to leave that one alone for now and
focus on the ones that are more obviously good ideas: scanr, takeWhile, and
(now that we have appropriate optimizations) scanl and (hopefully, if no
one objects) scanl', as well as the semi-related inits. I'd also like to
revisit length, which is currently a horrible mess of rules, and see if the
new foldl' allows it to be written efficiently without so much trouble.
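(As an aside on that last point, the candidate definition is presumably the obvious one; this is my guess, not something stated in the thread:

    import Data.List (foldl')

    length :: [a] -> Int
    length = foldl' (\n _ -> n + 1) 0

and the question is whether the new, fusion-friendly foldl' makes this fast without the current thicket of rules.)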
On Jul 25, 2014 5:00 PM, "wren romano" wrote:
> [snip]

Hi David,

On Friday, 25 Jul 2014 at 17:32 -0400, David Feuer wrote:
> I think I'm going to leave that one alone for now and focus on the ones
> that are more obviously good ideas: scanr, takeWhile, [...] I'd also like
> to revisit length [...]

thanks for that, by the way. I guess people generally expect more functions to be fusable than they currently are.

I would suggest attacking the problem more systematically. Can you create a comprehensive benchmark suite for list functions? Something along the lines of:

 * for every list function, create one or a few micro benchmarks.
 * put them into one suite, using criterion to measure them. Make it easy to test different implementations (e.g. have lists `lengthImpls = [("Foo", Foo.length), ...]`, so that testing another implementation amounts to writing Foo.hs and importing it).
 * additionally, put them into one benchmark that does not use criterion (but rather executes all of them at a reasonable size setting), so that we can add it to nofib.
 * then, based on that, we can make well-founded decisions.

The point of the nofib benchmark is to make sure we don't regress when we change something in the compiler or the libraries. It will notify us that something went amiss, and then we can use the criterion-based suite to figure out what exactly broke.

It would also be a benchmark for other fusion techniques to be measured against. If it turns out to be useful, and if you are in academics, you can probably talk about the benchmark suite at some workshop and get citations from every following list-fusion paper ;-)

The microbenchmarks should test both fusable and unfusable uses of the functions. You can easily turn one into the other by inserting

    don'tFuse = id
    {-# NOINLINE don'tFuse #-}

somewhere in the pipeline. For functions that both consume and produce, we'd want to test all four combinations.

Finally, we need to keep an eye on binary sizes: if fusion does not happen, we want to use a compiled version of the function, not re-create it at every use site. nofib tracks module sizes, so if the nofib version of your benchmark is split into modules in a sensible way, those module size reports will be worth watching.

Is that something you would like to work on, and take the lead on? If you put your code in a git repo somewhere, I'll review and contribute, of course.

Greetings,
Joachim

-- 
Joachim "nomeata" Breitner
mail@joachim-breitner.de • http://www.joachim-breitner.de/
Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F
Debian Developer: nomeata@debian.org
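A minimal sketch of what such a criterion suite could look like (the commented-out Foo module and the exact benchmark sizes are placeholders, not an existing repo):

    -- Hypothetical harness along the lines described above.
    module Main (main) where

    import Criterion.Main (defaultMain, bgroup, bench, nf)

    -- Pipeline breaker: insert to defeat fusion on either side.
    don'tFuse :: a -> a
    don'tFuse = id
    {-# NOINLINE don'tFuse #-}

    lengthImpls :: [(String, [Int] -> Int)]
    lengthImpls =
      [ ("base", length)
      -- , ("Foo", Foo.length)  -- add alternatives by importing Foo
      ]

    main :: IO ()
    main = defaultMain
      [ bgroup ("length/" ++ name)
          [ bench "fused"   $ nf (\n -> impl [1 .. n]) (100000 :: Int)
          , bench "unfused" $ nf (\n -> impl (don'tFuse [1 .. n])) 100000
          ]
      | (name, impl) <- lengthImpls
      ]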

I am not convinced that Criterion is the appropriate tool for most of
this work. I believe we need to look primarily at allocation numbers
when looking at these in isolation, and only look at benchmarks
comparing the performance of realistic programs using them. The
trouble is that allocation and nursery GC are both very fast (normally
that's a good thing), so I believe the primary benefit we hope to get
from fusion is better L1 cache utilization. We will only see that in
benchmarks when enough other computation is going on in the vicinity
of the list pipeline. It may be possible to write benchmarks to
investigate that, but I certainly don't know how. That said, I do
believe your idea of some sort of automated/semi-automated test suite
is a good one. I will try to learn a bit more about GHC profiling and
see if I can come up with at least a first draft of a rough test
suite.
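For reference, allocation numbers can be read off directly from the RTS. A rough sketch (mine, not from the thread; it uses the modern GHC.Stats interface and requires running with +RTS -T, whereas in 2014 the equivalent was getGCStats/bytesAllocated):

    import Control.DeepSeq (NFData, force)
    import Control.Exception (evaluate)
    import Data.Word (Word64)
    import GHC.Stats (getRTSStats, allocated_bytes)

    -- Bytes allocated while forcing a value to normal form.
    allocatedBy :: NFData a => a -> IO Word64
    allocatedBy x = do
      before <- allocated_bytes <$> getRTSStats
      _ <- evaluate (force x)
      after <- allocated_bytes <$> getRTSStats
      pure (after - before)

    main :: IO ()
    main = print =<< allocatedBy (sum (dropWhile (< 500) [1 .. 100000 :: Int]))

Comparing those numbers for a fusing and a non-fusing pipeline makes the copying cost visible even when timings barely move.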
On Sat, Jul 26, 2014 at 7:16 AM, Joachim Breitner wrote:
> [snip]

Why not do both? :-) criterion is a very powerful tool for these micro
benchmark workloads.
On Saturday, July 26, 2014, David Feuer wrote:
> [snip]

On Sat, Jul 26, 2014 at 12:55 PM, Carter Schonwald <carter.schonwald@gmail.com> wrote:
> Why not do both? :-) criterion is a very powerful tool for these micro benchmark workloads.
Do you know where I can find real uses of scanr, scanl, and takeWhile? The GHC type checker seems to use scanl once. That's not enough to get a sense.

Hi,

On Friday, 25 Jul 2014 at 00:35 -0400, David Feuer wrote:
> This is generally awful, because it could copy a large portion of the
> list with no benefit. However, if it fuses, it will sometimes give good
> results. For example, compiling
>
>     f m n = foldr (+) 0 $ dropWhile (
>
> yields Core that does not produce an intermediate list. Does anyone have
> any thoughts about whether this one is worth pursuing?

in general, I'd be doubtful about this. I guess it shouldn't happen unless it fuses on both sides (as then there is no sharing to lose, right?), which is probably not easily expressible with RULES.

Greetings,
Joachim

-- 
Joachim Breitner
e-Mail: mail@joachim-breitner.de
Homepage: http://www.joachim-breitner.de
Jabber-ID: nomeata@joachim-breitner.de
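For a sense of what that wiring would even look like: the standard GHC.List rewrite-rule pattern, transplanted onto the fusing dropWhile above (my sketch; dropWhileFB is a hypothetical name, and this is not code from base):

    -- Assumes GHC.Exts (build) is in scope. Note that neither rule can say
    -- "only rewrite when the result is itself consumed by a foldr", which
    -- is exactly the condition Joachim wants.
    {-# NOINLINE [1] dropWhile #-}

    dropWhileFB :: (a -> Bool) -> (a -> b -> b) -> b -> [a] -> b
    dropWhileFB p c n list = fst (foldr go (n, n) list)
      where
        go x (ys, xs) = let xs' = x `c` xs
                        in (if p x then ys else xs', xs')
    {-# INLINE [0] dropWhileFB #-}

    {-# RULES
    "dropWhile"     [~1] forall p xs. dropWhile p xs =
                           build (\c n -> dropWhileFB p c n xs)
    "dropWhileList" [1]  forall p xs. dropWhileFB p (:) [] xs = dropWhile p xs
      #-}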
participants (4):
 - Carter Schonwald
 - David Feuer
 - Joachim Breitner
 - wren romano