
Earlier today on the #haskell IRC channel, Tim Toorop (bolrod on #haskell) pointed out that Data.List.inits is rather slow, and proposed an alternative. After some collaborative tweaking, we came up with the following:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
This function seems to perform significantly better. For example, the program below takes about 15 seconds with the old inits, and only 3 seconds with the new version (tested with GHC 6.4.1 and -O2).
main = print $ sum $ map sum $ inits [1..7000]
As this version performs much better and will work as a drop in replacement, I suggest that it be included in the hierarchical libraries. Spencer Janssen
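A quick way to check that the proposed version agrees with the existing one on finite lists is a QuickCheck property; the sketch below uses initsNew as a hypothetical name for the proposed definition, it is not anything from the thread.

import qualified Data.List as L
import Test.QuickCheck (quickCheck)

-- hypothetical name for the proposed definition
initsNew :: [a] -> [[a]]
initsNew xs = [] : (zipWith take [1..] $ map (const xs) xs)

-- agreement with the library version on finite lists
prop_inits :: [Int] -> Bool
prop_inits xs = initsNew xs == L.inits xs

main :: IO ()
main = quickCheck prop_inits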

On 4/7/06, Spencer Janssen
Earlier today on the #haskell IRC channel, Tim Toorop (bolrod on #haskell) pointed out that Data.List.inits is rather slow, and proposed an alternative. After some collaborative tweaking, we came up with the following:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
This function seems to perform significantly better. For example, the program below takes about 15 seconds with the old inits, and only 3 seconds with the new version (tested with GHC 6.4.1 and -O2).
main = print $ sum $ map sum $ inits [1..7000]
As this version performs much better and will work as a drop in replacement, I suggest that it be included in the hierarchical libraries.
That's quite a bit faster on my machine as well. I think the following slight variation may be a bit clearer, though: inits xs = [] : zipWith take [1..length xs] (repeat xs) -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Sebastian Sylvan wrote:
On 4/7/06, Spencer Janssen
wrote: Earlier today on the #haskell IRC channel, Tim Toorop (bolrod on #haskell) pointed out that Data.List.inits is rather slow, and proposed an alternative. After some collaborative tweaking, we came up with the following:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
This function seems to perform significantly better. For example, the program below takes about 15 seconds with the old inits, and only 3 seconds with the new version (tested with GHC 6.4.1 and -O2).
main = print $ sum $ map sum $ inits [1..7000]
As this version performs much better and will work as a drop in replacement, I suggest that it be included in the hierarchical libraries.
That's quite a bit faster on my machine as well. I think the following slight variation may be a bit clearer, though: inits xs = [] : zipWith take [1..length xs] (repeat xs)
-- Sebastian Sylvan +46(0)736-818655 UIN: 44640862
The length xs won't work on infinite lists. So you can't do: take 100 (length (inits [1..])) -- Tim Toorop

On 4/8/06, Tim Toorop
Sebastian Sylvan wrote:
On 4/7/06, Spencer Janssen
wrote: Earlier today on the #haskell IRC channel, Tim Toorop (bolrod on #haskell) pointed out that Data.List.inits is rather slow, and proposed an alternative. After some collaborative tweaking, we came up with the following:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
This function seems to perform significantly better. For example, the program below takes about 15 seconds with the old inits, and only 3 seconds with the new version (tested with GHC 6.4.1 and -O2).
main = print $ sum $ map sum $ inits [1..7000]
As this version performs much better and will work as a drop in replacement, I suggest that it be included in the hierarchical libraries.
That's quite a bit faster on my machine as well. I think the following slight variation may be a bit clearer, though: inits xs = [] : zipWith take [1..length xs] (repeat xs)
-- Sebastian Sylvan +46(0)736-818655 UIN: 44640862
The length xs won't work on infinite lists.
Ah of course! /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Tim Toorop wrote:
Sebastian Sylvan wrote:
On 4/7/06, Spencer Janssen
wrote: Earlier today on the #haskell IRC channel, Tim Toorop (bolrod on #haskell) pointed out that Data.List.inits is rather slow, and proposed an alternative. After some collaborative tweaking, we came up with the following:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
This function seems to perform significantly better. For example, the program below takes about 15 seconds with the old inits, and only 3 seconds with the new version (tested with GHC 6.4.1 and -O2).
main = print $ sum $ map sum $ inits [1..7000]
As this version performs much better and will work as a drop in replacement, I suggest that it be included in the hierarchical libraries.
That's quite a bit faster on my machine as well. I think the following slight variation may be a bit clearer, though: inits xs = [] : zipWith take [1..length xs] (repeat xs)
-- Sebastian Sylvan +46(0)736-818655 UIN: 44640862
The length xs won't work on infinite lists. So you can't do: take 100 (length (inits [1..]))
-- Tim Toorop
Hmm... that won't work anyway. It should have been: take 100 (inits [1..]), of course. My bad.
-- Tim Toorop

Hello Sebastian, Saturday, April 8, 2006, 5:43:44 PM, you wrote:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
inits xs = [] : zipWith take [1..length xs] (repeat xs)
The original definition uses a clever trick which allows it to not fully evaluate the list before proceeding. This will require less memory for lists such as [1..10^6], and moreover it's the only way to work with infinite lists. -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Fri, 07 Apr 2006, "Spencer Janssen"
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
As this version performs much better and will work as a drop in replacement, I suggest that it be included in the hierarchical libraries.
It is not a drop in replacement. The original inits is strict, this one isn't. The specification of inits (from the Haskell 98 report):
inits :: [a] -> [[a]]
inits [] = [[]]
inits (x:xs) = [[]] ++ map (x:) (inits xs)
-- /NAD

On 2006-04-08, Nils Anders Danielsson
On Fri, 07 Apr 2006, "Spencer Janssen"
wrote: inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
As this version performs much better and will work as a drop in replacement, I suggest that it be included in the hierarchical libraries.
It is not a drop in replacement. The original inits is strict, this one isn't.
The specification of inits (from the Haskell 98 report):
inits :: [a] -> [[a]]
inits [] = [[]]
inits (x:xs) = [[]] ++ map (x:) (inits xs)
Is that a property many programs depend on? I'd actually call that a bug of the original. -- Aaron Denney -><-

Aaron Denney wrote:
On 2006-04-08, Nils Anders Danielsson
wrote: On Fri, 07 Apr 2006, "Spencer Janssen"
wrote: inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
As this version performs much better and will work as a drop in replacement, I suggest that it be included in the hierarchical libraries.
It is not a drop in replacement. The original inits is strict, this one isn't.
The specification of inits (from the Haskell 98 report):
inits :: [a] -> [[a]]
inits [] = [[]]
inits (x:xs) = [[]] ++ map (x:) (inits xs)
Is that a property many programs depend on? I'd actually call that a bug of the original.
It may break some things:
head (Data.List.inits undefined) is an error
head (New.Spiffy.inits undefined) is []
-- Chris

Chris Kuklewicz wrote:
Aaron Denney wrote:
On 2006-04-08, Nils Anders Danielsson
wrote: On Fri, 07 Apr 2006, "Spencer Janssen"
wrote: inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
As this version performs much better and will work as a drop in replacement, I suggest that it be included in the hierarchical libraries.
It is not a drop in replacement. The original inits is strict, this one isn't.
The specification of inits (from the Haskell 98 report):
inits :: [a] -> [[a]]
inits [] = [[]]
inits (x:xs) = [[]] ++ map (x:) (inits xs)
Is that a property many programs depend on? I'd actually call that a bug of the original.
It may break some things:
head (Data.List.inits undefined) is an error
head (New.Spiffy.inits undefined) is []
Is the head of the inits of undefined really an error? Since the head of inits [] is also [] ... But if you really want that undefined to produce an error, you could just:
inits' xn@(_:_) = zipWith take [0..] $ map (const xn) $ undefined:xn
inits' _ = undefined

Tim Toorop wrote:
Chris Kuklewicz wrote:
Aaron Denney wrote:
On 2006-04-08, Nils Anders Danielsson
wrote: On Fri, 07 Apr 2006, "Spencer Janssen"
wrote: inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
As this version performs much better and will work as a drop in replacement, I suggest that it be included in the hierarchical libraries.
It is not a drop in replacement. The original inits is strict, this one isn't.
The specification of inits (from the Haskell 98 report):
inits :: [a] -> [[a]]
inits [] = [[]]
inits (x:xs) = [[]] ++ map (x:) (inits xs)
Is that a property many programs depend on? I'd actually call that a bug of the original.
It may break some things:
head (Data.List.inits undefined) is an error
head (New.Spiffy.inits undefined) is []
Is the head of the inits of undefined really an error? Since the head of inits [] is also [] ... But if you really want that undefined to produce an error, you could just:
inits' xn@(_:_) = zipWith take [0..] $ map (const xn) $ undefined:xn
inits' _ = undefined
Exactly. Now inits' *is* a drop in replacement for inits. -- Chris

On 2006-04-08, Chris Kuklewicz
Is the head of the inits of undefined really an error? Since the head of inits [] is also [] ... But if you really want that undefined to produce an error, you could just:
inits' xn@(_:_) = zipWith take [0..] $ map (const xn) $ undefined:xn
inits' _ = undefined
Exactly. Now inits' *is* a drop in replacement for inits.
Right, but the new spiffy inits seems to be a strict superset. Does anything plausibly depend on the strictness of the original? I think it was written that way for clarity, not for the strictness properties. -- Aaron Denney -><-

Aaron Denney wrote:
On 2006-04-08, Chris Kuklewicz
wrote: Is the head of the inits of undefined really an error? Since the head of inits [] is also [] ... But if you really want that undefined to produce an error, you could just:
inits' xn@(_:_) = zipWith take [0..] $ map (const xn) $ undefined:xn
inits' _ = undefined
Exactly. Now inits' *is* a drop in replacement for inits.
Right, but the new spiffy inits seems to be a strict superset. Does anything plausibly depend on the strictness of the original? I think it was written that way for clarity, not for the strictness properties.
I have nothing that depends on the difference. And it really only affects (inits undefined). There is a certain progression to this though:
(A) inits undefined is undefined
(B) inits [undefined] is []:[undefined]
(C) inits [1,undefined] is []:[1]:[1,undefined]
The original replacement made (A) behave like (B), which breaks the nice progression. -- Chris

Aaron Denney wrote:
On 2006-04-08, Chris Kuklewicz
wrote: Is the head of the inits of undefined really an error? Since the head of inits [] is also [] ... But if you really want that undefined to produce an error, you could just:
inits' xn@(_:_) = zipWith take [0..] $ map (const xn) $ undefined:xn
inits' _ = undefined
Exactly. Now inits' *is* a drop in replacement for inits.
Right, but the new spiffy inits seems to be a strict superset. Does anything plausibly depend on the strictness of the original? I think it was written that way for clarity, not for the strictness properties.
The inits' _ = undefined can be left out, but may be useful for clarity. On the other hand, I think the new spiffy inits is quite clear once you notice why it's map (const xn) $ undefined:xn. This version doesn't use a weird recursive function. It's just... the definition of inits, I think: first element of inits, take the first element; 2nd element of inits, take the first plus the next element ... But that's just my logic... probably why I wrote it like this in the first place (I just wanted to see if I could write inits without using a book ;) )

On 4/8/06, Tim Toorop
On the other hand, I think the new spiffy inits is quite clear once you notice why it's map (const xn) $ undefined:xn
I think it would be even clearer if it were defined like this:
inits xs = zipWith take [0..] $ map (const xs) xs
That way you don't have this ugly special case for the first element in the list. But there is an even better advantage: it is strict again! So its termination properties are exactly like the Prelude one. This version *can* be used as a drop-in replacement of the Prelude version. Cheers, /Josef

Josef Svenningsson wrote:
On 4/8/06, Tim Toorop
wrote: On the other hand, I think the new spiffy inits is quite clear once you notice why it's map (const xn) $ undefined:xn
I think it would be even clearer if it were defined like this:
inits xs = zipWith take [0..] $ map (const xs) xs
That way you don't have this ugly special case for the first element in the list. But there is an even better advantage: it is strict again! So its termination properties are exactly like the Prelude one. This version *can* be used as a drop-in replacement of the Prelude version.
Cheers,
/Josef
Yes it is very clear ... except for the fact that it doesn't work correctly. It misses the last element. Now we could write
inits2' xn = (zipWith take [0..] $ map (const xn) xn) ++ [xn]
Though I don't know if this really is clearer, and it seems to make the garbage collector work slightly more. And I think it does exactly the same as
inits' xn@(_:_) = zipWith take [0..] $ map (const xn) $ undefined:xn

On 2006-04-08, Tim Toorop
Josef Svenningsson wrote:
On 4/8/06, Tim Toorop
wrote: On the other hand, I think the new spiffy inits is quite clear once you notice why it's map (const xn) $ undefined:xn
I think it would be even clearer if it were defined like this:
inits xs = zipWith take [0..] $ map (const xs) xs
That way you don't have this ugly special case for the first element in the list. But there is an even better advantage: it is strict again! So its termination properties are exactly like the Prelude one. This version *can* be used as a drop-in replacement of the Prelude version.
Cheers,
/Josef
Yes it is very clear ... except for the fact that it doesn't work correctly. It misses the last element. Now we could write
inits2' xn = (zipWith take [0..] $ map (const xn) xn) ++ [xn]
Though I don't know if this really is clearer, and it seems to make the garbage collector work slightly more. And I think it does exactly the same as
inits' xn@(_:_) = zipWith take [0..] $ map (const xn) $ undefined:xn
Yes, but is there any real need to use the ugly pattern syntax? Surely
inits xs = zipWith take [0..] $ map (const xs) (undefined:xs)
(I still think letting head $ inits undefined be [] rather than undefined is better than the current def.) -- Aaron Denney -><-

On 4/9/06, Aaron Denney
inits xs = zipWith take [0..] $ map (const xs) (undefined:xs)
Nicest version so far!
(I still think letting head $ inits undefined be [] rather than undefined is better than the current def.)
Yes, from a termination point of view it surely seems better with the lazy version. But I believe that this is the wrong way of seeing it. These termination issues are hardly any problem in every-day programming. But what few people here seem to realize is that a change in the strictness of a function will change the space behaviour of a program. I use that as a rule of thumb when inspecting Haskell code: whenever the strictness changes, that will lead to a change in memory consumption. So changing inits to a lazier version will not affect a lot of programs termination-wise, but it will have effects on the space consumption. Whether these effects will be good or bad depends on the program, but the fact remains, there will be a change. I think this is unwelcome and that's why I think that the strictness properties of the Prelude functions should be preserved. Cheers, /Josef

Aaron Denney wrote:
On 2006-04-08, Nils Anders Danielsson
wrote: On Fri, 07 Apr 2006, "Spencer Janssen"
wrote: inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
As this version performs much better and will work as a drop in replacement, I suggest that it be included in the hierarchical libraries.
It is not a drop in replacement. The original inits is strict, this one isn't.
The specification of inits (from the Haskell 98 report):
inits :: [a] -> [[a]]
inits [] = [[]]
inits (x:xs) = [[]] ++ map (x:) (inits xs)
Is that a property many programs depend on? I'd actually call that a bug of the original.
If you know anything that is bad about the definition above, please tell. But so far I haven't found _anything_ the original inits performs better on. I profiled some expressions. And as you can see, the original inits seems to create a lot of junk. I did profiling with print (inits [1..5000]), with which the original version is just a tad slower. But with print (length (inits [..])) I couldn't even compare the original version to this one. The old one had problems calculating length (inits [1..20000]) while this one easily did length (inits [1..500000]). The output is in the pdf files. _Note_ that the length is of [1..500000]. I couldn't be bothered to run my computer for some hours to let the original version of inits calculate it. And this version of inits didn't even show up on the graph when calculating [1..20000], in case you wonder why it's not there.

Spencer Janssen wrote:
Earlier today on the #haskell IRC channel, Tim Toorop (bolrod on #haskell) pointed out that Data.List.inits is rather slow, and proposed an alternative. After some collaborative tweaking, we came up with the following:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
I propose to replace inits in Data.List with this one. Objections about the relaxed strictness are noted, but I subscribe to the view that the original is more strict than necessary, and the general trend for Data.List functions is to be as lazy as possible. Cheers, Simon

On 4/10/06, Simon Marlow
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
I propose to replace inits in Data.List with this one. Objections about the relaxed strictness are noted, but I subscribe to the view that the original is more strict than necessary, and the general trend for Data.List functions is to be as lazy as possible.
Fair enough. But how does it play with list fusion? I know that both (:) and zipWith are good producers, but how does it work when they are composed like this? And how does it compare with the strict version, which is just a call to zipWith (and must therefore be a good producer, I assume)? Cheers, /Josef

"Josef Svenningsson"
On 4/10/06, Simon Marlow
wrote: I propose to replace inits in Data.List with this one.
Fair enough. But how does it play with list fusion?
I don't know the answer to your question, but it does prompt another one of my own.
I know that both (:) and zipWith are good producers but how does it work when they are composed like this? And how does it compare with the strict version which is just a call to zipWith (and must therefore be a good producer I assume).
Are the larger sizes of zipWith (zipWith3, zipWith4, ...) also good producers? I have some evidence from a recent application I have been writing that they may not be. At least, reading the -ddump-simpl output, it does not seem clear-cut that a pipeline e.g. (concat . f . map g . zipWith3 h as bs) cs is turned into a simple loop, although perhaps I am not reading it right. Is there any received wisdom on what should be expected here? Regards, Malcolm

On 4/10/06, Malcolm Wallace
Are the larger sizes of zipWith (zipWith3, zipWith4, ...) also good producers? I have some evidence from a recent application I have been writing that they may not be. At least, reading the -ddump-simpl output, it does not seem clear-cut that a pipeline e.g. (concat . f . map g . zipWith3 h as bs) cs is turned into a simple loop, although perhaps I am not reading it right. Is there any received wisdom on what should be expected here?
The GHC documentation is rather clear at this point. See: http://www.haskell.org/ghc/docs/latest/html/users_guide/rewrite-rules.html#i... The larger zipWiths are not listed as good producers, hence you cannot expect fusion to happen. Cheers, /Josef

"Josef Svenningsson"
The GHC documentation is rather clear at this point. The larger zipWiths are not listed as good producers, hence you cannot expect fusion to happen.
Thanks for the pointer, it was very useful. So now, I am trying to write some RULES that will turn zipWith[3..n] into both good producers and good consumers. But to simplify a bit, I'll first concentrate on improving the existing RULES for pair-wise zipWith. zipWith is essentially turned into a foldr2, and then I see that GHC.Base has rules for consuming either of the lists via the foldr/build mechanism:
foldr2 f z (build g) ys
foldr2 f z xs (build g)
but there is no rule for consuming both lists simultaneously. Let's have a go at defining it:
{-# RULES
"foldr2/both" forall f z (g::forall c.(a->c->c)->c->c)
                         (h::forall c.(b->c->c)->c->c) .
    foldr2 f z (build g) (build h) = g (\x _-> h (\y r-> f x y r) z) z
 #-}
The intuition is that we run the first generator, g, to get x, then the second generator, h, to get y, then apply the pairwise function f to x and y, with r being the recursive call back into the next iteration. If either of the generators fails to produce a new element, we drop out of the iteration through the z (nil) argument. Does this look about right?
So the main problem now is that the type annotations for g and h in the rule are not correct. GHC insists that if the bound variable is polymorphic, it must have a type signature. The result type of both g and h should be the same, yet the way it is written with separate quantifications, they can be different, so the rule will not fire. I really want to quantify the 'forall c' at an outer level, but I don't think there is a mechanism to do that? Regards, Malcolm

Malcolm Wallace wrote:
{-# RULES
"foldr2/both" forall f z (g::forall c.(a->c->c)->c->c)
                         (h::forall c.(b->c->c)->c->c) .
    foldr2 f z (build g) (build h) = g (\x _-> h (\y r-> f x y r) z) z
 #-}
Looks wrong to me. Somehow it doesn't feel right that the first argument to g doesn't use its second argument, and I think what you wrote is actually foldr (f (head (build g))) z (build h) or something like that. Wasn't there some consensus that foldr/build cannot deforest both lists that go into zip? Intuitively it feels right: the "loop" that drives the calculation is contained in the build, and it cannot call out to the other generator to just get one element. Udo. -- "The key to performance is elegance, not battalions of special cases." -- Jon Bentley and Doug McIlroy
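To see this concretely, here is a small sketch with hypothetical build-style generators for [1,2,3] and [10,20] (g, h and f below are illustrative, not from the thread): the proposed right-hand side keeps pairing against the first element of the first list instead of advancing it.

-- hypothetical build-style generators for [1,2,3] and [10,20]
g :: (Int -> r -> r) -> r -> r
g cons nil = cons 1 (cons 2 (cons 3 nil))

h :: (Int -> r -> r) -> r -> r
h cons nil = cons 10 (cons 20 nil)

f :: Int -> Int -> [(Int, Int)] -> [(Int, Int)]
f x y r = (x, y) : r

-- what foldr2 f [] (build g) (build h) should compute
lhs :: [(Int, Int)]
lhs = zipWith (,) [1, 2, 3] [10, 20]              -- [(1,10),(2,20)]

-- what the proposed right-hand side computes: the x never advances
rhs :: [(Int, Int)]
rhs = g (\x _ -> h (\y r -> f x y r) []) []       -- [(1,10),(1,20)]

main :: IO ()
main = print (lhs, rhs)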

Udo Stenzel
Malcolm Wallace wrote:
{-# RULES
"foldr2/both" forall f z (g::forall c.(a->c->c)->c->c)
                         (h::forall c.(b->c->c)->c->c) .
    foldr2 f z (build g) (build h) = g (\x _-> h (\y r-> f x y r) z) z
 #-}
Looks wrong to me. Somehow it doesn't feel right that the first argument to g doesn't use its second argument, and I think what you wrote is actually
foldr (f (head (build g))) z (build h)
I was a bit worried that might be the case, yes.
Wasn't there some consensus that foldr/build cannot deforest both lists that go into zip?
I'd like to find out if that is indeed part of the folklore.
Intuitively it feels right: the "loop" that drives the calculation is contained in the build, and it cannot call out to the other generator to just get one element.
Hmm, and yet, one also feels that it must be possible to generate items from multiple lists in "lock-step" (without the list structure), since that is exactly the pattern being expressed by the zipWith family. Regards, Malcolm

On 4/10/06, Malcolm Wallace
Wasn't there some consensus that foldr/build cannot deforest both lists that go into zip?
I'd like to find out if that is indeed part of the folklore.
I'd rather call it an open research problem. But I think it is generally considered as unlikely to be possible.
Hmm, and yet, one also feels that it must be possible to generate items from multiple lists in "lock-step" (without the list structure), since that is exactly the pattern being expressed by the zipWith family.
I totally agree with your intuition. In fact, it is possible to achieve fusion in this case, we just don't know how to do it within the foldr/build setting. <plug> A couple of years ago I wrote a paper about it: http://www.cs.chalmers.se/~josefs/publications/fusion.pdf Reader's digest: Using a variation on foldr/build (I call it the dual, but that's stretching it a bit), zipWith can be a good consumer in both its arguments at the same time. But this variation is unfortunately incompatible with foldr/build. </plug> It'd be really cool if you managed to crack the problem, but be warned that Smart People (read Simon Peyton :-) have tried hard and failed. Cheers, /Josef
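For readers without the paper to hand, a rough sketch of the destroy combinator and the destroy/unfoldr idea follows; it is reconstructed from memory rather than copied from the paper, so treat the details as assumptions.

{-# LANGUAGE RankNTypes #-}

import Data.List (unfoldr)

-- destroy hands the consumer an explicit step function over the list,
-- the mirror image of the state-and-step view unfoldr uses for producers
destroy :: (forall s. (s -> Maybe (a, s)) -> s -> c) -> [a] -> c
destroy g = g step
  where
    step []     = Nothing
    step (x:xs) = Just (x, xs)

-- The fusion rule, informally: destroy g (unfoldr psi e) = g psi e,
-- i.e. the consumer runs the producer's step function directly and the
-- intermediate list is never built.

-- zip written as a destroy-consumer of *both* of its arguments:
zipD :: [a] -> [b] -> [(a, b)]
zipD xs ys =
  destroy (\stepA sa ->
    destroy (\stepB sb -> go stepA stepB sa sb) ys) xs
  where
    go stepA stepB sa sb =
      case stepA sa of
        Nothing       -> []
        Just (a, sa') ->
          case stepB sb of
            Nothing       -> []
            Just (b, sb') -> (a, b) : go stepA stepB sa' sb'

-- a small unfoldr producer to pair it with
countdown :: Int -> [Int]
countdown = unfoldr (\n -> if n <= 0 then Nothing else Just (n, n - 1))

main :: IO ()
main = print (zipD "abc" (countdown 5))   -- [('a',5),('b',4),('c',3)]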

Hello Josef, Monday, April 10, 2006, 9:22:31 PM, you wrote:
Wasn't there some consensus that foldr/build cannot deforest both lists that go into zip?
That is a typical problem of "two simultaneous control structures", and one solution is to use coroutines. Lazily evaluated lists are of course another solution, but they are not so fast. Maybe some lightweight form of coroutines can be used in such situations to perform the job faster? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On 4/11/06, Bulat Ziganshin
Hello Josef,
Monday, April 10, 2006, 9:22:31 PM, you wrote:
Wasn't there some consensus that foldr/build cannot deforest both lists that go into zip?
Umm... I didn't write that. Udo Stenzel did.
That is a typical problem of "two simultaneous control structures", and one solution is to use coroutines. Lazily evaluated lists are of course another solution, but they are not so fast.
Maybe some lightweight form of coroutines can be used in such situations to perform the job faster?
I'm not sure how to understand your comment in the context of the present discussion. As I said before, it is indeed possible to fuse zipWith[1..] with other functions. The only problem is that we don't know how to do it for all list arguments at the same time *within the foldr/build framework*. And since GHC only uses foldr/build at the moment, we have to fuse the zipWith family of functions by hand if we want to achieve fusion. Cheers, /Josef

Bulat Ziganshin:
Maybe some lightweight form of coroutines can be used in such situations to perform the job faster?
Well, lazy evaluation is itself a form of lightweight coroutining. The question is just how to plumb the compiler transformations such that the coroutines can work without generating intermediate structures. Josef Svenningsson:
As I said before, it is indeed possible to fuse zipWith[1..] with other functions. The only problem is that we don't know how to do it for all list arguments at the same time *within the foldr/build framework*. And since GHC only uses foldr/build at the moment, we have to fuse the zipWith family of functions by hand if we want to achieve fusion.
Thanks for pointing to your ICFP'02 paper on this. Very interesting. The remaining difficulty is how to allow the co-existence of foldr/build with your destroy/unfoldr, so we get the benefits of both techniques. I have an idea about that. So, where you have
unfoldr :: (s -> Maybe (a,s)) -> s -> [a]
unfoldr g s = go s
  where go s = case g s of
                 Nothing -> []
                 Just (x,s') -> x : go s'
as the basis of a good producer, I thought, why not express this more like the 'build' pattern by abstracting the list constructors (:) and [].
genUnfoldr :: (s -> Maybe (a,s)) -> s -> (a->b->b) -> b -> b
genUnfoldr g s cons nil = go s
  where go s = case g s of
                 Nothing -> nil
                 Just (x,s') -> x `cons` go s'
The original unfold can be expressed as
unfoldr g s = genUnfoldr g s (:) []
or
unfoldr g s = build (genUnfoldr g s)
Now, we can use an unfoldr as the producer in *either* a foldr/build rule, or a destroy/unfoldr rule. But unfoldr also gives us the additional ability to fuse two producers together. Here is the fusing function (zipDU in your paper):
zipG :: (s->Maybe (a,s)) -> (r->Maybe (b,r)) -> (s,r) -> Maybe ((a,b),(s,r))
g `zipG` h = \ (s,r)-> case g s of
                         Nothing -> Nothing
                         Just (x,s') -> case h r of
                                          Nothing -> Nothing
                                          Just (y,r') -> Just ((x,y),(s',r'))
and here is the rule to use it in a zipWith-like context:
{-# RULES
"foldr2/unfoldr/unfoldr"
    foldr2 consumer z (unfoldr g s) (unfoldr h r) =
        foldr (\ (x,y) z-> consumer x y z) z (unfoldr (g `zipG` h) (s,r))
 #-}
Recalling that unfoldr is just a build, this then triggers the ordinary foldr/build rule, to become
foldr (\ (x,y) z-> consumer x y z) z (build (genUnfoldr (g `zipG` h) (s,r)))
==> genUnfoldr (g `zipG` h) (s,r) (\ (x,y) z-> consumer x y z) z
Perhaps this is not much better though, because where we used to have intermediate list structure, now we have intermediate pair structures. I would like to hope that other optimisations might still be able to remove these as well, but have not yet investigated. Regards, Malcolm
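To make the rule's pattern concrete, here are two producers written via unfoldr (mapU and enumU are hypothetical names, not library functions); after GHC rewrites zipWith to foldr2, an expression such as zipWith (+) (mapU f xs) (enumU 1 n) has exactly the shape the rule above matches.

import Data.List (unfoldr)

-- map written as an unfoldr producer
mapU :: (a -> b) -> [a] -> [b]
mapU f = unfoldr step
  where
    step []     = Nothing
    step (x:xs) = Just (f x, xs)

-- an enumeration written as an unfoldr producer
enumU :: Int -> Int -> [Int]
enumU lo hi = unfoldr (\i -> if i > hi then Nothing else Just (i, i + 1)) lo

main :: IO ()
main = print (zipWith (+) (mapU (* 2) [1 .. 5]) (enumU 10 14))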

On 4/11/06, Malcolm Wallace
Thanks for pointing to your ICFP'02 paper on this. Very interesting.
I'm glad you liked it. :)
The remaining difficulty is how to allow the co-existence of foldr/build with your destroy/unfoldr, so we get the benefits of both techniques. I have an idea about that.
[Details cut out] First of all, I'm glad to see you've also discovered how fun it can be to do transformations of functional programs. This is real fun! :)
Secondly, and this is a reader's digest of what I write below, I've thought long and hard about how to marry foldr/build with destroy/unfoldr, and there simply doesn't seem to be a way which maintains the initial simplicity of foldr/build and which enables significantly more fusion.
So, what you demonstrate is that given that we have two functions, both represented as unfoldr, then they can be fused with zipWith even though it is represented as a foldr2. But if we try to imagine a more concrete scenario, things become more problematic. Which two functions could those be? Suppose they were both map functions. That's fine, map can be represented as an unfoldr. But wait a second! If we want to use foldr/build then we want to represent map in terms of foldr and build - not unfoldr! Dang! And this is really the core of the problem with unifying foldr/build and destroy/unfoldr - which representation certain functions should have.
One could imagine having both representations and have GHC's rule mechanism choose the representation that works at a particular instance. In case both representations should work, one can let GHC choose non-deterministically. The first problem with this is that it starts to get messy. But I have no doubt that it is doable. Secondly, I'm fairly sure (I have a faint memory of this but I can't quite recall it) that to achieve maximum fusion in certain situations there are functions that must change representation from foldr/build to destroy/unfoldr or vice versa.
One might respond to this by saying: fine, we're only dealing with compiler optimisations here anyway, it is difficult to guarantee that optimisations trigger in general, and why should this be different? But one of the beauties of foldr/build is the producer/consumer abstraction which makes it very easy for programmers to predict when fusion will happen. I certainly don't want to be without that.
I hate to bring such pessimistic opinions when you seem to be on a roll here. Perhaps there is a special case lurking which can achieve a bit of fusion in the case of zipWith. But personally I doubt it.
genUnfoldr (g `zipG` h) (s,r) (\ (x,y) z-> consumer x y z) z
Perhaps this is not much better though, because where we used to have intermediate list structure, now we have intermediate pair structures. I would like to hope that other optimisations might still be able to remove these as well, but have not yet investigated.
There are optimisations that tackle this situation and GHC implements some but not all of them. Here are some relevant papers: http://research.microsoft.com/%7Esimonpj/Papers/cpr/index.htm ftp://ftp.cs.kun.nl/pub/Clean/papers/2000/groj2000-OptRecFunYTuples.pdf Cheers, /Josef

"Josef Svenningsson"
First of all, I'm glad to see you've also discovered how fun it can be to do transformations of functional programs. This is real fun! :)
Yes, calculating new programs from old is fun, if somewhat mind-bending.
there simply doesn't seem to be a way which maintains the initial simplicity of foldr/build and which enables significantly more fusion.
Perhaps you are right, but the phrase "open research question" invited me to give it a try.
So, what you demonstrate is that given that we have two functions, both represented as unfoldr, then they can be fused with zipWith even though it is represented as a foldr2. But if we try to imagine a more concrete scenario things become more problematic. Which two functions could those be? Suppose they were both map functions. That's fine, map can be represented as an unfoldr. But wait a second! If we want to use foldr/build then we want to represent map in terms of foldr and build - not unfoldr! Dang!
Well, the core of my idea is that instead of two stages, foldr/build or destroy/unfoldr, there are really three: foldr/build/genUnfoldr. I insist that, instead of writing good producers with just build, you must use build + genUnfoldr. This applies equally to hand-written producers and producers generated internally through RULES. There is still only one type of good consumer: those that can fuse foldr with build:
"foldr/build"    foldr f z (build g) = g f z
"foldr/unfoldr"  foldr f z (build (genUnfoldr g s)) = genUnfoldr g s f z
But the crucial new thing is that, whilst there is no way to fuse multiple ordinary 'build' generators into a single pass, you *can* fuse 'genUnfoldr' generators together, to get simultaneous generation of multiple values. I'm still working out all the details - it may yet turn out that you quickly reach dead-ends where further fusion does not occur. But I'm hoping that it is possible to express whole trees of computation (composition pipelines + branching at zipWith) rather than just pipelines, in the one framework.
And this is really the core of the problem with unifying foldr/build and destroy/unfoldr - which representation certain functions should have.
In my scheme, there is a consistent representation, so I think it can be deterministic. Regards, Malcolm

Malcolm Wallace wrote:
Well, the core of my idea is that instead of two stages, foldr/build or destroy/unfoldr, there are really three: foldr/build/genUnfoldr.
As Josef already realized, foldr can be expressed in terms of destroy:
*> foldr c n xs = destroy foldrDU xs
*>   where foldrDU g y = case g y of
*>           Nothing -> n
*>           Just (x,y') -> c x (foldrDU g y')
The converse does not seem to be true, destroy is strictly more expressive than foldr. Similarly any unfoldr can be expressed as a build:
*> unfoldr g y = build (unfoldrFB (g y))
*>   where unfoldrFB Nothing c n = n
*>         unfoldrFB (Just (x,y')) c n = c x (unfoldrFB (g y') c n)
Using these definitions we get a foldr/unfoldr rule for free. (Or do we? I'm leaving the details as an exercise.) What we don't get is a destroy/build rule, but that seems impossible anyway. So to get maximum fusion, the general rule seems to be "prefer to write producers in terms of unfoldr and consumers in terms of foldr". However, the libraries of GHC have to be changed anyway.
I'm still working out all the details - it may yet turn out that you quickly reach dead-ends where further fusion does not occur. But I'm hoping that it is possible to express whole trees of computation (composition pipelines + branching at zipWith) rather than just pipelines, in the one framework.
That's my feeling, too. But that should already be possible using destroy/unfoldr alone. Actually I can think of lots of interesting consumers that (seem to) require destroy (foldl, zip, ReadP), but of no interesting producers that would require build. I'd expect problems in situations with a single producer and multiple consumers, but those aren't currently deforested anyway. Regards, Udo.
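As a small illustration of the foldl case, here is foldl written as a destroy-style consumer; this is a sketch that assumes the same reconstructed-from-memory definition of destroy as in the earlier sketch, and foldlD is a hypothetical name.

{-# LANGUAGE RankNTypes #-}

destroy :: (forall s. (s -> Maybe (a, s)) -> s -> c) -> [a] -> c
destroy g = g step
  where
    step []     = Nothing
    step (x:xs) = Just (x, xs)

-- foldl drives the producer's step function itself, threading an accumulator,
-- so with the rule destroy g (unfoldr psi e) = g psi e it can consume an
-- unfoldr producer without ever building the list
foldlD :: (b -> a -> b) -> b -> [a] -> b
foldlD f z0 =
  destroy (\step s0 ->
    let go z s = case step s of
                   Nothing      -> z
                   Just (x, s') -> go (f z x) s'
    in go z0 s0)

main :: IO ()
main = print (foldlD (+) 0 [1 .. 10 :: Int])   -- 55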

On Apr 11, 2006, at 2:37 PM, Udo Stenzel wrote:
Malcolm Wallace wrote:
Well, the core of my idea is that instead of two stages, foldr/ build or destroy/unfoldr, there are really three: foldr/build/genUnfoldr.
As Josef already realized, foldr can be expressed in terms of destroy:
*> foldr c n xs = destroy foldrDU xs
*>   where foldrDU g y = case g y of
*>           Nothing -> n
*>           Just (x,y') -> c x (foldrDU g y')
The converse does not seem to be true, destroy is strictly more expressive than foldr.
This is an intriguing question, and a way to get hackers like me interested... Proving equivalence would simply be a matter of finding a definition (which would have to use a whole lot of higher-order magic). Now I'm going to have to go back and convince myself it won't work all over again.
Similarly any unfoldr can be expressed as a build:
*> unfoldr g y = build (unfoldrFB (g y))
*>   where unfoldrFB Nothing c n = n
*>         unfoldrFB (Just (x,y')) c n = c x (unfoldrFB (g y') c n)
Using these definitions we get a foldr/unfoldr rule for free.
In fact, the pH compiler (back in 1994) had a deforestation pass based on a version of foldr/unfoldr deforestation, with none of this build or destroy nonsense. This is enough to express quite a few "good producers" and all the "good consumers" in the GHC sense. And we can indeed obtain a zipWith / unfoldr fusion rule, or rather several with an obvious order of application (I'm reconstructing this on the fly from recollections of a subtly different form; it'll be broadly correct but I'll probably mess up a detail along the way).
1) zip (unfoldr f z) (unfoldr g y) = unfoldr (hoist f g) (z,y)
     where hoist f g (z,y) = f z >>= \u -> g y >>= \v -> return (u,v)   -- in the Maybe monad
2) foldr g y (zip (unfoldr f z) xs) = foldr h (const y) xs z
     where h x k z = maybe y (\(v,w)-> g (x,v) (k w)) (f z)
3) as above, with arguments commuted.
4) zip xs ys = unfoldr u (xs,ys)
     where u (x:xs,y:ys) = Just ((x,y),(xs,ys))
           u _ = Nothing
We can squeeze reasonable-looking loops out of higher-order folds if we're willing to do a little bit of work. We do end up relying on transformations like worker/wrapper quite a bit to get rid of all those tuples... But that's what worker/wrapper is for.
So to get maximum fusion, the general rule seems to be "prefer to write producers in terms of unfoldr and consumers in terms of foldr".
Absolutely. The frustrating part: all those list transducers like reverse, take, drop, inits, tails... Some of them are expressible as folds or unfolds, but it's never terribly natural; we should expect the resulting code to be pretty awful. For these, either build or destroy seem to be indispensable. So the real quest here is for something which unites build and destroy. Then foldr and unfoldr are just a side show.
I'm still working out all the details - it may yet turn out that you quickly reach dead-ends where further fusion does not occur. But I'm hoping that it is possible to express whole trees of computation (composition pipelines + branching at zipWith) rather than just pipelines, in the one framework.
As I say, list transducers are a killer. Luckily map, concatMap, and filter are friendly---these are necessary to make comprehensions work out nicely. But other list functions are quite hard. I had a hack for reverse (we can turn foldr into foldl and vice versa, and foldl is just a higher-order foldr), but the general case is quite tricky. This is all very timely, as I've been taking a trip down memory lane lately; in Fortress we are representing catamorphisms in Boom-style fold-form directly as objects, and expressing aggregates as build-functions. -Jan-Willem Maessen
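For reference, the standard encoding behind "foldl is just a higher-order foldr" can be sketched as follows (foldlViaFoldr is a hypothetical name):

-- foldl as a higher-order foldr: the fold builds up a function from the
-- accumulator to the result, and finally applies it to the initial value
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr f z xs = foldr (\x k acc -> k (f acc x)) id xs z

main :: IO ()
main = print (foldlViaFoldr (flip (:)) [] "abcd")   -- "dcba", i.e. reverse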
That's my feeling, too. But that should already be possible using destroy/unfoldr alone. Actually I can think of lots of interesting consumers that (seem to) require destroy (foldl, zip, ReadP), but of no interesting producers that would require build. I'd expect problems in situations with a single producer and multiple consumers, but those aren't currently deforested anyway.
Regards,
Udo.

Jan-Willem Maessen
foldr can be expressed in terms of destroy: The converse does not seem to be true, destroy is strictly more expressive than foldr.
This is an intriguing question, and a way to get hackers like me interested...
Please hack away! I'm just a humble user with an application that is running too slowly due to lots of intermediate structures that the compiler won't throw away without some persuasion ...
we can indeed obtain a zipWith / unfoldr fusion rule
1) zip (unfoldr f z) (unfoldr g y) = unfoldr (hoist f g) (z,y)
     where hoist f g (z,y) = f z >>= \u -> g y >>= \v -> return (u,v)
Yes, I worked something like this one out myself.
2) foldr g y (zip (unfoldr f z) xs) = foldr h (const y) xs z
     where h x k z = maybe y (\(v,w)-> g (x,v) (k w)) (f z)
3) as above, with arguments commuted.
These are the versions that deforest just one of the producers, like ghc already does.
We can squeeze reasonable-looking loops out of higher-order folds if we're willing to do a little bit of work. We do end up relying on transformations like worker/wrapper quite a bit to get rid of all those tuples... But that's what worker/wrapper is for.
What I'm wondering is whether the foldr/unfoldr rules are strictly better than foldr/build, in the sense that they work on all the same subjects, but foldr/unfoldr additionally tackles some situations (zipWith) that foldr/build does not. I'm guessing from what you say they are actually incomparable - each handles some situations the other does not.
Absolutely. The frustrating part: all those list transducers like reverse, take, drop, inits, tails... Some of them are expressible as folds or unfolds, but it's never terribly natural; we should expect the resulting code to be pretty awful. For these, either build or destroy seem to be indispensable.
Ah. In my application I have a lot of these (especially 'drop'). Udo Stenzel wrote:
I'd expect problems in situations with a single producer and multiple consumers, but those aren't currently deforested anyway.
Oops, my application has a lot of *those* as well! For instance,
zipWith4 f xs (drop 1 xs) (drop line xs) (drop plane xs)
Ideally, I want the work of computing xs to be shared, *and* for zipWith4 to be a tight loop, something like
unfoldr generate xs
  where generate f s = (f (s!!0) (s!!1) (s!!line) (s!!plane), tail s)
Note that the intermediate list for xs is still there, really just as a form of ad hoc memoisation buffer. It seems unlikely that deforestation could introduce e.g. a circular buffer or whatever technique you would use in an imperative setting. Maybe some other program-calculational technique is required for this. Graham Hutton has a nice way of deriving such machine-level operational concepts from pure code. His example is exception-handling, where you start with a functional specification of what it should mean, and the technique derives stack markers to implement it. I wonder if the same technique could derive the minimal sharing buffer for the arguments of zipWith? IIRC, first you de-functionalise the specification, then use the continuation-passing transform, then re-functionalise. I'll have to look it up. Regards, Malcolm
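A runnable reading of the "something like" sketch above might look as follows; stencil4 is a hypothetical name, and it assumes 1 <= line <= plane so that the list indices stay in range. It should compute the same thing as zipWith4 f xs (drop 1 xs) (drop line xs) (drop plane xs), with the list xs itself acting as the sliding buffer.

import Data.List (unfoldr)

-- a single unfoldr loop over the shared list; the buffer slides by one
-- element per step
stencil4 :: (a -> a -> a -> a -> b) -> Int -> Int -> [a] -> [b]
stencil4 f line plane = unfoldr step
  where
    step s
      | longEnough (plane + 1) s =
          Just (f (s !! 0) (s !! 1) (s !! line) (s !! plane), tail s)
      | otherwise = Nothing
    -- at least n elements remain, checked without forcing the whole list
    longEnough n = (== n) . length . take n

main :: IO ()
main = print (stencil4 (\a b c d -> a + b + c + d) 3 6 [0 .. 20 :: Int])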

On Apr 12, 2006, at 10:25 AM, Malcolm Wallace wrote:
Jan-Willem Maessen
wrote: We can squeeze reasonable-looking loops out of higher-order folds if we're willing to do a little bit of work. We do end up relying on transformations like worker/wrapper quite a bit to get rid of all those tuples... But that's what worker/wrapper is for.
What I'm wondering is whether the foldr/unfoldr rules are strictly better than foldr/build, in the sense that they work on all the same subjects, but foldr/unfoldr additionally tackles some situations (zipWith) that foldr/build does not. I'm guessing from what you say they are actually incomparable - each handles some situations the other does not.
That is my present understanding, yes---all because there's a foldr/ build, a foldr/unfoldr, and a destroy/unfoldr, but no destroy/build.
Absolutely. The frustrating part: all those list transducers like reverse, take, drop, inits, tails... Some of them are expressible as folds or unfolds, but it's never terribly natural; we should expect the resulting code to be pretty awful. For these, either build or destroy seem to be indispensable.
Ah. In my application I have a lot of these (especially 'drop').
Actually, take and drop aren't so hard (proving that my memory is indeed a bit fuzzy):
take n = map snd . filter ((
Udo Stenzel wrote:
I'd expect problems in situations with a single producer and multiple consumers, but those aren't currently deforested anyway.
Oops, my application has a lot of *those* as well! For instance,
zipWith4 f xs (drop 1 xs) (drop line xs) (drop plane xs)
Ideally, I want the work of computing xs to be shared, *and* for zipWith4 to be a tight loop, something like
unfoldr generate xs where generate f s = (f (s!!0) (s!!1) (s!!line) (s!!plane), tail s)
Yes, multiple consumers are a problem, and RULES can't capture the most interesting cases. If we transform the whole program into arrow-form we could turn all those binding constructs into function compositions and reason about them equationally, but I'm pretty sure that would *not* be an improvement in the average hacker's life. Haven't seen Graham Hutton's work on exception handling, but it sounds like it has this flavor. Cool, but very tricky. -Jan
Note that the intermediate list for xs is still there, really just as a form of ad hoc memoisation buffer. It seems unlikely that deforestation could introduce e.g. a circular buffer or whatever technique you would use in an imperative setting.
Maybe some other program-calculational technique is required for this. Graham Hutton has a nice way of deriving such machine-level operational concepts from pure code. His example is exception-handling, where you start with a functional specification of what it should mean, and the technique derives stack markers to implement it. I wonder if the same technique could derive the minimal sharing buffer for the arguments of zipWith? IIRC, first you de-functionalise the specification, then use the continuation-passing transform, then re-functionalise. I'll have to look it up.
Regards, Malcolm

Josef Svenningsson wrote:
On 4/10/06, Malcolm Wallace
wrote: Are the larger sizes of zipWith (zipWith3, zipWith4, ...) also good producers? I have some evidence from a recent application I have been writing that they may not be. At least, reading the -ddump-simpl output, it does not seem clear-cut that a pipeline e.g. (concat . f . map g . zipWith3 h as bs) cs is turned into a simple loop, although perhaps I am not reading it right. Is there any received wisdom on what should be expected here?
The GHC documentation is rather clear at this point. See: http://www.haskell.org/ghc/docs/latest/html/users_guide/rewrite-rules.html#i...
The larger zipWiths are not listed as good producers, hence you cannot expect fusion to happen.
Thanks for the pointer, I wasn't aware of that :) In fact, I recently added RULES for deforesting take, so I need to add it to that list. I don't see any good reason why the larger zipWiths cannot be good producers, it just needs somebody to write the RULES and check that they work. The tricky bit is usually making sure that the foldr/build versions (just build in this case) revert to the efficient variant in the event that fusion doesn't happen. Cheers, Simon
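As a sketch of what such rules might look like (this is not the actual GHC source; zipWith3', foldr3 and zipWith3FB are hypothetical names modelled on GHC.Base's foldr2 and zipWithFB):

import GHC.Exts (build)

-- a local copy of zipWith3, so the rules below refer to an in-module
-- function; NOINLINE until phase 1 so the rewrite rule gets a chance to fire
zipWith3' :: (a -> b -> c -> d) -> [a] -> [b] -> [c] -> [d]
zipWith3' f (a:as) (b:bs) (c:cs) = f a b c : zipWith3' f as bs cs
zipWith3' _ _ _ _ = []
{-# NOINLINE [1] zipWith3' #-}

-- a three-list analogue of GHC.Base's foldr2 (hypothetical)
foldr3 :: (a -> b -> c -> d -> d) -> d -> [a] -> [b] -> [c] -> d
foldr3 k z = go
  where
    go (a:as) (b:bs) (c:cs) = k a b c (go as bs cs)
    go _      _      _      = z

-- the "FB" worker, by analogy with zipWithFB
zipWith3FB :: (d -> e -> e) -> (a -> b -> c -> d) -> a -> b -> c -> e -> e
zipWith3FB cons f a b c r = f a b c `cons` r

{-# RULES
"zipWith3'"     [~1] forall f as bs cs. zipWith3' f as bs cs =
                       build (\cons nil -> foldr3 (zipWith3FB cons f) nil as bs cs)
"zipWith3'List" [1]  forall f. foldr3 (zipWith3FB (:) f) [] = zipWith3' f
  #-}

main :: IO ()
main = print (zipWith3' (\a b c -> a + b + c) [1,2,3] [4,5,6] [7,8,9 :: Int])

The second rule is the usual safety net: if no fusion happens, the build/foldr3 form is rewritten back to the direct recursive version, which is the tricky bit Simon mentions.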

Simon Marlow wrote:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
I propose to replace inits in Data.List with this one.
I think such a function should not use arithmetic (comparisons of Ints via take).
Christian
For finite lists "reverse . map reverse . tails . reverse" is quite good (but it is 80% garbage collecting on my machine)

Simon Marlow wrote:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
I propose to replace inits in Data.List with this one.
I think such a function should not use arithmetic (comparisons of Ints via take)
Christian
Christian Maeder wrote:
for finite lists "reverse . map reverse . tails . reverse" is quite good (but it is 80% garbage collecting on my machine)
This seems faster when you want to actually use each value of the inits (say, the sum of the map sum on inits). But it won't work for infinite lists, and will take longer to get, say, the 6th element of the inits, depending on the length of the list. So inits [1..1000000] !! 100 will take longer than inits [1..1000] !! 100 with your version, since it is going to reverse the whole list. Tim Toorop.

Simon Marlow wrote:
Spencer Janssen wrote:
Earlier today on the #haskell IRC channel, Tim Toorop (bolrod on #haskell) pointed out that Data.List.inits is rather slow, and proposed an alternative. After some collaborative tweaking, we came up with the following:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
I propose to replace inits in Data.List with this one. Objections about the relaxed strictness are noted, but I subscribe to the view that the original is more strict than necessary, and the general trend for Data.List functions is to be as lazy as possible.
Cheers, Simon
If the goal is speed, then this definition is running over 10% faster with ghc -O2 on my powerbook for (sum $ map length $ inits [1..10000]):
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
-- Chris Kuklewicz

Chris Kuklewicz wrote:
Simon Marlow wrote:
Spencer Janssen wrote:
Earlier today on the #haskell IRC channel, Tim Toorop (bolrod on #haskell) pointed out that Data.List.inits is rather slow, and proposed an alternative. After some collaborative tweaking, we came up with the following:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
I propose to replace inits in Data.List with this one. Objections about the relaxed strictness are noted, but I subscribe to the view that the original is more strict than necessary, and the general trend for Data.List functions is to be as lazy as possible.
Cheers, Simon
If the goal is speed, then this definition is running over 10% faster with ghc -O2 on my powerbook for (sum $ map length $ inits [1..10000])
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
Seems to be quite a bit faster in some cases! Though in other cases it's a bit slower. It probably depends on what you want to use the inits for, which one might be better. Though when getting the actual contents of inits (like with sum $ map sum $ inits) yours sure is faster! -- Tim Toorop

Chris Kuklewicz wrote:
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
This one looks nice, maybe the helper should be changed to allow "head $ inits' undefined":
helper f xs = f [] : case xs of
  [] -> []
  x : r -> helper (f . (x :)) r
C.

On Mon, Apr 10, 2006 at 03:54:09PM +0100, Chris Kuklewicz wrote:
If the goal is speed, then this definition is running over 10% faster with ghc -O2 on my powerbook for (sum $ map length $ inits [1..10000])
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
I rather like
inits = map ($ []) . scanl (.) id . map (:)
but this is also competitive:
inits = map reverse . scanl (flip (:)) []

Ross Paterson wrote:
On Mon, Apr 10, 2006 at 03:54:09PM +0100, Chris Kuklewicz wrote:
If the goal is speed, then this definition is running over 10% faster with ghc -O2 on my powerbook for (sum $ map length $ inits [1..10000])
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
I rather like
inits = map ($ []) . scanl (.) id . map (:)
That takes 3 times longer than the helper function definition. Barring fusion, it is creating an extra list. The "scanl" makes a list, the "map" makes a list and the "(f [])" makes a list. The helper function makes a list with "(f[])" and with "(...):helper...".
but this is also competitive:
inits = map reverse . scanl (flip (:)) []
I would never try "reverse" when looking for performance, but that runs at the same speed as the helper and allocates the same amount of space. -- Chris

Chris Kuklewicz wrote:
Ross Paterson wrote:
On Mon, Apr 10, 2006 at 03:54:09PM +0100, Chris Kuklewicz wrote:
If the goal is speed, then this definition is running over 10% faster with ghc -O2 on my powerbook for (sum $ map length $ inits [1..10000])
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
I rather like
inits = map ($ []) . scanl (.) id . map (:)
That takes 3 times longer than the helper function definition. Barring fusion, it is creating an extra list. The "scanl" makes a list, the "map" makes a list and the "(f [])" makes a list.
The helper function makes a list with "(f[])" and with "(...):helper...".
but this is also competitive:
inits = map reverse . scanl (flip (:)) []
I would never try "reverse" when looking for performance, but that runs at the same speed as the helper and allocates the same amount of space.
I think we need to create a test case or something, before we say one function is better than the other. Just like with the helper function. The originally proposed function is sometimes a tad faster than the helper function. And sometimes very much slower. So we need to know what is important in the use of inits. And then see which one is faster, probably with -O2 added, because the functions seem to act very differently in GHCi (VERY!) -- Tim Toorop

Tim Toorop wrote:
Chris Kuklewicz wrote:
Ross Paterson wrote:
On Mon, Apr 10, 2006 at 03:54:09PM +0100, Chris Kuklewicz wrote:
If the goal is speed, then this definition is running over 10% faster with ghc -O2 on my powerbook for (sum $ map length $ inits [1..10000])
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
I rather like
inits = map ($ []) . scanl (.) id . map (:)
That takes 3 times longer than the helper function definition. Barring fusion, it is creating an extra list. The "scanl" makes a list, the "map" makes a list and the "(f [])" makes a list.
The helper function makes a list with "(f[])" and with "(...):helper...".
but this is also competitive:
inits = map reverse . scanl (flip (:)) []
I would never try "reverse" when looking for performance, but that runs at the same speed as the helper and allocates the same amount of space.
I think we need to create a test case or something, before we say one function is better than the other.
Defining best is tricky. That is why I told everyone my usage was (sum $ map length $ inits [1..10000]).
Just like with the helper function. The originally proposed function is sometimes a tad faster than the helper function. And sometimes very much slower.
Could you tell me what your usage is?
So we need to know what is important in the use of inits.
I am going out on a limb here and say: the usage metric must consume all of the output of inits. If someone wants (1) less than all the output of inits, and (2) wants the highest performance, then they should consider writing a more specialized function to use instead of inits. If someone wants (1) all of the output of inits and (2) wants the highest performance, then they should not have to replace the inits in Data.List
And then see which one is faster with probably -O2 added.
Testing performance without -O2 is interesting, but anyone who cares whether Data.List.inits gets replaced will be using optimizations.
Because the functions seem act very differently in the ghci (VERY!)
Of course.
-- Tim Toorop

Chris Kuklewicz wrote:
Tim Toorop wrote:
Chris Kuklewicz wrote:
Ross Paterson wrote:
On Mon, Apr 10, 2006 at 03:54:09PM +0100, Chris Kuklewicz wrote:
If the goal is speed, then this definition is running over 10% faster with ghc -O2 on my powerbook for (sum $ map length $ inits [1..10000])
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
I rather like
inits = map ($ []) . scanl (.) id . map (:)
That takes 3 times longer than the helper function definition. Barring fusion, it is creating an extra list. The "scanl" makes a list, the "map" makes a list and the "(f [])" makes a list.
The helper function makes a list with "(f[])" and with "(...):helper...".
but this is also competitive:
inits = map reverse . scanl (flip (:)) []
I would never try "reverse" when looking for performance, but that runs at the same speed as the helper and allocates the same amount of space.
I think we need to create a test case or something before we say one function is better than the other.
Defining best is tricky. That is why I told everyone my usage was (sum $ map length $ inits [1..10000]).
Just like with the helper function. The originally proposed function is sometimes a tad faster than the helper function. And sometimes very much slower.
Could you tell me what your usage is?
So we need to know what is important in the use of inits.
I am going to go out on a limb here and say: the usage metric must consume all of the output of inits.
If someone wants (1) less than all of the output of inits and (2) the highest performance, then they should consider writing a more specialized function to use instead of inits.
If someone wants (1) all of the output of inits and (2) the highest performance, then they should not have to replace the inits in Data.List.
And then see which one is faster, probably with -O2 added.
Testing performance without -O2 is interesting, but anyone who cares whether Data.List.inits gets replaced will be using optimizations.
Because the functions seem to act very differently in ghci (VERY!)
Of course.
Ok, I did some tests, and I think the inits with the helper is generally the fastest when compiling with -O2. (I also don't think ghci matters much, since hardly anybody runs Haskell programs from ghci.) For those of you who do want to use it in ghci: the inits with the helper function is by far the slowest there, and inits3 is the best for ghci use (and fits easily in a "let inits ... in inits").

inits1 xs = [] : (zipWith take [1..] $ map (const xs) xs)

inits2 = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs

inits3 = map reverse . scanl (flip (:)) []

Examples:

main = print (sum $ map sum $ inits1 [1..20000])
  inits1: real 0m17.710s  user 0m17.457s  sys 0m0.136s
  inits2: real 0m15.489s  user 0m15.265s  sys 0m0.128s
  inits3: real 0m12.251s  user 0m12.080s  sys 0m0.107s

main = print (sum $ inits1 [1..] !! 5000000)
  inits1: real 0m4.909s   user 0m4.393s   sys 0m0.417s
  inits2: real 0m5.992s   user 0m5.477s   sys 0m0.435s
  inits3: real 0m8.277s   user 0m7.780s   sys 0m0.401s

So you see, there is not 'a' fastest inits here.

-- Tim Toorop

Full +RTS -sstderr statistics for the compiled (-O2) runs, key fields only (same columns in every block):

main = print (sum $ map sum $ inits1 [1..20000])   -- output: 1333533340000
  variant  allocated      copied (GC)    max residency  MUT    GC     total   %GC    real
  inits1   6,405,773,340  2,124,612,484        426,528  9.77s  7.74s  17.51s  44.2%  0m17.710s
  inits2   4,006,422,940  2,136,413,112        655,988  8.54s  6.67s  15.21s  43.9%  0m15.489s
  inits3   4,006,503,460  1,247,253,272        660,184  8.65s  3.42s  12.07s  28.3%  0m12.251s

main = print (length $ map sum $ inits1 [1..10000000])   -- output: 10000001
  inits1   1,927,409,632    484,969,800    142,899,112  2.42s  2.59s   5.01s  51.7%  0m5.567s
  inits2   1,241,220,400    701,736,288    151,593,700  1.60s  3.31s   4.91s  67.4%  0m5.414s
  inits3   1,161,924,400    785,531,992    153,038,768  1.59s  3.75s   5.34s  70.2%  0m5.884s

main = print (length $ inits1 [1..10000000])   -- output: 10000001
  inits1   1,563,333,496    491,073,020    146,087,820  2.13s  2.59s   4.72s  54.9%  0m5.183s
  inits2     880,860,504    718,379,884    160,190,556  1.42s  3.11s   4.53s  68.7%  0m5.198s
  inits3   1,161,924,400    785,531,992    153,038,768  1.41s  3.76s   5.17s  72.7%  0m5.738s

main = print (or $ concat $ inits1 (replicate 20000 False ++ [True]))   -- output: True
  inits1   4,822,652,444  1,427,002,776        266,116  5.57s  6.05s  11.62s  52.1%  0m11.893s
  inits2   2,404,621,440    731,744,312         28,236  3.31s  1.88s   5.19s  36.2%  0m5.374s
  inits3   2,404,856,640    729,353,700         30,124  3.07s  2.11s   5.18s  40.7%  0m5.319s

main = print (sum $ inits1 [1..] !! 5000000)   -- output: 12500002500000
  inits1   1,319,160,468    408,728,680     91,254,564  1.91s  2.48s   4.39s  56.5%  0m4.909s
  inits2     916,901,092    681,934,332    120,143,084  1.83s  3.64s   5.47s  66.5%  0m5.992s
  inits3   1,117,490,752    801,070,420    136,574,424  1.84s  5.93s   7.77s  76.3%  0m8.277s

main = print (sum $ map sum $ take 7000 $ drop 10000 $ inits1 [1..])   -- output: 652166665500
  inits1   3,027,739,192  1,198,514,144        366,448  4.62s  4.50s   9.12s  49.3%  0m9.342s
  inits2   1,893,647,932    981,247,684        566,448  4.35s  2.78s   7.13s  39.0%  0m7.309s
  inits3   1,894,330,720    979,096,380        564,824  4.24s  2.87s   7.11s  40.4%  0m7.251s

main = print (inits1 [1..4000])
  inits1   1,400,787,732    388,860,884        106,012  4.54s  2.06s   6.60s  31.2%  0m6.761s
  inits2   1,303,507,236    397,923,788        153,240  4.07s  2.15s   6.22s  34.6%  0m6.519s
  inits3   1,303,667,780    397,794,396        152,220  4.25s  2.08s   6.33s  32.9%  0m6.526s

main = print (sum $ map sum $ subseqs [1..1000])
  where subseqs xs = [] : [ t | i <- inits xs, t <- tails i, not (null t) ]
  inits1   1,424,286,576     17,410,888         56,944  4.99s  0.11s   5.10s   2.2%  0m5.236s
  inits2   1,418,214,180     17,529,592         57,084  5.10s  0.03s   5.13s   0.6%  0m5.189s
  inits3   1,418,217,652     17,461,052         56,828  5.13s  0.13s   5.27s   2.5%  0m5.403s

main = print (length $ subseqs [1..10000])   -- output: 50005001
  inits1   3,615,973,512    982,547,684        226,180  5.67s  4.93s  10.60s  46.5%  0m10.804s
  inits2   3,011,192,612  1,162,573,012        334,036  4.70s  4.73s   9.43s  50.2%  0m9.612s
  inits3   3,011,195,876  1,163,335,136        331,672  4.92s  4.49s   9.41s  47.7%  0m9.610s

ghci:
  print (sum $ map sum $ inits1 [1..5000])         => 20845835000  (3.12 secs,  603772288 bytes)
  print (sum $ map sum $ inits2 [1..5000])         => 20845835000  (5.74 secs,  602081548 bytes)
  print (sum $ map sum $ inits3 [1..5000])         => 20845835000  (2.48 secs,  451953816 bytes)
  print (length $ map sum $ inits1 [1..10000000])  => 10000001     (6.18 secs,  1928079404 bytes)
  print (length $ map sum $ inits2 [1..10000000])  => 10000001     (15.44 secs, 1485182412 bytes)
  print (length $ map sum $ inits3 [1..10000000])  => 10000001     (4.30 secs,  1526992312 bytes)
  print (length $ inits1 [1..10000000])            => 10000001     (7.74 secs,  1563914564 bytes)
  print (length $ inits2 [1..10000000])            => 10000001     (15.15 secs, 1123320716 bytes)
  print (length $ inits3 [1..10000000])            => 10000001     (3.92 secs,  1162444200 bytes)
  print (or $ concat $ inits1 (replicate 20000 False ++ [True]))  => True  (17.64 secs, 9641524032 bytes)
  print (or $ concat $ inits2 (replicate 20000 False ++ [True]))  => True  (53.36 secs, 9630370716 bytes)
  print (or $ concat $ inits3 (replicate 20000 False ++ [True]))  => True  (10.42 secs, 7225509420 bytes)
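For reference, the subseqs generator used in the last two compiled tests above, as a standalone program on a deliberately tiny input (the input size here is only illustrative):

import Data.List (inits, tails)

-- Every non-empty contiguous sublist (each prefix's non-empty tails), plus the empty list once.
subseqs :: [a] -> [[a]]
subseqs xs = [] : [ t | i <- inits xs, t <- tails i, not (null t) ]

main :: IO ()
main = print (sum (map sum (subseqs [1 .. 10 :: Integer])))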

... and the conclusion is? I quite like Ross's version

    inits = map reverse . scanl (flip (:)) []

and it wins on at least one of the tests.

Cheers, Simon

On Mon, Apr 10, 2006 at 06:00:59PM +0100, Chris Kuklewicz wrote:
Ross Paterson wrote:
On Mon, Apr 10, 2006 at 03:54:09PM +0100, Chris Kuklewicz wrote:
If the goal is speed, then this definition is running over 10% faster with ghc -O2 on my powerbook for (sum $ map length $ inits [1..10000])
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
I rather like
inits = map ($ []) . scanl (.) id . map (:)
That takes 3 times longer than the helper function definition.
Sorry, I meant it appeals to my perverse sense of aesthetics. It is of course the de-fused version of helper.
inits = map reverse . scanl (flip (:)) []
I would never try "reverse" when looking for performance, but that runs at the same speed as the helper and allocates the same amount of space.
It's not really surprising: the nested composition built by helper is essentially a list, which is traversed by ($ []). If scanl were defined using build it might run a tiny bit faster.
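A small sanity check of that correspondence (the names initsHelper and initsScan are ad hoc): both formulations build the same prefixes, and agree with Data.List.inits on a small input.

import Data.List (inits)

initsHelper :: [a] -> [[a]]
initsHelper = helper id
  where helper f []     = [f []]
        helper f (x:xs) = f [] : helper (f . (x:)) xs

-- The "de-fused" version: scanl (.) id builds the same prefix-building
-- closures that helper threads through its accumulator.
initsScan :: [a] -> [[a]]
initsScan = map ($ []) . scanl (.) id . map (:)

main :: IO ()
main = print ( initsHelper "abc" == initsScan "abc"
            && initsScan "abc" == inits "abc" )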

ross:
On Mon, Apr 10, 2006 at 06:00:59PM +0100, Chris Kuklewicz wrote:
Ross Paterson wrote:
On Mon, Apr 10, 2006 at 03:54:09PM +0100, Chris Kuklewicz wrote:
If the goal is speed, then this definition is running over 10% faster with ghc -O2 on my powerbook for (sum $ map length $ inits [1..10000])
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
I rather like
inits = map ($ []) . scanl (.) id . map (:)
That takes 3 times longer than the helper function definition.
Sorry, I meant it appeals to my perverse sense of aesthetics. It is of course the de-fused version of helper.
inits = map reverse . scanl (flip (:)) []
I would never try "reverse" when looking for performance, but that runs at the same speed as the helper and allocates the same amount of space.
It's not really surprising: the nested composition built by helper is essentially a list, which is traversed by ($ []). If scanl were defined using build it might run a tiny bit faster.
Ah! Looks a bit like DList code, from a lib by Manuel Chakravarty:

-- | a difference list is a function that given a list returns the original
-- contents of the difference list prepended at the given list
type DList a = [a] -> [a]

-- | open a list for use as a difference list
openDL :: [a] -> DList a
openDL = (++)

-- | create a difference list containing no elements
zeroDL :: DList a
zeroDL = id

-- | create difference list with given single element
unitDL :: a -> DList a
unitDL = (:)

-- | append a single element at a difference list
snocDL :: DList a -> a -> DList a
snocDL dl x = \l -> dl (x:l)

-- | appending difference lists
joinDL :: DList a -> DList a -> DList a
joinDL = (.)

-- | closing a difference list into a normal list
closeDL :: DList a -> [a]
closeDL = ($[])

Which I've used on occasion to write faster code when doing lots of appends.

-- Don
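In that vocabulary, inits itself comes out as a one-liner. The following standalone sketch (with the relevant DList operations repeated so it compiles on its own, and the name initsDL purely illustrative) is the same idea as the scanl (.) id version above:

type DList a = [a] -> [a]

zeroDL :: DList a
zeroDL = id

snocDL :: DList a -> a -> DList a
snocDL dl x = \l -> dl (x:l)

closeDL :: DList a -> [a]
closeDL = ($ [])

-- Accumulate each prefix as a difference list, then close them all.
initsDL :: [a] -> [[a]]
initsDL = map closeDL . scanl snocDL zeroDL

main :: IO ()
main = print (initsDL "abc")   -- ["","a","ab","abc"]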

Ross Paterson wrote:
On Mon, Apr 10, 2006 at 03:54:09PM +0100, Chris Kuklewicz wrote:
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
inits = map reverse . scanl (flip (:)) []
It's not really surprising: the nested composition built by helper is essentially a list, which is traversed by ($ []). If scanl were defined using build it might run a tiny bit faster.
An alternative helper (or an expansion of scanl) would be:

inits = helper []
  where helper l xs = reverse l : case xs of
                        []    -> []
                        x : r -> helper (x : l) r

C.
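Ross's aside about defining scanl with build could be sketched roughly as follows; this is GHC-specific, and whether it actually fuses better here is an assumption, not something measured in this thread:

import GHC.Exts (build)

-- scanl expressed as a good producer via build, so list fusion can consume it.
scanlB :: (b -> a -> b) -> b -> [a] -> [b]
scanlB f z0 xs0 = build (\c n ->
  let go z []       = z `c` n
      go z (x : xs) = z `c` go (f z x) xs
  in go z0 xs0)

initsB :: [a] -> [[a]]
initsB = map reverse . scanlB (flip (:)) []

main :: IO ()
main = print (initsB "abc")   -- ["","a","ab","abc"]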

Hello Ross, Monday, April 10, 2006, 8:25:53 PM, you wrote:
I rather like
inits = map ($ []) . scanl (.) id . map (:)
but this is also competitive:
inits = map reverse . scanl (flip (:)) []
it seems that the number of `inits` implementations is getting close to that of `fac` :) Haskellers, have you written your own version of inits yet?!

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Mon, 10 Apr 2006, Chris Kuklewicz
If the goal is speed, then this definition is running over 10% faster with ghc -O2 on my powerbook for (sum $ map length $ inits [1..10000])
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
This function looks like it's exactly equivalent to the H98 one, but I haven't checked the details. Furthermore, this definition made me think of a flaw in many of the others: they won't work for infinite lists. (Note that length ([1..] :: [Int]) = maxBound :: Int.) -- /NAD
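The infinite-list requirement is easy to state as a concrete check: a correct inits must be productive on an infinite input, so any finite prefix of its result can be demanded. A tiny illustration using the current Data.List.inits:

import Data.List (inits)

main :: IO ()
main = print (take 4 (inits [1 ..]))
-- expected output: [[],[1],[1,2],[1,2,3]]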

Nils Anders Danielsson wrote:
On Mon, 10 Apr 2006, Chris Kuklewicz
wrote: If the goal is speed, then this definition is running over 10% faster with ghc -O2 on my powerbook for (sum $ map length $ inits [1..10000])
inits' = helper id
  where helper f [] = (f []):[]
        helper f (x:xs) = (f []):helper (f.(x:)) xs
This function looks like it's exactly equivalent to the H98 one, but I haven't checked the details.
The Haskell 98 Report http://www.haskell.org/onlinereport/list.html gives:
-- inits xs returns the list of initial segments of xs, shortest first.
-- e.g., inits "abc" == ["","a","ab","abc"]
inits :: [a] -> [[a]]
inits []     = [[]]
inits (x:xs) = [[]] ++ map (x:) (inits xs)
Which, using ghc-6.4.2 -O2, runs the benchmark operation that I am using (sum $ map length $ inits [1..10000]) in 50.9 seconds. (Just as fast as GHC's Data.List.inits, so I expect that GHC uses the code in the report.) The actual timing output:
pamac-cek10:/tmp chrisk$ time ./i6
50005000
real 0m56.197s user 0m49.856s sys 0m1.003s
The helper code runs in 5.2 seconds:
pamac-cek10:/tmp chrisk$ time ./i2
50005000
real 0m5.750s user 0m5.114s sys 0m0.111s
Running with +RTS -sstderr -RTS shows the code in the report is allocating exactly 4 times as much space on the heap. The speed does not change if I use main = print $ sum $ map length $ inits2 (take 10000 [True,True ..])
Furthermore, this definition made me think of a flaw in many of the others: they won't work for infinite lists. (Note that length ([1..] :: [Int]) = maxBound :: Int.)
Subtle problem. But [1..] defaults to Integer, which is "infinite" enough. The original example will need to use genericTake ...
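A sketch of what "using genericTake" could look like for the originally proposed definition; the name initsG is ad hoc, and this variant has not been run against the benchmarks above:

import Data.List (genericTake)

-- Like the proposed inits, but the prefix lengths are Integers, so the
-- indices cannot overflow on very long (or infinite) inputs.
initsG :: [a] -> [[a]]
initsG xs = [] : zipWith genericTake [(1 :: Integer) ..] (map (const xs) xs)

main :: IO ()
main = print (initsG "abc")   -- ["","a","ab","abc"]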

On 2006-04-10, Simon Marlow
Spencer Janssen wrote:
Earlier today on the #haskell IRC channel, Tim Toorop (bolrod on #haskell) pointed out that Data.List.inits is rather slow, and proposed an alternative. After some collabrative tweaking, we came up with the following:
inits xs = [] : (zipWith take [1..] $ map (const xs) xs)
I propose to replace inits in Data.List with this one.
Woo! -- Aaron Denney -><-
participants (15)
- Aaron Denney
- Bulat Ziganshin
- Chris Kuklewicz
- Christian Maeder
- dons@cse.unsw.edu.au
- Jan-Willem Maessen
- Josef Svenningsson
- Malcolm Wallace
- Nils Anders Danielsson
- Ross Paterson
- Sebastian Sylvan
- Simon Marlow
- Spencer Janssen
- Tim Toorop
- Udo Stenzel