missing optimization for (++)

newer
VerifyThis 2018: Call for...

Ben Franksen

4 Mar 2018 4 Mar '18

9:32 a.m.

I found that in base [1] we have (++) [] ys = ys (++) (x:xs) ys = x : xs ++ ys I had expected there to be a clause (++) xs [] = xs at the top, which would avoid the list traversal in this common case. Is this somehow magically taken care of by the {-# RULES "++" [~1] forall xs ys. xs ++ ys = augment (\c n -> foldr c n xs) ys #-} ? No, inlining @augment g xs = g (:) xs@ gives me (\c n -> foldr c n xs) (:) ys = foldr (:) ys xs which still traverses xs even if ys=[]. Any thoughts? [1] http://hackage.haskell.org/package/base-4.10.1.0/docs/src/GHC.Base.html#%2B%... Cheers Ben

Show replies by date

Clinton Mead

4 Mar 4 Mar

9:42 a.m.

Adding that case will require one to evaluate the second argument to check it's empty before allowing one to examine the result. Consider `x ++ some_list_that_takes_a_long_time_to_produce_its_first_element`. In this case your proposal will not be an optimisation. On Sun, Mar 4, 2018 at 8:32 PM, Ben Franksen wrote:

...

I found that in base [1] we have

(++) [] ys = ys (++) (x:xs) ys = x : xs ++ ys

I had expected there to be a clause

(++) xs [] = xs

at the top, which would avoid the list traversal in this common case.

Is this somehow magically taken care of by the

{-# RULES "++" [~1] forall xs ys. xs ++ ys = augment (\c n -> foldr c n xs) ys #-}

? No, inlining @augment g xs = g (:) xs@ gives me

(\c n -> foldr c n xs) (:) ys

= foldr (:) ys xs

which still traverses xs even if ys=[].

Any thoughts?

[1] http://hackage.haskell.org/package/base-4.10.1.0/docs/ src/GHC.Base.html#%2B%2B

Cheers Ben

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Sven Panne

5 Mar 5 Mar

8:02 a.m.

2018-03-04 10:42 GMT+01:00 Clinton Mead :

...

Adding that case will require one to evaluate the second argument to check it's empty before allowing one to examine the result.

Consider `x ++ some_list_that_takes_a_long_time_to_produce_its_first_ element`.

In the extreme, evaluating the 2nd argument might not even terminate or it could throw an exception.

...

In this case your proposal will not be an optimisation.

I would go even a step further: The proposed additional case would not just be worse for some cases, it would be completely wrong: The Prelude part of the Haskell Report specifies among other things the strictness of function arguments. Changing an argument from non-strict to strict could have very severe consequences, e.g. non-terminaton, throwing exceptions where no exceptions would be thrown in the original definition, etc. For (++), the Prelude says that it is strict in its first argument, but non-strict in its second: https://www.haskell.org/onlinereport/haskell2010/haskellch9.html#verbatim-24... Very similar functions in this respect are (&&) and (||), and I guess people would be a bit upset if they suddenly forced evaluation of their second argument... :-)

Ben Franksen

12:13 p.m.

Am 05.03.2018 um 09:02 schrieb Sven Panne:

...

2018-03-04 10:42 GMT+01:00 Clinton Mead :

...
Adding that case will require one to evaluate the second argument to check it's empty before allowing one to examine the result.

Consider `x ++ some_list_that_takes_a_long_time_to_produce_its_first_ element`.

In the extreme, evaluating the 2nd argument might not even terminate or it could throw an exception.

...
In this case your proposal will not be an optimisation.

I would go even a step further: The proposed additional case would not just be worse for some cases, it would be completely wrong: The Prelude part of the Haskell Report specifies among other things the strictness of function arguments. [...]

Okay, okay, I got it. I did not think about strictness when I asked. The funny thing is that the two fusion rules combined, as explained by Josef, seem to cause this shortcut to be taken. But that can't be true because (++) really is non-strict, I tested that, with -O2. How do you explain that? Cheers Ben

Li-yao Xia

12:40 p.m.

On 03/05/2018 07:13 AM, Ben Franksen wrote:

...

Okay, okay, I got it. I did not think about strictness when I asked. The funny thing is that the two fusion rules combined, as explained by Josef, seem to cause this shortcut to be taken. But that can't be true because (++) really is non-strict, I tested that, with -O2. How do you explain that?

Rewrite rules apply at compile time and don't force any computation. The second rule fires only if the second argument of (++) is syntactically []. Otherwise the code doesn't change, and strictness is preserved. Cheers, Li-yao

Ben Franksen

10:18 p.m.

Am 05.03.2018 um 13:40 schrieb Li-yao Xia:

...

On 03/05/2018 07:13 AM, Ben Franksen wrote:

...
Okay, okay, I got it. I did not think about strictness when I asked. The funny thing is that the two fusion rules combined, as explained by Josef, seem to cause this shortcut to be taken. But that can't be true because (++) really is non-strict, I tested that, with -O2. How do you explain that?

Rewrite rules apply at compile time and don't force any computation. The second rule fires only if the second argument of (++) is syntactically []. Otherwise the code doesn't change, and strictness is preserved.

Thanks, yet another thing learned. So let ys = [] in xs ++ ys will traverse the spine of xs but xs ++ [] will not. Interesting. (But who writes something like "xs ++ []" in a real program?) Cheers Ben

Brandon Allbery

10:24 p.m.

Generated code, sometimes including "deriving"-generated code. On Mon, Mar 5, 2018 at 5:18 PM, Ben Franksen wrote:

...

Am 05.03.2018 um 13:40 schrieb Li-yao Xia:

...
On 03/05/2018 07:13 AM, Ben Franksen wrote:

...
Okay, okay, I got it. I did not think about strictness when I asked. The funny thing is that the two fusion rules combined, as explained by Josef, seem to cause this shortcut to be taken. But that can't be true because (++) really is non-strict, I tested that, with -O2. How do you explain that?

Rewrite rules apply at compile time and don't force any computation. The second rule fires only if the second argument of (++) is syntactically []. Otherwise the code doesn't change, and strictness is preserved.

Thanks, yet another thing learned. So

let ys = [] in xs ++ ys

will traverse the spine of xs but

xs ++ []

will not. Interesting.

(But who writes something like "xs ++ []" in a real program?)

Cheers Ben

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

Andrew Martin

10:25 p.m.

Actually, it is likely that neither of the examples you gave will end up traversing the spine of the list. The definition of ys would almost certainly be inlined, and then the rule would fire. Sent from my iPhone

...

On Mar 5, 2018, at 5:18 PM, Ben Franksen wrote:

...
Am 05.03.2018 um 13:40 schrieb Li-yao Xia:

...
On 03/05/2018 07:13 AM, Ben Franksen wrote: Okay, okay, I got it. I did not think about strictness when I asked. The funny thing is that the two fusion rules combined, as explained by Josef, seem to cause this shortcut to be taken. But that can't be true because (++) really is non-strict, I tested that, with -O2. How do you explain that?

Rewrite rules apply at compile time and don't force any computation. The second rule fires only if the second argument of (++) is syntactically []. Otherwise the code doesn't change, and strictness is preserved.

Thanks, yet another thing learned. So

let ys = [] in xs ++ ys

will traverse the spine of xs but

xs ++ []

will not. Interesting.

(But who writes something like "xs ++ []" in a real program?)

Cheers Ben

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Ben

11:17 p.m.

On 6 March 2018 9:25:43 am AEDT, Andrew Martin wrote:

...

Actually, it is likely that neither of the examples you gave will end up traversing the spine of the list. The definition of ys would almost certainly be inlined, and then the rule would fire.

Sent from my iPhone

...
On Mar 5, 2018, at 5:18 PM, Ben Franksen wrote:

...
...
On 03/05/2018 07:13 AM, Ben Franksen wrote: Okay, okay, I got it. I did not think about strictness when I asked. The funny thing is that the two fusion rules combined, as explained by Josef, seem to cause this shortcut to be taken. But that can't be

Am 05.03.2018 um 13:40 schrieb Li-yao Xia: true

...
because (++) really is non-strict, I tested that, with -O2. How do you explain that?

Rewrite rules apply at compile time and don't force any computation. The second rule fires only if the second argument of (++) is syntactically []. Otherwise the code doesn't change, and strictness is preserved.

Thanks, yet another thing learned. So

let ys = [] in xs ++ ys

will traverse the spine of xs but

xs ++ []

will not. Interesting.

(But who writes something like "xs ++ []" in a real program?)

Cheers Ben

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

For determining whether xs is traversed, it doesn't really matter whether or not the rule fires and replaces zs = xs ++ ys with just zs = xs. In either case if you traverse the spine of zs to a given depth, it'll force the spine of xs to the same depth. Just having xs ++ ys in the code doesn't traverse the spine of xs, and even evaluating it to WHNF only evaluates xs to the same degree as evaluating xs to WHNF directly. The question is whether it's allocating a new list spine as it goes on top of evaluating the spine of xs, but there shouldn't be any impact on what is evaluated when.

Viktor Dukhovni

4 Mar 4 Mar

8:23 p.m.

...

On Mar 4, 2018, at 4:32 AM, Ben Franksen wrote:

I found that in base [1] we have

(++) [] ys = ys (++) (x:xs) ys = x : xs ++ ys

I had expected there to be a clause

(++) xs [] = xs

at the top, which would avoid the list traversal in this common case.

[...]

which still traverses xs even if ys=[].

Any thoughts?

Haskell is lazy, the traversal of x happens only when the combined list is actually traversed, and only for as many elements as requested, so the construction of the concatenated list carries little additional cost. And as pointed out, needlessly forcing the first element of y can have unintended consequences. For example: GHCi, version 8.0.2: http://www.haskell.org/ghc/ :? for help Prelude> let x = [1..] Prelude> let y = [1..] Prelude> let z = x ++ y Prelude> take 20 z [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] -- Viktor.

Josef Svenningsson

9:40 p.m.

The rule "foldr/id" will replace 'foldr (:) [] xs' with 'xs'. Cheers, Josef On Sun, Mar 4, 2018 at 10:34 AM Ben Franksen wrote:

...

I found that in base [1] we have

(++) [] ys = ys (++) (x:xs) ys = x : xs ++ ys

I had expected there to be a clause

(++) xs [] = xs

at the top, which would avoid the list traversal in this common case.

Is this somehow magically taken care of by the

{-# RULES "++" [~1] forall xs ys. xs ++ ys = augment (\c n -> foldr c n xs) ys #-}

? No, inlining @augment g xs = g (:) xs@ gives me

(\c n -> foldr c n xs) (:) ys

= foldr (:) ys xs

which still traverses xs even if ys=[].

Any thoughts?

[1]

http://hackage.haskell.org/package/base-4.10.1.0/docs/src/GHC.Base.html#%2B%...

Cheers Ben

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Ben Franksen

10:32 p.m.

Am 04.03.2018 um 22:40 schrieb Josef Svenningsson:

...

The rule "foldr/id" will replace 'foldr (:) [] xs' with 'xs'.

Ah, thanks a lot. That was the missing piece. And good to know, too, this means I can avoid a number of case distinctions I thought I needed for optimum performance. Cheers Ben

2681

Age (days ago)

2682

Last active (days ago)

List overview

Download

11 comments

9 participants

participants (9)

Andrew Martin
Ben
Ben Franksen
Brandon Allbery
Clinton Mead
Josef Svenningsson
Li-yao Xia
Sven Panne
Viktor Dukhovni

missing optimization for (++)

Ben

tags

participants (9)