Solving the containers INLINE issue

Hi, I suggest following variants for solving INLINE containers issues. This solution will be short-term only, the long-term one will probably go in the way of being able to specialize INLINABLE functions. a) remove all INLINEs - smallest code - up to 50% worse - the numbers are in one of my previous mail b) leave all INLINEs - biggest code - fastest - numbers are in one of my previous mail c) remove all INLINEs, but leave small amount of reasonable INLINEs - on functions needing Ord, ie. Map and Set - on 'small' functions only -- member, notMember, lookup*, insert*, delete*, alter*, update*, adjust* and fold* - set balance, balanceL and balanceR as NOINLINE (otherwise they get inlined in insert*, making them too big) - smaller code bloat - ghc binary is 1.4% larger than a) instead of 3.8% (which is the amount b) is larger than a)) - map-properties test is 0.4% larger instead of 126% - Map benchmark is actually 0.2% smaller (probably because of balanceL and balanceR not inlining to insert) - reasonable performance. Here is the speedup of c) against b), measured in % (+= several pct). alter 2.73 delete 3.01 difference 3.76 insert -16.40 insertLookupWithKey empty -1.60 insertLookupWithKey update -2.32 insertLookupWithKey' empty -8.39 insertLookupWithKey' update -15.75 insertWith empty -16.04 insertWith update -15.26 insertWith' empty -17.13 insertWith' update -14.67 insertWithKey empty -15.35 insertWithKey update -13.01 insertWithKey' empty -17.09 insertWithKey' update -16.81 intersection -3.46 lookup 2.73 lookupIndex 0.52 union 0.66 update 3.28 updateLookupWithKey -1.44 The only penalty is 15% loss on insert because of not inlining balanceL and balanceR. But when doing so, the bloat is nearly as in b). I vote for c) and have the patches ready. If there are no major objections, I will push the patches in several hours. (or should I do it sooner? Ian? Simon?) Milan

On 24/09/2010 12:00, Milan Straka wrote:
Hi,
I suggest following variants for solving INLINE containers issues. This solution will be short-term only, the long-term one will probably go in the way of being able to specialize INLINABLE functions.
a) remove all INLINEs - smallest code - up to 50% worse - the numbers are in one of my previous mail
b) leave all INLINEs - biggest code - fastest - numbers are in one of my previous mail
c) remove all INLINEs, but leave small amount of reasonable INLINEs - on functions needing Ord, ie. Map and Set - on 'small' functions only -- member, notMember, lookup*, insert*, delete*, alter*, update*, adjust* and fold* - set balance, balanceL and balanceR as NOINLINE (otherwise they get inlined in insert*, making them too big)
- smaller code bloat - ghc binary is 1.4% larger than a) instead of 3.8% (which is the amount b) is larger than a)) - map-properties test is 0.4% larger instead of 126% - Map benchmark is actually 0.2% smaller (probably because of balanceL and balanceR not inlining to insert)
- reasonable performance. Here is the speedup of c) against b), measured in % (+= several pct). alter 2.73 delete 3.01 difference 3.76 insert -16.40 insertLookupWithKey empty -1.60 insertLookupWithKey update -2.32 insertLookupWithKey' empty -8.39 insertLookupWithKey' update -15.75 insertWith empty -16.04 insertWith update -15.26 insertWith' empty -17.13 insertWith' update -14.67 insertWithKey empty -15.35 insertWithKey update -13.01 insertWithKey' empty -17.09 insertWithKey' update -16.81 intersection -3.46 lookup 2.73 lookupIndex 0.52 union 0.66 update 3.28 updateLookupWithKey -1.44
The only penalty is 15% loss on insert because of not inlining balanceL and balanceR. But when doing so, the bloat is nearly as in b).
I vote for c) and have the patches ready.
I suggest a slight modification of (c): for the functions on which you removed the INLINE pragmas, put INLINABLE on them. This way a client can still get the full speedup with suitable flags if they want. I'm still not really keen on having lookup inlined at every single call site, but I know it's a tradeoff and other people (maybe most people) care less about code size than I do. Cheers, Simon
If there are no major objections, I will push the patches in several hours. (or should I do it sooner? Ian? Simon?)
Milan _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Hi,
I suggest following variants for solving INLINE containers issues. This solution will be short-term only, the long-term one will probably go in the way of being able to specialize INLINABLE functions.
a) remove all INLINEs - smallest code - up to 50% worse - the numbers are in one of my previous mail
b) leave all INLINEs - biggest code - fastest - numbers are in one of my previous mail
c) remove all INLINEs, but leave small amount of reasonable INLINEs - on functions needing Ord, ie. Map and Set - on 'small' functions only -- member, notMember, lookup*, insert*, delete*, alter*, update*, adjust* and fold* - set balance, balanceL and balanceR as NOINLINE (otherwise they get inlined in insert*, making them too big)
- smaller code bloat - ghc binary is 1.4% larger than a) instead of 3.8% (which is the amount b) is larger than a)) - map-properties test is 0.4% larger instead of 126% - Map benchmark is actually 0.2% smaller (probably because of balanceL and balanceR not inlining to insert)
- reasonable performance. Here is the speedup of c) against b), measured in % (+= several pct). alter 2.73 delete 3.01 difference 3.76 insert -16.40 insertLookupWithKey empty -1.60 insertLookupWithKey update -2.32 insertLookupWithKey' empty -8.39 insertLookupWithKey' update -15.75 insertWith empty -16.04 insertWith update -15.26 insertWith' empty -17.13 insertWith' update -14.67 insertWithKey empty -15.35 insertWithKey update -13.01 insertWithKey' empty -17.09 insertWithKey' update -16.81 intersection -3.46 lookup 2.73 lookupIndex 0.52 union 0.66 update 3.28 updateLookupWithKey -1.44
The only penalty is 15% loss on insert because of not inlining balanceL and balanceR. But when doing so, the bloat is nearly as in b).
I vote for c) and have the patches ready.
I suggest a slight modification of (c): for the functions on which you removed the INLINE pragmas, put INLINABLE on them. This way a client can still get the full speedup with suitable flags if they want.
I'm still not really keen on having lookup inlined at every single call site, but I know it's a tradeoff and other people (maybe most people) care less about code size than I do.
Well, if we have SPECIALISABLE in the future, we will remove the INLINE pragmas and you will be happy again :) I will eat lunch and push the patches, Milan

Can you also push this patch please? GHC 7.0 is going to warn about lambdas with incomplete patterns (if you have -fwarn-incomplete-patterns), such as \[x] -> e The Sequence module in 'containers' has a couple of occurrences, which the enclosed patch fixes. Of course there's no change in functionality, or even what code is generated. Can you apply please Simon | -----Original Message----- | From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On | Behalf Of Milan Straka | Sent: 24 September 2010 13:43 | To: Simon Marlow | Cc: libraries | Subject: Re: Solving the containers INLINE issue | | Hi, | >> I suggest following variants for solving INLINE containers issues. | >> This solution will be short-term only, the long-term one will probably | >> go in the way of being able to specialize INLINABLE functions. | >> | >> a) remove all INLINEs | >> - smallest code | >> - up to 50% worse | >> - the numbers are in one of my previous mail | >> | >> b) leave all INLINEs | >> - biggest code | >> - fastest | >> - numbers are in one of my previous mail | >> | >> c) remove all INLINEs, but leave small amount of reasonable INLINEs | >> - on functions needing Ord, ie. Map and Set | >> - on 'small' functions only -- member, notMember, lookup*, insert*, | >> delete*, alter*, update*, adjust* and fold* | >> - set balance, balanceL and balanceR as NOINLINE (otherwise they | >> get inlined in insert*, making them too big) | >> | >> - smaller code bloat | >> - ghc binary is 1.4% larger than a) instead of 3.8% (which is | >> the amount b) is larger than a)) | >> - map-properties test is 0.4% larger instead of 126% | >> - Map benchmark is actually 0.2% smaller (probably because of | >> balanceL and balanceR not inlining to insert) | >> | >> - reasonable performance. Here is the speedup of c) against b), | >> measured in % (+= several pct). | >> alter 2.73 | >> delete 3.01 | >> difference 3.76 | >> insert -16.40 | >> insertLookupWithKey empty -1.60 | >> insertLookupWithKey update -2.32 | >> insertLookupWithKey' empty -8.39 | >> insertLookupWithKey' update -15.75 | >> insertWith empty -16.04 | >> insertWith update -15.26 | >> insertWith' empty -17.13 | >> insertWith' update -14.67 | >> insertWithKey empty -15.35 | >> insertWithKey update -13.01 | >> insertWithKey' empty -17.09 | >> insertWithKey' update -16.81 | >> intersection -3.46 | >> lookup 2.73 | >> lookupIndex 0.52 | >> union 0.66 | >> update 3.28 | >> updateLookupWithKey -1.44 | >> | >> The only penalty is 15% loss on insert because of not inlining | >> balanceL and balanceR. But when doing so, the bloat is nearly as in b). | >> | >> | >> I vote for c) and have the patches ready. | > | > I suggest a slight modification of (c): for the functions on which you | > removed the INLINE pragmas, put INLINABLE on them. This way a client | > can still get the full speedup with suitable flags if they want. | > | > I'm still not really keen on having lookup inlined at every single call | > site, but I know it's a tradeoff and other people (maybe most people) | > care less about code size than I do. | | Well, if we have SPECIALISABLE in the future, we will remove the INLINE | pragmas and you will be happy again :) | | I will eat lunch and push the patches, | Milan | _______________________________________________ | Libraries mailing list | Libraries@haskell.org | http://www.haskell.org/mailman/listinfo/libraries

Hi,
Can you also push this patch please?
GHC 7.0 is going to warn about lambdas with incomplete patterns (if you have -fwarn-incomplete-patterns), such as \[x] -> e
The Sequence module in 'containers' has a couple of occurrences, which the enclosed patch fixes. Of course there's no change in functionality, or even what code is generated.
Can you apply please
Yes, I will correct it and push. I will look for other warnings as well. Cheers, Milan
| -----Original Message----- | From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On | Behalf Of Milan Straka | Sent: 24 September 2010 13:43 | To: Simon Marlow | Cc: libraries | Subject: Re: Solving the containers INLINE issue | | Hi, | >> I suggest following variants for solving INLINE containers issues. | >> This solution will be short-term only, the long-term one will probably | >> go in the way of being able to specialize INLINABLE functions. | >> | >> a) remove all INLINEs | >> - smallest code | >> - up to 50% worse | >> - the numbers are in one of my previous mail | >> | >> b) leave all INLINEs | >> - biggest code | >> - fastest | >> - numbers are in one of my previous mail | >> | >> c) remove all INLINEs, but leave small amount of reasonable INLINEs | >> - on functions needing Ord, ie. Map and Set | >> - on 'small' functions only -- member, notMember, lookup*, insert*, | >> delete*, alter*, update*, adjust* and fold* | >> - set balance, balanceL and balanceR as NOINLINE (otherwise they | >> get inlined in insert*, making them too big) | >> | >> - smaller code bloat | >> - ghc binary is 1.4% larger than a) instead of 3.8% (which is | >> the amount b) is larger than a)) | >> - map-properties test is 0.4% larger instead of 126% | >> - Map benchmark is actually 0.2% smaller (probably because of | >> balanceL and balanceR not inlining to insert) | >> | >> - reasonable performance. Here is the speedup of c) against b), | >> measured in % (+= several pct). | >> alter 2.73 | >> delete 3.01 | >> difference 3.76 | >> insert -16.40 | >> insertLookupWithKey empty -1.60 | >> insertLookupWithKey update -2.32 | >> insertLookupWithKey' empty -8.39 | >> insertLookupWithKey' update -15.75 | >> insertWith empty -16.04 | >> insertWith update -15.26 | >> insertWith' empty -17.13 | >> insertWith' update -14.67 | >> insertWithKey empty -15.35 | >> insertWithKey update -13.01 | >> insertWithKey' empty -17.09 | >> insertWithKey' update -16.81 | >> intersection -3.46 | >> lookup 2.73 | >> lookupIndex 0.52 | >> union 0.66 | >> update 3.28 | >> updateLookupWithKey -1.44 | >> | >> The only penalty is 15% loss on insert because of not inlining | >> balanceL and balanceR. But when doing so, the bloat is nearly as in b). | >> | >> | >> I vote for c) and have the patches ready. | > | > I suggest a slight modification of (c): for the functions on which you | > removed the INLINE pragmas, put INLINABLE on them. This way a client | > can still get the full speedup with suitable flags if they want. | > | > I'm still not really keen on having lookup inlined at every single call | > site, but I know it's a tradeoff and other people (maybe most people) | > care less about code size than I do. | | Well, if we have SPECIALISABLE in the future, we will remove the INLINE | pragmas and you will be happy again :) | | I will eat lunch and push the patches, | Milan | _______________________________________________ | Libraries mailing list | Libraries@haskell.org | http://www.haskell.org/mailman/listinfo/libraries
participants (3)
-
Milan Straka
-
Simon Marlow
-
Simon Peyton-Jones