Turning on -funbox-small-strict-fields by default in GHC 7.8

Hi,

I'd like to discuss enabling -funbox-small-strict-fields by default in GHC 7.8. First, a short description of the flag and why I added it.

*What does the flag do?*

The flag causes all strict, pointer-sized or smaller* fields to be unpacked, as if you preceded them by an UNPACK pragma.

*Why does the flag exist?*

The flag doesn't allow you to express anything you couldn't already express using UNPACK/-funbox-strict-fields. The purpose of the flag is twofold:

- Allow for less clutter in source code. If you look at code tuned for performance, every single small field has an UNPACK pragma (e.g. see bytestring, text, containers, attoparsec, binary, etc). The extra UNPACK pragmas make the data type declarations harder to read.
- Give better performance by default for beginner and intermediate Haskellers. Almost every performance problem I diagnose (e.g. on StackOverflow) involves telling the person to use UNPACK (and make the field strict). I'd like to only tell them to make the field strict and let the compiler deal with the UNPACK.

*How do we turn the flag on by default?*

Before we turn the flag on by default, we should convince ourselves that it won't hurt performance (e.g. by causing re-boxing when an unpacked field is passed to a non-strict function). Here's my plan:

- Look at all data declarations in some set of core libraries (e.g. the ones mentioned above) and see if there are any strict but not unpacked small fields in there. The presence of such fields suggests that the author decided that unpacking was not beneficial there. *Done:* there are no such fields in bytestring, text, binary, containers, or attoparsec.
- Run the nofib suite, which now includes some code with strict fields from the language shootout suite, as a sanity check.
- Benchmark some large program that isn't carefully tuned by using strictness in just the right places (like our core libraries tend to be), to make sure this change doesn't hurt performance there. I nominate GHC as the candidate program for this test.

Aside: If I recall correctly, John Meacham said that JHC has always used this optimization.

Does this sound like a reasonable plan? Does anyone have any input on whether this change makes sense? Simon?

P.S. I know how to run nofib with and without the flag. How can I benchmark the impact of the flag on building GHC? Does anyone have a step-by-step recipe for using GHC as a benchmark?

* This also includes Double, Int64, and Word64 on 32-bit platforms, so that a program's space usage and performance don't change dramatically when switching platforms.
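
To make the description of the flag concrete, here is a minimal sketch of the two styles of declaration it is meant to make equivalent. The type and function names are invented for illustration and do not come from any of the libraries mentioned. The flag is enabled per module through an OPTIONS_GHC pragma, and -O is included on the assumption that unpacking only takes effect when optimisation is on.

```haskell
{-# OPTIONS_GHC -O -funbox-small-strict-fields #-}
-- Hypothetical example; none of these names come from the thread.
module Main (main) where

-- Explicit style: every small strict field carries its own UNPACK pragma.
data PointExplicit = PointExplicit {-# UNPACK #-} !Int {-# UNPACK #-} !Int

-- Implicit style: with -funbox-small-strict-fields, strict fields that are
-- pointer-sized or smaller are treated as if they had the pragma above.
data PointImplicit = PointImplicit !Int !Int

sumExplicit :: PointExplicit -> Int
sumExplicit (PointExplicit x y) = x + y

sumImplicit :: PointImplicit -> Int
sumImplicit (PointImplicit x y) = x + y

main :: IO ()
main = do
  print (sumExplicit (PointExplicit 1 2))
  print (sumImplicit (PointImplicit 1 2))
```

Both declarations behave identically at the source level; the only intended difference is how much annotation the author has to write.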

Sounds like a good plan to me -- thank you!

Simon

From: ghc-devs-bounces@haskell.org [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Johan Tibell
Sent: 25 April 2013 17:56
To: ghc-devs@haskell.org
Subject: Turning on -funbox-small-strict-fields by default in GHC 7.8

SGTM. Actually, I think it should be quite simple to argue that there is no way for this change to degrade performance.

Cheers,
Edward

Excerpts from Johan Tibell's message of Thu Apr 25 09:56:03 -0700 2013:
> I'd like to discuss enabling -funbox-small-strict-fields by default in GHC 7.8.

On Thu, Apr 25, 2013 at 1:41 PM, Edward Z. Yang wrote:
> SGTM. Actually, I think it should be quite simple to argue that there is no way for this change to degrade performance.

I think you could make up a case where it is a loss. It would boil down to repeated reboxing of a value pulled out of a strict field. In practice I think it's pretty hard. What did you have in mind?

-- Johan
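
A minimal sketch of the kind of made-up case described here, with hypothetical names throughout. The field is unpacked, so using it strictly needs no box, but storing it in a lazy structure (a list cell) typically forces GHC to allocate a fresh boxed Int each time.

```haskell
{-# OPTIONS_GHC -O #-}
-- Hypothetical example of reboxing; names are invented for illustration.
module Main (main) where

import Data.List (foldl')

-- The Int is stored unboxed inside the constructor.
data Counter = Counter {-# UNPACK #-} !Int

-- Strict use: the raw value feeds straight into arithmetic and a new
-- Counter, so no boxed Int is needed.
bump :: Counter -> Counter
bump (Counter n) = Counter (n + 1)

-- Lazy use: a list cell holds a pointer to a boxed Int, so the unpacked
-- field has to be wrapped in a new box every time it is stored.
record :: [Int] -> Counter -> [Int]
record acc (Counter n) = n : acc

main :: IO ()
main = do
  let counters = take 5 (iterate bump (Counter 0))
  print (foldl' record [] counters)
```

If the field were strict but not unpacked, `record` could reuse the existing box instead of allocating one per element, which is the loss being discussed.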

Reboxing would involve a two-word heap allocation. I guess it is not possible to argue that this cost would be amortized somewhere else, so you're right.

Edward

Excerpts from Johan Tibell's message of Thu Apr 25 16:01:49 -0700 2013:
> I think you could make up a case where it is a loss. It would boil down to repeated reboxing of a value pulled out of a strict field. In practice I think it's pretty hard. What did you have in mind?

On Thu, Apr 25, 2013 at 4:01 PM, Johan Tibell wrote:
> I think you could make up a case where it is a loss. It would boil down to repeated reboxing of a value pulled out of a strict field. In practice I think it's pretty hard.

I've seen this in real-world code, though it is indeed rare in my experience.

On Thu, Apr 25, 2013 at 9:51 PM, Bryan O'Sullivan wrote:
> I've seen this in real-world code, though it is indeed rare in my experience.

If you remember where, it would be great to see an example. There's also NOUNPACK nowadays, so you can opt out in these specific cases.
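
As a sketch of the opt-out mentioned here (the type and functions are hypothetical): with the flag on, a NOUNPACK pragma on one field keeps that field as a pointer to a boxed value while the other strict field is still unpacked.

```haskell
{-# OPTIONS_GHC -O -funbox-small-strict-fields #-}
-- Hypothetical example of opting out per field; names are invented.
module Main (main) where

-- The first field is unpacked by the flag. The second stays a pointer to
-- a boxed Int because of the NOUNPACK pragma.
data Entry = Entry !Int {-# NOUNPACK #-} !Int

-- The key is only used strictly, so unpacking it is harmless.
keySum :: [Entry] -> Int
keySum es = sum [k | Entry k _ <- es]

-- The payload is stored in a list; because it is already boxed, it can be
-- stored without allocating a new box.
payloads :: [Entry] -> [Int]
payloads es = [p | Entry _ p <- es]

main :: IO ()
main = do
  let es = [Entry 1 10, Entry 2 20]
  print (keySum es)
  print (payloads es)
```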

On Thu, Apr 25, 2013 at 10:32 PM, Johan Tibell wrote:
> If you remember where it would be great to see an example.

I'm afraid I don't remember. I spend a lot of time optimising things down to the last memory allocation, and this was one of those cases where I could keep heap allocation constant by unboxing an unpacked field repeatedly, whereas if I packed the field I was allocating a little on every iteration of whatever the loop was.

> There's also NOUNPACK nowadays so you can opt out in these specific cases.

Oh yes, I have no problem with this becoming the default behaviour. All I was doing was confirming that it is not always a win "in the wild".

On 26/04/13 06:32, Johan Tibell wrote:
> If you remember where it would be great to see an example. There's also NOUNPACK nowadays so you can opt out in these specific cases.

I'm probably a bit late to the party here, but I've seen UNPACK make things worse a lot inside GHC itself. I think it would be a good idea to measure this change on GHC, if you haven't already.

One particular example I remember recently was that when trying to make the code generator faster I tried unpacking various fields in the Cmm data type (and its children). Some of them resulted in improvements, and I've left those in, while others made things worse. In some cases it made runtime worse but heap residency better, as you might expect because the representation is smaller.

Cheers,
Simon

Johan Tibell wrote:
> I like to discuss enabling -funbox-small-strict-fields by default in GHC 7.8.

+1

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

On Thu, Apr 25, 2013 at 9:56 AM, Johan Tibell wrote:
> - Run the nofib suite, which now includes some code with strict fields from the language shootout suite, as a sanity check.

Here are the nofib results (using the benchmarks that are enabled by default):

           Min    -0.1%   -0.0%   -1.4%   -2.7%   -5.7%
           Max    +0.0%   +0.3%   +1.8%   +2.4%   +0.0%
Geometric Mean    -0.0%   +0.0%   +0.2%   +0.1%   -0.1%

I looked at the outliers and none of them have strict fields, so this is just measuring noise.

Aside: This is another reminder that we who promised to be performance czars need to look into more reliable benchmarks (with confidence intervals).
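
As a rough illustration of what reporting confidence intervals could look like, here is a small sketch that computes the mean of repeated timings and an approximate 95% interval around it. It is not how nofib reports results; the timings are made up, and the 1.96 multiplier assumes a normal approximation (with only a handful of runs a t-distribution would be more appropriate).

```haskell
-- Hypothetical sketch; not part of nofib. All names are invented.
module Main (main) where

-- Arithmetic mean of the samples.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Sample standard deviation (n - 1 in the denominator).
-- Needs at least two samples.
sampleStdDev :: [Double] -> Double
sampleStdDev xs =
  sqrt (sum [(x - m) ^ (2 :: Int) | x <- xs] / fromIntegral (length xs - 1))
  where
    m = mean xs

-- Approximate 95% confidence interval for the mean, assuming a normal
-- approximation (multiplier 1.96).
confidenceInterval95 :: [Double] -> (Double, Double)
confidenceInterval95 xs = (m - h, m + h)
  where
    m = mean xs
    h = 1.96 * sampleStdDev xs / sqrt (fromIntegral (length xs))

main :: IO ()
main = do
  let runtimes = [1.02, 0.98, 1.01, 0.99, 1.00] -- made-up timings in seconds
  print (mean runtimes)
  print (confidenceInterval95 runtimes)
```

If the intervals for two compiler configurations overlap heavily, a difference of a percent or two is hard to distinguish from noise.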

On Thu, Apr 25, 2013 at 9:56 AM, Johan Tibell wrote:
> - Benchmark some large program that isn't carefully tuned by using strictness in just the right places (like our core libraries tend to be), to make sure this change doesn't hurt performance there. I nominate GHC as the candidate program for this test.

Here are the nofib compile times of unmodified GHC vs GHC compiled with -funbox-strict-fields:

-1 s.d.    -----    -3.3%
+1 s.d.    -----    +2.5%
Average    -----    -0.4%

(I compiled GHC by adding the flag to GhcStage2HcOpts and GhcLibHcOpts.)

I can't tell if this is noise or not. At least the compile times seem to have gone down ever so slightly.

With these three tests out of the way, are people happy with me turning on the flag by default for the 7.8 release?

+1 from me.

On Fri, Apr 26, 2013 at 1:00 PM, Johan Tibell wrote:
> With these three tests out of the way, are people happy with me turning on the flag by default for the 7.8 release?
> _______________________________________________
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs

--
Regards,
Austin - PGP: 4096R/0x91384671

+1 here too :)

On Fri, Apr 26, 2013 at 2:13 PM, Austin Seipp wrote:
> +1 from me.

On Fri, Apr 26, 2013 at 11:00 AM, Johan Tibell wrote:
> With these three tests out of the way, are people happy with me turning on the flag by default for the 7.8 release?

I optimistically filed http://hackage.haskell.org/trac/ghc/ticket/7868 in case we agree. :)

participants (8)
- Austin Seipp
- Bryan O'Sullivan
- Carter Schonwald
- Edward Z. Yang
- Erik de Castro Lopo
- Johan Tibell
- Simon Marlow
- Simon Peyton-Jones