Bryan
It’s good news that the HEAD is better.
To be honest I’m not terribly enthusiastic about trying to nail
down exactly what’s happening in 6.10 and 6.12 because, although they are
indeed the compilers people will be using, it’s otherwise wasted work because
the HEAD is so different.
Can you try with 6.12 and see if you can find a recipe that does
well enough? If you get desperate (i.e. there’s a huge perf bump that you can’t
eliminate) then I’ll certainly try to help.
Meanwhile, I don’t know why 6.10 is faster than HEAD (by 25%
too) and I’d like to understand that. Can you submit a Trac ticket saying how
to reproduce? You might need to bundle up the library too, to make sure we can
reproduce it precisely.
Thanks
Simon
From: Bryan O'Sullivan [mailto:bos@serpentine.com]
Sent: 17 November 2009 07:14
To: Simon Peyton-Jones
Cc: glasgow-haskell-users@haskell.org
Subject: Re: Inliner behaviour - tiny changes lead to huge performance differences
On Fri, Nov 13, 2009 at 12:26 AM, Simon Peyton-Jones <simonpj@microsoft.com> wrote:
My goal is for INLINE pragmas to be very predictable. I can't decode your message enough to offer any insights; thank you Roman, who is closer to it, for helping.
Things are considerably different with HEAD than with 6.10.4.
HEAD is indeed spotting and exploiting many of the opportunities for inlining,
while 6.10.4 is a bit of a morass. The difference is stark: my test program
runs in 0.7 seconds with HEAD, and 1.2 with 6.10.4.
Here's a rough table of my results:
Compiler    Time (seconds)
6.10.4      8.39
HEAD        0.50
HEAD*       0.50
6.10.4*     0.39
6.10.4**    0.34
The asterisk above denotes the removal of a single INLINE
pragma from the text library.
The doubled asterisk denotes the removal of a piece of
indirection: instead of defining length in terms of lengthI, with both marked
INLINE, I manually inlined lengthI into the body of length.
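To make the two shapes concrete, here is a minimal sketch of the kind of change described. It is not the text library's actual code: the Stream type, the stream function, and the names lengthWrapped and lengthManual are hypothetical stand-ins chosen for illustration. Both variants compute the same result; which one the optimiser handles better is exactly what differs between compiler versions.

```haskell
module Main (main) where

-- Hypothetical stand-in for an internal character stream. The real
-- library's types are not shown in this thread.
newtype Stream = Stream String

stream :: String -> Stream
stream = Stream
{-# INLINE stream #-}

-- Variant 1: the public function is a thin wrapper around an internal
-- helper, and both carry INLINE pragmas (the layer of indirection).
lengthI :: Stream -> Int
lengthI (Stream cs) = go 0 cs
  where
    go :: Int -> String -> Int
    go n []     = n
    go n (_:xs) = let n' = n + 1 in n' `seq` go n' xs
{-# INLINE lengthI #-}

lengthWrapped :: String -> Int
lengthWrapped = lengthI . stream
{-# INLINE lengthWrapped #-}

-- Variant 2: the helper's body is written directly into the public
-- function, so the inliner has one fewer definition to see through.
lengthManual :: String -> Int
lengthManual s = case stream s of
  Stream cs -> go 0 cs
  where
    go :: Int -> String -> Int
    go n []     = n
    go n (_:xs) = let n' = n + 1 in n' `seq` go n' xs
{-# INLINE lengthManual #-}

main :: IO ()
main = print (lengthWrapped "inliner", lengthManual "inliner")
```

Compiling both variants with -O and comparing the Core output (for example with -ddump-simpl) is one way to check whether the wrapper in the first variant actually disappears.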
For your amusement, GNU "wc -m" takes 1.1 seconds
to count the number of Unicode characters in the same file, so I think that our
combination of performance and brevity is wonderful. Thanks!
So HEAD is far better than 6.10.4 (yay!), but a little
tweaking of the library code makes the 6.10.4 code faster again (boo!). The
HEAD inliner seems, as you hoped, to be behaving far more predictably than its
predecessor.
If you'd like to investigate the remaining performance
discrepancy between 6.10.4 and HEAD, I'll create a Trac ticket with
instructions on how to reproduce my numbers.
In the time between now and the release of 6.14, I wonder
what to do. I'm building 6.12 to see how it fares, but my experience with 6.10
so far suggests that the behaviour of the 6.12 inliner will be fragile and
difficult to understand, which is a bit of a shame. On that older code base, it
seems that I can get really good fused performance, or okay unfused
performance, but not both.