Bryan

It’s good news that the HEAD is better.

To be honest I’m not terribly enthusiastic about trying to nail down exactly what’s happening in 6.10 and 6.12. Although they are indeed the compilers people will be using, the effort would otherwise be wasted, because the HEAD is so different.

Can you try with 6.12 and see if you can find a recipe that does well enough? If you get desperate (i.e. there’s a huge performance hit that you can’t eliminate), then I’ll certainly try to help.

Meanwhile, I don’t know why 6.10 is faster than HEAD (by 25%, too), and I’d like to understand that. Can you submit a Trac ticket explaining how to reproduce it? You might need to bundle up the library too, so that we can reproduce it precisely.

Thanks

Simon

From: Bryan O'Sullivan [mailto:bos@serpentine.com]
Sent: 17 November 2009 07:14
To: Simon Peyton-Jones
Cc: glasgow-haskell-users@haskell.org
Subject: Re: Inliner behaviour - tiny changes lead to huge performance differences

On Fri, Nov 13, 2009 at 12:26 AM, Simon Peyton-Jones <simonpj@microsoft.com> wrote:

My goal is for INLINE pragmas to be very predictable. I can't decode your message enough to offer any insights; thank you Roman, who is closer to it, for helping.

Things are considerably different with HEAD than with 6.10.4. HEAD is indeed spotting and exploiting many of the opportunities for inlining, while 6.10.4 is a bit of a morass. The difference is stark: my test program runs in 0.7 seconds with HEAD, and 1.2 with 6.10.4.

Here's a rough table of my results:

Compiler    Time (seconds)
6.10.4      8.39
HEAD        0.50
HEAD*       0.50
6.10.4*     0.39
6.10.4**    0.34

A single asterisk denotes the removal of a single INLINE pragma from the text library.

The double asterisk denotes the removal of a layer of indirection: instead of defining length as lengthI with both marked INLINE, I manually inlined lengthI into the body of length.
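To make the "**" change concrete, here is a minimal Haskell sketch of the two shapes. The names lengthViaHelper and lengthInlined are hypothetical, and the list-based lengthI is a stand-in rather than the text library's actual stream-based code. It only illustrates the difference between a public function that delegates to an INLINE helper (the original shape) and a single INLINE definition that contains the helper's body (the hand-inlined shape).

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Hypothetical stand-in for the library's helper. The real text library
-- works on fused streams of Char; a plain list keeps this sketch self-contained.
lengthI :: [a] -> Int
lengthI = go 0
  where
    go !acc []       = acc
    go !acc (_ : xs) = go (acc + 1) xs
{-# INLINE lengthI #-}

-- Original shape: the public function is defined as the helper, and both
-- carry INLINE pragmas, so GHC has to unfold two levels at each call site.
lengthViaHelper :: [a] -> Int
lengthViaHelper = lengthI
{-# INLINE lengthViaHelper #-}

-- "**" shape: the helper's body is inlined by hand, leaving a single
-- INLINE definition for GHC to unfold.
lengthInlined :: [a] -> Int
lengthInlined = go 0
  where
    go !acc []       = acc
    go !acc (_ : xs) = go (acc + 1) xs
{-# INLINE lengthInlined #-}

main :: IO ()
main = do
  let s = "héllo, wörld"
  print (lengthViaHelper s)
  print (lengthInlined s)
```

Both functions compute the same result; the only difference is how many INLINE unfoldings the simplifier must perform before any fusion or specialisation can happen, which is the kind of detail the 6.10.4 inliner appeared sensitive to.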

For your amusement, GNU "wc -m" takes 1.1 seconds to count the number of Unicode characters in the same file, so I think that our combination of performance and brevity is wonderful. Thanks!
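For readers who want a comparable baseline, here is a rough sketch of a character-counting program in the spirit of the comparison with "wc -m". The thread doesn't show the actual benchmark, so the program name, the helper countChars, and the choice of strict Data.Text.IO reading are all assumptions. Data.Text.IO.readFile decodes using the handle's encoding (normally the current locale), and T.length counts Unicode code points, which matches what "wc -m" reports in a UTF-8 locale.

```haskell
module Main (main) where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Hypothetical helper: count the Unicode code points in a Text value.
countChars :: T.Text -> Int
countChars = T.length

main :: IO ()
main = do
  args <- getArgs
  case args of
    -- Reads the whole file strictly, decoding it with the locale's encoding.
    [path] -> TIO.readFile path >>= print . countChars
    _      -> hPutStrLn stderr "usage: charcount FILE" >> exitFailure
```

A strict read keeps the sketch short; a lazy Data.Text.Lazy.IO variant would avoid holding the whole file in memory, at the cost of a slightly different code path through the library.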

So HEAD is far better than 6.10.4 (yay!), but a little tweaking of the library code makes 6.10.4 faster than HEAD again (boo!). The HEAD inliner seems, as you hoped, to be behaving far more predictably than its predecessor.

If you'd like to investigate the remaining performance discrepancy between 6.10.4 and HEAD, I'll create a Trac ticket with instructions on how to reproduce my numbers.

Until 6.14 is released, I'm not sure what to do. I'm building 6.12 to see how it fares, but my experience with 6.10 so far suggests that the behaviour of the 6.12 inliner will be fragile and difficult to understand, which is a bit of a shame. On that older code base, it seems that I can get really good fused performance or okay unfused performance, but not both.