Let's see if I understand this correctly. In your code, decodeUtf8 calls streamUtf8. They both get inlined into main but then unsafeChr8 does not. Correct?
Here's what I see in the simplifier output with 6.10.4: the unoptimised body of streamUtf8 is being inlined into main, with many out-of-line functions called in its inner loop, and length is then applied out-of-line to that result.
I somehow stumbled on the idea of removing the INLINE annotation from next, and performance suddenly improved by a significant integer multiple. This caused the body of streamUtf8 to be inlined into my test program, as I hoped.
Or are you saying that it's streamUtf8 that isn't getting inlined into main?
When I trimmed that INLINE out, the body of streamUtf8 was being inlined, but differently: all of the functions it had been calling out-of-line were now inlined.
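For anyone following along without the source to hand, here is a minimal sketch of the shape of code under discussion. It is not the library's actual code: Step, Stream, streamList and lengthS are hypothetical stand-ins, and the local next merely plays the role of the helper whose INLINE annotation was removed. Whether a given GHC version inlines and fuses it the way described above is exactly the open question, so compile with -O -ddump-simpl and read the Core rather than trusting the pragmas.

```haskell
{-# LANGUAGE BangPatterns, ExistentialQuantification #-}
module Main (main) where

-- Hypothetical miniature of a stream-fusion pipeline (not the real library).
data Step s a = Done | Skip !s | Yield !a !s

-- A stream is a step function plus a hidden seed state.
data Stream a = forall s. Stream (s -> Step s a) !s

-- Producer: turn a list into a stream. The local 'next' carries no
-- INLINE pragma of its own; only the wrapper is marked INLINE.
streamList :: [a] -> Stream a
streamList xs0 = Stream next xs0
  where
    next []     = Done
    next (x:xs) = Yield x xs
{-# INLINE streamList #-}

-- Consumer: count the elements with a strict accumulator.
lengthS :: Stream a -> Int
lengthS (Stream next s0) = go 0 s0
  where
    go !n s = case next s of
      Done       -> n
      Skip s'    -> go n s'
      Yield _ s' -> go (n + 1) s'
{-# INLINE lengthS #-}

main :: IO ()
main = print (lengthS (streamList "hello"))  -- prints 5
```

If the producer and consumer both get inlined at the call site, the case on Step in go should meet the constructors built by next directly, leaving a single loop in the Core. If next (or anything it calls) stays out-of-line, the inner loop keeps allocating Step values and making calls, which is the slow pattern described above.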
Does changing the definition of length to
length = id S.lengthI
help? GHC used to have a bug in this area but I haven't been bitten by it for quite some time.
That change makes no real difference. It changes the function called at that call site, but it's still out-of-line.
Also, I wonder how Stream.stream is defined. Is it strict in Text? If it isn't, does making it strict help?
It is strict in Text, yes.
If you have a spare minute, perhaps you could try the HEAD with the new inliner and see if that helps? Although I somewhat doubt it, to be honest.
I posted those numbers in a reply to Simon a little while ago. HEAD is generally much better than 6.10, which is great, but I'm still stuck with this mystery on versions of the compiler that people may actually be able to use :-\