
I did a little bit of Data.Text benchmarking the other day, and I was shocked to find that decoding a UTF-8 stream proceeded at a sedate 3MB/sec. On investigation, the culprit was that I'd marked both the outer function and the inner one (a loop) as INLINE. Because the inner loop was marked as INLINE, GHC wasn't inlining critically important leaf functions into it, so I was getting clobbered to death by boxing and unboxing. Leaving the outer function marked as INLINE, but taking the INLINE off the inner function, seems to cause *both* to get inlined as I originally hoped.

This behaviour is all rather mysterious to me. The old "Secrets of the Inliner" paper is very much out of date now, but short of reading the GHC source, I don't know where else to look to turn my voodoo folk intuition into something more solid. Is this all going to change in GHC 6.12 anyway?
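
For concreteness, here is a minimal sketch of the shape I mean. It is not the actual Data.Text code: the names countCodePoints, loop and isContinuation are made up for illustration, and it works on a plain list of bytes rather than a real stream. The outer function carries the INLINE pragma, and the inner loop and the leaf function are left for GHC to decide on. Whether GHC actually inlines anything here depends on the compiler version and optimisation flags, so the Core (for instance from -ddump-simpl) is the only reliable evidence.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical leaf function: True for UTF-8 continuation bytes (10xxxxxx).
-- It carries no pragma; it is small enough that GHC may inline it by itself.
isContinuation :: Word8 -> Bool
isContinuation w = w .&. 0xC0 == 0x80

-- Outer function: this is the one that keeps its INLINE pragma.
countCodePoints :: [Word8] -> Int
countCodePoints = loop 0
{-# INLINE countCodePoints #-}

-- Inner loop: deliberately NOT marked INLINE. It counts every byte that
-- starts a code point, i.e. every byte that is not a continuation byte.
-- The bang patterns keep the accumulator strict.
loop :: Int -> [Word8] -> Int
loop !n []     = n
loop !n (w:ws)
  | isContinuation w = loop n ws
  | otherwise        = loop (n + 1) ws
```

Putting an INLINE pragma on loop as well would reproduce the arrangement that was slow for me, although I wouldn't expect a toy this small to show the same effect.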