
On 13/11/2009 07:04, Bryan O'Sullivan wrote:
I'm working on measuring and improving the performance of the text library at the moment, and the very first test I tried demonstrated a piece of behaviour that I'm not completely able to understand. Actually, I'm not able to understand what's going on at all, beyond a very shallow level. All the comments below pertain to GHC 6.10.4.
The text library uses stream fusion, and I want to measure the performance of UTF-8 decoding.
The code I'm measuring is very simple:
import qualified Data.ByteString as B import Data.Text.Encoding as T import qualified Data.Text as T import System.Environment (getArgs) import Control.Monad (forM_)
main = do args <- getArgs forM_ args $ \a -> do s <- B.readFile a let t = T.decodeUtf8 s print (T.length t)
The streamUtf8 function looks roughly like this:
streamUtf8 :: OnDecodeError -> ByteString -> Stream Char streamUtf8 onErr bs = Stream next 0 (maxSize l) where l = B.length bs next i | i >= l = Done | U8.validate1 x1 = Yield (unsafeChr8 x1) (i+1) | {- etc. -} {-# INLINE [0] streamUtf8 #-}
The values being Yielded from the inner function are, as you can see, themselves constructed by functions.
Originally, with the inner next function manually marked as INLINE, I found that functions like unsafeChr8 were not being inlined by GHC, and performance was terrible due to the amount of boxing and unboxing happening in the inner loop.
I somehow stumbled on the idea of removing the INLINE annotation from next, and performance suddenly improved by a significant integer multiple. This caused the body of streamUtf8 to be inlined into my test program, as I hoped.
I think I can explain this one, at least partially. When you mark a function INLINE, GHC does not optimise the body of the function itself, on the grounds that it will be inlined at the call site anyway and get optimised there. Simon just changed this behaviour in GHC 6.12, so that GHC now keeps the original definition for inlining, but also optimises the original function as normal, which is useful if we can't or don't want to inline at a call site for some reason (or indeed if the calling module is being compiled without -O!). What we still don't understand, however, is why streamUtf8 was not being inlined at the call site. That is the root of the problem. We'll need to look more closely at the call site to understand what's going on. Cheers, Simon (PS if any of what I said contradicts what Simon and/or Roman said, then please ignore me and not them :-)