
Hi all, With foo.hs below, if I compile normally then it takes about 70 seconds to run: $ rm -f *.o *.hi foo $ ghc -cpp -Wall -O2 foo.hs -o foo $ time ./foo real 1m10.266s user 1m9.698s sys 0m0.521s If I turn up the inlining threshold then it takes only about 13 seconds: $ rm -f *.o *.hi foo $ ghc -cpp -Wall -O2 foo.hs -o foo -funfolding-use-threshold=20 $ time ./foo real 0m13.313s user 0m12.838s sys 0m0.450s However, if I copy the definition of shift from base/GHC/Word.hs then it also takes around 13 seconds: $ rm -f *.o *.hi foo $ ghc -cpp -Wall -O2 foo.hs -o foo -DCOPY -fglasgow-exts $ time ./foo real 0m13.394s user 0m12.843s sys 0m0.454s Why does it matter whether the definition is in the current file or is imported from the standard libraries? Thanks Ian module Main (main) where import Foreign.Ptr (Ptr) import Foreign.Marshal.Array (mallocArray, advancePtr) import Foreign.Storable (peek, poke) import Data.Bits ((.|.)) #ifdef COPY import Prelude hiding (Int) import GHC.Exts (shiftRL#, shiftL#) import GHC.Word (Word32(W32#)) import GHC.Base (Int(I#), narrow32Word#, negateInt#, (>=#)) #else import Data.Word (Word32) import Data.Bits (shift) #endif main :: IO () main = do p <- mallocArray 104857600 mapM_ (\_ -> foo p 104857600) [1..10 :: Int] foo :: Ptr Word32 -> Int -> IO () foo p i | p `seq` i `seq` False = undefined foo _ 0 = return () foo p n = do x <- peek p poke p (shift x (-1) .|. shift x (-2) .|. shift x (-3) .|. shift x (-4)) foo (p `advancePtr` 1) (n - 1) #ifdef COPY -- Defn from libraries/base/GHC/Word.hs shift :: Word32 -> Int -> Word32 shift (W32# x#) (I# i#) | i# >=# 0# = W32# (narrow32Word# (x# `shiftL#` i#)) | otherwise = W32# (x# `shiftRL#` negateInt# i#) #endif

On Sun, Apr 03, 2005 at 02:38:30PM +0100, Ian Lynagh wrote:
Why does it matter whether the definition is in the current file or is imported from the standard libraries?
The issue is not whether it is imported but rather how many uses the function has. The routine declared in the standard library can be called from an arbitrary number of places so if it were to inline it everywhere it might duplicate too much code. However, your locally defined version is only used once, so ghc knows it can inline it without code duplication. However, this dramatic improvement perhaps means that ghc's default inlining parameters need to be tweaked? There are several things ghc takes into account when deciding whether to inline a function, the paper "Secrets of the GHC Inliner" is quite good. http://research.microsoft.com/~simonpj/Papers/inlining/ John -- John Meacham - ⑆repetae.net⑆john⑈
participants (2)
-
Ian Lynagh
-
John Meacham