
Hi all, With foo.hs below, if I compile normally then it takes about 70 seconds to run: $ rm -f *.o *.hi foo $ ghc -cpp -Wall -O2 foo.hs -o foo $ time ./foo real 1m10.266s user 1m9.698s sys 0m0.521s If I turn up the inlining threshold then it takes only about 13 seconds: $ rm -f *.o *.hi foo $ ghc -cpp -Wall -O2 foo.hs -o foo -funfolding-use-threshold=20 $ time ./foo real 0m13.313s user 0m12.838s sys 0m0.450s However, if I copy the definition of shift from base/GHC/Word.hs then it also takes around 13 seconds: $ rm -f *.o *.hi foo $ ghc -cpp -Wall -O2 foo.hs -o foo -DCOPY -fglasgow-exts $ time ./foo real 0m13.394s user 0m12.843s sys 0m0.454s Why does it matter whether the definition is in the current file or is imported from the standard libraries? Thanks Ian module Main (main) where import Foreign.Ptr (Ptr) import Foreign.Marshal.Array (mallocArray, advancePtr) import Foreign.Storable (peek, poke) import Data.Bits ((.|.)) #ifdef COPY import Prelude hiding (Int) import GHC.Exts (shiftRL#, shiftL#) import GHC.Word (Word32(W32#)) import GHC.Base (Int(I#), narrow32Word#, negateInt#, (>=#)) #else import Data.Word (Word32) import Data.Bits (shift) #endif main :: IO () main = do p <- mallocArray 104857600 mapM_ (\_ -> foo p 104857600) [1..10 :: Int] foo :: Ptr Word32 -> Int -> IO () foo p i | p `seq` i `seq` False = undefined foo _ 0 = return () foo p n = do x <- peek p poke p (shift x (-1) .|. shift x (-2) .|. shift x (-3) .|. shift x (-4)) foo (p `advancePtr` 1) (n - 1) #ifdef COPY -- Defn from libraries/base/GHC/Word.hs shift :: Word32 -> Int -> Word32 shift (W32# x#) (I# i#) | i# >=# 0# = W32# (narrow32Word# (x# `shiftL#` i#)) | otherwise = W32# (x# `shiftRL#` negateInt# i#) #endif