
simons:
Duncan Coutts writes:
| As I recall from when I profiled this for the ByteString paper | [...], the slowdown compared to a simple hGetBuf 'cat' was all | down to cache locality, because we're cycling between a range | of buffers rather than a single cache-hot buffer.
I believe you are right. The following implementation performs just fine:
import System.IO import qualified Data.ByteString.Base as Str import qualified Data.ByteString as Str import Data.ByteString ( ByteString )
bufsize :: Int bufsize = 4 * 1024
hGet :: Handle -> ByteString -> IO ByteString hGet h buf = do i <- Str.unsafeUseAsCStringLen buf (\(p,n) -> hGetBuf h p n) return (Str.unsafeTake i buf)
catString :: Handle -> Handle -> IO () catString hIn hOut = Str.create bufsize (\_ -> return ()) >>= input where input buf = hGet hIn buf >>= output buf output buf b | Str.null b = return () | otherwise = Str.hPut hOut b >> input buf
main :: IO () main = do mapM_ (\h -> hSetBuffering h NoBuffering) [ stdin, stdout ] catString stdin stdout
time /bin/cat
/dev/null real 0m2.093s user 0m0.024s sys 0m2.068s
time ./cat-bytestring
/dev/null real 0m2.753s user 0m0.568s sys 0m2.184s
That's a useful benchmark. Thanks for looking into this. -- Don