
import System.IO
import Foreign ( allocaBytes )
import qualified Data.ByteString as Str
bufsize :: Int
bufsize = 4 * 1024
In order to determine I/O performance, a 512 MB file of random data is
copied from standard input to standard output. All test programs have
been compiled with GHC 6.6.1 using "-O2 -funbox-strict-fields" for
optimization. The time to beat for this test comes from /bin/cat:
$ dd if=/dev/urandom of=test.data bs=1M count=512
$ time /bin/cat <test.data >/dev/null
The first entry copies the data with plain buffer I/O, using hGetBuf
and hPutBuf:

catBuf :: Handle -> Handle -> IO ()
catBuf hIn hOut = allocaBytes bufsize input
  where
    input ptr    = hGetBuf hIn ptr bufsize >>= output ptr
    output _ 0   = return ()
    output ptr n = hPutBuf hOut ptr n >> input ptr
real    0m2.747s   0m2.737s   0m2.758s
user    0m0.524s   0m0.416s   0m0.632s
sys     0m2.224s   0m2.304s   0m2.124s

The second entry is implemented with ByteString:
catString :: Handle -> Handle -> IO ()
catString hIn hOut = Str.hGet hIn bufsize >>= loop
  where
    loop buf
      | Str.null buf = return ()
      | otherwise    = Str.hPut hOut buf >> catString hIn hOut
real    0m7.852s   0m7.817s   0m7.887s
user    0m4.764s   0m4.800s   0m4.748s
sys     0m3.080s   0m3.000s   0m3.108s

When Data.ByteString.Char8 is used instead, the program produces almost
identical results. Data.ByteString.Lazy, however, came out differently:

real    0m8.184s   0m8.086s   0m8.067s
user    0m5.104s   0m5.252s   0m4.948s
sys     0m2.940s   0m2.808s   0m3.120s

ByteString turns out to be more than two times slower than ordinary
buffer I/O. This result comes as a surprise, because a ByteString _is_
an ordinary memory buffer, so it feels reasonable to expect it to
perform about the same. The reason ByteString cannot compete with
hGetBuf appears to be Data.ByteString.Base.createAndTrim. That function
allocates space with malloc(), reads data into that buffer, allocates a
new buffer, and then copies the data it has just read from the old
buffer into the new one before returning it. This approach is quite
inefficient for reading large amounts of data.

It is particularly odd that Data.ByteString.readFile relies on the same
mechanism: the required buffer size is known in advance, so there is no
point in reading the data into a temporary buffer first. I may have
misread the implementation, but my impression is that readFile
currently requires 2*n bytes of memory to read a file of size n. It
feels like there is plenty of room for optimization. :-)
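For illustration, here is a sketch of one way the extra copy could be
avoided in the hGet case: read straight into the buffer that will back
the resulting ByteString and adopt it, instead of filling a temporary
buffer and copying. The name hGetNoCopy is made up, and the sketch
assumes unsafePackCStringFinalizer, which newer bytestring versions
export from Data.ByteString.Unsafe (older ones from
Data.ByteString.Base):

import qualified Data.ByteString as Str
import qualified Data.ByteString.Unsafe as Unsafe
import Foreign ( mallocBytes, free )
import System.IO ( Handle, hGetBuf )

-- Hypothetical copy-free variant of Str.hGet: read directly into a
-- malloc'ed buffer and wrap that buffer as a ByteString, freeing it
-- when the string is garbage collected. No second buffer, no copy.
hGetNoCopy :: Handle -> Int -> IO Str.ByteString
hGetNoCopy h size = do
  ptr <- mallocBytes size          -- buffer that will back the ByteString
  n   <- hGetBuf h ptr size        -- read straight into it
  Unsafe.unsafePackCStringFinalizer ptr n (free ptr)

The trade-off is that a short read leaves size - n bytes allocated but
unused until the finalizer runs, which is presumably what the
trim-and-copy step in createAndTrim is meant to avoid.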
main :: IO ()
main = do
  mapM_ (\h -> hSetBuffering h NoBuffering) [ stdin, stdout ]
  catString stdin stdout
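For reference, each test program was presumably built and timed along
these lines (the file name is made up; the flags are the ones stated
above):

$ ghc -O2 -funbox-strict-fields -o catString catString.hs
$ time ./catString <test.data >/dev/null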