
import System.IO
import Foreign ( allocaBytes )
import qualified Data.ByteString as Str
bufsize :: Int
bufsize = 4 * 1024
In order to determine I/O performance, a 512 MB file of random data is
copied from standard input to standard output. All test programs have
been compiled with GHC 6.6.1 using "-O2 -funbox-strict-fields" for
optimization. The time to beat for this test comes from /bin/cat:
$ dd if=/dev/urandom of=test.data bs=1M count=512
$ time /bin/cat <test.data >/dev/null
The first entry copies the data with plain buffer I/O, using hGetBuf
and hPutBuf:

catBuf :: Handle -> Handle -> IO ()
catBuf hIn hOut = allocaBytes bufsize input
  where
    input ptr    = hGetBuf hIn ptr bufsize >>= output ptr
    output _ 0   = return ()
    output ptr n = hPutBuf hOut ptr n >> input ptr
real    0m2.747s   0m2.737s   0m2.758s
user    0m0.524s   0m0.416s   0m0.632s
sys     0m2.224s   0m2.304s   0m2.124s

The second entry is implemented with ByteString:
catString :: Handle -> Handle -> IO ()
catString hIn hOut = Str.hGet hIn bufsize >>= loop
  where
    loop buf
      | Str.null buf = return ()
      | otherwise    = Str.hPut hOut buf >> catString hIn hOut
real    0m7.852s   0m7.817s   0m7.887s
user    0m4.764s   0m4.800s   0m4.748s
sys     0m3.080s   0m3.000s   0m3.108s

When Data.ByteString.Char8 is used instead, the program produces almost
identical results. Data.ByteString.Lazy, however, came out differently:

real    0m8.184s   0m8.086s   0m8.067s
user    0m5.104s   0m5.252s   0m4.948s
sys     0m2.940s   0m2.808s   0m3.120s

ByteString turns out to be more than two times slower than ordinary
buffer I/O. This result comes as a surprise, because a ByteString _is_
an ordinary memory buffer, so it feels reasonable to expect it to
perform about the same. The reason ByteString cannot compete with
hGetBuf appears to be Data.ByteString.Base.createAndTrim. That function
allocates space with malloc(), reads data into that buffer, allocates a
new buffer, and then copies the data it has just read from the old
buffer into the new one before returning it. This approach is quite
inefficient for reading large amounts of data.

It is particularly odd that Data.ByteString.readFile relies on the same
mechanism: the required buffer size is known in advance, so there is no
point in reading the data into a temporary buffer first. I may have
misread the implementation, but my impression is that readFile
currently requires 2*n bytes of memory to read a file of size n. It
feels like there is plenty of room for optimization. :-)
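For illustration, here is a sketch of one way the extra copy could be
avoided in the hGet case: read straight into the buffer that will back
the resulting ByteString and adopt it, instead of filling a temporary
buffer and copying. The name hGetNoCopy is made up, and the sketch
assumes unsafePackCStringFinalizer, which newer bytestring versions
export from Data.ByteString.Unsafe (older ones from
Data.ByteString.Base):

import qualified Data.ByteString as Str
import qualified Data.ByteString.Unsafe as Unsafe
import Foreign ( mallocBytes, free )
import System.IO ( Handle, hGetBuf )

-- Hypothetical copy-free variant of Str.hGet: read directly into a
-- malloc'ed buffer and wrap that buffer as a ByteString, freeing it
-- when the string is garbage collected. No second buffer, no copy.
hGetNoCopy :: Handle -> Int -> IO Str.ByteString
hGetNoCopy h size = do
  ptr <- mallocBytes size          -- buffer that will back the ByteString
  n   <- hGetBuf h ptr size        -- read straight into it
  Unsafe.unsafePackCStringFinalizer ptr n (free ptr)

The trade-off is that a short read leaves size - n bytes allocated but
unused until the finalizer runs, which is presumably what the
trim-and-copy step in createAndTrim is meant to avoid.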
main :: IO ()
main = do
  mapM_ (\h -> hSetBuffering h NoBuffering) [ stdin, stdout ]
  catString stdin stdout
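For reference, each test program was presumably built and timed along
these lines (the file name is made up; the flags are the ones stated
above):

$ ghc -O2 -funbox-strict-fields -o catString catString.hs
$ time ./catString <test.data >/dev/null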