I/O performance drop in ghc 6.12.1

Hi, I just updated to GHC 6.12.1, and I noticed a significant drop in I/O performance that I can't explain. The following code is a simple re-implementation of cat(1), i.e. it just echoes all data from standard input to standard output:

    module Main ( main ) where

    import System.IO
    import Foreign ( allocaBytes )

    bufsize :: Int
    bufsize = 4 * 1024

    catBuf :: Handle -> Handle -> IO ()
    catBuf hIn hOut = allocaBytes bufsize input
      where
        input ptr = hGetBuf hIn ptr bufsize >>= output ptr
        output _ 0 = return ()
        output ptr n = hPutBuf hOut ptr n >> input ptr

    main :: IO ()
    main = do
      mapM_ (\h -> hSetBuffering h NoBuffering) [ stdin, stdout ]
      catBuf stdin stdout

That program used to have exactly the same performance as /bin/cat, but
now it no longer does:
| $ dd if=/dev/urandom of=test.data bs=1M count=512
|
| $ time /bin/cat

On Jan 14, 2010, at 17:30, Peter Simons wrote:
> I just updated to GHC 6.12.1, and I noticed a significant drop in I/O performance that I can't explain.
GHC 6.12.1 has the first release of UTF-8 support, so there's translation overhead.

-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
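The translation Brandon mentions is attached to each Handle, and it can be inspected directly. The following is a minimal sketch, assuming the encoding API exported by System.IO (hGetEncoding, hSetBinaryMode); describeEncoding is a hypothetical helper introduced only for illustration. hGetEncoding returns Nothing for a handle in binary mode and Just the text encoding otherwise.

```haskell
import System.IO

-- Summarise a handle's current text encoding. hGetEncoding returns
-- Nothing when the handle is in binary mode (no translation at all).
describeEncoding :: Handle -> IO String
describeEncoding h = do
    enc <- hGetEncoding h
    return (maybe "binary (no translation)" show enc)

main :: IO ()
main = do
    -- Report on stderr so that stdout stays free for real output.
    mapM_ report [("stdin", stdin), ("stdout", stdout)]
    -- After switching to binary mode, the handles carry no encoding.
    mapM_ (\h -> hSetBinaryMode h True) [stdin, stdout]
    mapM_ report [("stdin", stdin), ("stdout", stdout)]
  where
    report (name, h) = do
        d <- describeEncoding h
        hPutStrLn stderr (name ++ ": " ++ d)
```

The first two lines of output depend on the locale the program runs under; the last two should always report binary mode.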

On Thu, Jan 14, 2010 at 2:30 PM, Peter Simons wrote:
> I just updated to GHC 6.12.1, and I noticed a significant drop in I/O performance that I can't explain.
This is probably brought about by the new Unicode I/O support in 6.12. Your file isn't open in binary mode, so it's probably getting translated from something like UTF-8 before it reaches you. Might want to compare the two. I'm a little surprised by the magnitude of the difference; I might have expected it to be 33%, not 400%.

On Thu, Jan 14, 2010 at 11:38 PM, Bryan O'Sullivan wrote:
> On Thu, Jan 14, 2010 at 2:30 PM, Peter Simons wrote:
>> I just updated to GHC 6.12.1, and I noticed a significant drop in I/O performance that I can't explain.
>
> This is probably brought about by the new Unicode I/O support in 6.12. Your file isn't open in binary mode, so it's probably getting translated from something like UTF-8 before it reaches you. Might want to compare the two. I'm a little surprised by the magnitude of the difference; I might have expected it to be 33%, not 400%.
Hold on, he's using hGetBuf/hPutBuf. (Although I'd suggest wrapping that in bytestrings.) The point is, those functions are documented to ignore encoding and always use binary I/O. There shouldn't be a difference at all. I wonder if the difference goes away if the handle is explicitly set to binary? It shouldn't, but then again it shouldn't exist in the first place.

-- 
Svein Ove Aas
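Svein's aside about bytestrings could look roughly like the sketch below. It assumes a bytestring package recent enough to export Data.ByteString.hGetSome; catBS and its chunk-size parameter are hypothetical names chosen to mirror Peter's catBuf, not code from the thread.

```haskell
import qualified Data.ByteString as B
import System.IO

-- Copy everything from hIn to hOut in chunks of at most chunkSize bytes.
-- B.hGetSome returns whatever data is available (up to chunkSize) and
-- yields an empty ByteString only at end of file.
catBS :: Int -> Handle -> Handle -> IO ()
catBS chunkSize hIn hOut = loop
  where
    loop = do
        chunk <- B.hGetSome hIn chunkSize
        if B.null chunk
            then return ()
            else B.hPut hOut chunk >> loop

main :: IO ()
main = do
    -- Not required for ByteString I/O, which bypasses the handle's text
    -- encoding; set here only to make the byte-oriented intent explicit.
    mapM_ (\h -> hSetBinaryMode h True) [stdin, stdout]
    catBS (4 * 1024) stdin stdout
```

Reading in fixed-size chunks keeps memory use constant, as in the hGetBuf version; the sketch says nothing about whether it is faster than catBuf.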

Hi Svein,
> Hold on, he's using hGetBuf/hPutBuf.
Exactly, that's what I was thinking. When a program requests that 'n' bytes be read into memory at the location designated by the given 'Ptr Word8', how could GHC possibly do any encoding or decoding? That API doesn't allow for multi-byte characters. I would assume that hGetBuf/hPutBuf are equivalent to POSIX read() and write()?
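Peter's reasoning, that a byte-count API leaves no room for decoding, can be checked with a round trip through a text-mode handle. A minimal sketch, assuming only System.IO and the Foreign marshalling modules from base; roundTrip is a hypothetical helper. The hGetBuf and hPutBuf documentation states that both ignore the handle's TextEncoding, so every byte should come back unchanged even though openTempFile creates the handle in text mode.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray, pokeArray)
import System.IO

-- Write the given bytes with hPutBuf, rewind, and read them back with hGetBuf.
-- Both calls bypass the handle's TextEncoding and move raw bytes.
roundTrip :: Handle -> [Word8] -> IO [Word8]
roundTrip h bytes = do
    let n = length bytes
    allocaBytes n $ \ptr -> do
        pokeArray ptr bytes
        hPutBuf h ptr n
    hFlush h
    hSeek h AbsoluteSeek 0
    allocaBytes n $ \ptr -> do
        got <- hGetBuf h ptr n   -- fewer than n only if EOF is reached first
        peekArray got ptr

main :: IO ()
main = do
    -- openTempFile (not openBinaryTempFile) yields a text-mode handle
    -- that uses the locale's encoding.
    (path, h) <- openTempFile "." "roundtrip.dat"
    -- Every byte value; 0x80 to 0xFF are not valid UTF-8 on their own.
    let bytes = [0 .. 255]
    back <- roundTrip h bytes
    hClose h
    putStrLn (path ++ ": bytes unchanged = " ++ show (back == bytes))
```

This confirms byte transparency only; it does not explain the slowdown.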
> I wonder if the difference goes away if the handle is explicitly set to binary?
I added

    mapM_ (\h -> hSetBinaryMode h True) [ stdin, stdout ]

to 'main', and it does seem to improve performance a little, but it's still quite a bit slower than /bin/cat:
| $ time /bin/cat
Participants (4):
- Brandon S. Allbery KF8NH
- Bryan O'Sullivan
- Peter Simons
- Svein Ove Aas