
#13246: hPutBuf issues unnecessary empty write syscalls for large writes -------------------------------------+------------------------------------- Reporter: nh2 | Owner: Type: bug | Status: new Priority: normal | Milestone: 8.2.1 Component: Runtime | Version: 8.0.2 System | Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: #13245 Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- To get good performance, it is better to use few system calls that write lots of data in batch. I found a bug in hPutBuf that makes this concept not work: When using `hPutBuf` with a buffer size greater than 8095 (this number it self is a bug, #13245) bytes, two syscalls are issued instead of one: one empty `write("")` (a zero-bytes-write, which can't do anything useful), and after that the actual useful `write()` of the data. Example code: {{{ main = do withBinaryFile "testfile2" WriteMode $ \ hTo -> do let bufferSize = 8096 allocaBytes bufferSize $ \buffer -> do Main.hPutBuf hTo buffer bufferSize }}} In `strace -f -T` on the compiled binary, we see the syscalls: {{{ write(3, "", 0) = 0 <0.000004> write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8096) = 8096 <0.000017> }}} As you can see in the timings, this also has a fairly large performance overhead (20% in this case). When using `bufferSize = 8095`, the `write("")` disappears. The problem is this code for `bufWrite` (called by `hPutBuf`): {{{ bufWrite :: Handle__-> Ptr Word8 -> Int -> Bool -> IO Int bufWrite h_@Handle__{..} ptr count can_block = seq count $ do -- strictness hack old_buf@Buffer{ bufRaw=old_raw, bufR=w, bufSize=size } <- readIORef haByteBuffer -- enough room in handle buffer? hPutStrLn System.IO.stderr (show (size, w, count)) if (size - w > count) -- There's enough room in the buffer: -- just copy the data in and update bufR. then do debugIO ("hPutBuf: copying to buffer, w=" ++ show w) copyToRawBuffer old_raw w ptr count writeIORef haByteBuffer old_buf{ bufR = w + count } return count -- else, we have to flush else do debugIO "hPutBuf: flushing first" old_buf' <- Buffered.flushWriteBuffer haDevice old_buf -- TODO: we should do a non-blocking flush here writeIORef haByteBuffer old_buf' -- if we can fit in the buffer, then just loop if count < size then bufWrite h_ ptr count can_block else if can_block then do writeChunk h_ (castPtr ptr) count return count else writeChunkNonBlocking h_ (castPtr ptr) count }}} The check `if (size - w > count)` should be `if (size - w >= count)` instead, because we can do the write all fine if it fits exactly. In the adversarial case, `size - w == count`, we go into the `hPutBuf: flushing first` branch, thus emitting the `write("")`. See https://github.com/ghc/ghc/blame/876b00ba25a615423f48b0cf9d443a9fd5dbd6f4/li... for the full code. Simon Marlow has confirmed this on IRC, I'll submit a patch for it that switches to `>=`. It would be nice if the fix could be released in both GHC 8.2 and and 8.0.3. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13246 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler