
Dear list, I am puzzled by the behaviour of this (stripped-down, uglified) program. It is supposed to run a bunch of shell commands simultaneously, collate their standard output and error, and print their output as though they had run sequentially. module Main where import Control.Concurrent import Data.Maybe import GHC.Handle import GHC.IO import Posix (popen) main = do cmds <- getContents >>= return.lines outCh <- newChan mainThr <- myThreadId forkIO $ do mapM_ (startCommand outCh) cmds writeChan outCh $ killThread mainThr getChanContents outCh >>= sequence_ startCommand :: Chan (IO ()) -> String -> IO () startCommand ch cmd = do -- Prevent lazy reads after forking length cmd `seq` return () (out,err,pid) <- popen "/bin/sh" ["-c",cmd] Nothing -- Prevent deadlock (waiting to read stdout -- while the child waits to write stderr). forkIO $ length err `seq` return () -- culprit line? writeChan ch $ hPutStr stdout out writeChan ch $ hFlush stdout writeChan ch $ hPutStr stderr err writeChan ch $ hFlush stderr Most of the time it works but every so often a chunk of child output gets lost. (I'm sure I've seen it duplicating chunks of output too, but I haven't been able to isolate a test case.) Corruption of stderr seems more common than corruption of stdout. The corruption doesn't get more predictable if I generate a list of commands with fixed 'random' sleeps and use that (so I don't think it's a gross timing issue). The lost data is not necessarily a prefix of the output string, nor a suffix. The lost data does not reappear later in the program's output. The lost data *does* appear to correspond to byte sequences that are written in a single write() by the child process. Example: $ cat test.sh #!/bin/sh echo -n "std" ; sleep 1 echo -n "STD" >&2 ; sleep 1 echo "out" ; sleep 1 echo "ERR" >&2 ; sleep 1 $ for i in `seq 100`; do echo "sleep $((RANDOM % 10)); ./test.sh"; done | ./a.out <snip> stdout STDERR stdout ERR stdout STDERR <snip> stdout STDERR stdout STDstdout STDERR stdout STDERR <snip> If I comment the 'culprit line' above then the program runs without error. This confounds me: I believe that that line should have no effect except to suck the child's stderr into the program as soon as possible. (Am I wrong? And is there a better way of doing this?) Platform: ghc-6.0-7 / RedHat 8.0 / x86. Compilation: ghc -package posix test.hs Where did my output go? Have I run into some gotcha like the 'lazy reads after forking' thing? Can someone tell me what I'm doing wrong? Hopeful thanks, // David -- David Hughes UNIX sysadmin, Serco SA -+- Tel.: +41 22 767 8997 Computing Centre, CERN -+- David.W.Hughes@cern.ch This message expresses my own opinions and should not be construed as the opinions of Serco (who employ me) or of CERN (where I work).