Fwd: Default stdout buffering of child process of createProcess

I'm reposting this message because I think it only went to the Google group and not the official haskell-cafe list:

On Friday, August 1, 2014 10:06:32 AM UTC-4, Chris Myzie wrote:
As a workaround, I am able to trick the child Haskell process into thinking it's running in an interactive terminal by wrapping it with /usr/bin/script:
createProcess (proc "/usr/bin/script" ["-qc","./A","/dev/null"]) { std_out = CreatePipe }
I still think Haskell is using screwy defaults for stdout buffering.
On Thursday, July 31, 2014 3:24:47 PM UTC-4, Chris Myzie wrote:
Hello,
I'm trying to write a wrapper process that will be able to read any child process' stdout. The problem I'm running into is that unless I force the child's stdout to LineBuffering, it defaults to BlockBuffering. Is BlockBuffering really the default in this case? I don't want to have to modify all of the child processes that I want to use with this wrapper.
Below is a simple test case. A.hs is the child process, and B.hs is the wrapper. If I run B.hs, I will get no output unless I uncomment the line in A.hs.
Thanks, Chris
------------------------------ A.hs ---------------------------
import Control.Concurrent
import System.IO

main :: IO ()
main = do
  -- hSetBuffering stdout LineBuffering
  putStrLn "test" >> threadDelay 1000000 >> main

------------------------------ B.hs ---------------------------
import Control.Monad
import System.IO
import System.Process

main :: IO ()
main = do
  (_, Just h, _, _) <- createProcess (proc "./A" []) { std_out = CreatePipe }
  hSetBuffering h LineBuffering
  forever $ hGetLine h >>= putStrLn

On Fri, Aug 1, 2014 at 11:07 PM, Chris Myzie
As a workaround, I am able to trick the child haskell process into thinking it's running in an interactive terminal by wrapping it with /usr/bin/script:
We discussed this on IRC the other day. Haskell is doing the same thing that C/C++ stdio / iostreams, and most other buffering systems, do: line buffering on terminal-like devices, block buffering on files and pipes. This is generally expected behavior; although it can be confusing to new programmers, ultimately it is more efficient for most programs.

Interactive use like this, especially over pipes, is fairly unusual; normally you're just copying data around /en masse/, and block buffering is far more efficient. Note that line buffering is not and cannot be implemented at the kernel level for ordinary files or pipes, so the kernel interface is actually character buffering, which is extremely inefficient (at least one context switch per individual character).

You might want to search for something like "buffering line block pipes files" to see quite a lot of discussion about it, in pretty much every language you can think of.

By the way, more efficient than using script(1) is, as I told you on IRC, to use the mechanism it is using directly: pseudo-terminals (ptys). See http://hackage.haskell.org/package/unix-2.7.0.1/docs/System-Posix-Terminal.h... for the standard pty stuff or http://hackage.haskell.org/package/posix-pty-0.1.0/docs/System-Posix-Pty.htm... for what is claimed to be a simpler interface intended for what you are doing.

-- 
brandon s allbery kf8nh sine nomine associates
allbery.b@gmail.com ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
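To see the rule Brandon describes from inside a Haskell program, a small probe like the one below prints what GHC chose for stdout. It is a sketch; the file name BufCheck.hs and the helper report are made up for illustration. Run directly in a terminal, one would expect it to report True and LineBuffering; run as ./BufCheck | cat, False and BlockBuffering Nothing.

```haskell
-- BufCheck.hs (hypothetical name): print what GHC chose for this process's stdout.
import System.IO

-- | Format the probe result as one line of text (hypothetical helper).
report :: Bool -> BufferMode -> String
report isTerm mode =
  "stdout is a terminal: " ++ show isTerm ++ ", buffering: " ++ show mode

main :: IO ()
main = do
  isTerm <- hIsTerminalDevice stdout
  mode   <- hGetBuffering stdout
  -- Write to stderr (unbuffered by default) so the probe's own output
  -- appears immediately, whatever stdout is connected to.
  hPutStrLn stderr (report isTerm mode)
```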

I second pseudo-terminals. I find System.Posix.Terminal to be very
usable (though I've admittedly never tried System.Posix.Pty). I
occasionally dabble with "bots" for terminal-based games (e.g., Nethack),
and have used PTYs with some degree of success.
I've included some code from a Nethack bot attempt from several years ago.
Hopefully you'll find it useful. As a quick usage/rationale overview, I
feed an attoparsec parser by repeatedly calling receive (intermixed with
some intelligent use of `transmit` to deal with messages and other
incomplete screen updates) until I receive a valid map. `hGetNonBlocking`
bypasses the standard buffering mechanisms and returns whatever is in the
PTY buffer (which may be nothing) rather than waiting for the specified
line/block/whatever buffer to fill. `stop` and `start` should be obvious in
their purpose, if not their implementation.
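import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import System.IO (Handle, hClose)
import System.Posix.IO (fdToHandle)
import System.Posix.Terminal (openPseudoTerminal)
import System.Process

data Local = Local { pty :: Handle }

class Connection a where
  transmit :: a -> ByteString -> IO ()
  receive  :: a -> IO ByteString
  stop     :: a -> IO ()
  start    :: IO a

instance Connection Local where
  transmit l s = B.hPut (pty l) s
  -- returns whatever is already in the PTY buffer, possibly nothing
  receive l = B.hGetNonBlocking (pty l) 4096
  stop l = hClose $ pty l
  start = do
    -- openPseudoTerminal returns the (master, slave) pair
    (fd1, fd2) <- openPseudoTerminal
    hPty <- fdToHandle fd1
    slave <- fdToHandle fd2
    -- nethack's stdin and stdout are both the slave side of the PTY
    _ <- createProcess (proc "sh" ["-c", "/usr/games/bin/nethack"])
           { std_in = UseHandle slave, std_out = UseHandle slave }
    return $ Local hPty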
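Applied to the A.hs/B.hs test case from the start of the thread, the same openPseudoTerminal approach might look like the sketch below. It assumes a POSIX system with the unix and process packages; stripCR is a hypothetical helper, needed because the terminal line discipline usually translates "\n" into "\r\n" on output. When the child exits, reading the master side typically fails with an I/O error (EIO on Linux) rather than a clean end of file, so a real wrapper would want to catch that.

```haskell
-- B.hs, pty variant (sketch; assumes a POSIX system, unix and process packages).
import Control.Monad (forever)
import System.IO
import System.Posix.IO (fdToHandle)
import System.Posix.Terminal (openPseudoTerminal)
import System.Process

-- | Hypothetical helper: the pty usually turns "\n" into "\r\n",
-- so drop carriage returns before echoing a line.
stripCR :: String -> String
stripCR = filter (/= '\r')

main :: IO ()
main = do
  (masterFd, slaveFd) <- openPseudoTerminal
  master <- fdToHandle masterFd
  slave  <- fdToHandle slaveFd
  -- The child's stdout is the pty slave, so hIsTerminalDevice is True in
  -- the child and GHC gives its stdout LineBuffering without code changes.
  _ <- createProcess (proc "./A" []) { std_out = UseHandle slave }
  -- The parent never uses the slave; hClose on an already-closed Handle is a no-op.
  hClose slave
  forever $ hGetLine master >>= putStrLn . stripCR
```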
Cheers,
Elliot Robinson
Phone: (321) 252-9660
Site: www.argiopetech.com
Email: elliot.robinson@argiopetech.com
PGP Fingerprint: 0xD1E72E6A9D0610FFBBF838A6FFB5205A9FEDE59A
On Fri, Aug 1, 2014 at 11:20 PM, Brandon Allbery
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

It's worth noting that `hGetNonBlocking` only works as described with GHC on *nix. Windows and non-GHC treat it as a blocking hGet. Also, unless I'm mistaken, my example leaks the Handle for the slave PTY (though it would be trivial to fix).
---
Elliot Robinson

On Sat, Aug 2, 2014 at 2:32 AM, Elliot Robinson <elliot.robinson@argiopetech.com> wrote:

quoth Elliot Robinson
As a quick usage/rationale overview, I feed an attoparsec parser by repeatedly calling receive (intermixed with some intelligent use of `transmit` to deal with messages and other incomplete screen updates) until I receive a valid map. `hGetNonBlocking` bypasses the standard buffering mechanisms and returns whatever is in the PTY buffer (which may be nothing) rather than waiting for the specified line/block/whatever buffer to fill.
If you don't really want non-blocking I/O (i.e., reads that normally return 0 bytes), System.Posix.IO.ByteString.fdRead is a blocking read that returns available unbuffered data. Handles are for buffered I/O.

Donn
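A minimal sketch of the Fd-level approach Donn describes, assuming the unix package. It uses the lower-level fdReadBuf (which reads into a caller-supplied buffer) in place of fdRead, and readAvailable is a hypothetical name. Each call is a single read(2): it blocks until something is available and returns whatever the kernel hands over, with no Handle buffer in between. Given the master Fd from openPseudoTerminal, without wrapping it in a Handle, the same function would read the child's output.

```haskell
-- Sketch: unbuffered, blocking reads straight from a POSIX Fd (no Handle).
import qualified Data.ByteString as B
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (castPtr)
import System.IO (hFlush, stdout)
import System.Posix.IO
import System.Posix.Types (Fd)

-- | Hypothetical helper: one read(2). Blocks until the kernel has data for
-- this Fd (or EOF), then returns at most n bytes. Empty result means EOF.
readAvailable :: Fd -> Int -> IO B.ByteString
readAvailable fd n =
  allocaBytes n $ \buf -> do
    got <- fdReadBuf fd buf (fromIntegral n)
    B.packCStringLen (castPtr buf, fromIntegral got)

main :: IO ()
main = loop
  where
    -- Demo: copy fd 0 to stdout chunk by chunk until end of file.
    loop = do
      chunk <- readAvailable stdInput 4096
      if B.null chunk
        then pure ()
        else B.putStr chunk >> hFlush stdout >> loop
```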

On Sat, Aug 2, 2014 at 3:54 AM, Donn Cave
Handles are for buffered I/O.
If this is the case, why is NoBuffering provided? Why does the documentation for Handle explicitly mention non-existent and zero-length buffers? If Handles are the standard cross-platform interface to buffered file IO, what is the standard cross-platform interface to unbuffered IO (output, specifically, since input is technically always at least 1 byte buffered)?
---
Elliot Robinson

On Sun, Aug 3, 2014 at 2:43 PM, Elliot Robinson < elliot.robinson@argiopetech.com> wrote:
If this is the case, why is NoBuffering provided? Why does the documentation for Handle explicitly mention non-existent and zero-length buffers? If Handles are the standard cross-platform interface to buffered file IO, what is the standard cross-platform interface to unbuffered IO (output, specifically, since input is technically always at least 1 byte buffered)?
Handles provide a non-buffered interface, but often if you truly need unbuffered I/O you will be better served by the platform's primitive operations; abstractions that give those a reasonably platform-independent interface may well cancel out the advantages of unbuffered I/O in those cases.

-- 
brandon s allbery kf8nh sine nomine associates
allbery.b@gmail.com ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
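As a concrete illustration of the Handle-level options (a sketch, not code from the thread): on output, hSetBuffering h NoBuffering hands each write to the OS immediately; on input, Data.ByteString.hGetSome blocks only until some data is available and then returns what is there, up to the requested size, instead of waiting for a full block. receiveSome is a hypothetical name.

```haskell
-- Sketch: Handle-level I/O that does not wait for a buffer to fill.
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import System.IO

-- | Hypothetical helper: block until at least one byte is available (or EOF),
-- then return up to 4096 bytes. An empty result means end of file.
receiveSome :: Handle -> IO B.ByteString
receiveSome h = B.hGetSome h 4096

main :: IO ()
main = do
  -- Output side: with NoBuffering each ByteString write goes straight to the OS.
  hSetBuffering stdout NoBuffering
  let loop = do
        chunk <- receiveSome stdin
        if B.null chunk then pure () else BC.putStr chunk >> loop
  loop
```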

On Sun, Aug 3, 2014 at 3:08 PM, Brandon Allbery
Handles provide a non-buffered interface, but often if you truly need unbuffered I/O you will be better suited by the platform's primitive operations; for which abstractions that enable some kind of reasonably platform independent interface may well cancel out the advantages of unbuffered I/O for those cases.
As a platform-specific optimization, this makes total sense. Most of the people who need this optimization know they need it. My fear is that offering "handles are for buffered I/O" (and statements like it) as general case rules leads to the "I'm new to <field> and I'm using System.Posix.IO for unbuffered IO because Handles aren't good for that, and now my program doesn't compile on <non-Posix OS>" ilk of Haskell-Cafe/SO questions.
---
Elliot Robinson

quoth Elliot Robinson
As a platform-specific optimization, this makes total sense. Most of the people who need this optimization know they need it. My fear is that offering "handles are for buffered I/O" (and statements like it) as general case rules leads to the "I'm new to <field> and I'm using System.Posix.IO for unbuffered IO because Handles aren't good for that, and now my program doesn't compile on <non-Posix OS>" ilk of Haskell-Cafe/SO questions.
Your fears are misplaced. No one is going to do that, ever. Please bear in mind that we're talking about a Posix Fd produced by another Posix terminal function. That is where you will see one programmer after another feed this Fd into a Handle, under the mistaken impression that it's the only legit way to do I/O on it in Haskell, and then have the various problems that we're talking about. When it's for buffered I/O, then it makes sense; otherwise it's probably an error.

To address J.K.'s (lest I misspell) probably ironic question seriously: why, I certainly say Haskell should strive to be as compatible as possible with ANSI C, whatever Mr. Allbery may think. Gratuitous difference in functionality that's clearly based on C I/O would benefit no one.

Donn

On Mon, Aug 4, 2014 at 12:25 AM, Donn Cave
To address J.K.'s (lest I misspell) probably ironic question seriously, why I certainly say Haskell should strive to be as compatible as possible with ANSI C, whatever Mr. Allbery may think. Gratuitous difference in functionality that's clearly based on C I/O would benefit no one.
I *don't* fully agree with this, just because ANSI C is partly in the business of ensuring that ancient programs still behave to some extent, and ancient programs often use buffered I/O in situations where it isn't ideal but was typically better than unbuffered on ancient PDP11s, or vice versa. :) (And often don't use line buffering, as --- if the program is old enough --- it may predate it.)

That said, on many platforms pipes kinda give you a taste of that environment; as IPC goes, they're fairly slow, so buffered I/O is often a visibly faster option. (Which is why there are lighter but more complex or less documented / characterized IPC mechanisms around on various Unix-like systems.) And I do mean "visibly"; it's still, even on modern hardware, not *that* difficult to end up with programs where you can see visible pauses between emitted characters if you disable buffering completely, whereas even with line buffering the program both loses the pauses and takes less time to run.

But in any case, my main grump here is that anything that makes it harder (or non-viably slower) to interface Haskell with other programs makes it harder to use Haskell in practice. Stringing together programs with pipes is still common in the unix world (despite the efforts of e.g. the Gnome devs...) and Haskell defaulting to unbuffered (or line buffered, in the case of short lines) I/O on pipes would be unfortunate.

-- 
brandon s allbery kf8nh sine nomine associates
allbery.b@gmail.com ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
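A rough way to see the cost Brandon mentions is to write the same lines under each BufferMode and time them. This sketch assumes a Unix-like system (it writes to /dev/null) and GHC.Clock from base 4.11 or later; timeWrites is a hypothetical helper, and the timings depend entirely on the machine, so no expected output is given. With GHC, a String written to a NoBuffering Handle goes out roughly one character at a time, which is where the per-character system calls come from.

```haskell
-- Sketch: rough cost of NoBuffering vs LineBuffering vs BlockBuffering.
import Control.Monad (replicateM_)
import GHC.Clock (getMonotonicTime)
import System.IO
import Text.Printf (printf)

-- | Hypothetical helper: write n short lines to /dev/null under the given
-- mode and return the elapsed wall-clock time in seconds.
timeWrites :: BufferMode -> Int -> IO Double
timeWrites mode n =
  withFile "/dev/null" WriteMode $ \h -> do
    hSetBuffering h mode
    t0 <- getMonotonicTime
    replicateM_ n (hPutStrLn h "a line of about forty characters of text")
    hFlush h
    t1 <- getMonotonicTime
    pure (t1 - t0)

main :: IO ()
main = mapM_ run [NoBuffering, LineBuffering, BlockBuffering Nothing]
  where
    run :: BufferMode -> IO ()
    run mode = do
      secs <- timeWrites mode 20000
      printf "%-24s %.3f s\n" (show mode) secs
```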

Le 04/08/2014 06:25, Donn Cave a écrit :
To address J.K.'s (lest I misspell) probably ironic question seriously, why I certainly say Haskell should strive to be as compatible as possible with ANSI C, whatever Mr. Allbery may think. Gratuitous difference in functionality that's clearly based on C I/O would benefit no one.
Thank you. In fact, B.A. [probably] thinks that this compatibility is important, while I don't care so much. Now, ironic or not: I am afraid that your usage of "gratuitous difference in functionality" is not correct. The languages are different, their /typical/ domains of application are not the same, and if somebody really wants to construct hybrid applications, I see no reason to mix up Haskell and C within the same "niche": interfacing or sub-process creation, etc. Use one or the other, why not? I think that insisting on the Frankensteinism is "gratuitous".

Concerning buffering, when somebody (Richard O'K. presumably) says:
I am sick of having to tell students "Java does not buffer by default, it is up to YOU to use BufferedReader and/or BufferedWriter if you do not want I/O to be catastrophically slow".
I would say that in many circumstances (Java EE services, etc.) this is far from being catastrophically slow.

The best.
Jerzy Karczmarczuk
Caen, France

Concerning buffering, when somebody (Richard O'K. presumably) says :
I am sick of having to tell students "Java does not buffer by default, it is up to YOU to use BufferedReader and/or BufferedWriter if you do not want I/O to be catastrophically slow".
I would say that in many circumstances (Java EE services, etc.) this is far from being catastrophically slow.
It was me, and we don't do Java EE services. We *do* do information retrieval and data mining and stuff, some stuff coming from files and some stuff coming from files, and 'catastrophically slow' is reliably observed behaviour for that kind of stuff. I know so very little about Java EE services that it never occurred to me that people would use java.io.* for any significant amounts of data in that area. Is the usage of java.io.* in Java EE something that might find an analogue in Haskell use, making unbuffered I/O something Haskell programmers might need frequently? (I'm drawing a distinction here between on the one hand buffered I/O with careful flushing at interaction boundaries and on the other hand unbuffered I/O where each program-level transput call is a system-level transput operation.)
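Richard's first option, buffered output with an explicit flush at each interaction boundary, might look like this variant of Chris's A.hs (a sketch; emit is a hypothetical helper). The child keeps GHC's default BlockBuffering on a pipe, but each record is pushed to the reader as soon as it is complete, so B.hs sees it without any pty tricks. It does mean changing the child, which is what Chris wanted to avoid.

```haskell
-- A.hs, flushing variant (sketch): keep the default buffering and flush
-- explicitly wherever a reader is expected to react.
import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import System.IO

-- | Hypothetical helper: write one complete record, then push it to the reader.
emit :: Handle -> String -> IO ()
emit h msg = hPutStrLn h msg >> hFlush h

main :: IO ()
main = forever $ do
  emit stdout "test"
  threadDelay 1000000
```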

Line buffered would be an inefficient default. Stdout is fully buffered in
C99 if it can be determined that it is not attached to a terminal.
Alexander
On Aug 2, 2014 5:07 AM, "Chris Myzie"
I still think haskell is using screwy defaults for stdout buffering..
participants (7)
- Alexander Kjeldaas
- Brandon Allbery
- Chris Myzie
- Donn Cave
- Elliot Robinson
- Jerzy Karczmarczuk
- ok@cs.otago.ac.nz