Emulating bash pipe/ process lib

Hi. I want to write a little haskell program executing about 4 programs
passing data via pipes. As my python script seems to be slower than a
bash script I want to try a ghc executable now.
It should invoke different parts of a text to speech chain. This way I
have one interface then.
Talar und #haskell told me that I might use runProcess and pass handles
for stdin and out created by createPipe and fdToHandle.
So my simple test looks like this:
module Main where
import System.IO
import System.Posix.IO
main = do
(fdIn,fdOut) <- createPipe
let (iohIn, iohOut) = (fdToHandle fdIn, fdToHandle fdOut)
hIn <- iohIn
hOut <- iohOut
hPutStr hIn "test"
line <- hGetLine hOut
print line -- should now print test having been piped through my pipe
but I get the error:
pipe2:

Marc Weber Marc Weber wrote:
Hi. I want to write a little haskell program executing about 4 programs passing data via pipes. As my python script seems to be slower than a bash script I want to try a ghc executable now. It should invoke different parts of a text to speech chain. This way I have one interface then.
Talar und #haskell told me that I might use runProcess and pass handles for stdin and out created by createPipe and fdToHandle.
So my simple test looks like this:
module Main where import System.IO import System.Posix.IO
main = do (fdIn,fdOut) <- createPipe let (iohIn, iohOut) = (fdToHandle fdIn, fdToHandle fdOut) hIn <- iohIn hOut <- iohOut hPutStr hIn "test" line <- hGetLine hOut print line -- should now print test having been piped through my pipe
but I get the error: pipe2:
: hPutStr: illegal operation (handle is not open for writing) And in current CVS docs in base.System.Process.hs it is said that createPipe is no longer exported ?
If you want to communicate with external programs via pipes, then System.Process should provide everything you need. Take a look at runInteractiveProcess in particular. Cheers, Simon

On 2/9/06, Marc Weber Marc Weber
Hi. I want to write a little haskell program executing about 4 programs passing data via pipes. As my python script seems to be slower than a bash script I want to try a ghc executable now. It should invoke different parts of a text to speech chain. This way I have one interface then.
There is also the HsShellScript library, which has many utilities for using Haskell as a shell programming language, including good support for piping. You can find it here: http://www.volker-wysk.de/hsshellscript/ -- Kurt

On Thu, 9 Feb 2006, Marc Weber Marc Weber wrote: ...
So my simple test looks like this:
module Main where import System.IO import System.Posix.IO
main = do (fdIn,fdOut) <- createPipe let (iohIn, iohOut) = (fdToHandle fdIn, fdToHandle fdOut) hIn <- iohIn hOut <- iohOut hPutStr hIn "test" line <- hGetLine hOut print line -- should now print test having been piped through my pipe
but I get the error: pipe2:
: hPutStr: illegal operation (handle is not open for writing)
That's right. "The first component is the fd to read from, the second is the write end." You're writing to the read end.
And in current CVS docs in base.System.Process.hs it is said that createPipe is no longer exported ?
Maybe they're just going to export functions like this under their common POSIX names ("pipe", "dup2"), and bring them more into line with standard behavior (execve takes argv[0..n], not argv[1..n].) "Slow" devices like pipes, sockets etc. get along fine with Handles or whatever buffered I/O - as long as you have only one going at a time. Multiple input sources - like, say you want to read a process' output (unit 1) and diagnostic output (unit 2) separately, and either one has the potential to fill up the pipe and block while you're waiting for input on the other pipe - buffers at least complicate the dispatching mechanics you need for this, if not make it impossible. For my money, it makes more sense to learn how deal with pipes directly at the file descriptor level, and at worst if the committees find it too unsightly you may have to write FFI interfaces to get pipe() back. Donn Cave, donn@drizzle.com

Hello Donn, Thursday, February 09, 2006, 8:58:27 PM, you wrote: DC> "Slow" devices like pipes, sockets etc. get along fine with Handles DC> or whatever buffered I/O - as long as you have only one going at a time. DC> Multiple input sources - like, say you want to read a process' output DC> (unit 1) and diagnostic output (unit 2) separately, and either one has DC> the potential to fill up the pipe and block while you're waiting for DC> input on the other pipe - buffers at least complicate the dispatching DC> mechanics you need for this, if not make it impossible. are you tried LineBuffering and NoBuffering? seem that it is developed exactly for this case (plus for terminals) -- Best regards, Bulat mailto:bulatz@HotPOP.com

On Thu, 9 Feb 2006, Bulat Ziganshin wrote:
Thursday, February 09, 2006, 8:58:27 PM, you wrote: DC> "Slow" devices like pipes, sockets etc. get along fine with Handles DC> or whatever buffered I/O - as long as you have only one going at a time. DC> Multiple input sources - like, say you want to read a process' output DC> (unit 1) and diagnostic output (unit 2) separately, and either one has DC> the potential to fill up the pipe and block while you're waiting for DC> input on the other pipe - buffers at least complicate the dispatching DC> mechanics you need for this, if not make it impossible.
are you tried LineBuffering and NoBuffering? seem that it is developed exactly for this case (plus for terminals)
That's part of the idea, it helps keep the data out of buffers where select or whatever can't see it. But then you need functions with semantics that really support unbuffered I/O. When select tells you that the device is readable, you don't know that there is one full line, or how many bytes there are, so hGetLine doesn't apply here, nor would the Haskell equivalent of fread(3) if there were one. The only thing left is hGetChar - one char, then select, then another. (And multi-byte chars cause us to worry about this.) Since POSIX read(2) already supports exactly the functions you need for unbuffered I/O, it's simpler, easier and more efficient to leave the whole business right there at the file descriptor level. I'm sure you can make a non-buffering buffer layer work on top of the file descriptor, but what makes it worth the trouble? Donn Cave, donn@drizzle.com

Hello Donn, Friday, February 10, 2006, 12:47:42 AM, you wrote:
DC> "Slow" devices like pipes, sockets etc. get along fine with Handles DC> or whatever buffered I/O - as long as you have only one going at a time. DC> Multiple input sources - like, say you want to read a process' output DC> (unit 1) and diagnostic output (unit 2) separately, and either one has DC> the potential to fill up the pipe and block while you're waiting for DC> input on the other pipe - buffers at least complicate the dispatching DC> mechanics you need for this, if not make it impossible.
are you tried LineBuffering and NoBuffering? seem that it is developed exactly for this case (plus for terminals)
DC> That's part of the idea, it helps keep the data out of buffers where DC> select or whatever can't see it. DC> But then you need functions with semantics that really support unbuffered DC> I/O. When select tells you that the device is readable, you don't know DC> that there is one full line, or how many bytes there are, so hGetLine DC> doesn't apply here, nor would the Haskell equivalent of fread(3) if there DC> were one. The only thing left is hGetChar - one char, then select, then DC> another. (And multi-byte chars cause us to worry about this.) when i think how to implementat LineBuffering, i decided that it is the only possible way - read byte a time and see for a '\n'. i don't know how System.IO implemented but i think that it should do the same DC> Since POSIX read(2) already supports exactly the functions you need for DC> unbuffered I/O, it's simpler, easier and more efficient to leave the whole DC> business right there at the file descriptor level. can you please describe that read(2) does that is better than reading char-at-a-time? DC> I'm sure you can make DC> a non-buffering buffer layer work on top of the file descriptor, but what DC> makes it worth the trouble? if you don't have I/O library that implements what you need, it is indeed simpler to use lower I/O directly. if you have I/O library that does that you need, it is easier to write: (hIn, hOut) <- createUnixPipe vPutStrLn hOut "hello" s <- vGetLine hIn i'm writing such lib now, so i'm interested to know what i need to do so that it will work ok. -- Best regards, Bulat mailto:bulatz@HotPOP.com

Bulat Ziganshin wrote:
Hello Donn,
Friday, February 10, 2006, 12:47:42 AM, you wrote:
DC> "Slow" devices like pipes, sockets etc. get along fine with Handles DC> or whatever buffered I/O - as long as you have only one going at a time. DC> Multiple input sources - like, say you want to read a process' output DC> (unit 1) and diagnostic output (unit 2) separately, and either one has DC> the potential to fill up the pipe and block while you're waiting for DC> input on the other pipe - buffers at least complicate the dispatching DC> mechanics you need for this, if not make it impossible.
are you tried LineBuffering and NoBuffering? seem that it is developed exactly for this case (plus for terminals)
DC> That's part of the idea, it helps keep the data out of buffers where DC> select or whatever can't see it.
DC> But then you need functions with semantics that really support unbuffered DC> I/O. When select tells you that the device is readable, you don't know DC> that there is one full line, or how many bytes there are, so hGetLine DC> doesn't apply here, nor would the Haskell equivalent of fread(3) if there DC> were one. The only thing left is hGetChar - one char, then select, then DC> another. (And multi-byte chars cause us to worry about this.)
when i think how to implementat LineBuffering, i decided that it is the only possible way - read byte a time and see for a '\n'. i don't know how System.IO implemented but i think that it should do the same
Read as much as you can into the buffer without blocking. Line buffering on input is actually implemented exactly the same as block buffering. You might argue that strictly speaking this isn't line buffering, since you can get data from the Handle before the end of line is available. That's true, but I'd argue this is more useful. In fact, we changed block buffering on input handles so that the input buffer doesn't have to be completely full before data can be returned, which is also not strict block buffering, but seems more useful. I suppose conceivably you might want to force a read buffer to be completely full so that you could guarantee to read it all without blocking, but in that case you might as well use hGetBuf & peekArray. Similarly you might want to ensure the buffer contains a complete line before starting to read it, but can use hGetLine anyway. Cheers, Simon

Hello Simon, Friday, February 10, 2006, 2:53:25 PM, you wrote: i'm not very interested to do something fascinating in this area. it seems that it is enough to do 1) non-blocking read of the entire buffer on input 2) flush buffer at each '\n' at output that should be enough to implement LineBuffering for everyone except purists? and for the NoBuffering the same except for flushing after each output operation?
DC> "Slow" devices like pipes, sockets etc. get along fine with Handles DC> or whatever buffered I/O - as long as you have only one going at a time. DC> Multiple input sources - like, say you want to read a process' output DC> (unit 1) and diagnostic output (unit 2) separately, and either one has DC> the potential to fill up the pipe and block while you're waiting for DC> input on the other pipe - buffers at least complicate the dispatching DC> mechanics you need for this, if not make it impossible.
are you tried LineBuffering and NoBuffering? seem that it is developed exactly for this case (plus for terminals)
DC> That's part of the idea, it helps keep the data out of buffers where DC> select or whatever can't see it.
DC> But then you need functions with semantics that really support unbuffered DC> I/O. When select tells you that the device is readable, you don't know DC> that there is one full line, or how many bytes there are, so hGetLine DC> doesn't apply here, nor would the Haskell equivalent of fread(3) if there DC> were one. The only thing left is hGetChar - one char, then select, then DC> another. (And multi-byte chars cause us to worry about this.)
when i think how to implementat LineBuffering, i decided that it is the only possible way - read byte a time and see for a '\n'. i don't know how System.IO implemented but i think that it should do the same
SM> Read as much as you can into the buffer without blocking. Line SM> buffering on input is actually implemented exactly the same as block SM> buffering. SM> You might argue that strictly speaking this isn't line buffering, since SM> you can get data from the Handle before the end of line is available. SM> That's true, but I'd argue this is more useful. In fact, we changed SM> block buffering on input handles so that the input buffer doesn't have SM> to be completely full before data can be returned, which is also not SM> strict block buffering, but seems more useful. SM> I suppose conceivably you might want to force a read buffer to be SM> completely full so that you could guarantee to read it all without SM> blocking, but in that case you might as well use hGetBuf & peekArray. SM> Similarly you might want to ensure the buffer contains a complete line SM> before starting to read it, but can use hGetLine anyway. -- Best regards, Bulat mailto:bulatz@HotPOP.com

Bulat Ziganshin wrote:
Friday, February 10, 2006, 2:53:25 PM, you wrote:
i'm not very interested to do something fascinating in this area. it seems that it is enough to do
1) non-blocking read of the entire buffer on input 2) flush buffer at each '\n' at output
that should be enough to implement LineBuffering for everyone except purists? and for the NoBuffering the same except for flushing after each output operation?
Yes, exactly. This is almost what GHC's System.IO currently does, except that for NoBuffering we have a fixed buffer size of 1 byte. It would be safe to have a larger buffer size for NoBuffering read handles, but I didn't recognise that when I wrote it. Cheers, Simon

Hello Simon, Tuesday, February 21, 2006, 4:05:57 PM, you wrote:
i'm not very interested to do something fascinating in this area. it seems that it is enough to do
1) non-blocking read of the entire buffer on input 2) flush buffer at each '\n' at output
that should be enough to implement LineBuffering for everyone except purists? and for the NoBuffering the same except for flushing after each output operation?
SM> Yes, exactly. This is almost what GHC's System.IO currently does, SM> except that for NoBuffering we have a fixed buffer size of 1 byte. It SM> would be safe to have a larger buffer size for NoBuffering read handles, SM> but I didn't recognise that when I wrote it. btw, this makes NoBuffering mode unusable for some tasks. Donn Cave wrote about this here: ...
when i think how to implementat LineBuffering, i decided that it is the only possible way - read byte a time and see for a '\n'. i don't know how System.IO implemented but i think that it should do the same
Don't know - I see that Simon M followed up with an explanation that I think confirms my impression that LineBuffering is indistinguishable from BlockBuffering, for input. I assume it's only there for the sake of output, where it does make a difference. Only NoBuffering is interoperable with select.
DC> Since POSIX read(2) already supports exactly the functions you need for DC> unbuffered I/O, it's simpler, easier and more efficient to leave the whole DC> business right there at the file descriptor level.
can you please describe that read(2) does that is better than reading char-at-a-time?
It returns whatever available data, as long as it's more than 0 bytes and less than the caller-supplied limit. This is the only read operation that works with select (including char-at-a-time, as a special case where the caller-supplied limit is 1.)
DC> I'm sure you can make DC> a non-buffering buffer layer work on top of the file descriptor, but what DC> makes it worth the trouble?
if you don't have I/O library that implements what you need, it is indeed simpler to use lower I/O directly. if you have I/O library that does that you need, it is easier to write:
(hIn, hOut) <- createUnixPipe vPutStrLn hOut "hello" s <- vGetLine hIn
i'm writing such lib now, so i'm interested to know what i need to do so that it will work ok.
It won't! I mean, we can use it the same way as the ordinary Handle in the original example, but we know in principle, if you call vGetLine, it may block regardless of whether select reports input data, because select can't tell you whether there's a full line of input. So you don't have anything to worry about here - this is not your problem. I only wanted to point out that for select-based I/O event multiplexing, we will continue to need file descriptors and system level POSIX I/O, and that the need for this can occur in such ordinary, mundane applications as reading stdout and stderr in parallel. Donn Cave, donn@drizzle.com -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Tuesday, February 21, 2006, 4:05:57 PM, you wrote:
i'm not very interested to do something fascinating in this area. it seems that it is enough to do
1) non-blocking read of the entire buffer on input 2) flush buffer at each '\n' at output
that should be enough to implement LineBuffering for everyone except purists? and for the NoBuffering the same except for flushing after each output operation?
SM> Yes, exactly. This is almost what GHC's System.IO currently does, SM> except that for NoBuffering we have a fixed buffer size of 1 byte. It SM> would be safe to have a larger buffer size for NoBuffering read handles, SM> but I didn't recognise that when I wrote it.
btw, this makes NoBuffering mode unusable for some tasks. Donn Cave wrote about this here:
Someone is confused here, probably me. Could you elaborate on what you're referring to by "this", and "unusable for some tasks"? Cheers, Simon

On Fri, 10 Feb 2006, Bulat Ziganshin wrote: ...
when i think how to implementat LineBuffering, i decided that it is the only possible way - read byte a time and see for a '\n'. i don't know how System.IO implemented but i think that it should do the same
Don't know - I see that Simon M followed up with an explanation that I think confirms my impression that LineBuffering is indistinguishable from BlockBuffering, for input. I assume it's only there for the sake of output, where it does make a difference. Only NoBuffering is interoperable with select.
DC> Since POSIX read(2) already supports exactly the functions you need for DC> unbuffered I/O, it's simpler, easier and more efficient to leave the whole DC> business right there at the file descriptor level.
can you please describe that read(2) does that is better than reading char-at-a-time?
It returns whatever available data, as long as it's more than 0 bytes and less than the caller-supplied limit. This is the only read operation that works with select (including char-at-a-time, as a special case where the caller-supplied limit is 1.)
DC> I'm sure you can make DC> a non-buffering buffer layer work on top of the file descriptor, but what DC> makes it worth the trouble?
if you don't have I/O library that implements what you need, it is indeed simpler to use lower I/O directly. if you have I/O library that does that you need, it is easier to write:
(hIn, hOut) <- createUnixPipe vPutStrLn hOut "hello" s <- vGetLine hIn
i'm writing such lib now, so i'm interested to know what i need to do so that it will work ok.
It won't! I mean, we can use it the same way as the ordinary Handle in the original example, but we know in principle, if you call vGetLine, it may block regardless of whether select reports input data, because select can't tell you whether there's a full line of input. So you don't have anything to worry about here - this is not your problem. I only wanted to point out that for select-based I/O event multiplexing, we will continue to need file descriptors and system level POSIX I/O, and that the need for this can occur in such ordinary, mundane applications as reading stdout and stderr in parallel. Donn Cave, donn@drizzle.com
participants (6)
-
Bulat Ziganshin
-
Bulat Ziganshin
-
Donn Cave
-
Kurt Hutchinson
-
Marc Weber Marc Weber
-
Simon Marlow