runInteractiveProcess and hGetLine on Windows

Hello all,
I bumped into a "feature" which might be a bug, but to be certain, I'd like to hear your opinion. I'm running ghc 6.8.2 on Windows XP, and with ghci I do the following:
Prelude System.Process System.IO> (inp,outp,err,ph) <- runInteractiveProcess "kpsewhich" ["testfile.txt"] Nothing Nothing
('kpsewhich' is a simple path searching utility used by web2c TeX system, returns the full path to the file if found in the system defined search path, but you all probably know that.)
and then:
Prelude System.Process System.IO> hGetLine outp
which gives me:
"./testfile.txt\r"
as opposed to
"./testfile.txt"
which I get on my Linux box.
Is the "\r" supposed to be at the end? I thought it is part of the line separator in Windows, and as such, should not be part of the line retrieved? Same thing happens when compiled with ghc.
Best Wishes, Harri K.

On Wed, May 07, 2008 at 04:42:45PM +0200, Harri Kiiskinen wrote:
Prelude System.Process System.IO> (inp,outp,err,ph) <- runInteractiveProcess "kpsewhich" ["testfile.txt"] Nothing Nothing
...
Prelude System.Process System.IO> hGetLine outp
which gives me:
"./testfile.txt\r"
as opposed to "./testfile.txt" which I get on my Linux box.
Is the "\r" supposed to be at the end? I thought it is part of the line separator in Windows, and as such, should not be part of the line retrieved? Same thing happens when compiled with ghc.
This is the correct behavior (although it's debatable whether kpsewhich should be outputting in text mode). In order to get windows-style line handling, a file handle needs to be opened in text mode, which is certainly not the correct behavior for runInteractiveProcess, which has no knowledge about the particular behavior of the program it's running. "\r\n" as newline should die a rapid death... windows is hard enough without maintaining this sort of stupidity. -- David Roundy Department of Physics Oregon State University

David Roundy wrote:
This is the correct behavior (although it's debatable whether kpsewhich should be outputting in text mode).
I think it would be more accurate to say that runInteractiveProcess has an inadequate API, since you can't indicate whether the interaction with the other process should happen in text or binary mode. Simon: do the new entry points in System.Process take line ending conventions into account?

On Wed, May 07, 2008 at 08:33:23AM -0700, Bryan O'Sullivan wrote:
David Roundy wrote:
This is the correct behavior (although it's debatable whether kpsewhich should be outputting in text mode).
I think it would be more accurate to say that runInteractiveProcess has an inadequate API, since you can't indicate whether the interaction with the other process should happen in text or binary mode.
I don't see any reason to support text mode. It's easy to filter by hand if you absolutely have to deal with ugly applications on ugly platforms. -- David Roundy Department of Physics Oregon State University
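A minimal sketch of what "filtering by hand" could look like, assuming the only thing to remove is a single trailing "\r" on each line read. The names `stripTrailingCR` and `hGetLineNoCR` are hypothetical helpers made up for this illustration; they are not part of System.IO.

```haskell
import System.IO (Handle, hGetLine)

-- Hypothetical helper: drop one trailing carriage return, if present.
stripTrailingCR :: String -> String
stripTrailingCR s
  | not (null s) && last s == '\r' = init s
  | otherwise                      = s

-- Hypothetical wrapper: read a line from a binary-mode handle and
-- remove the "\r" that a "\r\n" line ending leaves behind.
hGetLineNoCR :: Handle -> IO String
hGetLineNoCR h = fmap stripTrailingCR (hGetLine h)
```

With this, `hGetLineNoCR outp` in the original ghci session would be expected to give "./testfile.txt" on both platforms, since the function leaves lines without a trailing "\r" untouched.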

Hello David, Wednesday, May 7, 2008, 7:46:11 PM, you wrote:
I don't see any reason to support text mode. It's easy to filter by hand if you absolutely have to deal with ugly applications on ugly platforms.
you mean unix, of course? ;) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Wed, May 07, 2008 at 07:48:45PM +0400, Bulat Ziganshin wrote:
Hello David,
Hi Bulat!
Wednesday, May 7, 2008, 7:46:11 PM, you wrote:
I don't see any reason to support text mode. It's easy to filter by hand if you absolutely have to deal with ugly applications on ugly platforms.
you mean unix, of course? ;)
Maybe I should have said "if you don't care what the actual output of the programs you run is"? Then it would have been clear that I was talking about Windows... David

On May 7, 2008, at 8:46 AM, David Roundy wrote:
On Wed, May 07, 2008 at 08:33:23AM -0700, Bryan O'Sullivan wrote:
David Roundy wrote:
This is the correct behavior (although it's debatable whether kpsewhich should be outputting in text mode).
I think it would be more accurate to say that runInteractiveProcess has an inadequate API, since you can't indicate whether the interaction with the other process should happen in text or binary mode.
I don't see any reason to support text mode.
Doesn't hGetLine imply text mode? What does "Line" mean, otherwise? Donn Cave, donn@avvanta.com

On Wed, May 7, 2008 at 9:12 AM, Donn Cave
Doesn't hGetLine imply text mode? What does "Line" mean, otherwise?
On normal operating systems, "line" means until you reach a '\n' character. In fact, that's also what it means when reading in text mode, it's just that when in text mode on DOS-descended systems, the character sequence "\r\n" is converted to "\n" by the operating system. David

David Roundy
...when in text mode on DOS-descended systems, the character sequence "\r\n" is converted to "\n" by the operating system.
So basically, Windows supports both the "\n" convention and the "\r\n" convention by making a distinction between "text" and "binary" read modes. No other major operating system requires this distinction -- they are all *nixen -- so it seems reasonable to just punt on it. It would be too bad, though, if this resulted in a lot of Windows specific code getting written -- there are a lot of Windows users and eventually they'll unionize or something. People will throw together System.Win32.TextMode or something like that and then projects will be littered with platform specific code, though they needn't be. If we just put up a `textMode` filter, then everyone will have to throw that in front of their reads/writes to guard against corruption on Windows. We'll have verbose, silly looking code. If, on the other hand, we just give in to Windows, then some things are good and some are bad. First of all, if hGetLine has Windows behaviour on Windows and Unix behaviour on Unix, then any data files shipped with Cabal packages will likely need to be newline transformed. That is annoying. On the other hand, the semantics of 'getting a line' will be maintained across platforms, and the Windows users will be pacified (for a time). We all know what appeasement got the British... -- _jsn

On Wed, 2008-05-07 at 10:20 -0700, Jason Dusek wrote:
If, on the other hand, we just give in to Windows, then some things are good and some are bad. First of all, if hGetLine has Windows behaviour on Windows and Unix behaviour on Unix, then any data files shipped with Cabal packages will likely need to be newline transformed. That is annoying.
Cabal already does deal with both (and mixed) conventions because people make files on one system and transfer them to the other. Duncan

On Wed, 2008-05-07 at 08:46 -0700, David Roundy wrote:
On Wed, May 07, 2008 at 08:33:23AM -0700, Bryan O'Sullivan wrote:
David Roundy wrote:
This is the correct behavior (although it's debatable whether kpsewhich should be outputting in text mode).
I think it would be more accurate to say that runInteractiveProcess has an inadequate API, since you can't indicate whether the interaction with the other process should happen in text or binary mode.
I don't see any reason to support text mode. It's easy to filter by hand if you absolutely have to deal with ugly applications on ugly platforms.
If it was only Windows' silly line ending convention I'd be tempted to agree but we probably want to distinguish text and binary handles anyway. You get Chars out of a text handle (with some string en/decoding) and bytes out of a binary handle. In that case, default line ending convention is just another thing to throw in with the text encoding conversion. Duncan
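For illustration only: GHC releases later than the 6.8.2 used in this thread export `hSetEncoding` and `hSetNewlineMode` from System.IO, which treat the encoding and the line ending convention as two per-handle text settings. The sketch below assumes such a release and a UTF-8 input file; `readLinesUniversal` is a hypothetical name, not a library function.

```haskell
import System.IO

-- Hypothetical helper: read a file as text, accepting both "\n" and
-- "\r\n" line endings regardless of the platform we run on.
readLinesUniversal :: FilePath -> IO [String]
readLinesUniversal path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8                     -- assumption: file is UTF-8
    hSetNewlineMode h universalNewlineMode  -- "\r\n" is read as "\n"
    contents <- hGetContents h
    -- Force the lazy string before withFile closes the handle.
    length contents `seq` return (lines contents)
```

`universalNewlineMode` only translates "\r\n" on input; a lone "\r" is passed through unchanged.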

On Wed, May 07, 2008 at 09:24:46PM +0100, Duncan Coutts wrote:
On Wed, 2008-05-07 at 08:46 -0700, David Roundy wrote:
On Wed, May 07, 2008 at 08:33:23AM -0700, Bryan O'Sullivan wrote:
David Roundy wrote:
This is the correct behavior (although it's debatable whether kpsewhich should be outputting in text mode).
I think it would be more accurate to say that runInteractiveProcess has an inadequate API, since you can't indicate whether the interaction with the other process should happen in text or binary mode.
I don't see any reason to support text mode. It's easy to filter by hand if you absolutely have to deal with ugly applications on ugly platforms.
If it was only Windows' silly line ending convention I'd be tempted to agree but we probably want to distinguish text and binary handles anyway. You get Chars out of a text handle (with some string en/decoding) and bytes out of a binary handle. In that case, default line ending convention is just another thing to throw in with the text encoding conversion.
But that's a feature that was only added in a very recent ghc, right? I consider it an ugly hack to work around the fact that we have no system for dealing with file encodings. I'd rather consider text mode handles to be an ugly workaround for an ugly system, and have a clean solution for unicode (e.g. one that allows for the reading of files that are not in the locale encoding). I certainly wouldn't want to be forced to live with DOS line endings just to generate unicode output. Fortunately, darcs doesn't do unicode (or need to) or text mode, so I personally am pretty safe from this feature. David

On Wed, 2008-05-07 at 08:33 -0700, Bryan O'Sullivan wrote:
David Roundy wrote:
This is the correct behavior (although it's debatable whether kpsewhich should be outputting in text mode).
I think it would be more accurate to say that runInteractiveProcess has an inadequate API, since you can't indicate whether the interaction with the other process should happen in text or binary mode.
Simon: do the new entry points in System.Process take line ending conventions into account?
It doesn't require any new api:
(inh,outh,errh,pid) <- runInteractiveProcess path args Nothing Nothing
-- We want to process the output as text.
hSetBinaryMode outh False
As of a couple weeks ago the docs for runInteractiveProcess even say:
-- The 'Handle's are initially in binary mode; if you need them to be
-- in text mode then use 'hSetBinaryMode'.
Duncan
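The fragment above can be wrapped into a self-contained function. This is a sketch under some assumptions: the child process needs no input, it prints at least one line (otherwise `hGetLine` raises an end-of-file error), and only the first line is wanted. `firstOutputLine` is a hypothetical name chosen for the example.

```haskell
import System.IO (hClose, hGetLine, hSetBinaryMode)
import System.Process (runInteractiveProcess, waitForProcess)

-- Hypothetical helper: run a command and return the first line of its
-- standard output, read in text mode.
firstOutputLine :: FilePath -> [String] -> IO String
firstOutputLine cmd args = do
  (inh, outh, errh, pid) <- runInteractiveProcess cmd args Nothing Nothing
  hClose inh                 -- we send nothing to the child
  hSetBinaryMode outh False  -- process the output as text
  line <- hGetLine outh
  hClose outh
  hClose errh
  _ <- waitForProcess pid    -- reap the child process
  return line
```

For the case that started the thread, the call would be `firstOutputLine "kpsewhich" ["testfile.txt"]`, and on Windows the result should no longer carry the trailing "\r".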

Thank You all for the lively discussion, and of course, a nice and simple answer to my problem: On Wed, 2008-05-07 at 21:17 +0100, Duncan Coutts wrote:
(inh,outh,errh,pid) <- runInteractiveProcess path args Nothing Nothing
-- We want to process the output as text.
hSetBinaryMode outh False
As to the following claim:
As of a couple weeks ago the docs for runInteractiveProcess even say:
-- The 'Handle's are initially in binary mode; if you need them to be
-- in text mode then use 'hSetBinaryMode'.
At least the haskell.org standard library reference does not say this (http://haskell.org/ghc/docs/latest/html/libraries/process/System-Process.htm...), but the information can be found on the System.IO reference page. As a general comment to this feature, I find it quite acceptable, as on non-Windows systems it does no harm, and on Windows, you get the desired behaviour. Seems to reflect closely the practice and distinction in C (http://en.wikipedia.org/wiki/Newline), too. Harri K.

On Thu, 2008-05-08 at 00:12 +0200, Harri Kiiskinen wrote:
Thank You all for the lively discussion, and of course, a nice and simple answer to my problem:
On Wed, 2008-05-07 at 21:17 +0100, Duncan Coutts wrote:
(inh,outh,errh,pid) <- runInteractiveProcess path args Nothing Nothing
-- We want to process the output as text.
hSetBinaryMode outh False
As to the following claim:
As of a couple weeks ago the docs for runInteractiveProcess even say:
-- The 'Handle's are initially in binary mode; if you need them to be
-- in text mode then use 'hSetBinaryMode'.
At least the haskell.org standard library reference does not say this (http://haskell.org/ghc/docs/latest/html/libraries/process/System-Process.htm...)
Sorry, I wasn't clear. I meant that it was added in the development branch a couple weeks ago. http://darcs.haskell.org/libraries/process/System/Process.hs Duncan

David Roundy wrote:
"\r\n" as newline should die a rapid death... windows is hard enough without maintaining this sort of stupidity.
Windows *does* do a number of very silly things. However, Windows isn't going away any time soon. And personally, I'd prefer it if we could make it easier to support it in Haskell. I think a pure function that takes text formatted in any way and transforms it into some kind of "canonical" form would be a useful starting point. You'd probably want a platform-specific inverse function too. (I notice that FilePath manages to work differently on different platforms, so it's possible.) Currently the only way to do these transformations is to set the right channel mode at the instant you read the text. If we had a set of pure functions, you could transform the text after it's read - even if it's read in the wrong file mode, or you don't know whether it's text or binary until later.
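A rough sketch of what such a pair of pure functions might look like, assuming the "canonical" form is text with "\n" line endings. All three names are hypothetical, and the check on `os` assumes GHC on Windows reports "mingw32" in System.Info.

```haskell
import System.Info (os)

-- Canonical form: every "\r\n" becomes "\n"; everything else,
-- including an isolated "\r", is left alone.
toCanonical :: String -> String
toCanonical ('\r':'\n':rest) = '\n' : toCanonical rest
toCanonical (c:rest)         = c : toCanonical rest
toCanonical []               = []

-- Inverse for a CRLF platform; expects text already in canonical form.
toCRLF :: String -> String
toCRLF = concatMap expand
  where
    expand '\n' = "\r\n"
    expand c    = [c]

-- Platform-specific inverse: CRLF on Windows, unchanged elsewhere.
toNative :: String -> String
toNative
  | os == "mingw32" = toCRLF
  | otherwise       = id
```

Because these work on a `String` rather than a handle, they can be applied after the text has been read, whatever mode the handle was in at the time.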

Windows: end of line is \r\n
Unix: end of line is \n
BUT, these days Windows programs have to deal with text files written on Unix, and Unix programs have to deal with text files written on Windows, especially when mounting networked file systems using things like NFS and Samba. Even when working with local files, there isn't any way for a program on either system to tell where a text file originally came from. So programs on BOTH systems really need to deal with BOTH conventions.
We can go further: the Internet convention for end of line is, sadly, and somewhat accidentally, the same as the Windows convention. It's a right pain sometimes having to remember to stuff \r into things on UNIX so that it will go the right way down the wire (according to the strict protocol) to a program on the other end whose designer really wished the \r weren't there, but that's the world we live in.
According to the ASCII standard, it was fully legitimate to use backspace and carriage return to get over-striking (which is why ASCII includes oddities such as ^ and ` : they really are for accents, and , did double duty as cedilla, ' as acute accent, =\b/ really was not-equals (as was /\b=), &c). According to the ISO 8859 standard, that's not kosher any more. So there are (on Windows and Unix) no known uses for isolated \r characters. Accordingly, a text mode that simply throws away every \r it comes across will not just be useful on Windows, it will be useful on Unix as well.
The old DOS Ctrl-Z convention hasn't been recommended practice on Windows for years, so there's not much point bothering with that.
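The proposal to throw away every \r can be sketched as a one-line pure filter. `dropAllCR` is a hypothetical name used only for this illustration.

```haskell
-- Hypothetical helper: discard every carriage return in the input,
-- whether or not it is followed by a line feed.
dropAllCR :: String -> String
dropAllCR = filter (/= '\r')
```

Applied before `lines`, this gives the same result for "\n" and "\r\n" input, for example `lines (dropAllCR "a\r\nb\n")` yields `["a","b"]`. It also removes any \r that is not part of a line ending, so it is only appropriate where isolated carriage returns carry no meaning.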

"Richard A. O'Keefe"
According to the ASCII standard, it was fully legitimate to use backspace and carriage return to get over-striking (which is why ASCII includes oddities such as ^ and ` : they really are for accents, and , did double duty as cedilla, ' as acute accent, =\b/ really was not-equals (as was /\b=), &c). According to the ISO 8859 standard, that's not kosher any more. So there are (on Windows and Unix) no known uses for isolated \r characters.
Say what? I use \r when generating output to a terminal when I want to update the current line of output instead of writing a new line. E.g. for tracking progress in my programs. (As a line terminator followed by \n, it would have no effect though.) -k -- If I haven't seen further, it is by standing in the footprints of giants

participants (11)
- Andrew Coppin
- Bryan O'Sullivan
- Bulat Ziganshin
- David Roundy
- Donn Cave
- Duncan Coutts
- Harri Kiiskinen
- Harri Kiiskinen
- Jason Dusek
- Ketil Malde
- Richard A. O'Keefe