
Hello,

on http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/System-IO... it says:
An attempt to write a character greater than '\255' to a Handle using the latin1 encoding will result in an error.
However, according to my experience (with GHC 6.12.1 and base 4.2.0.0), latin1 just outputs a character '\cccc' as cccc `mod` 256. Should the docs be changed?

Best wishes,
Wolfgang

On 16/12/10 20:53, Wolfgang Jeltsch wrote:
Hello,
on
http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/System-IO...
it says:
An attempt to write a character greater than '\255' to a Handle using the latin1 encoding will result in an error.
However, according to my experience (with GHC 6.12.1 and base 4.2.0.0), latin1 just outputs a character '\cccc' as cccc `mod` 256.
It seems to be working as advertised for me:

    $ ghc-6.12.1 --interactive
    GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Loading package ffi-1.0 ... linking ... done.
    Prelude> import System.IO
    Prelude System.IO> hSetEncoding stdout latin1
    Prelude System.IO> putChar '\256'
    *** Exception: <stdout>: hPutChar: invalid argument (character is out of range for this encoding)
    Prelude System.IO>

I get the same results with 7.0.1. Can you tell me how to reproduce the problem you're seeing?

Cheers,
Simon

On Thursday, 23 December 2010 at 10:09 +0000, Simon Marlow wrote:
On 16/12/10 20:53, Wolfgang Jeltsch wrote:
Hello,
on
http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/System-IO...
it says:
An attempt to write a character greater than '\255' to a Handle using the latin1 encoding will result in an error.
However, according to my experience (with GHC 6.12.1 and base 4.2.0.0), latin1 just outputs a character '\cccc' as cccc `mod` 256.
It seems to be working as advertised for me:
    $ ghc-6.12.1 --interactive
    GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Loading package ffi-1.0 ... linking ... done.
    Prelude> import System.IO
    Prelude System.IO> hSetEncoding stdout latin1
    Prelude System.IO> putChar '\256'
    *** Exception: <stdout>: hPutChar: invalid argument (character is out of range for this encoding)
    Prelude System.IO>
I get the same results with 7.0.1. Can you tell me how to reproduce the problem you're seeing?
I used hSetBinaryMode stdout True instead of hSetEncoding stdout latin1. The documentation of hSetBinaryMode says:

    This has the same effect as calling hSetEncoding with latin1, together with hSetNewlineMode with noNewlineTranslation.

It seems that this sentence is wrong.

Best wishes,
Wolfgang
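A minimal sketch reproducing the behaviour Wolfgang describes, assuming GHC 6.12's System.IO, where binary mode writes each character's low 8 bits rather than raising an error:

    import System.IO

    main :: IO ()
    main = do
      hSetBinaryMode stdout True
      putChar '\256'   -- in binary mode this writes the byte 0 (256 `mod` 256), no exception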

On Dec 25, 2010, at 7:34 AM, Wolfgang Jeltsch wrote:
The documentation of hSetBinaryMode says:
This has the same effect as calling hSetEncoding with latin1, together with hSetNewlineMode with noNewlineTranslation.
It seems that this sentence is wrong.
It seems wrong to me in intent. When a handle is in "binary" mode, it shouldn't have any encoding. If things were different, I'd want to propose that doing String I/O to such handles should fail, and that you should only be able to use ByteString with them. But I suppose that isn't viable...

Of course, the first 256 Unicode code points do encode in ISO-8859-1 (latin1) to the numerically equivalent bytes. However, I'd strongly support changing the documentation to not reference "latin1". There is great confusion about the latin1 character encoding on the Internet, to the degree that HTML5 will mandate that it be "misinterpreted for compatibility" as Windows-1252 [1]. I'm glad we don't make such mistakes, and that's why I don't think we should be "repurposing" latin1 for binary use.

Perhaps the right way is that there should be an encoding called "binary" (or "octet"?). Setting hSetBinaryMode to True would set hSetEncoding to binary, and vice versa. Then this encoding could be defined to have the interesting behavior observed: writing the code-point value mod 256.

- Mark

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#cha...

Mark Lentczner
http://www.ozonehouse.com/mark/
IRC: mtnviewmark
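A pure model of the encoding Mark sketches; the names encodeBinary and decodeBinary are hypothetical, chosen for illustration only:

    import Data.Char (chr, ord)
    import Data.Word (Word8)

    -- Encoding keeps the code point modulo 256; decoding maps a byte to
    -- the numerically equivalent code point. Hence decode . encode is
    -- the identity only on '\0'..'\255'.
    encodeBinary :: Char -> Word8
    encodeBinary c = fromIntegral (ord c `mod` 256)

    decodeBinary :: Word8 -> Char
    decodeBinary = chr . fromIntegral

    main :: IO ()
    main = print (map encodeBinary "\256A")  -- [0,65]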

On Mon, Dec 27, 2010 at 09:04:41AM -0800, Mark Lentczner wrote:
On Dec 25, 2010, at 7:34 AM, Wolfgang Jeltsch wrote:
The documentation of hSetBinaryMode says:
This has the same effect as calling hSetEncoding with latin1, together with hSetNewlineMode with noNewlineTranslation.
It seems that this sentence is wrong.
It seems wrong to me in intent. When a handle is in "binary" mode, it shouldn't have any encoding. If things were different, I'd want to propose that doing String I/O to such handles should fail, and that you should only be able to use ByteString with them. But I suppose that isn't viable...
That sounds like a very good idea. Even better, flag this error at compile time by having a different type for unencoded handles.

On 27/12/2010 17:51, Ross Paterson wrote:
On Mon, Dec 27, 2010 at 09:04:41AM -0800, Mark Lentczner wrote:
On Dec 25, 2010, at 7:34 AM, Wolfgang Jeltsch wrote:
The documentation of hSetBinaryMode says:
This has the same effect as calling hSetEncoding with latin1, together with hSetNewlineMode with noNewlineTranslation.
It seems that this sentence is wrong.
It seems wrong to me in intent. When a handle is in "binary" mode, it shouldn't have any encoding. If things were different, I'd want to propose that doing String I/O to such handles should fail, and that you should only be able to use ByteString with them. But I suppose that isn't viable...
That sounds like a very good idea. Even better, flag this error at compile time by having a different type for unencoded handles.
Good plan. I'll make a proposal to add System.IO.binary.

A different type for binary handles is the right thing, but it's a larger undertaking, so I don't plan to attack it right now (someone else is welcome to do so).

Cheers,
Simon

Simon Marlow wrote:
On 27/12/2010 17:51, Ross Paterson wrote:
On Mon, Dec 27, 2010 at 09:04:41AM -0800, Mark Lentczner wrote:
On Dec 25, 2010, at 7:34 AM, Wolfgang Jeltsch wrote:
The documentation of hSetBinaryMode says:
This has the same effect as calling hSetEncoding with latin1, together with hSetNewlineMode with noNewlineTranslation.
It seems that this sentence is wrong.
It seems wrong to me in intent. When a handle is in "binary" mode, it shouldn't have any encoding. If things were different, I'd want to propose that doing String I/O to such handles should fail, and that you should only be able to use ByteString with them. But I suppose that isn't viable...
That sounds like a very good idea. Even better, flag this error at compile time by having a different type for unencoded handles.
Good plan. I'll make a proposal to add System.IO.binary. A different type for binary handles is the right thing, but it's a larger undertaking so I don't plan to attack it right now (someone else is welcome to do so).
Isn't this also the purpose of the safer-file-handles package? http://hackage.haskell.org/package/safer-file-handles

On Tue, Jan 04, 2011 at 04:03:34PM +0100, Henning Thielemann wrote:
Simon Marlow wrote:
Good plan. I'll make a proposal to add System.IO.binary. A different type for binary handles is the right thing, but it's a larger undertaking so I don't plan to attack it right now (someone else is welcome to do so).
Isn't this also the purpose of the safer-file-handles package?
No, that puts the IOMode and region in the type, not the binary mode.

On Tue, Jan 4, 2011 at 4:27 PM, Ross Paterson wrote:
On Tue, Jan 04, 2011 at 04:03:34PM +0100, Henning Thielemann wrote:
Simon Marlow wrote:
Good plan. I'll make a proposal to add System.IO.binary. A different type for binary handles is the right thing, but it's a larger undertaking so I don't plan to attack it right now (someone else is welcome to do so).
Isn't this also the purpose of the safer-file-handles package?
No, that puts the IOMode and region in the type, not the binary mode.
Indeed. However, I would be glad to add such a safety feature if we can distill a nice API. What would such an API look like?

One approach might be something like this (not type-checked, not even parsed...):

    -- Types
    newtype TextHandle = ...
    newtype ByteHandle = ...

    -- Opening
    openFile       :: FilePath -> IOMode -> IO TextHandle
    openBinaryFile :: FilePath -> IOMode -> IO ByteHandle

    -- Text input/output
    hGetLine  :: TextHandle -> IO String
    hPutStrLn :: TextHandle -> String -> IO ()
    ...

    -- Binary input/output
    hGetBuf :: ByteHandle -> Ptr a -> Int -> IO Int
    hPutBuf :: ByteHandle -> Ptr a -> Int -> IO ()
    ...

    -- General operations on handles (text/binary)
    class Handle h where
      hClose        :: h -> IO ()
      hIsEOF        :: h -> IO Bool
      hSetBuffering :: h -> BufferMode -> IO ()
      ...

    instance Handle TextHandle where ...
    instance Handle ByteHandle where ...

The disadvantage of this approach is that all general operations on handles (those that should work for both Text- and ByteHandles) need to be put in a class. A way to solve this is to have a single Handle type which is parameterized by a phantom type that represents the Text/Byte mode (I also use this style to encode the IOMode of Handles in safer-file-handles):

    {-# LANGUAGE EmptyDataDecls #-}

    newtype Handle mode = ...

    data Text
    data Byte

    type TextHandle = Handle Text
    type ByteHandle = Handle Byte

The types of the open functions and the types of the text I/O and byte I/O functions remain the same. The types of the general operations will change to:

    hClose        :: Handle mode -> IO ()
    hIsEOF        :: Handle mode -> IO Bool
    hSetBuffering :: Handle mode -> BufferMode -> IO ()
    ...

Note that these are now polymorphic in the text/byte mode.

In both these approaches the following questions remain:

* What about the standard handles? What are their types? Maybe we need stdinText :: TextHandle, stdinBytes :: ByteHandle, etc. But this will make it more difficult to use them at the same time.

* What about hSetBinaryMode :: Handle -> Bool -> IO ()? Should we remove it from such an API?

What are your thoughts on this?

Regards,
Bas
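A self-contained, compiling version of the phantom-type sketch above, wrapping the existing System.IO; the module name and all function names are illustrative only, not a real API:

    {-# LANGUAGE EmptyDataDecls #-}
    module TypedHandles where

    import qualified System.IO as IO

    data Text
    data Byte

    -- One handle type, tagged with a phantom Text/Byte mode.
    newtype Handle mode = Handle IO.Handle

    openFile :: FilePath -> IO.IOMode -> IO (Handle Text)
    openFile p m = fmap Handle (IO.openFile p m)

    openBinaryFile :: FilePath -> IO.IOMode -> IO (Handle Byte)
    openBinaryFile p m = fmap Handle (IO.openBinaryFile p m)

    -- A text-only operation: using it on a Handle Byte is a type error.
    hGetLine :: Handle Text -> IO String
    hGetLine (Handle h) = IO.hGetLine h

    -- General operations are polymorphic in the mode.
    hClose :: Handle mode -> IO ()
    hClose (Handle h) = IO.hClose h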

On Tue, Jan 04, 2011 at 07:57:46PM +0100, Bas van Dijk wrote:
On Tue, Jan 4, 2011 at 4:27 PM, Ross Paterson wrote:
On Tue, Jan 04, 2011 at 04:03:34PM +0100, Henning Thielemann wrote:
Simon Marlow wrote:
Good plan. I'll make a proposal to add System.IO.binary. A different type for binary handles is the right thing, but it's a larger undertaking so I don't plan to attack it right now (someone else is welcome to do so).
Isn't this also the purpose of the safer-file-handles package?
No, that puts the IOMode and region in the type, not the binary mode.
Indeed. However, I would be glad to add such a safety feature if we can distill a nice API.
What would such an API look like?
My simplistic thought was to make System.IO.Binary essentially a copy of System.IO minus the text-only operations, with its own Handle type and hClose, etc. Apart from the name clashes, the issues I could see were:

- instead of hSetBinaryMode and hSetEncoding, you'd have operations to convert between the Handle types, with the extra requirement that the argument Handle not be used later.

- you'd probably need both binary and text versions of stdin, stdout and stderr, with a requirement that you only use one of them.
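Continuing the hypothetical TypedHandles wrapper sketched above, the conversion operations Ross describes might look as follows; note that nothing in these types prevents the argument handle from being used after the call:

    -- Hypothetical replacements for hSetBinaryMode/hSetEncoding: convert
    -- a handle between modes. The argument must not be used afterwards,
    -- a discipline these types do not enforce.
    toBinary :: Handle Text -> IO (Handle Byte)
    toBinary (Handle h) = do
      IO.hSetBinaryMode h True
      return (Handle h)

    toText :: IO.TextEncoding -> Handle Byte -> IO (Handle Text)
    toText enc (Handle h) = do
      IO.hSetBinaryMode h False
      IO.hSetEncoding h enc
      return (Handle h)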

On 04/01/2011 14:50, Simon Marlow wrote:
On 27/12/2010 17:51, Ross Paterson wrote:
On Mon, Dec 27, 2010 at 09:04:41AM -0800, Mark Lentczner wrote:
On Dec 25, 2010, at 7:34 AM, Wolfgang Jeltsch wrote:
The documentation of hSetBinaryMode says:
This has the same effect as calling hSetEncoding with latin1, together with hSetNewlineMode with noNewlineTranslation.
It seems that this sentence is wrong.
It seems wrong to me in intent. When a handle is in "binary" mode, it shouldn't have any encoding. If things were different, I'd want to propose that doing String I/O to such handles should fail, and that you should only be able to use ByteString with them. But I suppose that isn't viable...
That sounds like a very good idea. Even better, flag this error at compile time by having a different type for unencoded handles.
Good plan. I'll make a proposal to add System.IO.binary. A different type for binary handles is the right thing, but it's a larger undertaking so I don't plan to attack it right now (someone else is welcome to do so).
As per the above discussion, I formally propose to add the following to System.IO:

    -- | An encoding in which Unicode code points are translated to bytes
    -- by taking the code point modulo 256. When decoding, bytes are
    -- translated directly into the equivalent code point.
    --
    -- This encoding never fails in either direction. However, encoding
    -- discards informaiton, so encode followed by decode is not the
    -- identity.
    binary :: TextEncoding

Any objections?

Cheers,
Simon
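Assuming the proposed binary :: TextEncoding were added, usage would presumably look like this; hypothetical, since the value does not exist in System.IO at this point:

    import System.IO

    main :: IO ()
    main = do
      hSetEncoding stdout binary  -- hypothetical: 'binary' is the proposed encoding
      putChar '\256'              -- would write the single byte 0 instead of raising an error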

On Wednesday 30 March 2011 11:19:55, Simon Marlow wrote:
As per the above discussion, I formally propose to add the following to System.IO:
    -- | An encoding in which Unicode code points are translated to bytes
    -- by taking the code point modulo 256. When decoding, bytes are
    -- translated directly into the equivalent code point.
    --
    -- This encoding never fails in either direction. However, encoding
    -- discards informaiton, so encode followed by decode is not the
    -- identity.
    binary :: TextEncoding
Any objections?
I object to the typo "informaiton" instead of "information"; otherwise it's clear and unambiguous, +1

2011/3/30 Simon Marlow:
On 04/01/2011 14:50, Simon Marlow wrote:
On 27/12/2010 17:51, Ross Paterson wrote:
On Mon, Dec 27, 2010 at 09:04:41AM -0800, Mark Lentczner wrote:
On Dec 25, 2010, at 7:34 AM, Wolfgang Jeltsch wrote:
The documentation of hSetBinaryMode says:
This has the same effect as calling hSetEncoding with latin1, together with hSetNewlineMode with noNewlineTranslation.
It seems that this sentence is wrong.
It seems wrong to me in intent. When a handle is in "binary" mode, it shouldn't have any encoding. If things were different, I'd want to propose that doing String I/O to such handles should fail, and that you should only be able to use ByteString with them. But I suppose that isn't viable...
That sounds like a very good idea. Even better, flag this error at compile time by having a different type for unencoded handles.
Good plan. I'll make a proposal to add System.IO.binary. A different type for binary handles is the right thing, but it's a larger undertaking so I don't plan to attack it right now (someone else is welcome to do so).
As per the above discussion, I formally propose to add the following to System.IO:
    -- | An encoding in which Unicode code points are translated to bytes
    -- by taking the code point modulo 256. When decoding, bytes are
    -- translated directly into the equivalent code point.
    --
    -- This encoding never fails in either direction. However, encoding
    -- discards informaiton, so encode followed by decode is not the
    -- identity.
    binary :: TextEncoding
Any objections?
In the 'bytestring' library, String handling based on this encoding is provided by the Data.ByteString.Char8 module and its lazy cousin. Therefore, I'd suggest naming this non-standard (or is there a standard?) encoding

    char8 :: TextEncoding

Apart from historical reasons, I also prefer this name as it conveys more information about its semantics.

best regards,
Simon
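The Data.ByteString.Char8 behaviour referred to here is easy to observe: pack keeps only the low 8 bits of each Char, matching the semantics of the proposed encoding. A small demonstration, assuming the bytestring package is installed:

    import qualified Data.ByteString.Char8 as C8

    main :: IO ()
    main = print (C8.unpack (C8.pack "\256A"))
      -- prints "\NULA": '\256' was truncated to the byte 0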

On 30/03/2011 10:58, Simon Meier wrote:
In the 'bytestring' library, String handling based on this encoding is provided by the Data.ByteString.Char8 module and its lazy cousin. Therefore, I'd suggest to name this non-standard (or is there a standard?) encoding
char8 :: TextEncoding
Apart from historical reasons, I also prefer this name as it conveys more information about its semantics.
There's a clash here, as we already use the term "binary" elsewhere in the System.IO API: openBinaryFile, hSetBinaryMode. So it's not clear to me that being consistent with Data.ByteString is better than being consistent with System.IO. On the other hand, "char8" is a better name than "binary". I don't feel strongly either way here.

Cheers,
Simon

2011/3/30 Simon Marlow:
On 30/03/2011 10:58, Simon Meier wrote:
In the 'bytestring' library, String handling based on this encoding is provided by the Data.ByteString.Char8 module and its lazy cousin. Therefore, I'd suggest naming this non-standard (or is there a standard?) encoding
char8 :: TextEncoding
Apart from historical reasons, I also prefer this name as it conveys more information about its semantics.
There's a clash here, as we already use the term "binary" elsewhere in the System.IO API: openBinaryFile, hSetBinaryMode. So it's not clear to me that being consistent with Data.ByteString is better than being consistent with System.IO. On the other hand, "char8" is a better name than "binary". I don't feel strongly either way here.
Ah, I didn't see that. However, if I recall correctly, the long-term plan is to introduce a separate type for binary files. I would expect that the caller has to specify an encoding explicitly when writing a String to such a binary file. However, there still might be uses of text file handles with a 'char8' encoding. Hence, in the long term, we can actually avoid confusion by not choosing 'binary' as the name of this encoding.

best regards,
Simon
participants (8): Bas van Dijk, Daniel Fischer, Henning Thielemann, Mark Lentczner, Ross Paterson, Simon Marlow, Simon Meier, Wolfgang Jeltsch