
On Tue, 2008-02-26 at 12:44 +0000, Ross Paterson wrote:
On Tue, Feb 26, 2008 at 11:47:49AM +0000, Duncan Coutts wrote:
The major problem is with code that assumes GHC's Handles are essentially Word8 streams and layers its own UTF8 or other decoding over the top. The utf8-string package has this problem, for example. Such code should be using openBinaryFile because it is reading/writing binary data, not String text.
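A minimal sketch of the binary route described above, assuming the bytestring package is available. The helper names writeBytes and readBytes are hypothetical and only illustrate that a file opened with openBinaryFile yields raw Word8 values, with no text decoding applied:

```haskell
module Main where

import qualified Data.ByteString as B
import Data.Word (Word8)
import System.IO (IOMode (..), hClose, openBinaryFile)

-- Hypothetical helper: write raw bytes through a binary-mode Handle.
writeBytes :: FilePath -> [Word8] -> IO ()
writeBytes path ws = do
  h <- openBinaryFile path WriteMode
  B.hPut h (B.pack ws)
  hClose h

-- Hypothetical helper: read the whole file back as raw bytes.
-- The strict B.hGetContents closes the Handle once it has read everything.
readBytes :: FilePath -> IO [Word8]
readBytes path = do
  h <- openBinaryFile path ReadMode
  B.unpack <$> B.hGetContents h

main :: IO ()
main = do
  -- 0xC3 0xA9 is the UTF-8 encoding of U+00E9; here it is just two bytes.
  writeBytes "example.bin" [0xC3, 0xA9]
  bytes <- readBytes "example.bin"
  print bytes -- prints [195,169]
```

Any UTF8 decoding would then be applied to the bytes explicitly, rather than relying on what a text-mode Handle happens to do with them.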
As I was saying on cabal-devel, I think this distinction ought to be in the types, i.e. we need, in base, a type distinct from Handle that offers a Word8 interface to binary I/O, as a foundation for various experiments with encodings (which need not all be in base).
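One way the type-level distinction could look, as a rough sketch only. BinHandle, openBin, putBytes, getBytes and closeBin are hypothetical names and are not part of base; the sketch wraps an ordinary Handle in a newtype so that the String-based functions cannot be applied to it by accident:

```haskell
module Main where

import qualified Data.ByteString as B
import Data.Word (Word8)
import System.IO (Handle, IOMode (..), hClose, openBinaryFile)

-- Hypothetical type: a Handle that only exposes a Word8 interface.
-- In a real library the constructor would not be exported.
newtype BinHandle = BinHandle Handle

openBin :: FilePath -> IOMode -> IO BinHandle
openBin path mode = BinHandle <$> openBinaryFile path mode

putBytes :: BinHandle -> [Word8] -> IO ()
putBytes (BinHandle h) = B.hPut h . B.pack

-- Reads up to n bytes; fewer are returned at end of file.
getBytes :: BinHandle -> Int -> IO [Word8]
getBytes (BinHandle h) n = B.unpack <$> B.hGet h n

closeBin :: BinHandle -> IO ()
closeBin (BinHandle h) = hClose h

main :: IO ()
main = do
  out <- openBin "example.bin" WriteMode
  putBytes out [1, 2, 3]
  closeBin out
  inp <- openBin "example.bin" ReadMode
  bytes <- getBytes inp 3
  closeBin inp
  print bytes -- prints [1,2,3]
```

With the constructor hidden, calling hGetLine or hPutStr on a BinHandle would be a type error, which is the kind of separation being argued for here.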
I agree. If we can come to a consensus on the interpretation of the H98 text Handles then the next step is to start a discussion on a standard binary IO system (and I'd certainly support using a different type of Handle so we never mix up binary data and [Char]).

The main point of difference so far seems to be whether we pick a fixed utf8 encoding, or the current locale encoding, or some mixture depending on the kind of IO object. I think that's where we should focus the discussion initially.

It'd be nice if there was agreement between the different implementations. It seems we're not far from agreement between at least hugs, ghc and jhc.

Ross, perhaps you can put the argument for what hugs currently does - always using the locale for all terminal and text file IO rather than picking a fixed encoding.

Duncan