RE: Text in Haskell: a second proposal

[ moving to haskell-i18n@haskell.org ]
For ISO-8859-1 each Char is exactly one Word8, so surely it would work fine with partial reads?
decodeCharISO88591 :: Word8 -> Char;
encodeCharISO88591 :: Char -> Word8;
decodeISO88591 :: [Word8] -> [Char];
decodeISO88591 = fmap decodeCharISO88591;

encodeISO88591 :: [Char] -> [Word8];
encodeISO88591 = fmap encodeCharISO88591;
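Since ISO-8859-1 code points coincide with the first 256 Unicode code points, the per-character functions are trivial. A minimal sketch using chr and ord from Data.Char (the error call for characters outside Latin-1 is just one possible choice of error handling):

import Data.Char (chr, ord)
import Data.Word (Word8)

-- Each Latin-1 byte value is the corresponding Unicode code point.
decodeCharISO88591 :: Word8 -> Char
decodeCharISO88591 = chr . fromIntegral

-- Only the first 256 code points can be encoded; anything else fails.
encodeCharISO88591 :: Char -> Word8
encodeCharISO88591 c
  | ord c < 0x100 = fromIntegral (ord c)
  | otherwise     = error "encodeCharISO88591: character outside Latin-1"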
Sorry, I thought you were just using ISO-8859-1 as an example.
This is better: it doesn't force you to use lazy I/O, and when specialised to the IO monad it might get decent performance. The problem is that in general I don't think you can assume the absence of state. For example: UTF-7 has shift state which must be carried between characters, and UTF-16 and UTF-32 have an endianness state which can be set by a byte-order mark at the beginning of the file. Some other encodings are stateful too.
But it is possible to do this in Haskell...
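For instance, a decoder could be modelled as an explicit state machine; the Decoder type and runDecoder below are only an illustrative sketch, not part of the proposal:

import Data.Word (Word8)

-- A decoder threads its state from byte to byte and may emit zero or
-- more Chars per input byte (enough for UTF-7 shifts or an initial BOM).
data Decoder s = Decoder
  { initState  :: s
  , decodeStep :: s -> Word8 -> (s, [Char])
  }

-- Run a decoder over a byte list, carrying the state along.
runDecoder :: Decoder s -> [Word8] -> [Char]
runDecoder (Decoder s0 step) = go s0
  where
    go _ []     = []
    go s (b:bs) = let (s', cs) = step s b in cs ++ go s' bs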
The rule for most functions in the standard libraries seems to be "implement as much in Haskell as possible". Why should it be any different for the file APIs?
I think we've lost track of the discussion here, so I'll try to summarise.

I think character encoding/decoding should be built in to the I/O system. I also think there should be a low-level I/O interface that doesn't do any encoding, and high-level interfaces for the various encodings. You can by all means specify the high-level I/O in terms of the low-level I/O plus encodings, but I strongly suspect that implementing it that way will be expensive. Character I/O in Haskell is *already* very slow (see Doug Bagley's language shootout for evidence), and I don't want to add another factor of 2 or more to that. The point is that by building encoding into the I/O interface, the implementor gets the opportunity to optimise.
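A per-handle encoding interface along these lines might look like the following sketch (hSetEncoding, latin1 and utf8 are illustrative names here, not something the current libraries provide, and the file name is made up too):

import System.IO

-- Sketch: the encoding is a property of the handle, so the I/O layer
-- can decode whole buffers at once instead of character by character.
main :: IO ()
main = do
  h <- openFile "input.txt" ReadMode   -- illustrative file name
  hSetEncoding h latin1                -- or utf8, or a stateful encoding
  s <- hGetContents h                  -- s :: String, already decoded
  putStr s

Cheers,
Simon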