RE: Text in Haskell: a second proposal

[ moving to haskell-i18n@haskell.org ]
For ISO-8859-1 each Char is exactly one Word8, so surely it would work fine with partial reads?
decodeCharISO88591 :: Word8 -> Char;
encodeCharISO88591 :: Char -> Word8;
decodeISO88591 :: [Word8] -> [Char];
decodeISO88591 = fmap decodeCharISO88591;

encodeISO88591 :: [Char] -> [Word8];
encodeISO88591 = fmap encodeCharISO88591;
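Since ISO-8859-1 code points coincide with the first 256 Unicode code points, the per-character functions are trivial. A minimal sketch using chr and ord from Data.Char (the error call for characters outside Latin-1 is just one possible choice of error handling):

import Data.Char (chr, ord)
import Data.Word (Word8)

-- Each Latin-1 byte value is the corresponding Unicode code point.
decodeCharISO88591 :: Word8 -> Char
decodeCharISO88591 = chr . fromIntegral

-- Only the first 256 code points can be encoded; anything else fails.
encodeCharISO88591 :: Char -> Word8
encodeCharISO88591 c
  | ord c < 0x100 = fromIntegral (ord c)
  | otherwise     = error "encodeCharISO88591: character outside Latin-1"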
Sorry, I thought you were just using ISO-8859-1 as an example.
This is better: it doesn't force you to use lazy I/O, and when specialised to the IO monad it might get decent performance. The problem is that in general I don't think you can assume the absence of state. For example: UTF-7 has shift state which must be carried between characters, and UTF-16 and UTF-32 have an endianness state which can be set by a byte-order mark at the beginning of the file. Some other encodings are stateful too.
But it is possible to do this in Haskell...
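For instance, a decoder could be modelled as an explicit state machine; the Decoder type and runDecoder below are only an illustrative sketch, not part of the proposal:

import Data.Word (Word8)

-- A decoder threads its state from byte to byte and may emit zero or
-- more Chars per input byte (enough for UTF-7 shifts or an initial BOM).
data Decoder s = Decoder
  { initState  :: s
  , decodeStep :: s -> Word8 -> (s, [Char])
  }

-- Run a decoder over a byte list, carrying the state along.
runDecoder :: Decoder s -> [Word8] -> [Char]
runDecoder (Decoder s0 step) = go s0
  where
    go _ []     = []
    go s (b:bs) = let (s', cs) = step s b in cs ++ go s' bs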
The rule for most functions in the standard libraries seems to be "implement as much in Haskell as possible". Why should it be any different for the file APIs?
I think we've lost track of the discussion here, so I'll try to summarise.

I think character encoding/decoding should be built in to the I/O system. I also think there should be a low-level I/O interface that doesn't do any encoding, and high-level interfaces for the various encodings. You can by all means specify the high-level I/O in terms of the low-level I/O plus encodings, but I strongly suspect that implementing it that way will be expensive. Character I/O in Haskell is *already* very slow (see Doug Bagley's language shootout for evidence), and I don't want to add another factor of 2 or more to that. The point is that by building encoding into the I/O interface, the implementor gets the opportunity to optimise.
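A per-handle encoding interface along these lines might look like the following sketch (hSetEncoding, latin1 and utf8 are illustrative names here, not something the current libraries provide, and the file name is made up too):

import System.IO

-- Sketch: the encoding is a property of the handle, so the I/O layer
-- can decode whole buffers at once instead of character by character.
main :: IO ()
main = do
  h <- openFile "input.txt" ReadMode   -- illustrative file name
  hSetEncoding h latin1                -- or utf8, or a stateful encoding
  s <- hGetContents h                  -- s :: String, already decoded
  putStr s

Cheers,
Simon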