RE: Text in Haskell: a second proposal

> At 2002-08-09 03:26, Simon Marlow wrote:
>
> > Why combine I/O and {en,de}coding? Firstly, efficiency.
>
> Hmm... surely the encoding functions can be defined efficiently?
>
>   decodeISO88591 :: [Word8] -> [Char];
>   encodeISO88591 :: [Char] -> [Word8]; -- uses low octet of codepoint
>
> You could surely define them as native functions very efficiently, if necessary.
That depends what you mean by efficient: these functions introduce an extra layer of intermediate list between the handle buffer and the final [Char], and furthermore they don't work with partial reads - the input has to be a lazy stream obtained from hGetContents. I don't want to be forced to use lazy I/O.
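Spelt out, those Latin-1 codecs are just maps over lists; a minimal sketch, assuming encoding simply truncates code points above 255:

    import Data.Char (chr, ord)
    import Data.Word (Word8)

    -- Latin-1 octets coincide with the first 256 code points, so decoding
    -- is a plain map; encoding keeps only the low octet of each code point.
    decodeISO88591 :: [Word8] -> [Char]
    decodeISO88591 = map (chr . fromIntegral)

    encodeISO88591 :: [Char] -> [Word8]
    encodeISO88591 = map (fromIntegral . ord)

The [Word8] list here is itself the extra intermediate layer sitting between the handle buffer and the final [Char].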
> A monadic stream-transformer:
>
>   decodeStreamUTF8 :: (Monad m) => m Word8 -> m Char;
>   hGetChar h = decodeStreamUTF8 (hGetWord8 h);
>
> This works provided each Char corresponds to a contiguous block of Word8s, with no state between them. I think that includes all the standard character encoding schemes.
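For concreteness, such a stateless decoder might look roughly like this; a minimal sketch that assumes well-formed input and does no error checking:

    import Data.Bits (shiftL, (.&.), (.|.))
    import Data.Char (chr)
    import Data.Word (Word8)

    -- Decode one code point per call from any monadic source of octets.
    decodeStreamUTF8 :: Monad m => m Word8 -> m Char
    decodeStreamUTF8 next = octet >>= dispatch
      where
        octet = next >>= return . fromIntegral
        dispatch b
          | b < 0x80  = return (chr b)        -- one octet (ASCII)
          | b < 0xE0  = go (b .&. 0x1F) 1     -- two-octet sequence
          | b < 0xF0  = go (b .&. 0x0F) 2     -- three-octet sequence
          | otherwise = go (b .&. 0x07) 3     -- four-octet sequence
        -- fold n continuation octets (10xxxxxx) into the code point
        go acc 0 = return (chr acc)
        go acc n = do
            b <- octet
            go ((acc `shiftL` 6) .|. (b .&. 0x3F)) (n - 1 :: Int)

Specialised to IO and fed by the proposed hGetWord8 it consumes exactly the octets of one character per call, with no lazy stream in between.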
This is better: it doesn't force you to use lazy I/O, and when specialised to the IO monad it might get decent performance. The problem is that in general I don't think you can assume the lack of state. For example, UTF-7 has a shift state which needs to be retained between characters, and UTF-16 and UTF-32 have an endianness state which is set by a byte order mark at the beginning of the file. Some other encodings have states too.

Cheers,
    Simon
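One hypothetical way to accommodate such encodings is to thread the state through the decoder explicitly; in this illustrative UTF-16 sketch the names, the big-endian default, and the omission of surrogate pairs are all simplifications:

    import Data.Char (chr)
    import Data.Word (Word8)

    data Endianness = BigEndian | LittleEndian

    -- Decode one UTF-16 code unit, carrying the byte-order state along:
    -- a byte order mark at the start of the stream fixes the endianness
    -- for every later character.  Surrogate pairs and error handling
    -- are omitted.
    decodeStreamUTF16 :: Monad m => Maybe Endianness -> m Word8 -> m (Char, Maybe Endianness)
    decodeStreamUTF16 mend next = do
        a <- octet
        b <- octet
        case mend of
          Nothing
            | a == 0xFE && b == 0xFF -> decodeStreamUTF16 (Just BigEndian) next
            | a == 0xFF && b == 0xFE -> decodeStreamUTF16 (Just LittleEndian) next
            | otherwise              -> emit BigEndian a b   -- no BOM: assume big-endian
          Just e -> emit e a b
      where
        octet = next >>= return . fromIntegral
        emit BigEndian    hi lo = return (chr (hi * 256 + lo), Just BigEndian)
        emit LittleEndian hi lo = return (chr (lo * 256 + hi), Just LittleEndian)

The extra state argument and result are exactly what the plain m Word8 -> m Char type has no room for.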