RE: Text in Haskell: a second proposal

At 2002-08-13 04:13, Simon Marlow wrote:
> That depends what you mean by efficient: these functions represent an
> extra layer of intermediate list between the handle buffer and the
> final [Char], and furthermore they don't work with partial reads - the
> input has to be a lazy stream gotten from hGetContents.
For ISO-8859-1 each Char is exactly one Word8, so surely it would work
fine with partial reads?

decodeCharISO88591 :: Word8 -> Char;
encodeCharISO88591 :: Char -> Word8;

decodeISO88591 :: [Word8] -> [Char];
decodeISO88591 = fmap decodeCharISO88591;

encodeISO88591 :: [Char] -> [Word8];
encodeISO88591 = fmap encodeCharISO88591;
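The per-character codec is just a numeric conversion in each direction.
As a minimal sketch, assuming the obvious chr/ord route (note that the
encoder below silently truncates Chars above U+00FF, where a real codec
should report an error):

import Data.Char (chr, ord)
import Data.Word (Word8)

-- ISO-8859-1 coincides with the first 256 Unicode code points,
-- so decoding is a pure numeric widening.
decodeCharISO88591 :: Word8 -> Char
decodeCharISO88591 = chr . fromIntegral

-- The narrowing direction: Chars above U+00FF have no ISO-8859-1
-- representation, and this sketch truncates them mod 256.
encodeCharISO88591 :: Char -> Word8
encodeCharISO88591 = fromIntegral . ord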
>> A monadic stream-transformer:
>>
>> decodeStreamUTF8 :: (Monad m) => m Word8 -> m Char;
>>
>> hGetChar h = decodeStreamUTF8 (hGetWord8 h);
>>
>> This works provided each Char corresponds to a contiguous block of
>> Word8s, with no state between them. I think that includes all the
>> standard character encoding schemes.
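Spelled out for UTF-8, such a transformer might look like the
following. This is only a minimal sketch: it assumes well-formed input
(no validation of continuation bytes or overlong forms) and takes the
proposed hGetWord8 primitive as given.

import Data.Bits ((.&.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Decode one Char, pulling as many bytes from the source action as
-- the UTF-8 lead byte demands. No state survives between Chars.
decodeStreamUTF8 :: (Monad m) => m Word8 -> m Char
decodeStreamUTF8 getW8 = do
    b0 <- getW8
    let w0 = fromIntegral b0 :: Int
    if w0 < 0x80
        then return (chr w0)                        -- one byte (ASCII)
        else do
            let (count, bits)
                    | w0 < 0xE0 = (1, w0 .&. 0x1F)  -- two-byte sequence
                    | w0 < 0xF0 = (2, w0 .&. 0x0F)  -- three-byte sequence
                    | otherwise = (3, w0 .&. 0x07)  -- four-byte sequence
            rest <- sequence (replicate count getW8)
            let addByte acc b = (acc `shiftL` 6) + (fromIntegral b .&. 0x3F)
            return (chr (foldl addByte bits rest))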
> This is better: it doesn't force you to use lazy I/O, and when
> specialised to the IO monad it might get decent performance. The
> problem is that in general I don't think you can assume the lack of
> state. For example: UTF-7 has a state which needs to be retained
> between characters, and UTF-16 and UTF-32 have an endianness state
> which can be changed by a special sequence at the beginning of the
> file. Some other encodings have states too.
But it is possible to do this in Haskell... The rule for the many
functions in the standard libraries seems to be "implement as much in
Haskell as possible". Why is it any different with the file APIs?
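Stateful encodings fit the same shape once the transformer threads the
decoder state explicitly. A rough sketch (the Decoder type and every
name in it are invented here for illustration, not part of the
proposal):

import Data.Word (Word8)

-- One byte goes in; either the decoder needs more input, or a Char
-- comes out. The state s survives across Chars, which is exactly what
-- UTF-7 and BOM-switched endianness require.
data DecodeResult s = NeedMore s | Produced s Char

newtype Decoder s = Decoder { step :: s -> Word8 -> DecodeResult s }

-- Pull bytes until a Char is produced, returning it together with the
-- state to carry into the next call.
decodeStream :: (Monad m) => Decoder s -> s -> m Word8 -> m (Char, s)
decodeStream d s getW8 = do
    b <- getW8
    case step d s b of
        NeedMore s'   -> decodeStream d s' getW8
        Produced s' c -> return (c, s')

A stateless scheme such as UTF-8 is then just the case where the state
is ().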
--
Ashley Yakeley, Seattle WA