RE: Text in Haskell: a second proposal

At 2002-08-13 04:13, Simon Marlow wrote:
> That depends what you mean by efficient: these functions represent an
> extra layer of intermediate list between the handle buffer and the
> final [Char], and furthermore they don't work with partial reads - the
> input has to be a lazy stream gotten from hGetContents.
For ISO-8859-1 each Char is exactly one Word8, so surely it would work
fine with partial reads?

decodeCharISO88591 :: Word8 -> Char;
encodeCharISO88591 :: Char -> Word8;

decodeISO88591 :: [Word8] -> [Char];
decodeISO88591 = fmap decodeCharISO88591;

encodeISO88591 :: [Char] -> [Word8];
encodeISO88591 = fmap encodeCharISO88591;
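The per-character codec is just a numeric conversion in each direction.
As a minimal sketch, assuming the obvious chr/ord route (note that the
encoder below silently truncates Chars above U+00FF, where a real codec
should report an error):

import Data.Char (chr, ord)
import Data.Word (Word8)

-- ISO-8859-1 coincides with the first 256 Unicode code points,
-- so decoding is a pure numeric widening.
decodeCharISO88591 :: Word8 -> Char
decodeCharISO88591 = chr . fromIntegral

-- The narrowing direction: Chars above U+00FF have no ISO-8859-1
-- representation, and this sketch truncates them mod 256.
encodeCharISO88591 :: Char -> Word8
encodeCharISO88591 = fromIntegral . ord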
>> A monadic stream-transformer:
>>
>> decodeStreamUTF8 :: (Monad m) => m Word8 -> m Char;
>>
>> hGetChar h = decodeStreamUTF8 (hGetWord8 h);
>>
>> This works provided each Char corresponds to a contiguous block of
>> Word8s, with no state between them. I think that includes all the
>> standard character encoding schemes.
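Spelled out for UTF-8, such a transformer might look like the
following. This is only a minimal sketch: it assumes well-formed input
(no validation of continuation bytes or overlong forms) and takes the
proposed hGetWord8 primitive as given.

import Data.Bits ((.&.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Decode one Char, pulling as many bytes from the source action as
-- the UTF-8 lead byte demands. No state survives between Chars.
decodeStreamUTF8 :: (Monad m) => m Word8 -> m Char
decodeStreamUTF8 getW8 = do
    b0 <- getW8
    let w0 = fromIntegral b0 :: Int
    if w0 < 0x80
        then return (chr w0)                        -- one byte (ASCII)
        else do
            let (count, bits)
                    | w0 < 0xE0 = (1, w0 .&. 0x1F)  -- two-byte sequence
                    | w0 < 0xF0 = (2, w0 .&. 0x0F)  -- three-byte sequence
                    | otherwise = (3, w0 .&. 0x07)  -- four-byte sequence
            rest <- sequence (replicate count getW8)
            let addByte acc b = (acc `shiftL` 6) + (fromIntegral b .&. 0x3F)
            return (chr (foldl addByte bits rest))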
> This is better: it doesn't force you to use lazy I/O, and when
> specialised to the IO monad it might get decent performance. The
> problem is that in general I don't think you can assume the lack of
> state. For example: UTF-7 has a state which needs to be retained
> between characters, and UTF-16 and UTF-32 have an endianness state
> which can be changed by a special sequence at the beginning of the
> file. Some other encodings have states too.
But it is possible to do this in Haskell... The rule for the many
functions in the standard libraries seems to be "implement as much in
Haskell as possible". Why is it any different with the file APIs?
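Stateful encodings fit the same shape once the transformer threads the
decoder state explicitly. A rough sketch (the Decoder type and every
name in it are invented here for illustration, not part of the
proposal):

import Data.Word (Word8)

-- One byte goes in; either the decoder needs more input, or a Char
-- comes out. The state s survives across Chars, which is exactly what
-- UTF-7 and BOM-switched endianness require.
data DecodeResult s = NeedMore s | Produced s Char

newtype Decoder s = Decoder { step :: s -> Word8 -> DecodeResult s }

-- Pull bytes until a Char is produced, returning it together with the
-- state to carry into the next call.
decodeStream :: (Monad m) => Decoder s -> s -> m Word8 -> m (Char, s)
decodeStream d s getW8 = do
    b <- getW8
    case step d s b of
        NeedMore s'   -> decodeStream d s' getW8
        Produced s' c -> return (c, s')

A stateless scheme such as UTF-8 is then just the case where the state
is ().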
--
Ashley Yakeley, Seattle WA