RE: Text in Haskell: a second proposal

> At 2002-08-09 03:26, Simon Marlow wrote:
>
> > Why combine I/O and {en,de}coding? Firstly, efficiency.
>
> Hmm... surely the encoding functions can be defined efficiently?
>
>   decodeISO88591 :: [Word8] -> [Char];
>   encodeISO88591 :: [Char] -> [Word8]; -- uses low octet of codepoint
>
> You could surely define them as native functions very efficiently, if necessary.
That depends what you mean by efficient: these functions introduce an extra layer of intermediate list between the handle buffer and the final [Char], and furthermore they don't work with partial reads - the input has to be a lazy stream obtained from hGetContents. I don't want to be forced to use lazy I/O.
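Spelt out, those Latin-1 codecs are just maps over lists; a minimal sketch, assuming encoding simply truncates code points above 255:

    import Data.Char (chr, ord)
    import Data.Word (Word8)

    -- Latin-1 octets coincide with the first 256 code points, so decoding
    -- is a plain map; encoding keeps only the low octet of each code point.
    decodeISO88591 :: [Word8] -> [Char]
    decodeISO88591 = map (chr . fromIntegral)

    encodeISO88591 :: [Char] -> [Word8]
    encodeISO88591 = map (fromIntegral . ord)

The [Word8] list here is itself the extra intermediate layer sitting between the handle buffer and the final [Char].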
> A monadic stream-transformer:
>
>   decodeStreamUTF8 :: (Monad m) => m Word8 -> m Char;
>   hGetChar h = decodeStreamUTF8 (hGetWord8 h);
>
> This works provided each Char corresponds to a contiguous block of Word8s, with no state between them. I think that includes all the standard character encoding schemes.
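For concreteness, such a stateless decoder might look roughly like this; a minimal sketch that assumes well-formed input and does no error checking:

    import Data.Bits (shiftL, (.&.), (.|.))
    import Data.Char (chr)
    import Data.Word (Word8)

    -- Decode one code point per call from any monadic source of octets.
    decodeStreamUTF8 :: Monad m => m Word8 -> m Char
    decodeStreamUTF8 next = octet >>= dispatch
      where
        octet = next >>= return . fromIntegral
        dispatch b
          | b < 0x80  = return (chr b)        -- one octet (ASCII)
          | b < 0xE0  = go (b .&. 0x1F) 1     -- two-octet sequence
          | b < 0xF0  = go (b .&. 0x0F) 2     -- three-octet sequence
          | otherwise = go (b .&. 0x07) 3     -- four-octet sequence
        -- fold n continuation octets (10xxxxxx) into the code point
        go acc 0 = return (chr acc)
        go acc n = do
            b <- octet
            go ((acc `shiftL` 6) .|. (b .&. 0x3F)) (n - 1 :: Int)

Specialised to IO and fed by the proposed hGetWord8 it consumes exactly the octets of one character per call, with no lazy stream in between.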
This is better: it doesn't force you to use lazy I/O, and when specialised to the IO monad it might get decent performance. The problem is that in general I don't think you can assume the lack of state. For example, UTF-7 has a shift state which needs to be retained between characters, and UTF-16 and UTF-32 have an endianness state which is set by a byte order mark at the beginning of the file. Some other encodings have states too.

Cheers,
    Simon
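One hypothetical way to accommodate such encodings is to thread the state through the decoder explicitly; in this illustrative UTF-16 sketch the names, the big-endian default, and the omission of surrogate pairs are all simplifications:

    import Data.Char (chr)
    import Data.Word (Word8)

    data Endianness = BigEndian | LittleEndian

    -- Decode one UTF-16 code unit, carrying the byte-order state along:
    -- a byte order mark at the start of the stream fixes the endianness
    -- for every later character.  Surrogate pairs and error handling
    -- are omitted.
    decodeStreamUTF16 :: Monad m => Maybe Endianness -> m Word8 -> m (Char, Maybe Endianness)
    decodeStreamUTF16 mend next = do
        a <- octet
        b <- octet
        case mend of
          Nothing
            | a == 0xFE && b == 0xFF -> decodeStreamUTF16 (Just BigEndian) next
            | a == 0xFF && b == 0xFE -> decodeStreamUTF16 (Just LittleEndian) next
            | otherwise              -> emit BigEndian a b   -- no BOM: assume big-endian
          Just e -> emit e a b
      where
        octet = next >>= return . fromIntegral
        emit BigEndian    hi lo = return (chr (hi * 256 + lo), Just BigEndian)
        emit LittleEndian hi lo = return (chr (lo * 256 + hi), Just LittleEndian)

The extra state argument and result are exactly what the plain m Word8 -> m Char type has no room for.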