
[Crossposted to Haskell and Libraries. Replies to Libraries.] {- Good things about this text library design: * Efficient implementation should be straightforward * Character coder interface is public, so users can supply their own encodings, or write coder transformers (there are some in the proposal) Bad things: * There's no way to implement fgetpos/fsetpos type functionality, because coders don't expose their internal state. (In fact, there would need to be a way to explicitly copy the state, since it may well include IORefs, Ptrs, etc.) Is this a serious problem? -} module System.TextIOFirstDraft (...) where -- A BlockRecoder takes source and destination buffers and does some sort -- of translation between them. It returns the number of values (not -- bytes!) consumed and the number of values produced. It does not have to -- empty the input buffer or fill the output buffer on each call, but it -- must do something (i.e. it's not acceptable to return (0,0)). Coders -- will in general have internal state which is updated on each call. type BlockRecoder from to = Ptr from -> BlockLength -> Ptr to -> BlockLength -> IO (BlockLength,BlockLength) type TextEncoder = BlockRecoder Word32 Octet type TextDecoder = BlockRecoder Octet Word32 -- IO TextEncoder and IO TextDecoder below denote "coder factories" which -- produce a new coder in its initial state each time they're called. compatibilityEncoder :: IO TextEncoder -- "mod 256" currentLocaleEncoder :: IO TextEncoder iso88591Encoder :: IO TextEncoder latin1Encoder = iso88591Encoder utf8Encoder, utf16BEEncoder, utf16LEEncoder, utf32BEEncoder, utf32LEEncoder :: IO TextEncoder compatibilityDecoder :: IO TextDecoder currentLocaleDecoder :: IO TextDecoder iso88591Decoder :: IO TextDecoder latin1Decoder = iso88591Decoder utf8Decoder, utf16BEDecoder, utf16LEDecoder, utf32BEDecoder, utf32LEDecoder :: IO TextDecoder -- An attempt at supporting setlocale-type locale strings. lookupEncoder :: String -> Maybe (IO TextEncoder) lookupDecoder :: String -> Maybe (IO TextDecoder) -- prependBOM takes an existing UTF encoder and causes it to prepend -- a BOM (byte-order mark) to its output. autodetectUTF takes an existing -- decoder and modifies it to check for a BOM, switching to the -- appropriate type of UTF decoding if one is found. prependBOM :: IO TextEncoder -> IO TextEncoder autodetectUTF :: IO TextDecoder -> IO TextDecoder -- Attaches a TextInputChannel to the supplied InputChannel. After -- this operation the InputChannel should be considered owned by the -- TextInputChannel; any attempt to use it directly will cause -- unpredictable results. This takes a decoder factory rather than a -- decoder to prevent the error of attaching the same decoder to more than -- one channel. icAttachTextDecoder :: InputChannel -> IO TextDecoder -> IO TextInputChannel ticGetChar :: TextInputChannel -> IO Char ticGetLine :: TextInputChannel -> IO String ticLazyGetContents :: TextInputChannel -> IO String ocAttachTextEncoder :: OutputChannel -> IO TextEncoder -> IO TextOutputChannel tocPutChar :: TextOutputChannel -> Char -> IO () tocPutStr :: TextOutputChannel -> String -> IO () tocPutStrLn :: TextOutputChannel -> String -> IO () -- ... etc ... -- Ben