
In article
[Crossposted to Haskell and Libraries. Replies to Libraries.]
There's a Haskell Internationalisation mailing list too. Also check out the project on SourceForge: http://sourceforge.net/projects/haskell-i18n/ There's a bunch of my code for Unicode properties there, plus a couple of UTF-8 implementations.
module System.TextIOFirstDraft (...) where
This could be put in the Text.* hierarchy.
type BlockRecoder from to = Ptr from -> BlockLength -> Ptr to -> BlockLength -> IO (BlockLength,BlockLength)
UArray and MArray would be slightly cleaner if you're doing the IO thing. But actually my biggest problem is that this is in the IO monad. Given your code, I should be able to write these without resorting to unsafePerformIO:

    encodeUTF8 :: String -> [Word8]
    decodeUTF8 :: [Word8] -> Maybe String -- Nothing if not valid

Actually, if one makes certain assumptions about encodings, you could get away with something like this:

    type Encoder base t = t -> [base]
    type Decoder base t = forall m. (Monad m) => m base -> m t

Is this any less efficient? Probably not if you're writing your BlockRecoders in Haskell.
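To make the point concrete, here is a minimal sketch of what pure versions of those two functions might look like, written directly over lists rather than on top of a BlockRecoder. The signatures are the ones given above; everything else (the local helpers `encodeChar`, `hi`, `cont` and `multi`, and the choice to reject overlong forms and surrogates when decoding) is my own assumption for illustration, not taken from the proposed module.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Encode a String as UTF-8 octets. Pure: no IO, no unsafePerformIO.
-- Assumption: surrogate code points are encoded as-is, not rejected.
encodeUTF8 :: String -> [Word8]
encodeUTF8 = concatMap (encodeChar . ord)
  where
    encodeChar c
      | c < 0x80    = [fromIntegral c]
      | c < 0x800   = [0xC0 .|. hi 6, cont 0]
      | c < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
      | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
      where
        -- payload bits for the lead byte
        hi n   = fromIntegral (c `shiftR` n)
        -- continuation byte: 10xxxxxx with six payload bits
        cont n = 0x80 .|. fromIntegral ((c `shiftR` n) .&. 0x3F)

-- | Decode UTF-8 octets. Nothing if the input is not valid
-- (bad lead byte, truncated or malformed continuation, overlong
-- form, surrogate, or value above 0x10FFFF).
decodeUTF8 :: [Word8] -> Maybe String
decodeUTF8 [] = Just []
decodeUTF8 (b:bs)
  | b < 0x80              = (chr (fromIntegral b) :) <$> decodeUTF8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b < 0xF5 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise             = Nothing
  where
    -- n continuation bytes, lead-byte payload, smallest legal value
    multi :: Int -> Int -> Int -> Maybe String
    multi n lead minVal =
      let (conts, rest) = splitAt n bs
          isCont x = x .&. 0xC0 == 0x80
          cp = foldl (\acc x -> (acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F))
                     lead conts
          valid = length conts == n && all isCont conts
                  && cp >= minVal && cp <= 0x10FFFF
                  && not (cp >= 0xD800 && cp <= 0xDFFF)
      in if valid then (chr cp :) <$> decodeUTF8 rest else Nothing
```

Since the result is a Maybe String, the whole input has to be validated before the first character is available, so this version is not lazy in the way the monadic Decoder type could be. For any string without surrogates, decodeUTF8 (encodeUTF8 s) should give back Just s.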
type TextEncoder = BlockRecoder Word32 Octet
type TextDecoder = BlockRecoder Octet Word32
On GHC, Char has exactly the range 0 to 0x10FFFF, as per Unicode codepoints. If this becomes standardised as part of an internationalisation effort, you might want to use Char rather than Word32.

-- Ashley Yakeley, Seattle WA