
On Thu, 2002-08-29 at 10:22, Martin Norbäck wrote:
On Thu, 2002-08-29 at 04:20, Ashley Yakeley wrote:
A SourceForge project for the internationalisation effort is active at http://sourceforge.net/projects/haskell-i18n/
I've added my Unicode character properties code. Check it out (cvs co).
Nice. Who will supply good UTF-8 code? I have some at http://www.dtek.chalmers.se/~d95mback/gettext/ but it is not in good shape.
With the ICFP contest finally over, I have just committed mine to CVS (thanks for setting it up, Ashley!). I hope it is of reasonable quality, though I have not performance-tested it. I'm looking forward to any feedback.
Where should a UTF-8 module be put? Text.UTF8?
In accordance with Simon's hierarchy page, I've put it into Text.Encoding.UTF8.
something like (just drafting here):
    Text.UTF8.encodeChar   :: Char -> [Word8]    -- (or Array?)
    Text.UTF8.encodeString :: String -> [Word8]  -- (or Array?)
    Text.UTF8.decodeChar   :: [Word8] -> Either (Char, [Word8]) Error
    Text.UTF8.decodeString :: [Word8] -> (String, [Word8], [Error])
Pretty much! I have:

    encodeOne :: Char -> [Word8]
        -- encodeChar is probably prettier
    encode :: String -> [Word8]
        -- encodeString? I don't care.
    decodeOne :: [Word8] -> (Either Error Char, Int, [Word8])
        -- 2nd component: number of bytes consumed
        -- 3rd component: rest of bytes
    decode :: [Word8] -> (String, [(Error, Int)])
        -- 2nd component: list of errors and their index in the byte stream
        -- Maybe we should reverse the order of error/index
        -- so it looks like any association list?

Comments welcome.

Regards,
Sven Moritz
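
For readers who want something concrete to try, here is a minimal sketch of how the four signatures above might be filled in. It is not the code committed to CVS. The `Error` constructors are hypothetical (the message does not say what `Error` contains), and the policy of skipping bad bytes without emitting a replacement character is likewise an assumption. Surrogate code points get no special treatment.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical error type, invented for this sketch.
data Error
  = InvalidLeadByte      -- byte cannot start a sequence
  | InvalidContinuation  -- expected a 10xxxxxx byte, found something else
  | Truncated            -- input ended in the middle of a sequence
  | Overlong             -- value was encoded with more bytes than needed
  | OutOfRange           -- value above U+10FFFF
  deriving (Eq, Show)

-- Encode one character as 1 to 4 bytes.
encodeOne :: Char -> [Word8]
encodeOne c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    hi s   = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

encode :: String -> [Word8]
encode = concatMap encodeOne

-- Decode one character. Returns the result, the number of bytes
-- consumed, and the remaining bytes.
decodeOne :: [Word8] -> (Either Error Char, Int, [Word8])
decodeOne [] = (Left Truncated, 0, [])
decodeOne (b : bs)
  | b < 0x80  = (Right (chr (fromIntegral b)), 1, bs)
  | b < 0xC0  = (Left InvalidLeadByte, 1, bs)  -- stray continuation byte
  | b < 0xE0  = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b < 0xF0  = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b < 0xF8  = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise = (Left InvalidLeadByte, 1, bs)
  where
    isCont x = x .&. 0xC0 == 0x80
    -- k: continuation bytes expected; lead: payload bits of the first
    -- byte; minVal: smallest code point that needs this sequence length.
    multi :: Int -> Int -> Int -> (Either Error Char, Int, [Word8])
    multi k lead minVal
      | got < k && null rest = (Left Truncated, used, rest)
      | got < k              = (Left InvalidContinuation, used, rest)
      | value < minVal       = (Left Overlong, used, rest)
      | value > 0x10FFFF     = (Left OutOfRange, used, rest)
      | otherwise            = (Right (chr value), used, rest)
      where
        conts = takeWhile isCont (take k bs)
        got   = length conts
        used  = 1 + got
        rest  = drop got bs
        value = foldl (\acc x -> (acc `shiftL` 6) .|. (fromIntegral x .&. 0x3F))
                      lead conts

-- Decode a whole byte stream. Each error is paired with the byte
-- index at which the offending sequence started.
decode :: [Word8] -> (String, [(Error, Int)])
decode = go 0
  where
    go _ [] = ([], [])
    go i bs =
      let (r, n, rest) = decodeOne bs
          (s, es)      = go (i + n) rest
      in case r of
           Right c -> (c : s, es)
           Left e  -> (s, (e, i) : es)
```

With these definitions, `decode [0x41, 0xFF, 0x42]` evaluates to `("AB", [(InvalidLeadByte, 1)])`, which shows the (error, index) ordering asked about above. Swapping the pair to (index, error) would only change the last line of `go`.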