Re: UTF-8 encode/decode libraries.

Sven Panne wrote:
> Hmmm, "String -> [Word8]" would be nicer...
My UTF8 encoder is

    toUTF8 :: String -> String

but an obvious alternative would be

    toUTF8 :: Enum codedChar => String -> [codedChar]

and I could implement this quite easily, by globally-exchanging chr with toEnum. It would then be appropriate to SPECIALIZE to types String -> String and String -> [Word8], satisfying both the purists and those who actually want to write the output to a file.
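To make the generalised signature concrete, here is a minimal sketch of what such an encoder might look like. It is not the code discussed in this thread: the helper `encodeChar` is a hypothetical name, and the sketch only covers the one-to-four-octet forms that a Haskell `Char` (at most U+10FFFF) can require.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Sketch only: encode a String as UTF-8, producing any Enum type
-- that can hold the values 0..255 (for example Char or Word8).
toUTF8 :: Enum codedChar => String -> [codedChar]
{-# SPECIALIZE toUTF8 :: String -> String #-}
{-# SPECIALIZE toUTF8 :: String -> [Word8] #-}
toUTF8 = concatMap (map toEnum . encodeChar . ord)

-- Hypothetical helper: the octet values (each in 0..255) for one code point.
-- Note that it does not reject surrogate code points (U+D800..U+DFFF),
-- which a Haskell Char may hold but which are not valid in UTF-8.
encodeChar :: Int -> [Int]
encodeChar c
  | c < 0x80    = [c]
  | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
  | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
  | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                  , cont (c `shiftR` 6), cont c ]
  where
    -- Continuation octet: 10xxxxxx carrying the low six bits.
    cont x = 0x80 .|. (x .&. 0x3F)
```

With this sketch, the result type is picked by annotation at the call site, for instance `toUTF8 s :: [Word8]` for binary output or `toUTF8 s :: String` for the Char-based variant.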
> ... and here: "[Word8] -> String" or "[Word8] -> Maybe String"

and my UTF8 decoder has type

    fromUTF8WE :: Monad m => String -> m String

Errors are reported by "fail". If, for example, you import Control.Monad.Error, that means you have a function returning either an error message or the converted string:

    fromUTF8WE :: String -> Either String String

Of course for Word8, you would change the type of the decoder to

    fromUTF8WE :: (Monad m, Enum codedChar) => [codedChar] -> m String

Incidentally I am *hoping* I shall be able to say that my UTF8 code is LGPL but you know what University administrators are like ...

Oh, and I forgot to say that of course my UTF8 encoder/decoder handles all shift sequences, not just those with 3 or fewer bytes.
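For readers who want to see the shape of such a decoder, the following is a rough sketch under stated assumptions, not the code described above. It assumes GHC 8.8 or later, where `fail` belongs to the `MonadFail` class rather than `Monad`, so the constraint differs from the 2004 signature; `Either String` has no `MonadFail` instance in current base, so the sketch also exposes the `Either`-returning core directly. The names `decodeOctets` and `decodeBytes` are hypothetical. The sketch accepts sequences of up to four octets, which is all that a Haskell `Char` can represent, and rejects overlong forms and surrogates.

```haskell
import Control.Monad (unless, when)
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word8)

-- Hypothetical core: decode octet values, reporting errors as Left.
decodeOctets :: [Int] -> Either String String
decodeOctets [] = Right []
decodeOctets (b : bs)
  | b < 0 || b > 0xFF     = Left ("not an octet: " ++ show b)
  | b < 0x80              = (toEnum b :) <$> decodeOctets bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (b .&. 0x1F) 0x80
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F) 0x800
  | b >= 0xF0 && b < 0xF5 = multi 3 (b .&. 0x07) 0x10000
  | otherwise             = Left ("invalid lead byte: " ++ show b)
  where
    isCont x = x >= 0x80 && x <= 0xBF
    -- n continuation octets, the payload bits of the lead octet, and the
    -- smallest code point that legitimately needs this sequence length.
    multi n lead minCp = do
      let (conts, rest) = splitAt n bs
      when (length conts < n) (Left "truncated sequence")
      unless (all isCont conts) (Left "invalid continuation byte")
      let cp = foldl (\acc x -> (acc `shiftL` 6) .|. (x .&. 0x3F)) lead conts
      when (cp < minCp) (Left "overlong encoding")
      when (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        (Left "code point out of range")
      (toEnum cp :) <$> decodeOctets rest

-- Errors are reported through fail, in the spirit of the signature above.
fromUTF8WE :: (MonadFail m, Enum codedChar) => [codedChar] -> m String
fromUTF8WE = either fail return . decodeOctets . map fromEnum

-- Example instantiation: Word8 input, Maybe as the error monad.
decodeBytes :: [Word8] -> Maybe String
decodeBytes = fromUTF8WE
```

Choosing `Maybe`, `IO`, or the list monad at the call site selects how a failure surfaces; callers who want the error message can use `decodeOctets . map fromEnum` and inspect the `Left` value.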

On Tuesday, 4 May 2004 at 11:16, George Russell wrote:
> Sven Panne wrote:
> > Hmmm, "String -> [Word8]" would be nicer...
> My UTF8 encoder is
>
>     toUTF8 :: String -> String
>
> but an obvious alternative would be
>
>     toUTF8 :: Enum codedChar => String -> [codedChar]
>
> and I could implement this quite easily, by globally-exchanging chr with toEnum. It would then be appropriate to SPECIALIZE to types String -> String and String -> [Word8], satisfying both the purists and those who actually want to write the output to a file.
Writing UTF-8 to a file should be done using binary output anyway, since UTF-8 is a sequence of octets. So Word8 would also be the way to go for the "file writers".
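As an illustration of the binary-output point, here is a small sketch that writes a list of octets to a file without any text-mode translation. It assumes the `bytestring` package that ships with GHC; `packOctets` and `writeOctets` are hypothetical names, not part of any library mentioned in this thread.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Pack already-encoded UTF-8 octets into a strict ByteString.
packOctets :: [Word8] -> B.ByteString
packOctets = B.pack

-- Write the octets verbatim; ByteString file I/O is always binary,
-- so no newline or character-encoding conversion is applied.
writeOctets :: FilePath -> [Word8] -> IO ()
writeOctets path = B.writeFile path . packOctets
```

A base-only alternative would be to open the handle with `System.IO.withBinaryFile` (or call `hSetBinaryMode`) and write Chars below 256, which is where a `String -> String` encoder still fits.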
> > ... and here: "[Word8] -> String" or "[Word8] -> Maybe String"
> and my UTF8 decoder has type
>
>     fromUTF8WE :: Monad m => String -> m String
>
> Errors are reported by "fail". If, for example, you import Control.Monad.Error, that means you have a function returning either an error message or the converted string:
>
>     fromUTF8WE :: String -> Either String String
I like this "error handling via monads" and use it myself a lot.
> Of course for Word8, you would change the type of the decoder to
>
>     fromUTF8WE :: (Monad m,Enum codedChar) => [codedChar] -> m String
>
> Incidentally I am *hoping* I shall be able to say that my UTF8 code is LGPL but you know what University administrators are like ...
Wolfgang
Participants (2): George Russell, Wolfgang Jeltsch