New subject: UTF-8 encode/decode libraries.

26 Apr 2004

      Duncan Coutts wrote:
...
On Mon, 2004-04-26 at 18:49, David Brown wrote: [...]
toUTF :: String -> String
Hmmm, "String -> [Word8]" would be nicer...
...
fromUTF :: String -> String
... and here: "[Word8] -> String" or "[Word8] -> Maybe String".
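
For illustration, the two signatures suggested above could be filled in roughly as follows. This is only a minimal sketch, not the code under discussion in the thread: the function bodies and the helper names `enc`, `cont`, `multi` and `isCont` are assumptions. The decoder returns `Nothing` on malformed input (stray or missing continuation bytes, overlong forms, surrogates, values above 0x10ffff).

```haskell
import Control.Monad (guard)
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode each Char as 1 to 4 bytes, depending on its code point.
-- Sketch only: surrogate code points (U+D800..U+DFFF) are not rejected here.
toUTF :: String -> [Word8]
toUTF = concatMap (map fromIntegral . enc . ord)
  where
    enc c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Decode a byte sequence, or fail with Nothing if it is not valid UTF-8.
fromUTF :: [Word8] -> Maybe String
fromUTF [] = Just []
fromUTF (b : bs)
  | b < 0x80  = (chr (fromIntegral b) :) <$> fromUTF bs
  | b < 0xC0  = Nothing                       -- stray continuation byte
  | b < 0xE0  = multi 1 (b .&. 0x1F) 0x80
  | b < 0xF0  = multi 2 (b .&. 0x0F) 0x800
  | b < 0xF8  = multi 3 (b .&. 0x07) 0x10000
  | otherwise = Nothing                       -- five- and six-byte lead bytes
  where
    -- n continuation bytes, payload bits of the lead byte, and the
    -- smallest code point that legitimately needs this length.
    multi n lead lo = do
      let (cs, rest) = splitAt n bs
      guard (length cs == n && all isCont cs)
      let cp = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
                     (fromIntegral lead) cs :: Int
      guard (cp >= lo && cp <= 0x10FFFF && not (cp >= 0xD800 && cp <= 0xDFFF))
      (chr cp :) <$> fromUTF rest
    isCont c = c .&. 0xC0 == 0x80
```

With `[Word8]` on the byte side, the type checker keeps encoded and decoded data apart, which `String -> String` cannot do.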
Furthermore, UTF-8 is not restricted to a maximum of 3 bytes per character.
Here is an excerpt from "man utf8" on my SuSE Linux:

        * UTF-8 encoded UCS characters may be up to six bytes
          long, however the Unicode standard specifies no characters
          above 0x10ffff, so Unicode characters can only be up to
          four bytes long in UTF-8.
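
The byte counts in the excerpt follow directly from the code point ranges. A small sketch, in which the function name `utf8Length` is an assumption made for illustration:

```haskell
import Data.Char (ord)

-- Number of bytes UTF-8 needs for one Unicode character.
-- Haskell's Char stops at U+10FFFF, so the five- and six-byte forms
-- mentioned in the man page can never be needed here.
utf8Length :: Char -> Int
utf8Length c
  | n < 0x80    = 1   -- U+0000  .. U+007F
  | n < 0x800   = 2   -- U+0080  .. U+07FF
  | n < 0x10000 = 3   -- U+0800  .. U+FFFF
  | otherwise   = 4   -- U+10000 .. U+10FFFF
  where
    n = ord c
```

An encoder that stops at three bytes therefore cannot represent anything above U+FFFF.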

IIRC we discussed encoders/decoders quite some time ago on the libraries
mailing list, but nothing really happened, which is a pity. We should
strive for something more general than UTF-8 <-> UCS/Unicode; there are
quite a few other widely used encodings, e.g. GSM 03.38. Any takers?
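
One possible shape for such a general interface, purely as a sketch: the `Codec` record and its field names are hypothetical and do not come from any existing library. ISO 8859-1 serves as the example instance only because it is a plain one-byte mapping; GSM 03.38 would need a lookup table instead.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical interface: one value per encoding, so that UTF-8, GSM 03.38
-- and others could sit behind the same pair of functions.
data Codec = Codec
  { codecName :: String
  , encode    :: String -> Maybe [Word8]  -- Nothing: character not representable
  , decode    :: [Word8] -> Maybe String  -- Nothing: malformed input
  }

-- ISO 8859-1 as a stand-in: code points below 0x100 map to single bytes.
latin1 :: Codec
latin1 = Codec
  { codecName = "ISO-8859-1"
  , encode    = mapM enc
  , decode    = Just . map (chr . fromIntegral)
  }
  where
    enc c
      | ord c < 0x100 = Just (fromIntegral (ord c))
      | otherwise     = Nothing
```

For example, `encode latin1 "caf\233"` yields `Just [99,97,102,233]`, while a string containing the euro sign yields `Nothing`.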

Cheers,
    S.

Re: UTF-8 encode/decode libraries.

Sven Panne

David Brown
