Re: To show or not to show french accents

The problem is that if you are reading single bytes, 233 is not necessarily é.
Erm, internationalisation is not my thing as such... but I can't help commenting that from a systems point of view this is an utterly bad situation to be in... I thought Haskell used Unicode? I thought in Unicode the ID of a character was fixed irrespective of language. Where is Unicode support lacking?

Regards, Keean Schupke.
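A minimal Python sketch of the byte-versus-character point (Python is used only for illustration, and Latin-1 is assumed as the single-byte encoding under which byte 233 reads as é). It shows that the same byte means different things depending on the encoding used to decode it:

```python
# 'é' is Unicode code point U+00E9, i.e. 233 decimal.
E_ACUTE = "\u00e9"
assert ord(E_ACUTE) == 233

raw = bytes([233])  # a single byte with value 233 (0xE9)

# Assumption: Latin-1 (ISO 8859-1), a single-byte encoding, where byte 233 is 'é'.
as_latin1 = raw.decode("latin-1")

# Under UTF-8 the same lone byte is not a complete character:
# 0xE9 starts a 3-byte sequence, so decoding it by itself fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Conversely, 'é' stored as UTF-8 is two bytes, neither of which is 233...
utf8_bytes = E_ACUTE.encode("utf-8")

# ...and reading those bytes one at a time as Latin-1 gives two wrong characters.
mojibake = utf8_bytes.decode("latin-1")

print(as_latin1 == E_ACUTE)         # True
print(utf8_ok)                      # False
print(list(utf8_bytes))             # [195, 169]
print([ord(c) for c in mojibake])   # [195, 169], i.e. 'Ã' followed by '©'
```

So a program that treats each input byte as a character only sees é as 233 when the input really is Latin-1.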

Quoting from the latest version of the Unicode Standard:

"The Unicode Standard specifies a numeric value (code point) and a name for each of its characters.[...] Unicode provides for three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8)."

Hence in Unicode proper, characters are encoded as numbers (or, more precisely, "code points"), not bytes. The byte-oriented encoding form is UTF-8. In UTF-8, however, the byte 233 does not represent any character on its own; it can only occur as the first byte of a 3-byte sequence. OTOH, UTF-8 encodes characters in the ASCII range in the same way as ASCII.

Regards, Marcin Benke
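The UTF-8 byte layout described above can be checked with a small Python sketch. `utf8_encode` and `byte_role` are hypothetical helper names written here for illustration, following the UTF-8 bit patterns of RFC 3629; real code would use a library codec (in Python, `str.encode("utf-8")`) rather than hand-rolling this.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (RFC 3629 bit layout)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:                      # 1 byte: 0xxxxxxx, identical to ASCII
        return bytes([cp])
    if cp < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def byte_role(b: int) -> str:
    """Classify a single byte by the role it can play in well-formed UTF-8."""
    if b < 0x80:
        return "ASCII character"
    if b < 0xC0:
        return "continuation byte"
    # 0xC0, 0xC1 and 0xF5..0xFF never appear in well-formed UTF-8.
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "never valid in UTF-8"
    if b < 0xE0:
        return "lead byte of a 2-byte sequence"
    if b < 0xF0:
        return "lead byte of a 3-byte sequence"
    return "lead byte of a 4-byte sequence"


print(list(utf8_encode(0xE9)))                        # [195, 169]: 'é' takes two bytes
print(byte_role(233))                                 # lead byte of a 3-byte sequence
print(utf8_encode(0x41) == b"A")                      # True: ASCII range is unchanged
print(utf8_encode(0x20AC) == "\u20ac".encode("utf-8"))  # True: matches the built-in codec
```

Note that é (code point 233) and the byte 233 are unrelated in UTF-8: the character encodes as 0xC3 0xA9, while a lone 0xE9 byte only announces a 3-character-byte sequence to follow.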