
20 Aug 2002
12:06 a.m.
Hi,

I just implemented a UTF-8 coder and decoder in Haskell. While reading the Unicode standard, I realized what someone had pointed out earlier about code values versus code points: although Unicode "usually" uses 16-bit words, it supports "surrogate pairs" to reach code points beyond U+FFFF (up to U+10FFFF), which a single 16-bit unit cannot represent.

The Haskell report says that Char is a 16-bit Unicode value. What is the stance on surrogate pairs, and how are we going to support them? My code currently just fails with an "unsupported" error when it encounters a surrogate.

Regards,
Sven Moritz
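For illustration, here is a minimal sketch of surrogate-aware UTF-8 handling along the lines described in the message. It assumes code points are carried as plain `Int` values rather than `Char` (which sidesteps the 16-bit `Char` question), and that lone surrogates are rejected, as the existing code does. The names `isSurrogate`, `combineSurrogates`, `encodeCodePoint` and `decodeUtf8` are hypothetical; they are not taken from the author's code or from any library.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Word (Word8)

-- Hypothetical helper: is this value in the UTF-16 surrogate range U+D800..U+DFFF?
isSurrogate :: Int -> Bool
isSurrogate c = c >= 0xD800 && c <= 0xDFFF

-- Hypothetical helper: combine a high surrogate (U+D800..U+DBFF) and a low
-- surrogate (U+DC00..U+DFFF) into one code point in U+10000..U+10FFFF.
combineSurrogates :: Int -> Int -> Maybe Int
combineSurrogates hi lo
  | hi >= 0xD800 && hi <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF
  = Just (0x10000 + ((hi - 0xD800) `shiftL` 10) + (lo - 0xDC00))
  | otherwise = Nothing

-- Encode one code point as UTF-8. Surrogates and values above U+10FFFF
-- are rejected instead of being encoded.
encodeCodePoint :: Int -> Either String [Word8]
encodeCodePoint c
  | c < 0 || c > 0x10FFFF = Left "out of range"
  | isSurrogate c         = Left "unsupported: surrogate"
  | c < 0x80              = Right [fromIntegral c]
  | c < 0x800             = Right [lead 0xC0 6, cont 0]
  | c < 0x10000           = Right [lead 0xE0 12, cont 6, cont 0]
  | otherwise             = Right [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    -- Lead byte: length marker bits plus the highest-order payload bits.
    lead marker n = marker .|. fromIntegral (c `shiftR` n)
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont n = 0x80 .|. fromIntegral ((c `shiftR` n) .&. 0x3F)

-- Decode UTF-8 bytes into code points, rejecting truncated sequences,
-- bad continuation bytes, overlong forms, surrogates and out-of-range values.
decodeUtf8 :: [Word8] -> Either String [Int]
decodeUtf8 [] = Right []
decodeUtf8 (b : bs)
  | b < 0x80           = (fromIntegral b :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise          = Left "invalid lead byte"
  where
    -- n continuation bytes follow; minVal is the smallest code point that
    -- legitimately needs this many bytes (anything smaller is overlong).
    multi n start minVal
      | length conts /= n                      = Left "truncated sequence"
      | any (\x -> x .&. 0xC0 /= 0x80) conts   = Left "invalid continuation byte"
      | c < minVal                             = Left "overlong encoding"
      | isSurrogate c                          = Left "unsupported: surrogate"
      | c > 0x10FFFF                           = Left "out of range"
      | otherwise                              = (c :) <$> decodeUtf8 rest
      where
        (conts, rest) = splitAt n bs
        c = foldl (\acc x -> (acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F)) start conts
```

For example, `encodeCodePoint 0x20AC` equals `Right [0xE2, 0x82, 0xAC]`, and `combineSurrogates 0xD83D 0xDE00` equals `Just 0x1F600`, a code point that a 16-bit `Char` could only hold as a surrogate pair. Treating surrogates as errors on the UTF-8 side, and combining pairs only when converting from UTF-16, keeps the two encodings from being mixed up.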