surrogate code points in a Char

18 Nov 2009

Hi.

The Unicode Standard (version 4.0, section 3.9, D31, p. 76) says:

"""Because surrogate code points are not included in the set of Unicode
scalar values, UTF-32 code units in the range 0000D800 .. 0000DFFF are
ill-formed"""

However, GHC does not reject these code units:

Prelude> print '\x0000D800'
'\55296'

Is this the correct behaviour?
Note that Python (2.5.4, UCS4 build, Debian Linux) also accepts these
code units.
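
To make the question concrete, here is a minimal sketch of the kind of check
the quoted passage seems to imply. The names isSurrogateCodePoint and
toScalarValue are hypothetical (they are not part of base); the sketch only
assumes chr from Data.Char, which accepts any code point up to 0x10FFFF,
surrogates included.

```haskell
import Data.Char (chr)

-- | True when the code point lies in the surrogate range U+D800..U+DFFF.
-- (Hypothetical helper, not a library function.)
isSurrogateCodePoint :: Int -> Bool
isSurrogateCodePoint n = n >= 0xD800 && n <= 0xDFFF

-- | Hypothetical smart constructor: returns a Char only for Unicode
-- scalar values, i.e. code points in 0..0x10FFFF that are not surrogates.
toScalarValue :: Int -> Maybe Char
toScalarValue n
  | n < 0 || n > 0x10FFFF  = Nothing   -- outside the code space
  | isSurrogateCodePoint n = Nothing   -- surrogate, not a scalar value
  | otherwise              = Just (chr n)
```

With these definitions, toScalarValue 0xD800 evaluates to Nothing, while
toScalarValue 0x41 evaluates to Just 'A'.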

Thanks, Manlio

Manlio Perillo
