New subject: UniCode

5 Oct 2001


Fri, 5 Oct 2001 23:23:50 +1000, Andrew J Bromage writes:
...
> There is a set of one million (more correctly, 1M) Unicode characters
> which are only accessible using surrogate pairs (i.e. two UTF-16
> codes).  There are currently none of these codes assigned,
This information is out of date. AFAIR about 40000 of them are assigned,
most of them for Chinese (current, not historic).
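
To make the surrogate-pair arithmetic concrete, here is a minimal sketch
of how a code point above U+FFFF maps to two UTF-16 codes. The names
toUtf16 and supplementaryCount are made up for illustration and are not
taken from any library; U+20000 is used in the comments only as an
example of a code point outside the 16-bit range.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- Encode one Char as UTF-16 code units (hypothetical helper).
-- Code points above U+FFFF become a high/low surrogate pair,
-- e.g. U+20000 becomes 0xD840 0xDC00.
-- Lone surrogate code points are passed through unchanged; this is
-- only a sketch, not a validating encoder.
toUtf16 :: Char -> [Word16]
toUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [hi, lo]
  where
    n      = ord c
    offset = n - 0x10000                              -- 20-bit value
    hi     = fromIntegral (0xD800 + (offset `shiftR` 10))
    lo     = fromIntegral (0xDC00 + (offset .&. 0x3FF))

-- Number of code points reachable only through surrogate pairs:
-- 16 planes of 65536 each, i.e. 2^20, the "1M" mentioned above.
supplementaryCount :: Int
supplementaryCount = 0x10FFFF - 0x10000 + 1
```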
...
> So rare, in fact, that the cost of strings taking up twice the
> space that they currently do simply isn't worth the cost.
In Haskell, strings already have high overhead. In GHC a Char# value
(inside a Char object) always takes the same size as a pointer
(32 or 64 bits), no matter how much of it is used.
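
As a rough back-of-the-envelope sketch of that overhead: assuming the
commonly described GHC heap layout of three words per list cell (header,
head pointer, tail pointer) and two words per boxed Char (header plus
the Char# payload), and ignoring any sharing of Char objects, the cost
can be estimated as below. The function name and the per-cell word
counts are assumptions for illustration, not figures measured here.

```haskell
-- Estimated heap bytes for an unshared String of n characters.
-- ASSUMPTION: 3 words per (:) cell and 2 words per boxed Char;
-- real programs may use less when Char objects are shared.
estimateStringBytes :: Int   -- ^ word size in bytes (4 or 8)
                    -> Int   -- ^ number of characters
                    -> Int
estimateStringBytes wordBytes n = n * (consWords + charWords) * wordBytes
  where
    consWords = 3   -- header + pointer to the Char + pointer to the tail
    charWords = 2   -- header + one full word holding the Char#
```

Under these assumptions the width of the encoding inside the Char#
payload does not change the total, since the payload is a full word
either way.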
...
> It just goes to show that strings are not merely arrays of characters
> like some languages would have you believe.
In Haskell String = [Char]. It's true that Char values don't
necessarily correspond to glyphs, but Strings are composed of Chars.
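
A small sketch of the distinction, with made-up names: a base letter
followed by a combining accent is two Chars that are typically drawn as
one glyph, while a code point that needs a surrogate pair in UTF-16 is
still a single Char.

```haskell
-- 'e' followed by U+0301 COMBINING ACUTE ACCENT:
-- two Chars, typically rendered as a single glyph.
combined :: String
combined = "e\x0301"

-- A code point outside the 16-bit range is one Char in a String,
-- even though UTF-16 would need two codes for it.
supplementary :: String
supplementary = "\x20000"

-- Number of Chars in each String, counted with the ordinary list length.
charCounts :: (Int, Int)
charCounts = (length combined, length supplementary)
```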

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK
