New subject: Unicode in source

21 Aug 2002

      On Wed, 2002-08-21 at 12:02, Simon Marlow wrote:
...
...
Apparently, this isn't quite supported by GHC:
Prelude> map Char.ord "\74\749\7490"
        [74,237,66]
which is, of course, the values modulo 256.
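
The arithmetic behind that observation can be checked with a small sketch. The names `lowByte`, `reported` and `truncated` are made up for illustration and are not part of GHC or any library. The characters are built with `chr` so the result does not depend on how a particular compiler reads the escapes:

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: keep only the low 8 bits of a character's code,
-- mimicking the truncation seen in the session above.
lowByte :: Char -> Int
lowByte c = ord c `mod` 256

-- The three characters from the report (codes 74, 749 and 7490).
reported :: String
reported = map chr [74, 749, 7490]

-- What an 8-bit implementation would show for them.
truncated :: [Int]
truncated = map lowByte reported
```

Loaded into an interpreter whose Char covers the full Unicode range, `truncated` should evaluate to [74,237,66], matching the output quoted above, while `map ord reported` gives back [74,749,7490].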
I think you've found a bug. [...]
Oh, oops. :)
...
(aside: aren't there problems with Unicode not being a fixed-width
character set?  Some characters are expected to combine with others to
form a glyph, there are multiple versions of some characters with
different widths, there are several widths of space, etc.)
I think (...) these issues should not pose a problem.

variable-width characters:
Unicode specifically doesn't say anything about the glyph representation
of the characters. So it is reasonable to assume there will be
fixed-width Unicode fonts. Remember that even our Latin
alphabet has characters of different width (i vs. w) which we just
somehow manage to fit into glyphs of the same width. If one's editor
really used a variable-width font, one would already have the problem
with ASCII.

composition characters:
I think we should interpret each character in the source as exactly one
character and leave any possible composition to the level of editing
tools. The way I imagine the use of these composition characters is, for
instance, as keyboard input to an editor which then composes them into a
single char before writing anything to a file. I'd say this issue belongs
to the domain of text processing.
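
To illustrate what "each character as exactly one" would mean in practice, here is a minimal sketch, assuming a compiler whose Char covers the full Unicode range. The names `precomposed`, `decomposed` and `codePoints` are made up for the example. A precomposed letter and its base-plus-combining-mark spelling look the same on screen, but are different strings of different lengths:

```haskell
import Data.Char (ord)

-- U+00E9 LATIN SMALL LETTER E WITH ACUTE, a single character.
precomposed :: String
precomposed = "\233"

-- U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT, two characters.
decomposed :: String
decomposed = "e\769"

-- Hypothetical helper: list the numeric code of every character.
codePoints :: String -> [Int]
codePoints = map ord
```

Under that assumption, `length precomposed` is 1 and `length decomposed` is 2, and the two strings compare as unequal. Making them equal would be the job of a normalising editor or text-processing tool, not of the compiler.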

Regards,
Sven Moritz
