
On Wed, 2002-08-21 at 12:02, Simon Marlow wrote:
Apparently, this isn't quite supported by GHC:
Prelude> map Char.ord "\74\749\7490"
[74,237,66]
which is, of course, the values modulo 256.
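
To make the arithmetic concrete: the result quoted above is what one gets by reducing each code point modulo 256. The following is only a sketch, assuming a GHC whose Char holds the full code point; truncateTo8Bits is a hypothetical helper written for this illustration, not anything taken from GHC itself.

```haskell
import Data.Char (ord)

-- Hypothetical helper (an assumption for illustration, not GHC code):
-- keep only the low 8 bits of a character's code point.
truncateTo8Bits :: Char -> Int
truncateTo8Bits c = ord c `mod` 256

main :: IO ()
main = do
  -- The intended code points of the three decimal escapes.
  print (map ord "\74\749\7490")              -- [74,749,7490]
  -- The same code points reduced modulo 256, matching the quoted session.
  print (map truncateTo8Bits "\74\749\7490")  -- [74,237,66]
```

Here 749 = 2*256 + 237 and 7490 = 29*256 + 66, which accounts for the 237 and 66 in the session above.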
I think you've found a bug. [...]
Oh, oops. :)
(aside: aren't there problems with Unicode not being a fixed-width character set? Some characters are expected to combine with others to form a glyph, there are multiple versions of some characters with different widths, there are several widths of space, etc.)
I think (...) these issues should not pose a problem.

Variable-width characters: Unicode specifically doesn't say anything about the glyph representation of the characters, so it is reasonable to assume there will be fixed-width Unicode character sets. Remember that even our Latin alphabet has characters of different widths (i vs. w), which we somehow manage to fit into glyphs of the same width. If one's editor really used a variable-width font, one would already have this problem with ASCII.

Composition characters: I think we should interpret each character in the source as exactly one character and leave any possible composition to the level of editing tools. The way I imagine these composition characters being used is, for instance, as keyboard input to an editor, which then composes them into a single char before writing anything to a file. I'd say this issue belongs to the domain of text processing.

Regards,
Sven Moritz
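
As a rough illustration of treating each character in the source as exactly one: in a Haskell String, a base letter followed by a combining accent is simply two Chars, and nothing composes them. This is a sketch only; the names decomposed and precomposed are made up for the example, and isMark is assumed to be available from Data.Char (it is in current versions of base).

```haskell
import Data.Char (isMark, ord)

-- 'e' followed by U+0301 COMBINING ACUTE ACCENT (decimal 769).
decomposed :: String
decomposed = "e\769"

-- The single precomposed code point U+00E9 (decimal 233).
precomposed :: String
precomposed = "\233"

main :: IO ()
main = do
  print (map ord decomposed)         -- [101,769]
  print (length decomposed)          -- 2: two separate Chars
  print (map isMark decomposed)      -- [False,True]: the accent is a mark
  print (decomposed == precomposed)  -- False: no composition is performed
```

Any composition into a single character would have to happen in a tool before the text reaches the compiler, which is the division of labour suggested above.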