
Sven Moritz Hallberg wrote:
(aside: aren't there problems with Unicode not being a fixed-width character set? Some characters are expected to combine with others to form a glyph, there are multiple versions of some characters with different widths, there are several widths of space, etc.)
I think (...) these issues should not pose a problem.
variable-width characters: Unicode specifically says nothing about the glyph representation of characters, so it is reasonable to assume there will be fixed-width Unicode fonts. Remember that even our Latin alphabet has characters of different widths (i vs. w), which we somehow manage to fit into glyphs of the same width. If one's editor really used a variable-width font, one would already have this problem with ASCII.
For fonts which aren't restricted to Western alphabets, there are two common interpretations of "fixed width". One is that all glyphs are exactly the same width, so even "narrow" characters ("l", "i", "1") are as wide as the widest CJK characters. Many users will dislike such fonts; apart from looking rather odd, they also waste screen space. The other is that every glyph's width is an integral number of "columns": Western (Latin, Cyrillic, Greek) characters are a single column wide, while CJK characters are typically two columns wide. The (Unix98) wcwidth() function can be used to obtain the width (in columns) of a given wide character (wchar_t) in the current locale.
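As a rough illustration of the "columns" interpretation, here is a minimal sketch in Python. It is not wcwidth() itself (Python's standard library has no binding for it); it assumes the Unicode East Asian Width property is an acceptable stand-in, treats "ambiguous" characters as one column, and ignores control and combining characters. The names column_width and string_width are made up for the example.

```python
import unicodedata

def column_width(ch: str) -> int:
    """Approximate width in columns of a single character.

    Assumption: East Asian Wide ("W") and Fullwidth ("F") characters
    take two columns; everything else takes one.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def string_width(s: str) -> int:
    """Sum of the approximate column widths of all characters in s."""
    return sum(column_width(ch) for ch in s)
```

With this, string_width("li1") is 3 while string_width("漢字") is 4, which is the behaviour a two-width "fixed" font would show on screen.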
composition characters: I think we should treat each character in the source as exactly one character and leave any composition to the editing tools. The way I imagine these composition characters being used is, for instance, as keyboard input to an editor, which then composes them into a single character before writing anything to a file. I'd say this issue belongs to the domain of text processing.
Character I/O functions should probably ignore composition, i.e.
LATIN_SMALL_LETTER_A + COMBINING_ACUTE_ACCENT should appear as two
separate characters to the application.
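
To make the distinction concrete, a small sketch in Python (used here only because its strings are sequences of code points; it is not the language under discussion). It shows the decomposed sequence arriving as two characters, and an explicit NFC normalization step, of the kind an editor might apply, turning it into one. The helper name code_point_names is invented for the example.

```python
import unicodedata

def code_point_names(s: str) -> list[str]:
    """Return the Unicode name of every code point in s, in order."""
    return [unicodedata.name(ch) for ch in s]

# LATIN SMALL LETTER A followed by COMBINING ACUTE ACCENT
decomposed = "a\u0301"

# What an editor might write out after composing the sequence (NFC).
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), code_point_names(decomposed))
print(len(composed), code_point_names(composed))
```

Without the normalization step the application sees two characters; with it, it sees the single character LATIN SMALL LETTER A WITH ACUTE.
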
However, layout will only "work" if the compiler (or is it a
preprocessor?) uses the same algorithm as the editor. If the editor
shows a composition sequence as a single character cell, it needs to
be treated as a single column for the purposes of layout.
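
A minimal sketch of the mismatch, again in Python and only as an illustration. It assumes the editor gives zero width to nonspacing and enclosing marks (general categories Mn and Me) and one column to everything else; real editors may differ, and wide characters are ignored here. Both function names are hypothetical.

```python
import unicodedata

def naive_column(line: str, index: int) -> int:
    """Column of the code point at index if every code point is one column."""
    return index

def cell_column(line: str, index: int) -> int:
    """Column of the code point at index if combining marks take no column.

    Assumption: marks in categories Mn and Me attach to the preceding
    character cell; every other code point occupies one column.
    """
    return sum(
        1 for ch in line[:index]
        if unicodedata.category(ch) not in ("Mn", "Me")
    )

line = "a\u0301 = 1"      # 'a' + COMBINING ACUTE ACCENT, then " = 1"
eq = line.index("=")      # code point index of '='
print(naive_column(line, eq), cell_column(line, eq))
```

Here the "=" sits at column 3 when counting code points but at column 2 when counting character cells, so a layout algorithm and an editor that disagree on this would disagree about indentation.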
--
Glynn Clements