
"Simon Marlow"
for Unicode characters, with the first form being applicable to code points below 64K, and the second to all of Unicode.
There are several problems with using this kind of encoding in source files, as pointed out by Sven Moritz Hallberg (indentation, syntax ambiguities, etc.), so I'd prefer to stick to standard encodings such as UTF-8 for source files.
So, in essence, we remove the \uHHHH paragraph from section 2.1 of the report? I think it might be nice to have a way to specify Unicode characters in identifiers, but if you propose to postpone it until, and if, it becomes a problem, I have no problem with that. Note that editors will probably display unknown characters as \NNNN or similar escape codes, which will break the (visible) layout anyway.
(aside: aren't there problems with Unicode not being a fixed-width character set? Some characters are expected to combine with others to form a glyph, there are multiple versions of some characters with different widths, there are several widths of space, etc.)
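To make the combining-character concern concrete, here is a minimal sketch of how the same visible text can occupy a different number of code points. The names `codePoints`, `columns`, `precomposed` and `decomposed` are hypothetical, chosen only for this illustration, and `columns` assumes that every non-mark character is exactly one column wide, which does not hold for wide characters.

```haskell
import Data.Char (isMark)

-- What a naive layout algorithm sees: one column per code point.
codePoints :: String -> Int
codePoints = length

-- Rough column estimate: skip Unicode mark characters (general categories
-- Mn, Mc, Me), which modify the preceding character.
-- Assumption: every remaining character is one column wide.
columns :: String -> Int
columns = length . filter (not . isMark)

-- The same glyph, written two ways.
precomposed, decomposed :: String
precomposed = "\x00E9"   -- LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed  = "e\x0301"  -- 'e' followed by COMBINING ACUTE ACCENT
```

Here `codePoints decomposed` is 2 while `columns decomposed` is 1, so a layout rule that counts code points would place the token after the decomposed form one column further right than the reader sees it.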
I'm not familiar with all the nooks and crannies of Unicode, but I would have thought that the width of characters is a feature of the *font*, not the character set. So in a fixed-width font, each character should have the same width, including things like the "ff" ligature, "'n" and so on. Without a fixed-width font, layout becomes a bit meaningless.
IIUC, combining characters are code points that don't represent a printable glyph but a modification of the preceding one, so they will probably make a mess. Perhaps combining characters should be disallowed?
I still maintain that if you use layout, "do", "of", "where", etc., should be followed by a line break. This would, I think, solve most layout problems, and even Ashley might be tempted to let go of braces and semicolons. :-)
-kzm
--
If I haven't seen further, it is by standing in the footprints of giants