Re: [Haskell-i18n] Unicode in source

21 Aug 2002

      "Simon Marlow"  writes:
...
...
for Unicode characters, with the first form being applicable to code
points below 64K, and the second to all of Unicode.
...
There are several problems with using this kind of encoding in source
files, as pointed out by Sven Moritz Hallberg (indentation, syntax
ambiguities, etc.), so I'd prefer to stick to standard encodings such as
UTF-8 for source files.

So, in essence, we remove the \uHHHH paragraph from 2.1 in the report?

It might still be nice to have a way to specify Unicode characters in
identifiers, but if you propose to postpone that until it actually
becomes a problem, I have no problem with that.

Note that editors will probably display unknown characters as \NNNN or
similar escape codes, which will break the (visible) layout anyway.
...
(aside: aren't there problems with Unicode not being a fixed-width
character set?  Some characters are expected to combine with others to
form a glyph, there are multiple versions of some characters with
different widths, there are several widths of space, etc.)

I'm not familiar with all the nooks and crannies of Unicode, but I
would have thought that the width of characters is a feature of the
*font*, not the character set.  So in a fixed-width font, each
character should have the same width, including things like the
"ff" ligature, "'n" and so on.  Without a fixed-width font, layout
becomes a bit meaningless.

IIUC, for combining characters, where the code-point doesn't represent
a printable glyph but a modification of the preceding one, this will
probably make a mess.  Perhaps combining characters should be
disallowed?
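
To make the combining-character worry concrete, here is a rough sketch of how a tool might detect such characters and estimate the visible column count. The names visibleColumns and hasCombining are made up for illustration, and the width model (one cell per non-mark code point, none per mark) is an assumption; real editors and fonts may behave differently.

```haskell
import Data.Char (isMark)

-- | Estimated number of display columns for a string.
-- Assumption: every non-mark code point takes one cell and every
-- combining mark takes none. Wide or zero-width characters that are
-- not marks are not handled by this sketch.
visibleColumns :: String -> Int
visibleColumns = length . filter (not . isMark)

-- | True if the string contains at least one Unicode mark, i.e. a
-- character that combines with the preceding one.
hasCombining :: String -> Bool
hasCombining = any isMark
```

For the two code points 'e' and U+0301 (combining acute accent), length gives 2 while visibleColumns gives 1, which is exactly the kind of mismatch that would confuse a layout algorithm counting code points.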

I still maintain that if you use layout, "do", "of", "where", etc,
should be followed by a line break.  This would, I think, solve most
layout problems, and even Ashley might be tempted to let go of braces
and semicolons. :-)
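
A small illustration of the difference, with hypothetical names sumA and sumB. In the first style the block's indentation is fixed by the column of the token following "do", so it depends on how wide everything before it on that line is rendered. In the second style it depends only on leading whitespace.

```haskell
-- Style A: the first statement shares a line with `do`.  The following
-- statements must line up with the column of `x`, which depends on the
-- displayed width of the text to its left.
sumA :: Maybe Int
sumA = do x <- Just 1
          y <- Just 2
          pure (x + y)

-- Style B: a line break after `do`.  The block's indentation is set by
-- leading whitespace alone, whatever characters appear on the line above.
sumB :: Maybe Int
sumB = do
  x <- Just 1
  y <- Just 2
  pure (x + y)
```

Both definitions mean the same thing; only the second survives a rename of the function to something whose displayed width is uncertain.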

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants

ketil＠ii.uib.no