RE: [Haskell-i18n] Unicode in source

Apparently, this isn't quite supported by GHC:
Prelude> map Char.ord "\74\749\7490"
[74,237,66]
which is, of course, the values modulo 256.
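The arithmetic behind that observation can be checked with a small sketch. The helper name `truncateToByte` is hypothetical, and this only reproduces the reported numbers; it makes no claim about how GHC handled strings internally.

```haskell
import Data.Char (ord)

-- Hypothetical helper: reduce a code point modulo 256, mimicking the
-- truncation visible in the transcript above.
truncateToByte :: Char -> Int
truncateToByte c = ord c `mod` 256

-- The three decimal escapes from the report: \74, \749 and \7490.
reported :: [Int]
reported = map truncateToByte "\74\749\7490"
```

Loading this in GHCi and evaluating `reported` should give the same list as the transcript, since 749 = 2*256 + 237 and 7490 = 29*256 + 66.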
I think you've found a bug. It works with a single character:
Prelude> Char.ord '\xffff'
65535
but not with a string. Thanks for the report :)
Anyway, if the report is corrected to not limit us to 16 bits, this at least gives us enough mechanism to use Unicode in string and character constants.
What about using it in identifiers? I suggest the following formats:
#hhhh and ##hhhhhhhh
for Unicode characters, with the first form being applicable to code points below 64K, and the second to all of Unicode.
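To make the proposal concrete, here is a minimal sketch of a decoder for the two forms, assuming `#` is followed by exactly four hex digits and `##` by exactly eight. The names `decodeEscape` and `decodeHex` are hypothetical, and the sketch does not address how a lexer would tell these escapes apart from other uses of `#`.

```haskell
import Data.Char (chr, isHexDigit)
import Numeric (readHex)

-- Decode one escape at the start of the input, returning the character
-- and the remaining input, or Nothing if the input is not a valid escape.
decodeEscape :: String -> Maybe (Char, String)
decodeEscape ('#':'#':rest) = decodeHex 8 rest  -- ##hhhhhhhh: all of Unicode
decodeEscape ('#':rest)     = decodeHex 4 rest  -- #hhhh: code points below 64K
decodeEscape _              = Nothing

-- Read exactly n hex digits and reject values beyond U+10FFFF.
decodeHex :: Int -> String -> Maybe (Char, String)
decodeHex n s
  | length digits == n
  , all isHexDigit digits
  , value <= 0x10FFFF = Just (chr (fromInteger value), rest)
  | otherwise         = Nothing
  where
    (digits, rest) = splitAt n s
    -- Parse as Integer so eight hex digits cannot overflow.
    value :: Integer
    value = case readHex digits of
              [(v, "")] -> v
              _         -> 0
```

For example, `decodeEscape "#00e9x"` would yield the character U+00E9 with `"x"` left over, while `decodeEscape "#12"` is rejected because fewer than four digits follow.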
There are several problems with using this kind of encoding in source files, as pointed out by Sven Moritz Hallberg (indentation, syntax ambiguities, etc.), so I'd prefer to stick to standard encodings such as UTF-8 for source files. At least that way you'll be able to get an editor that will display the file as it is intended to be.

(aside: aren't there problems with Unicode not being a fixed-width character set? Some characters are expected to combine with others to form a glyph, there are multiple versions of some characters with different widths, there are several widths of space, etc.)

Cheers,
Simon

"Simon Marlow"
for Unicode characters, with the first form being applicable to code points below 64K, and the second to all of Unicode.
There are several problems with using this kind of encoding in source files, as pointed out by Sven Moritz Hallberg (indentation, syntax ambiguities, etc.), so I'd prefer to stick to standard encodings such as UTF-8 for source files.
So, in essence, we remove the \uHHHH paragraph from 2.1 in the report? I'm not sure it wouldn't be nice to have a way to specify Unicode characters in identifiers, but if you propose to postpone it until, and if, it becomes a problem, I have no problems with that. Note that editors will probably display unknown characters as \NNNN or similar escape codes; this will break (visible) layout anyway.
(aside: aren't there problems with Unicode not being a fixed-width character set? Some characters are expected to combine with others to form a glyph, there are multiple versions of some characters with different widths, there are several widths of space, etc.)
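The combining-character part of this aside can be illustrated with a rough sketch: the number of code points in a line need not match the number of columns a reader sees. The function name `displayColumns` is hypothetical, and the count is deliberately naive: it only skips combining marks and ignores wide characters, tabs, and the different widths of space.

```haskell
import Data.Char (isMark)

-- Naive column count: combining marks attach to the preceding character
-- and are assumed to occupy no column of their own.
displayColumns :: String -> Int
displayColumns = length . filter (not . isMark)

-- 'e' followed by U+0301 (combining acute accent), then 'x'.
sample :: String
sample = "e\769x"
```

Here `length sample` is 3 while `displayColumns sample` is 2, which is the kind of mismatch that would shift the layout column for any token following the accented letter.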
I'm not familiar with all the nooks and crannies of Unicode, but I would have thought that the width of characters is a feature of the *font*, not the character set. So in a fixed-width font, each character should have the same width, including things like the "ff" ligature, "'n" and so on. Without a fixed-width font, layout becomes a bit meaningless.

IIUC, for combining characters, where the code point doesn't represent a printable glyph but a modification of the preceding one, this will probably make a mess. Perhaps combining characters should be disallowed?

I still maintain that if you use layout, "do", "of", "where", etc., should be followed by a line break. This would, I think, solve most layout problems, and even Ashley might be tempted to let go of braces and semicolons. :-)

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants
participants (2): ketil@ii.uib.no, Simon Marlow