New subject: Unicode in source

21 Aug 2002

      ...
Apparently, this isn't quite supported by GHC:

  Prelude> map Char.ord "\74\749\7490"
  [74,237,66]

which is, of course, the values modulo 256.

I think you've found a bug.  It works with a single character:

  Prelude> Char.ord '\xffff'
  65535

but not with a string.  Thanks for the report :)
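
The arithmetic behind the transcript above can be checked with a small sketch. It only models the symptom (each code point reduced modulo 256); it makes no claim about where in GHC the truncation happens. The helper name truncateTo8Bits is hypothetical, and the sketch assumes a compiler that handles the string literal correctly.

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: keep only the low 8 bits of each code point,
-- mimicking the truncation seen in the transcript.
truncateTo8Bits :: String -> String
truncateTo8Bits = map (chr . (`mod` 256) . ord)

-- The code points as written in the literal.
expected :: [Int]
expected = map ord "\74\749\7490"                     -- [74,749,7490]

-- The same code points after 8-bit truncation.
observed :: [Int]
observed = map ord (truncateTo8Bits "\74\749\7490")   -- [74,237,66]
```

Here 749 = 2*256 + 237 and 7490 = 29*256 + 66, which matches the reported output.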
...
Anyway, if the report is corrected to not limit us to 16 bits, this at
least gives us enough mechanism to use Unicode in string and
character constants.
What about using it in identifiers?  I suggest the following formats:
        #hhhh
and     ##hhhhhhhh
for Unicode characters, with the first form being applicable to code
points below 64K, and the second to all of Unicode.
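
To make the proposal concrete, here is a minimal sketch of how such escapes might be decoded, assuming #hhhh means exactly four hex digits and ##hhhhhhhh exactly eight. The names decodeEscapes and hexChar are hypothetical, and rejecting values above U+10FFFF is an assumption of the sketch, not part of the suggestion.

```haskell
import Data.Char (chr, isHexDigit)
import Numeric (readHex)

-- Hypothetical decoder for the proposed identifier escapes.
-- Returns Nothing when an escape is malformed or out of range.
decodeEscapes :: String -> Maybe String
decodeEscapes []             = Just []
decodeEscapes ('#':'#':rest) = hexChar 8 rest   -- ##hhhhhhhh
decodeEscapes ('#':rest)     = hexChar 4 rest   -- #hhhh
decodeEscapes (c:rest)       = (c :) <$> decodeEscapes rest

-- Read exactly n hex digits, turn them into one character,
-- then continue decoding the remainder of the input.
hexChar :: Int -> String -> Maybe String
hexChar n s
  | length digits == n
  , all isHexDigit digits
  , [(v, "")] <- readHex digits
  , v <= 0x10FFFF = (chr v :) <$> decodeEscapes rest
  | otherwise = Nothing
  where
    (digits, rest) = splitAt n s
```

Under these assumptions, decodeEscapes "x#03B1y" yields Just "x\945y" (U+03B1 is Greek alpha), and decodeEscapes "#12" yields Nothing because fewer than four digits follow the marker.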
There are several problems with using this kind of encoding in source
files, as pointed out by Sven Moritz Hallberg (indentation, syntax
ambiguities, etc.), so I'd prefer to stick to standard encodings such as
UTF-8 for source files.  At least that way you'll be able to get an
editor that will display the file as it is intended to be.

(aside: aren't there problems with Unicode not being a fixed-width
character set?  Some characters are expected to combine with others to
form a glyph, there are multiple versions of some characters with
different widths, there are several widths of space, etc.)
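
The combining-character part of the aside can be illustrated with a short sketch. The column count below is a deliberately naive assumption (one column per non-mark character); it ignores wide characters and the various space widths, so it is not a real layout algorithm. The name naiveColumns is hypothetical.

```haskell
import Data.Char (isMark)

-- Two spellings of the same glyph, e with an acute accent.
composed, decomposed :: String
composed   = "\233"     -- U+00E9, a single precomposed character
decomposed = "e\769"    -- U+0065 followed by U+0301 (combining acute)

-- Naive column estimate: combining marks occupy no column of their own.
naiveColumns :: String -> Int
naiveColumns = length . filter (not . isMark)
```

Here length composed is 1 and length decomposed is 2, while naiveColumns gives 1 for both, so counting characters is not enough to decide how far a line is indented.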

Cheers,
	Simon

RE: [Haskell-i18n] Unicode in source

Simon Marlow

ketil＠ii.uib.no
