[Haskell-i18n] Unicode in source

21 Aug 2002

I threw out some suggestions on how to encode non-ASCII characters in
Haskell source code.  Did we conclude anything on this?

Looking at the report, we have the following:

| Escape characters for the Unicode character set, including control
| characters such as \^X, are also provided. Numeric escapes such as
| \137 are used to designate the character with decimal representation
| 137; octal (e.g. \o137) and hexadecimal (e.g. \x37) representations
| are also allowed. Numeric escapes that are out-of-range of the
| Unicode standard (16 bits) are an error.

Apparently, this isn't quite supported by GHC:

        Prelude> map Char.ord "\74\749\7490"
        [74,237,66]

which are, of course, the values modulo 256.
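
The arithmetic is easy to check.  A minimal sketch, assuming the
truncation is a plain reduction modulo 256; truncateTo8Bits, intended
and decoded are names made up for this illustration, not anything from
GHC:

```haskell
import Data.Char (ord)

-- Hypothetical helper: reduce each code point modulo 256, mimicking
-- the truncation seen in the GHCi session above.
truncateTo8Bits :: [Int] -> [Int]
truncateTo8Bits = map (`mod` 256)

-- The code points the three decimal escapes are meant to denote.
intended :: [Int]
intended = [74, 749, 7490]

-- What a compiler that follows the report's description of numeric
-- escapes should produce for the same literal.
decoded :: [Int]
decoded = map ord "\74\749\7490"
```

Here truncateTo8Bits intended reproduces the list GHC printed, while a
compiler that keeps the full code points would give decoded equal to
intended.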

Anyway, if the report is corrected to not limit us to 16 bits, this at
least gives us enough mechanism to use Unicode in string and
character constants. 

What about using it in identifiers?  I suggest the following formats:

        #hhhh
and     ##hhhhhhhh

for Unicode characters, where each h is a hexadecimal digit, with the
first form being applicable to code points below 64K (U+0000 to
U+FFFF), and the second to all of Unicode.
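
To make the proposal concrete, here is a rough sketch of how a
preprocessor might expand the two forms.  It assumes exactly four hex
digits after a single # and exactly eight after ##, and it copies
anything that doesn't match through unchanged; expandEscapes and
hexChar are hypothetical names, and no existing compiler is claimed to
work this way:

```haskell
import Data.Char (chr, isHexDigit)
import Numeric (readHex)

-- Expand the proposed escapes: ##hhhhhhhh (eight hex digits) and
-- #hhhh (four hex digits).  Anything else is copied through unchanged.
expandEscapes :: String -> String
expandEscapes ('#' : '#' : rest)
  | Just (c, rest') <- hexChar 8 rest = c : expandEscapes rest'
expandEscapes ('#' : rest)
  | Just (c, rest') <- hexChar 4 rest = c : expandEscapes rest'
expandEscapes (c : cs) = c : expandEscapes cs
expandEscapes []       = []

-- Read exactly n hex digits and convert them to a character, failing
-- if there are too few digits or the value lies outside Unicode.
hexChar :: Int -> String -> Maybe (Char, String)
hexChar n s
  | length digits == n, all isHexDigit digits, value <= 0x10FFFF
      = Just (chr value, rest)
  | otherwise = Nothing
  where
    (digits, rest) = splitAt n s
    value = case readHex digits of
      [(v, "")] -> v
      _         -> 0
```

Because both forms have a fixed width, no terminator is needed: in
"#00F8l" the escape ends after the fourth digit and the l is an
ordinary letter.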

(I still think using LaTeXy or HTMLish syntax as synonyms is a good
idea, as in

        &oslash;
        {\alpha;}

if this could be conveniently incorporated in the compilers, but it's
probably not crucial.  It'd be nice if my .lhs'es would print the
right glyphs in the code, but I suppose this is better handled by
LaTeX.)

I'd prefer to tackle the layout issue by simply requiring the magic
words ('do', 'of', etc.) to always be followed by a line break, but I
suppose it *is* possible to have preprocessing software automatically
readjust indentation to keep the semantics.  In that case, I'd vote
for indentation to be a count of actual characters in the code, i.e.,
#hhhh contributes five to the indentation, but if translated to the
character it represents, it contributes one (barring Unicode
weirdness, of course).
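
The two ways of counting can be compared with a small sketch.  It
assumes the #hhhh and ##hhhhhhhh forms suggested above, does no range
checking, and ignores the "Unicode weirdness" (combining characters
and the like); rawWidth, translatedWidth and isEscape are hypothetical
names:

```haskell
import Data.Char (isHexDigit)

-- Column width of a line as written, counting every source character.
rawWidth :: String -> Int
rawWidth = length

-- Column width once each escape is replaced by the single character
-- it stands for.  Only the count matters, so nothing is decoded.
translatedWidth :: String -> Int
translatedWidth ('#' : '#' : rest)
  | isEscape 8 rest = 1 + translatedWidth (drop 8 rest)
translatedWidth ('#' : rest)
  | isEscape 4 rest = 1 + translatedWidth (drop 4 rest)
translatedWidth (_ : cs) = 1 + translatedWidth cs
translatedWidth []       = 0

-- True when the string starts with exactly n hex digits available.
isEscape :: Int -> String -> Bool
isEscape n s = length digits == n && all isHexDigit digits
  where digits = take n s
```

For the line "let #03B1 = 1", rawWidth gives 13 and translatedWidth
gives 9, so any tool that rewrites escapes into real characters has to
shift the continuation lines of a layout block by the difference.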

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants

ketil＠ii.uib.no