
I threw out some suggestions on how to encode non-ASCII characters in Haskell source code. Did we conclude anything on this?

Looking at the report, we have the following:

| Escape characters for the Unicode character set, including control
| characters such as \^X, are also provided. Numeric escapes such as
| \137 are used to designate the character with decimal representation
| 137; octal (e.g. \o137) and hexadecimal (e.g. \x37) representations
| are also allowed. Numeric escapes that are out-of-range of the
| Unicode standard (16 bits) are an error.

Apparently, this isn't quite supported by GHC:

  Prelude> map Char.ord "\74\749\7490"
  [74,237,66]

which is, of course, the code points modulo 256. Anyway, if the report is corrected so that it no longer limits us to 16 bits, this at least gives us enough mechanism to use Unicode in string and character constants.

What about using it in identifiers? I suggest the following formats for Unicode characters:

  #hhhh and ##hhhhhhhh

with the first form applying to code points below 64K, and the second to all of Unicode.

(I still think using LaTeXy or HTMLish syntax as synonyms is a good idea, as in &oslash; or {\alpha}, if this could be conveniently incorporated in the compilers, but it's probably not crucial. It'd be nice if my .lhs files would print the right glyphs in the code, but I suppose this is better handled by LaTeX.)

I'd prefer to tackle the layout issue by simply requiring the magic words ('do', 'of', etc.) to always be followed by a line break, but I suppose it *is* possible to have preprocessing software automatically readjust indentation to keep the semantics. In that case, I'd vote for indentation to be a count of actual characters in the code, i.e., #hhhh contributes five to the indentation, but if translated to the character it represents, it contributes one (barring Unicode weirdness, of course).

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
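To make the #hhhh / ##hhhhhhhh proposal and the indentation question concrete, here is a minimal sketch. It is not part of any compiler. The names decodeEscapes, hexEscape, rawColumn and decodedColumn are hypothetical. It also assumes that every '#' starts an escape, which a real lexer could not assume, since '#' is also an operator character. The sketch decodes the two proposed forms and shows how counting raw characters differs from counting decoded characters.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)

-- | Decode the (hypothetical) escapes proposed above:
--   "#hhhh"      : exactly 4 hex digits, a code point below 64K
--   "##hhhhhhhh" : exactly 8 hex digits, any Unicode code point
-- Every '#' is treated as the start of an escape (a simplifying assumption).
decodeEscapes :: String -> Either String String
decodeEscapes [] = Right []
decodeEscapes ('#' : '#' : rest) = hexEscape 8 rest
decodeEscapes ('#' : rest) = hexEscape 4 rest
decodeEscapes (c : rest) = (c :) <$> decodeEscapes rest

-- | Read exactly n hex digits, emit that character, then keep decoding.
hexEscape :: Int -> String -> Either String String
hexEscape n s
  | length digits == n, all isHexDigit digits, code <= 0x10FFFF =
      (chr code :) <$> decodeEscapes rest
  | otherwise = Left ("malformed escape near: " ++ take n s)
  where
    (digits, rest) = splitAt n s
    -- Only evaluated once the guards above confirm the digits are hex.
    code = foldl (\acc d -> 16 * acc + digitToInt d) 0 digits

-- | 1-based column of the text following 'prefix', counting raw source
-- characters, so "#03b1" contributes five.
rawColumn :: String -> Int
rawColumn prefix = length prefix + 1

-- | The same column after escapes are translated, so "#03b1" contributes one.
decodedColumn :: String -> Either String Int
decodedColumn prefix = (+ 1) . length <$> decodeEscapes prefix

main :: IO ()
main = do
  print (decodeEscapes "#03b1#03b2")   -- Right "\945\946"
  print (decodeEscapes "x##0001d4b3")  -- Right "x\119987"
  print (decodeEscapes "#03g1")        -- Left "malformed escape near: 03g1"
  -- Where does 'x' land in the line  f #03b1 = do x ?
  let prefix = "f #03b1 = do "
  print (rawColumn prefix)             -- 14
  print (decodedColumn prefix)         -- Right 10
```

Under the raw count, the x after 'do' in `f #03b1 = do x` sits at column 14. Once the escape is translated to the character it represents, x sits at column 10. A preprocessor doing that translation would therefore have to shift the continuation lines of that layout block left by four columns for each escape.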