Re: [Haskell-cafe] Re: Richer (than ascii) notation for haskell source?

15 May 2008

On 15 May 2008, at 8:33 pm, Yitzchak Gale wrote:
...
The point is that it is always best to keep language syntax
as simple as possible, for many reasons. In the case of Unicode,
that means staying as close as possible to the spirit of Unicode and
minimizing our own ad hoc rules.
In particular, Unicode has explicit guidance about what an
identifier should be, in UAX#31:
http://www.unicode.org/reports/tr31/tr31-9.html

I've only recently started slogging my way through the
capital-city-telephone-book-size Unicode 5.0 book.  (I was
tolerably current to 4.0.)  Imagine my stress levels on
discovering that Unicode 5.1 is already out, with another
"1,624 newly encoded characters", including a capital letter
version of "ß".  It is deeply ironic that one of the things
that keeps changing is the stability policy.  Another of the
things that has changed is UAX#31.
...
Adding one more
keyword is way simpler than adding a bunch of complex
rules to the lexer.
Um, there's no way a Haskell lexer is going to comply with
the Unicode rules without a fair bit of complexity.  The
basic idea is simply <id start><id continue>*, but there
are rules about when ZWJ and ZWNJ are allowed.  The real
issue here is Unicode compliance, and the Unicode rules say
that a mixture of scripts is OK.  Er, it's not actually
that simple.  They do recommend that the scripts in table 4
_not_ be allowed in identifiers, so if you fancied writing
some of your identifiers in Shavian, you may or may not be
out of luck.  (Just why a Coptic priest who is also a
coder should be discouraged from using the Coptic script in
his programs escapes me.)
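
To make the <id start><id continue>* idea concrete, here is a minimal
sketch in Haskell.  It approximates the UAX#31 ID_Start and ID_Continue
properties using general categories only; it ignores Other_ID_Start,
the Pattern_Syntax exclusions, normalisation, and the ZWJ/ZWNJ context
rules, and the answers depend on the Unicode version your compiler's
Data.Char was built with.  The names isIdStart, isIdContinue and
isIdentifier are made up for illustration, and this is not Haskell's
own lexical rule (no apostrophes, no case distinction).

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Approximate ID_Start: letters of any kind plus letter numbers.
isIdStart :: Char -> Bool
isIdStart c = generalCategory c `elem`
  [ UppercaseLetter, LowercaseLetter, TitlecaseLetter
  , ModifierLetter, OtherLetter, LetterNumber ]

-- Approximate ID_Continue: ID_Start plus combining marks,
-- decimal digits and connector punctuation (such as '_').
isIdContinue :: Char -> Bool
isIdContinue c = isIdStart c || generalCategory c `elem`
  [ NonSpacingMark, SpacingCombiningMark
  , DecimalNumber, ConnectorPunctuation ]

-- <id start><id continue>*, with no ZWJ/ZWNJ handling at all.
isIdentifier :: String -> Bool
isIdentifier []       = False
isIdentifier (c : cs) = isIdStart c && all isIdContinue cs
```

Even this toy accepts identifiers that mix scripts freely, which is
what the default rules intend; the complexity comes from everything it
leaves out.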
...
A lot fewer moving parts to break.
Especially if those lexer rules are not so consistent with
built-in Unicode concepts such as letter and symbol, glyph
direction, etc.
UAX#31 definitely allows identifiers with any mixture of
left to right and right to left characters.  The *intent* is
that anything even remotely reasonable should be accepted,
and should keep on being accepted, but of course the devil
is in the details.
...

...
So I think the best and simplest idea is to make
the letter lambda a keyword.
The lambda that people actually *want* in Haskell is in fact
the >mathematical< small letter lambda, not the Greek letter.
UAX#31 explicitly envisages "mathematically oriented programming
languages that make distinctive use of the Mathematical Alphanumeric
Symbols".  I don't think there can be much argument about this
being the right way to encode the symbol used in typeset versions
of Haskell.  There are three arguments against using it routinely:
  (a) It is outside the 16-bit range that Java is happy with,
      making it hard to write Haskell tools in Java.  But then,
      about 40% of the characters in Unicode are now outside the
      16-bit range that Java is comfortable with, which is just too
      bad for Java.  Haskell tools should be written in Haskell,
      and should cope with 21-bit characters.  (Unicode 5 promises
      never to go beyond 17 planes, so U+10FFFF is the ceiling.)
  (b) It is outside the range of characters currently available in
      fonts.  A character you cannot type or see isn't much use.
      Implementations *will* catch up, but what do we do now?
  (c) People *can* type a Greek small letter now, and will not be
      interested in making fine distinctions between characters that
      look pretty much the same.  So people will *expect* the Greek
      letter to work, even if a pedant like me says it's the wrong
      character.
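
To illustrate point (a), here is a small sketch that computes the
UTF-16 code units for a character.  I am assuming U+1D706 (the
mathematical italic small lambda) as the "mathematical" candidate; the
bold form U+1D6CC would behave the same way.  The names utf16Units,
greekLambda and mathLambda are invented for this example.

```haskell
import Data.Char (ord)

-- UTF-16 code units for one character: a single unit inside the
-- 16-bit range, a surrogate pair outside it.
utf16Units :: Char -> [Int]
utf16Units c
  | n < 0x10000 = [n]
  | otherwise   = [0xD800 + m `div` 0x400, 0xDC00 + m `mod` 0x400]
  where
    n = ord c
    m = n - 0x10000

greekLambda, mathLambda :: Char
greekLambda = '\x3BB'    -- GREEK SMALL LETTER LAMDA
mathLambda  = '\x1D706'  -- MATHEMATICAL ITALIC SMALL LAMDA (assumed choice)
```

Here utf16Units greekLambda is [0x3BB], a single unit, while
utf16Units mathLambda is the pair [0xD835, 0xDF06], which is exactly
the case that tools built around 16-bit chars tend to get wrong.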

Of course, we could always take an upside down lambda and put some
bars through it and use ¥ for lambda.  (Pop quiz: why would some
people not be surprised to see this instead of \ ?)  [It's a joke.]

All of this seems to leave Greek small letter lambda as a keyword
as being the simplest solution, but it's easy to predict that it
will cause confusion.
...
True, you need a space after it
then. You already need spaces between the variables after the
lambda, so anyway you might say that would be more consistent.
Who says there is more than one variable?
\(x,y,z)-> doesn't have any spaces.
\x -> \y -> \z -> needs spaces, but that's because
->\ is a single token, not because of the identifiers.
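
A toy lexer shows the effect.  This is only a sketch: it knows nothing
about comments, literals, layout or Unicode, the symbol set is the
ASCII one from the Haskell report as I remember it, and the names
symbolChars and tokens are made up.  It just applies maximal munch to
runs of symbol characters, which is the part that matters here.

```haskell
import Data.Char (isAlphaNum, isSpace)

-- ASCII symbol characters that can form operator tokens.
symbolChars :: String
symbolChars = "!#$%&*+./<=>?@\\^|-~:"

-- Split a string into tokens, taking the longest run of symbol
-- characters (or of alphanumerics) each time.
tokens :: String -> [String]
tokens [] = []
tokens s@(c : cs)
  | isSpace c            = tokens cs
  | c `elem` symbolChars = let (op, rest) = span (`elem` symbolChars) s
                           in op : tokens rest
  | isAlphaNum c         = let (ident, rest) = span isAlphaNum s
                           in ident : tokens rest
  | otherwise            = [c] : tokens cs
```

With this, tokens "\\x->\\y->x" contains the single token "->\\",
whereas tokens "\\x -> \\y -> x" yields "->" and "\\" separately.  The
identifiers never needed the spaces; the operator characters did.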

--
"I don't want to discuss evidence." -- Richard Dawkins, in an
interview with Rupert Sheldrake.  (Fortean Times 232, p55.)

Richard A. O'Keefe