
On 15 May 2008, at 8:33 pm, Yitzchak Gale wrote:
The point is that it is always best to keep language syntax as simple as possible, for many reasons. In the case of Unicode, that means staying as close as possible to the spirit of Unicode and minimizing our own ad hoc rules.
In particular, Unicode has explicit guidance about what an identifier should be, in UAX#31: http://www.unicode.org/reports/tr31/tr31-9.html I've only recently started slogging my way through the capital-city-telephone-book-size Unicode 5.0 book. (I was tolerably current to 4.0) Imagine my stress levels on discovering that Unicode 5.1 is already out, with another "1,624 newly encoded characters", including a capital letter version of "ß". It is deeply ironic that one of the things that keeps changing is the stability policy. Another of the things that has changed is UAX#31.
Adding one more keyword is way simpler than adding a bunch of complex rules to the lexer.
Um, there's no way a Haskell lexer is going to comply with the Unicode rules without a fair bit of complexity. The basic idea is simply <id start><id continue>*, but there are rules about when ZWJ and ZWNJ are allowed. The real issue here is Unicode compliance, and the Unicode rules say that a mixture of scripts is OK. Er, it's not actually that simple. They do recommend that the scripts in table 4 _not_ be allowed in identifiers, so if you fancied writing some of your identifiers in Shavian, you may or may not be out of luck. (Just why a Coptic priest who is also a coder should be discouraged from using the Coptic script in his programs escapes me.)
A lot less moving parts to break. Especially if those lexer rules are not so consistent with built-in Unicode concepts such as letter and symbol, glyph direction, etc.
UAX#31 definitely allows identifiers with any mixture of left to right and right to left characters. The *intent* is that anything even remotely reasonable should be accepted, and should keep on being accepted, but of course the devil is in the details.
So I think the best and simplest idea is to make the letter lambda a keyword.
The lambda that people actually *want* in Haskell is in fact the >mathematical< small letter lambda, not the Greek letter. UAX#31 explicitly envisages "mathematically oriented programming languages that make distinctive use of the Mathematical Alphanumeric Symbols". I don't think there can be much argument about this being the right way to encode the symbol used in typeset versions of Haskell. There are three arguments against using it routinely: (a) It is outside the 16-bit range that Java is happy with, making it hard to write Haskell tools in Java. But then, about 40% of the characters in Unicode are now outside the 16-bit range that Java is comfortable with, which is just too bad for Java. Haskell tools should be written in Haskell, and should cope with 20-bit characters. (I used to say 21- bit, but Unicode 5 promises never to go beyond 16 planes.) (b) It is outside the range of characters currently available in fonts. A character you cannot type or see isn't much use. Implementations *will* catch up, but what do we do now? (c) People *can* type a Greek small letter now, and will not be interested in making fine distinctions between characters that look pretty much the same. So people will *expect* the Greek letter to work, even if a pedant like me says it's the wrong character. Of course, we could always take an upside down lambda and put some bars through it and use ¥ for lambda. (Pop quiz: why would some people not be surprised to see this instead of \ ?) [It's a joke.] All of this seems to leave Greek small letter lambda as a keyword as being the simplest solution, but it's easy to predict that it will cause confusion.
True, you need a space after it then. You already need spaces between the variables after the lambda, so anyway you might say that would be more consistent.
Who says there is more than one variable? \(x,y,z)-> doesn't have any spaces. \x -> \y -> \z -> needs spaces, but that's because ->\ is a single token, not because of the identifiers. -- "I don't want to discuss evidence." -- Richard Dawkins, in an interview with Rupert Sheldrake. (Fortean times 232, p55.)