
On 2006-10-24 at 12:43PDT Ashley Yakeley wrote:
Ketil Malde wrote:
> > Tempting to use B8 Cedilla, since it looks somewhat like a comma, and is less useful for other purposes -- but perhaps it would be too easily confused with a real comma?
I have some dim recollection that there is an ISO (or possibly some other standards body) standard that says that rather than commas or points, we should use narrow spaces between groups of digits in numbers. I can't find it now, though -- can anyone? If true, this would suggest the use of one of the Unicode space characters U+2006, U+2009, U+200A ... but this would of course be a bad idea in a language that uses space for application. Underline is much better.
> I would advise against this until we have a bit more of a plan for extended characters in Haskell source. [...]
I think the original proposal -- of allowing underlines in lieu of spaces in numbers -- is far better than using an operator. This is a piece of light-weight convenience syntax at a purely lexical level, and is exactly the sort of thing that is easy to do in a language definition/compiler but thorny if done post-hoc.

If an operator, what happens to hexadecimal numbers? 0xffff_3729 makes perfect sense as hex and the "_" does a nice job of separating the digits into readable groups. 0xffff~~3729 looks similar, but doesn't mean the same thing at all. 0xffff~~0x3729 is ugly and probably less readable than the unbroken form. There's also the (perhaps unlikely, but truly grotesque) possibility of wanting a number like 0x3864_face, entering 0x3864~~face and having face = 42 elsewhere in the code. Or, decimal 124~~l24 -- if you are lucky you'll get an undefined variable message, which would be the same as for 124l24, but if unlucky, you'll get no error message instead of "No instance for (Num (Integer -> a))".

Furthermore, there's no way for an operator to distinguish between three and some other number of digits (at compile time!), leading to such misleading looking presentations as 22~~40~~65.

No. A small alteration to the lexical syntax for the sake of improved readability seems perfectly justifiable as long as it doesn't make the lexical syntax /significantly/ more complicated or harder to learn. So in the simplest form, we would have

    decimal     -> digit{[_]digit}
    octal       -> octit{[_]octit}
    hexadecimal -> hexit{[_]hexit}

    integer     -> decimal
                 | 0o octal | 0O octal
                 | 0x hexadecimal | 0X hexadecimal

    float       -> decimal . decimal [exponent]
                 | decimal exponent

    exponent    -> (e | E) [+ | -] decimal

although my preference would be something a bit more restrictive, requiring numbers to have groups of the same number of digits after each "_" and beginning with a shorter group (i.e. 12_000_000 and 1200_0000 would be valid but 1247_000 would not).
I'm not wedded to this requirement (and it would take a more sophisticated grammar to formalise).

I have another dim recollection that something like this was discussed (verbally) at one of the early Haskell meetings, but no idea what became of it. Does anyone remember?

  Jón

-- 
Jón Fairbairn                                 Jon.Fairbairn at cl.cam.ac.uk
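
To make the integer part of the grammar above concrete, here is a rough sketch of how the digit{[_]digit} rule and the stricter equal-sized-groups preference might be checked. It is an illustration only, not how any Haskell compiler lexes literals. The names digitGroups, readGrouped, readInteger and uniformGroups are made up for this example, and reading "beginning with a shorter group" as "first group no longer than the rest" is an assumption based on 1200_0000 being listed as valid.

```haskell
import Data.Char (digitToInt, isDigit, isHexDigit, isOctDigit)
import Data.List (foldl')

-- | Split the body of a literal into digit groups, following
--     digits -> digit{[_]digit}
-- Nothing means the string is empty, contains a character that is not
-- an allowed digit, or has a leading, trailing, or doubled underscore.
digitGroups :: (Char -> Bool) -> String -> Maybe [String]
digitGroups isDig s = case span isDig s of
  ([], _)         -> Nothing
  (g, [])         -> Just [g]
  (g, '_' : rest) -> (g :) <$> digitGroups isDig rest
  _               -> Nothing

-- | Value of a grouped literal body in the given base (hypothetical helper).
readGrouped :: Integer -> (Char -> Bool) -> String -> Maybe Integer
readGrouped base isDig s = toValue . concat <$> digitGroups isDig s
  where
    toValue = foldl' (\acc c -> acc * base + toInteger (digitToInt c)) 0

-- | integer -> decimal | 0o octal | 0O octal | 0x hexadecimal | 0X hexadecimal
readInteger :: String -> Maybe Integer
readInteger ('0' : p : rest)
  | p `elem` "oO" = readGrouped 8 isOctDigit rest
  | p `elem` "xX" = readGrouped 16 isHexDigit rest
readInteger s = readGrouped 10 isDigit s

-- | The stricter preference (assumed reading): every group after the
-- first has the same length, and the first group is no longer than that.
uniformGroups :: [String] -> Bool
uniformGroups []           = False
uniformGroups [_]          = True
uniformGroups (g : r : rs) = all ((== n) . length) rs && length g <= n
  where
    n = length r
```

Loaded into GHCi, readInteger "12_000_000" evaluates to Just 12000000, while readInteger "0x3864~~face" is Nothing because "~" is not a hexit. The stricter rule is kept as a separate check on the groups, since (as noted above) folding it into the grammar itself would take something more elaborate.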