
On 2006-10-25 at 20:57-0000 Aaron Denney wrote:
On 2006-10-25, Jon Fairbairn wrote:
No. A small alteration to the lexical syntax for the sake of improved readability seems perfectly justifiable as long as it doesn't make the lexical syntax /significantly/ more complicated or harder to learn.
Sure. But some of us don't find it terribly readable.
I'm not sure what you are saying here. Assessing readability by introspection is terribly unreliable. Unfamiliarity with the presentation of numbers with underlines is likely to make them feel a bit awkward to begin with, but habituation is likely to change that.

We do know from venerable experiments that humans can easily identify small groups of things without counting. Most people can recognise three easily; few people can recognise eight. So it's no surprise that the standard presentation of numbers groups digits in threes.

If you were to conduct an experiment on yourself that presented you with numbers displayed in all three forms (ungrouped, thin spaced and with underlines) and timed how long it took you to read them out, I'd be surprised if the underline grouped form (even while still unfamiliar) didn't beat the ungrouped form.

Quickly now, is 20000000000 tens of millions, tens or hundreds or thousands of millions? Now try the same for 2_000_000_000 or 20_000_000_000.
I think the ~~ operator hack gets 90% of the "benefit" for those who want it.
I thought my earlier message adequately demonstrated that it does /not/. Another case: if you change “square 123479010987” to “square 123_479_010_987” to improve readability it still means the same thing. If you change it to “square 123~~479~~010~~987” it doesn't.
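To make the difference concrete, here is a minimal sketch. The definition and fixity of ~~ below are assumptions (a plausible reconstruction of the hack, not necessarily the one posted earlier in the thread), and square is just an illustrative function. Whatever fixity ~~ is given, function application binds tighter than any operator, so square only ever sees the first group.

```haskell
-- Hypothetical reconstruction of the "~~" hack: glue a three-digit group
-- onto the right-hand end of a number. The fixity is an assumption.
infixl 9 ~~
(~~) :: Integer -> Integer -> Integer
a ~~ b = a * 1000 + b

-- Illustrative function, standing in for any function applied to a literal.
square :: Integer -> Integer
square x = x * x

-- The ungrouped literal: the whole number is squared.
plain :: Integer
plain = square 123479010987

-- With the operator: application binds tighter than any infix operator,
-- so this parses as ((square 123 ~~ 479) ~~ 010) ~~ 987.
hacked :: Integer
hacked = square 123~~479~~010~~987

-- What the writer presumably meant; explicit parentheses are required.
intended :: Integer
intended = square (123~~479~~010~~987)
```

Here intended agrees with plain, but hacked does not: only 123 is squared and the remaining groups are appended to that result.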
although my preference would be something a bit more restrictive, requiring numbers to have groups of the same number of digits after each “_” and beginning with a shorter group (ie 12_000_000 and 1200_0000 would be valid but 1247_000 would not). I'm not wedded to this requirement (and it would take a more sophisticated grammar to formalise).
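As a rough illustration of that stricter rule, the check could be written over strings as below. This is only a sketch: the names splitGroups, wellGrouped and literalValue are hypothetical, and "beginning with a shorter group" is read as "the first group is no longer than the others", which is what makes 1200_0000 valid. The value of an accepted literal is assumed to be its digits with the underlines removed.

```haskell
import Data.Char (isDigit)

-- Split on underscores, keeping empty pieces so that malformed input such
-- as "1__000" or "_1" produces an empty group and is rejected below.
splitGroups :: String -> [String]
splitGroups s = case break (== '_') s of
  (grp, [])       -> [grp]
  (grp, _ : rest) -> grp : splitGroups rest

-- Hypothetical check for the stricter rule (assumed reading: every group
-- after the first has the same length, and the first is no longer than that).
wellGrouped :: String -> Bool
wellGrouped s =
  case splitGroups s of
    first : rest ->
      all isGroup (first : rest)
        && case map length rest of
             []     -> True   -- no underscores at all
             n : ns -> all (== n) ns && length first <= n
    [] -> False
  where
    isGroup g = not (null g) && all isDigit g

-- Value of an accepted literal: the digits with the underscores removed.
literalValue :: String -> Maybe Integer
literalValue s
  | wellGrouped s = Just (read (filter (/= '_') s))
  | otherwise     = Nothing
```

With these definitions, wellGrouped accepts "12_000_000" and "1200_0000" and rejects "1247_000", matching the examples above.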
The only reason to put it in the lexer/parser is to avoid misleading cases,
yes
which needs this additional restriction, or something similar, like always 3 for decimal, 4 for hex, 3 for oct, or whatever.
No. I certainly would prefer a requirement that the groups be the same length, but the intention is that the value would be got simply by stripping out the underlines. So while 19_00 would be an idiosyncratic way of writing 1_900 (intended to be read nineteen hundred, one would presume), it wouldn't be misleading in the way that 19~~00 (which would evaluate to 19_000) would be.
-- Jón Fairbairn Jon.Fairbairn at cl.cam.ac.uk