
Simon Marlow wrote:
So does this suggest that under a negation-is-part-of-numeric-token regime, 123-456 should be two tokens (a positive number then a negative number, here), as is signum-456 ...
Yes, absolutely.
[see note 1 at the end responding irrelevantly to that]

Okay, here we go with the thorough descriptions...

Warn about any "-" that precedes without spaces a numeric literal, that is not an application of "negate" to that literal. This includes when it's infix (n-1) and when it's out-precedenced (-2^6).
===> A file that does not trigger this warning is safe to have negative numeric literals added to the syntax / lexer.
[see Note 2 at the end about how commonly this warning might occur in practice]

Warn about any "-" that DOES NOT precede-without-spaces a numeric literal, that nonetheless means negate.
===> A file that triggers neither this nor the previous warning is safe to have negative numeric literals added AND interpretation of unqualified operator "-" as negate removed.

"Reverse" warnings, for those who want to take advantage of negative numeric literal syntax and then possibly convert to Haskell98 syntax easily:

If a "-" isn't followed immediately by a numeric literal, the only thing to watch out for (and warn about) is the "forbidden section" (- 1), which could mean an actual section (\x -> x - 1) in the "new" syntax.

For actual negative literals: warn when the literal is the left-hand side of an infix expression with relevant precedence ((> 6, which changes program behaviour) or (= 6 and not left-associative, which causes a parse error)). (Being on the right-hand side, e.g. (x ^^ -1), is completely unambiguous, and expressions like (-1 + 2) mean the same thing either way.) Also, warn if the literal is part of a function application: either it would become infix in '98 syntax (e.g. (signum -2)) or just negate multiple things to the right (e.g. (-1 foo)) (some of these are type errors assuming (->) isn't made an instance of Num, but that's a later stage in the compilation process).

Should we allow "positive numeric literals" +37 as well, for symmetry, so we can also break (n+1) as well as (n-1)?
(and also break (+1), which is actually an asymmetric problem, since (-1), unlike (+1), isn't a section in the first place in Haskell98)

Implementation notes:

I haven't looked at the part of GHC's code where it deals with fixity resolution yet, but I'm concerned that GHC might throw away information about where parentheses were in the original code at the same time. That information seems important for determining whether some of the warnings are valid.

For the purpose of warnings, I would explicitly keep track, for unqualified operator "-", of whether it was followed by a digit (which is the unique and certain determiner that a numeric literal follows: octal and hexadecimal start with 0c for some "c", and floating-point always starts with a decimal digit). This would probably involve adding an argument isomorphic to Bool to the constructor "ITminus". Then in compiler/parser/Lexer.x, just before the @varsym rule (since alex is first maximal-munch, then top-to-bottom in the .x file, in matching choice), add rules

    "-" / [0-9]   { minus followed by number }
    "-"           { minus not followed by number }

(the [0-9] pattern could be refined perhaps...)

Then this notation has to be carried on through the Parser.y, which shouldn't be too hard.

For negative numeric literals, I think extra rules in the lexer would be added: '-' followed by the various numeric literal types (this seems a little repetitious, is there an easier way?). The varieties of literals that were standard in the first place (i.e. non-unboxed) will get " / { extension is on }" qualifications to their patterns. mkHsNegApp (in RdrHsSyn.lhs) will be simplified or removed, since we are moving towards a more sensible treatment of negative literals.

Another implementation choice could be to recognize the "minus followed by number" in the parser, but then it might be hard to distinguish between '98-syntax negate, subtraction, and negative unboxed literals, without ambiguity in the parser?
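
To make the "was it followed by a digit" bookkeeping concrete, here is a rough sketch in plain Haskell rather than actual alex rules. The names MinusKind and classifyMinuses are made up for illustration and are not anything in GHC; MinusKind only stands in for the Bool-like argument suggested for ITminus. The sketch imitates maximal munch by taking the longest run of symbol characters, and it deliberately ignores comments, string literals and other details a real lexer handles.

```haskell
module Main where

import Data.Char (isDigit)

-- Hypothetical stand-in for the Bool-like argument proposed for ITminus.
data MinusKind = MinusBeforeDigit | MinusOther
  deriving (Eq, Show)

-- ASCII characters that can appear in Haskell 98 operator symbols.
isSymbolChar :: Char -> Bool
isSymbolChar c = c `elem` "!#$%&*+./<=>?@\\^|-~:"

-- Classify each operator that is exactly "-" by whether a digit follows it
-- immediately. Longer operators such as "->" are consumed whole (imitating
-- maximal munch) and therefore never count as a minus.
classifyMinuses :: String -> [MinusKind]
classifyMinuses [] = []
classifyMinuses s@(c:_)
  | isSymbolChar c =
      let (op, rest) = span isSymbolChar s
      in if op == "-"
           then kind rest : classifyMinuses rest
           else classifyMinuses rest
  | otherwise = classifyMinuses (dropWhile (not . isSymbolChar) s)
  where
    kind (d:_) | isDigit d = MinusBeforeDigit
    kind _                 = MinusOther

main :: IO ()
main = do
  mapM_ (\src -> putStrLn (src ++ "  ==>  " ++ show (classifyMinuses src)))
        ["n-1", "x - 1", "-2^6", "a -> b", "signum -456"]
  -- The out-precedenced case under Haskell 98 rules:
  -- -2^6 is negate (2^6), not (negate 2)^6.
  print ((-2^6) :: Int)     -- -64
  print (((-2)^6) :: Int)   -- 64
```

A "-" classified as MinusBeforeDigit is only a candidate for the first warning; deciding whether it really is a plain application of negate to the literal still needs the parser and fixity resolution, as discussed above.
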
(Negative) numeric literals can occur in patterns, not just expressions; that may or may not need tweaks specific to it.

Test cases!!!! I suppose I should make a bunch of them, that deal with every oddity I can think of, since I have already been thinking about them... (1 Prelude.-1) is infix with either syntax, and shouldn't (probably) be warned about, etc., etc. -- which explain better what the intended behaviour is anyway.

Note 1: I happen to think it's silly to allow two such tokens such that one begins at the same character-location that the previous one ends, but that's clearly a completely separate issue. I have been bitten by - -fglasgow-exts and x$y z (template haskell syntax $identifier, which is rather similar to the proposed negative literal syntax) before; maybe I don't even want infix operators adjacent to identifiers normally! (but in practice everything tends to work out without difficulty)

Note 2: looking through the results for http://www.google.com/codesearch for
    lang:haskell [0-9a-zA-Z_'#)]-[0-9]
suggests that expressions like (n-1) without spaces are mildly popular. I wouldn't trust the "number of results" though, because (1) results in comments are included, (2) who knows what code it's searching, and (3) searching for
    lang:haskell [-][0-9]
gave me fewer results than the more restrictive
    lang:haskell [^0-9a-zA-Z_'#)]-[0-9]
The "#" was included in case there were glasgowIdentifiers#, and the rest of the symbols could have been useful if *&$%- didn't make one infix operator.

Feeling excessively thorough,
Isaac