[GHC] #10583: Chaos in Lexeme.hs

#10583: Chaos in Lexeme.hs
-------------------------------------+-------------------------------------
 Reporter: goldfire                  | Owner: goldfire
 Type: bug                           | Status: new
 Priority: normal                    | Milestone:
 Component: Compiler                 | Version: 7.10.1
 Keywords:                           | Operating System: Unknown/Multiple
 Architecture: Unknown/Multiple      | Type of failure: None/Unknown
 Test Case:                          | Blocked By:
 Blocking:                           | Related Tickets:
                                     | Differential Revisions:
-------------------------------------+-------------------------------------

I've been looking at the `Lexeme` module (in `basicTypes`), where -- as far as I can tell -- utter chaos reigns. (Full disclosure: I wrote this module some time ago, inheriting its code from various places. But I clearly did a poor job of it.)

Here is a sampling of the chaos:

 * `isLexConSym` claims to recognize type and data constructor infix symbols. But it requires symbols to start with a `:` (or be `->`). This is out-of-date with respect to the change in type constructor infix symbols in 7.6(?), which now do not need to start with a `:`.

 * `isVarSymChar` and `okSymChar` both purport to recognize characters that are valid parts of symbolic identifiers. But they have entirely different, unrelated implementations. These should be the '''same''' function, I believe.

 * The `notFollowedBySymbol` function defined in `parser/Lexer.x` overlaps with the functions above. But it has a '''third''' implementation, different from either of the other two.

 * The `isLexXXX` functions all just look at first characters, except for `isLexVarSym`, which looks at all characters. There is a reason for this -- that GHC-generated names start with a `$` but should be printed prefix -- but I'm not sure I buy it. Is it sufficient to look at the first two characters instead of the first one?

I'm happy to make the code changes around this, but I need some advice from someone who has more knowledge about both Haskell's lexical structure and quite possibly Unicode.

Happily, the functions in `Lexeme` are not used much. But it would be awfully nice if they did the right thing when they are used.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10583
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
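
The difference between a first-character test and an all-characters test, raised in the last bullet of the report, can be sketched as follows. This is a minimal illustration and not the code in `Lexeme`: the names `isAsciiSymbolChar`, `startsWithSymbol`, `allSymbols` and `symbolicPrefix2` are hypothetical, and the character test is an ASCII-only stand-in used just to drive the examples.

```haskell
-- Stand-in symbol-character test (ASCII only). Hypothetical; it is not
-- any of GHC's three implementations discussed in the ticket.
isAsciiSymbolChar :: Char -> Bool
isAsciiSymbolChar c = c `elem` "!#$%&*+./<=>?@\\^|-~:"

-- First-character test: the style the ticket attributes to most of the
-- isLexXXX functions.
startsWithSymbol :: String -> Bool
startsWithSymbol (c:_) = isAsciiSymbolChar c
startsWithSymbol []    = False

-- All-characters test: the style the ticket attributes to isLexVarSym.
allSymbols :: String -> Bool
allSymbols s = not (null s) && all isAsciiSymbolChar s

-- The compromise asked about in the ticket: inspect at most the first
-- two characters.
symbolicPrefix2 :: String -> Bool
symbolicPrefix2 s = not (null s) && all isAsciiSymbolChar (take 2 s)
```

With these definitions, a generated-looking name such as `$fShowT` passes `startsWithSymbol` but fails `allSymbols` and `symbolicPrefix2`, while `>>=` passes all three. Whether a two-character test is sufficient for every name GHC generates is the open question in the ticket; the sketch does not settle it.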

#10583: Chaos in Lexeme.hs
-------------------------------------+-------------------------------------
 Reporter: goldfire                  | Owner: goldfire
 Type: bug                           | Status: new
 Priority: normal                    | Milestone:
 Component: Compiler                 | Version: 7.10.1
 Resolution:                         | Keywords:
 Operating System: Unknown/Multiple  | Architecture: Unknown/Multiple
 Type of failure: None/Unknown       | Test Case:
 Blocked By:                         | Blocking: 10582
 Related Tickets:                    | Differential Revisions:
-------------------------------------+-------------------------------------

Comment (by simonpj):

Cleaning this up looks like a great service, thank you. I don't have any opinions of my own; it's not an area I've been working in.

Simon

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10583#comment:2

#10583: Chaos in Lexeme.hs

Comment (by ezyang):

Hello Richard,

If you look at the types and the use-sites of the functions in question, you can get a clue as to what they are used for.

 * `isLexVarSym` operates on `FastString`s, and is primarily called by `isSymOcc` to do tests on `OccName`s. We variously need this to do things like check if a symbol name is valid (language check for `TypeOperators`) or make a decision about how to pretty-print a symbol, etc.

 * `okSymChar` operates on `String`s, and is used by the Template Haskell conversion interface to test that a TH-generated identifier looks like a valid symbol (if it is one).

 * `notFollowedBySymbol` operates on the Alex state, and is used to make decisions in the Lexer. This one might be a bit more permissive than the others, since lexer errors are not nice for users but errors in the type-checker are much nicer.

So probably `isLexVarSym` and `okSymChar` can and should be combined, but you'll have to do a goofy conversion from `String` to `FastString` to do it.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10583#comment:3

#10583: Chaos in Lexeme.hs

Comment (by goldfire):

We use `isLexVarSym` to check if a symbol name is valid? That should be refactored, because quite clearly the other `isLexXXX` functions do not do validity checking. And I don't agree about the goofy conversion: I want to combine `isVarSymChar` and `okSymChar`, both of type `Char -> Bool`.

As for `notFollowedBySymbol`: This needs to be spot on -- a mistake in either direction would lead to wrong behavior. One place it's used is to detect banana brackets for arrow notation; see #10582.

In the end, I can figure out what these functions are used for by poking around, but I just don't quite know what their specification is -- as in, what (precisely) they should do.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10583#comment:4
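
One candidate specification for a single shared `Char -> Bool` predicate is the `symbol` production in section 2.2 of the Haskell 2010 Report: an ASCII symbol from a fixed list, or any non-ASCII Unicode symbol or punctuation character. The sketch below assumes that specification. The names `asciiSymbolChars` and `isSymbolChar` are hypothetical, and GHC's own lexer may classify some Unicode categories differently, so this is a starting point for discussion rather than a description of what GHC does.

```haskell
import Data.Char (isAscii, isPunctuation, isSymbol)

-- The ascSymbol characters from the Haskell 2010 Report, section 2.2.
asciiSymbolChars :: String
asciiSymbolChars = "!#$%&*+./<=>?@\\^|-~:"

-- Hypothetical unified predicate: may this character appear in a
-- symbolic identifier?
--
-- The Report excludes the "special" characters ( ) , ; [ ] ` { } as well
-- as _ " and ' from symbols. All of those are ASCII, so restricting the
-- ASCII case to the explicit list above already rules them out.
isSymbolChar :: Char -> Bool
isSymbolChar c
  | isAscii c = c `elem` asciiSymbolChars
  | otherwise = isSymbol c || isPunctuation c
```

Because the predicate works on `Char`, callers that hold a `String` or a `FastString` would each only need their own traversal; no conversion between the two string types is implied by sharing it.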

#10583: Chaos in Lexeme.hs

Comment (by goldfire):

Update: This just got a bit more annoying, because some of the chaotic implementation has been moved to `ghc-boot`. See Phab:D1313.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10583#comment:5

#10583: Chaos in Lexeme.hs
-------------------------------------+-------------------------------------
 Reporter: goldfire                  | Owner: goldfire
 Type: bug                           | Status: new
 Priority: normal                    | Milestone:
 Component: Compiler                 | Version: 7.10.1
 Resolution:                         | Keywords:
 Operating System: Unknown/Multiple  | Architecture: Unknown/Multiple
 Type of failure: None/Unknown       | Test Case:
 Blocked By:                         | Blocking: 10582, 11046
 Related Tickets:                    | Differential Rev(s):
 Wiki Page:                          |
-------------------------------------+-------------------------------------

Changes (by oerjan):

 * cc: oerjan (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10583#comment:7

#10583: Chaos in Lexeme.hs
-------------------------------------+-------------------------------------
 Reporter: goldfire                  | Owner: albertus
 Type: bug                           | Status: new
 Priority: normal                    | Milestone:
 Component: Compiler                 | Version: 7.10.1
 Resolution:                         | Keywords:
 Operating System: Unknown/Multiple  | Architecture: Unknown/Multiple
 Type of failure: None/Unknown       | Test Case: testsuite/tests/th/T11046.hs
 Blocked By:                         | Blocking: 10582, 11046
 Related Tickets:                    | Differential Rev(s): Phab:D1451
 Wiki Page:                          |
-------------------------------------+-------------------------------------

Changes (by thomie):

 * differential: => Phab:D1451

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10583#comment:10