[GHC] #11529: Show instance of Char should print literals for non-ascii printable charcters

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Show instance for non-ascii characters prints their character codes. This is sad for Haskell users that speaks language other than English. {{{#!hs
'A' 'A' 'Ä' '\196' '漢' '\28450' print $ [(++"'s dad"), (++"'s mom")] <*> ["Simon", "John"] ["Simon's dad","John's dad","Simon's mom","John's mom"] print $ [(++"の父"), (++"の母")] <*> ["田中", "山田"] ["\30000\20013\12398\29238","\23665\30000\12398\29238","\30000\20013\12398\27597","\23665\30000\12398\27597"] }}}
The function that needs improvement is showLitChar in GHC.Show, which currently prints any character larger than ASCII code 127 by its character code: http://haddock.stackage.org/lts-5.1/base-4.8.2.0/src/GHC-Show.html {{{#!hs showLitChar :: Char -> ShowS showLitChar c s | c > '\DEL' = showChar '\\' (protectEsc isDec (shows (ord c)) s) }}} On the other hand, there is GHC.Unicode.isPrint, the predicate for printable Unicode characters, that is calling on a foreign function u_iswprint for the knowledge. https://hackage.haskell.org/package/base-4.8.2.0/docs/src/GHC.Unicode.html#i... I think one of the solution is to import and call u_iswprint from GHC.Show, too, but I don't know it's against any design choices. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nushio): One of application that is broken by this change, is when a customized Show instance of a type is controlled by other variables in that type. For example, the following code simulates a press code that respects privacy for people of age under 20. {{{#!hs data Sex = Male | Female data Person = Person {name :: String, age :: Int, sex :: Sex} instance Show Person where show (Person _ a Male ) | a < 20 = "A boy (" ++ show a ++ ")" show (Person _ a Female) | a < 20 = "A girl (" ++ show a ++ ")" show (Person n a _ ) = n assert $ show (Person "村主崇行" 19 Male) == "A boy (19)" assert $ show (Person "村主崇行" 20 Male) == "\26449\20027\23815\34892" }}} I'm very looking forward to learn other drawbacks of this change. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Absolutely any code in the entire world that relies on the current behavior will break. The current behavior is expressed in the reference implementation in the Haskell 2010 report. Frankly, changing it ''is not an option''. You can write your own function to unescape valid Unicode. You can also write your own `UShow` class if you like with a method for showing various things using Unicode generally. You can then try to convince other developers to depend on your package and write instances of your class. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nushio): Dear dfeuer, thank you for pointing out that the `Show Char` is specified in Haskell 2010. I believe the corresponding section is the following: https://www.haskell.org/onlinereport/haskell2010/haskellch16.html#x24-217000... {{{ 16.6 String representations showLitChar :: Char -> ShowS Convert a character to a string using only printable characters, using Haskell source-language escape conventions. For example: showLitChar '\n' s = "\\n" ++ s }}} where "Haskell source-language escape conventions" are defined, in turn, in Section 2.6 https://www.haskell.org/onlinereport/haskell2010/haskellch2.html#x7-200002.6 . Correct me if I'm wrong. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by thomie): You can put something like this in your `.ghci` file: {{{ :seti -XScopedTypeVariables :{ let myShow :: Show a => a -> String myShow x = go (show x) where go :: String -> String go [] = [] go s@(x:xs) = case x of '\"' -> '\"' : str ++ "\"" ++ go rest '\'' -> '\'' : char : '\'' : go rest' _ -> x : go xs where (str :: String, rest):_ = reads s (char :: Char, rest'):_ = reads s :} :{ let myPrint :: Show a => a -> IO () myPrint = putStrLn . myShow :} :set -interactive-print=myPrint }}} Example: {{{ Prelude> [(++"の父"), (++"の母")] <*> ["田中", "山田"] ["田中の父","山田の父","田中の母","山田の母"] }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: invalid | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nushio): * status: new => closed * resolution: => invalid Comment: Dear thomie, thank you for your comment. Yes, `-interactive-print` is a great feature! I regret that I was not able to search out this has been done for years. There are also several customized `show` function proposed, like `myShow` here. However, when I used it in some detail, I found that printing in Unicode has many corner cases that are more difficult than it seems .... As far as I have searched, I cannot find a unicode-printing function that satisfies `read . unicode_show == id` for sufficiently many types. For example, https://gist.github.com/nushio3/4a10f3c0092295696daf Thus, I decided to start a small package `-interactive-print`ing. http://hackage.haskell.org/package/unicode-show I wish this helps many people enjoy Haskell! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: invalid | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nushio): By the way, now I know that this issue was a language feature, rather than lack of implementation, I think it is proper to close this ticket. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: invalid | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by thomie): GHCi could be changed to show unicode characters nicely by default. The code is in the function `tcUserStmt` in `compiler/typecheck/TcRnDriver.hs`. Expressions: {{{ -- The plans are: -- A. [it <- e; print it] but not if it::() -- B. [it <- e] -- C. [let it = e; print it] }}} Statements: {{{ -- The plans are: -- [stmt; print v] if one binder and not v::() -- [stmt] otherwise }}} Replace `print` by `putStrLn . uShow`, with a suitable `uShow`. That shouldn't break anyone's code. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by thomie): * status: closed => new * resolution: invalid => Comment: Oh, silly me, `print` in `tcUserStmt` of course uses that interactive printer setting I mentioned in comment:4. So my suggestion is to change the default interactive printer to display unicode characters nicely. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by thomie): There is also a bug, `ghci -fprint-bind-result` doesn't use the interactive printer for statements: {{{ Prelude> :set -fprint-bind-result Prelude> let x = "の父" "\12398\29238" Prelude> "の父" "の父" }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by lelf): * cc: lelf (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nushio): (+1) to suggestion that to change the default interactive printer to display unicode characters nicely. The algorithm in `unicode-show` might be suitable for the purpose, although there should be various opinions on what is the "nice way to print unicode." By the way, if we update the default interactive printer, will we be breaking the `doctests` that `show`s values with unicodes, forcing them to update the expected results from the interpreter? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): I would love for something like comment:4 to become the default in ghci. It could even be simpler/stupider and just replace any sequence like `\12345` with the corresponding Unicode character wherever it appears. I mean when would you ever have such a string in the output of `show`, short of a weird custom Show instance? And it would be more robust to other weird custom Show instances, that used quotes in an unbalanced fashion. I don't think we should replace `\n` or `\ESC` or especially `\\` though. Just printable Unicode characters outside the ASCII range, probably. And we could decline to do the replacement if the replacement character can't be encoded in the user's locale. One drawback is that the user's font might not contain the Unicode characters in question, like mine does not contain `\12345`. So there should probably be an option to disable these replacements. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): This recently come up again on ghc-devs, https://mail.haskell.org/pipermail/ghc-devs/2016-March/011655.html. It has also come up repeatedly in the past as pointed out in that thread, • 2016: https://mail.haskell.org/pipermail/haskell- cafe/2016-February/122874.html • 2012: http://stackoverflow.com/questions/14039726/how-to-make-haskell- or-ghci-able-to-show-chinese-characters-and-run-chinese-char • 2012 again: https://mail.haskell.org/pipermail/haskell- cafe/2012-July/102569.html • 2011: http://stackoverflow.com/questions/5535512/how-to-hack-ghci-or- hugs-so-that-it-prints-unicode-chars-unescaped • 2010: https://mail.haskell.org/pipermail/haskell- cafe/2010-August/082823.html -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by RyanGlScott): * cc: RyanGlScott (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by chak): Replying to [comment:2 dfeuer]:
Absolutely any code in the entire world that relies on the current behavior will break. The current behavior is expressed in the reference implementation in the Haskell 2010 report. Frankly, changing it ''is not an option''. You can write your own function to unescape valid Unicode. You can also write your own `UShow` class if you like with a method for showing various things using Unicode generally. You can then try to convince other developers to depend on your package and write instances of your class.
I disagree. I think, the current implementation is actually wrong and does not adhere to the standard. The standard states in 16.6 that `showLitChar` be defined as follows:
Convert a character to a string using only printable characters, using Haskell source-language escape conventions.
However, the current implementation of `showLitChar` fail to use `isPrint`; instead it uses a naive condition, `c > '\DEL'`, to determine printability. This is wrong. The solution is simple, replace the condition `c > '\DEL'` by `not (isPrint c)` in the definition of `showLitChar`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by allbery_b): isPrint does not answer the question "can this character be displayed by the current user given their current locale?". That would require it to be in IO, and would limit the ability to use it in other contexts. isPrint answers the question "is the Unicode codepoint contained in the given Char considered printable by the version of the Unicode standard to which the runtime conforms?". It is not the correct question to ask here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by erikd): * cc: erikd (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

isPrint does not answer the question "can this character be displayed by
#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by chak): Replying to [comment:16 allbery_b]: the current user given their current locale?". That would require it to be in IO, and would limit the ability to use it in other contexts.
isPrint answers the question "is the Unicode codepoint contained in the
given Char considered printable by the version of the Unicode standard to which the runtime conforms?".
It is not the correct question to ask here.
It is, however, what the standard prescribes. IMHO it is also the right thing to do as it leads to less unexpected behaviour than the current implementation. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11529: Show instance of Char should print literals for non-ascii printable charcters -------------------------------------+------------------------------------- Reporter: nushio | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nushio): I believe there is an ambiguity in the specification {{{ showLitChar :: Char -> ShowS "Convert a character to a string using only printable characters }}} whether "printable" means ASCII printable or Unicode printable. How shall we solve the ambiguity? By the way, I use `map fromEnum` to investigate the content of the string when lack of appropriate font or when I am debugging a pretty printer. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11529#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC