[GHC] #15118: Printing non-ASCII characters to console on Windows

#15118: Printing non-ASCII characters to console on Windows
-------------------------------------+-------------------------------------
           Reporter:  lehins         |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:  8.6.1
          Component:  Compiler       |           Version:  8.2.2
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 As part of an initiative to get stack working properly on Windows for
 users with international names
 (https://github.com/commercialhaskell/stack/issues/3988), and while trying
 to find a fix for {{{ghc-pkg}}} (#15021), I discovered a weird behavior
 that has been known for a while and affects other languages too, not only
 Haskell.

 First of all, here is the default behavior on Windows with a locale that
 isn't Cyrillic for this program:

 {{{
 main :: IO ()
 main = putStrLn "Алексей Кулешевич"
 }}}

 {{{
 PS C:\phab\windows-console> stack exec -- console
 console.EXE: <stdout>: commitBuffer: invalid argument (invalid character)
 }}}

 Now consider this program:

 {{{
 import System.IO (hSetEncoding, stdout, utf8)

 main :: IO ()
 main = do
   hSetEncoding stdout utf8
   putStrLn "Алексей Кулешевич"
 }}}

 Compiling and running it on Windows 7 with an English locale results in:

 {{{
 PS C:\phab\windows-console> stack exec -- console
 ╨É╨╗╨╡╨║╤ü╨╡╨╣ ╨Ü╤â╨╗╨╡╤ê╨╡╨▓╨╕╤ç
 PS C:\phab\windows-console> chcp 65001
 Active code page: 65001
 PS C:\phab\windows-console> stack exec -- console
 Алексей Кулешевич
 лешевич
 �ич
 }}}

 No knowledge of Russian is necessary to see that, after the code page is
 set to {{{65001}}}, characters are printed to the console that don't
 belong there. That seems to be a bug in Windows' handling of Unicode
 characters, since the result is exactly the same in `cmd` and in
 PowerShell, and it has been reported for other languages such as Perl and
 Java.

 It is worth noting that this also directly affects `ghc` whenever the
 {{{GHC_CHARENC}}} environment variable is set to {{{"UTF-8"}}}.

 Besides the bug described above, it is sad that we need to rely on both
 the code page and the handle encoding being set correctly in order to see
 even semi-correct output without a total program crash.

 The fix being proposed here is to use the {{{WriteConsoleW}}} API call
 instead of writing to a handle, but only when the handle is actually a
 console and not a pipe. This allows us to print Unicode characters
 correctly without changing or relying on the current code page setting.

 Here is sample output from my recent experiments:

 {{{
 PS C:\phab\windows-console> chcp
 Active code page: 437
 PS C:\phab\windows-console> stack exec -- console
 Алексей Кулешевич
 }}}

 I'll add some code examples of the proposed solution in the upcoming days.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15118
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
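The fix proposed in the ticket description (call WriteConsoleW when stdout is a real console, otherwise keep writing to the handle) could be sketched roughly as below. This is only an illustration under stated assumptions, not the change that went into GHC: the name putConsoleLn and the c_-prefixed FFI bindings are made up for this example, a successful GetConsoleMode call is used as the "is this a console?" test, and on non-Windows platforms the code simply falls back to hPutStrLn.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

import System.IO (hPutStrLn, stdout)

#if defined(mingw32_HOST_OS)
import Data.Int (Int32)
import Data.Word (Word32)
import Foreign.C.String (withCWStringLen)
import Foreign.C.Types (CWchar)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, nullPtr)
import System.IO (hFlush)

-- 32-bit Windows uses stdcall for Win32 API functions.
# if defined(i386_HOST_ARCH)
#  define WINDOWS_CCONV stdcall
# else
#  define WINDOWS_CCONV ccall
# endif

-- Win32 HANDLE, modelled as an opaque pointer (assumption for this sketch).
type HANDLE = Ptr ()

foreign import WINDOWS_CCONV unsafe "windows.h GetStdHandle"
  c_GetStdHandle :: Word32 -> IO HANDLE

foreign import WINDOWS_CCONV unsafe "windows.h GetConsoleMode"
  c_GetConsoleMode :: HANDLE -> Ptr Word32 -> IO Int32

foreign import WINDOWS_CCONV unsafe "windows.h WriteConsoleW"
  c_WriteConsoleW
    :: HANDLE -> Ptr CWchar -> Word32 -> Ptr Word32 -> Ptr () -> IO Int32

-- STD_OUTPUT_HANDLE is ((DWORD)-11) in the Windows headers.
stdOutputHandle :: Word32
stdOutputHandle = 0xFFFFFFF5

-- Hypothetical helper: print a line with WriteConsoleW when stdout is a
-- console, and through the ordinary Handle when it is a pipe or a file.
putConsoleLn :: String -> IO ()
putConsoleLn s = do
  h <- c_GetStdHandle stdOutputHandle
  -- GetConsoleMode succeeds (returns non-zero) only for console handles.
  isConsole <- alloca $ \modePtr -> (/= 0) <$> c_GetConsoleMode h modePtr
  if isConsole
    then do
      -- Flush anything already buffered so the output order is preserved.
      hFlush stdout
      -- withCWStringLen marshals the String as UTF-16 on Windows;
      -- the length is counted in 16-bit code units.
      withCWStringLen (s ++ "\r\n") $ \(buf, len) ->
        alloca $ \writtenPtr -> do
          _ <- c_WriteConsoleW h buf (fromIntegral len) writtenPtr nullPtr
          return ()
    else hPutStrLn stdout s
#else
-- Non-Windows fallback: the problem described in the ticket does not apply.
putConsoleLn :: String -> IO ()
putConsoleLn = hPutStrLn stdout
#endif

main :: IO ()
main = putConsoleLn "Алексей Кулешевич"
```

Because the text is handed to the console as UTF-16, the result in this sketch does not depend on the active code page or on the encoding set on the Handle. The pipe and file branch is left unchanged here, so it still depends on the handle encoding, as the ticket describes.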

#15118: Printing non-ASCII characters to console on Windows
-------------------------------------+-------------------------------------
          Reporter:  lehins          |               Owner:  (none)
              Type:  bug             |              Status:  closed
          Priority:  normal          |           Milestone:  8.6.1
         Component:  Compiler        |             Version:  8.2.2
        Resolution:  duplicate       |            Keywords:
  Operating System:  Unknown/Multiple|        Architecture:
                                     |  Unknown/Multiple
   Type of failure:  None/Unknown    |           Test Case:
        Blocked By:                  |            Blocking:
   Related Tickets:  #4471 #11394    | Differential Rev(s):
         Wiki Page:                  |
-------------------------------------+-------------------------------------
Changes (by Phyx-):

 * status:  new => closed
 * resolution:   => duplicate
 * related:   => #4471 #11394

Comment:

 This is a duplicate of a number of tickets. We already know about the
 issue and the solution and have been working on it.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15118#comment:1

#15118: Printing non-ASCII characters to console on Windows
---------------------------------+----------------------------------------
        Reporter:  lehins        |               Owner:  (none)
            Type:  bug           |              Status:  closed
        Priority:  normal        |           Milestone:  8.6.1
       Component:  Compiler      |             Version:  8.2.2
      Resolution:  duplicate     |            Keywords:
Operating System:  Windows       |        Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown  |           Test Case:
      Blocked By:                |            Blocking:
 Related Tickets:  #4471 #11394  | Differential Rev(s):
       Wiki Page:                |
---------------------------------+----------------------------------------
Changes (by Phyx-):

 * os:  Unknown/Multiple => Windows

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15118#comment:2