[GHC] #15021: ghc-pkg list crashes on Windows when unicode character is in the path

#15021: ghc-pkg list crashes on Windows when unicode character is in the path
----------------------------------------+---------------------------------
Reporter: nh2 | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone: 8.6.1
Component: ghc-pkg | Version: 8.2.2
Keywords: | Operating System: Windows
Architecture: Unknown/Multiple | Type of failure: None/Unknown
Test Case: | Blocked By:
Blocking: | Related Tickets: #10762
Differential Rev(s): | Wiki Page:
----------------------------------------+---------------------------------
Below I will attach examples of how `ghc-pkg list` crashes with

{{{
<stdout>: commitBuffer: invalid argument (invalid character)
}}}

when the user name contains a non-ASCII character (in my case `日`).

I don't think the user name itself matters; more likely what matters is whether `日` appears in some path (probably the path to the package database).

The issue is trivially reproducible (I tried on Windows 10, and Server 2016 as deployed by AWS) with cmd.exe and PowerShell.

In the below, `chcp 20127` sets the code page to US-ASCII, and `chcp 65001` sets it to UTF-8. This per-terminal / per-process setting is different from the system locale (as can be set [https://www.java.com/en/download/help/locale.xml this way]).

The behaviour of different combinations of system locale and code page:

 * English (US) system locale
   * `chcp 437` (English default): **crash**
   * `chcp 65001` (UTF-8): no crash
   * `chcp 20127` (ASCII): **crash**
 * Japanese system locale
   * `chcp 932` (Japanese default): no crash
   * `chcp 65001` (UTF-8): no crash
   * `chcp 20127` (ASCII): **crash**

In none of these situations should `ghc-pkg list` crash. It breaks all the build tools; from conversations on the `haskell-jp` Slack, I have learned that Japanese users have learned to avoid unicode in their user names due to this issue.

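The failure mode can be reproduced in miniature without a Windows console. The following is only an illustrative sketch, not code from `ghc-pkg`: it assumes that GHC's `latin1` encoding can stand in for a code page that has no representation for `日` (such as 437 or 20127), and `tryWrite` is a hypothetical helper that writes to a temporary file rather than to the console.

```haskell
import Control.Exception (IOException, try)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- | Hypothetical helper: write a string through a handle that uses the
-- given encoding. Returns Left with the I/O error if the write fails,
-- for example because a character cannot be encoded.
tryWrite :: TextEncoding -> String -> IO (Either IOException ())
tryWrite enc str = do
  tmpDir <- getTemporaryDirectory
  (path, h) <- openTempFile tmpDir "encoding-demo.txt"
  hSetEncoding h enc
  result <- try (hPutStrLn h str >> hFlush h)
  -- Closing may fail again after a failed write; ignore that here.
  _ <- try (hClose h) :: IO (Either IOException ())
  removeFile path
  return result

main :: IO ()
main = do
  let path = "C:\\Users\\日"
  -- latin1 is an assumption: a stand-in for a code page without the character.
  narrow <- tryWrite latin1 path
  wide   <- tryWrite utf8 path
  putStrLn ("latin1: " ++ either (const "write failed") (const "ok") narrow)
  putStrLn ("utf8:   " ++ either (const "write failed") (const "ok") wide)
```

With the `latin1` handle the write should fail, since that encoding rejects characters above `'\255'`, while the `utf8` handle should accept the same string.
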
I believe the fix to `ghc-pkg` should be similar to [https://github.com/ghc/ghc/commit/1b56c40578374a15b4a2593895710c68b0e2a717 this fix for `ghc`] (#10762), where the encoding of the stdout `Handle` is set to UTF-8.

----

With English (US) system locale, as copy-pasted from PowerShell:

{{{
Windows PowerShell
Copyright (C) 2016 Microsoft Corporation. All rights reserved.

PS C:\Users\日> chcp
Active code page: 437
PS C:\Users\日> C:\Users\日 \AppData\Local\Programs\stack\x86_64-windows\ghc-8.2.2\bin\ghc-pkg.EXE list
C:\Users\ghc-pkg.EXE: <stdout>: commitBuffer: invalid argument (invalid character)
PS C:\Users\日> chcp 65001
Active code page: 65001
PS C:\Users\日> C:\Users\日 \AppData\Local\Programs\stack\x86_64-windows\ghc-8.2.2\bin\ghc-pkg.EXE list
C:\Users\日 \AppData\Local\Programs\stack\x86_64-windows\ghc-8.2.2\lib\package.conf.d
    Cabal-2.0.1.0
    Win32-2.5.4.1
    array-0.5.2.0
    base-4.10.1.0
    binary-0.8.5.1
    bytestring-0.10.8.2
    containers-0.5.10.2
    deepseq-1.4.3.0
    directory-1.3.0.2
    filepath-1.4.1.2
    (ghc-8.2.2)
    ghc-boot-8.2.2
    ghc-boot-th-8.2.2
    ghc-compact-0.1.0.0
    ghc-prim-0.5.1.1
    ghci-8.2.2
    haskeline-0.7.4.0
    hoopl-3.10.2.2
    hpc-0.6.0.3
    integer-gmp-1.0.1.0
    pretty-1.1.3.3
    process-1.6.1.0
    rts-1.0
    template-haskell-2.12.0.0
    time-1.8.0.2
    transformers-0.5.2.0
    xhtml-3000.2.2
PS C:\Users\日> chcp 20127
Active code page: 20127
PS C:\Users\日> C:\Users\日 \AppData\Local\Programs\stack\x86_64-windows\ghc-8.2.2\bin\ghc-pkg.EXE list
C:\Users\ghc-pkg.EXE: <stdout>: commitBuffer: invalid argument (invalid character)
}}}

With Japanese system locale, as copied from PowerShell:

{{{
Windows PowerShell
Copyright (C) 2016 Microsoft Corporation. All rights reserved.

PS C:\Users\日> chcp
Active code page: 932
PS C:\Users\日> C:\Users\日 \AppData\Local\Programs\stack\x86_64-windows\ghc-8.2.2\bin\ghc-pkg.exe list
C:\Users\日 \AppData\Local\Programs\stack\x86_64-windows\ghc-8.2.2\lib\package.conf.d
    Cabal-2.0.1.0
    Win32-2.5.4.1
    array-0.5.2.0
    base-4.10.1.0
    binary-0.8.5.1
    bytestring-0.10.8.2
    containers-0.5.10.2
    deepseq-1.4.3.0
    directory-1.3.0.2
    filepath-1.4.1.2
    (ghc-8.2.2)
    ghc-boot-8.2.2
    ghc-boot-th-8.2.2
    ghc-compact-0.1.0.0
    ghc-prim-0.5.1.1
    ghci-8.2.2
    haskeline-0.7.4.0
    hoopl-3.10.2.2
    hpc-0.6.0.3
    integer-gmp-1.0.1.0
    pretty-1.1.3.3
    process-1.6.1.0
    rts-1.0
    template-haskell-2.12.0.0
    time-1.8.0.2
    transformers-0.5.2.0
    xhtml-3000.2.2
PS C:\Users\日> chcp 65001
Active code page: 65001
PS C:\Users\日> C:\Users\日 \AppData\Local\Programs\stack\x86_64-windows\ghc-8.2.2\bin\ghc-pkg.exe list
C:\Users\日 \AppData\Local\Programs\stack\x86_64-windows\ghc-8.2.2\lib\package.conf.d
    Cabal-2.0.1.0
    Win32-2.5.4.1
    array-0.5.2.0
    base-4.10.1.0
    binary-0.8.5.1
    bytestring-0.10.8.2
    containers-0.5.10.2
    deepseq-1.4.3.0
    directory-1.3.0.2
    filepath-1.4.1.2
    (ghc-8.2.2)
    ghc-boot-8.2.2
    ghc-boot-th-8.2.2
    ghc-compact-0.1.0.0
    ghc-prim-0.5.1.1
    ghci-8.2.2
    haskeline-0.7.4.0
    hoopl-3.10.2.2
    hpc-0.6.0.3
    integer-gmp-1.0.1.0
    pretty-1.1.3.3
    process-1.6.1.0
    rts-1.0
    template-haskell-2.12.0.0
    time-1.8.0.2
    transformers-0.5.2.0
    xhtml-3000.2.2
PS C:\Users\日> chcp 20127
Active code page: 20127
PS C:\Users\日> C:\Users\日 \AppData\Local\Programs\stack\x86_64-windows\ghc-8.2.2\bin\ghc-pkg.exe list
C:\Users\ghc-pkg.exe: <stdout>: commitBuffer: invalid argument (invalid character)
}}}

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15021
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
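
The fix suggested in the report (setting the encoding of the stdout `Handle` to UTF-8) can be sketched roughly as follows. This is a minimal illustration and not the actual change made to `ghc-pkg`; `setUtf8Stdout` is a hypothetical name and the printed path is an invented example.

```haskell
import System.IO

-- | Hypothetical setup step: force UTF-8 on stdout so that writing a
-- path with non-ASCII characters cannot fail with an encoding error,
-- whatever encoding the handle was given at startup.
setUtf8Stdout :: IO ()
setUtf8Stdout = hSetEncoding stdout utf8

main :: IO ()
main = do
  setUtf8Stdout
  enc <- hGetEncoding stdout
  putStrLn ("stdout encoding: " ++ maybe "binary" show enc)
  -- Invented example path; any string can be encoded as UTF-8.
  putStrLn "C:\\Users\\日\\example\\package.conf.d"
```

This only changes how the program encodes its own output. Whether the console then displays the bytes correctly still depends on the active code page.
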

Comment (by lehins):

I can confirm this issue with Cyrillic characters on Windows 7 with the latest GHC. If the locale is set to Russian through the control panel, everything works fine, but changing the code page does break {{{ghc-pkg}}}:

{{{
PS C:\Users\Алексей\ghc-8.4.2\bin> chcp 437
Active code page: 437
PS C:\Users\???????\ghc-8.4.2\bin> .\ghc-pkg.exe list
C:\Users\ghc-pkg.exe: <stdout>: commitBuffer: invalid argument (invalid character)
PS C:\Users\???????\ghc-8.4.2\bin> chcp 866
Active code page: 866
PS C:\Users\Алексей\ghc-8.4.2\bin> .\ghc-pkg.exe list
C:\Users\Алексей\ghc-8.4.2\lib\package.conf.d
    Cabal-2.2.0.1
    Win32-2.6.1.0
    array-0.5.2.0
    base-4.11.1.0
    binary-0.8.5.1
    bytestring-0.10.8.2
    containers-0.5.11.0
    deepseq-1.4.3.0
    directory-1.3.1.5
    filepath-1.4.2
    (ghc-8.4.2)
    ghc-boot-8.4.2
    ghc-boot-th-8.4.2
    ghc-compact-0.1.0.0
    ghc-prim-0.5.2.0
    ghci-8.4.2
    haskeline-0.7.4.2
    hpc-0.6.0.3
    integer-gmp-1.0.2.0
    mtl-2.2.2
    parsec-3.1.13.0
    pretty-1.1.3.6
    process-1.6.3.0
    rts-1.0
    stm-2.4.5.0
    template-haskell-2.13.0.0
    text-1.2.3.0
    time-1.8.0.2
    transformers-0.5.5.0
    xhtml-3000.2.2.1
PS C:\Users\Алексей\ghc-8.4.2\bin>
}}}

Curiously, setting {{{chcp 65001}}} when the locale is already set to Russian through the control panel results in a PowerShell crash, but not when the locale is English (United States).

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15021#comment:1

Changes (by nh2):

 * failure: None/Unknown => GHC doesn't work at all
 * related: #10762 => #10762, #15096

Comment:

This was discovered as part of the effort to make Haskell tooling work well for users with non-ASCII user names: https://github.com/commercialhaskell/stack/issues/3988

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15021#comment:2

Changes (by bgamari):

 * differential: => Phab:D4642

Comment:

See Phab:D4642 for a quick attempt at a fix.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15021#comment:3

Changes (by bgamari):

 * status: new => patch

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15021#comment:4

Comment (by lehins):

The quick fix proposed is OK, but it's not enough: the right way is to also set the code page to 65001 on Windows, so that once the UTF-8 encoding is applied the program not only doesn't error out but also actually prints the correct characters. I was able to get this working today and will try to add a fix tomorrow.

Here is the quick fix proposed by @bgamari in action:

{{{
PS C:\phab\ghc-pkg\.stack-work\install\b82bf5d2\bin> .\ghc-pkg.exe list --global-package-db C:\Users\Алексей\ghc-8.4.2\lib\package.conf.d
C:\Users\╨É╨╗╨╡╨║╤ü╨╡╨╣\ghc-8.4.2\lib\package.conf.d
    Cabal-2.2.0.1
    Win32-2.6.1.0
    array-0.5.2.0
    base-4.11.1.0
    binary-0.8.5.1
    bytestring-0.10.8.2
    containers-0.5.11.0
}}}

Here it is with a bit of extra work (patch coming tomorrow; it's too late for me to submit anything right now):

{{{
PS C:\phab\ghc-pkg\.stack-work\install\b82bf5d2\bin> .\ghc-pkg.exe list --global-package-db C:\Users\Алексей\ghc-8.4.2\lib\package.conf.d
C:\Users\Алексей\ghc-8.4.2\lib\package.conf.d
    Cabal-2.2.0.1
    Win32-2.6.1.0
    array-0.5.2.0
    base-4.11.1.0
    binary-0.8.5.1
    bytestring-0.10.8.2
    containers-0.5.11.0
    deepseq-1.4.3.0
    directory-1.3.1.5
}}}

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15021#comment:5
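
The garbled user name in the first transcript is consistent with UTF-8 bytes being displayed by a console that is still on code page 437. The sketch below illustrates that reading and nothing more: `encodeUtf8Small`, `decodeCp437`, and `mojibake` are hypothetical helpers, the UTF-8 encoder only handles code points below U+0800, and the code page 437 table is deliberately partial (only the high bytes that occur in this example).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Maybe (fromMaybe)
import Data.Word (Word8)
import System.IO (hSetEncoding, stdout, utf8)

-- | UTF-8 encode one character. Only code points below U+0800 are
-- handled, which covers ASCII and Cyrillic; enough for this sketch.
encodeUtf8Small :: Char -> [Word8]
encodeUtf8Small c
  | n < 0x80  = [fromIntegral n]
  | n < 0x800 = [ fromIntegral (0xC0 .|. (n `shiftR` 6))
                , fromIntegral (0x80 .|. (n .&. 0x3F)) ]
  | otherwise = error "encodeUtf8Small: code point out of range for this sketch"
  where n = ord c

-- | Partial code page 437 table: only the high bytes needed below.
cp437High :: [(Word8, Char)]
cp437High =
  [ (0x81, 'ü'), (0x90, 'É'), (0xB5, '╡'), (0xB9, '╣')
  , (0xBA, '║'), (0xBB, '╗'), (0xD0, '╨'), (0xD1, '╤') ]

-- | Interpret a single byte as code page 437 ('?' if not in the table).
decodeCp437 :: Word8 -> Char
decodeCp437 b
  | b < 0x80  = toEnum (fromIntegral b)
  | otherwise = fromMaybe '?' (lookup b cp437High)

-- | What a code page 437 console shows when it is handed UTF-8 bytes.
mojibake :: String -> String
mojibake = map decodeCp437 . concatMap encodeUtf8Small

main :: IO ()
main = do
  hSetEncoding stdout utf8
  putStrLn (mojibake "Алексей")
```

Applied to `Алексей`, `mojibake` should give back the same `╨É╨╗╨╡╨║╤ü╨╡╨╣` sequence seen in the transcript: the encoding on the handle is fine, and the console is decoding the bytes with the wrong code page.
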

Comment (by lehins):

@bgamari, I was thinking it would be appropriate to simply change the code page of the console together with the encoding on the `stdout` handle, like I did in the example above, but a bit more research on this issue led me to the conclusion that it would not be a valid way to go about it. Mutating global console state that can affect other processes is just not cool.

I would say that your approach is the correct way to deal with this issue at the moment, as it allows ghc-pkg to function without errors, so I think you can disregard my previous comment.

At the same time, while trying to figure this issue out I realized there is a bigger problem at hand that is directly related to this ticket. Since it has a much bigger impact and the solution would affect all Windows users, I opened a separate ticket for it: #15118

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15021#comment:6

Changes (by bgamari):

 * status: patch => closed
 * resolution: => fixed

Comment:

Done. Thanks for your help, lehins!

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15021#comment:7

Comment (by Ben Gamari