How does GHC avoid "<stderr>: hPutChar: invalid argument (invalid character)"?

Hello Haskellers,
Currently, I’m working on this issue https://github.com/haskell/haddock/pull/566, where haddock crashes when printing the Unicode “bullet character” http://www.fileformat.info/info/unicode/char/2022/index.htm on a stderr whose character encoding is not UTF-8.
In the aforementioned pull request, I just added hSetEncoding stderr utf8 as a quick-and-dirty workaround. But GHC doesn’t actually do that: GHC prints “?” instead of the bullet character when stderr is not Unicode-compatible.
So I believe there’s a better way to handle this case, and GHC knows it. How does GHC detect the handle’s character encoding and convert incompatible characters (such as the bullet character) into “?” to avoid the error? I couldn’t figure it out from a brief read of the GHC source.
Thanks in advance!

On Mon, Apr 10, 2017 at 7:24 PM, Yuji Yamamoto wrote:
Currently, I’m working on this issue https://github.com/haskell/haddock/pull/566,
where haddock crashes when printing the Unicode “bullet character” http://www.fileformat.info/info/unicode/char/2022/index.htm on stderr whose character encoding is not UTF-8.
In the beforementioned pull request, I just added hSetEncoding stderr utf8 as a quick-and-dirty workaround. But GHC actually doesn’t do so: GHC prints “?” instead of the bullet character when stderr is not Unicode-compatible.
https://downloads.haskell.org/~ghc/8.0.2/docs/html/libraries/base-4.9.1.0/GH...
Note the suffixes you can add to an encoding name.
--
brandon s allbery kf8nh sine nomine associates
allbery.b@gmail.com ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
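
To illustrate the suffix idea, here is a minimal sketch that re-opens a handle's current encoding with the `//TRANSLIT` suffix, so that characters the encoding cannot represent are replaced rather than raising "invalid argument (invalid character)". The helper name hSetTranslit is chosen for this example. Whether the encoding name reported for a given platform or console is accepted by mkTextEncoding is an assumption to check, and this is not necessarily how GHC itself does it.

```haskell
import System.IO

-- | Switch a handle to the transliterating variant of whatever
-- encoding it currently uses. Handles in binary mode, and handles
-- whose encoding name already carries a suffix, are left untouched.
hSetTranslit :: Handle -> IO ()
hSetTranslit h = do
  menc <- hGetEncoding h
  case fmap show menc of
    -- 'show' on a TextEncoding yields its name, e.g. "UTF-8".
    Just name | '/' `notElem` name -> do
      enc <- mkTextEncoding (name ++ "//TRANSLIT")
      hSetEncoding h enc
    _ -> return ()
```

A program would call hSetTranslit stderr (and likely hSetTranslit stdout) once at startup, before printing anything. Unlike hSetEncoding stderr utf8, this keeps the encoding the environment asked for and only changes what happens on unrepresentable characters. The `//IGNORE` suffix is the variant that drops such characters instead.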

This sounds to me like a classic case of a wrongly configured (or unconfigured) locale/terminal. As a starting point I would check both what the locale variables are set to (run "locale" in the shell) and what encoding the terminal is set to. I've noticed that a lot of Linux installs have completely borked configuration in this respect, and in those cases GHC can't do much.
Side note: Just forcing output to be UTF-8 is *definitely* the wrong thing to do, as it would break any environment set to a different encoding...
Cheers, Merijn
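
Alongside the shell-level check, the encodings that the Haskell runtime actually picked up can be inspected from inside a program. The following is a small sketch using only System.IO. The names describeEncoding and encodingReport are made up for this example.

```haskell
import System.IO

-- | Name of a handle's encoding, or a note that it is in binary mode.
describeEncoding :: Handle -> IO String
describeEncoding h = maybe "binary (no encoding)" show <$> hGetEncoding h

-- | One line each for the locale encoding and the standard output handles.
encodingReport :: IO [String]
encodingReport = do
  out <- describeEncoding stdout
  err <- describeEncoding stderr
  return
    [ "locale: " ++ show localeEncoding
    , "stdout: " ++ out
    , "stderr: " ++ err
    ]
```

Printing the result with encodingReport >>= mapM_ putStrLn shows whether stderr really ended up with a non-Unicode encoding in the environment where the crash occurs.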
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Brandon and Merijn,
Thank you for the advice.
But I should have mentioned that the issue is about Windows...
So the things to watch out for might be different...
2017-04-11 16:46 GMT+09:00 Merijn Verstraaten wrote:
This sounds to me like a classical case of a wrongly (or not) configured locale/terminal. As a starting point I would check both what the locale variables are set to (run "locale" in the shell) and check what encoding the terminal is set to. I've noticed a lot of linux installs have completely borked configuration with regards to these and in those cases GHC can't do much.
Side note: Just forcing output to be utf8 is *definitely* the wrong thing to do, as it'd break any environment set to a different encoding...
Cheers, Merijn
-- 山本悠滋 twitter: @igrep GitHub: https://github.com/igrep Facebook: http://www.facebook.com/igrep Google+: https://plus.google.com/u/0/+YujiYamamoto_igrep
participants (3)
- Brandon Allbery
- Merijn Verstraaten
- Yuji Yamamoto