
Hello all,

I'm attempting to output some Unicode on the Windows console. I set my Windows console code page to UTF-8 using "chcp 65001".

The program:

    -- Test.hs
    main = putStr "λ.x→x"

The output of `runghc Test.hs`:

    λ.x→xxxx

From within ghci, typing `main`:

    λ*** Exception: <stdout>: hPutChar: permission denied (Permission denied)

I suspect both of these outputs are evidence of bugs. Might I be doing something wrong? (aside from using Windows ;))

TIA,
David

--
David Sankel
Sankel Software
www.sankelsoftware.com
585 617 4748 (Office)
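
A small arithmetic check on the `runghc` output above, offered as a sketch rather than a diagnosis: the test string has 5 characters but occupies 8 bytes in UTF-8, and the difference (3) equals the number of extra `x` characters printed. The names `utf8Encode`, `utf8Length`, and `report` are made up for this illustration and are not part of GHC's IO library. Whether the stray characters really come from a character-count versus byte-count mismatch somewhere in the write path is an assumption that nothing in this thread confirms.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode a single Char as its UTF-8 byte sequence.
-- (Surrogate code points are not rejected; this is only a sketch.)
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    top, cont :: Int -> Word8
    -- Leading bits of the code point, placed in the first byte.
    top s  = fromIntegral (n `shiftR` s)
    -- A continuation byte: 10xxxxxx carrying six bits of the code point.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Number of bytes a String occupies once encoded as UTF-8.
utf8Length :: String -> Int
utf8Length = length . concatMap utf8Encode

-- The string from Test.hs.
testString :: String
testString = "λ.x→x"

-- Print the character count, the byte count, and their difference.
report :: IO ()
report = do
  putStrLn ("characters:  " ++ show (length testString))
  putStrLn ("UTF-8 bytes: " ++ show (utf8Length testString))
  putStrLn ("difference:  " ++ show (utf8Length testString - length testString))
```

Loading this in ghci and running `report` (or adding `main = report`) shows the counts; `λ` takes two bytes and `→` takes three, while the ASCII characters take one each.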

On Mon, Nov 1, 2010 at 10:20 PM, David Sankel wrote:
I suspect both of these outputs are evidence of bugs. Might I be doing something wrong? (aside from using windows ;))

I forgot to mention that I'm using Windows XP with ghc 6.12.3.

-- David Sankel

This is evidence of the broken Unicode support in the Windows
terminal, not a problem with GHC. I have experienced the same thing
many times.
2010/11/2 David Sankel wrote:
I forgot to mention that I'm using Windows XP with ghc 6.12.3.
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Is there a ghc "wontfix" bug ticket for this? Perhaps we can make a small C
test case and send it to the Microsoft people. Some[1] are reporting success
with Unicode console output.
David
[1] http://www.codeproject.com/KB/cpp/unicode_console_output.aspx
On Tue, Nov 2, 2010 at 3:49 AM, Krasimir Angelov wrote:
This is evidence for the broken Unicode support in the Windows terminal and not a problem with GHC. I experienced the same many times.

It is possible to output some non-Latin-1 symbols if you use the wide
string API, but not all of them. Basically, the console supports the
European scripts (Latin, Cyrillic and Greek) but nothing else.
2010/11/2 David Sankel wrote:
Is there a ghc "wontfix" bug ticket for this? Perhaps we can make a small C test case and send it to the Microsoft people. Some[1] are reporting success with Unicode console output. David
[1] http://www.codeproject.com/KB/cpp/unicode_console_output.aspx

On 2 November 2010 21:05, David Sankel wrote:
Is there a ghc "wontfix" bug ticket for this? Perhaps we can make a small C test case and send it to the Microsoft people. Some[1] are reporting success with Unicode console output.

I confirmed that I can output Chinese Unicode from Haskell. You can test this by using a program like:

    main = putStrLn "我学习电脑科学"

When you run it:

1. You need to use "chcp 65001" to set the console code page to UTF-8.
2. It is very likely that your Windows console won't have the fonts required to actually make sense of the output. Pipe the output to foo.txt. If you open this file in Notepad you will see the correct characters show up.

If you want to see the actual correct output in the console, there are some more issues:

1. You need to do some registry hacking to use e.g. "SimSun Regular" as the console font.
2. Even if you do this, my understanding is that it probably won't work (you will still get junk output in the form of the actual UTF-8 bytes). I think you would instead need to use "chcp 936" (the Simplified Chinese GBK code page), which tells the Windows API to output GBK code points instead of the UTF-8 encoding. These should then render correctly. However, to install the code page so that chcp works, you need to have "East Asian language support" installed (so Windows 7 Professional users like me are out of luck, because it appears to have been dropped in favour of "Language packs", which are only available for 7 Ultimate/Enterprise...).

I don't know how all this would adapt to the lambda character. Maybe you need to use a Greek code page? And I have no idea where that "permission denied" error is coming from.

In summary, this will probably never work properly. This sort of rubbish is why I switched to OS X :-)

Cheers,
Max
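
The "pipe the output to foo.txt" check described above can also be done without depending on the console code page, by setting the handle encoding explicitly. This is a minimal sketch assuming only `System.IO` from base (`hSetEncoding` and `utf8` are available from GHC 6.12 onward). The names `writeUtf8`, `byteSize`, and `printUtf8` are invented for illustration.

```haskell
import System.IO

-- Write a string to a file as UTF-8, independent of the console code page.
writeUtf8 :: FilePath -> String -> IO ()
writeUtf8 path str =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h str

-- Size of a file in bytes, measured through a binary-mode handle.
byteSize :: FilePath -> IO Integer
byteSize path = withBinaryFile path ReadMode hFileSize

-- The same idea applied to stdout: force UTF-8 before printing.
printUtf8 :: String -> IO ()
printUtf8 str = do
  hSetEncoding stdout utf8
  putStrLn str
```

Calling `writeUtf8 "foo.txt" "我学习电脑科学"` should leave a 21-byte file, since each of the seven characters takes three bytes in UTF-8. `printUtf8` only controls the bytes GHC emits; the font and code page caveats in the message above still decide what the console actually displays.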

Hello Max,

Wednesday, November 3, 2010, 1:26:50 PM, you wrote:
1. You need to use "chcp 65001" to set the console code page to UTF8
2. It is very likely that your Windows console won't have the fonts required to actually make sense of the output. Pipe the output to foo.txt. If you open this file in notepad you will see the correct characters show up.

It will work even without chcp. AFAIK neither GHC nor Windows adjusts the text being output to the current console code page.

--
Best regards,
Bulat
mailto:Bulat.Ziganshin@gmail.com

On 03/11/2010 10:36, Bulat Ziganshin wrote:
It will work even without chcp. AFAIK neither GHC nor Windows adjusts the text being output to the current console code page.

GHC certainly does. We use GetConsoleCP() when deciding what code page to use by default; see libraries/base/GHC/IO/Encoding/CodePage.hs.

Windows consoles use Unicode internally. I presume that at some point between WriteFile() and the console some decoding is supposed to happen, but I don't know where that is, or how well it works (other evidence in this thread suggests not very well).

Cheers,
Simon
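
To see which encoding GHC ended up choosing, the following sketch queries it at run time. It assumes only `localeEncoding` and `hGetEncoding` from `System.IO`; the names `describeEncodings` and `printEncodings` are made up for illustration. The strings printed depend on the platform and, per the message above, on the console code page, so no expected output is shown.

```haskell
import System.IO

-- Describe the default text encoding and the one attached to stdout.
describeEncodings :: IO [String]
describeEncodings = do
  out <- hGetEncoding stdout   -- Nothing means the handle is in binary mode
  return
    [ "locale encoding: " ++ show localeEncoding
    , "stdout encoding: " ++ maybe "none (binary mode)" show out
    ]

-- Print both descriptions, one per line.
printEncodings :: IO ()
printEncodings = describeEncodings >>= mapM_ putStrLn
```

Running `printEncodings` before and after "chcp 65001" in a Windows console would be one way to check how the default follows the code page.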

On Wed, Nov 3, 2010 at 9:00 AM, Simon Marlow wrote:
GHC certainly does. We use GetConsoleCP() when deciding what code page to use by default - see libraries/base/GHC/IO/Encoding/CodePage.hs.

This can actually be quite helpful. I've discovered that if you have a console set to code page 65001 (UTF-8) and use WriteConsoleA (the non-wide version) with UTF-8 encoded strings, the console displays the text properly!

So the solution seems to be: when outputting to a UTF-8 console, use WriteConsoleA.

David

On 04/11/2010 02:35, David Sankel wrote:
This can actually be quite helpful. I've discovered that if you have a console set to code page 65001 (UTF-8) and use WriteConsoleA (the non-wide version) with UTF-8 encoded strings, the console displays the text properly!
So the solution seems to be, when outputting to a utf8 console use WriteConsoleA.

We need someone to rewrite the IO library backend for Win32. Currently it is going via the msvcrt POSIX emulation layer, i.e. using write() and pseudo-file-descriptors. More than a few problems have been caused by this, and it's totally unnecessary except that we get to share some code between the POSIX and Windows backends. We ought to be using the native Win32 APIs and HANDLE directly; then we could use WriteConsoleA.

This is a prerequisite for having a decent Win32 implementation of the IO manager too, and we could get proper support for hGetNonBlocking.

We're not talking about a lot of code here: basically a replacement for the modules GHC.IO.FD and GHC.IO.Handle.FD (about 1000 lines in total). Some of the low-level Win32 support might have to be imported from the Win32 package, though.

Any volunteers?

Cheers,
Simon

On Thu, Nov 4, 2010 at 6:09 AM, Simon Marlow wrote:
We need someone to rewrite the IO library backend for Win32. Currently it is going via the msvcrt POSIX emulation layer, i.e. using write() and pseudo-file-descriptors. More than a few problems have been caused by this, and it's totally unnecessary except that we get to share some code between the POSIX and Windows backends. We ought to be using the native Win32 APIs and HANDLE directly, then we could use WriteConsoleA.

It looks like replacing the POSIX layer isn't necessary to fix the Unicode console output bug. I've made a ticket, and in a comment I illustrate the _setmode call that magically makes everything work:

http://hackage.haskell.org/trac/ghc/ticket/4471

I could attempt a ghc patch for this, but I don't have any experience with the ghc code. Perhaps someone could add this _setmode call with relative ease?

David

participants (5)
- Bulat Ziganshin
- David Sankel
- Krasimir Angelov
- Max Bolingbroke
- Simon Marlow