On Thu, Nov 4, 2010 at 6:09 AM, Simon Marlow <marlowsd@gmail.com> wrote:
On 04/11/2010 02:35, David Sankel wrote:
On Wed, Nov 3, 2010 at 9:00 AM, Simon Marlow <marlowsd@gmail.com> wrote:

   On 03/11/2010 10:36, Bulat Ziganshin wrote:

       Hello Max,

       Wednesday, November 3, 2010, 1:26:50 PM, you wrote:

           1. You need to use "chcp 65001" to set the console code page
              to UTF8
           2. It is very likely that your Windows console won't have the
              fonts required to actually make sense of the output. Pipe
              the output to foo.txt. If you open this file in notepad you
              will see the correct characters show up.


       It will work even without chcp. AFAIK neither GHC nor Windows
       adjusts text being output to the current console code page.


   GHC certainly does.  We use GetConsoleCP() when deciding what code
   page to use by default - see libraries/base/GHC/IO/Encoding/CodePage.hs.
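For readers unfamiliar with the call, here is a minimal C sketch of querying the console code page. It is not the code in CodePage.hs; the helper names `code_page_name` and `current_console_code_page` and the small lookup table are assumptions made for illustration. Note that GetConsoleCP() reports the console's input code page, while GetConsoleOutputCP() reports the output one; "chcp" changes both.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Map a Windows code page number to a readable name.
   Illustrative table only: a handful of well-known code pages. */
static const char *code_page_name(unsigned cp)
{
    switch (cp) {
    case 437:   return "CP437 (OEM United States)";
    case 850:   return "CP850 (OEM Multilingual Latin 1)";
    case 1252:  return "CP1252 (Windows Western European)";
    case 65001: return "UTF-8";
    default:    return "unknown";
    }
}

/* Return the console's input code page. Returns 0 when there is no
   console (GetConsoleCP reports failure as 0) or when this file is
   not built for Windows. */
static unsigned current_console_code_page(void)
{
#ifdef _WIN32
    return GetConsoleCP();
#else
    return 0;
#endif
}
```

A caller could print the result with something like `printf("%u -> %s\n", cp, code_page_name(cp))`, where `cp` is the value returned by `current_console_code_page()`.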



This can actually be quite helpful. I've discovered that if you have a
console set to code page 65001 (UTF-8) and use WriteConsoleA (the
non-wide version) with UTF-8 encoded strings, the console displays the
text properly!

So the solution seems to be: when outputting to a UTF-8 console, use
WriteConsoleA.

We need someone to rewrite the IO library backend for Win32.  Currently it is going via the msvcrt POSIX emulation layer, i.e. using write() and pseudo-file-descriptors.  More than a few problems have been caused by this, and it's totally unnecessary except that we get to share some code between the POSIX and Windows backends.  We ought to be using the native Win32 APIs and HANDLE directly, then we could use WriteConsoleA.
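To make the idea concrete, the following C sketch writes UTF-8 bytes through a native HANDLE with WriteConsoleA when standard output is a real console, and falls back to an ordinary byte write otherwise. It is only an illustration under assumptions, not GHC's IO backend: the helper name `write_utf8` is hypothetical, and the console is assumed to be set to code page 65001 already.

```c
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Write a UTF-8 encoded string to standard output.
   Returns the count reported by the underlying write, 0 on failure. */
static size_t write_utf8(const char *s)
{
    size_t len = strlen(s);
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode, written = 0;
    /* GetConsoleMode succeeds only when the handle is a real console,
       not a file or a pipe. */
    if (out != NULL && out != INVALID_HANDLE_VALUE &&
        GetConsoleMode(out, &mode)) {
        if (!WriteConsoleA(out, s, (DWORD)len, &written, NULL))
            return 0;
        return (size_t)written;
    }
#endif
    /* Redirected output, or a non-Windows build: plain byte write. */
    size_t n = fwrite(s, 1, len, stdout);
    fflush(stdout);
    return n;
}
```

The GetConsoleMode check matters because WriteConsoleA fails on redirected handles, so piping the output to a file still needs a byte-oriented path.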

It looks like replacing the POSIX layer isn't necessary to fix the Unicode console output bug. I've made a ticket and in a comment I illustrate the _setmode call that magically makes everything work:

http://hackage.haskell.org/trac/ghc/ticket/4471

I could attempt a GHC patch for this, but I don't have any experience with the GHC code. Perhaps someone could add this _setmode call with relative ease?
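For anyone who has not used _setmode, a rough C sketch of the kind of call involved follows. The exact flag and surrounding code are in the ticket and are not reproduced here; the choice of _O_U16TEXT and the helper names `enable_unicode_stdout` and `print_wide_line` are assumptions for illustration only.

```c
#include <stdio.h>
#include <wchar.h>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

/* Switch stdout to a Unicode translation mode where the CRT offers one.
   Returns 1 if the mode was changed, 0 otherwise. */
static int enable_unicode_stdout(void)
{
#if defined(_WIN32) && defined(_O_U16TEXT)
    /* _setmode returns the previous mode, or -1 on failure. */
    return _setmode(_fileno(stdout), _O_U16TEXT) != -1;
#else
    return 0;   /* nothing to do outside the Microsoft CRT */
#endif
}

/* Print a wide string followed by a newline using the wide-character
   stdio functions. Returns 1 on success, 0 on failure. */
static int print_wide_line(const wchar_t *s)
{
    return fputws(s, stdout) >= 0 && fputwc(L'\n', stdout) != WEOF;
}
```

Once a stream is in one of the Unicode modes, the Microsoft CRT expects the wide-character functions (fputws, wprintf and so on) on that stream, which is why the sketch avoids printf.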

David

--
David Sankel
Sankel Software
www.sankelsoftware.com
585 617 4748 (Office)