
On Sat, Aug 25, 2007 at 05:43:53AM -0400, Gwern Branwen wrote:
However, while it works fine for your basic bread and butter ASCII characters, I noticed that it does terrible things to more exotic phrases involving Unicode, such as "Henri Poincaré". I borrowed some code from utf-string, and that improved it a little bit - "Henri Poincaré" now becomes "Henri Poincarý" which is still better than "Henri Poincar\245" or whatever.
Well, it is not better: Poincar\245 is right, the other is wrong, since part of the character has been truncated.
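To make the truncation concrete, here is a minimal sketch. It assumes the lossy step keeps only the low 8 bits of each code point; that is an assumption about the byte-oriented path, not something confirmed in this thread. The name truncateChar is hypothetical.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: keep only the low 8 bits of a code point,
-- as a byte-oriented interface might when handed one Char per byte.
-- (Assumption for illustration; fromIntegral wraps modulo 256.)
truncateChar :: Char -> Word8
truncateChar = fromIntegral . ord

-- Apply the lossy step to every character of a String.
truncateString :: String -> [Word8]
truncateString = map truncateChar
```

Loaded into GHCi, truncateString "\1055\1088\1080" (the code points U+041F, U+0440, U+0438) gives [31,64,56]: the high byte of each character is gone, and nothing downstream can recover it.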
As far as I can tell, whenever a String involving UTF-8 stuff leaves the Haskell environment, it gets messed up. This is a little hard to test since things get messed up even when I test them in GHCi. :) But I'm sure it has to be something to do with Haskell, since I know I can copy and paste such strings with no problems using the mouse, and I know that my shell isn't the problem and nor are the Surfraw programs I use to pass them to Firefox (and Firefox obviously has no problems handling those characters).
The problem is not Haskell, it's me (and you..;-): actually, while I have a clear understanding of Unicode and UTF-8 in general, I don't know how to handle multi-byte characters in any language that is not PHP...;-) I hope Mats Jansborg is still reading the mailing list: hopefully he could give us some direction.
Any ideas on how to fix this?
-----------------------------------------------------------------------------
-- |
-- Module : XMonadContrib.XSelection
-- Copyright : (C) 2007 Andrea Rossato
Thanks for the gift, but copyright should belong to you, since you wrote it...;-)

If I test hxsel with the first 3 Cyrillic characters of this page:
http://gorgias.mine.nu/unicode.php
I get:
\u041f\u0440\u0438
which is the correct answer. The problem is: how can I convert these Unicode characters into something that can be printed?

Andrea
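One possible direction for the question above, offered as a minimal sketch rather than a tested fix for XSelection: encode each code point as UTF-8 bytes by hand before the string leaves Haskell. The names encodeChar, encodeString and toByteChars are hypothetical. The sketch also assumes the input contains no surrogate code points (U+D800 to U+DFFF), which it would encode without complaint even though they are not valid in UTF-8.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode one code point as its UTF-8 byte sequence (1 to 4 bytes).
-- Assumption: the Char is not a surrogate; no validation is done.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading-byte payload: the bits above position s.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits starting at position s.
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

-- Encode a whole String as a list of UTF-8 bytes.
encodeString :: String -> [Word8]
encodeString = concatMap encodeChar

-- Repackage the bytes as a String with one Char per byte, for
-- interfaces that write only the low 8 bits of each Char.
toByteChars :: String -> String
toByteChars = map (chr . fromIntegral) . encodeString
```

In GHCi, encodeString "\1055\1088\1080" gives [208,159,209,128,208,184], i.e. D0 9F D1 80 D0 B8, the UTF-8 bytes for those three Cyrillic letters. Whether toByteChars is needed depends on how the output path treats Char: if it writes one byte per Char, passing toByteChars s should emit valid UTF-8, while an output path that already encodes according to the locale would double-encode it.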