question about GHC and Unicode

In GHC there's a GHC.Unicode library, but for a string such as "ΧΑΟΣΣ", a GHC compiled program prints it as a string of unknown characters, and in the interpreter, the string evaluates to a string of escape sequences instead of displaying properly. Is there a way to get/activate unicode support in GHC?

zefria:
In GHC there's a GHC.Unicode library, but for a string such as "ΧΑΟΣΣ", a GHC compiled program prints it as a string of unknown characters, and in the interpreter, the string evaluates to a string of escape sequences instead of displaying properly.
Is there a way to get/activate unicode support in GHC?
GHC supports unicode internally, and String and Char are all unicode.
To do unicode IO however, you need to use the utf8-string package:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string
Just import and use IO functions from System.IO.UTF8, and that's it!
-- Don
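To illustrate the point that Strings are Unicode internally and only the I/O layer needs an encoding step, here is a minimal sketch that encodes a String to UTF-8 by hand and writes the bytes to a handle in binary mode. It is not the utf8-string implementation; the names encodeChar, encodeUtf8 and putStrLnUtf8 are hypothetical, only base is used, and the encoder does no validation (surrogate code points, for instance, are not rejected).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import System.IO (hPutStr, hSetBinaryMode, stdout)

-- Encode one code point as UTF-8; each byte is kept as an Int in 0..255.
-- Hypothetical helper for illustration only, with no validation.
encodeChar :: Char -> [Int]
encodeChar c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18)
                  , cont (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Encode a whole String as a list of UTF-8 byte values.
encodeUtf8 :: String -> [Int]
encodeUtf8 = concatMap encodeChar

-- Write a line as UTF-8 bytes; assumes stdout is already in binary mode,
-- so that each Char below 256 is written as exactly one byte.
putStrLnUtf8 :: String -> IO ()
putStrLnUtf8 s = hPutStr stdout (map chr (encodeUtf8 (s ++ "\n")))

main :: IO ()
main = do
  hSetBinaryMode stdout True
  putStrLnUtf8 "ΧΑΟΣΣ"
```

On a terminal that interprets output as UTF-8, this should display the Greek text rather than unknown characters. The escape sequences seen in the interpreter come from show, which escapes non-ASCII characters, and not from the String itself.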

On Sun, Apr 27, 2008 at 1:02 PM, Don Stewart wrote:
GHC supports unicode internally, and String and Char are all unicode.
To do unicode IO however, you need to use the utf8-string package:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string
Just import and use IO functions from System.IO.UTF8, and that's it!
crap... I was hoping for a switch or something. The program is supposed to be an interpreter that includes unicode IO support, and I wanted to use readline or editline or similar for it. Is there any library you know of that would have that kind of an ability while still using unicode? Between readline and unicode, unicode is the more important, but a readline ability would still be very nice.

zefria:
On Sun, Apr 27, 2008 at 1:02 PM, Don Stewart wrote:
GHC supports unicode internally, and String and Char are all unicode.
To do unicode IO however, you need to use the utf8-string package:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string
Just import and use IO functions from System.IO.UTF8, and that's it!
crap... I was hoping for a switch or something.
The program is supposed to be an interpreter that includes unicode IO support, and I wanted to use readline or editline or similar for it. Is there any library you know of that would have that kind of an ability while still using unicode? Between readline and unicode, unicode is the more important, but a readline ability would still be very nice.
Ian Lynagh wrote a pure haskell readline implementation a while ago, that would be trivial to change to use System.IO.UTF8.getLine -- or you could write your own (basic history/readline editing is fairly simple). -- Don
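As a rough illustration of why basic line editing is fairly simple, here is a minimal sketch of an edit buffer as a pure function over key events. The types Line and Key and the functions step and render are hypothetical names, not taken from any existing readline implementation; reading raw keys from the terminal and keeping history are left out.

```haskell
-- Edit buffer as a zipper: the characters before the cursor (stored
-- reversed) and the characters after it. Hypothetical types for illustration.
data Line = Line String String
  deriving (Eq, Show)

data Key = Insert Char | Backspace | MoveLeft | MoveRight
  deriving (Eq, Show)

emptyLine :: Line
emptyLine = Line "" ""

-- Apply one key event to the buffer. Keys that cannot apply
-- (for example Backspace at the start of the line) leave it unchanged.
step :: Key -> Line -> Line
step (Insert c) (Line l r)       = Line (c : l) r
step Backspace  (Line (_ : l) r) = Line l r
step MoveLeft   (Line (c : l) r) = Line l (c : r)
step MoveRight  (Line l (c : r)) = Line (c : l) r
step _          ln               = ln

-- The text currently held in the buffer.
render :: Line -> String
render (Line l r) = reverse l ++ r

-- Run a sequence of key events from an empty buffer.
runKeys :: [Key] -> String
runKeys = render . foldl (flip step) emptyLine

main :: IO ()
main = putStrLn (runKeys (map Insert "helo" ++ [MoveLeft, Insert 'l']))
```

Because a Haskell Char is a full code point, the cursor moves and deletions here operate on whole characters rather than on bytes, so the same logic applies to non-ASCII input once it has been decoded.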

Hello Ian, Sunday, April 27, 2008, 11:39:03 PM, you wrote:
Ian Lynagh wrote a pure haskell readline implementation a while ago,
I think that was Malcolm Wallace.
It may become even more interesting if Malcolm thinks that it was Don :))
-- Best regards, Bulat Bulat.Ziganshin@gmail.com

Ian Lynagh wrote a pure haskell readline implementation a while ago,
I think that was Malcolm Wallace.
... and it is part of the readline package! See System.Console.SimpleLineEditor. Needless to say, it is far from perfect, but does give some basic facilities. There are lots of ways it could be extended.
(I haven't checked, but I really hope the cabal mechanisms for configuring the readline package do not require a working actual-readline installation, because SimpleLineEditor is explicitly designed to work around its absence! It is supposed to provide a uniform interface regardless of whether readline is there.)
Regards, Malcolm

On Sun, 2008-04-27 at 21:44 +0100, Malcolm Wallace wrote:
Ian Lynagh wrote a pure haskell readline implementation a while ago,
I think that was Malcolm Wallace.
... and it is part of the readline package! See System.Console.SimpleLineEditor. Needless to say, it is far from perfect, but does give some basic facilities. There are lots of ways it could be extended.
(I haven't checked, but I really hope the cabal mechanisms for configuring the readline package do not require a working actual-readline installation, because SimpleLineEditor is explicitly designed to work around its absence! It is supposed to provide a uniform interface regardless of whether readline is there.)
It's certainly possible, though from a quick look at the .cabal description and configure.ac script it's not quite clear whether it'll do what you want.
Duncan

ah ha, yes. I seem to have found it here: http://www.haskell.org/pipermail/glasgow-haskell-bugs/2001-April/000401.html I'll use this as a base. Many thanks to the both of you. I'll be sure to submit it to Hackage for others to enjoy if I end up adding some capability to it.

zefria:
ah ha, yes. I seem to have found it here: http://www.haskell.org/pipermail/glasgow-haskell-bugs/2001-April/000401.html
I'll use this as a base. Many thanks to the both of you. I'll be sure to submit it to Hackage for others to enjoy if I end up adding some capability to it.
yeah, a pure unicode-readline would be an awesome contribution! -- Don

On Sun April 27 2008 2:02:25 pm Don Stewart wrote:
zefria:
In GHC there's a GHC.Unicode library, but for a string such as "ΧΑΟΣΣ", a GHC compiled program prints it as a string of unknown characters, and in the interpreter, the string evaluates to a string of escape sequences instead of displaying properly.
Is there a way to get/activate unicode support in GHC?
GHC supports unicode internally, and String and Char are all unicode.
To do unicode IO however, you need to use the utf8-string package:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string
Just import and use IO functions from System.IO.UTF8, and that's it!
That's a wonderful interface, but unfortunately it appears to assume that your Unicode I/O is always UTF-8, and never UTF-16. I happen to deal with more UTF-16 data than UTF-8 over here at the moment. (Did I mention UTF-7 or UTF-EBCDIC? horrors...) There is a Haskell binding for iconv for those that need UTF-16, but it doesn't appear to have as convenient an interface. -- John

John Goerzen wrote:
That's a wonderful interface, but unfortunately it appears to assume that your Unicode I/O is always UTF-8, and never UTF-16. I happen to deal with more UTF-16 data than UTF-8 over here at the moment.
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/encoding seems to have UTF-16.
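For comparison with UTF-8, here is a minimal sketch of how a String maps to UTF-16 code units, including surrogate pairs for code points above U+FFFF. It does not use the encoding package's API; encodeUtf16 is a hypothetical name, byte order (UTF-16LE vs UTF-16BE) and the byte order mark are not handled, and only base is assumed.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- Encode a String as UTF-16 code units. Hypothetical helper for
-- illustration; it does not choose a byte order or emit a BOM.
encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap units
  where
    units c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))  -- high surrogate
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]  -- low surrogate
      where
        n = ord c
        m = n - 0x10000

main :: IO ()
main = print (encodeUtf16 "ΧΑΟΣΣ")
```

Each of the Greek letters in the original example fits in a single 16-bit unit, whereas the same letters take two bytes each in UTF-8; which form is more convenient depends on the data being exchanged.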

On Apr 29, 2008, at 7:19 PM, Albert Y. C. Lai wrote:
John Goerzen wrote:
That's a wonderful interface, but unfortunately it appears to assume that your Unicode I/O is always UTF-8, and never UTF-16. I happen to deal with more UTF-16 data than UTF-8 over here at the moment.
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/encoding seems to have UTF-16.
Just as a comment on the original comment: maybe it's just an unfortunate choice of phrasing, but is it really *that* surprising that a package called "utf8-string" assumes UTF-8 encoding?

On Wed April 30 2008 11:30:05 am Brandon S. Allbery KF8NH wrote:
On Apr 29, 2008, at 7:19 PM, Albert Y. C. Lai wrote:
John Goerzen wrote:
That's a wonderful interface, but unfortunately it appears to assume that your Unicode I/O is always UTF-8, and never UTF-16. I happen to deal with more UTF-16 data than UTF-8 over here at the moment.
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/encoding seems to have UTF-16.
Just as a comment on the original comment: maybe it's just an unfortunate choice of phrasing, but is it really *that* surprising that a package called "utf8-string" assumes UTF-8 encoding?
Nope, never said it was. I was just pointing out that the OP (IIRC) asked for Unicode in general, and didn't ask for UTF-8 specifically... and that UTF-8 is just part of that bigger picture.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (9): Albert Y. C. Lai, Brandon S. Allbery KF8NH, Bulat Ziganshin, Daniel Gee, Don Stewart, Duncan Coutts, Ian Lynagh, John Goerzen, Malcolm Wallace