How to work around GHC bug

Hi!
The following program demonstrates a (another) GHC bug. The command line isn't properly decoded to Unicode.
arg.hs------------------------------
import System
main = do
  [a] <- getArgs
  putStrLn (show a)
--------------------------------------
When called like this:
./arg ä
The program will output this:
"\195\164"
I'll report this as a bug in the GHC Trac. But for now, I need to work around the problem somehow. The encoders in GHC.IO.Encoding all work on buffers. How do I recode the command line, in order to get proper Unicode strings?
Bye
Volker

On 14 March 2012 15:08, Ozgur Akgun wrote:
On 14 March 2012 13:51, Volker Wysk wrote:
import System
main = do
  [a] <- getArgs
  putStrLn (show a)
a here is already of type String. If you don't call show on it, it'll do the expected thing.
He means that the UTF-8 encoded string passed to the program should be decoded into Unicode code points, one per Char. So print (length a) should output 1 if the argument were decoded, but it actually outputs 2. You can't use this string properly, because there is no Char containing the ä. See?
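To make the point about length concrete, here is a minimal sketch of decoding such a string by hand, using only base. It assumes every Char of the input holds one raw byte (0..255), which is what the affected getArgs returned. The name decodeUtf8Bytes is made up for this illustration, and the decoder is deliberately simplified: it replaces malformed input with U+FFFD but does not reject overlong encodings.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- | Hypothetical helper: treat each Char as one UTF-8 byte and
-- decode the byte sequence into Unicode code points.
-- Malformed sequences are replaced with U+FFFD.
decodeUtf8Bytes :: String -> String
decodeUtf8Bytes = go . map ord
  where
    go [] = []
    go (b : bs)
      | b < 0x80           = chr b : go bs               -- plain ASCII
      | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F) bs     -- 2-byte sequence
      | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F) bs     -- 3-byte sequence
      | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07) bs     -- 4-byte sequence
      | otherwise          = '\xFFFD' : go bs            -- stray byte

    -- Consume n continuation bytes and combine their low 6 bits.
    multi n acc bs =
      let (cont, rest) = splitAt n bs
      in if length cont == n && all isCont cont
           then safeChr (foldl step acc cont) : go rest
           else '\xFFFD' : go bs

    step a c = (a `shiftL` 6) .|. (c .&. 0x3F)
    isCont c = c .&. 0xC0 == 0x80

    -- chr fails above U+10FFFF, so guard against it.
    safeChr n
      | n <= 0x10FFFF = chr n
      | otherwise     = '\xFFFD'
```

With this, length (decodeUtf8Bytes "\195\164") is 1, and the single Char is '\228' (ä).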

On Wednesday, 14 March 2012, 15:08:38, you wrote:
On 14 March 2012 13:51, Volker Wysk wrote:
import System
main = do
  [a] <- getArgs
  putStrLn (show a)
a here is already of type String. If you don't call show on it, it'll do the expected thing.
Try:
main = do
  [a] <- getArgs
  putStrLn a
That's not true. The result is:
./tmp $ ./arg ä
ä
Bye
Volker

Quoth Volker Wysk,
I'll report this as a bug in the GHC Trac. But for now, I need to work around the problem somehow. The encoders in GHC.IO.Encoding all work on buffers. How do I recode the command line, in order to get proper Unicode strings?
Data.Text might work for you. I'm not guaranteeing that you'll get a "proper Unicode string" out of this, but you'll get a String with one (LATIN-1) value per character:
import qualified Data.ByteString.Char8 as P
import Data.Text.Encoding (decodeUtf8)
import qualified Data.Text as T
let arghs = ["\195\164"]
let args = map (T.unpack . decodeUtf8 . P.pack) arghs
Donn
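The snippet above can be wrapped into a small reusable helper. The following is only a sketch along the same lines, assuming the bytestring and text packages are available. The names recodeArg and getArgsUtf8 are made up for illustration, and decodeUtf8With lenientDecode is swapped in for decodeUtf8 so that invalid input yields U+FFFD instead of an exception.

```haskell
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)
import System.Environment (getArgs)

-- | Hypothetical helper: reinterpret a String whose Chars are raw
-- UTF-8 bytes as proper Unicode text. B8.pack keeps only the low
-- 8 bits of each Char, so this is only meaningful for byte-valued input.
recodeArg :: String -> String
recodeArg = T.unpack . decodeUtf8With lenientDecode . B8.pack

-- | Hypothetical wrapper: fetch the command line and recode every argument.
getArgsUtf8 :: IO [String]
getArgsUtf8 = fmap (map recodeArg) getArgs
```

Note that this should only be applied where getArgs returns undecoded bytes. Applying it to arguments that are already decoded would mangle any Char above 255, because B8.pack truncates it.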

On Wednesday, 14 March 2012, 16:19:33, you wrote:
Quoth Volker Wysk,
I'll report this as a bug in the GHC Trac. But for now, I need to work around the problem somehow. The encoders in GHC.IO.Encoding all work on buffers. How do I recode the command line, in order to get proper Unicode strings?
Data.Text might work for you. I'm not guaranteeing that you'll get a "proper Unicode string" out of this, but you'll get a String with one (LATIN-1) value per character:
import qualified Data.ByteString.Char8 as P
import Data.Text.Encoding (decodeUtf8)
import qualified Data.Text as T
let arghs = ["\195\164"]
let args = map (T.unpack . decodeUtf8 . P.pack) arghs
Yes, this works! Thanks!
Volker

On Wed, 14 Mar 2012 14:51:59 +0100, Volker Wysk wrote:
Hi!
The following program demonstrates a (another) GHC bug. The command line isn't properly decoded to Unicode.
arg.hs------------------------------
import System
main = do
  [a] <- getArgs
  putStrLn (show a)
--------------------------------------
When called like this:
./arg ä
The program will output this:
"\195\164"
First, there is no need for 'show'. It tries to serialize the value in a portable way, which is why the bytes appear as escapes. 'putStrLn a' should do the right thing (but see below).
Second, this was fixed in ghc-7.2+. System.Environment now returns properly decoded Strings instead of the byte-encoded oddity it returned before, so you can use 'putStrLn a' there. Previous versions would require you to recode the result into a proper String.
--
Sergei
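For code that has to build on both sides of that boundary, one option is to make the recoding conditional on the compiler version. This is only a sketch: it takes the 7.2 boundary from the message above at face value, uses the standard __GLASGOW_HASKELL__ CPP macro (702 for GHC 7.2), and borrows the Data.Text-based recoding suggested earlier in the thread for the old branch. The names fixArg and getArgsUnicode are made up for illustration.

```haskell
{-# LANGUAGE CPP #-}
import System.Environment (getArgs)
#if __GLASGOW_HASKELL__ < 702
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8)
#endif

-- | Hypothetical helper: turn one argument into a proper Unicode String.
-- On GHC 7.2 and later the argument is assumed to be decoded already,
-- so it is passed through unchanged. On older compilers the Chars are
-- treated as raw UTF-8 bytes and decoded.
fixArg :: String -> String
#if __GLASGOW_HASKELL__ >= 702
fixArg = id
#else
fixArg = T.unpack . decodeUtf8 . B8.pack
#endif

-- | Hypothetical replacement for getArgs that behaves the same
-- on both sides of the version boundary.
getArgsUnicode :: IO [String]
getArgsUnicode = fmap (map fixArg) getArgs
```

The old branch assumes the command line really is UTF-8; decodeUtf8 throws on input that is not.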
participants (5)
- Christopher Done
- Donn Cave
- Ozgur Akgun
- Sergei Trofimovich
- Volker Wysk