How to get program command line arguments in Unicode-aware way (Unix/Linux)?

Hi, I am trying to process command line arguments that may contain Unicode (cyrillic in this example) characters. The standard GHC's getArgs seems to pass whatever was obtained from the underlying C library without any regard to encoding, e. g the following program (testarg.hs): module Main where import System.Environment main = do x <- getArgs mapM (putStrLn . show) x being invoked (ghc 6.10.1) runghc testarg -T 'при<в>ет' prints the following: "-T" "\208\191\209\128\208\184<\208\178>\208\181\209\130" (not correct, all bytes were passed without proper encoding) Is there any way to get program arguments in GHC Unicode-aware? Or at least assuming that they are always in UTF-8? Something like System.IO.UTF8, but for command line arguments? Thanks. PS: BTW runhugs testarg -T 'при<в>ет' prints: "-T" "\1087\1088\1080<\1074>\1077\1090" which is correct. -- Dimitry Golubovsky Anywhere on the Web

Excerpts from Dimitry Golubovsky's message of Wed Mar 11 21:42:14 -0500 2009:
Hi,
I am trying to process command line arguments that may contain Unicode (cyrillic in this example) characters.
The standard GHC's getArgs seems to pass whatever was obtained from the underlying C library without any regard to encoding, e. g the following program (testarg.hs):
module Main where
import System.Environment
main = do x <- getArgs mapM (putStrLn . show) x
being invoked (ghc 6.10.1)
runghc testarg -T 'при<в>ет'
prints the following:
"-T" "\208\191\209\128\208\184<\208\178>\208\181\209\130"
(not correct, all bytes were passed without proper encoding)
Is there any way to get program arguments in GHC Unicode-aware? Or at least assuming that they are always in UTF-8? Something like System.IO.UTF8, but for command line arguments?
Thanks.
PS: BTW runhugs testarg -T 'при<в>ет' prints:
"-T" "\1087\1088\1080<\1074>\1077\1090"
which is correct.
Hello, Would this approach work using utf8-string? import Codec.Binary.UTF8.String import System.Environment import Control.Monad main = do x <- liftM (map decodeString) getArgs mapM_ (putStrLn . encodeString) x Austin

Hello Dimitry, Thursday, March 12, 2009, 5:42:14 AM, you wrote: depends on your OS. for windows i use this code: myGetArgs = do alloca $ \p_argc -> do p_argv_w <- commandLineToArgvW getCommandLineW p_argc argc <- peek p_argc argv_w <- peekArray (i argc) p_argv_w mapM peekTString argv_w >>== tail foreign import stdcall unsafe "windows.h GetCommandLineW" getCommandLineW :: LPTSTR foreign import stdcall unsafe "windows.h CommandLineToArgvW" commandLineToArgvW :: LPCWSTR -> Ptr CInt -> IO (Ptr LPWSTR) note that it doesn't skip over +RTS sections. btw, i plan to make unicode-aware version of System.Directory module to solve all these problems
I am trying to process command line arguments that may contain Unicode (cyrillic in this example) characters.
The standard GHC's getArgs seems to pass whatever was obtained from the underlying C library without any regard to encoding, e. g the following program (testarg.hs):
module Main where
import System.Environment
main = do x <- getArgs mapM (putStrLn . show) x
being invoked (ghc 6.10.1)
runghc testarg -T 'при<в>ет'
prints the following:
"-T" "\208\191\209\128\208\184<\208\178>\208\181\209\130"
(not correct, all bytes were passed without proper encoding)
Is there any way to get program arguments in GHC Unicode-aware? Or at least assuming that they are always in UTF-8? Something like System.IO.UTF8, but for command line arguments?
Thanks.
PS: BTW runhugs testarg -T 'при<в>ет' prints:
"-T" "\1087\1088\1080<\1074>\1077\1090"
which is correct.
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (3)
-
Austin Seipp
-
Bulat Ziganshin
-
Dimitry Golubovsky