hGetContents Illegal byte sequence / ghc-pkg

Hello café, I have a program that is crashing, and I have no idea why: module Main where import System.Process (readProcessWithExitCode) main :: IO () main = do _ <- readProcessWithExitCode "ghc-pkg" ["describe", "hoopl"] "" putStrLn "Should never get here" this is using the process package from hackage. The program crashes with minimal-test: fd:5: hGetContents: invalid argument (invalid byte sequence) minimal-test: thread blocked indefinitely in an MVar operation inspecting the source of readProcessWithExitCode yields an obvious explanation to the MVar problem, but I don't understand why hGetContents is so offended. For the lazy it is defined as follows: readProcessWithExitCode :: FilePath -- ^ command to run -> [String] -- ^ any arguments -> String -- ^ standard input -> IO (ExitCode,String,String) -- ^ exitcode, stdout, stderr readProcessWithExitCode cmd args input = do (Just inh, Just outh, Just errh, pid) <- createProcess (proc cmd args){ std_in = CreatePipe, std_out = CreatePipe, std_err = CreatePipe } outMVar <- newEmptyMVar -- fork off a thread to start consuming stdout out <- hGetContents outh _ <- forkIO $ C.evaluate (length out) >> putMVar outMVar () -- fork off a thread to start consuming stderr err <- hGetContents errh _ <- forkIO $ C.evaluate (length err) >> putMVar outMVar () -- now write and flush any input when (not (null input)) $ do hPutStr inh input; hFlush inh hClose inh -- done with stdin -- wait on the output takeMVar outMVar takeMVar outMVar hClose outh hClose errh -- wait on the process ex <- waitForProcess pid return (ex, out, err) Now having looked at the source of ghc-pkg it is dumping it's output using putStr and friends, so that should be using my local encoding on the system, right? and so should hGetContents in my program..? Now, for the curious: the reason I care is that this problem has effectively prevented me from using virthualenv. Sadness and woe.

On Sat, Aug 11, 2012 at 12:13:45PM +0100, Benjamin Edwards wrote:
Hello café,
I have a program that is crashing, and I have no idea why:
module Main where
import System.Process (readProcessWithExitCode)
main :: IO () main = do _ <- readProcessWithExitCode "ghc-pkg" ["describe", "hoopl"] "" putStrLn "Should never get here"
this is using the process package from hackage. The program crashes with
minimal-test: fd:5: hGetContents: invalid argument (invalid byte sequence) minimal-test: thread blocked indefinitely in an MVar operation
inspecting the source of readProcessWithExitCode yields an obvious explanation to the MVar problem, but I don't understand why hGetContents is so offended.
The 'invalid argument' error from hGetContents indicates that a wrong encoding is being assumed. I don't know enough about how putStr/hGetContents decide on an encoding, but in any case it works for me (that is, it prints "Should never get here"). The likely sticking point is that one of the authors of hoopl, João Dias, has a name which contains U+00E3: LATIN SMALL LETTER A WITH TILDE. Try doing ghc-pkg describe hoopl > hoopl.txt file hoopl.txt to get an indication of what encoding is being used, or manually take a look at the bytes being generated using ghc-pkg describe hoopl | hexdump -C I don't know what the solution is but at least this should give some additional information. -Brent

On Sat, Aug 11, 2012 at 7:13 AM, Benjamin Edwards
inspecting the source of readProcessWithExitCode yields an obvious explanation to the MVar problem, but I don't understand why hGetContents is so offended.
I think last time I looked into this there was no normalization of package.conf data; if it was read in in a particular encoding, it was stored in that encoding and you can get an exception trying to dump it in a different encoding. Certainly I have found ISO8859-1 encoded text in my package.conf.d/* files, when I would expect (and was configured for) UTF-8. This also suggests that ISO8859-1 text in a *.cabal file would remain ISO8859-1 in the resulting package.conf file even if UTF-8 encoding was active at the time. -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

I had this same problem a couple weeks ago when trying to install
virthualenv and I don't really understand it got into a bad state, but the
way I solved it was by fixing the locale settings on my gentoo machine so
that I'm using UTF8. That just involved a few changes in /etc and then the
problem went away.
On Sat, Aug 11, 2012 at 7:13 AM, Benjamin Edwards
Hello café,
I have a program that is crashing, and I have no idea why:
module Main where
import System.Process (readProcessWithExitCode)
main :: IO () main = do _ <- readProcessWithExitCode "ghc-pkg" ["describe", "hoopl"] "" putStrLn "Should never get here"
this is using the process package from hackage. The program crashes with
minimal-test: fd:5: hGetContents: invalid argument (invalid byte sequence) minimal-test: thread blocked indefinitely in an MVar operation
inspecting the source of readProcessWithExitCode yields an obvious explanation to the MVar problem, but I don't understand why hGetContents is so offended.
For the lazy it is defined as follows:
readProcessWithExitCode :: FilePath -- ^ command to run -> [String] -- ^ any arguments -> String -- ^ standard input -> IO (ExitCode,String,String) -- ^ exitcode, stdout, stderr readProcessWithExitCode cmd args input = do (Just inh, Just outh, Just errh, pid) <- createProcess (proc cmd args){ std_in = CreatePipe, std_out = CreatePipe, std_err = CreatePipe }
outMVar <- newEmptyMVar
-- fork off a thread to start consuming stdout out <- hGetContents outh _ <- forkIO $ C.evaluate (length out) >> putMVar outMVar ()
-- fork off a thread to start consuming stderr err <- hGetContents errh _ <- forkIO $ C.evaluate (length err) >> putMVar outMVar ()
-- now write and flush any input when (not (null input)) $ do hPutStr inh input; hFlush inh hClose inh -- done with stdin
-- wait on the output takeMVar outMVar takeMVar outMVar hClose outh hClose errh
-- wait on the process ex <- waitForProcess pid
return (ex, out, err)
Now having looked at the source of ghc-pkg it is dumping it's output using putStr and friends, so that should be using my local encoding on the system, right? and so should hGetContents in my program..?
Now, for the curious: the reason I care is that this problem has effectively prevented me from using virthualenv. Sadness and woe.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Sat, Aug 11, 2012 at 4:13 AM, Benjamin Edwards
Hello café,
I have a program that is crashing, and I have no idea why:
module Main where
import System.Process (readProcessWithExitCode)
main :: IO () main = do _ <- readProcessWithExitCode "ghc-pkg" ["describe", "hoopl"] "" putStrLn "Should never get here"
this is using the process package from hackage. The program crashes with
minimal-test: fd:5: hGetContents: invalid argument (invalid byte sequence) minimal-test: thread blocked indefinitely in an MVar operation
inspecting the source of readProcessWithExitCode yields an obvious explanation to the MVar problem, but I don't understand why hGetContents is so offended.
For the lazy it is defined as follows:
readProcessWithExitCode :: FilePath -- ^ command to run -> [String] -- ^ any arguments -> String -- ^ standard input -> IO (ExitCode,String,String) -- ^ exitcode, stdout, stderr readProcessWithExitCode cmd args input = do (Just inh, Just outh, Just errh, pid) <- createProcess (proc cmd args){ std_in = CreatePipe, std_out = CreatePipe, std_err = CreatePipe }
outMVar <- newEmptyMVar
-- fork off a thread to start consuming stdout out <- hGetContents outh _ <- forkIO $ C.evaluate (length out) >> putMVar outMVar ()
-- fork off a thread to start consuming stderr err <- hGetContents errh _ <- forkIO $ C.evaluate (length err) >> putMVar outMVar ()
-- now write and flush any input when (not (null input)) $ do hPutStr inh input; hFlush inh hClose inh -- done with stdin
-- wait on the output takeMVar outMVar takeMVar outMVar hClose outh hClose errh
-- wait on the process ex <- waitForProcess pid
return (ex, out, err)
Now having looked at the source of ghc-pkg it is dumping it's output using putStr and friends, so that should be using my local encoding on the system, right? and so should hGetContents in my program..?
Now, for the curious: the reason I care is that this problem has effectively prevented me from using virthualenv. Sadness and woe.
I would recommend using ByteStrings. There is a link to a version of readProcessWithExitCode that uses ByteString instead of String here: http://www.haskell.org/pipermail/libraries/2012-August/018263.html
participants (5)
-
Benjamin Edwards
-
Brandon Allbery
-
Brent Yorgey
-
David Fox
-
David McBride