
I am just coming to Haskell, and I wrote a simple command to get some input from a pdf file. I just wanted the output of the command, so I did something like

    import System.Process (runInteractiveCommand)
    import System.IO (hGetContents)

    -- | returns the text of the first page of the pdf at the given path, needs pdftotext
    getTextOfPdf :: String -> IO String
    getTextOfPdf pdfPath = do
        (inp,out,err,pid) <- runInteractiveCommand ("pdftotext -l 1 "++pdfPath++" -")
        hGetContents out

I don't care about error handling; if something goes wrong it is OK to hang or crash. But knowing Unix, I wondered whether this would do the right thing or create a zombie process.

I was about to ask, but then I thought "let's test it", and sure enough the zombie stays there. I even tried to allocate more than one, and, wait, I even managed to exhaust the resources of my machine...

So here is what I would have liked to happen: when the pid gets garbage collected it tries to wait for the process, and if that fails the pid stays around longer and tries to wait again later.

Too difficult? I don't know, but it is what I had expected from Haskell. Failing that, I would have expected clear hints in the documentation that one should wait for external processes, and I found none.

So what is the way out? I could do a forkIO and wait for the process, but then I wonder: do I have to wait for the thread, or do dead threads become zombies too? If that is the case, then the only way out would be to also give back the pid and make the waiting the responsibility of the caller; not very nice, but probably the real solution. I have seen that in MissingH there is a pipeout example, but it seems to me that you are still responsible for waiting for the process (with ensureSuccess).

Maybe the correct thing is not being able to ignore the return code of the process...

But now I am becoming suspicious of other things. For example, a file handle: do you need to close it, or can you expect that it will be closed when the handle is garbage collected? Are there other places where you need to pay attention to be sure that you are releasing the resources you acquired? I suppose problems can come only when an external resource is involved, or not?

So

1) The documentation should specify whether one has to take some specific action to free resources; can someone fix this?
2) Is there a clean (Haskell ;) way to deal with this?
3) Are there other places, apart from external processes, where you have to pay attention?

What is your wisdom...

thanks
Fawzi
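
One possible shape for question 2, offered as a minimal sketch and not as an established recipe: `bracket` from Control.Exception can pair starting the process with its cleanup, so the handles are closed and the child is waited for even if the reading code throws. The names `withCommandOutput` and `outputLength` are made up for illustration. The action has to consume what it needs from the handle before it returns, because the handle is closed afterwards.

```haskell
import Control.Exception (bracket, evaluate)
import System.IO (Handle, hClose, hGetContents)
import System.Process (runInteractiveCommand, waitForProcess)

-- | Run a shell command, pass its stdout handle to an action, and always
-- clean up afterwards. (Hypothetical helper, not part of any library.)
withCommandOutput :: String -> (Handle -> IO a) -> IO a
withCommandOutput cmd action =
  bracket (runInteractiveCommand cmd) release (\(_, out, _, _) -> action out)
  where
    release (inp, out, err, pid) = do
      -- Closing an already closed handle is harmless.
      mapM_ hClose [inp, out, err]
      -- Reap the child so that it does not stay around as a zombie.
      -- This blocks until the child has exited.
      _ <- waitForProcess pid
      return ()

-- | Example use: count the characters a command prints.
outputLength :: String -> IO Int
outputLength cmd = withCommandOutput cmd $ \h -> do
  s <- hGetContents h
  evaluate (length s)  -- force the whole output before the handle is closed
```

The price of this design is laziness: nothing can be read from the handle once `withCommandOutput` has returned.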

fmohamed:
I am just coming to Haskell, and I wrote a simple command to get some input from a pdf file
I just wanted the output of the command so I did something like

    import System.Process (runInteractiveCommand)
    import System.IO (hGetContents)

    -- | returns the text of the first page of the pdf at the given path, needs pdftotext
    getTextOfPdf :: String -> IO String
    getTextOfPdf pdfPath = do
        (inp,out,err,pid) <- runInteractiveCommand ("pdftotext -l 1 "++pdfPath++" -")
        hGetContents out

I don't care about error handling, if something goes wrong it is OK to hang or crash, but knowing Unix I wondered whether this would do the right thing or create a zombie process.
I was about to ask, but then I thought "let's test it", and sure enough the zombie stays there. I even tried to allocate more than one, and, wait, I even managed to exhaust the resources of my machine...
So here is what I would have liked to happen: when the pid gets garbage collected it tries to wait for the process, and if that fails the pid stays around longer and tries to wait again later.

Wait for the process to terminate, using

    waitForProcess pid

I've a sketch for a nice wrapper for the low-level process code here:

    http://www.cse.unsw.edu.au/~dons/code/newpopen/

Cheers,
Don

[...] Wait for the process to terminate, using
waitForProcess pid
Thanks for the prompt response, Don. I should have said it, but I knew about

    waitForProcess pid

and I did not want to use it. The reason is the following: if I do

    getTextOfPdf pdfPath = do
        (inp,out,err,pid) <- runInteractiveCommand ("pdftotext -l 1 "++pdfPath++" -")
        waitForProcess pid
        hGetContents out

it deadlocks, because I have not read out yet, and if the pipe buffer is full then the process sleeps, waiting for me to read.

So I can forkIO the wait, but then the question is: do I have to wait for the thread?

Giving back the pid along with out is a solution, but I did not want the caller to know about the detail that I am using an external process to get the data.

Also, I did not want to read everything into memory, as I think you do in your example. OK, I could, but why should I? We are lazy, right?...

Fawzi
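
The deadlock described above comes from waiting before reading. A minimal sketch of the opposite order, assuming it is acceptable to hold the whole output in memory (which is what Fawzi would rather avoid): read stdout to the end first, then call `waitForProcess`. The name `readCommandStrict` is made up for illustration, and stderr is not drained here, so a child that writes a lot to stderr could still block.

```haskell
import Control.Exception (evaluate)
import System.Exit (ExitCode (..))
import System.IO (hClose, hGetContents)
import System.Process (runInteractiveCommand, waitForProcess)

-- | Run a shell command, read all of its stdout, then reap the child.
-- (Hypothetical helper; stderr is ignored, which is an assumption.)
readCommandStrict :: String -> IO (ExitCode, String)
readCommandStrict cmd = do
  (inp, out, err, pid) <- runInteractiveCommand cmd
  hClose inp                   -- the child sees EOF on its stdin
  text <- hGetContents out
  _ <- evaluate (length text)  -- force the whole output BEFORE waiting
  code <- waitForProcess pid   -- reap the child, so no zombie is left
  hClose err
  return (code, text)
```

Because the pipe has been read to the end before the wait, the child can never be stuck writing to a full stdout pipe while the parent sits in `waitForProcess`.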

Wait for the process to terminate, using
waitForProcess pid
I've a sketch for a nice wrapper for the low level process code here,
What's missing? I'd like to use it, but I don't like unreleased libraries :) /David

On Feb 6, 2007, at 8:39 PM, David Waern wrote:
Wait for the process to terminate, using
waitForProcess pid
I've a sketch for a nice wrapper for the low level process code here,
What's missing? I'd like to use it, but I don't like unreleased libraries :)
I would use it just as a tutorial; for real work I would use http://software.complete.org/missingh/static/doc/System-Cmd-Utils.html if you need it.

Fawzi

On 6 Feb 2007, at 22:33, Fawzi Mohamed wrote:
On Feb 6, 2007, at 8:39 PM, David Waern wrote:
Wait for the process to terminate, using
waitForProcess pid
I've a sketch for a nice wrapper for the low level process code here,
What's missing? I'd like to use it, but I don't like unreleased libraries :)
I would use it just as tutorial, for real I would use
http://software.complete.org/missingh/static/doc/System-Cmd-Utils.html
if you need.
I just want to get the output of a command as a String. In this case the newpopen library is the simplest thing.

/David

davve:
Wait for the process to terminate, using
waitForProcess pid
I've a sketch for a nice wrapper for the low level process code here,
What's missing? I'd like to use it, but I don't like unreleased libraries :)
Last time I checked, it needed to be more careful with stdout and stderr, and to have a way to return those values too. There was a useful thread on libraries@ back in December. -- Don

I am replying to myself, but anyway it seems (from the documentation) that

    forkIO (do{ waitForProcess pid; return () })

is the best solution, and it does not seem to lead to wasted resources.

Anyway, I am still interested in knowing whether there are better solutions, or other places where one cannot assume that the garbage collector will reclaim all the resources and one has to call a special function to ensure it.

Fawzi

On Feb 6, 2007, at 2:38 PM, Fawzi Mohamed wrote:
[...]
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
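
The forkIO idea from this message could look roughly like the following sketch. It keeps the output lazy and hides the process from the caller; the name `readCommandLazy` is made up for illustration. Two caveats, stated as assumptions and not as guarantees: stderr is simply left unread, and, as far as I know, the GHC documentation for `waitForProcess` says the program should be built with `-threaded` if the wait is not to block other Haskell threads.

```haskell
import Control.Concurrent (forkIO)
import Control.Monad (void)
import System.IO (hClose, hGetContents)
import System.Process (runInteractiveCommand, waitForProcess)

-- | Return the command's stdout lazily; a background thread reaps the child.
-- (Hypothetical helper; stderr is ignored, which is an assumption.)
readCommandLazy :: String -> IO String
readCommandLazy cmd = do
  (inp, out, _err, pid) <- runInteractiveCommand cmd
  hClose inp                               -- the child sees EOF on its stdin
  _ <- forkIO (void (waitForProcess pid))  -- reap the child when it exits
  hGetContents out                         -- lazy: read as the caller consumes
```

The forked thread itself needs no waiting: a finished Haskell thread is an ordinary heap object and is reclaimed by the garbage collector.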

Hi,
On Tuesday, 06.02.2007, at 15:56 +0100, Fawzi Mohamed wrote:
I am replying to myself, but anyway it seems (from the documentation) that
forkIO (do{ waitForProcess pid; return () })
is the best solution, and does not seem to lead to wasted resources.
This does not work for me. According to strace, the spawned process is in a blocking write, the haskell process is in a blocking waitpid, and I am sure that the program would consume all output. Greetings, Joachim -- Joachim Breitner e-Mail: mail@joachim-breitner.de Homepage: http://www.joachim-breitner.de ICQ#: 74513189

mail:
Hi,
On Tuesday, 06.02.2007, at 15:56 +0100, Fawzi Mohamed wrote:
I am replying to myself, but anyway it seems (from the documentation) that
forkIO (do{ waitForProcess pid; return () })
is the best solution, and does not seem to lead to wasted resources.
This does not work for me. According to strace, the spawned process is in a blocking write, the haskell process is in a blocking waitpid, and I am sure that the program would consume all output.
Any difference with -threaded ? --Don

On Feb 7, 2007, at 1:05 AM, Donald Bruce Stewart wrote:
mail:
Hi,
On Tuesday, 06.02.2007, at 15:56 +0100, Fawzi Mohamed wrote:
I am replying to myself, but anyway it seems (from the documentation) that
forkIO (do{ waitForProcess pid; return () })
is the best solution, and does not seem to lead to wasted resources.
This does not work for me. According to strace, the spawned process is in a blocking write, the haskell process is in a blocking waitpid, and I am sure that the program would consume all output.
Any difference with -threaded ?
Strange, for me it works beautifully, even without -threaded. Maybe your process writes to stderr? If you are not interested in the error output, you could do something like

    forkIO (do{ s <- hGetContents err; length s `seq` return () })

(not tested), and maybe also take the length of the input, to be sure that you have read it all...

Fawzi
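
A slightly fuller sketch of the stderr suggestion above, under the assumption that the error output can simply be thrown away: one background thread reads stderr to the end, so the child can never block on a full stderr pipe, while the main thread reads stdout and then reaps the child. The names `drainInBackground` and `readStdoutDiscardingStderr` are made up for illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Exception (evaluate)
import Control.Monad (void)
import System.IO (Handle, hClose, hGetContents)
import System.Process (runInteractiveCommand, waitForProcess)

-- | Read a handle to EOF in a background thread and discard the data.
drainInBackground :: Handle -> IO ()
drainInBackground h = void . forkIO $ do
  s <- hGetContents h
  void (evaluate (length s))  -- forcing the length reads up to EOF

-- | Return everything the command writes to stdout, ignoring stderr.
readStdoutDiscardingStderr :: String -> IO String
readStdoutDiscardingStderr cmd = do
  (inp, out, err, pid) <- runInteractiveCommand cmd
  hClose inp                   -- the child sees EOF on its stdin
  drainInBackground err        -- keep the stderr pipe from filling up
  text <- hGetContents out
  _ <- evaluate (length text)  -- read all of stdout before waiting
  _ <- waitForProcess pid      -- reap the child
  return text
```

For example, `readStdoutDiscardingStderr "echo oops 1>&2; echo ok"` should return only the stdout line.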

Hi,
On Wednesday, 07.02.2007, at 11:05 +1100, Donald Bruce Stewart wrote:
On Tuesday, 06.02.2007, at 15:56 +0100, Fawzi Mohamed wrote:
I am replying to myself, but anyway it seems (from the documentation) that
forkIO (do{ waitForProcess pid; return () })
is the best solution, and does not seem to lead to wasted resources.
This does not work for me. According to strace, the spawned process is in a blocking write, the haskell process is in a blocking waitpid, and I am sure that the program would consume all output.
Any difference with -threaded ?
Yes: Then it works. Strangely it also depends on which machine the same binary runs: On my desktop, it works even without threaded, but on the server it hangs. Are there any disadvantages of -threaded? Zombie processes are not too bad, after all. Greetings, Joachim -- Joachim Breitner e-Mail: mail@joachim-breitner.de Homepage: http://www.joachim-breitner.de ICQ#: 74513189

Joachim Breitner wrote:
Hi,
On Wednesday, 07.02.2007, at 11:05 +1100, Donald Bruce Stewart wrote:
On Tuesday, 06.02.2007, at 15:56 +0100, Fawzi Mohamed wrote:
I am replying to myself, but anyway it seems (from the documentation) that
forkIO (do{ waitForProcess pid; return () })
is the best solution, and does not seem to lead to wasted resources.
This does not work for me. According to strace, the spawned process is in a blocking write, the haskell process is in a blocking waitpid, and I am sure that the program would consume all output.
Any difference with -threaded ?
Yes: Then it works. Strangely it also depends on which machine the same binary runs: On my desktop, it works even without threaded, but on the server it hangs.
Are there any disadvantages of -threaded? Zombie processes are not too bad, after all.
Probably not, for most uses. Perhaps the only remaining use for the non-threaded runtime is a multithreaded program that needs to call into a non-thread-safe foreign library (this is the issue affecting Gtk). It's possible that for 6.8 we could make -threaded the default; it's already what you get when you run under GHCi. Cheers, Simon
participants (5)
- David Waern
- dons@cse.unsw.edu.au
- Fawzi Mohamed
- Joachim Breitner
- Simon Marlow