Re: [Haskell-cafe] Spurious program crashes

On Nov 17, 2005, at 1:44 PM, Sebastian Sylvan wrote:
Are you sure it's safe to kill a thread which has already been killed?
It seems so from the docs.
Why do you fork off the killing of the threads? Why not just run them in sequence?
Someone said that they read somewhere that killThread can block. I'm not gonna point any fingers at musasabi ;-).
Also, I'd recommend refactoring the code a bit, write a function "parIO" which runs IO computations in parallell and then define timeout in terms of that.
I did this by stealing the timeout/either combinators from the Asynchronous Exceptions paper. It did not help a single bit. Joel -- http://wagerlabs.com/

On 11/17/05, Joel Reymont
On Nov 17, 2005, at 1:44 PM, Sebastian Sylvan wrote:
Are you sure it's safe to kill a thread which has already been killed?
It seems so from the docs.
Why do you fork off the killing of the threads? Why not just run them in sequence?
Someone said that they read somewhere that killThread can block. I'm not gonna point any fingers at musasabi ;-).
Also, I'd recommend refactoring the code a bit, write a function "parIO" which runs IO computations in parallell and then define timeout in terms of that.
I did this by stealing the timeout/either combinators from the Asynchronous Exceptions paper. It did not help a single bit.
This is somewhat frustrating for me because I had a very similar (if not the exact same) issue when writing some test applications for an FMOD binding. However, all that source code (and so much more) was lost due to a hard disk failure. I am now struggling to remember what was the cause, and how I solved. What I do remember is that the timeout and parIO functions in the concurrent programming papers I found were NOT correct. killThread did NOT behave as expected when I killed an already killed thread. I tried multiple tricks here (including some which required recursive do-notation) to try to get the parIO function to only kill the *other* thread. This could be done by having the two spawned threads take their computations in an MVar along with the threadID of the other thread. something like: parIO f1 f2 = do m <- newEmptyMVar -- result Mvar mf1 <- newEmptyMVar -- MVar for f1 mf2 <- newEmptyMVar -- MVar for f2 -- fork worker threads t1 <- forkIO (child m mf1) t2 <- forkIO (child m mf2) -- pass computations and threadID to worker threads putMVar mf1 (t2, f1) putMVar mf2 (t1, f2) -- return result takeMVar m where child m mf = do (tid, f) <- takeMVar mf x <- f putMVar m x killThread tid timeout t f = threadDelay (round (t * 1e6)) `parIO` f As I remember another solution I came up with was to wrap the "child" function body in a catch statement. The child function was just a helper function that ran a computation and put its result in an MVar. I *think* the problem *may* have been that when an FFI function got "ThreadKilled" exception asynchrounously that got bubbled up to the parIO thread for some reason. /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Maybe one of the Simons can comment on this. I distinctly remember trying the mdo approach to kill the other thread and getting burned by that. Don't know why I forgot to mention it. On Nov 17, 2005, at 2:03 PM, Sebastian Sylvan wrote:
What I do remember is that the timeout and parIO functions in the concurrent programming papers I found were NOT correct. killThread did NOT behave as expected when I killed an already killed thread. I tried multiple tricks here (including some which required recursive do-notation) to try to get the parIO function to only kill the *other* thread. This could be done by having the two spawned threads take their computations in an MVar along with the threadID of the other thread.

One thing, which I am sure you must have got right, but which burned me, is that you must explicitly free enitities created by FFI calls. For example network sockets exist outside of the haskell runtime, and are not free'd automatically when a haskell thread is killed, you need an explicit exception handler to close the handle... They may eventually be garbage collected - but your application may run out of resources before this happens. Keean. Joel Reymont wrote:
Maybe one of the Simons can comment on this. I distinctly remember trying the mdo approach to kill the other thread and getting burned by that. Don't know why I forgot to mention it.
On Nov 17, 2005, at 2:03 PM, Sebastian Sylvan wrote:
What I do remember is that the timeout and parIO functions in the concurrent programming papers I found were NOT correct. killThread did NOT behave as expected when I killed an already killed thread. I tried multiple tricks here (including some which required recursive do-notation) to try to get the parIO function to only kill the *other* thread. This could be done by having the two spawned threads take their computations in an MVar along with the threadID of the other thread.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

I'm being quite careful with resources these days. The outstanding issues are 1) Crashes on Mac OSX that are not reproduced on Linux, Windows, etc. 2) Some kind of a problem with Chan. getChanContents retrieves things smoothly, readChan only does it for the first few lines. Simon? Anyone? 3) Different performance of the logger thread on Mac OSX and Windows. I'm having thousands of threads write their trace messages to a Chan. The logger On Windows I only see the first few lines of output when using isEmptyChan/readChan to retrieve values in a loop. On Mac OSX I do see smooth output. On Windows I run out of memory because all the output sent to the chan builds up and is never processed. I can process it by replacing isEmptyChan/readChan with getChanContents but then my logger thread hangs forever (right semantics) and hangs everything else that waits for the logger thread to check an MVar and exit. On Nov 21, 2005, at 4:34 PM, Keean Schupke wrote:
One thing, which I am sure you must have got right, but which burned me, is that you must explicitly free enitities created by FFI calls.

On Mon, Nov 21, 2005 at 04:42:33PM +0000, Joel Reymont wrote:
2) Some kind of a problem with Chan. getChanContents retrieves things smoothly, readChan only does it for the first few lines. Simon? Anyone?
This is interesting. Could you show some source code? Best regards Tomasz

Yes, of course. darcs repo at http://test.wagerlabs.com/postmortem. logger in Util.hs On Nov 21, 2005, at 8:30 PM, Tomasz Zielonka wrote:
On Mon, Nov 21, 2005 at 04:42:33PM +0000, Joel Reymont wrote:
2) Some kind of a problem with Chan. getChanContents retrieves things smoothly, readChan only does it for the first few lines. Simon? Anyone?
This is interesting. Could you show some source code?

On Mon, Nov 21, 2005 at 09:50:20PM +0000, Joel Reymont wrote:
Yes, of course.
darcs repo at http://test.wagerlabs.com/postmortem.
logger in Util.hs
It's in Conc.hs You seem to be busy waiting. I can see two ways of solving the problem: 1. use STM and non-deterministic choice 2. use a (Chan (Maybe String)), where (Just s) means the next log entry, and Nothing means "break the logger loop" Best regards Tomasz

STM would complicate things too much for me. At least I think so. I would love to use STM but I would need to fit it into "type ScriptState = ErrorT String (StateT World IO)" just to use the logger. I'm not THAT comfortable with monads. Let me see if I understand you correctly... Are you saying that I should be using getChanContents in the code below? logger :: Handle -> MVar () -> IO () logger h die = do empty <- isEmptyChan parent unless empty $ do x <- readChan parent putStrLn x hPutStrLn h x alive <- isEmptyMVar die when (alive || not empty) $ logger h die I think using Maybe is a great trick but I'm curious why so few messages actually get taken out of the channel in the code above? Are you saing that with all the checking it does not get to pull messages out? I see clearly how using Maybe with getChanContents will work out perfectly. I don't understand why the above code is inefficient to the point of printing just a few messages (out of hundreds) out on Windows. I would like to understand it to avoid such mistakes in the future. Thanks, Joel On Nov 21, 2005, at 9:56 PM, Tomasz Zielonka wrote:
You seem to be busy waiting. I can see two ways of solving the problem: 1. use STM and non-deterministic choice 2. use a (Chan (Maybe String)), where (Just s) means the next log entry, and Nothing means "break the logger loop"

On Mon, Nov 21, 2005 at 10:41:38PM +0000, Joel Reymont wrote:
STM would complicate things too much for me. At least I think so. I would love to use STM but I would need to fit it into "type ScriptState = ErrorT String (StateT World IO)" just to use the logger. I'm not THAT comfortable with monads.
I am talking about Software Transactional Memory, which is in Control.Concurrent.STM. I think you confused it with State Transformer Monad. In your case STM would allow you to wait simultaneously on (T)MVar and (T)Chan. It would look like this: logger :: TMVar () -> IO () logger die = join $ atomically $ (do x <- readTChan parent return $ do putStrLn x logger die) `orElse` (do takeTMVar die return (return ())) but you have to modify the rest of code to use STM. I modified your Conc.hs to use STM, but using the greater guarantees of STM you could surely simplify it further (see the attached patch).
Let me see if I understand you correctly... Are you saying that I should be using getChanContents in the code below?
I am not proposing to use getChanContents. You are busy-waiting on MVar and Chan. I just proposed a solution to stuff messages and die-request into the same concurrency primitive, so you can wait for both events using a single operation. But you are right (below) that this bug doesn't explain the behaviour of your program. It is "only" a performance bug.
logger :: Handle -> MVar () -> IO () logger h die = do empty <- isEmptyChan parent unless empty $ do x <- readChan parent putStrLn x hPutStrLn h x alive <- isEmptyMVar die when (alive || not empty) $ logger h die
I think using Maybe is a great trick but I'm curious why so few messages actually get taken out of the channel in the code above?
Actually, I am not sure. I just noticed that your code uses a bad coding practice and could be improved. If I find some time I'll try to examine it more closely.
Are you saing that with all the checking it does not get to pull messages out?
As it is, you code can impose a big performance penalty, but indeed it shouldn't change the semantics. Perhaps I miss something.
I see clearly how using Maybe with getChanContents will work out perfectly. I don't understand why the above code is inefficient to the point of printing just a few messages (out of hundreds) out on Windows. I would like to understand it to avoid such mistakes in the future.
Yes, this is strange. Perhaps we're both missing something obvious. Best regards Tomasz

I was under the impression that STM code needed to be in its own monad. I was looking at Control.Concurrent.STM.TChan, for example, where signatures like this exist: newTChan :: STM (TChan a) readTChan :: TChan a -> STM a writeTChan :: TChan a -> a -> STM () and then newTChan :: STM (TChan a) readTChan :: TChan a -> STM a writeTChan :: TChan a -> a -> STM () I guess I should give this another look, re-read the STM paper and check out your patch. Regardless, simple is elegant and your Maybe solution is simple. Thanks, Joel On Nov 22, 2005, at 7:09 AM, Tomasz Zielonka wrote:
I am talking about Software Transactional Memory, which is in Control.Concurrent.STM. I think you confused it with State Transformer Monad.
In your case STM would allow you to wait simultaneously on (T)MVar and (T)Chan. It would look like this:
logger :: TMVar () -> IO () logger die = join $ atomically $ (do x <- readTChan parent return $ do putStrLn x logger die) `orElse` (do takeTMVar die return (return ()))

On Tue, Nov 22, 2005 at 08:30:33AM +0000, Joel Reymont wrote:
I was under the impression that STM code needed to be in its own monad. I was looking at Control.Concurrent.STM.TChan, for example, where signatures like this exist:
newTChan :: STM (TChan a) readTChan :: TChan a -> STM a writeTChan :: TChan a -> a -> STM ()
The STM monad is where synchronisation operations are grouped in transactions. You can use STM as a drop-in replacement for traditional Control.Concurrent synchronisation primitives by simply wrapping every single operation in an "atomically" block: atomically :: STM a -> IO a For example, a drop-in replacement for Chan: type Chan' a = TChan a newChan' = atomically newTChan readChan' c = atomically (readTChan c) writeChan' c v = atomically (writeChan c v) the types of these functions are: newChan' :: IO (TChan a) readChan' :: TChan a -> IO a writeChan' :: TChan a -> a -> IO () But it is only grouping more operations in a transaction that will let you benefit from the wonders of STM :-)
I guess I should give this another look, re-read the STM paper and check out your patch.
You definitely should do it. It is a very rewarding read.
Regardless, simple is elegant and your Maybe solution is simple.
But it also requires that you restructure your code, doesn't it? I am not sure we understood each other here. One way to restructure your code to enable smooth transition to the (Chan (Maybe String)) idea would be to change the type of "die request" from (MVar ()) to (IO ()). You could use (dieVar, die) <- do dieVar <- newEmptyMVar return (dieVar, putMVar dieVar ()) where "dieVar" is used on the receiver side, and die is used on the sender side. Then you could easily use a different notification mechanism for logger: let die = writeChan parent Nothing Best regards Tomasz

Tomasz, I think it's much simpler than that. I just changed the trace function to send Just String down the channel. Whenever I send Nothing (from waitForChildren) the logger just exits. Simple change in two places, no need for MVars. Did I miss anything? The program became much snappier, btw. Joel On Nov 22, 2005, at 8:53 AM, Tomasz Zielonka wrote:
Regardless, simple is elegant and your Maybe solution is simple.
But it also requires that you restructure your code, doesn't it? I am not sure we understood each other here.
One way to restructure your code to enable smooth transition to the (Chan (Maybe String)) idea would be to change the type of "die request" from (MVar ()) to (IO ()). You could use
(dieVar, die) <- do dieVar <- newEmptyMVar return (dieVar, putMVar dieVar ())
where "dieVar" is used on the receiver side, and die is used on the sender side. Then you could easily use a different notification mechanism for logger:
let die = writeChan parent Nothing

On Tue, Nov 22, 2005 at 09:03:55AM +0000, Joel Reymont wrote:
I think it's much simpler than that. I just changed the trace function to send Just String down the channel. Whenever I send Nothing (from waitForChildren) the logger just exits. Simple change in two places, no need for MVars.
Did I miss anything?
Perhaps I did. I had an impression that these MVars where a pattern that you use in other parts of your code. If this is only limited to the logger code then it code could be probably simplified even further.
The program became much snappier, btw.
Did it fix the problem? Best regards Tomasz

Yes in the sense that more than a few lines of code are now printed on Windows. Not in the sense of the topic of this thread but then it seems to be a Mac OSX-only issue. Thanks, Joel On Nov 22, 2005, at 9:14 AM, Tomasz Zielonka wrote:
The program became much snappier, btw.
Did it fix the problem?

Hello Joel, Tuesday, November 22, 2005, 12:03:55 PM, you wrote: JR> I think it's much simpler than that. I just changed the trace JR> function to send Just String down the channel. Whenever I send JR> Nothing (from waitForChildren) the logger just exits. Simple change JR> in two places, no need for MVars. JR> Did I miss anything? The program became much snappier, btw. it is just the same i recommend to you in previous letter (sorry, i don't read patch from Tomasz). it must work, at least the close solution work in my own program :) -- Best regards, Bulat mailto:bulatz@HotPOP.com

Hello Joel, Tuesday, November 22, 2005, 1:41:38 AM, you wrote: JR> logger h die = JR> do empty <- isEmptyChan parent JR> unless empty $ do x <- readChan parent JR> putStrLn x JR> hPutStrLn h x JR> alive <- isEmptyMVar die JR> when (alive || not empty) $ logger h die can you just send Die message through the same Chan? it will be best solution. you can even write: logger h die = pid <- forkIO (readMVar die >> putChan parent DIE) go killThread pid where go = do x <- readChan parent case x of DIE -> return () _ -> do putStrLn x hPutStrLn h x go or try something like this: while isEmptyMVar while not isEmptyChan x <- readChan ... JR> I see clearly how using Maybe with getChanContents will work out JR> perfectly. I don't understand why the above code is inefficient to JR> the point of printing just a few messages (out of hundreds) out on JR> Windows. I would like to understand it to avoid such mistakes in the JR> future. is writing to channel and filling MVar done in different threads? if so, second thread may just get much more attention. and may be your code itself drive to this, for example because you are querying channel state with the same frequency as state of MVar ps: btw, for such sort of tasks like 'go' above i created control structure repeat_whileM. with its help first code will become just: logger h die = withThread (readMVar die >> putChan parent DIE) $ do repeat_whileM (readChan parent) (/=DIE) (\x -> putStrLn x >> hPutStrLn h x) withThread code = bracket (forkIO code) killThread . const repeat_whileM inp cond out = do x <- inp if (cond x) then do out x repeat_whileM inp cond out else return x -- Best regards, Bulat mailto:bulatz@HotPOP.com

This is the approach that I went with, thanks. On Nov 22, 2005, at 9:26 AM, Bulat Ziganshin wrote:
can you just send Die message through the same Chan? it will be best solution. you can even write:
participants (5)
-
Bulat Ziganshin
-
Joel Reymont
-
Keean Schupke
-
Sebastian Sylvan
-
Tomasz Zielonka