
Hello,

In one of my example programs I see strange behaviour. It is a very simple taskpool using STM; in pseudocode it does the following:

1. generate data structures
2. initialize data structures
3. fork threads
4. wait (using STM) until the pool is empty and all threads are finished
5. print a final message

In very few cases, which depend on the number of threads spawned, the program hangs *after* the final message of step 5 has been printed. "Few cases" means, for example, 50,000 good, terminating runs before it hangs. If you increase the number of spawned threads (to a few hundred or a few thousand), it hangs much sooner. Since forked threads are terminated when the main thread terminates (which it should after printing the message), this behaviour is quite unexpected.

Since I've experienced strange behaviour in the past which was the fault of my system configuration[1], I am a bit cautious about reporting a bug on GHC's bug tracker, especially since its reproduction is so difficult and random.

So my question is: how much circumspection is expected/needed before one should enter a bug in the bug tracker? I've tested the attached code on three different systems (with different Linux systems, but always GHC 6.12.1 (since it's a bit costly to install the older versions)) and observed the mentioned behaviour. Is this enough to justify a bug report?

Or, on the other hand, could someone spot the error in the attached code? Given my history with strange parallel behaviour, I am much more inclined to think that it's the fault of my code, but I can't spot the error, and the described behaviour (hanging *after* the final message) is really strange.

Addendum: Daniel Fischer could reproduce the problem and pointed out[2] that the behaviour can no longer be reproduced when the evaluation of the TVar's value is made strict. This is even stranger in this context; I don't see how lazy evaluation can change the behaviour of my code.
Cheers,
Michael

[1] http://www.haskell.org/pipermail/haskell-cafe/2010-March/073938.html
[2] http://www.haskell.org/pipermail/haskell-cafe/2010-March/074520.html

-- 
Dipl.-Inf. Michael C. Lesniak
University of Kassel
Programming Languages / Methodologies Research Group
Department of Computer Science and Electrical Engineering
Wilhelmshöher Allee 73
34121 Kassel
Phone: +49-(0)561-804-6269
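The attached program is not reproduced in this thread. For readers without it, the five steps described above might look roughly like the following minimal sketch. This is not Michael's code: the names Pool, worker and runPool are hypothetical, and the tasks are plain numbers that get summed. The write to the result TVar uses $! in the spirit of the strictness change mentioned in the addendum, without claiming that this is the line Daniel Fischer changed.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (replicateM_)

-- Hypothetical pool: pending tasks, number of workers still running,
-- and a running total that stands in for "the work".
data Pool = Pool
  { tasks  :: TVar [Int]
  , active :: TVar Int
  , total  :: TVar Int
  }

-- A worker pops tasks until the pool is empty, then signs off.
worker :: Pool -> IO ()
worker pool = do
  next <- atomically $ do
    ts <- readTVar (tasks pool)
    case ts of
      [] -> do
        modifyTVar' (active pool) (subtract 1)
        return Nothing
      (t : rest) -> do
        writeTVar (tasks pool) rest
        return (Just t)
  case next of
    Nothing -> return ()
    Just t -> do
      atomically $ do
        s <- readTVar (total pool)
        -- $! forces the sum before it is stored, so no thunk chain builds up
        writeTVar (total pool) $! s + t
      worker pool

-- Steps 1-4: build the pool, fork n workers, block in STM until the
-- pool is empty and every worker has signed off.
runPool :: Int -> [Int] -> IO Int
runPool n work = do
  pool <- Pool <$> newTVarIO work <*> newTVarIO n <*> newTVarIO 0
  replicateM_ n (forkIO (worker pool))
  atomically $ do
    ts      <- readTVar (tasks pool)
    running <- readTVar (active pool)
    check (null ts && running == 0)
    readTVar (total pool)

-- Step 5: print a final message.
main :: IO ()
main = do
  result <- runPool 8 [1 .. 1000]
  putStrLn ("done, total = " ++ show result)
```

The sketch assumes the stm package and would be built with ghc -threaded to run the workers on several OS threads. It only illustrates the shape of the program; it is not expected to reproduce the hang.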

On 15/03/2010 08:59, Michael Lesniak wrote:
> Hello,
>
> In one of my example programs I see strange behaviour. It is a very simple taskpool using STM; in pseudocode it does the following:
>
> 1. generate data structures
> 2. initialize data structures
> 3. fork threads
> 4. wait (using STM) until the pool is empty and all threads are finished
> 5. print a final message
>
> In very few cases, which depend on the number of threads spawned, the program hangs *after* the final message of step 5 has been printed. "Few cases" means, for example, 50,000 good, terminating runs before it hangs. If you increase the number of spawned threads (to a few hundred or a few thousand), it hangs much sooner. Since forked threads are terminated when the main thread terminates (which it should after printing the message), this behaviour is quite unexpected.
I've fixed three deadlocks since 6.12.1 was released: two were IO
manager-related, and one caused by an interaction between the scheduler
and GC. It's likely that one of these is your problem. All of them are
fixed in 6.12.2, so if you are able to grab a snapshot and test it that
would be very helpful.
Tue Mar 9 09:58:31 GMT 2010 Simon Marlow
> Since I've experienced strange behaviour in the past which was the fault of my system configuration[1], I am a bit cautious about reporting a bug on GHC's bug tracker, especially since its reproduction is so difficult and random.

I've been doing a lot of testing recently that involves running a program repeatedly in a loop until it goes wrong, such is the nature of non-deterministic concurrency :-)

> So my question is: how much circumspection is expected/needed before one should enter a bug in the bug tracker? I've tested the attached code on three different systems (with different Linux systems, but always GHC 6.12.1 (since it's a bit costly to install the older versions)) and observed the mentioned behaviour. Is this enough to justify a bug report?

Sure, by all means submit a bug report. As mentioned earlier, you might be able to avoid doing so if you find that the 6.12.2 snapshot fixes it, though.

Cheers,
Simon
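The "run it repeatedly in a loop until it goes wrong" approach can be automated. The following is a minimal sketch, not Simon's actual test setup: it starts a program over and over and reports the first run that does not finish within a time budget. The names waitWithin and firstHang, the 5-second budget and the run limit are assumptions for illustration; only functions from the process and base packages are used.

```haskell
import Control.Concurrent (threadDelay)
import System.Environment (getArgs)
import System.Exit (ExitCode)
import System.Process
  (ProcessHandle, getProcessExitCode, spawnProcess, terminateProcess)

-- Poll the process every 10 ms until it exits or the budget (in
-- milliseconds, approximate) is used up.
waitWithin :: Int -> ProcessHandle -> IO (Maybe ExitCode)
waitWithin budgetMs ph
  | budgetMs <= 0 = return Nothing
  | otherwise = do
      status <- getProcessExitCode ph
      case status of
        Just code -> return (Just code)
        Nothing   -> threadDelay 10000 >> waitWithin (budgetMs - 10) ph

-- Run the program up to maxRuns times. Returns the number of the first
-- run that did not finish in time, or Nothing if every run finished.
firstHang :: FilePath -> [String] -> Int -> Int -> IO (Maybe Int)
firstHang prog args budgetMs maxRuns = go 1
  where
    go i
      | i > maxRuns = return Nothing
      | otherwise = do
          ph <- spawnProcess prog args
          r  <- waitWithin budgetMs ph
          case r of
            Just _  -> go (i + 1)
            Nothing -> do
              -- Ask the stuck process to stop; when debugging one would
              -- rather leave it running and attach a debugger.
              terminateProcess ph
              return (Just i)

main :: IO ()
main = do
  argv <- getArgs
  case argv of
    [] -> putStrLn "usage: rerun PROGRAM [ARGS...]"
    prog : args -> do
      r <- firstHang prog args 5000 100000
      putStrLn (maybe "no hang observed"
                      (\i -> "run " ++ show i ++ " did not finish in time")
                      r)
```

A slow but healthy run is indistinguishable from a hang here, so the budget should be well above the normal running time of the program under test.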

Hello Simon,
> GC. It's likely that one of these is your problem. All of them are fixed in 6.12.2, so if you are able to grab a snapshot and test it that would be very helpful.

Where can I get version 6.12.2? According to [1], there are both 6.13... and 6.12.1, but I did not find 6.12.2. Any hints?

Cheers,
Michael

[1] http://www.haskell.org/ghc/download.html

On 15/03/2010 12:06, Michael Lesniak wrote:
> Hello Simon,
>
>> GC. It's likely that one of these is your problem. All of them are fixed in 6.12.2, so if you are able to grab a snapshot and test it that would be very helpful.
>
> Where can I get version 6.12.2? According to [1], there are both 6.13... and 6.12.1, but I did not find 6.12.2. Any hints?

The snapshot distributions on the stable branch are here:

http://www.haskell.org/ghc/dist/stable/dist/

Pick the latest 6.12.1.* snapshot.

Cheers,
Simon

Hello Simon,

with 6.12.1.20100313 the behaviour is worse: even when using $! in the appropriate lines (see [2] in my original message) the program hangs quite often. Hence, 6.12.1 works better in this (special?) case.

Is there anything else I can do to help identify the problem?

Cheers,
Michael

On 15/03/10 16:02, Michael Lesniak wrote:
> Hello Simon,
>
> with 6.12.1.20100313 the behaviour is worse: even when using $! in the appropriate lines (see [2] in my original message) the program hangs quite often. Hence, 6.12.1 works better in this (special?) case.

Ok, I'll look into it, thanks for the report.

Cheers,
Simon

On 15/03/2010 22:03, Simon Marlow wrote:
> On 15/03/10 16:02, Michael Lesniak wrote:
>> Hello Simon,
>>
>> with 6.12.1.20100313 the behaviour is worse: even when using $! in the appropriate lines (see [2] in my original message) the program hangs quite often. Hence, 6.12.1 works better in this (special?) case.
>
> Ok, I'll look into it, thanks for the report.

I reproduced the deadlock, and it looks like a new one: a lock order reversal between Schedule.c:checkBlackHoles() and RtsAPI.c:rts_unlock(). It turns out I've already fixed it in the HEAD as a side effect of some other improvements, so I'm going to try to bring those into the 6.12.2 branch.

Cheers,
Simon
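For readers unfamiliar with the term, a lock order reversal means that two threads acquire the same pair of locks in opposite orders, so each can end up holding one lock while waiting for the other. The sketch below is a generic illustration only: it uses two MVars in Haskell as stand-in locks and says nothing about what the RTS functions named above actually do. The names lockBoth and runPair are hypothetical, and the short delay makes the deadlock very likely rather than guaranteed.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import Data.Maybe (isJust)
import System.Timeout (timeout)

-- Take two locks in the given order, pausing in between so that the
-- other thread has time to grab its own first lock.
lockBoth :: MVar () -> MVar () -> IO ()
lockBoth first second =
  withMVar first $ \_ -> do
    threadDelay 100000
    withMVar second $ \_ -> return ()

-- Start two workers and report whether both finish within two seconds.
-- With reversed = True the second worker takes the locks in the
-- opposite order to the first.
runPair :: Bool -> IO Bool
runPair reversed = do
  a    <- newMVar ()
  b    <- newMVar ()
  done <- newEmptyMVar
  _ <- forkIO (lockBoth a b >> putMVar done ())
  _ <- forkIO ((if reversed then lockBoth b a else lockBoth a b)
                 >> putMVar done ())
  r <- timeout 2000000 (takeMVar done >> takeMVar done)
  return (isJust r)

main :: IO ()
main = do
  ok  <- runPair False
  bad <- runPair True
  putStrLn ("consistent order finished: " ++ show ok)
  putStrLn ("reversed order finished:   " ++ show bad)
```

The usual remedy, shown by the consistent-order case, is to make every thread acquire the locks in one agreed order.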

Hello Simon,
> I reproduced the deadlock, and it looks like a new one: a lock order reversal between Schedule.c:checkBlackHoles() and RtsAPI.c:rts_unlock(). It turns out I've already fixed it in the HEAD as a side effect of some other improvements, so I'm going to try to bring those into the 6.12.2 branch.

Great! Thanks!

Cheers,
Michael

On 16/03/2010 14:34, Michael Lesniak wrote:
> Hello Simon,
>
>> I reproduced the deadlock, and it looks like a new one: a lock order reversal between Schedule.c:checkBlackHoles() and RtsAPI.c:rts_unlock(). It turns out I've already fixed it in the HEAD as a side effect of some other improvements, so I'm going to try to bring those into the 6.12.2 branch.
>
> Great! Thanks!

Oh, and I found a bug in the parallel GC too. Nice example.

Cheers,
Simon