
#10414: Buggy behavior with threaded runtime (-N1 working, -N2 getting into <<loop>>) -------------------------------------+------------------------------------- Reporter: exio4 | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by rwbarton): I think I've finally come up with at least part of a plausible explanation. Tell me if this seems right... Suppose I have a heap object whose initial value is {{{ x = \u [] concat [[1],[]] -- using "\u []" as syntax for "updatable thunk", -- otherwise normal Core syntax (not STG) }}} and suppose first some thread evaluates `x` to WHNF. Using the definition of `concat` as `main_go` from above, this will allocate a single-entry thunk `y = \s [] concat []`, and then calling `(++)` will lead to {{{ x = 1 : z y = \s [] concat [] z = \u [] (++) [] y }}} At this point the heap has the following property (*) The heap contains a single-entry thunk (`y`) and a regular thunk (`z`) such that entering the regular thunk will cause the single-entry thunk to be entered as well. (The key point is that the single-entry thunk has already been allocated on the heap, in contrast to a situation in which entering a regular thunk would cause a ''new'' single-entry thunk to be allocated, possibly entered, and then become garbage, all before evaluation of that regular thunk is complete.) Now, consider the following execution path of two threads A and B. * Both threads enter `z` simultaneously (before either manages to overwrite it with a black hole, if eager blackholing was enabled when we compiled `(++)`; otherwise before either manages to update it to an indirection). * Thread A does the case analysis inside `(++)` and enters `y`, and overwrites it with a black hole before thread B does anything else. * Now thread B does the case analysis inside `(++)` and enters `y`, but `y` has been overwritten with a black hole so thread B blocks. But `y` is never going to be updated, so thread B will block forever. This is bad! * Finally thread A finishes evaluating `y` (to `[]`) and then updates `z` accordingly. But thread B is still blocking on the black hole and even if it could get unblocked by some mechanism (say in the GC) there's no longer enough information on the heap to recover the correct value of `y` and allow thread B to continue. This doesn't exactly explain why the programs in this ticket <<loop>>, but a thread becoming permanently blocked is equally bad behavior I think. Some extra evidence that something like this is going on is that if I inline the definition of `(++)` into par2.hs as well, so that it is compiled with eager blackholing enabled, the program <<loop>>s much less frequently, just a few percent of the time as opposed to over half the time. That would match up with a smaller window of simultaneity in the first step of the execution trace above. If this analysis is correct, then assuming we want to continue to allow threads to enter thunks in an unsynchronized way (to avoid prohibitive locking costs), it seems we have to ensure that the condition (*) never holds, at least when eager blackholing is enabled. Generating single-entry thunks is still okay as long as they never survive as live heap objects after the thunk that allocated them has been reduced to WHNF. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10414#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler