[GHC] #7815: STM fails to validate read.

#7815: STM fails to validate read. ----------------------------------------+----------------------------------- Reporter: fryguybob | Owner: Type: bug | Status: new Priority: normal | Component: Runtime System Version: 7.7 | Keywords: STM Os: Unknown/Multiple | Architecture: x86 Failure: Incorrect result at runtime | Blockedby: Blocking: | Related: ----------------------------------------+----------------------------------- This issue was brought up by napping in {{{#haskell}}} with this paste: [http://hpaste.org/85134] (the first paste in particular) The code is: {{{ import Control.Concurrent.STM import Control.Concurrent import Control.Monad main = do dog <- newTVarIO False cat <- newTVarIO False let unset = do d <- readTVar dog c <- readTVar cat if (d || c) then retry else return () setDog = unset >> writeTVar dog True setCat = unset >> writeTVar cat True oops = do d <- readTVar dog c <- readTVar cat guard (d && c) reset = do writeTVar dog False writeTVar cat False reset' = do d <- readTVar dog c <- readTVar cat guard (d || c) reset forkIO (atomically oops >> putStrLn "Oh Noes!") forever (do forkIO (atomically setDog) forkIO (atomically setCat) atomically reset' atomically reset') }}} When run it produces: {{{ $ ghc --make test.hs -threaded -rtsopts $ ./test +RTS -N2 Oh Noes! test: thread blocked indefinitely in an STM transaction }}} The second message is just a consequence of entering an unexpected state. The first message indicates that both the transactions {{{cat}}} and {{{dog}}} committed at the same time. It does this for HEAD and 7.6. I've sketched out an interleaving that leads to this. TRec entries are in the first and third column and TVar's are in the second column. Each entry has a TVar name and the expected value followed by the new value and then a number of updates if it has been read. TVars list their value and their number of updates. {{{ A TRec TVar B TRec -- Transactions start cat F F cat F 0 cat F F -- Initial reads. dog F F dog F 0 dog F F cat F T dog F T -- Local writes in TRec's -- Validation: cat F F 0 -- B reads num_updates from cat (with ^ -- consistency check with value) cat F T cat A 0 | -- A acquires lock for cat (atomic cas) dog F F 0 ^ | -- A reads num_updates from dog (with ^ | | -- consistency check with value) | dog B 0 dog F T | -- B acquires lock for dog (atomic cas) | | ^ | | | | | Success 0 | 0 | -- read check for A Success 0 0 -- read check for B cat A 1 -- Increment version cat T 1 -- Unlock with new value dog B 1 -- Increment version dog B T -- Unlock with new value }}} What is clear here is that the version number is not enough to check in {{{check_read_only}}} because there is a gap between locking and incrementing the version. We need to know atomically that the TVar is not locked and it's version number is the same. I need to read through the right parts of Keir Fraser's thesis carefully, but it seems like the read phase here is only helpful in preventing a commit that writes back the exact value we have already seen while we are in the middle of committing. The code for {{{check_read_only}}} is here: {{{ #!c static StgBool check_read_only(StgTRecHeader *trec STG_UNUSED) { StgBool result = TRUE; ASSERT (config_use_read_phase); IF_STM_FG_LOCKS({ FOR_EACH_ENTRY(trec, e, { StgTVar *s; s = e -> tvar; if (entry_is_read_only(e)) { TRACE("%p : check_read_only for TVar %p, saw %ld", trec, s, e -> num_updates); if (s -> num_updates != e -> num_updates) { // ||s -> current_value != e -> expected_value) { TRACE("%p : mismatch", trec); result = FALSE; BREAK_FOR_EACH; } } }); }); return result; } }}} If I restore the commented out line (which appears commented out in the first commit of this code that I can find) I can't reproduce the issue, but I think there is still a problem due to the ordering of those checks: we could observe the version as the same, while it is locked, have the TVar unlock, then observe the value the same. Switching the order we can only observe the TVar unlocked if the update has been incremented (as long as we are on an architecture that ensures this such as x86). Does this seem right? One issue is that given the interleaving that above with this added check ''both'' transactions could fail to commit. I think the algorithms from Fraser avoid this, but I think it always involves invalidating another transaction (i.e. killing off any other transactions observed to be in the read check phase with an overlapping TVar in a way that results in only one winner (see page 21 section 4.4 in ''Concurrent Programming Without Locks'')). As a side note, the issue can be avoided by ensuring that the reads become writes and avoiding the read only check. But you can only become a write if the values do not have matching pointers, switching to Ints instead of Bools gets you there. -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7815 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#7815: STM fails to validate read. -------------------------------+-------------------------------------------- Reporter: fryguybob | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 7.7 Keywords: STM | Os: Unknown/Multiple Architecture: x86 | Failure: Incorrect result at runtime Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | -------------------------------+-------------------------------------------- Changes (by simonpj): * difficulty: => Unknown Comment: Tim Harris writes: Yes, this looks like a genuine bug -- surprised it could have lurked here this long! I think the suggested fix is right -- i.e., to replace: {{{ if (s -> num_updates != e -> num_updates) { // ||s -> current_value != e -> expected_value) { }}} with: {{{ // Check for concurrent updates (note: current_value may refer to a TRec // at this point if this TVar is currently locked for making an update). if (s -> current_value != e -> expected_value || s -> num_updates != e -> num_updates) { }}} Could someone execute on this? By which I mean: * Make the change * Add a bit more explanation than Tim does here; perhaps a `Note [blah]` that explains the subtlety, and a reference to this ticket for more details. * Add a regression test, if possible, that * Check that it all validates. Thanks! Amazing job, `fryguybob`, in identifying the problem so precisely. Simon -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7815#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#7815: STM fails to validate read. -------------------------------+-------------------------------------------- Reporter: fryguybob | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 7.7 Keywords: STM | Os: Unknown/Multiple Architecture: x86 | Failure: Incorrect result at runtime Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | -------------------------------+-------------------------------------------- Comment(by fryguybob): I'm happy to write a patch for this. I think there might also be a problem with {{{num_updates}}} and {{{current_value}}} being different memory locations on architectures with the right kind of relaxed memory ordering (ARM?). I'll look into that and make a separate issue if I find anything. -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7815#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#7815: STM fails to validate read. -------------------------------+-------------------------------------------- Reporter: fryguybob | Owner: fryguybob Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 7.7 Keywords: STM | Os: Unknown/Multiple Architecture: x86 | Failure: Incorrect result at runtime Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | -------------------------------+-------------------------------------------- Changes (by fryguybob): * owner: => fryguybob -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7815#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#7815: STM fails to validate read. -------------------------------+-------------------------------------------- Reporter: fryguybob | Owner: fryguybob Type: bug | Status: patch Priority: normal | Milestone: Component: Runtime System | Version: 7.7 Keywords: STM | Os: Unknown/Multiple Architecture: x86 | Failure: Incorrect result at runtime Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | -------------------------------+-------------------------------------------- Changes (by fryguybob): * status: new => patch -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7815#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#7815: STM fails to validate read. -------------------------------+-------------------------------------------- Reporter: fryguybob | Owner: igloo Type: bug | Status: patch Priority: normal | Milestone: Component: Runtime System | Version: 7.7 Keywords: STM | Os: Unknown/Multiple Architecture: x86 | Failure: Incorrect result at runtime Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | -------------------------------+-------------------------------------------- Changes (by simonpj): * owner: fryguybob => igloo Comment: Thanks! Ian: would you like to check and commit? Simon -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7815#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#7815: STM fails to validate read. -------------------------------+-------------------------------------------- Reporter: fryguybob | Owner: igloo Type: bug | Status: patch Priority: normal | Milestone: 7.8.1 Component: Runtime System | Version: 7.7 Keywords: STM | Os: Unknown/Multiple Architecture: x86 | Failure: Incorrect result at runtime Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | -------------------------------+-------------------------------------------- Changes (by igloo): * milestone: => 7.8.1 -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7815#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#7815: STM fails to validate read.
-------------------------------+--------------------------------------------
Reporter: fryguybob | Owner: igloo
Type: bug | Status: patch
Priority: normal | Milestone: 7.8.1
Component: Runtime System | Version: 7.7
Keywords: STM | Os: Unknown/Multiple
Architecture: x86 | Failure: Incorrect result at runtime
Difficulty: Unknown | Testcase:
Blockedby: | Blocking:
Related: |
-------------------------------+--------------------------------------------
Comment(by ryates@…):
commit 87baa31cb117181ab4f4b26387653996d1792167
{{{
Author: Ryan Yates

#7815: STM fails to validate read. ------------------------------------------+--------------------------------- Reporter: fryguybob | Owner: igloo Type: bug | Status: closed Priority: normal | Milestone: 7.8.1 Component: Runtime System | Version: 7.7 Resolution: fixed | Keywords: STM Os: Unknown/Multiple | Architecture: x86 Failure: Incorrect result at runtime | Difficulty: Unknown Testcase: | Blockedby: Blocking: | Related: ------------------------------------------+--------------------------------- Changes (by igloo): * status: patch => closed * resolution: => fixed Comment: Applied, thanks! -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7815#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC