[GHC] #11760: runST with lazy blackholing breaks referential transparency

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- A thunk created with `runST` can be evaluated twice by different threads producing different results. An example (taken from https://twitter.com/obadzz/status/714081240475951105): {{{#!hs {-# LANGUAGE RecordWildCards #-} import qualified Data.STRef.Lazy as S import Control.Monad import Control.Monad.ST.Lazy import Control.Concurrent data ListRef s a = ListRef { element :: a , readCounter :: Int , rest :: Maybe (S.STRef s (ListRef s a)) } toList :: S.STRef s (ListRef s a) -> ST s [(a, Int)] toList r = do ListRef{..} <- S.readSTRef r S.modifySTRef r $ \e -> e { readCounter = readCounter + 1 } xs <- maybe (return []) toList rest return $ (element, readCounter) : xs circularList :: ST s (S.STRef s (ListRef s Char)) circularList = do x3 <- S.newSTRef (ListRef 'c' 0 Nothing) x2 <- S.newSTRef (ListRef 'b' 0 (Just x3)) x1 <- S.newSTRef (ListRef 'b' 0 (Just x2)) S.modifySTRef x3 $ \e -> e { rest = Just x1 } return x1 l :: [(Char, Int)] l = take 15 $ runST $ circularList >>= toList main :: IO () main = do void $ forkIO $ print l void $ forkIO $ print l void getLine print l }}} The output (run multiple times to reproduce): {{{ $ ghc --make -O -threaded -outputdir=.build test.hs [1 of 1] Compiling Main ( test.hs, .build/Main.o ) Linking test ... $ ./test +RTS -N2 [('b',0),('b',0),('c',0),('b',1),('b',1),('c',1),('b',2),('b',2),('c',2),('b',3),('b',3),('c',3),('b',5),('b',4),('c',4)] [('b',0),('b',0),('c',0),('b',1),('b',1),('c',1),('b',2),('b',2),('c',2),('b',3),('b',3),('c',3),('b',4),('b',4),('c',4)] [('b',0),('b',0),('c',0),('b',1),('b',1),('c',1),('b',2),('b',2),('c',2),('b',3),('b',3),('c',3),('b',4),('b',4),('c',4)] $ ghc --version The Glorious Glasgow Haskell Compilation System, version 7.10.1 $ }}} Note that the last 3 elements are `('b',5),('b',4),('c',4)` or `('b',4),('b',4),('c',4)`. I was able to reproduce it with few weeks old HEAD. With `-feager- blackholing` it works as expected. `unsafePerformIO` uses `noDuplicate` to prevent such kind of issue. Should `runST` do something similar? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * cc: simonmar, rnewton (added) Comment: This looks bad. Do you have a more detailed theory? Our thinking about lazy blackholing is that if two threads start to evaluate the same thunk, they may waste work but shoudl still get the same value. But that is not happening here and I'd like to understand why. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

This looks bad. Do you have a more detailed theory? Our thinking about lazy blackholing is that if two threads start to evaluate the same
#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by Yuras): Replying to [comment:1 simonpj]: thunk, they may waste work but shoudl still get the same value. But that is not happening here and I'd like to understand why. Reeveluating a thunk gives the same result only for pure computation, it is clearly not the case for (lazy) `ST`. In this particular case, the thunk captures mutable variables, and we have a race condition when two threads access them. The result depends on ordering. Also, I don't think `-feager-blackholing` actually fixes the issue. AFAIK it doesn't introduce a write barrier then replacing a thunk with a hole, so the race is still possible (though is harder to reproduce). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): Wow. Great bug. This should not happen. I believe the problem is specific to the lazy ST monad (`Control.Monad.ST.Lazy`), but your example relies on this so it's hard to test the theory. What happens is that the top-level runST thunk gets updated with `(x : xs)` where `xs` is itself a lazy ST computation that refers to mutable state. At this point it is possible for two threads to evaluate the inner thunk and get different results. We could use `noDuplicate#` in the lazy ST monad which should prevent this, but it will make lazy ST much more expensive. Maybe that's ok. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by Yuras): * failure: None/Unknown => Incorrect result at runtime Comment: Yes, I agree that it is specific to lazy `ST`. Strict one serializes access to mutable refs, and entering the thunk twice just creates a fresh set on refs. Re performance in presence of `noDuplicate#`: I don't have a use case for lazy `ST` actually, I just found the snippet on twitter. But it looks like lazy `ST` is useless because of the issue, so probably trading performance for correctness makes sense here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by obadz): My 2 cents is that we shouldn't have pure values that evaluate to non- deterministic/thread-dependent values unless an function marked `unsafe` was called. Maybe the current implementation of `Control.Monad.ST.Lazy.runST` can be moved to `Control.Monad.ST.Lazy.unsafeRunST` and replaced by one that use `noDuplicate#`? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by NeilMitchell): It turns out the widely used zlib library suffers from exactly this issue, and can segfault when executed in parallel. See https://github.com/haskell/zlib/issues/7#issuecomment-253028108 for more details. It seems like adding {{{noDuplicate}}} to {{{runST}}} and adding {{{unsafeRunST}}} is a reasonable compromise. There may be a performance impact on {{{runST}}}, but having segfaults/value-changes should require typing something with the word {{{unsafe}}} in it at some point. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I agree with Neil here. But is it just `runST`? Every time we build a thunk whose evaluation will mutate state, we must ensure that thunk can be evaluated at most once (no- duplicate). And doesn't every use of `bind` in the lazy ST world have that property? From `Control.Monad.Lazy.ST.Lazy.Imp`: {{{ (ST m) >>= k = ST $ \ s -> let (r,new_s) = m s ST k_a = k r in k_a new_s }}} That `m s` thunk must be evaluated at most once. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): Correct, it's not just `runST`, we would need to use `noDuplicate#` in the definition of `>>=`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): Is #9494 the same issue as this? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by duncan): Right, I was also going to say it'd have to be in `>>=`, not just `runST`. The type of `noDuplicate#` should be generalised to not limit it to `RealWorld`. Is this going to be horribly expensive? Do we need rules to merge duplicate `noDuplicate#`s? In IO it's clear when we pay this expense since we do it via unsafeInterleaveIO, but in lazy ST that just corresponds to "safe" bind. For now I'll add some `noDuplicate`s in the zlib binding. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: dfeuer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * priority: normal => highest * owner: => dfeuer * milestone: => 8.2.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: dfeuer Type: bug | Status: patch Priority: highest | Milestone: 8.2.1 Component: Core Libraries | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D3038 Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * cc: ekmett (added) * status: new => patch * differential: => Phab:D3038 * component: Compiler => Core Libraries -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: dfeuer Type: bug | Status: patch Priority: highest | Milestone: 8.2.1 Component: Core Libraries | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D3038 Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): We still need a more reliable test case. I managed to make a variation of this one fail moderately reliably using lots more threads, but I have an idea for another approach that's probably cheaper and more reliable; I'll try to get to that shortly. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: dfeuer Type: bug | Status: patch Priority: highest | Milestone: 8.2.1 Component: Core Libraries | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D3038 Wiki Page: | -------------------------------------+------------------------------------- Comment (by int-e): Here's an attempt at a more reliable testcase: {{{ {-# LANGUAGE BangPatterns #-} import Control.Concurrent import Control.Monad import Control.Monad.ST.Lazy import Control.Exception import Data.STRef import Data.IORef import Control.Concurrent.MVar import Data.List -- evil ST action that tries to synchronize (by busy waiting on the -- shared STRef) with a concurrent evaluation evil :: ST s [Int] evil = do r <- strictToLazyST $ newSTRef 0 replicateM 100 $ do i <- strictToLazyST $ readSTRef r let !j = i + 1 strictToLazyST $ writeSTRef r j let go 0 = return () go n = do i' <- strictToLazyST $ readSTRef r when (j == i') $ go (n-1) go 100 return j main = do let res = runST evil s0 <- newIORef (map pred (0 : res)) s1 <- newIORef (map pred (1 : res)) m0 <- newMVar () m1 <- newMVar () forkIO $ do putMVar m0 () readIORef s0 >>= evaluate . foldl' (+) 0 putMVar m0 () forkIO $ do putMVar m1 () readIORef s1 >>= evaluate . foldl' (+) 0 putMVar m1 () threadDelay 10000 replicateM 3 $ takeMVar m0 >> takeMVar m1 v0 <- tail <$> readIORef s0 v1 <- tail <$> readIORef s1 print (v0 == v1) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11760: runST with lazy blackholing breaks referential transparency
-------------------------------------+-------------------------------------
Reporter: Yuras | Owner: dfeuer
Type: bug | Status: patch
Priority: highest | Milestone: 8.2.1
Component: Core Libraries | Version: 7.10.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Incorrect result | Unknown/Multiple
at runtime | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D3038
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by David Feuer

#11760: runST with lazy blackholing breaks referential transparency -------------------------------------+------------------------------------- Reporter: Yuras | Owner: dfeuer Type: bug | Status: closed Priority: highest | Milestone: 8.2.1 Component: Core Libraries | Version: 7.10.3 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D3038 Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * status: patch => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11760#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC