Race-condition in alternative 'System.Timeout.timeout' implementation

Hello *, I've been experimenting with an alternative implementation of 'System.Timeout.timeout'[1] which avoids the overhead of spawning a new thread for each invocation. Part of my motivation is to see if I can implement a faster version of threadWaitReadTimeout :: Int -> Fd -> IO Bool threadWaitReadTimeout to = liftM (maybe False (const True)) . timeout to . threadWaitRead and thus exploit GHC's event notification system instead of having to reimplement a timeout-manager myself (like popular HTTP server libraries such as Snap or Warp do currently). The following Haskell program shows a proof-of-concept implementation derived directly from 'System.Timeout.timeout' together with a Criterion benchmark comparing the performance between the original and the alternative 'timeout' function wrapping a 'readMVar' call. On a i7-3770 with GHC-7.6.2/Linux/64bit ran with "+RTS -A4m -N4", the benchmark shows a 15x improvement for the new implementation (below 1 uS) compared to the original implementation (~13 uS): ,---- | benchmarking id | mean: 22.60933 ns, lb 22.50331 ns, ub 22.73515 ns, ci 0.950 | std dev: 591.0383 ps, lb 509.6189 ps, ub 663.2670 ps, ci 0.950 | found 17 outliers among 100 samples (17.0%) | 17 (17.0%) high mild | variance introduced by outliers: 19.992% | variance is moderately inflated by outliers | | benchmarking timeout_1ms | mean: 13.79584 us, lb 13.62939 us, ub 13.92814 us, ci 0.950 | std dev: 756.3080 ns, lb 524.7628 ns, ub 1.068547 us, ci 0.950 | found 14 outliers among 100 samples (14.0%) | 4 (4.0%) low severe | 5 (5.0%) high mild | 5 (5.0%) high severe | variance introduced by outliers: 52.484% | variance is severely inflated by outliers | | benchmarking timeout2_1ms | mean: 879.8152 ns, lb 874.5223 ns, ub 885.9759 ns, ci 0.950 | std dev: 29.31963 ns, lb 25.65941 ns, ub 32.98116 ns, ci 0.950 | found 9 outliers among 100 samples (9.0%) | 9 (9.0%) high mild | variance introduced by outliers: 28.734% | variance is moderately inflated by outliers | ... `---- Alas there's a race-condition hidden somewhere I'm struggling with; When the timeout is set low enough, the internal 'Timeout2' exceptions leaks outside the 'timeout2' wrapper: ,---- | ... | benchmarking timeout2_10us | newtimeout: <<timeout2>> `---- I've tried rewriting the code but couldn't figure out a way to keep the exception from escaping 'timeout2'. Does the race-condition actually lie in the 'timeout2' implementation -- and if so, is it possible to rewrite 'timeout2' to solve it? [1]: http://hackage.haskell.org/packages/archive/base/latest/doc/html/System-Time... cheers, hvr

On Sun, Feb 24, 2013 at 2:31 PM, Herbert Valerio Riedel
I've been experimenting with an alternative implementation of 'System.Timeout.timeout'[1] which avoids the overhead of spawning a new thread for each invocation.
Part of my motivation is to see if I can implement a faster version of
threadWaitReadTimeout :: Int -> Fd -> IO Bool threadWaitReadTimeout to = liftM (maybe False (const True)) . timeout to . threadWaitRead
and thus exploit GHC's event notification system instead of having to reimplement a timeout-manager myself (like popular HTTP server libraries such as Snap or Warp do currently).
The following Haskell program shows a proof-of-concept implementation derived directly from 'System.Timeout.timeout' together with a Criterion benchmark comparing the performance between the original and the alternative 'timeout' function wrapping a 'readMVar' call.
On a i7-3770 with GHC-7.6.2/Linux/64bit ran with "+RTS -A4m -N4", the benchmark shows a 15x improvement for the new implementation (below 1 uS) compared to the original implementation (~13 uS):
Fast timeouts is really important for real world web servers, which typically need one timeout per connection (e.g. to avoid slowloris DOS attacks). We should make sure timeouts are as cheap and fast as possible. This seems like a step in the right direction. -- Johan

You might want to take a look at
https://github.com/alphaHeavy/timeout-control/blob/master/System/Timeout/Con...,
though I'd guess it is subject to the same race condition. I have a
few other fixes (for dealing with lifted bracket iirc) I still need to
merge back from a private branch.
On Sun, Feb 24, 2013 at 2:31 PM, Herbert Valerio Riedel
Hello *,
I've been experimenting with an alternative implementation of 'System.Timeout.timeout'[1] which avoids the overhead of spawning a new thread for each invocation.
Part of my motivation is to see if I can implement a faster version of
threadWaitReadTimeout :: Int -> Fd -> IO Bool threadWaitReadTimeout to = liftM (maybe False (const True)) . timeout to . threadWaitRead
and thus exploit GHC's event notification system instead of having to reimplement a timeout-manager myself (like popular HTTP server libraries such as Snap or Warp do currently).
The following Haskell program shows a proof-of-concept implementation derived directly from 'System.Timeout.timeout' together with a Criterion benchmark comparing the performance between the original and the alternative 'timeout' function wrapping a 'readMVar' call.
On a i7-3770 with GHC-7.6.2/Linux/64bit ran with "+RTS -A4m -N4", the benchmark shows a 15x improvement for the new implementation (below 1 uS) compared to the original implementation (~13 uS):
,---- | benchmarking id | mean: 22.60933 ns, lb 22.50331 ns, ub 22.73515 ns, ci 0.950 | std dev: 591.0383 ps, lb 509.6189 ps, ub 663.2670 ps, ci 0.950 | found 17 outliers among 100 samples (17.0%) | 17 (17.0%) high mild | variance introduced by outliers: 19.992% | variance is moderately inflated by outliers | | benchmarking timeout_1ms | mean: 13.79584 us, lb 13.62939 us, ub 13.92814 us, ci 0.950 | std dev: 756.3080 ns, lb 524.7628 ns, ub 1.068547 us, ci 0.950 | found 14 outliers among 100 samples (14.0%) | 4 (4.0%) low severe | 5 (5.0%) high mild | 5 (5.0%) high severe | variance introduced by outliers: 52.484% | variance is severely inflated by outliers | | benchmarking timeout2_1ms | mean: 879.8152 ns, lb 874.5223 ns, ub 885.9759 ns, ci 0.950 | std dev: 29.31963 ns, lb 25.65941 ns, ub 32.98116 ns, ci 0.950 | found 9 outliers among 100 samples (9.0%) | 9 (9.0%) high mild | variance introduced by outliers: 28.734% | variance is moderately inflated by outliers | ... `----
Alas there's a race-condition hidden somewhere I'm struggling with; When the timeout is set low enough, the internal 'Timeout2' exceptions leaks outside the 'timeout2' wrapper:
,---- | ... | benchmarking timeout2_10us | newtimeout: <<timeout2>> `----
I've tried rewriting the code but couldn't figure out a way to keep the exception from escaping 'timeout2'. Does the race-condition actually lie in the 'timeout2' implementation -- and if so, is it possible to rewrite 'timeout2' to solve it?
[1]: http://hackage.haskell.org/packages/archive/base/latest/doc/html/System-Time...
cheers, hvr
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Dear Herbert,
I've been experimenting with an alternative implementation of 'System.Timeout.timeout'[1] which avoids the overhead of spawning a new thread for each invocation.
Be warned that timeouts are very intricate. We had a lengthy discussion on the topic 2 years ago, starting at http://www.haskell.org/pipermail/libraries/2011-February/015876.html There was even an IO manager based proposal similar to yours: http://hackage.haskell.org/trac/ghc/ticket/4963 (What's the busyWontTimeout benchmark mentioned there?) http://www.haskell.org/pipermail/libraries/2011-February/015953.html The main trouble with the IO manager based approach is that even after unregisterTimeout finished, the timeout may still be invoked, and additional work is needed to protect against that. (I have more to say on this, but will postpone it until later. A lot of it has already been said in the earlier thread anyway.) Best regards, Bertram

Bertram Felgenhauer wrote:
Dear Herbert,
I've been experimenting with an alternative implementation of 'System.Timeout.timeout'[1] which avoids the overhead of spawning a new thread for each invocation.
(I have more to say on this, but will postpone it until later. A lot of it has already been said in the earlier thread anyway.)
The main trouble with the IO manager based approach is that even after unregisterTimeout finished, the timeout may still be invoked. It's possible to protect against the exception arriving after 'timeout' has returned using an MVar, using a timeout handler like E.registerTimeout em to $ do t <- tryTakeMVar m when (isJust t) (throwTo tid ex) Similarly the main thread can use tryTakeMVar to check whether the timeout exception is about to arrive or not. If no such exception is pending, everything is fine. However, if the exception is pending, we have another problem: It is thrown by a different thread, so we don't know when it will arrive. In the meantime, *other* asynchronous exceptions (for example from different timeout calls, but also unrelated throwTo/killThread calls) may arrive that should *all* be propagated to the caller. It's fairly straight-forward to collect the arriving exceptions in a list, waiting for the expected Timeout one to arrive. But we cannot raise more than one exception synchronously at a time. This is fatal: While it ispossible to spawn a thread to re-throw the exceptions, this breaks the guarantees of synchronous delivery that 'throwTo' has (in ghc), for code outside of the timeout call: A: starts executing timeout foo B: killThread A A: receives exception X, ThreadKilled and Timeout simultaneously. A: spawns thread K for throwing ThreadKilled, re-raises X A: catches and handles 'X' A: killThread B B: receives ThreadKilled, dies K: re-throws ThreadKilled to A A: receives ThreadKilled, dies Without the delayed delivery of the 'ThreadKilled' exception of A, only one of the threads A and B would ever die. A possible solution might be a primitive operation that raises multiple exception at once (it would have to raise one of them and enqueue the other ones in the TSO's message queue.) Probably not worth the effort. A related, but less nasty problem also affects System.Timeout.timeout currently: http://hackage.haskell.org/trac/ghc/ticket/7719 Best regards, Bertram -- Best effort implementation using the event manager, taking the -- comments above into account, and lacking a proper way of raising -- multiple exceptions synchronously. -- -- The code is quite complicated, so there may be other flaws still. timeout2 :: Int -> IO a -> IO (Maybe a) timeout2 to f | to < 0 = fmap Just f | to == 0 = return Nothing | otherwise = do tid <- myThreadId ex <- fmap Timeout2 newUnique Just em <- E.getSystemEventManager -- FIXME m <- newMVar () let -- timeout handler: deliver timeout exception if m is still full timeout = do t <- tryTakeMVar m when (isJust t) $ do throwTo tid ex -- keep m alive, to prevent 'takeMVar m' from raising -- 'blocked indefinitely' exceptions in the main thread m `seq` return () -- loop, collecting exceptions until the right one arrives. loop es e | fromException e == Just ex = case reverse es of [] -> return () [e] -> throwIO e e:es -> -- we have collected more than one exception, -- so employ outside help for delivery forkIO (mapM_ (throwTo tid) es) >> throwIO e | otherwise = do -- 'takeMVar m' blocks until an exception arrives takeMVar m `catch` loop (e:es) error "not reached" mask $ \restore -> do hdl <- E.registerTimeout em to timeout r <- restore (fmap Just f) `catch` \e -> do E.unregisterTimeout em hdl t <- tryTakeMVar m case t of Just _ -> -- timeout prevented, simply re-raise e throwIO (e :: SomeException) Nothing -> -- have to wait for the timeout exception loop [] e >> return Nothing when (isJust r) $ do -- our computation was successful, but we still have -- to clean up the timeout handler E.unregisterTimeout em hdl t <- tryTakeMVar m case t of Just _ -> -- timeout prevented return () Nothing -> -- wait for timeout exception takeMVar m `catch` loop [] error "not reached" return r

Hi Bertram, Thank you for the explanation. My previous attempt obviously suffers from the race condition you mention. However it still seems to be possible to implement a compromise, using both the IO manager and a new thread, i.e. forking only when the computation is being timed out. The following implementation is as fast as Herbert's timeout2, at least in the benchmark where the computation rarely times out. - Takano Akio -- | Alternative implementation of 'System.Timeout.timeout' using -- 'GHC.Event.registerTimeout' AND a watchdog-thread. timeout4 :: Int -> IO a -> IO (Maybe a) timeout4 to f | to < 0 = fmap Just f | to == 0 = return Nothing | otherwise = do mainTid <- myThreadId ex <- fmap Timeout2 newUnique Just em <- E.getSystemEventManager -- FIXME killingThreadVar <- newEmptyMVar let timeoutHandler = (>>return ()) $ forkIO $ do killingTid <- myThreadId success <- tryPutMVar killingThreadVar killingTid when success $ throwTo mainTid ex cleanupTimeout key = uninterruptibleMask_ $ do -- Once the thread is in this uninterruptible block, -- it never receives the exception 'ex' because: -- (1) when we are in the uninterruptible block, -- all attept of throwTo to kill this thread -- will block. -- (2) the killing thread will either fail to fill -- 'killingThreadVar' or get killed before -- this thread exits the block. success <- tryPutMVar killingThreadVar undefined when (not success) $ do killingTid <- takeMVar killingThreadVar -- never blocks killThread killingTid E.unregisterTimeout em key handleJust (\e -> if e == ex then Just () else Nothing) (\_ -> return Nothing) (bracket (E.registerTimeout em to timeoutHandler) cleanupTimeout (\_ -> fmap Just f))

Akio Takano wrote:
Thank you for the explanation. My previous attempt obviously suffers from the race condition you mention.
However it still seems to be possible to implement a compromise, using both the IO manager and a new thread, i.e. forking only when the computation is being timed out. The following implementation is as fast as Herbert's timeout2, at least in the benchmark where the computation rarely times out.
Brilliant! I believe this version will work; the Timeout exception cannot escape the timeout call anymore by the same reasoning as in System.Timeout.timeout (with the bugfix for 7719 which consists solely of adding uninterruptibleMask_ around killThread); the main difference is that the creation of the killing thread is delayed until it is actually needed. (I also love the dual purpose 'killingThreadVar' MVar.) Maybe it's time to reopen #4963? http://hackage.haskell.org/trac/ghc/ticket/4963 Thanks, Bertram
-- | Alternative implementation of 'System.Timeout.timeout' using -- 'GHC.Event.registerTimeout' AND a watchdog-thread. timeout4 :: Int -> IO a -> IO (Maybe a) timeout4 to f | to < 0 = fmap Just f | to == 0 = return Nothing | otherwise = do mainTid <- myThreadId ex <- fmap Timeout2 newUnique Just em <- E.getSystemEventManager -- FIXME killingThreadVar <- newEmptyMVar
let timeoutHandler = (>>return ()) $ forkIO $ do killingTid <- myThreadId success <- tryPutMVar killingThreadVar killingTid when success $ throwTo mainTid ex cleanupTimeout key = uninterruptibleMask_ $ do -- Once the thread is in this uninterruptible block, -- it never receives the exception 'ex' because: -- (1) when we are in the uninterruptible block, -- all attept of throwTo to kill this thread -- will block. -- (2) the killing thread will either fail to fill -- 'killingThreadVar' or get killed before -- this thread exits the block. success <- tryPutMVar killingThreadVar undefined when (not success) $ do killingTid <- takeMVar killingThreadVar -- never blocks killThread killingTid E.unregisterTimeout em key
The unregisterTimeout has no effect if success is not set, so why not use if-then-else?
handleJust (\e -> if e == ex then Just () else Nothing) (\_ -> return Nothing) (bracket (E.registerTimeout em to timeoutHandler) cleanupTimeout (\_ -> fmap Just f))

Herbert Valerio Riedel
(bracket (E.registerTimeout em to (throwTo tid ex)) (E.unregisterTimeout em) (\_ -> fmap Just f))
...after some discussion on #ghc, I've realized, that 'registerTimeout' is dangerous if used improperly; it should be avoided to call any blocking operation (or throw exceptions) in the timeout-handler, as otherwise the I/O manager loop stops processing new events (at least with GHC-7.6.2) until the timeout-action completes; the following code demonstrates this issue by triggering a neverending timeout-action which effectively makes the Haskell process non-responsive. --8<---------------cut here---------------start------------->8--- import Control.Concurrent import qualified GHC.Event as E messupEventManager :: IO () messupEventManager = do mv <- newMVar () Just em <- E.getSystemEventManager E.registerTimeout em 5000000 (putStrLn "...blocking NOW!" >> putMVar mv ()) putStrLn "...in about 5 seconds the I/O manager will get stuck..." --8<---------------cut here---------------end--------------->8--- So maybe a warning in the documentation of registerTimeout may be appropriate telling users of registerTimeout that care should be taken to avoid operations blocking for non-negligible time (or throwing exceptions) in the timeout-handler, as otherwise in the best case the I/O processing latency suffers and in the worst case the I/O manager may come to a halt altogether. cheers, hvr
participants (5)
-
Akio Takano
-
Bertram Felgenhauer
-
Herbert Valerio Riedel
-
Johan Tibell
-
Nathan Howell