
#8006: Asynchronous exception rethrown synchronously inside runStmt -----------------------------+---------------------------------------------- Reporter: edsko | Owner: Type: bug | Status: new Priority: normal | Component: Compiler Version: 7.6.3 | Keywords: Os: Unknown/Multiple | Architecture: Unknown/Multiple Failure: None/Unknown | Blockedby: Blocking: | Related: -----------------------------+---------------------------------------------- This bug is related to various bugs to do with asynchronous exceptions [http://hackage.haskell.org/trac/ghc/ticket/3997], [http://hackage.haskell.org/trac/ghc/ticket/5902], as well as the request for a means to terminate a code snippet started with `runStmt` [http://hackage.haskell.org/trac/ghc/ticket/1381]. The only means to terminate such a snippet is to send an asynchronous exception to the thread that executed the `runStmt` (while, in versions of ghc prior to Simon M's patch listed in 1381, making sure that sandboxing is disabled). The problem with this approach is that this exception might interrupt the *typechecker* instead of the snippet. This is problematic, because a relatively large part of the typechecker runs inside a call to `unsafeInterleaveIO`, through a call to `forkM` or `forkM_maybe` from * `loadDecl` in `compiler/iface/LoadIFace.lhs` [[BR]] * `tc_iface_decl`, `tcIfaceDataCons`, `tcIfaceInst`, `tcIfaceFamInst`, `tcIfaceRule`, `tcIfaceVectInfo`, `tcUnfolding`, `tcIfaceWrapper`, `tcPragExpr`, `tcUnfolding`, `tcIfaceWrapper` and `tcPragExpr` in `compiler/iface/TcIFace.lhs` [[BR]] If the asynchronous exception is caught inside `unsafeInterleaveIO` and rethrown synchronously (typically implicitly through a call to `try` or `catch` that only catches exceptions of a certain type -- specifically `IOEnvFailure` in this case) then this exception becomes the value of the thunk and any subsequent attempt to poke that thunk will rethrow the exception. The attached test case illustrates this. After some minimal set up, we repeatedly use `runStmt` to execute a snippet which simply waits for 1 second. Before we do this, we fork a thread which waits a random delay between 0 and 0.1 seconds, and then sends an asynchronous exception to the main thread. We catch this exception, make sure it's the exception we're expecting, and repeat. On every iteration we throw a different exception; however, when we run the code we will get {{{ Unexpected exception CountedThreadKilled 4 Expected (CountedThreadKilled 8) }}} or similar output. In other words, we are catching an *old* exception, that we threw and caught earlier. This must be because this exception has become the value of a thunk, either inside `unsafePerformIO` or `unsafeInterleaveIO` -- and so far the only candidate I have found is the `unsafeInterleaveIO` inside `forkM_maybe`. So this leaves us with two questions: 1. Are there other (relevant) places where `unsafePerformIO` or `unsafeInterleaveIO` are used? I ''think'' the answer to this is is no, but I am not sure. 2. Where do we catch and rethrow exceptions? Unfortunately, question 2 is more difficult to answer. The function `tryM` (`compiler/utils/IOEnv.hs`) wraps around `try`, but it in turn gets wrapped in lots of places: `recoverM`, `recoverTR`, `tryTcErrs`, `checkNoErrs` (this one is used a lot), `mapAndRecoverM` (also used frequently), `tryTc` and `runPlans`. There is a call to `tryM` directly inside the `unsafeInterleaveIO` in `forkM_maybe`, so that's an obvious candidate. However, as an experiment, I replaced the `tryM` with `tryAllM` (not really a solution, as discussed below) and re-ran the tests, and it was still failing in a similar way. This means that this `tryM` cannot be the only culprit, because with this change `forkM` will "re"throw all exceptions as an `GhcException`, never as the original exception (certainly not as the custom exception type that the test is using). Since we were still getting our custom exception, this must have been rethrown elsewhere. Unfortunately `tryM` or one of its variants gets called in lots of places: * `initDs` in `compiler/deSugar/DsMonad.lhs` [[BR]] * `dataConInfoPtrToName` in `compiler/ghci/DebuggerUtils.hs` [[BR]] * `addConstraint`, `cvObtainTerm`, `cvReconstructType` and `congruenceNewtypes` in `compiler/ghci/RtClosureInspect.hs` [[BR]] * `tcPolyBinds`, `tcPolyInfer`, `tcSpecPrags`, `tcImpPrags` and `tcTySigs` in `compiler/typecheck/TcBinds.lhs` [[BR]] * `tcClassDecl2` in `compiler/typecheck/TcClassDcl.lhs` [[BR]] * `check_instance` in `compiler/typecheck/TcDefaults.lhs` [[BR]] * `tcDeriving`, `makeDerivSpecs` and `inferInstanceContexts` in `compiler/typecheck/TcDeriv.lhs` [[BR]] * `tc_hs_type` in `compiler/typecheck/TcHsType.lhs` * `tcInstDecls1`, `tcClsInstDecl` and `tcInstDecl2` in `compiler/typecheck/TcInstDcls.lhs` [[BR]] * `tcRnExtCore`, `tc_rn_src_decls`, `rnTopSrcDecls`, `tcUserStmt`, `tcGhciStmts`, `lookup_rdr_name` in `compiler/typecheck/TcRnDriver.lhs` [[BR]] * `initTc` in `compiler/typecheck/TcRnMonad.lhs` [[BR]] * `tcTopSplice`, `tcTopSpliceExpr`, `tcTopSpliceType`, `runMeta`, `qRecover`, `reifyInstances` in `compiler/typecheck/TcSplice.lhs` [[BR]] * `tcTyAndClassDecls`, `tcTyClGroup` and `checkValidTyCon` in `compiler/typecheck/TcTyClsDecls.lhs` Figuring out which of these may be called inside the `forkM` calls is tricky. I tried running the test with a profiled built of `ghc` with `-fprof-auto` enabled, but unfortunately I was unable to reproduce a test failure with that approach, I'm not sure why. But even if we did find the culprits, fixing it will not be easy because there is no generic solution. In `forkM_maybe` we could return `Nothing` when we catch an asynchronous exception, but then we would call `pgmError` inside `forkM` thereby rethrowing the asynchronous exception as a synchronous `GhcException`. That wouldn't solve anything. Instead, we'd have to rethrow the asynchronous exception using `throwTo`, but then we need to decide what to do after the `throwTo` returns, and that might be a rather difficult question to answer generically for all of the typechecker. Can we just re-run the `thing_inside`? Probably not, there might be all kinds of side effects; and checking for all those side effects would be quite a large task, esp given that there are so many places (list above) where we are caling `tryM` (directly through `forkM` or deeper down the call stack). I think a more realistic solution is to provide a means to detect when the snippet starts running. I outlined such a solution at http://www.haskell.org/pipermail/ghc-devs/2013-June/001380.html, but Simon M objected that this proposal might not work because there is no clear point where type checking finishes and the snippet starts executing. So far I think it would probably be easier to make sure that such a point *does* exist than to fix the asynchronous exception problem. (Note that I have adopted the solution outlined in my email and it seems to work for me, but since this bug is non-deterministic that of course doesn't mean it actually works..) However, I've run out of time to work on this for now. Might be continued. -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8006 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler