[GHC] #8006: Asynchronous exception rethrown synchronously inside runStmt

21 Jun 2013

      #8006: Asynchronous exception rethrown synchronously inside runStmt
-----------------------------+----------------------------------------------
Reporter:  edsko             |          Owner:                  
    Type:  bug               |         Status:  new             
Priority:  normal            |      Component:  Compiler        
 Version:  7.6.3             |       Keywords:                  
      Os:  Unknown/Multiple  |   Architecture:  Unknown/Multiple
 Failure:  None/Unknown      |      Blockedby:                  
Blocking:                    |        Related:                  
-----------------------------+----------------------------------------------
 This bug is related to various bugs to do with asynchronous exceptions
 [http://hackage.haskell.org/trac/ghc/ticket/3997],
 [http://hackage.haskell.org/trac/ghc/ticket/5902], as well as the request
 for a means to terminate a code snippet started with `runStmt`
 [http://hackage.haskell.org/trac/ghc/ticket/1381].

 The only means to terminate such a snippet is to send an asynchronous
 exception to the thread that executed the `runStmt` (while, in versions of
 ghc prior to Simon M's patch listed in 1381, making sure that sandboxing
 is disabled). The problem with this approach is that this exception might
 interrupt the *typechecker* instead of the snippet.

 This is problematic, because a relatively large part of the typechecker
 runs inside a call to `unsafeInterleaveIO`, through a call to `forkM` or
 `forkM_maybe` from

 * `loadDecl` in `compiler/iface/LoadIFace.lhs` [[BR]]
 * `tc_iface_decl`, `tcIfaceDataCons`, `tcIfaceInst`, `tcIfaceFamInst`,
 `tcIfaceRule`, `tcIfaceVectInfo`, `tcUnfolding`, `tcIfaceWrapper`,
 `tcPragExpr`, `tcUnfolding`, `tcIfaceWrapper` and `tcPragExpr` in
 `compiler/iface/TcIFace.lhs` [[BR]]

 If the asynchronous exception is caught inside `unsafeInterleaveIO` and
 rethrown synchronously (typically implicitly through a call to `try` or
 `catch` that only catches exceptions of a certain type -- specifically
 `IOEnvFailure` in this case) then this exception becomes the value of the
 thunk and any subsequent attempt to poke that thunk will rethrow the
 exception.

 The attached test case illustrates this. After some minimal set up, we
 repeatedly use `runStmt` to execute a snippet which simply waits for 1
 second. Before we do this, we fork a thread which waits a random delay
 between 0 and 0.1 seconds, and then sends an asynchronous exception to the
 main thread. We catch this exception, make sure it's the exception we're
 expecting, and repeat. On every iteration we throw a different exception;
 however, when we run the code we will get

 {{{
 Unexpected exception CountedThreadKilled 4
 Expected (CountedThreadKilled 8)
 }}}

 or similar output. In other words, we are catching an *old* exception,
 that we threw and caught earlier. This must be because this exception has
 become the value of a thunk, either inside `unsafePerformIO` or
 `unsafeInterleaveIO` -- and so far the only candidate I have found is the
 `unsafeInterleaveIO` inside `forkM_maybe`.

 So this leaves us with two questions:

 1. Are there other (relevant) places where `unsafePerformIO` or
 `unsafeInterleaveIO` are used? I ''think'' the answer to this is is no,
 but I am not sure.

 2. Where do we catch and rethrow exceptions?

 Unfortunately, question 2 is more difficult to answer. The function `tryM`
 (`compiler/utils/IOEnv.hs`) wraps around `try`, but it in turn gets
 wrapped in lots of places: `recoverM`, `recoverTR`, `tryTcErrs`,
 `checkNoErrs` (this one is used a lot), `mapAndRecoverM` (also used
 frequently), `tryTc` and `runPlans`.

 There is a call to `tryM` directly inside the `unsafeInterleaveIO` in
 `forkM_maybe`, so that's an obvious candidate. However, as an experiment,
 I replaced the `tryM` with `tryAllM` (not really a solution, as discussed
 below) and re-ran the tests, and it was still failing in a similar way.
 This means that this `tryM` cannot be the only culprit, because with this
 change `forkM` will "re"throw all exceptions as an `GhcException`, never
 as the original exception (certainly not as the custom exception type that
 the test is using). Since we were still getting our custom exception, this
 must have been rethrown elsewhere.

 Unfortunately `tryM` or one of its variants gets called in lots of places:

 * `initDs` in `compiler/deSugar/DsMonad.lhs` [[BR]]
 * `dataConInfoPtrToName` in `compiler/ghci/DebuggerUtils.hs` [[BR]]
 * `addConstraint`, `cvObtainTerm`, `cvReconstructType` and
 `congruenceNewtypes` in `compiler/ghci/RtClosureInspect.hs` [[BR]]
 * `tcPolyBinds`, `tcPolyInfer`, `tcSpecPrags`, `tcImpPrags` and `tcTySigs`
 in `compiler/typecheck/TcBinds.lhs` [[BR]]
 * `tcClassDecl2` in `compiler/typecheck/TcClassDcl.lhs` [[BR]]
 * `check_instance` in `compiler/typecheck/TcDefaults.lhs` [[BR]]
 * `tcDeriving`, `makeDerivSpecs` and `inferInstanceContexts` in
 `compiler/typecheck/TcDeriv.lhs` [[BR]]
 * `tc_hs_type` in `compiler/typecheck/TcHsType.lhs`
 * `tcInstDecls1`, `tcClsInstDecl` and `tcInstDecl2` in
 `compiler/typecheck/TcInstDcls.lhs` [[BR]]
 * `tcRnExtCore`, `tc_rn_src_decls`, `rnTopSrcDecls`, `tcUserStmt`,
 `tcGhciStmts`, `lookup_rdr_name` in `compiler/typecheck/TcRnDriver.lhs`
 [[BR]]
 * `initTc` in `compiler/typecheck/TcRnMonad.lhs`  [[BR]]
 * `tcTopSplice`, `tcTopSpliceExpr`, `tcTopSpliceType`, `runMeta`,
 `qRecover`, `reifyInstances` in `compiler/typecheck/TcSplice.lhs` [[BR]]
 * `tcTyAndClassDecls`, `tcTyClGroup` and `checkValidTyCon` in
 `compiler/typecheck/TcTyClsDecls.lhs`

 Figuring out which of these may be called inside the `forkM` calls is
 tricky. I tried running the test with a profiled built of `ghc` with
 `-fprof-auto` enabled, but unfortunately I was unable to reproduce a test
 failure with that approach, I'm not sure why.

 But even if we did find the culprits, fixing it will not be easy because
 there is no generic solution. In `forkM_maybe` we could return `Nothing`
 when we catch an asynchronous exception, but then we would call `pgmError`
 inside `forkM` thereby rethrowing the asynchronous exception as a
 synchronous `GhcException`. That wouldn't solve anything.
 Instead, we'd have to rethrow the asynchronous exception using `throwTo`,
 but then we need to decide what to do after the `throwTo` returns, and
 that might be a rather difficult question to answer generically for all of
 the typechecker. Can we just re-run the `thing_inside`? Probably not,
 there might be all kinds of side effects; and checking for all those side
 effects would be quite a large task, esp given that there are so many
 places (list above) where we are caling `tryM` (directly through `forkM`
 or deeper down the call stack).

 I think a more realistic solution is to provide a means to detect when the
 snippet starts running. I outlined such a solution at
 http://www.haskell.org/pipermail/ghc-devs/2013-June/001380.html, but Simon
 M objected that this proposal might not work because there is no clear
 point where type checking finishes and the snippet starts executing. So
 far I think it would probably be easier to make sure that such a point
 *does* exist than to fix the asynchronous exception problem. (Note that I
 have adopted the solution outlined in my email and it seems to work for
 me, but since this bug is non-deterministic that of course doesn't mean it
 actually works..)

 However, I've run out of time to work on this for now. Might be continued.

-- 
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8006
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler