[GHC] #8006: Asynchronous exception rethrown synchronously inside runStmt

#8006: Asynchronous exception rethrown synchronously inside runStmt -----------------------------+---------------------------------------------- Reporter: edsko | Owner: Type: bug | Status: new Priority: normal | Component: Compiler Version: 7.6.3 | Keywords: Os: Unknown/Multiple | Architecture: Unknown/Multiple Failure: None/Unknown | Blockedby: Blocking: | Related: -----------------------------+---------------------------------------------- This bug is related to various bugs to do with asynchronous exceptions [http://hackage.haskell.org/trac/ghc/ticket/3997], [http://hackage.haskell.org/trac/ghc/ticket/5902], as well as the request for a means to terminate a code snippet started with `runStmt` [http://hackage.haskell.org/trac/ghc/ticket/1381]. The only means to terminate such a snippet is to send an asynchronous exception to the thread that executed the `runStmt` (while, in versions of ghc prior to Simon M's patch listed in 1381, making sure that sandboxing is disabled). The problem with this approach is that this exception might interrupt the *typechecker* instead of the snippet. This is problematic, because a relatively large part of the typechecker runs inside a call to `unsafeInterleaveIO`, through a call to `forkM` or `forkM_maybe` from * `loadDecl` in `compiler/iface/LoadIFace.lhs` [[BR]] * `tc_iface_decl`, `tcIfaceDataCons`, `tcIfaceInst`, `tcIfaceFamInst`, `tcIfaceRule`, `tcIfaceVectInfo`, `tcUnfolding`, `tcIfaceWrapper`, `tcPragExpr`, `tcUnfolding`, `tcIfaceWrapper` and `tcPragExpr` in `compiler/iface/TcIFace.lhs` [[BR]] If the asynchronous exception is caught inside `unsafeInterleaveIO` and rethrown synchronously (typically implicitly through a call to `try` or `catch` that only catches exceptions of a certain type -- specifically `IOEnvFailure` in this case) then this exception becomes the value of the thunk and any subsequent attempt to poke that thunk will rethrow the exception. The attached test case illustrates this. After some minimal set up, we repeatedly use `runStmt` to execute a snippet which simply waits for 1 second. Before we do this, we fork a thread which waits a random delay between 0 and 0.1 seconds, and then sends an asynchronous exception to the main thread. We catch this exception, make sure it's the exception we're expecting, and repeat. On every iteration we throw a different exception; however, when we run the code we will get {{{ Unexpected exception CountedThreadKilled 4 Expected (CountedThreadKilled 8) }}} or similar output. In other words, we are catching an *old* exception, that we threw and caught earlier. This must be because this exception has become the value of a thunk, either inside `unsafePerformIO` or `unsafeInterleaveIO` -- and so far the only candidate I have found is the `unsafeInterleaveIO` inside `forkM_maybe`. So this leaves us with two questions: 1. Are there other (relevant) places where `unsafePerformIO` or `unsafeInterleaveIO` are used? I ''think'' the answer to this is is no, but I am not sure. 2. Where do we catch and rethrow exceptions? Unfortunately, question 2 is more difficult to answer. The function `tryM` (`compiler/utils/IOEnv.hs`) wraps around `try`, but it in turn gets wrapped in lots of places: `recoverM`, `recoverTR`, `tryTcErrs`, `checkNoErrs` (this one is used a lot), `mapAndRecoverM` (also used frequently), `tryTc` and `runPlans`. There is a call to `tryM` directly inside the `unsafeInterleaveIO` in `forkM_maybe`, so that's an obvious candidate. However, as an experiment, I replaced the `tryM` with `tryAllM` (not really a solution, as discussed below) and re-ran the tests, and it was still failing in a similar way. This means that this `tryM` cannot be the only culprit, because with this change `forkM` will "re"throw all exceptions as an `GhcException`, never as the original exception (certainly not as the custom exception type that the test is using). Since we were still getting our custom exception, this must have been rethrown elsewhere. Unfortunately `tryM` or one of its variants gets called in lots of places: * `initDs` in `compiler/deSugar/DsMonad.lhs` [[BR]] * `dataConInfoPtrToName` in `compiler/ghci/DebuggerUtils.hs` [[BR]] * `addConstraint`, `cvObtainTerm`, `cvReconstructType` and `congruenceNewtypes` in `compiler/ghci/RtClosureInspect.hs` [[BR]] * `tcPolyBinds`, `tcPolyInfer`, `tcSpecPrags`, `tcImpPrags` and `tcTySigs` in `compiler/typecheck/TcBinds.lhs` [[BR]] * `tcClassDecl2` in `compiler/typecheck/TcClassDcl.lhs` [[BR]] * `check_instance` in `compiler/typecheck/TcDefaults.lhs` [[BR]] * `tcDeriving`, `makeDerivSpecs` and `inferInstanceContexts` in `compiler/typecheck/TcDeriv.lhs` [[BR]] * `tc_hs_type` in `compiler/typecheck/TcHsType.lhs` * `tcInstDecls1`, `tcClsInstDecl` and `tcInstDecl2` in `compiler/typecheck/TcInstDcls.lhs` [[BR]] * `tcRnExtCore`, `tc_rn_src_decls`, `rnTopSrcDecls`, `tcUserStmt`, `tcGhciStmts`, `lookup_rdr_name` in `compiler/typecheck/TcRnDriver.lhs` [[BR]] * `initTc` in `compiler/typecheck/TcRnMonad.lhs` [[BR]] * `tcTopSplice`, `tcTopSpliceExpr`, `tcTopSpliceType`, `runMeta`, `qRecover`, `reifyInstances` in `compiler/typecheck/TcSplice.lhs` [[BR]] * `tcTyAndClassDecls`, `tcTyClGroup` and `checkValidTyCon` in `compiler/typecheck/TcTyClsDecls.lhs` Figuring out which of these may be called inside the `forkM` calls is tricky. I tried running the test with a profiled built of `ghc` with `-fprof-auto` enabled, but unfortunately I was unable to reproduce a test failure with that approach, I'm not sure why. But even if we did find the culprits, fixing it will not be easy because there is no generic solution. In `forkM_maybe` we could return `Nothing` when we catch an asynchronous exception, but then we would call `pgmError` inside `forkM` thereby rethrowing the asynchronous exception as a synchronous `GhcException`. That wouldn't solve anything. Instead, we'd have to rethrow the asynchronous exception using `throwTo`, but then we need to decide what to do after the `throwTo` returns, and that might be a rather difficult question to answer generically for all of the typechecker. Can we just re-run the `thing_inside`? Probably not, there might be all kinds of side effects; and checking for all those side effects would be quite a large task, esp given that there are so many places (list above) where we are caling `tryM` (directly through `forkM` or deeper down the call stack). I think a more realistic solution is to provide a means to detect when the snippet starts running. I outlined such a solution at http://www.haskell.org/pipermail/ghc-devs/2013-June/001380.html, but Simon M objected that this proposal might not work because there is no clear point where type checking finishes and the snippet starts executing. So far I think it would probably be easier to make sure that such a point *does* exist than to fix the asynchronous exception problem. (Note that I have adopted the solution outlined in my email and it seems to work for me, but since this bug is non-deterministic that of course doesn't mean it actually works..) However, I've run out of time to work on this for now. Might be continued. -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8006 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8006: Asynchronous exception rethrown synchronously inside runStmt -----------------------------+---------------------------------------------- Reporter: edsko | Owner: Type: bug | Status: new Priority: normal | Component: Compiler Version: 7.6.3 | Keywords: Os: Unknown/Multiple | Architecture: Unknown/Multiple Failure: None/Unknown | Blockedby: Blocking: | Related: -----------------------------+---------------------------------------------- Comment(by edsko): Note that the test also sometimes fails with {{{ Unexpected exception ExceptionTest: panic! (the 'impossible' happened) (GHC version 7.6.2 for x86_64-apple-darwin): Dynamic linker not initialised Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug }}} This is probably an orthogonal problem (and I have never seen this "in the wild"). -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8006#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8006: Asynchronous exception rethrown synchronously inside runStmt ---------------------------------+------------------------------------------ Reporter: edsko | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: None/Unknown Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Changes (by simonmar): * difficulty: => Unknown Comment: Thanks for looking into this. I expect you're right that it's infeasible to figure out what to do when `throwTo` returns. I would be OK with a solution that did all the lazy loading of interface files first before running the snippet, as long as we can be reasonably sure that it works. (that is, not just "I tried it and it seems to work", but asking somebody who has a reasonable idea of where we do all the `unsafePerformIO`s, i.e., simonpj). I'm not sure, but I think that we might lazily read in `IdInfo` from the interface file, so poking on the arity of an `Id` might cause an `unsafePerformIO` to happen, and that would be outside of the typechecker. A couple of other approaches spring to mind: * make `forkM` be `forkIO` (or `async`). I don't know how much overhead that would entail, though. We could try it. * put a `mask_` inside `forkM`. Perhaps `uninterruptibleMask` is justified in this case, since it's hard to guarantee that we don't have any interruptible operations, yet we're reasonably sure that it shouldn't block. However, that might mean we couldn't ^C the typechecker if it looped... -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8006#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8006: Asynchronous exception rethrown synchronously inside runStmt ---------------------------------+------------------------------------------ Reporter: edsko | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: None/Unknown Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Comment(by simonpj): Wait... you say "this exception has become the value of a thunk". Are you saying that if a thunk evaluation is interrupted by an asynch exn `e`, then the value of the thunk becomes `throw e`? Surely we should rather just freeze evaluation of the thunk so if it is evaluated again we simply resume? I don't think an asynch exn should ''ever'' become the value of a thunk. That makes no sense! Simon -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8006#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8006: Asynchronous exception rethrown synchronously inside runStmt ---------------------------------+------------------------------------------ Reporter: edsko | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: None/Unknown Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Comment(by edsko): "I don't think an asynch exn should ever become the value of a thunk." -- yes, that would be ideal. Unfortunately, that's very hard to do, because there is no way to install an exception handler for synchronous exceptions only -- in particular, the almost-primitive `GHC.IO.catchException` catches all exceptions, does a runtime type check, and rethrows the exception using `raiseIO#` if the types don't match. This means that asynchronous exceptions become synchronous, and hence can become the value of a thunk. Manually re-throwing the exception using `throwTo` is not really a solution either (at least, not a generic solution, though sometimes a possible workaround), as then we need to decide in user code how to resume the thread. The ghc bugs I cite at the start of the issue are all related to this problem. -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8006#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8006: Asynchronous exception rethrown synchronously inside runStmt ---------------------------------+------------------------------------------ Reporter: edsko | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: None/Unknown Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Comment(by simonmar): Simon - normally a thunk is frozen when hit by an asynchronous exception, but the problem is when we have an `unsafePerformIO` or `unsafeInterleaveIO` inside the thunk, and the IO code catches the asynchronous exception and re-throws it. When it re-throws the exception there are two options: * re-throw it as a synchronous exception. But then any enclosing thunks will capture the exception. This is what happens right now in the typechecker and other parts of GHC; it's the bug we want to fix. * re-throw the exception asynchronously. You can do this by calling `throwTo` passing your own `ThreadId`. Leaving aside the question of how you know that the original exception was asynchronous (which we don't have a good answer to, see #5902), the problem here is that when the current execution is resumed in the future, `throwTo` will ''return'', and then you have to figure out what to do: re-execute the whole operation, or something else? Hope that helps. -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8006#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8006: Asynchronous exception rethrown synchronously inside runStmt ---------------------------------+------------------------------------------ Reporter: edsko | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: None/Unknown Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Comment(by simonmar): I think I suggested somewhere that we should have a special kind of `catch` that doesn't catch asynchronous exceptions. That would give us a good way to fix this problem manually, at least. It does need changes in both GHC (a new primop) and the RTS (a new stack frame) though. -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8006#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8006: Asynchronous exception rethrown synchronously inside runStmt ---------------------------------+------------------------------------------ Reporter: edsko | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: None/Unknown Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Comment(by simonpj): Ok, well, before thinking about workarounds, let's just check that something closer to ideal is not possible. Here is a tiny example {{{ let x = unsafePerformIO $ do { let v = expensive 100 ; printToFile "logfile" ("Computed v" ++ show v) ; return v } }}} Here I am writing to a log-file as an example of the kind of benign side effect that should be ok in `unsafePerformIO`. So what should happen if an asynch exn is caught inside `printToFile` some I/O stuff is restored, and the exception is re-thrown? I would have that this principle should hold: * exactly the same thing should happen as would have happened if the exception was delivered when we ''were not'' inside any catch-blocks in `printToFile`. Namely, freeze and resume. I'd be inclined to resume just before the `throwTo`. So that might (just) cause the log file to be written to twice, but it's supposed be a benign side effect. It's a good point though. I wonder what the semantics says? Adding a new kind of catch does not seem like the right solution. `printToFile` might really really want to restore some invariants before aborting; if it can't catch async exns, it couldn't do so. Simon -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8006#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8006: Asynchronous exception rethrown synchronously inside runStmt ---------------------------------+------------------------------------------ Reporter: edsko | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: None/Unknown Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Comment(by simonmar): If repeating the I/O is what you want, then you can do that - indeed, that's what I suggested that we should do, and this ticket is Edsko's followup after investigating that possibility. For some background, we adopted this approach in the I/O library for lazy I/O, see #3997, and `Note [async]` in `libraries/base/GHC/IO/Handle/Internals.hs`. Edsko's objection to doing this in GHC is that there are a lot of places where we catch exceptions inside `unsafePerformIO` or equivalent, and it probably isn't safe to repeat the I/O in all of them. Well, maybe it is - I'm not sure. The semantics of async exceptions doesn't say anything about this, because it doesn't include `unsafePerformIO` - IO is strictly at a higher level than pure computation. What seems attractive about having a different kind of catch is that it avoids the problem in cases where we're only interested in catching a subset of exceptions, which is what we're doing in GHC: both `tryM` and `tryMostM` want to just pass through async exceptions. You're right that it doesn't help if you want to do some cleanup (`bracket` or `finally`). -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8006#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC