
On 11/05/2011 23:57, dm-list-haskell-cafe@scs.stanford.edu wrote:
At Wed, 11 May 2011 13:02:21 +0100, Simon Marlow wrote:
However, if there's some simpler way to guarantee that >>= is the point where exceptions are thrown (and might be the case for GHC in practice), then I basically only need to update the docs. If someone with more GHC understanding could explain how asynchronous exceptions work, I'd love to hear it...
There's no guarantee of the form that you mention - asynchronous exceptions can occur anywhere. However, there might be a way to do what you want (disclaimer: I haven't looked at the implementation of iterIO).
Control.Exception will have a new operation in 7.2.1:
    allowInterrupt :: IO ()
    allowInterrupt = unsafeUnmask $ return ()
which allows an asynchronous exception to be thrown inside mask (until 7.2.1 you can define it yourself, unsafeUnmask comes from GHC.IO).
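As a minimal sketch of the "define it yourself" route, the following observes the masking state around the call. The names allowInterrupt' (primed to avoid clashing with the Control.Exception export in newer releases) and maskingStates are made up for this illustration; only unsafeUnmask, mask_ and getMaskingState are library functions.

```haskell
import Control.Exception (MaskingState (..), getMaskingState, mask_)
import GHC.IO (unsafeUnmask)

-- Local stand-in for the allowInterrupt described above (hypothetical name).
allowInterrupt' :: IO ()
allowInterrupt' = unsafeUnmask $ return ()

-- Record the masking state before, during and after an unsafeUnmask,
-- all from inside a mask_.
maskingStates :: IO (MaskingState, MaskingState, MaskingState)
maskingStates = mask_ $ do
  before <- getMaskingState               -- masked by the enclosing mask_
  during <- unsafeUnmask getMaskingState  -- briefly unmasked
  allowInterrupt'                         -- a pending exception would be raised here
  after  <- getMaskingState               -- masked again once the call returns
  return (before, during, after)

main :: IO ()
main = maskingStates >>= print
-- prints (MaskedInterruptible,Unmasked,MaskedInterruptible)
```

The window in which an exception can arrive is only as long as the action passed to unsafeUnmask, which for allowInterrupt' is just return ().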
So to answer my own question from earlier, I did a bit of benchmarking, and it seems that on my machine (a 2.4 GHz Intel Xeon 3060, running Linux 2.6.38), I get the following costs:
      9 ns - return () :: IO ()        -- baseline (meaningless in itself)
     13 ns - unsafeUnmask $ return ()  -- with interrupts enabled
     18 ns - unsafeUnmask $ return ()  -- inside a mask_
     13 ns - ffi                       -- a null FFI call (getpid cached by libc)
     18 ns - unsafeUnmask ffi          -- with interrupts enabled
     22 ns - unsafeUnmask ffi          -- inside a mask_
Those are lower than I was expecting, but look plausible. There's room for improvement too (by inlining some or all of unsafeUnmask#). However, the general case of unsafeUnmask E, where E is something more complex than return (), will be more expensive because a new closure for E has to be created. For example, try "return x" instead of "return ()", and make sure that the closure has to be created once per unsafeUnmask call, not lifted out and shared.
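A rough harness for that experiment might look like the sketch below. It is not the benchmark used for the figures in this thread: sumUnmasked and timePerIter are hypothetical names, timing uses getCPUTime rather than a proper benchmarking library, and the measured time includes loop overhead. The argument to unsafeUnmask depends on the loop variable, so it cannot simply be floated out and shared, though the optimiser may still affect what is actually allocated.

```haskell
import Control.Exception (evaluate, mask_)
import GHC.IO (unsafeUnmask)
import System.CPUTime (getCPUTime)

-- Sum 1..n, passing every element through unsafeUnmask (return i).
sumUnmasked :: Int -> IO Int
sumUnmasked n = go 0 1
  where
    go acc i
      | i > n     = return acc
      | otherwise = do
          x <- unsafeUnmask (return i)   -- argument differs on every iteration
          let acc' = acc + x
          acc' `seq` go acc' (i + 1)     -- keep the accumulator strict

-- CPU time of an action in nanoseconds per iteration (getCPUTime is in picoseconds).
timePerIter :: Int -> IO Int -> IO Double
timePerIter n act = do
  t0 <- getCPUTime
  r  <- act
  _  <- evaluate r
  t1 <- getCPUTime
  return (fromIntegral (t1 - t0) / 1000 / fromIntegral n)

main :: IO ()
main = do
  let n = 1000000
  plain  <- timePerIter n (sumUnmasked n)
  masked <- timePerIter n (mask_ (sumUnmasked n))
  putStrLn ("unsafeUnmask (return x), interrupts enabled: " ++ show plain ++ " ns/iter")
  putStrLn ("unsafeUnmask (return x), inside mask_:       " ++ show masked ++ " ns/iter")
```

The absolute numbers will differ by machine, GHC version and optimisation flags, so only the comparison against the return () case is meaningful.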
    131 ns - syscall                   -- getppid through FFI
    135 ns - unsafeUnmask syscall      -- with interrupts enabled
    140 ns - unsafeUnmask syscall      -- inside a mask_
So it seems that the cost of calling unsafeUnmask inside every liftIO would be about 22 cycles per liftIO invocation, which seems eminently reasonable. You could then safely run your whole program inside a big mask_ and not worry about exceptions happening between >>= invocations. Though truly compute-intensive workloads could have issues, the kind of applications targeted by iterIO will spend most of their time doing I/O, so this shouldn't be an issue.
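The idea of unmasking inside every liftIO can be sketched as follows. This is an illustration under assumptions, not iterIO's actual code: liftIOUnmasked, worker and demo are hypothetical names. The worker runs masked from its first instruction and never blocks, so the unsafeUnmask inside each lifted action is the only place a killThread can be delivered.

```haskell
import Control.Concurrent (forkIO, killThread, yield)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (AsyncException (..), mask_, try)
import Control.Monad (forever)
import Control.Monad.IO.Class (MonadIO, liftIO)
import GHC.IO (unsafeUnmask)

-- Hypothetical wrapper: run each lifted IO action with exceptions unmasked.
liftIOUnmasked :: MonadIO m => IO a -> m a
liftIOUnmasked = liftIO . unsafeUnmask

-- A loop that never blocks; it can only be interrupted inside liftIOUnmasked.
worker :: IO ()
worker = forever (liftIOUnmasked yield)

-- Start the worker masked, kill it, and report how it ended.
demo :: IO (Either AsyncException ())
demo = do
  result <- newEmptyMVar
  -- Forking inside mask_ means the child inherits the masked state.
  tid <- mask_ $ forkIO (try worker >>= putMVar result)
  killThread tid        -- returns once the exception has been delivered
  takeMVar result

main :: IO ()
main = demo >>= print
```

If worker used plain liftIO instead, the loop would stay masked throughout and killThread would never return.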
Better yet, for programs that don't use asynchronous exceptions, if you don't put your whole program inside a mask_, the cost drops roughly in half. It's hard to imagine any real application whose performance would take a significant hit because of an extra 11 cycles per liftIO.
Is there anything I'm missing? For instance, my machine only has one CPU, and the tests all ran with one thread. Does unmaskAsyncExceptions# acquire a spinlock that could lock the memory bus? Or is there some other reason unsafeUnmask could become expensive on NUMA machines, or in the presence of concurrency?
There are no locks here, thanks to the message-passing implementation we use for throwTo between processors. unmaskAsyncExceptions# basically pushes a small stack frame, twiddles a couple of bits in the thread state, and checks a word in the thread state to see whether any exceptions are pending. The stack frame untwiddles the bits again and returns.

Cheers,
Simon