Re: [Haskell-cafe] ANNOUNCE: iterIO-0.1 - iteratee-based IO with pipe operators

12 May 2011

On 11/05/2011 23:57, dm-list-haskell-cafe@scs.stanford.edu wrote:
...
At Wed, 11 May 2011 13:02:21 +0100,
Simon Marlow wrote:
...
...
However, if there's some simpler way to guarantee that >>= is the
point where exceptions are thrown (and that might be the case for GHC in
practice), then I basically only need to update the docs.  If someone
with more GHC understanding could explain how asynchronous exceptions
work, I'd love to hear it...
There's no guarantee of the form that you mention - asynchronous
exceptions can occur anywhere.  However, there might be a way to do what
you want (disclaimer: I haven't looked at the implementation of iterIO).
Control.Exception will have a new operation in 7.2.1:
    allowInterrupt :: IO ()
    allowInterrupt = unsafeUnmask $ return ()
which allows an asynchronous exception to be thrown inside mask (until
7.2.1 you can define it yourself, unsafeUnmask comes from GHC.IO).
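A minimal sketch of how such a polling point behaves, using only base. The names pollInterrupt and demo are hypothetical and not part of iterIO; pollInterrupt is simply the definition above under another name, so that it does not clash with the Control.Exception export in later versions of base. A worker thread runs entirely inside mask_, so a killThread aimed at it stays pending until the worker reaches the polling point.

```haskell
import Control.Concurrent (forkIO, killThread, yield)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (AsyncException, mask_, try)
import Control.Monad (forever)
import GHC.IO (unsafeUnmask)

-- Hypothetical name: same definition as allowInterrupt above.
pollInterrupt :: IO ()
pollInterrupt = unsafeUnmask (return ())

-- Returns True if the masked worker received the asynchronous
-- exception at its polling point.
demo :: IO Bool
demo = do
  started <- newEmptyMVar
  done    <- newEmptyMVar
  tid <- forkIO $ mask_ $ do
    -- The MVar is empty, so this putMVar does not block and is
    -- therefore not an interruption point.
    putMVar started ()
    -- Masked loop: the only place a pending exception can be
    -- raised is inside pollInterrupt. yield just lets other
    -- threads run; it does not deliver exceptions while masked.
    r <- try (forever (pollInterrupt >> yield))
           :: IO (Either AsyncException ())
    putMVar done (either (const True) (const False) r)
  takeMVar started   -- the worker is now inside mask_
  killThread tid     -- blocks until the worker receives the exception
  takeMVar done
```

Without the pollInterrupt call the loop would never receive the exception, and killThread would block indefinitely.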
So to answer my own question from earlier, I did a bit of
benchmarking, and it seems that on my machine (a 2.4 GHz Intel Xeon
3060, running linux 2.6.38), I get the following costs:
      9 ns - return () :: IO ()        -- baseline (meaningless in itself)
     13 ns - unsafeUnmask $ return ()  -- with interrupts enabled
     18 ns - unsafeUnmask $ return ()  -- inside a mask_

     13 ns - ffi                       -- a null FFI call (getpid cached by libc)
     18 ns - unsafeUnmask ffi          -- with interrupts enabled
     22 ns - unsafeUnmask ffi          -- inside a mask_
Those are lower than I was expecting, but look plausible.  There's room 
for improvement too (by inlining some or all of unsafeUnmask#).

However, the general case of unsafeUnmask E, where E is something more 
complex than return (), will be more expensive because a new closure for 
E has to be created.  e.g. try "return x" instead of "return ()", and 
try to make sure that the closure has to be created once per 
unsafeUnmask, not lifted out and shared.
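One way to set up such a measurement is sketched below. The names sumUnmasked and nsPerIteration are hypothetical, and this is not the benchmark that produced the figures in this thread. The argument to return depends on the loop variables, so the closure passed to unsafeUnmask cannot be floated out of the loop and shared. The timer assumes GHC.Clock is available (base 4.11 or later) and that n is positive.

```haskell
import GHC.Clock (getMonotonicTimeNSec)
import GHC.IO (unsafeUnmask)

-- Hypothetical benchmark body: each iteration passes a fresh
-- action, return (acc + i), to unsafeUnmask. Because the action
-- mentions acc and i, it has to be built once per iteration.
sumUnmasked :: Int -> IO Int
sumUnmasked n = go 0 0
  where
    go acc i
      | i >= n    = return acc
      | otherwise = do
          x <- unsafeUnmask (return (acc + i))
          -- Force the result so no chain of thunks builds up.
          x `seq` go x (i + 1)

-- Hypothetical helper: rough wall-clock nanoseconds per iteration.
-- Assumes n > 0.
nsPerIteration :: Int -> IO Double
nsPerIteration n = do
  t0 <- getMonotonicTimeNSec
  r  <- sumUnmasked n
  t1 <- r `seq` getMonotonicTimeNSec
  return (fromIntegral (t1 - t0) / fromIntegral n)
```

The result includes loop overhead, so it is only meaningful relative to a baseline loop that performs return (acc + i) without unsafeUnmask.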
...
    131 ns - syscall                  -- getppid through FFI
    135 ns - unsafeUnmask syscall     -- with interrupts enabled
    140 ns - unsafeUnmask syscall     -- inside a mask_

...
So it seems that the cost of calling unsafeUnmask inside every liftIO
would be about 22 cycles per liftIO invocation, which seems eminently
reasonable.  You could then safely run your whole program inside a big
mask_ and not worry about exceptions happening between >>=
invocations.  Though truly compute-intensive workloads could have
issues, the kind of applications targeted by iterIO will spend most of
their time doing I/O, so this shouldn't be an issue.
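A minimal sketch of that idea, assuming a generic MonadIO monad rather than iterIO's actual types. The name liftIOUnmasked is hypothetical and is not part of iterIO's API. Each lifted action runs with asynchronous exceptions unmasked, while everything between lifted actions stays masked.

```haskell
import Control.Exception (MaskingState (..), getMaskingState, mask_)
import Control.Monad.IO.Class (MonadIO, liftIO)
import GHC.IO (unsafeUnmask)

-- Hypothetical wrapper: lift an IO action and unmask asynchronous
-- exceptions only for the duration of that action.
liftIOUnmasked :: MonadIO m => IO a -> m a
liftIOUnmasked = liftIO . unsafeUnmask

-- Run a computation inside one big mask_ and record the masking
-- state outside and inside a lifted action.
demo :: IO (MaskingState, MaskingState)
demo = mask_ $ do
  outer <- getMaskingState                  -- between lifted actions
  inner <- liftIOUnmasked getMaskingState   -- inside a lifted action
  return (outer, inner)
```

Called from an unmasked thread, demo returns (MaskedInterruptible, Unmasked): exceptions can only arrive while a lifted action is running, not in the code between binds.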
Better yet, for programs that don't use asynchronous exceptions, if
you don't put your whole program inside a mask_, the cost drops
roughly in half.  It's hard to imagine any real application whose
performance would take a significant hit because of an extra 11 cycles
per liftIO.
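The cycle counts follow from the measurements above by subtracting the 9 ns baseline and multiplying by the 2.4 GHz clock rate. A small sketch of that arithmetic, with hypothetical names clockGHz and overheadCycles:

```haskell
-- Clock rate of the benchmark machine, in GHz (cycles per ns).
clockGHz :: Double
clockGHz = 2.4

-- Overhead in cycles, given the baseline and measured times in ns.
overheadCycles :: Double -> Double -> Double
overheadCycles baselineNs measuredNs = (measuredNs - baselineNs) * clockGHz
```

overheadCycles 9 18 comes to about 21.6 cycles (inside a mask_), and overheadCycles 9 13 to about 9.6 cycles (with interrupts enabled), which matches the rounded figures of 22 and 11 quoted above.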
Is there anything I'm missing?  For instance, my machine only has one
CPU, and the tests all ran with one thread.  Does
unmaskAsyncExceptions# acquire a spinlock that could lock the memory
bus?  Or is there some other reason unsafeUnmask could become
expensive on NUMA machines, or in the presence of concurrency?
There are no locks here, thanks to the message-passing implementation we 
use for throwTo between processors.  unmaskAsyncExceptions# basically 
pushes a small stack frame, twiddles a couple of bits in the thread 
state, and checks a word in the thread state to see whether any 
exceptions are pending.  The stack frame untwiddles the bits again and 
returns.

Cheers,
	Simon