
This is interesting, thanks. I propose to add INLINE pragmas to withMVar and friends. Having an interface for simple locks sounds like a good idea to me. Would you like to send a patch?

This won't affect Handle I/O, unfortunately, because we need block to protect against asynchronous exceptions. I'm still not certain you won't need that in the stream library, too: check any stateful code (e.g. buffering) and imagine what happens if an exception is raised at an arbitrary point.

Cheers, Simon

Bulat Ziganshin wrote:
The main reason for the slowness of the existing Handle-based I/O in GHC is the locking around each operation. It is especially bad for simple char-at-a-time I/O, where 99% of the time is spent on locking and unlocking.
To be exact, on my CPU, hPutChar on a 100 MB file requires 150 seconds, while hGetChar on the same file takes "only" 100 seconds. It seems that the former uses 3 locking operations and the latter 2, because my own vGetChar/vPutChar implementations both require 52 seconds, of which only about one second is real work and the rest is just `withMVar` overhead.
Until now, I thought that these 0.5 µs per withMVar call (52 seconds over roughly 10^8 characters, i.e. about 1000 primitive CPU operations) were purely the time required to perform the takeMVar+putMVar operations. But yesterday I investigated this problem more deeply, and the results were amazing!
First, I just made a local copy of `withMVar` and added an INLINE pragma to it:
import Control.Concurrent.MVar
import Control.Exception as Exception

{-# INLINE inlinedWithMVar #-}
inlinedWithMVar :: MVar a -> (a -> IO b) -> IO b
inlinedWithMVar m io =
  block $ do
    a <- takeMVar m
    b <- Exception.catch (unblock (io a))
                         (\e -> do putMVar m a; throw e)
    putMVar m a
    return b
Second, I developed my own simplified version of this procedure. Here I should say that my library uses an "MVar ()" field to hold the lock and a separate immutable field that holds the data being protected:
data WithLocking h = WithLocking h !(MVar ())
This allowed me to omit the block/unblock operations and develop the following faster analog of withMVar:
lock (WithLocking h mvar) action =
  Exception.catch (do takeMVar mvar
                      result <- action h
                      putMVar mvar ()
                      return result)
                  (\e -> do tryPutMVar mvar (); throw e)
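For illustration, a value can be wrapped and used like this (a minimal sketch that reuses the WithLocking type and `lock` function above; the `withLocking` helper and the example are only illustrative names, not part of the library):

import Control.Concurrent.MVar

-- Illustrative helper (not from the library): pair a value with a fresh,
-- initially free lock.  A full "MVar ()" means "unlocked".
withLocking :: h -> IO (WithLocking h)
withLocking h = do
    mvar <- newMVar ()
    return (WithLocking h mvar)

example :: IO ()
example = do
    wl <- withLocking "shared resource"        -- h here is just a String
    lock wl (\h -> putStrLn ("using " ++ h))   -- lock is taken and released around the action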
And as a third variant, I tried an exception-unsafe version of `withMVar`:
unsafeWithMVar :: MVar a -> (a -> IO b) -> IO b
unsafeWithMVar m io = do
  a <- takeMVar m
  b <- io a
  putMVar m a
  return b
And now the results:
withMVar          52 seconds
inlinedWithMVar   38 seconds
lock              20 seconds
unsafeWithMVar    10 seconds
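(The numbers above come from the char-at-a-time I/O test on a 100 MB file. The per-call overhead of each variant can also be measured in isolation with a micro-benchmark along the following lines; this harness is only a sketch, and the names and the 10^8 iteration count are illustrative, not the original test:)

import Control.Concurrent.MVar
import Control.Monad (replicateM_)
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Time an IO action using CPU time (getCPUTime reports picoseconds).
timeIt :: String -> IO () -> IO ()
timeIt name act = do
    t0 <- getCPUTime
    act
    t1 <- getCPUTime
    printf "%-16s %6.1f seconds\n" name
           (fromIntegral (t1 - t0) / 1e12 :: Double)

main :: IO ()
main = do
    m <- newMVar ()
    let n = 10^8 :: Int   -- roughly the number of chars in a 100 MB file
    timeIt "withMVar"       (replicateM_ n (withMVar       m (\_ -> return ())))
    timeIt "unsafeWithMVar" (replicateM_ n (unsafeWithMVar m (\_ -> return ())))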
So,
1) `withMVar` can be made significantly faster just by attaching an INLINE pragma to it. Until GHC includes this patch, you can make a local copy of this procedure (its implementation is compiler-independent) and put an INLINE pragma on that local copy.
2) If an MVar is used only to protect some immutable data from simultaneous access, its use can be made significantly faster with the above-mentioned WithLocking type constructor together with the 'lock' function. I hope that this mechanism will go into future Haskell implementations; in particular, it will be used in my own Streams library and in the new DiffArray implementation (which is part of the ArrayRef library).
3) For simple programs that don't catch exceptions anyway, this extra protection is simply meaningless; they can use 'unsafeWithMVar' to run as fast as possible. I mean in particular shootout-like benchmarks. It is also possible to build fast and safe routines by using explicit unlocking (with 'tryPutMVar') in higher-level exception handlers, as sketched after this list.
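A rough sketch of that last idea, building on the WithLocking type above and using the same exception API as the rest of this post (the names 'unsafeLock' and 'processAll' are illustrative only): the hot path takes and releases the lock with no per-operation exception handling, and a single higher-level handler restores the lock with tryPutMVar if an exception escapes.

-- Sketch only: exception-unsafe locking for the hot path...
unsafeLock :: WithLocking h -> (h -> IO a) -> IO a
unsafeLock (WithLocking h mvar) action = do
    takeMVar mvar
    result <- action h
    putMVar mvar ()
    return result

-- ...and a single higher-level handler that releases the lock on failure,
-- so the cost of exception safety is paid once per batch, not per operation.
processAll :: WithLocking h -> [h -> IO ()] -> IO ()
processAll wl@(WithLocking _ mvar) actions =
    Exception.catch (mapM_ (unsafeLock wl) actions)
                    (\e -> do tryPutMVar mvar (); throw e)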
And a more general conclusion: this case is a good demonstration of the significant performance loss caused by the use of higher-order functions. I think that more aggressive inlining of higher-order and polymorphic functions would significantly speed up GHC-compiled programs.