Why do remaining HECs busy-wait during unsafe-FFI calls?

Hello,

Recently, I stumbled over E.Z.Yang's "Safety first: FFI and threading"[1] post and then while experimenting with unsafe-imported FFI functions I've noticed a somewhat surprising behaviour:

Consider the following contrived program:

--8<---------------cut here---------------start------------->8---
import Foreign.C
import Control.Concurrent
import Control.Monad
import Data.Time.Clock.POSIX (getPOSIXTime)

foreign import ccall unsafe "unistd.h sleep"
    c_sleep_unsafe :: CUInt -> IO CUInt

main :: IO ()
main = do
    putStrLnTime "main started"
    _ <- forkIO (sleepLoop 10 >> putStrLnTime "sleepLoop finished")
    yield
    putStrLnTime "after forkIO"
    threadDelay (11*1000*1000) -- 11 seconds
    putStrLnTime "end of main"
  where
    putStrLnTime s = do
      t <- getPOSIXTime
      putStrLn $ init (show t) ++ "\t" ++ s

    sleepLoop n = do
      n' <- c_sleep_unsafe n
      unless (n' == 0) $ do
        putStrLnTime "c_sleep_unsafe got interrupted"
        sleepLoop n'
--8<---------------cut here---------------end--------------->8---

When compiled with GHC-7.6.3/linux/amd64 with "-O2 -threaded" and executed with "+RTS -N4", the following output is emitted:

  1367838802.137419 main started
  1367838812.137727 after forkIO
  1367838812.137783 sleepLoop finished
  1367838823.148733 end of main

which shows that the forkIO of the unsafe ccall effectively blocks the main thread;

Moreover, when looking at the process table, I saw that 3 threads were occupying 100% CPU time each for 10 seconds until the 'after forkIO' was emitted.

So what is happening here exactly, why do the 3 remaining HECs busy-wait during that FFI call instead of continuing the execution of the main thread?

Do *all* foreign unsafe ccalls (even short ones) cause N-1 HECs to spend time in some kind of busy looping?

[1]: http://blog.ezyang.com/2010/07/safety-first-ffi-and-threading/

Cheers,
  hvr

When an unsafe call is made, the OS thread currently running on the HEC
makes the call without releasing the HEC. If the main thread was on the run
queue of the HEC making the unsafe foreign call when the call was made,
then no other HEC will pick up the main thread. Hence the two sleep
calls in your program happen sequentially instead of concurrently.

I'm not completely sure what is causing the busy wait, but here is one
guess: when a GC is triggered on one HEC, it signals all the other HECs
to stop the mutator and run the collection. This wait may be a busy
wait, because it is typically brief. If this is true, then since one
thread is off in an unsafe foreign call, there is one HEC that refuses to
start the GC and all the other HECs are busy-waiting for the signal. The
GC could be triggered by a period of inactivity. Again, this is just a
guess - you might try to verify this by turning off the periodic triggering
of GC and checking whether the start-of-GC barrier is a busy-wait.
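
For comparison, here is a minimal sketch of the same program with the import marked `safe` instead of `unsafe`. GHC documents that a `safe` foreign call releases the HEC for the duration of the call, so one would expect the main thread to keep running while the C `sleep` is in progress. The name `c_sleep_safe` and the file name in the comments are made up for illustration, and nobody in this thread ran this variant, so treat it as an experiment to try rather than a confirmed result.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Sketch only: the original program with the import marked 'safe'.
-- Build: ghc -O2 -threaded -rtsopts SafeSleep.hs   (hypothetical file name)
-- Run:   ./SafeSleep +RTS -N4
import Control.Concurrent (forkIO, threadDelay, yield)
import Control.Monad (unless)
import Data.Time.Clock.POSIX (getPOSIXTime)
import Foreign.C.Types (CUInt (..))

-- A 'safe' call lets the RTS hand the HEC to another OS thread while
-- the C function is running. 'c_sleep_safe' is a hypothetical name.
foreign import ccall safe "unistd.h sleep"
    c_sleep_safe :: CUInt -> IO CUInt

-- Print a message prefixed with the current POSIX time.
putStrLnTime :: String -> IO ()
putStrLnTime s = do
    t <- getPOSIXTime
    putStrLn $ init (show t) ++ "\t" ++ s

-- sleep(3) returns the number of unslept seconds when interrupted,
-- so keep calling it until it reports 0.
sleepLoop :: CUInt -> IO ()
sleepLoop n = do
    n' <- c_sleep_safe n
    unless (n' == 0) $ do
        putStrLnTime "c_sleep_safe got interrupted"
        sleepLoop n'

main :: IO ()
main = do
    putStrLnTime "main started"
    _ <- forkIO (sleepLoop 10 >> putStrLnTime "sleepLoop finished")
    yield
    putStrLnTime "after forkIO"
    threadDelay (11*1000*1000) -- 11 seconds
    putStrLnTime "end of main"
```

Comparing the timestamp of "after forkIO" between this variant and the `unsafe` one should show whether the forked call still holds up the main thread.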
On Mon, May 6, 2013 at 7:29 AM, Herbert Valerio Riedel wrote:
> [full quote of the original message snipped]

Andreas Voellmy wrote:
> When an unsafe call is made, the OS thread currently running on the HEC
> makes the call without releasing the HEC. If the main thread was on the run
> queue of the HEC making the foreign unsafe call when the foreign call was
> made, then no other HECs will pick up the main thread. Hence the two sleep
> calls in your program happen sequentially instead of concurrently.

Is this the bound-main-thread issue? That is, would wrapping the main thread in 'runInUnboundThread' help here?
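
The thread leaves this question open. As a starting point for trying it, the sketch below only shows what `runInUnboundThread` does to the bound status of the main thread, using `isCurrentThreadBound` from `Control.Concurrent`. Whether moving the experiment into the unbound section changes the blocking behaviour is an assumption to test, not something established here.

```haskell
-- Sketch only: report whether the current thread is bound, before and
-- inside runInUnboundThread. Build with -threaded; the non-threaded
-- RTS does not support bound threads.
import Control.Concurrent (isCurrentThreadBound, runInUnboundThread)

main :: IO ()
main = do
    outer <- isCurrentThreadBound
    putStrLn ("main thread bound: " ++ show outer)
    runInUnboundThread $ do
        inner <- isCurrentThreadBound
        putStrLn ("inside runInUnboundThread, bound: " ++ show inner)
        -- The body of the experiment (forkIO, yield, threadDelay, ...)
        -- would go here.
```
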

> I'm not completely sure what is causing the busy wait, but here is one
> guess: when a GC is triggered on one HEC, it signals to all the other HECs
> to stop the mutator and run the collection. This waiting may be a busy
> wait, because the wait is typically brief. If this is true, then since one
> thread is off in an unsafe foreign call, there is one HEC that refuses to
> start the GC and all the other HECs are busy-waiting for the signal. The
> GC could be triggered by a period of inactivity. Again, this is just a
> guess - you might try to verify this by turning off the periodic triggering
> of GC and checking whether the start GC barrier is a busy-wait.

That seems to be a rather good guess: I inhibited the GC by disabling the idle-timer using "+RTS -N4 -I0" and with that the HEC busy-waiting is gone;

So actually this isn't FFI-specific at all, as I could trigger the very same effect by using a non-allocating/tight-loop evaluation such as the following:

    do
      _ <- forkIO (evaluate (busyfun 0 0) >> putStrLnTime "busyfun finished")

    where
      busyfun :: Int -> Int -> Int
      busyfun !n !m = if m < 0 then n else busyfun (n+1) (m+1)

cheers,
  hvr
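
The fragment above can be turned into a self-contained program along the following lines. This is a sketch under assumptions rather than the exact code that was run: the loop is renamed `busyLoop` and given an explicit iteration limit (the original only stops when `m` overflows), the limit assumes a 64-bit `Int`, and the file name in the comments is hypothetical. It relies on `-O2` to compile the loop without allocation.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Sketch only: a self-contained variant of the tight-loop experiment.
-- Build: ghc -O2 -threaded -rtsopts TightLoop.hs   (hypothetical file name)
-- Run:   ./TightLoop +RTS -N4        and compare with   +RTS -N4 -I0
import Control.Concurrent (forkIO, threadDelay, yield)
import Control.Exception (evaluate)
import Data.Time.Clock.POSIX (getPOSIXTime)

-- Counts up to 'limit'. With -O2 this is expected to compile to a loop
-- that does not allocate, so the thread never reaches a point where the
-- RTS can stop it for a GC. The explicit limit is an assumption added
-- here so that the program terminates in reasonable time.
busyLoop :: Int -> Int
busyLoop limit = go 0
  where
    go !n
      | n >= limit = n
      | otherwise  = go (n + 1)

-- Print a message prefixed with the current POSIX time.
putStrLnTime :: String -> IO ()
putStrLnTime s = do
    t <- getPOSIXTime
    putStrLn $ init (show t) ++ "\t" ++ s

main :: IO ()
main = do
    putStrLnTime "main started"
    -- 3 * 10^9 iterations; assumes a 64-bit Int.
    _ <- forkIO (evaluate (busyLoop 3000000000) >> putStrLnTime "busyLoop finished")
    yield
    putStrLnTime "after forkIO"
    threadDelay (2*1000*1000) -- 2 seconds
    putStrLnTime "end of main"
```

Watching per-thread CPU usage while this runs, once with the default idle GC and once with `-I0`, is one way to check whether the other HECs spin only when an idle GC is requested.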