'import ccall unsafe' and parallelism

Greetings everybody,

I happen to be a bit confused with regard to unsafe foreign imports and parallelism.

Assume the following C function:

    foreign import ccall unsafe "cfun" cfun :: CInt -> IO ()

Now, cfun does some work:

    go xs = unsafePerformIO $ do
      forM_ xs $ cfun
      return $ somethingUnhealthy

And I'd like to parallelize this:

    parMap rdeepseq go [costly,costly]

However, due to the way GHC handles unsafe imports, namely blocking everything else whenever 'cfun' is called, I happen to have only one active 'go'. Let's assume 'cfun' is cheap and would suffer from 'ccall safe' more than I'd be willing to pay.

Is there any fix possible?

Best regards,
Christian

PS: The real problem happens to use a bunch of different Judy arrays, each of which lives in its own thread: 300 Judy arrays, 300 threads, each with up to 20 million inserts. But I think the basic problem can be reduced to "how to parallelize 'ccall unsafe's".

Hi!

On Thu, Aug 14, 2014 at 5:54 PM, Christian Höner zu Siederdissen <choener@tbi.univie.ac.at> wrote:
> However, due to the way ghc handles unsafe imports, namely block everything else whenever 'cfun' is called, I happen to have only one active 'go'. Let's assume 'cfun' is cheap and would suffer from 'ccall safe' more than I'd be willing to pay.

Calls to unsafe functions do not block everything! Other Haskell threads can continue running on other capabilities (make sure you run your program with +RTS -N). However, make sure that the C function itself never blocks, or it might deadlock your program.
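
A minimal sketch of the point above, assuming libc's abs as a stand-in for the thread's 'cfun' (the names c_abs and runWorkers are made up for illustration). Each Haskell thread makes its own unsafe calls; with the threaded runtime and +RTS -N, those threads can be scheduled on different capabilities.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import Foreign.C.Types (CInt (..))

-- Assumption: libc's abs stands in for 'cfun'. It is cheap and never blocks.
foreign import ccall unsafe "stdlib.h abs" c_abs :: CInt -> IO CInt

-- Start one Haskell thread per input list. Each thread makes its own
-- unsafe calls and reports the sum of the results through an MVar.
runWorkers :: [[CInt]] -> IO [CInt]
runWorkers chunks = do
  dones <- forM chunks $ \xs -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      rs <- mapM c_abs xs
      putMVar done $! sum rs
    return done
  mapM takeMVar dones

main :: IO ()
main = do
  n <- getNumCapabilities
  putStrLn ("capabilities: " ++ show n)
  sums <- runWorkers [[-1, -2, -3], [4, -5, 6]]
  print sums  -- prints [6,15]
```

Built with something like `ghc -threaded -rtsopts Main.hs` and run as `./Main +RTS -N`, the first line of output should report more than one capability on a multicore machine. If it reports 1, the program was probably built without -threaded or run without -N.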

I'm no judge of what's true about safe and unsafe, but this account of the system has, at least to my ear, the ring of authenticity:

http://blog.melding-monads.com/2011/10/24/concurrency-and-foreign-functions-...

The FFI section is short and readable.

With respect to whether "unsafe" does or does not block other threads, I can never penetrate the verbiage about capabilities to know for sure what to expect in practice, but when I checked, in practice, "unsafe" blocks other threads. I use "safe" to avoid this.

Donn

If your computation in the C call takes more than 400 nanoseconds, the
overhead of the safe FFI convention is less onerous, and you should use it
when applicable.

An alternative is to use forkOn to set up a worker thread on each of several
GHC capabilities, and have them work in parallel on different chunks of the
list. That way you're pausing ALL the capabilities (which again, you should
only use the unsafe FFI convention for computations that take <= 10
microseconds if you want things to be well behaved).

On Thu, Aug 14, 2014 at 12:21 PM, Donn Cave wrote:
> I'm no judge of what's true about safe and unsafe, but this account of the system has at least to my ear the ring of authenticity:
> http://blog.melding-monads.com/2011/10/24/concurrency-and-foreign-functions-...
> The FFI section is short and readable.
> With respect to whether "unsafe" does or does not block other threads, I can never penetrate the verbiage about capabilities to know for sure what to expect in practice, but when I checked, in practice, "unsafe" blocks other threads. I use "safe" to avoid this.
> Donn
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users@haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
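
The forkOn suggestion in this message could look roughly like the sketch below. It is only an illustration under stated assumptions: libc's abs stands in for the real C function, and splitInto and processOnCapabilities are hypothetical helper names, not part of any library.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkOn, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import Foreign.C.Types (CInt (..))

-- Assumption: libc's abs stands in for the cheap, non-blocking C function.
foreign import ccall unsafe "stdlib.h abs" c_abs :: CInt -> IO CInt

-- Deal the inputs round-robin into n chunks (n must be at least 1).
splitInto :: Int -> [a] -> [[a]]
splitInto n xs =
  [ [x | (i, x) <- zip [0 :: Int ..] xs, i `mod` n == k] | k <- [0 .. n - 1] ]

-- Pin one worker to each capability with forkOn and let it process its
-- own chunk using unsafe calls. Returns one partial sum per worker.
processOnCapabilities :: Int -> [CInt] -> IO [CInt]
processOnCapabilities n xs = do
  dones <- forM (zip [0 ..] (splitInto n xs)) $ \(cap, chunk) -> do
    done <- newEmptyMVar
    _ <- forkOn cap $ do
      rs <- mapM c_abs chunk
      putMVar done $! sum rs
    return done
  mapM takeMVar dones

main :: IO ()
main = do
  n <- getNumCapabilities
  partials <- processOnCapabilities n [-10 .. 10]
  print (sum partials)  -- prints 110
```

Round-robin dealing is just one way to chunk the list. It tends to spread large and small elements across workers, but it does nothing to balance work when the cost per element varies a lot.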

On Thu, Aug 14, 2014 at 11:54 AM, Christian Höner zu Siederdissen <choener@tbi.univie.ac.at> wrote:
> go xs = unsafePerformIO $ do
>   forM_ xs $ cfun
>   return $ somethingUnhealthy

I wonder if this is your real problem. `unsafePerformIO` does some extra locking; the FFI specifies a function `unsafeLocalState`, which in GHC is `unsafeDupablePerformIO`, which skips the extra locking.

--
brandon s allbery kf8nh    sine nomine associates
allbery.b@gmail.com    ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad    http://sinenomine.net
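
A hedged sketch of that suggestion, reusing the shape of the original `go`. It assumes the `parallel` package for `parMap`, and it uses libc's abs as a placeholder for 'cfun'. Note that `unsafeDupablePerformIO` skips the check that only one thread runs the action, so the action may run more than once if two threads evaluate the same thunk. That is only acceptable when the C calls tolerate being repeated.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Monad (forM)
import Control.Parallel.Strategies (parMap, rdeepseq)
import Foreign.C.Types (CInt (..))
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Assumption: libc's abs is a placeholder for the thread's 'cfun'.
foreign import ccall unsafe "stdlib.h abs" c_abs :: CInt -> IO CInt

-- Same shape as the original 'go', but without the extra locking that
-- unsafePerformIO performs. The wrapped action may be duplicated.
go :: [CInt] -> Int
go xs = unsafeDupablePerformIO $ do
  rs <- forM xs c_abs
  return (fromIntegral (sum rs))

main :: IO ()
main = print (parMap rdeepseq go [[-1, -2], [3, -4, 5]])  -- prints [3,12]
```

As with any parallel run, this needs `-threaded` at build time and `+RTS -N` at run time before more than one capability picks up sparks.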

I have to agree with Brandon's diagnosis: unsafePerformIO will take out a lock, which is likely why you are seeing no parallelism.

Edward

Excerpts from Brandon Allbery's message of 2014-08-14 17:12:00 +0100:
> On Thu, Aug 14, 2014 at 11:54 AM, Christian Höner zu Siederdissen <choener@tbi.univie.ac.at> wrote:
> > go xs = unsafePerformIO $ do
> >   forM_ xs $ cfun
> >   return $ somethingUnhealthy
> I wonder if this is your real problem. `unsafePerformIO` does some extra locking; the FFI specifies a function `unsafeLocalState`, which in GHC is `unsafeDupablePerformIO` which skips the extra locking.

Thanks,

I've played around some more, and finally more than one capability is active. And indeed, unsafe calls don't block everything. I /had/ actually read that, but when I saw the system spending basically only 100% CPU time, I thought I'd ask.

One problem with this program seems to be that the different tasks are of vastly different sizes. Inputs range from ~ 7x10^1 to ~ 3x10^7 elements, which induces waits on the larger problem sizes.

We'll keep the program single-threaded for now, as this also keeps memory consumption at only 25 GByte instead of the more impressive 70 GByte in multi-threaded mode ;-)

Best regards,
Christian

Have a smart wrapper around your FFI call, and when you think the FFI call will take more than 1 microsecond, ALWAYS use the safe FFI call. I do something like this in an FFI binding I wrote, and it works great.

On Thu, Aug 14, 2014 at 1:20 PM, Christian Höner zu Siederdissen <choener@tbi.univie.ac.at> wrote:
> Thanks,
> I've played around some more and finally more than one capability is active. And indeed, unsafe calls don't block everything. I /had/ actually read that but when I saw the system spending basically only 100% cpu time, I'd thought to ask.
> One problem with this program seems to be that the different tasks are of vastly different sizes. Inputs range from ~ 7x10^1 to ~ 3x10^7 elements inducing waits with the larger problem sizes.
> We'll keep the program single-threaded for now as this also keeps memory consumption at only 25 gbyte instead of the more impressive 70 gbyte in multi-threaded mode ;-)
> Viele Gruesse, Christian

That's actually a great idea, especially since the safe variants of the
calls are already in place.

* Carter Schonwald [14.08.2014 23:10]:
> have a smart wrapper around your ffi call, and when you think the ffi call will take more than 1 microsecond, ALWAYS use the safe ffi call, i do something like this in an FFI i wrote, it works great

Glad I could help.

https://github.com/wellposed/hblas/blob/master/src/Numerical/HBLAS/BLAS/Inte... is an example of the "choose between the safe and the unsafe FFI call" trick.

In the case of BLAS / LAPACK routines, I can always estimate how long a compute job will take as a function of its inputs, and I use that estimate to decide which FFI strategy to use (i.e. I use the unsafe FFI on < 10 microsecond computations so that the overhead doesn't dominate the compute time on tiny inputs).

On Thu, Aug 14, 2014 at 5:38 PM, Christian Höner zu Siederdissen <choener@tbi.univie.ac.at> wrote:
> That's actually a great idea, especially since the safe variants of the calls are already in place.
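
A rough sketch of the "choose the convention from a cost estimate" idea described in this message. This is not the hblas code. It imports libc's memset twice, once per convention, and picks one based on the buffer size. The names chooseConvention and fillBytes and the 64 KiB cutoff are assumptions made up for illustration; a real cutoff would have to come from measurement.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Data.Word (Word8)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr)

-- The same C function imported under both calling conventions.
foreign import ccall unsafe "string.h memset"
  memsetUnsafe :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)
foreign import ccall safe "string.h memset"
  memsetSafe :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)

data Convention = UseUnsafe | UseSafe deriving (Eq, Show)

-- Hypothetical cutoff in bytes. Assumed, not measured.
safeThreshold :: Int
safeThreshold = 64 * 1024

-- Cost estimate as a function of the input: here simply the byte count.
chooseConvention :: Int -> Convention
chooseConvention n
  | n < safeThreshold = UseUnsafe
  | otherwise = UseSafe

-- Smart wrapper: tiny jobs take the cheap unsafe path, large jobs take
-- the safe path so they do not tie up a capability for long.
fillBytes :: Ptr Word8 -> Word8 -> Int -> IO ()
fillBytes p v n = do
  _ <- case chooseConvention n of
    UseUnsafe -> memsetUnsafe p (fromIntegral v) (fromIntegral n)
    UseSafe -> memsetSafe p (fromIntegral v) (fromIntegral n)
  return ()

bufSize :: Int
bufSize = 1024 * 1024

main :: IO ()
main = allocaBytes bufSize $ \p -> do
  fillBytes p 9 bufSize  -- large job: goes through the safe import
  fillBytes p 7 4        -- tiny job: goes through the unsafe import
  firstBytes <- peekArray 6 p
  print firstBytes       -- prints [7,7,7,7,9,9]
```

The buffer comes from allocaBytes, so it is not moved by the garbage collector while the safe call is running.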
participants (6)
- Brandon Allbery
- Carter Schonwald
- Christian Höner zu Siederdissen
- Donn Cave
- Edward Z. Yang
- Johan Tibell