interaction between ghci and cudaBLAS library

In my efforts to integrate this library into Haskell (I am working on OS X 10.5.6 with ghc-6.10.1 and CUDA 2.0) I am getting a bad interaction between the threads in ghci: when I call the library init function via the FFI, ghci blocks in __semwait_signal. Of course, if I build an executable, which I guess has only one thread, then all is well.
AFAIK the CUDA library is re-entrant (or threadsafe) in that it will initialize a device context for each thread that calls it. I guess that might be part of the problem. Certainly there is a call to _pthread_getspecific in that library.
This is a show stopper for me, as I want to use GHCi to call BLAS routines on the device (this lends itself very nicely to a monadic approach - leaving the matrices on the device until we are done applying sequential computations - and only then bringing them back into the "real" world).
Since there is this association between the device context and the calling thread, is there a way to get a handle on the threading in ghci (or just have a single thread)?
But why are we blocking? I would have expected completion. Or is ghci smart enough to prevent any non-deterministic behaviour that the current setup would entail?
Any ideas or suggestions on how to proceed with this? The final work, should it be successful, will be offered to the community as a basis for doing high performance linear algebra on CUDA devices, as well as getting my Haskell up to speed as a side effect :)
--
View this message in context: http://www.nabble.com/interaction-between-ghci-and-cudaBLAS-library-tp223008...
Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

seb:
In my efforts to integrate this library into Haskell (I am working on OS X 10.5.6 with ghc-6.10.1 and CUDA 2.0) I am getting a bad interaction between the threads in ghci - when I call the library init function via the FFI, ghci will block in __semwait_signal. Of course if I build an executable which I guess has only one thread then all is well.
AFAIK the CUDA library is re-entrant (or threadsafe) in that it will initialize a device context for each thread that calls it. I guess that might be part of the problem. Certainly there is a call to _pthread_getspecific in that library.
This is a show stopper for me as I want to use GHCI to call blas routines on the device (this lends itself very nicely to a monadic approach - leaving the matrices on the device until we are done applying sequential computations - and only then bringing them back into the "real" world).
Since there is this association between the device context and the calling thread - is there a way to get a handle on the threading in ghci? (or just have a single thread)
But why are we blocking? I would have expected completion or is ghci smart enough to prevent any non-deterministic behaviour that the current setup would entail?
Any ideas or suggestions of how to proceed with this? The final work should it be successful will be offered to the community as a basis for doing high performance linear algebra on CUDA devices as well as get my haskell up to speed as a side effect :)
Do you get the same problem in compiled code? (GHCi is generally for exploratory work only). E.g.
ghc -O2 --make ....
or
ghc -O2 --make -threaded ....
-- Don

Don Stewart-2 wrote:
Do you get the same problem in compiled code? (GHCi is generally for exploratory work only).
If I create an executable and run it non-interactively, it works fine:
$ ghc -O2 --make -threaded main.hs cublas.hs -lcublas -L${CUDA}/lib
Whether the code is compiled or interpreted, it blocks in ghci (interactively); the threading option makes no difference in either case.

seb:
Don Stewart-2 wrote:
Do you get the same problem in compiled code? (GHCi is generally for exploratory work only).
If I create an executable and run it non-interactively, it works fine:
$ ghc -O2 --make -threaded main.hs cublas.hs -lcublas -L${CUDA}/lib
Whether the code is compiled or interpreted, it blocks in ghci (interactively); the threading option makes no difference in either case.
GHCi doesn't use the threaded runtime though. So given that:
"To allow foreign calls to be made without blocking all the Haskell threads (with GHC), it is only necessary to use the -threaded option when linking your program, and to make sure the foreign import is not marked unsafe."
So I think this is expected?
-- Don
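
The distinction in the quoted passage can be tried out with a small stand-alone program. This is only a sketch under stated assumptions: POSIX sleep(3) stands in for the slow library init call (it is not part of CUDA), c_sleep is a name chosen for illustration, and the platform is assumed to be POSIX. A background Haskell thread counts ticks while the main thread sits inside the foreign call.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)
import Data.IORef (modifyIORef', newIORef, readIORef)
import Foreign.C.Types (CUInt (..))

-- Assumption: sleep(3) is only a stand-in for a slow foreign call.
-- The import is marked "safe", so a -threaded runtime can keep running
-- other Haskell threads while this call is in progress.
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
  ticks <- newIORef (0 :: Int)
  -- Background Haskell thread: bump the counter roughly every 100 ms.
  _ <- forkIO $ forever $ do
    threadDelay 100000
    modifyIORef' ticks (+ 1)
  -- The main thread enters the foreign call for about one second.
  _ <- c_sleep 1
  n <- readIORef ticks
  putStrLn ("ticks counted during the foreign call: " ++ show n)
```

Building it once with -threaded and once without, and then marking the import unsafe, should show whether the counter keeps advancing during the call. It says nothing about why the CUDA init call in particular blocks inside ghci.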

Hello Don, Tuesday, March 3, 2009, 5:22:46 AM, you wrote:
GHCi doesn't use the threaded runtime though. So given that:
Don, afair ghci has been compiled with the threaded runtime since 6.6:
Prelude> :m Control.Concurrent
Prelude Control.Concurrent> rtsSupportsBoundThreads
True
--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com
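
For comparison with the GHCi session above, the same check can be made from a compiled program. A minimal sketch: both functions are from Control.Concurrent, and the printed values depend on whether the executable was linked with -threaded.

```haskell
module Main (main) where

import Control.Concurrent (isCurrentThreadBound, rtsSupportsBoundThreads)

main :: IO ()
main = do
  -- Same value that was queried at the GHCi prompt, but for this executable.
  putStrLn ("rtsSupportsBoundThreads: " ++ show rtsSupportsBoundThreads)
  -- Reports whether the calling (main) thread is a bound thread.
  bound <- isCurrentThreadBound
  putStrLn ("main thread is bound: " ++ show bound)
```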

Bulat Ziganshin-2 wrote:
Hello Don,
Tuesday, March 3, 2009, 5:22:46 AM, you wrote:
GHCi doesn't use the threaded runtime though. So given that:
Don, afair ghci has been compiled with the threaded runtime since 6.6:
Prelude> :m Control.Concurrent
Prelude Control.Concurrent> rtsSupportsBoundThreads
True
That's what I thought - the problem seems to be with the interaction between ghci threads (there are four OS threads running on my machine). I read the conc-ffi paper by Simon Marlow, Simon Peyton-Jones and Wolfgang Thaller, and whilst that explained the situation regarding bound threads (and indeed how to ensure running a bunch of actions within the same thread context - which gave me some hope), I am still no wiser as to what is going on here and how it can be fixed, if at all.
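
One way to read "running a bunch of actions within the same thread context" is a single dedicated bound thread that executes every submitted action in turn. The following is a sketch of that idea, not something taken from the conc-ffi paper or from this thread: Worker, startWorker and onWorker are hypothetical names, and plain IO actions stand in for the library calls. It falls back to forkIO when the runtime has no bound-thread support.

```haskell
module Main (main) where

import Control.Concurrent
  (forkIO, forkOS, isCurrentThreadBound, myThreadId, rtsSupportsBoundThreads)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)
import Control.Monad (forever, join)

-- Hypothetical helper: a queue of jobs served by one dedicated thread.
newtype Worker = Worker (Chan (IO ()))

-- Start the worker. forkOS gives a bound thread (one fixed OS thread)
-- when the runtime supports it; otherwise fall back to forkIO.
startWorker :: IO Worker
startWorker = do
  jobs <- newChan
  let fork = if rtsSupportsBoundThreads then forkOS else forkIO
  _ <- fork (forever (join (readChan jobs)))
  return (Worker jobs)

-- Run an action on the worker thread and wait for its result.
-- Exceptions raised by the action are rethrown in the caller.
onWorker :: Worker -> IO a -> IO a
onWorker (Worker jobs) action = do
  result <- newEmptyMVar
  writeChan jobs (try action >>= putMVar result)
  r <- takeMVar result
  case r of
    Left e  -> throwIO (e :: SomeException)
    Right x -> return x

main :: IO ()
main = do
  w <- startWorker
  -- Stand-ins for an init call followed by a later library call.
  t1 <- onWorker w myThreadId
  t2 <- onWorker w myThreadId
  bound <- onWorker w isCurrentThreadBound
  putStrLn ("both jobs ran on the same Haskell thread: " ++ show (t1 == t2))
  putStrLn ("worker thread is bound: " ++ show bound)
```

With a thread-affine library, the init call and every later call would go through onWorker so that they all see the same thread-local state. Whether this avoids the blocking seen in ghci is not established here.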

Simon Beaumont wrote:
Bulat Ziganshin-2 wrote:
Hello Don,
Tuesday, March 3, 2009, 5:22:46 AM, you wrote:
GHCi doesn't use the threaded runtime though. So given that:
Don, afair ghci has been compiled with the threaded runtime since 6.6:
Prelude> :m Control.Concurrent
Prelude Control.Concurrent> rtsSupportsBoundThreads
True
That's what I thought - the problem seems to be with the interaction between ghci threads (there are four OS threads running on my machine). I read the conc-ffi paper by Simon Marlow, Simon Peyton-Jones and Wolfgang Thaller, and whilst that explained the situation regarding bound threads (and indeed how to ensure running a bunch of actions within the same thread context - which gave me some hope), I am still no wiser as to what is going on here and how it can be fixed, if at all.
BTW if I wrap a forkOS around this I get control back in ghci, which confirms the threadedness of that at least, but the FFI call still blocks. I'm going to take a closer look inside with gdb again; this feels like some kind of deadlock. Given that this kind of interaction was anticipated in the conc-ffi paper, we are not outside the design envelope of ghc(i), so there must be a way to solve this.
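
The forkOS experiment can be wrapped so that the prompt comes back with an explicit answer instead of hanging. A minimal sketch, assuming a threaded runtime and a call imported as safe; callWithTimeout is a hypothetical helper, and threadDelay stands in for the call that never returns.

```haskell
module Main (main) where

import Control.Concurrent (forkIO, forkOS, rtsSupportsBoundThreads, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import System.Timeout (timeout)

-- Hypothetical helper: run an action in its own thread (bound, if the
-- runtime supports it) and wait at most `micros` microseconds for it.
-- Nothing means the action had not finished in time.
callWithTimeout :: Int -> IO a -> IO (Maybe a)
callWithTimeout micros action = do
  result <- newEmptyMVar
  let fork = if rtsSupportsBoundThreads then forkOS else forkIO
  _ <- fork (action >>= putMVar result)
  timeout micros (takeMVar result)

main :: IO ()
main = do
  -- Stand-in for a call that completes promptly.
  quick <- callWithTimeout 1000000 (return "initialised")
  print quick
  -- Stand-in for a call that blocks for longer than we are willing to wait.
  slow <- callWithTimeout 200000 (threadDelay 2000000 >> return "late")
  print slow
```

This only reports that the call is stuck; the forked thread stays blocked, so finding the deadlock itself still needs gdb or similar.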
participants (3)
- Bulat Ziganshin
- Don Stewart
- Simon Beaumont