
seb:
In my efforts to integrate this library into Haskell (I am working on OS X 10.5.6 with ghc-6.10.1 and CUDA 2.0), I am getting a bad interaction with the threads in ghci: when I call the library's init function via the FFI, ghci blocks in __semwait_signal. If I build an executable instead (which, I assume, runs on a single thread), all is well.
AFAIK the CUDA library is re-entrant (thread-safe) in the sense that it initializes a separate device context for each thread that calls it, and I suspect that is part of the problem. The library certainly makes calls to _pthread_getspecific.
This is a show-stopper for me, as I want to use GHCi to call BLAS routines on the device. This lends itself very nicely to a monadic approach: leave the matrices on the device while applying a sequence of computations, and only bring them back into the "real" world when we are done.
Since the device context is tied to the calling thread, is there a way to control which thread ghci uses for FFI calls (or to force it to use a single thread)?
And why does it block at all? I would have expected the call to complete. Or is ghci smart enough to prevent the non-deterministic behaviour that the current setup would entail?
Any ideas or suggestions on how to proceed? If it succeeds, the final work will be offered to the community as a basis for high-performance linear algebra on CUDA devices, and as a side effect it will get my Haskell up to speed :)
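One common way to get the "single thread" behaviour asked about above is to funnel every library call through one dedicated worker thread, so the per-thread device context is always created and used on the same OS thread. The sketch below uses forkOS when the runtime supports bound threads (a bound thread keeps a fixed OS thread for its foreign calls) and falls back to forkIO otherwise, since the non-threaded RTS runs all Haskell code on a single OS thread anyway. DeviceWorker, startDeviceWorker and onDevice are hypothetical names, and the init action is a placeholder for your own FFI import of the library's init function. Whether this cures the hang in ghci 6.10.1 depends on how that ghci's runtime is built, so treat it as something to try rather than a confirmed fix.

```haskell
import Control.Concurrent
  (forkIO, forkOS, isCurrentThreadBound, myThreadId, rtsSupportsBoundThreads)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)
import Control.Monad (forever, join)

-- Hypothetical handle: a queue of jobs consumed by one worker thread.
newtype DeviceWorker = DeviceWorker (Chan (IO ()))

-- Start the worker and run the (placeholder) init action on it, so any
-- thread-local state the library creates belongs to the worker's OS thread.
startDeviceWorker :: IO () -> IO DeviceWorker
startDeviceWorker initDevice = do
  jobs  <- newChan
  ready <- newEmptyMVar
  let spawn = if rtsSupportsBoundThreads then forkOS else forkIO
      worker = do
        r <- try initDevice :: IO (Either SomeException ())
        putMVar ready r                      -- report init success or failure
        case r of
          Left _   -> return ()              -- init failed: stop the worker
          Right () -> forever (join (readChan jobs))  -- run jobs in order
  _ <- spawn worker
  r <- takeMVar ready
  case r of
    Left e   -> throwIO e                    -- re-throw init failure to caller
    Right () -> return (DeviceWorker jobs)

-- Run an action on the worker thread and wait for its result; exceptions
-- raised by the action are re-thrown in the calling thread.
onDevice :: DeviceWorker -> IO a -> IO a
onDevice (DeviceWorker jobs) action = do
  result <- newEmptyMVar
  writeChan jobs (try action >>= putMVar result)
  r <- takeMVar result
  either (\e -> throwIO (e :: SomeException)) return r

main :: IO ()
main = do
  w  <- startDeviceWorker (putStrLn "placeholder: library init would run here")
  t1 <- onDevice w myThreadId
  t2 <- onDevice w myThreadId
  print (t1 == t2)                           -- both jobs ran on the worker
  onDevice w isCurrentThreadBound >>= print  -- True only with a threaded RTS
```

In ghci the intended usage would be along the lines of `w <- startDeviceWorker myInit` followed by `onDevice w (...)` for each BLAS call, where `myInit` is your own (hypothetical) foreign import. Keeping the queue explicit also fits the monadic design described above, since a device monad could be run by a single `onDevice` call.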
Do you get the same problem in compiled code? (GHCi is generally for exploratory work only.) E.g. ghc -O2 --make .... or ghc -O2 --make -threaded ....
-- Don
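When comparing the two builds suggested above, it helps to confirm which runtime each one actually uses. The small program below is a sketch that relies only on rtsSupportsBoundThreads and isCurrentThreadBound from Control.Concurrent; rtsReport is a hypothetical helper name. Building it once without and once with -threaded should show the difference, and evaluating rtsSupportsBoundThreads at the ghci prompt tells you which runtime ghci itself is linked against.

```haskell
import Control.Concurrent (isCurrentThreadBound, rtsSupportsBoundThreads)

-- Describe the runtime this program is running on.
rtsReport :: IO String
rtsReport = do
  bound <- isCurrentThreadBound   -- is main tied to a fixed OS thread?
  return $ unlines
    [ "threaded RTS:      " ++ show rtsSupportsBoundThreads
    , "main thread bound: " ++ show bound
    ]

main :: IO ()
main = rtsReport >>= putStr
```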