
Yes, these are all problems. However, there is a nice abstraction of the OS thread API in GHC's RTS, thanks to Sigbjorn. So I'm sure this API could be extended to include some thread-local state operations.
A further piece of what one might call thread-local state is 'recursive locks' like those found in Java. With normal locks, if a thread executes this: take(lock); take(lock); ... release(lock); release(lock); then the second call to take will block because the lock is already held, even though the holder is the calling thread itself. With recursive locks, the implementation of take records who has the lock and just increments a counter if the same thread takes the lock again. Likewise, release decrements the counter and only releases the lock when the counter reaches 0. And arbitrary user code and libraries are free to implement all kinds of code that depends on the current thread id. I'll bet you're going to see a lot of cases like this.
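To make the dependence on thread identity concrete, here is a minimal sketch of such a recursive lock in C on top of POSIX threads. The names rlock, rlock_init, rlock_take and rlock_release are hypothetical, chosen only for illustration; this is not how Java or any particular library implements it, just one plausible shape.

```c
#include <pthread.h>

/* Hypothetical recursive lock; every name here is illustrative only. */
typedef struct {
    pthread_mutex_t guard;  /* protects owner and count */
    pthread_cond_t  freed;  /* signalled when count drops to 0 */
    pthread_t       owner;  /* meaningful only while count > 0 */
    unsigned        count;  /* times the owner has taken the lock */
} rlock;

void rlock_init(rlock *l) {
    pthread_mutex_init(&l->guard, NULL);
    pthread_cond_init(&l->freed, NULL);
    l->count = 0;
}

void rlock_take(rlock *l) {
    pthread_t self = pthread_self();
    pthread_mutex_lock(&l->guard);
    if (l->count > 0 && pthread_equal(l->owner, self)) {
        /* Same OS thread again: just bump the counter. */
        l->count++;
    } else {
        /* Held by another thread (or free): wait until it is free. */
        while (l->count > 0)
            pthread_cond_wait(&l->freed, &l->guard);
        l->owner = self;
        l->count = 1;
    }
    pthread_mutex_unlock(&l->guard);
}

/* Returns 0 on success, -1 if the caller does not hold the lock. */
int rlock_release(rlock *l) {
    int rc = -1;
    pthread_mutex_lock(&l->guard);
    if (l->count > 0 && pthread_equal(l->owner, pthread_self())) {
        if (--l->count == 0)
            pthread_cond_signal(&l->freed);  /* wake one waiter */
        rc = 0;
    }
    pthread_mutex_unlock(&l->guard);
    return rc;
}
```

With this, take; take; release; release runs to completion on a single OS thread. The whole scheme hinges on pthread_self(): if the second take arrived on a different OS thread than the first (which is what could happen if the RTS chose the thread for each call out), it would block even though, from the Haskell programmer's point of view, the same thread made both calls.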
The only viable solution I can see is to provide a way to capture the calling thread (as a first class entity) when you call into Haskell and to explicitly specify which thread to use when you call out from Haskell. (Hmmm, sounds like callcc for C :-))
The trouble is, that is *way* too much overhead for a C call. HOpenGL does lots of these, and I strongly suspect that adding a full OS-thread context switch (well two, including the return) for each one would be a killer.
So don't do it for every foreign function call - only do it for the ones that request it. Here's the implementation I imagine:

For foreign imports and exports that have not requested any special thread behaviour, do exactly what GHC currently does. Overhead == 0.

For foreign exports that have requested thread capture, the call goes like this:

1) Get a thread from GHC's thread pool.
2) Allocate a first-class C-thread object on the Haskell heap and fill in the details for this thread.
3) Add the normal foreign export function arguments plus a pointer to the C-thread object to the GHC thread and make it runnable.
4) Block this thread so that it is ready to use later.
5) When the C function returns, do so in the first-class C-thread object.

For foreign imports that have requested explicit thread choice, the call goes like this:

1) Get the C-thread object (it's an argument to the Haskell function so this is easy) and perform suitable sanity checks (i.e., not already in use).
2) Marshal the arguments to the C function into the C-thread.
3) Unblock the C thread and block the Haskell thread waiting for the response.
4) When the C function returns, context switch back to a GHC thread.

Overhead for foreign export is higher. Overhead for foreign import is not much different from existing safe foreign imports. Overhead only occurs if you request this feature.

--
Alastair Reid  reid@cs.utah.edu  http://www.cs.utah.edu/~reid/