
I'm slowly losing track of this discussion...
so am I :-(
My initial suggestion was that it is guaranteed that the same OS thread which created the f.i.w. thunk is used to call back to Haskell for *this* wrapped function. There is no overhead for calls Haskell->C, only for the comparatively rare case of C->Haskell. What's wrong with this?
There are two problems with this approach, I think. The situation is like this (correct me if I'm wrong):

  - Haskell thread H1 running on OS thread O1 registers a callback C.

  - Haskell thread H1/O1 makes a blocking call into HOpenGL. This call is made also in O1. The RTS allocates another OS worker thread O2, and continues running Haskell threads.

  - HOpenGL, running in O1, invokes the callback C. The RTS stops O1, creates a new Haskell thread H2 in which to run C, and eventually runs H2 in O2.

Problem #1 is the call-in: our current implementation *always* runs the callback in a different OS thread from the calling thread. It was simpler this way, but perhaps this can change.

Problem #2 is that we would have to add some extra machinery to guarantee that a given Haskell thread executes in a particular OS thread, and somehow do it in a way that was "fair" (i.e. the Haskell thread with a preference for its OS thread doesn't get starved, and doesn't starve other threads). The RTS currently doesn't make any distinction between OS threads; a given Haskell thread can even migrate from one OS thread to another during its execution.

Cheers,
Simon
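
For reference, a minimal sketch of the call-in scenario described above. It is not HOpenGL code: the names mkCallback and callCallback are made up for illustration, and the "dynamic" import merely stands in for a C library that invokes a registered callback. It prints the Haskell thread id seen by the caller (H1) and the one seen inside the callback (H2), so the two can be compared. Which OS thread each one runs on is up to the RTS and is not shown here.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (myThreadId)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- Hypothetical callback type, chosen only for this illustration.
type Callback = IO ()

-- Wrap a Haskell action as a C function pointer (the "f.i.w. thunk").
foreign import ccall "wrapper"
  mkCallback :: Callback -> IO (FunPtr Callback)

-- Call through a C function pointer. This stands in for the foreign
-- library invoking the callback. The call must be "safe", because the
-- callee re-enters Haskell.
foreign import ccall safe "dynamic"
  callCallback :: FunPtr Callback -> IO ()

main :: IO ()
main = do
  h1 <- myThreadId
  putStrLn ("caller (H1):   " ++ show h1)
  cb <- mkCallback $ do
    -- Runs as a call-in from C back into Haskell.
    h2 <- myThreadId
    putStrLn ("callback (H2): " ++ show h2)
  callCallback cb
  -- Release the stub allocated by the wrapper import.
  freeHaskellFunPtr cb
```

If the two ids differ, the callback ran in a fresh Haskell thread, as in the third step above. Building with and without -threaded is a quick way to see whether the choice of RTS changes anything.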