This discussion is getting rather long, so I thought I'd summarise (as much for my benefit as everyone else's). Please let me know if I get anything wrong.

It turns out that some C libraries designed to be used from multi-threaded programs make use of thread-local state. This is at odds with GHC's new extension to support using OS threads to multiplex calls to blocking foreign functions - this is the extension we call the "threaded RTS", which is off by default but turned on if you configure GHC with --enable-threaded-rts.

The threaded RTS is important if you want to call foreign functions that might block - without it, a blocking foreign call would block all the other Haskell threads until the call returns.

The problems arise because GHC's threaded RTS doesn't make any distinction between OS threads; as far as it is concerned any OS thread is as good as any other. We hadn't considered the use of thread-local state by external C libraries when we designed this (obviously :-{).

Ok, so what can we do?

1. Swap the thread-local state in
=================================

Wolfgang's proposed fix is to allow the right thread-local state to be swapped in at the right moment, just before running a Haskell thread.

I don't think this will work in general, because part of the thread-local state is the thread ID of the OS thread itself, which can't be swapped in. Also, Sven pointed out that swapping in the context in the GLUT case can have other drastic performance implications.

2. Every Haskell thread has its own OS thread
=============================================

Some other folk proposed moving to a 1-1 correspondence between Haskell threads and OS threads.

I think this is a poor solution simply because of the overhead - Haskell threads are very lightweight (1000s of threads is entirely reasonable), but OS threads tend to be much heavier. For example, I'm sure this would kill the performance of the Haskell web server.

3. Some Haskell threads have their own OS thread
================================================

Another solution is to fix a 1-1 correspondence between Haskell threads and OS threads for some Haskell threads only, perhaps selected by a different version of forkIO.

We think this is implementable and has zero overhead if you don't use it, but it does require that the user of the external binding remembers to use the right flavour of forkIO. Callbacks have to create a new Haskell thread which is bound to the current OS thread.

Alastair points out that it might be significant which Haskell thread runs a particular finalizer.

4. Thread groups
================

Claus's suggestion is similar, but gives the Haskell programmer more control over the mapping between OS threads and Haskell threads. I must admit I'd been wondering about something similar myself.

He suggests that every Haskell thread is bound to a specific OS thread, but that more than one Haskell thread can map to the same OS thread (a thread group).

This is slightly less convenient for the Haskell programmer - one has to be careful to fork a new thread group to avoid being blocked by a foreign call.

------------------

We can afford to discuss this a while longer, because Simon & I are currently focussed on the next release (I don't want to hold up 5.04 for a fix, and it wouldn't be a disaster if we had to go straight to 5.06 in a couple of months or so).

Personally I can't decide whether (3) or (4) is the better solution. I'm pretty sure (1) and (2) aren't viable, though.

Cheers,
Simon
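
P.S. For readers who want something concrete for option (3), here is a minimal sketch of what "a different version of forkIO" could look like from the programmer's side. It is an illustration under assumptions, not the design anyone in this thread has settled on: it uses forkOS, isCurrentThreadBound and rtsSupportsBoundThreads from Control.Concurrent as found in later GHC releases, none of which exist in the 5.04 discussed above. The helper boundnessOf is a hypothetical name introduced only for this example.

```haskell
import Control.Concurrent
  ( ThreadId, forkIO, forkOS, isCurrentThreadBound
  , newEmptyMVar, putMVar, takeMVar, rtsSupportsBoundThreads )

-- Hypothetical helper: fork a thread with the given fork function and
-- report whether the new Haskell thread is tied to one OS thread.
boundnessOf :: (IO () -> IO ThreadId) -> IO Bool
boundnessOf fork = do
  result <- newEmptyMVar
  _ <- fork (isCurrentThreadBound >>= putMVar result)
  takeMVar result

main :: IO ()
main = do
  -- Ordinary lightweight thread: the RTS may run it on any OS thread,
  -- so foreign code relying on thread-local state is not safe here.
  plain <- boundnessOf forkIO
  putStrLn ("forkIO thread bound to an OS thread: " ++ show plain)

  -- forkOS needs an RTS with bound-thread support, so check first.
  if rtsSupportsBoundThreads
    then do
      -- Bound thread: its foreign calls are made from one fixed OS
      -- thread, which is what a thread-local C library expects.
      bound <- boundnessOf forkOS
      putStrLn ("forkOS thread bound to an OS thread: " ++ show bound)
    else putStrLn "This RTS has no bound-thread support (link with -threaded)."
```

The point of the sketch is the cost model described under (3): code that never asks for a bound thread keeps using cheap forkIO threads, and only the threads that talk to a thread-local C library pay for a dedicated OS thread.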