
It should be noted that synchronisation is achieved by using slightly different kinds of primitives. But still... six times...
And it's about to get faster still, because CVars can now be implemented with a single MVar instead of two. The reason is that putMVar now blocks on a full MVar rather than raising an exception.

But as Simon said, the main reason is surely that GHC's threads are lightweight compared to the ones the C program uses.

BTW, was this on Linux? I'd be interested to see the results on systems that have different threading models, because Linux's threads implementation maps each thread onto a process (albeit a lightweight kind of process, but still a process), so the context-switch overhead is going to be much higher than with a threads library which sits in a single process.

Cheers,
Simon
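
P.S. To illustrate the single-MVar point, here is a minimal sketch, assuming the blocking putMVar semantics described above. It is not the actual library code: the names CVar, newCVar, putCVar and getCVar are chosen for illustration, and demo is a hypothetical helper that only exists to exercise them.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM)

-- Illustrative only: a one-place channel variable backed by a single MVar.
newtype CVar a = CVar (MVar a)

newCVar :: IO (CVar a)
newCVar = CVar <$> newEmptyMVar

-- putMVar blocks while the MVar is full, so the writer waits until the
-- previous value has been consumed. No second "acknowledge" MVar is needed.
putCVar :: CVar a -> a -> IO ()
putCVar (CVar m) x = putMVar m x

-- takeMVar blocks while the MVar is empty, then empties it, which in turn
-- wakes a writer blocked in putCVar.
getCVar :: CVar a -> IO a
getCVar (CVar m) = takeMVar m

-- Hypothetical helper: one producer thread writes 1..n, the calling thread
-- reads n values back in order.
demo :: Int -> IO [Int]
demo n = do
  cv <- newCVar
  _ <- forkIO (mapM_ (putCVar cv) [1 .. n])
  replicateM n (getCVar cv)
```

With a single producer and a single consumer, each putCVar has to wait for the matching getCVar, so the values come back in the order they were written.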