
On Wed, 2008-09-17 at 13:44 -0700, Evan Laforge wrote:
systems that don't use an existing user-space thread library (such as Concurrent Haskell or libthread [1]) emulate user-space threads by keeping a pool of processes and re-using them (e.g., IIUC Apache does this).
Your response seems to be yet another argument that processes are too expensive to be used the same way as threads. In my mind, pooling vs. new-creation is only relevant to process vs. thread in the performance aspects. The fact that people use thread pools means that they think that even thread creation is too expensive. The central aspect, in my mind, is default share-everything vs. default share-nothing. Share-nothing is much easier to reason about and encourages writing systems that have less shared-memory contention.
This is similar to the Plan 9 conception of processes. You have a generic rfork() call that takes flags that say what to share with your parent: namespace, environment, heap, etc. Thus the only difference between a thread and a process is the set of flags passed to rfork().
As I mentioned, Plan 9 also has a user-space thread library, similar to Concurrent Haskell.
Under the covers, I believe Linux is similar, with its clone() call.
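To make the share-everything vs. share-nothing defaults concrete, here is a minimal sketch in Python on a Unix-like system. It uses neither rfork() nor clone() directly; threading.Thread and os.fork() stand in for the two ends of the flag spectrum, and the helper names (run_in_thread, run_in_child_process, demo) are made up for illustration.

```python
import os
import threading


def run_in_thread(items):
    """Append to `items` from a second thread in the same address space."""
    t = threading.Thread(target=items.append, args=("thread",))
    t.start()
    t.join()


def run_in_child_process(items):
    """Append to `items` in a forked child, which works on its own copy."""
    pid = os.fork()
    if pid == 0:
        # Child: this mutation touches only the child's copy of the heap.
        items.append("child")
        os._exit(0)
    # Parent: wait for the child so it does not linger as a zombie.
    os.waitpid(pid, 0)


def demo():
    """Return the parent's view of the list after each kind of mutation."""
    shared = []
    run_in_thread(shared)
    after_thread = list(shared)
    run_in_child_process(shared)
    after_fork = list(shared)
    return after_thread, after_fork


if __name__ == "__main__":
    print(demo())
```

The thread's append is visible to the caller, while the forked child's append is not: the heap is shared by default in one case and private by default in the other.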
The fast context switching part seems orthogonal to me. Why is it that getting the OS involved for context switches kills the performance?
Read about CPU architecture.
Is it that the ghc RTS can switch faster because it knows more about the code it's running (i.e. the OS obviously couldn't switch on memory allocations like that)? Or is jumping up to kernel space somehow expensive by nature?
Yes. Kernel code is very different on the bare metal from userspace code; RTS code of course is not at all different. Switching processes in the kernel requires an interrupt or a system call. Both of those require the processor to dump the running process's state so it can be restored later (userspace thread-switching does the same thing, but it doesn't dump as much state because it doesn't need to be as conservative about what it saves).
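As a rough illustration of why a user-space switch can be cheap, here is a toy round-robin scheduler in Python. It is only a sketch under loud assumptions: generators stand in for user-space threads, each yield stands in for a switch point, and the names worker and run_round_robin are invented for the example. This is not how the GHC RTS is implemented. The point is that a switch here saves nothing but the suspended generator's own frame, and no interrupt or system call is involved.

```python
from collections import deque


def worker(name, steps):
    """A toy 'thread': each yield is a voluntary switch back to the scheduler."""
    for i in range(steps):
        yield (name, i)


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all finish; return the trace."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            # Resume the task until its next yield (its next switch point).
            trace.append(next(task))
        except StopIteration:
            # Task finished: drop it from the ready queue.
            continue
        ready.append(task)
    return trace


if __name__ == "__main__":
    for step in run_round_robin([worker("a", 2), worker("b", 3)]):
        print(step)
```

The scheduler is an ordinary function call away from the tasks it runs, which is the sense in which RTS code is "not at all different" from the code it schedules.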
And why does the OS need so many more K to keep track of a thread than the RTS?
An OS thread (Linux/Plan 9) stores:

  * Stack (definitely a stack pointer and stored registers (> 40 bytes on i686), and includes a special set of page tables on Plan 9)
  * FD set (even if it's the same as the parent thread, you need to keep a pointer to it)
  * uid/euid/gid/egid (Plan 9 I think omits euid and egid)
  * Namespace (Plan 9 only; again, you need at least a pointer even if it's the same as the parent process)
  * Priority
  * Possibly other things I can't think of right now

A Concurrent Haskell thread stores:

  * Stack
  * Allocation area (4KB)

The kernel offers more to a process (and offers a wider separation between processes) than Concurrent Haskell offers to a thread.

jcc