
Dean Herrington wrote:
[...] Rather, I find it nonintuitive that calling from Haskell to foreign code and back into Haskell should create a new Haskell thread, when these two Haskell threads really are just different portions of a single "thread of computation" (deliberately vague term).
I agree with that. Creating a new thread for calling back into Haskell _only_ makes sense if you look at it from inside the GHC RTS. Before I had a look at the relevant parts of the RTS, I would never have thought of that. I don't know if there's any advantage/disadvantage to changing GHC's internals. The only _observable_ difference is the threads' ThreadIds, and this should at least be clearly documented (or, even better, it should be "explicitly undocumented", so that no one will be surprised if the behaviour is changed in the future).
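To make the ThreadId difference concrete, here is a minimal sketch that calls from Haskell into foreign code and straight back into Haskell, recording myThreadId on both sides. It assumes GHC and uses only the standard "wrapper" and "dynamic" imports; the names mkCallback, callCallback and threadIdsAcrossCallback are made up for this illustration.

```haskell
import Control.Concurrent (ThreadId, myThreadId)
import Data.IORef (newIORef, readIORef, writeIORef)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- Turn a Haskell action into a C function pointer (a callback).
foreign import ccall "wrapper"
  mkCallback :: IO () -> IO (FunPtr (IO ()))

-- Call a C function pointer from Haskell. The call is "safe" by default,
-- which is what permits the callee to call back into Haskell.
foreign import ccall "dynamic"
  callCallback :: FunPtr (IO ()) -> IO ()

-- | Hypothetical helper: the ThreadId seen by the caller and the ThreadId
-- seen inside the callback, in that order.
threadIdsAcrossCallback :: IO (ThreadId, ThreadId)
threadIdsAcrossCallback = do
  outer <- myThreadId
  seen  <- newIORef outer
  cb    <- mkCallback (myThreadId >>= writeIORef seen)
  callCallback cb          -- Haskell -> foreign -> Haskell
  freeHaskellFunPtr cb     -- wrappers must be freed explicitly
  inner <- readIORef seen
  return (outer, inner)
```

In GHC, as described above, the two components of the result are expected to differ even though they belong to one "thread of computation"; another implementation would be free to return the same ThreadId twice.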
Off the top of my head I can think of two situations in which having separate threads is bothersome.
3. Throwing exceptions to a thread

If I manually translate Haskell exceptions to foreign exceptions and back, there is no reason why I shouldn't want to raise an exception in a thread I have a ThreadId for, even if that thread called a foreign function which in turn called back to Haskell. I think that the behaviour can always be emulated using MVars, however, so I think there's no immediate action required.

---

I've tried to rephrase my proposal for native threads, this time treating GHC's behaviour in this situation as an implementation detail. I think the meaning of the proposal becomes clearer because of this. The proposal doesn't comment on ThreadIds, so the non-intuitive (IMHO) behaviour in GHC is independent of the "bound threads" proposal.

I think I've understood both my own specification and the current RTS well enough to start trying to implement a prototype soon. The intended meaning of the specification hasn't changed for the third revision in a row. Does anyone have concrete suggestions for the syntax change to foreign export and foreign import "wrapper"?

Cheers,
Wolfgang

=============================
Bound Threads Proposal, version 5

Goals
~~~~~

Since foreign libraries sometimes exploit thread-local state, it is necessary to provide some control over which thread is used to execute foreign code. In particular, it is important that it should be possible for Haskell code to arrange that a sequence of calls to a given library are performed by the same native thread and that if an external library calls into Haskell, then any outgoing calls from Haskell are performed by the same native thread.

This specification is intended to be implementable both by multithreaded Haskell implementations and by single-threaded implementations and so it does not comment on which particular OS thread is used to execute Haskell code.

Definitions
~~~~~~~~~~~

A native thread is a thread as defined by the operating system.
A "Haskell thread" encapsulates the execution of a Haskell I/O action. A Haskell thread is created by forkIO, and dies when the I/O action completes. When a Haskell thread calls a foreign imported function, it is not considered to be blocked (in the GHC runtime system, the calling thread is blocked; this is considered an implementation detail for the purposes of this specification, but be aware that myThreadId might return several different values for one "Haskell thread" as defined here). If the foreign function calls back to Haskell, the callback is said to run in the same Haskell thread.

Design
~~~~~~

Haskell threads may be associated at thread creation time with either zero or one native threads. Each native thread is associated with at most one Haskell thread.

A Haskell thread that is associated with a native thread is called a bound Haskell thread. A native thread that is associated with a Haskell thread is called a bound native thread.

A Haskell thread is always executed by a native thread. This specification places absolutely no restrictions on which native thread is used to execute a particular Haskell thread. The Haskell thread need not be associated with the native thread used to execute it, and one Haskell thread may be executed by more than one native thread during its lifetime [but not by several native threads at once].

A bound native thread may not be used for executing any Haskell thread except the one it is bound to.

It is implementation dependent whether the main thread, threads created using forkIO and threads created for running finalizers or signal handlers are bound or not.

When a foreign imported function is invoked [by Haskell code], the foreign code is executed in the native thread associated with the current Haskell thread, if an association exists. If the current Haskell thread is not associated with a native thread, the implementation may decide which native thread to run the foreign function in.
The native thread that is used may not be bound to another Haskell thread.

The existing distinction between unsafe, safe and threadsafe calls remains unchanged.

There are now two kinds of foreign export and foreign import "wrapper" declarations: bound and free. The FFI syntax should be extended appropriately [which of the two should be the default, if any?]. Bound foreign exported functions should be executed in a Haskell thread bound to the native thread that invoked the foreign exported function. A "free" foreign export may be executed in any kind of Haskell thread.

A new library routine, forkNativeThread :: IO () -> IO ThreadId, should spawn a new Haskell thread (like forkIO) and associate it with a new native thread (forkIO is not guaranteed to do this). It may be implemented using the FFI and an OS-specific thread creation routine. It would just pass a "bound" callback as an entry point for a new OS thread.

Issues
~~~~~~

Finalizers and signal handlers cannot be associated with a particular native thread. If they have to trigger an action in a particular native thread, a message has to be sent manually (via MVars and friends) to the Haskell thread associated with the native thread in question. I think we'll have to live with this. Does anyone have a better idea?

This introduces a change in the syntax for foreign export and foreign import "wrapper" declarations (a bound/free specifier is added). I think we should have a default option here. I'm not sure which, however. Also, the objection that "bound" and "free" can be confused with the lambda calculus terms still holds.

Implementations
~~~~~~~~~~~~~~~

Here are some examples of how the specification might be implemented. They should not be considered an actual part of the specification.

1) Let's assume we have a Haskell system that has used OS native threads from the start. Every call to forkIO creates a new OS thread. The OS is responsible for all scheduling.
Now we want to add support for [my version of] the proposal to this implementation. This should be trivial to do: A foreign call should be just a call, and a callback should just start executing Haskell code in the current OS thread. This implementation would treat all foreign exports as bound ("the implementation may freely choose what kind of Haskell thread the function is executed in"). All "safe" calls will probably be treated as "threadsafe" (after all, it's no use blocking other threads). If it weren't for the performance problems, this would be the ideal solution for me.

2) Let's assume we have a Haskell system that executes all Haskell code in one OS thread and does its own scheduling between the Haskell threads. Now we want to add support for [my version of] the proposal. We do not want to move execution of Haskell code to different OS threads. We are not concerned about performance. In this case, we would keep track of the association between Haskell threads and "foreign" OS threads (here, the term "foreign thread" seems to fit very well). If the Haskell code calls a foreign imported function, a message is sent to the associated foreign thread (a new foreign thread is created if necessary). If a foreign exported function is called, it just signals the "Haskell runtime thread". The performance would be better than 1) as long as no foreign functions are involved. When the FFI is used, performance gets worse.

3) "The Middle Way", i.e. what I think should be implemented for GHC. The following are just fragments of thoughts, don't expect it to be complete yet:

* There is a global lock [iirc, that's the Capability in the GHC RTS] which prevents several Haskell threads from running truly concurrently.
* Each bound Haskell thread is executed by its associated native thread.
* Each bound native thread is executing at most one piece of code at a time, i.e. there is no scheduling going on inside the bound native thread.
* When a bound foreign export is invoked, the RTS creates a new Haskell thread bound to the current OS thread.

The following things are unchanged (if any of those things is not currently the case, please correct me):

* Unsafe calls are just plain old function calls.
* All unbound Haskell threads are executed by a so-called "worker thread". When an unbound Haskell thread calls a threadsafe imported function, a new worker thread is created.
* When an unbound foreign export is invoked, the RTS creates a new unbound Haskell thread.
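
The workaround from the Issues section (sending a message "via MVars and friends" to the Haskell thread that owns a native thread) can be sketched roughly as follows. This is only an illustration, not part of the proposal. It assumes a present-day GHC, where Control.Concurrent.forkOS has the type proposed for forkNativeThread and creates a bound thread when the threaded RTS is in use; Worker, startWorker, runOn and sameThread are hypothetical names.

```haskell
import Control.Concurrent
  (forkIO, forkOS, myThreadId, rtsSupportsBoundThreads)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)
import Control.Monad (forever, join)

-- | Hypothetical handle: a queue of jobs served by one Haskell thread.
newtype Worker = Worker (Chan (IO ()))

-- | Start the serving thread. With the threaded RTS it is a bound thread
-- (forkOS); otherwise we fall back to forkIO so the sketch still runs.
startWorker :: IO Worker
startWorker = do
  jobs <- newChan
  let fork = if rtsSupportsBoundThreads then forkOS else forkIO
  _ <- fork (forever (join (readChan jobs)))  -- run jobs one at a time
  return (Worker jobs)

-- | Run an action on the worker's thread and wait for the result.
-- Exceptions raised by the action are caught on the worker and re-raised
-- in the caller, so a failing job does not kill the worker.
runOn :: Worker -> IO a -> IO a
runOn (Worker jobs) action = do
  reply <- newEmptyMVar
  writeChan jobs (try action >>= putMVar reply)
  result <- takeMVar reply
  either (\e -> throwIO (e :: SomeException)) return result

-- | Two requests are served by the same Haskell thread.
sameThread :: Worker -> IO Bool
sameThread w = (==) <$> runOn w myThreadId <*> runOn w myThreadId
```

A finalizer or signal handler would call runOn w someForeignCall instead of making the foreign call itself, so that the call is issued by the worker's Haskell thread and, if that thread is bound, by its native thread. When linked with -threaded, runOn w isCurrentThreadBound should report True; the same Either-in-an-MVar trick is one way to emulate passing exceptions between threads by hand.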