
On 12 April 2006 07:46, oleg@pobox.com wrote:
I am afraid I must ask for clarification of the goals of this distinction between pre-emptive and cooperative scheduling -- which leads me to further questions.
On one hand, I understand the motivation: providing guidance to the end programmer. In a cooperative thread system, the programmer just can't write code imagining the current thread is the only one. If the thread is to compute the Nth Mersenne prime, it would effectively hang the whole system. So, the user must specifically insert calls to yield -- which means that the whole code has to be converted into a monadic style, because yield is most certainly a monadic operation _and_ because we must be sure that computations before yield are indeed finished, rather than snowballing up to the very end. So, we have to re-write pure functional code (like the Mersenne prime computation) into monadic code, and thus explicitly serialize it. That negates one of the significant benefits of Haskell -- the code no longer looks like a specification or mathematical notation. If we're forced to write everything in A-normal form, we might as well use a better language for that, like SML.
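The serialization Oleg describes can be sketched with a toy cooperative-thread type (all names here are illustrative, not from any library): a pure loop like the summation below has to be rewritten so that every step threads through a constructor, giving the scheduler a chance to switch.

```haskell
-- A computation either finishes with a value, or yields,
-- handing the scheduler the rest as a continuation.
data Co a = Done a | Yield (Co a)

-- A pure sum would run to completion in one go; to cooperate,
-- it must be rewritten so each step yields explicitly.
sumTo :: Int -> Int -> Co Int
sumTo acc 0 = Done acc
sumTo acc n = Yield (sumTo (acc + n) (n - 1))

-- A round-robin scheduler: step each thread, move yielders to the back.
schedule :: [Co a] -> [a]
schedule []                = []
schedule (Done a  : rest)  = a : schedule rest
schedule (Yield k : rest)  = schedule (rest ++ [k])

main :: IO ()
main = print (schedule [sumTo 0 3, sumTo 0 2])  -- threads interleave
```

Note how the recursion structure of sumTo is now dictated by the scheduling concern, which is exactly the loss of "code as specification" being complained about.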
In a preemptive thread system, we can indeed program a thread as if it were the only thread in the system -- and the scheduler would do the rest.
Problem solved? Only if threads do not interact, which is not likely. At least they have to write something, and so contend for access to stdout (assuming multi-line output). A thread running a long computation while holding a critical resource can just as well hang the whole system, pre-emptive scheduler notwithstanding. Haskell's lazy evaluation makes this problem quite subtle:
    do r <- return (mersenne' 44)
       acquire'lock
       putStrLn "Great success"
       print r
       release'lock
One may think that the long computation occurs in the "r <- ..." line. Alas, it may just as well occur in the "print r" line, when the digits of the number are really required for printing. So, we will be running a long computation while holding a lock. This isn't much better than the situation with cooperative scheduling, is it?
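The standard remedy for the pitfall above is to force the value to normal form before entering the critical section. A minimal sketch, using an MVar as the lock; withLockPrint is a hypothetical helper, not a real library function:

```haskell
import Control.Concurrent.MVar (MVar, newMVar, takeMVar, putMVar)
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)

-- Fully evaluate the value *before* taking the lock, so the expensive
-- work is not done inside the critical section.
withLockPrint :: (NFData a, Show a) => MVar () -> a -> IO ()
withLockPrint lock x = do
  x' <- evaluate (force x)  -- all evaluation happens here, lock-free
  takeMVar lock             -- acquire
  print x'                  -- cheap: x' is already in normal form
  putMVar lock ()           -- release

main :: IO ()
main = do
  lock <- newMVar ()
  withLockPrint lock (sum [1 .. 100 :: Int])
```

The evaluate (force x) combination is the deepSeq move described in the text: force deep-evaluates the value, and evaluate pins that evaluation to this point in the IO sequence rather than leaving it as a thunk.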
In cases where this happens, and it *does* happen, the programmer has to insert an explicit seq or deepSeq. It's not a huge problem, at least no worse than the need to use seq/deepSeq to control resource usage or exception leakage. The code inside a lock/unlock sequence is usually quite short, and we can enumerate the free variables quite easily.

One example where this happened in GHC is in the definition of putStr itself: our first implementation of putStr took the Handle lock, and then proceeded to traverse the string argument, filling the Handle's buffer with the characters. Unfortunately, traversing the string takes arbitrary time, and might even invoke further putStrs (inside unsafePerformIO), so we had to rethink. The easy fix would be to deepSeq the string first, but that means traversing the string twice, and we didn't want to make putStr any less efficient than it already is. So currently putStr works by grabbing a buffer from the Handle, and filling this buffer without holding the Handle lock. When the buffer is full, it gets "committed". The Handle keeps a list of free buffers to avoid having to allocate a new one for each putStr. One side-effect of this is that putStrs from multiple threads can be interleaved in strange ways.

Fortunately, STM helps a lot here, too (which doesn't mean much for Haskell', but at least it means we have a solution without completely redesigning Haskell'). If a transaction takes a long time because it is busy evaluating free variables, that doesn't prevent other transactions from completing. All that happens is that the long-running transaction will probably be aborted due to conflicts with other transactions, but the next time it runs it will probably complete, because the work already done isn't discarded.

Conclusion: the programmer doesn't have to think about it.

Cheers,
Simon
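Simon's STM point can be illustrated with a minimal sketch (appendLine and the TVar buffer are illustrative; this is not GHC's actual putStr machinery). A transaction only commits a reference to the string, which may still be an unevaluated thunk; a conflicting writer forces a retry of the transaction, but any evaluation of the thunk already performed is kept, thanks to laziness.

```haskell
import Control.Concurrent.STM
  (TVar, newTVarIO, readTVarIO, atomically, modifyTVar')

-- Commit a line to a shared buffer atomically. The transaction itself
-- is tiny (one cons cell); evaluating the String can happen lazily,
-- outside any critical section, and survives a transaction retry.
appendLine :: TVar [String] -> String -> IO ()
appendLine buf s = atomically (modifyTVar' buf (s :))

main :: IO ()
main = do
  buf <- newTVarIO []
  appendLine buf "hello"
  appendLine buf "world"
  readTVarIO buf >>= print
```

Unlike a lock, no thread ever blocks here while another evaluates a large value; at worst a transaction re-runs, and the memoized evaluation makes the re-run cheap.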