
Benjamin Franksen wrote:
Chris Kuklewicz wrote:
Aside on utterly useful proofs: When you write concurrent programs you want an API with strong and useful guarantees so you can avoid deadlocks, starvation, and conflicting data reads/writes. Designing and using such an API is a reasoning exercise identical to creating proofs. Some systems make this possible and even easy (such as STM).
I have to -- partly -- disagree with your last statement. It is possible that I missed something important, but as I see it, STM has by construction the disadvantage that absence of starvation is quite hard (if not impossible) to prove. A memory transaction must be restarted from the beginning if it finds (when trying to commit) that another task has modified one of the TVars it depends on. This means that a long transaction may in fact /never/ commit if it is interrupted often enough by short transactions that modify one of the TVars the longer transaction depends upon.
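For illustration, the failure mode described above can be provoked deliberately. This is a hypothetical sketch, not code from the thread: a thread of short transactions keeps bumping a TVar while a deliberately slow transaction reads the same TVar, so the slow one's attempts are typically invalidated and rolled back until the bumper stops. The attempt counter uses `unsafeIOToSTM` purely as a demonstration device, since ordinary STM writes are discarded on rollback.

```haskell
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad (replicateM_)
import Data.IORef
import GHC.Conc (unsafeIOToSTM)

main :: IO ()
main = do
  counter  <- newTVarIO (0 :: Int)
  attempts <- newIORef (0 :: Int)
  -- Short transactions: bump `counter` 50 times, 1 ms apart.
  _ <- forkIO $ replicateM_ 50 $ do
         atomically $ modifyTVar' counter (+1)
         threadDelay 1000
  -- Deliberately slow transaction over the same TVar: each attempt
  -- sleeps, so while the bumper is running, `counter` has usually
  -- changed by commit time and the attempt is rolled back.
  v <- atomically $ do
         unsafeIOToSTM (modifyIORef' attempts (+1))  -- count attempts (demo only)
         x <- readTVar counter
         unsafeIOToSTM (threadDelay 5000)            -- simulate long work
         pure x
  n <- readIORef attempts
  putStrLn ("committed value " ++ show v ++ " after " ++ show n ++ " attempt(s)")
```

The slow transaction does eventually commit here only because the bumper stops after 50 bumps; if the short transactions ran forever at this rate, the slow one could starve indefinitely, which is exactly the concern raised.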
True. The STM API ensures that the program will be deadlock-free, that the mutable state stays consistent, and that progress will always be made (i.e. _some_ block will commit). The current STM implementation does not otherwise ensure fairness or avoid starvation. It is clever enough to put retrying blocks to sleep until one of the TVars they read is changed. But it does not have the single-wake-up FIFO fairness of MVars.
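The sleep-until-a-TVar-changes behaviour mentioned above can be seen in a few lines. A minimal sketch (names are illustrative): one thread blocks in `retry` on a TVar, and is woken only when another transaction writes it.

```haskell
import Control.Concurrent
import Control.Concurrent.STM

main :: IO ()
main = do
  flag <- newTVarIO False
  done <- newEmptyMVar
  _ <- forkIO $ do
    -- This transaction calls `retry`; the runtime puts the thread
    -- to sleep until some other transaction writes `flag`.
    atomically $ do
      ok <- readTVar flag
      if ok then pure () else retry
    putMVar done "woken by TVar write"
  threadDelay 10000
  atomically $ writeTVar flag True   -- wakes the sleeping thread
  takeMVar done >>= putStrLn
```

Note that when several threads are blocked on the same TVar, all of them are woken and race to revalidate; unlike `takeMVar`, there is no FIFO queue guaranteeing which one wins.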
IIRC this problem is mentioned in the original STM paper only in passing ("Starvation is surely a possibility" or some such comment). I would be very interested to hear if there has been any investigation of the consequences with regard to proving progress guarantees for programs that use STM.
The STM API ensures progress for the program as a whole (i.e. commits). If a block fails because of a conflict, that is because another block has committed. If all the threads of the program call retry then the program is stuck, but that is not STM's fault, and the runtime will kill the program in this case rather than hang.
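The "kill rather than hang" behaviour corresponds to GHC's `BlockedIndefinitelyOnSTM` exception, which the runtime raises when its deadlock detection (run at garbage-collection time) finds a transaction blocked on TVars no other thread can write. A minimal sketch; the `performGC` nudge is only there to make detection prompt and deterministic:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Exception
import System.Mem (performGC)

main :: IO ()
main = do
  tv <- newTVarIO (0 :: Int)
  -- Nudge the GC shortly after we block, so the runtime's deadlock
  -- detection (which runs at GC time) fires promptly.
  _ <- forkIO (threadDelay 10000 >> performGC)
  -- The only transaction in the program retries on a TVar that no
  -- other thread will ever write: rather than hang, the runtime
  -- raises BlockedIndefinitelyOnSTM.
  r <- try $ atomically $ do
         x <- readTVar tv
         if x > 0 then pure x else retry
  case r of
    Left BlockedIndefinitelyOnSTM -> putStrLn "runtime detected a stuck transaction"
    Right v                       -> putStrLn ("committed: " ++ show v)
```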
My current take on this is that it might be possible to avoid this kind of starvation by appropriate scheduling. One idea is to assign time slices dynamically, e.g. to double a task's time slice whenever it must do a rollback instead of a commit (due to externally modified TVars), and to reset the time slice to the default on a successful commit or an exception.
With the new GHC using multiple CPUs this would not prevent a fast block from running & committing on a different CPU and causing the long block to retry.
Another possibility is to introduce dynamic task priorities and to increase priority on each unsuccessful commit.
(Unfortunately I have no proof that such measures would remove the danger of starvation, they merely seem plausible avenues for further research. I also don't know whether the current GHC runtime system already implements such heuristics.)
Cheers Ben
What you need in order to "increase fairness/priority" for a block is to run that block until it tries to commit, while preventing other blocks that would interfere from committing. The scheduler could take the runtime of a failed block into account when re-attempting it. It could lock out other blocks that might interfere with a very-long-delayed, much-re-attempted block until it has had a chance to commit. (Since the scheduler knows which TVars the problem block has needed in the past, any independent blocks can still run.) To keep the progress guarantee: if the problem block takes a very, very long time it must be timed out and another block allowed to commit. This partitioning into "fast" vs. "slow" blocks is based purely on the observations of the runtime scheduler.