Re: proptotype of make style dep stuff

29 Oct 2007

      On Fri, 2007-10-26 at 10:48 +0000, Duncan Coutts wrote:
...
Note that this is where we have the opportunity for parallelism as we
can pick all the rules with no dependencies. For the moment we will just
pick one rule.
Speaking of parallel builds...

Spencer Janssen, Lennart Kolmodin and I were discussing possible
approaches to integrate parallel builds into the this framework.

Some requirements we thought of were that it should fit with the
approach of being agnostic/parametrised by the underlying build monad.
It should not rely on any actual parallelism happening, only on there
being parallelism available for the underlying build monad to exploit.

We'd like something simpler than forkIO + MVars.

We'd like to be able to implement simple single threaded versions
easily.

We'd like to be able to test/simulate different schedulers in the pure
context to be sure that we have no bugs that depend on the order of
parallel jobs. So it should be possible for us to implement demonic job
schedulers for testing.

Spencer suggested an api based on launching jobs and collecting
completed jobs:

So a  JobControl m a  gives us a way to launch jobs in a monad 'm' that
have result type 'a':

data JobControl m a = JobControl {
    launch  :: m a -> m (),
    collect :: m a
  }

We'd then pass one of these as a parameter to make, just like we already
pass a DepGenerator. Just like the DepGenerator it's actions will be in
the same monad as we're working in overall.

make :: Monad m
      => DepGenerator m
      -> JobControl m (Something)
      -> Graph m
      -> m ()

For make we'd be using some specific type for the jobs. It'd probably be
something like the result of Rule actions, paired with enough info to
identify which rule it is that was completed.

Informally, the semantics of launch / collect would be...
We use launch to create a new job. We use collect to get the result of
*any* launched job. Launched jobs may complete in any order. Collect
when there are no launched jobs is an error (if this proves inconvenient
we could change this to have collect return Nothing when there are no
jobs). If a job fails in the monad, the failure is returned in the monad
when the job is collected.

So a simple serial implementation might look like:

serialJobControl :: MonadState [m a] m => JobControl m a
serialJobControl = JobControl {
    launch = \job -> modify (job:),
    collect = do (job:jobs) <- get
                 put jobs
                 job
  }

So that just maintains a stack of jobs, adds to the stack on launch and
runs a job when we collect. It's not in order, but that does not matter
since any order will do.

An implementation in a monad based on IO would use an implementation
using forkIO and MVars. It'd probably just fork each new launched job.
It'd probably use a semaphore to control the degree of parallelism.
Results of completed jobs could be submitted to a channel which collect
could read from.

One thing all this depends on in practise is the ability to launch
multiple child processes and wait for the first one to complete without
blocking the whole process. Currently that's not something we can do
with the portable System.Process module. The waitForProcess function
blocks the entire process. Even with ghc's threaded rts we can only do
it at the cost of one OS thread for each process we are waiting on. That
does not seem like a sensible model. It really should be possible to
manage a collection of child processes without using any OS level
concurrency. In ghc on unix we should be able to do this by arranging
for SIGCHILD signals to be sent when child processes complete.

Duncan