
On Fri, 2007-10-26 at 10:48 +0000, Duncan Coutts wrote:
Note that this is where we have the opportunity for parallelism as we can pick all the rules with no dependencies. For the moment we will just pick one rule.
Speaking of parallel builds... Spencer Janssen, Lennart Kolmodin and I were discussing possible approaches to integrate parallel builds into the this framework. Some requirements we thought of were that it should fit with the approach of being agnostic/parametrised by the underlying build monad. It should not rely on any actual parallelism happening, only on there being parallelism available for the underlying build monad to exploit. We'd like something simpler than forkIO + MVars. We'd like to be able to implement simple single threaded versions easily. We'd like to be able to test/simulate different schedulers in the pure context to be sure that we have no bugs that depend on the order of parallel jobs. So it should be possible for us to implement demonic job schedulers for testing. Spencer suggested an api based on launching jobs and collecting completed jobs: So a JobControl m a gives us a way to launch jobs in a monad 'm' that have result type 'a': data JobControl m a = JobControl { launch :: m a -> m (), collect :: m a } We'd then pass one of these as a parameter to make, just like we already pass a DepGenerator. Just like the DepGenerator it's actions will be in the same monad as we're working in overall. make :: Monad m => DepGenerator m -> JobControl m (Something) -> Graph m -> m () For make we'd be using some specific type for the jobs. It'd probably be something like the result of Rule actions, paired with enough info to identify which rule it is that was completed. Informally, the semantics of launch / collect would be... We use launch to create a new job. We use collect to get the result of *any* launched job. Launched jobs may complete in any order. Collect when there are no launched jobs is an error (if this proves inconvenient we could change this to have collect return Nothing when there are no jobs). If a job fails in the monad, the failure is returned in the monad when the job is collected. So a simple serial implementation might look like: serialJobControl :: MonadState [m a] m => JobControl m a serialJobControl = JobControl { launch = \job -> modify (job:), collect = do (job:jobs) <- get put jobs job } So that just maintains a stack of jobs, adds to the stack on launch and runs a job when we collect. It's not in order, but that does not matter since any order will do. An implementation in a monad based on IO would use an implementation using forkIO and MVars. It'd probably just fork each new launched job. It'd probably use a semaphore to control the degree of parallelism. Results of completed jobs could be submitted to a channel which collect could read from. One thing all this depends on in practise is the ability to launch multiple child processes and wait for the first one to complete without blocking the whole process. Currently that's not something we can do with the portable System.Process module. The waitForProcess function blocks the entire process. Even with ghc's threaded rts we can only do it at the cost of one OS thread for each process we are waiting on. That does not seem like a sensible model. It really should be possible to manage a collection of child processes without using any OS level concurrency. In ghc on unix we should be able to do this by arranging for SIGCHILD signals to be sent when child processes complete. Duncan