Plan of Attack for Parallel Builds

I've been looking around and trying to develop a plan for parallelizing builds in cabal-install. Here's my idea so far:

- Parallelize executeInstallPlan. When given a target load average as a flag it will determine whether it should spawn a worker (if below the target load average) or wait. If waiting, it will listen to all worker status channels and print out their current build status and the load average. Once a worker exits, it will again check the load average and spawn a new thread if necessary.
- Rewrite install.*Package and their callees to use the CHP (Communicating Haskell Process) monad where possible. Use channels to communicate build status back to the main thread.
- It might be necessary to parse the output of external builds in some way so that meaningful status can be communicated back to the user.
- Add a default parallel build log path template. Allow the user to specify one on the command line to override the default. All output of parallel package builds will be logged in the background silently instead of displayed to the user.
- On single-threaded (sequential) builds, revert to the old output style. On multi-threaded builds, display the current status of all running builds, load averages and nothing else.

Possible output:

    Resolving dependencies...
    Building derive-2.3.0.2... [17 of 58]
    Building regex-base-0.93.1... [1 of 4]
    Building dyre-0.8.6... [5 of 7]
    Configuring xdg-basedir-0.2... [in progress]
    Dependencies Built: [0 of 9] Load Average: [3.4/4.0] Running 4 Jobs.

A possible error message might look like:

    derive-2.3.0.2 failed during the building phase. Log stored in /home/frank/cabal/logs/build/derive-2.3.0.2.log

What does everyone think?

Thanks,
Frank

On Tue, Mar 29, 2011 at 12:54 PM, Frank Murphy wrote:
I've been looking around and trying to develop a plan for parallelizing builds in cabal-install. Here's my idea so far:
[snip]
What does everyone think?
I think it sounds nice and it sounds valuable. I probably wouldn't give it the highest priority of the changes I'd like to see in Cabal, but I'm not the one doing the work :)

Good luck!
Jason

On Tue, Mar 29, 2011 at 12:54 PM, Frank Murphy wrote:
I've been looking around and trying to develop a plan for parallelizing builds in cabal-install.
Something I'd love to see, for sure. But the devil's in the details; see below.
- Parallelize executeInstallPlan. When given a target load average as a flag it will determine whether it should spawn a worker (if below the target load average) or wait.
Load average is a very very bad method to use, because it's very slow to respond to changes in real load on Unix, and it doesn't exist at all on Windows. Do like "make" instead and just accept a "-j" parameter for the maximum number of jobs to run simultaneously.
- Rewrite install.*Package and their callees to use the CHP (Communicating Haskell Process) monad where possible. Use channels to communicate build status back to the main thread.
Why not just use MVar or Chan?
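A minimal sketch of the simpler design suggested here: cap concurrency with a job count, as make's "-j" does, and report status over a plain Chan. The Status type, runJobs, and the use of QSem for the job slots are assumptions made for illustration, not cabal-install code. The sketch assumes a job count of at least 1 and ignores dependency ordering between packages.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_, finally)
import Control.Monad (forM_, replicateM)

-- Hypothetical status messages sent from workers to the main thread.
data Status = Started String | Finished String
  deriving (Eq, Show)

-- Run one action per package, at most maxJobs at a time (the "-j" value).
-- Returns every status message in the order the main thread received it.
runJobs :: Int -> [(String, IO ())] -> IO [Status]
runJobs maxJobs jobs = do
  slots  <- newQSem maxJobs
  status <- newChan
  forM_ jobs $ \(name, build) ->
    -- Each worker holds one slot for the duration of its build.
    forkIO $ bracket_ (waitQSem slots) (signalQSem slots) $
      (writeChan status (Started name) >> build)
        `finally` writeChan status (Finished name)
  -- Every job reports exactly two messages, even if its build throws.
  replicateM (2 * length jobs) (readChan status)
```

A real scheduler would only start a package once its dependencies have finished; here the main thread simply drains the channel.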
- It might be necessary to parse the output of external builds in some way so that meaningful status can be communicated back to the user.
Yuck, no. Too fragile. Just check the exit status of a process.
- Add a default parallel build log path template. Allow the user to specify one on the command line to override the default. All output of parallel package builds will be logged in the background silently instead of displayed to the user.
- On single-threaded (sequential) builds, revert to the old output style. On multi-threaded builds, display the current status of all running builds, load averages and nothing else.
No to both of these, too. It's desirable that build outputs shouldn't scribble on each other via excessive interleaving, but telling people to go read a log file because a build failed is going to give them a very bad experience. For instance, that would defeat the usual way that users of Emacs and other IDEs jump to the first error.
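As a rough illustration of "just check the exit status" while still keeping the output available to show the user, the sketch below runs an external step with readProcessWithExitCode from the process package and classifies the result. The Outcome type and the names classify and runStep are made up for this example.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- Outcome of one external build step, decided purely from the exit status.
-- A failure carries the exit code and the captured output.
data Outcome = Succeeded | Failed Int String
  deriving (Eq, Show)

-- Classify a finished process; the captured output is kept only on failure.
classify :: ExitCode -> String -> Outcome
classify ExitSuccess     _   = Succeeded
classify (ExitFailure n) out = Failed n out

-- Run a command with no stdin, capture stdout and stderr, and classify it.
runStep :: FilePath -> [String] -> IO Outcome
runStep cmd args = do
  (code, out, err) <- readProcessWithExitCode cmd args ""
  pure (classify code (out ++ err))
```

The caller can then print the text carried by Failed directly, so error-jumping in editors still works, rather than pointing the user at a log file.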

Hi Frank,
Thanks for reaching out and gathering input.
On Tue, Mar 29, 2011 at 9:54 PM, Frank Murphy wrote:
- Parallelize executeInstallPlan. When given a target load average as a flag it will determine whether it should spawn a worker (if below the target load average) or wait. If waiting, it will listen to all worker status channels and print out their current build status and the load average. Once a worker exits, it will again check the load average and spawn a new thread if necessary.
I think the most important setting is the number of worker threads (e.g. -jX). Load average sounds like a cool idea but I don't know how well it'll work in practice. Gentoo's Portage uses it so you might snoop around there for more info.
- Rewrite install.*Package and their callees to use the CHP (Communicating Haskell Process) monad where possible. Use channels to communicate build status back to the main thread.
CHP might be a bit overkill; an MVar and a Chan or two should be enough. At least start simple.
- It might be necessary to parse the output of external builds in some way so that meaningful status can be communicated back to the user.
I'm not sure this is worth it, or even possible in the general case. See below.
- Add a default parallel build log path template. Allow the user to specify one on the command line to override the default.
I'm not quite sure what you mean here. Do you mean that we'd write "cabal install" logs to e.g. .cabal/logs or something along those lines?
- On single-threaded (sequential) builds, revert to the old output style.
Sounds good. One possible policy would be: If you run "cabal build", you get the old output format (and a single-threaded build). If you run "cabal install", you get the new output format, regardless of whether the build runs in parallel or not.
What do people think? Is it worth displaying all the build output for "cabal install" in the single-threaded case? Does the user care to see it? Perhaps it's good for debugging to let single-threaded "cabal install" show the old output (i.e. if a parallel build fails, run the single-threaded one to get more output).
On multi-threaded builds, display the current status of all running builds, load averages and nothing else. Possible output:
    Resolving dependencies...
    Building derive-2.3.0.2... [17 of 58]
    Building regex-base-0.93.1... [1 of 4]
    Building dyre-0.8.6... [5 of 7]
    Configuring xdg-basedir-0.2... [in progress]
    Dependencies Built: [0 of 9] Load Average: [3.4/4.0] Running 4 Jobs.
Cabal allows packages to use any build system they want (e.g. make), which means that we can't know the progress of a single build in the general case. Today, Cabal simply shows the stdout of the build process, whatever it is. This means that we cannot show progress of individual packages. I suggest something like this (taken from Gentoo's Portage):

    Building (1 of 9) derive-2.3.0.2...
    Building (2 of 9) regex-base-0.93.1...
    Building (3 of 9) dyre-0.8.6...
    Building (4 of 9) aeson-0.3.2.1...
    Building (5 of 9) binary-0.5.0.2...
    Installing derive-2.3.0.2
    Installing regex-base-0.93.1
    Building (6 of 9) text-0.11.0.6...
    Installing dyre-0.8.6
    Jobs: 3 of 9 complete, 3 running Load avg: 3.44, 1.46, 0.69

We could perhaps make a special case for the Simple build type and parse the GHC output and show progress on individual builds. I don't think it's worth it, at least not initially.
A possible error message might look like:
derive-2.3.0.2 failed during the building phase. Log stored in /home/frank/cabal/logs/build/derive-2.3.0.2.log
For build failures I think we should output the content of the log file to stdout (as one chunk, using a lock to avoid interleaving). This will make it quicker for users to get to the build failure. For successful builds I don't think we need to output more than in the example above.

Cheers,
Johan
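One way the "one chunk, using a lock" idea could look, as a sketch only: an MVar () serves as the lock around stdout, and the whole report is assembled as a single string before it is written. OutputLock, failureReport, and reportFailure are hypothetical names, and the "--- build log ---" separator is invented for the example.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)

-- A lock guarding the terminal; the name is made up for this sketch.
newtype OutputLock = OutputLock (MVar ())

newOutputLock :: IO OutputLock
newOutputLock = OutputLock <$> newMVar ()

-- Assemble the failure message and the log contents as a single string.
failureReport :: String -> String -> String
failureReport pkg logText =
  unlines [pkg ++ " failed during the building phase.", "--- build log ---"]
    ++ logText

-- Print a failed package's log to stdout while holding the lock, so that
-- reports from different workers cannot interleave with each other.
reportFailure :: OutputLock -> String -> FilePath -> IO ()
reportFailure (OutputLock lock) pkg logPath = do
  logText <- readFile logPath
  withMVar lock $ \_ -> putStr (failureReport pkg logText)
```

Any other code that writes to the terminal, such as the status lines, would need to take the same lock for this to be effective.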

On Wed, Mar 30, 2011 at 10:06 AM, Johan Tibell wrote:
I think the most important setting is the number of worker threads (e.g. -jX). Load average sounds like a cool idea but I don't know how well it'll work in practice. Gentoo's Portage uses it so you might snoop around there for more info.
I should have mentioned that there is a good reason to support a load average *and* a job count flag:

- Some jobs might be I/O bound for longer periods of time, making it worthwhile to spawn new ones. This is a dynamic property of the process which might be hard for the user to predict before running cabal install. The user does know how many cores he/she has and should be able to give a good load average.
- The build system (e.g. make) might itself spawn several jobs, resulting in too many jobs given the number of available cores. Using load average Cabal could scale down the number of jobs it uses in that case.

Johan
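A small sketch of how a job count and a load average target might be combined into one spawn decision. The names oneMinuteLoad and canSpawn, the argument order, and the rule of always allowing one job are assumptions for illustration. The parser assumes the Linux /proc/loadavg line format; on platforms with no load average the caller would pass Nothing.

```haskell
import Text.Read (readMaybe)

-- Parse the 1-minute load average from a line in the Linux /proc/loadavg
-- format, e.g. "3.44 1.46 0.69 2/345 6789".
oneMinuteLoad :: String -> Maybe Double
oneMinuteLoad line = case words line of
  (x : _) -> readMaybe x
  []      -> Nothing

-- Decide whether to start another job.
--   maxJobs    : the job count cap (assumed to be at least 1)
--   targetLoad : the load average target
--   running    : number of jobs currently running
--   load       : current load average, or Nothing if unavailable
-- One job is always allowed so the build makes progress on a busy machine.
canSpawn :: Int -> Double -> Int -> Maybe Double -> Bool
canSpawn maxJobs targetLoad running load =
  running == 0
    || (running < maxJobs && maybe True (< targetLoad) load)
```

With this shape the job count acts as a hard upper bound, and the load average can only reduce the number of jobs below it.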

Hi all,
I'm very much looking forward to a future where cabal install exercises all
my cores with some heavy duty Haskell work ;-) Thanks Frank for taking this
up.
I personally like progress reports on the individual builds very much. I
agree that they are not super important, but nevertheless I think that a
progress report significantly improves the user experience. I also have a
simple, ad-hoc scheme that should result in an OK progress report for most
cases.
Gather patterns of the form
"["<integer>" of "<integer>"]"
in the program output and interpret the resulting sequence such that the
second to last "measurement" is a conservative estimate of the real
progress; i.e.,
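    import Control.Monad (mzero)

    -- The second-to-last measurement is a conservative progress estimate.
    progress :: [(Int,Int)] -> Maybe Double
    progress xs = case reverse xs of
        (_:(i,n):_) -> return (fromIntegral i / fromIntegral n)
        _           -> mzero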
Probably some more filtering of this sequence is required to cater for
repeated calls to GHC. I guess that, as long as progress never goes from
100% to something below, the user will be happy about the progress estimate.
Moreover, the chance that such a pattern occurs where it doesn't indicate
some interesting progress is reasonably low.
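The gathering step described above could be sketched as follows, using only base. The function names counter and measurements are invented here. The sketch also tolerates padding spaces after the opening bracket, which is an assumption about how GHC aligns its counters.

```haskell
import Data.Char (isDigit, isSpace)
import Data.List (tails)
import Data.Maybe (mapMaybe)

-- Try to read "[<integer> of <integer>]" at the very start of a string.
-- Spaces before either number are skipped.
counter :: String -> Maybe (Int, Int)
counter ('[' : rest) =
  case span isDigit (dropWhile isSpace rest) of
    (i@(_ : _), ' ' : 'o' : 'f' : ' ' : rest') ->
      case span isDigit (dropWhile isSpace rest') of
        (n@(_ : _), ']' : _) -> Just (read i, read n)
        _                    -> Nothing
    _ -> Nothing
counter _ = Nothing

-- Collect every such pattern in the build output, in order of appearance.
measurements :: String -> [(Int, Int)]
measurements = mapMaybe counter . tails
```

The resulting list has the type that the progress function above takes as input, so the two can be composed directly.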
best regards,
Simon

On 30 March 2011 09:37, Simon Meier wrote:
Probably some more filtering of this sequence is required to cater for repeated calls to GHC. I guess that, as long as progress never goes from 100% to something below, the user will be happy about the progress estimate. Moreover, the chance that such a pattern occurs where it doesn't indicate some interesting progress is reasonably low.
A coworker of mine back in the day had a great hack for reporting the progress of loading up a GUI application written in his framework. Basically, he watched what logging calls were made by the app. He also kept a record of the strings that got logged *last* time the app started up, and maintained a pointer into it during app startup. If he saw a match (to within some edit distance) between a newly-logged message and a previously-logged one occurring **after the pointer**, then he moved the pointer forward to that previously-logged entry. He could then get a % estimate of how far along startup was by looking at how far the pointer had gone along the log from last time.

The really cute thing about this is that as the user of his GUI framework you didn't have to worry about progress reporting at all - just write a few logging calls as you would normally and it just magically worked :-)

Unfortunately I can't see a way to apply this to Cabal without a lot of engineering (e.g. Hackage could record the build output stderr/stdout for a package and then transmit it down to a new client building a package). Probably not worth it!

Cheers,
Max
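For the curious, the pointer-into-last-run's-log idea fits in a few lines. This is a simplified sketch, not the coworker's code: it matches messages by exact equality rather than edit distance, and the names advance and estimate are invented.

```haskell
-- Move the pointer to just past the first entry of the previous log that
-- equals the new message and lies at or after the pointer. If there is no
-- such entry, the pointer stays where it is.
advance :: Eq a => [a] -> Int -> a -> Int
advance previous ptr msg =
  case lookup msg (drop ptr (zip previous [0 ..])) of
    Just ix -> ix + 1
    Nothing -> ptr

-- Fraction of the previous run's log covered by the messages seen so far.
estimate :: Eq a => [a] -> [a] -> Double
estimate previous seen
  | null previous = 0
  | otherwise     = fromIntegral (foldl (advance previous) 0 seen)
                      / fromIntegral (length previous)
```

For example, with a previous log of four messages, seeing the first and third of them in a new run moves the pointer past the third entry, which gives an estimate of 0.75.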
participants (6)
- Bryan O'Sullivan
- Frank Murphy
- Jason Dagit
- Johan Tibell
- Max Bolingbroke
- Simon Meier