
Friends,

I've been looking at CI again recently, as I was facing CI turnaround times of 9-12 hours; this just keeps dragging out and making progress hard.

The pending pipeline currently has 2 darwin and 15 windows builds waiting. Windows builds take ~220 minutes on average. We have five builders, so we can expect this queue to be done in ~660 minutes, assuming perfect scheduling and good performance. That is 11 hours! The next windows build can be started in 11 hours. Please check my math and tell me I'm wrong!

If you submit an MR today, with some luck you'll know whether it is mergeable some time tomorrow. At that point you can assign it to Marge, and Marge, if you are lucky and the set of patches she tries to merge together is mergeable, will merge your work into master probably some time on Friday. If a job fails, well, you have to start over.

What are our options here? Ben has been pretty clear about not wanting a broken windows commit to end up in the tree, and I'm with him there.

Cheers,
Moritz
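(Checking the math above: the same arithmetic as a tiny Haskell sketch. The inputs are just the figures quoted, and it assumes perfect scheduling and uniform build times, so treat it as an illustration rather than a model of the real scheduler.)

    -- Back-of-the-envelope time to drain the windows queue,
    -- assuming perfect scheduling and uniform build times.
    queueHours :: Double -> Double -> Double -> Double
    queueHours jobs builders minutesPerJob =
      jobs * minutesPerJob / builders / 60

    -- 15 windows jobs, 5 builders, ~220 minutes per job:
    -- queueHours 15 5 220 == 11.0

So the 11-hour figure checks out under those assumptions.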

Hi Moritz,

I, too, had my gripes with CI turnaround times in the past. Here's a somewhat radical proposal:

- Run "full-build" stage builds only on Marge MRs. Then we can assign to Marge much earlier, but probably have to do a bit more (manual) bisecting of spoiled Marge batches.
- I hope this gets rid of some of the friction around small MRs. I recently caught myself wanting to do a bunch of small, independent, but related changes as part of the same MR, simply because it's such a hassle to post them in individual MRs right now, and also because it steals so much CI capacity.
- Regular MRs should still have the ability to easily run individual builds of what is now the "full-build" stage, similar to how we can run optional "hackage" builds today. This is probably useful to pin down the reason for a spoiled Marge batch.
- The CI capacity we free up can probably be used to run a perf build (such as the fedora release build) on the "build" stage (the one where we currently run stack-hadrian-build and the validate-deb9-hadrian build), in parallel.
- If we decide against the latter, a micro-optimisation could be to cache the build artifacts of the "lint-base" build and continue the build in the validate-deb9-hadrian build of the "build" stage.

The usefulness of this approach depends on how many MRs cause metric changes on different architectures.

Another frustrating aspect is that if you want to merge an n-sized chain of dependent changes individually, you have to:

- Open an MR for each change (initially the last change will comprise n commits)
- Review the first change, turn its pipeline green (A)
- Assign to Marge, wait for the batch to be merged (B)
- Review the second change, turn its pipeline green
- Assign to Marge, wait for the batch to be merged
- ... and so on ...

Note that (A) incurs many context switches for the dev and the latency of *at least* one run of CI. And then (B) incurs the latency of *at least* one full-build, if you're lucky and the batch succeeds. I've recently seen batches that were resubmitted by Marge at least 5 times due to spurious CI failures and timeouts. I think this is a huge factor for latency. (A rough sketch of the effect on expected latency follows below.)

Although after (A) I should just pop the patch off my mental stack, that isn't particularly true, because Marge keeps reminding me when a batch fails or succeeds, both of which require at least some attention from me: failed 2 times => make sure it was spurious; succeeds => rebase the next change.

Maybe we can also learn from other projects like Rust, GCC or Clang, which I haven't had a look at yet.

Cheers,
Sebastian
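(On the resubmission point above: a small, purely illustrative Haskell sketch of why spurious batch failures dominate latency. The failure probability p and the hours-per-run figure are assumed parameters here, not measurements.)

    -- If each Marge batch run fails spuriously with independent
    -- probability p, the number of attempts is geometrically
    -- distributed, so a batch pays 1/(1-p) full-build latencies
    -- on average before it lands.
    expectedAttempts :: Double -> Double
    expectedAttempts p = 1 / (1 - p)

    expectedLatencyHours :: Double -> Double -> Double
    expectedLatencyHours p hoursPerRun = expectedAttempts p * hoursPerRun

    -- e.g. expectedLatencyHours 0.5 4 == 8.0 hours on average
    -- per batch; a batch resubmitted 5 times has paid 5 full runs.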

At this point I believe we have ample Linux build capacity. Darwin looks pretty good as well; the ~4 M1s we have should in principle also be able to build x86_64-darwin at acceptable speeds, although only on Big Sur.

The aarch64-linux story is a bit constrained by the scarcity of powerful and fast CI machines, but probably bearable for the time being. I doubt anyone really looks at those jobs anyway, as they are permitted to fail. If aarch64 were to become a bottleneck, I'd be inclined to just disable those jobs. With the NCG coming soon, this will likely become much more bearable as well, even though we might still want to run the nightly LLVM builds.

To be frank, I don't see 9.2 happening in two weeks with the current CI. If we subtract aarch64-linux and windows builds, we could probably do a full run in less than three hours, perhaps even faster. And that is mostly because we have a serialized pipeline. I have discussed some ideas with Ben on prioritizing the first few stages onto the faster CI machines, to effectively fail fast and provide feedback early.

But yes: working on GHC right now is quite painful due to long and unpredictable CI times.

Cheers,
Moritz

I'm glad to report that my math was off. But it was off only because I assumed that we'd successfully build all windows configurations, which of course we don't. Thus some builds fail faster.

Sylvain also provided a windows machine temporarily, until it expired. This led to a slew of new windows wibbles. The CI script Ben wrote, and generously used to help set up the new builder, seems to assume an older Git install, and thus a path was broken, which (thanks, GitLab!) led to the brilliant error of just stalling.

Next up: because we use msys2's pacman to provision the windows builders, and pacman essentially only gives us symbolic package names (not pinned versions) to install, we ended up getting a newer autoconf onto the new builder (and I assume this will happen with any other builders we add as well). This new autoconf (which I've also run into on the M1s) doesn't like our configure.ac/aclocal.m4 anymore and barfs; I wasn't able to figure out how to force pacman to install an older version and *not* give it some odd version suffix (which prevents it from working as a drop-in replacement).

In any case, we *must* update our autoconf files. So I guess the time is now.

Moritz Angermann writes:
> At this point I believe we have ample Linux build capacity. Darwin looks pretty good as well; the ~4 M1s we have should in principle also be able to build x86_64-darwin at acceptable speeds, although only on Big Sur.
> The aarch64-linux story is a bit constrained by the scarcity of powerful and fast CI machines, but probably bearable for the time being. I doubt anyone really looks at those jobs anyway, as they are permitted to fail.
For the record, I look at this once in a while to make sure that they haven't broken (and usually pick off one or two failures in the process).
> If aarch64 were to become a bottleneck, I'd be inclined to just disable those jobs. With the NCG coming soon, this will likely become much more bearable as well, even though we might still want to run the nightly LLVM builds.
> To be frank, I don't see 9.2 happening in two weeks with the current CI.
I'm not sure what you mean. Is this in reference to your own 9.2-slated work, or the release as a whole?

Cheers,
- Ben

Apologies for the latency here. This thread has required a fair amount of reflection.

Sebastian Graf writes:
> Hi Moritz,
>
> I, too, had my gripes with CI turnaround times in the past. Here's a somewhat radical proposal:
>
> - Run "full-build" stage builds only on Marge MRs. Then we can assign to Marge much earlier, but probably have to do a bit more (manual) bisecting of spoiled Marge batches.
> - I hope this gets rid of some of the friction around small MRs. I recently caught myself wanting to do a bunch of small, independent, but related changes as part of the same MR, simply because it's such a hassle to post them in individual MRs right now, and also because it steals so much CI capacity.
> - Regular MRs should still have the ability to easily run individual builds of what is now the "full-build" stage, similar to how we can run optional "hackage" builds today. This is probably useful to pin down the reason for a spoiled Marge batch.
I am torn here. For most of my non-trivial patches I personally don't mind long turnarounds: I walk away and return a day later to see whether anything failed. Spurious failures due to fragile tests make this a bit tiresome, but this is a problem that we are gradually solving (by fixing bugs and marking tests as fragile). However, I agree that small MRs are currently rather painful.

On the other hand, diagnosing failed Marge batches is *also* rather tiresome. I am worried that by deferring full validation of MRs we will only exacerbate this problem. Furthermore, I worry that by deferring full validation we run the risk of rather *increasing* MR turnaround time, since there are entire classes of issues that wouldn't be caught until the MR made it to Marge.

Ultimately it's unclear to me whether this proposal would help or hurt. Nevertheless, I am willing to try it. However, if we go this route we should consider what can be done to reduce the incidence of failed Marge batches.

One problem that I'm particularly worried about is that of tests with OS-dependent expected output (e.g. `$test_name.stdout-mingw32`). I find that people (understandably) forget to update these when updating test output. I suspect that this will be a frequent source of failed Marge batches if we defer full validation. I can see a few ways to mitigate this:

 * eliminate platform-dependent output files
 * introduce a linter that fails if it sees a test with platform-dependent output that doesn't touch all output files (a rough sketch follows below)
 * always run the full-build stage on MRs that touch tests with platform-dependent output files

Regardless of whether we implement Sebastian's proposal, one smaller measure we could take to help the problem of small MRs is to introduce some sort of mechanism to mark MRs as "trivial" (e.g. a label or a commit/MR description keyword), which results in the `full-build` stage being skipped for that MR. Perhaps this would be helpful?
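(Sketching the linter idea from the list above in Haskell; all names here are hypothetical, and the real testsuite driver would have to supply the touched-file set and the variant lookup.)

    import Data.List (isSuffixOf)
    import qualified Data.Set as Set

    -- Flag platform-specific expected-output files that exist on
    -- disk but were not updated alongside their base output file.
    lintOutputFiles
      :: Set.Set FilePath         -- files touched by the MR
      -> (FilePath -> [FilePath]) -- existing platform variants, e.g.
                                  --   "T123.stdout" -> ["T123.stdout-mingw32"]
      -> [FilePath]               -- offending, un-updated variants
    lintOutputFiles touched variantsOf =
      [ v
      | f <- Set.toList touched
      , ".stdout" `isSuffixOf` f || ".stderr" `isSuffixOf` f
      , v <- variantsOf f
      , v `Set.notMember` touched
      ]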
> Another frustrating aspect is that if you want to merge an n-sized chain of dependent changes individually, you have to:
>
> - Open an MR for each change (initially the last change will comprise n commits)
> - Review the first change, turn its pipeline green (A)
> - Assign to Marge, wait for the batch to be merged (B)
> - Review the second change, turn its pipeline green
> - Assign to Marge, wait for the batch to be merged
> - ... and so on ...
>
> Note that (A) incurs many context switches for the dev and the latency of *at least* one run of CI. And then (B) incurs the latency of *at least* one full-build, if you're lucky and the batch succeeds. I've recently seen batches that were resubmitted by Marge at least 5 times due to spurious CI failures and timeouts. I think this is a huge factor for latency.
>
> Although after (A) I should just pop the patch off my mental stack, that isn't particularly true, because Marge keeps reminding me when a batch fails or succeeds, both of which require at least some attention from me: failed 2 times => make sure it was spurious; succeeds => rebase the next change.
>
> Maybe we can also learn from other projects like Rust, GCC or Clang, which I haven't had a look at yet.
I did a bit of digging on this:

 * Rust: It appears that Rust's CI scheme is somewhat similar to what you proposed above. They do relatively minimal validation of MRs (e.g. https://github.com/rust-lang/rust/runs/1905017693), with full validation on merge (e.g. https://github.com/rust-lang-ci/rust/runs/1925049948). The latter usually takes between 3 and 4 hours, with some jobs taking 5 hours.

 * GCC: As far as I can tell, GCC doesn't actually have any (functional) continuous integration. Discussions with contributors suggest that some companies that employ contributors might have their own private infrastructure, but I don't believe there is anything public.

 * LLVM: I can't work out whether/how LLVM validates MRs (their Phabricator instance mentions Buildkite, although it appears to be broken). `master` appears to be minimally checked (only Linux/x86-64) via buildbot (http://lab.llvm.org:8011/#/builders/16/builds/6593). These jobs take between 3 and 4 hours, although it's unclear what one should conclude from this.

 * Go: Go appears to have its own homebrew CI infrastructure (https://build.golang.org/) for comprehensive testing of master; it's hard to tell how long these runs take, but it's at least two hours. Code review happens by way of Gerrit, with integration with some sort of CI. These runs take between 1 and 3 hours and seem to test a fairly comprehensive set of configurations.

Cheers,
- Ben

I am also wary of deferring checking of whole platforms and whatnot. I think that's just kicking the can down the road, and it will result in more variance and uncertainty. It might be alright for those authoring MRs, but it will make Ben's job of keeping the system running even more grueling.

Before getting into these complex trade-offs, I think we should focus on the cornerstone issue: CI isn't incremental.

1. Building and testing happen together. When tests fail spuriously, we also have to rebuild GHC in addition to re-running the tests. That's pure waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more or less.

2. We don't cache between jobs. Shake and Make do not enforce dependency soundness, nor cache correctness when the build plan itself changes, and this has made caching hard/impossible to do safely. Naively this only helps with stage 1 and not stage 2, but if we have separate stage 1 and --freeze1 stage 2 builds, both can be incremental. Yes, this is also lossy, but I only see it leading to false failures, not false acceptances (if we also test the stage 1 build), so I consider it safe. MRs that only work with a slow full build because of ABI changes can indicate as much.

The second, main part is quite hard to tackle, but I strongly believe incrementality is what we need most, and what we should remain focused on.

John

> 1. Building and testing happen together. When tests fail spuriously, we also have to rebuild GHC in addition to re-running the tests. That's pure waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more or less.
I don't get this. We have to build GHC before we can test it, don't we?
> 2. We don't cache between jobs.
This is, I think, the big one. We endlessly build the exact same binaries.
There is a problem, though. If we make *any* change in GHC, even a trivial refactoring, its binary will change slightly. So now any caching build system will assume that anything built by that GHC must be rebuilt - we can't use the cached version. That includes all the libraries and the stage2 compiler. So caching can save all the preliminaries (building the initial Cabal, and a large chunk of stage1, since they are built with the same bootstrap compiler), but after that we are dead.
I don't know any robust way out of this. That small change in the source code of GHC might be trivial refactoring, or it might introduce a critical mis-compilation which we really want to see in its build products.
However, for smoke-testing MRs, on every architecture, we could perhaps cut corners. (Leaving Marge to do full diligence.) For example, we could declare that if we have the result of compiling library module X.hs with the stage1 GHC in the last full commit in master, then we can re-use that build product rather than compiling X.hs with the MR's slightly modified stage1 GHC. That *might* be wrong; but it's usually right.
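(A minimal sketch of that optimistic lookup, with all types and names hypothetical; the point is only that the cache key deliberately omits the hash of the stage1 compiler that produced the cached object.)

    import qualified Data.Map as Map

    type Hash = String

    -- Key on the source and flags, but *not* on the stage1 binary,
    -- so a product built by master's stage1 can be reused under the
    -- MR's slightly different stage1.
    data CacheKey = CacheKey { srcHash :: Hash, flagsHash :: Hash }
      deriving (Eq, Ord)

    data BuildAction = Reuse FilePath | Compile CacheKey

    planModuleBuild :: CacheKey -> Map.Map CacheKey FilePath -> BuildAction
    planModuleBuild key cache =
      case Map.lookup key cache of
        Just obj -> Reuse obj    -- might be wrong; usually right
        Nothing  -> Compile key  -- fall back to a real compile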
Anyway, there are big wins to be had here.
Simon

Simon Peyton Jones writes:
> However, for smoke-testing MRs, on every architecture, we could perhaps cut corners. (Leaving Marge to do full diligence.) For example, we could declare that if we have the result of compiling library module X.hs with the stage1 GHC in the last full commit in master, then we can re-use that build product rather than compiling X.hs with the MR's slightly modified stage1 GHC. That *might* be wrong; but it's usually right.
The question is: what happens if it *is* wrong? There are three answers here:

 a. Allow the build pipeline to pass despite a build/test failure. This eliminates most of the benefit of running the job to begin with, as allow-failure jobs tend to be ignored.
 b. Make the pipeline fail. This leaves the contributor to pick up the pieces of a failure that they may or may not be responsible for, which sounds frustrating indeed.
 c. Retry the build, but this time from scratch. This is a tantalizing option, but it carries the risk that we end up doing *more* work than we do now (namely, if all jobs end up running both builds).

The only tenable option here, in my opinion, is (c). It's ugly, but may be viable.

Cheers,
- Ben
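(A sketch of option (c) as control flow, assuming a build step that can be run either against the optimistic cache or from scratch; runBuild is a stand-in, not an existing function.)

    data Mode = WithCache | FromScratch

    -- Try the optimistically-cached build first; on failure, rebuild
    -- from scratch so that only a true failure fails the pipeline.
    -- Worst case, a genuinely broken MR pays for both builds.
    retryingBuild :: (Mode -> IO Bool) -> IO Bool
    retryingBuild runBuild = do
      ok <- runBuild WithCache
      if ok
        then pure True
        else runBuild FromScratch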

Doing "optimistic caching" like you suggest sounds very promising. A way to regain more robustness would be as follows.
If the build fails while building the libraries or the stage2 compiler, this might be a false negative due to the optimistic caching. Therefore, evict the "optimistic caches" and restart building the libraries. That way we can validate that the build failure was a true build failure and not just due to the aggressive caching scheme.
Just my 2p
Josef

Recompilation avoidance

I think in order to cache more in CI, we first have to invest some time in fixing recompilation avoidance in our bootstrapped build system.

I just tested on a hadrian perf ticky build: adding one line of *comment* in the compiler causes

- a (pretty slow, yet negligible) rebuild of the stage1 compiler
- 2 minutes of RTS rebuilding (why do we have to rebuild the RTS? It doesn't depend in any way on the change I made)
- an apparent full rebuild of the libraries
- an apparent full rebuild of the stage2 compiler

That took 17 minutes; a full build takes ~45 minutes. So there definitely is some caching going on, but not nearly as much as there could be. I know there have been great and boring efforts on compiler determinism in the past, but either it's not good enough or our build system needs fixing. I think a good first step would be to make sure that the hash of the stage1 compiler executable doesn't change if I only change a comment. I'm aware there probably is stuff going on, like embedding configure dates in interface files and executables, that would need to go, but if possible this would be a huge improvement.

On the other hand, we can simply tack a [skip ci] onto the commit message, as I did for https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975. Variants like [skip tests] or [frontend] could help to identify which tests to run by default.

Lean

I had a chat with a colleague about how they do CI for Lean. Apparently, CI turnaround time including tests is generally 25 minutes (~15 minutes for the build) for a complete pipeline, testing 6 different OSes and configurations in parallel: https://github.com/leanprover/lean4/actions/workflows/ci.yml

They utilise ccache to cache the clang-based C++ backend, so that they only have to re-run the front- and middle-end. In effect, they take advantage of the fact that the "function" clang, in contrast to the "function" stage1 compiler, stays the same. It's hard to achieve that for GHC, where a complete compiler pipeline comes as one big, fused "function": an external tool can never be certain that a change to Parser.y could not affect the CodeGen phase.

Inspired by Lean, the following is a bit inconcrete and imaginary, but maybe we could make it so that compiler phases "sign" parts of the interface file with the binary hash of the respective subcomponents of the phase. E.g., if all the object files that influence CodeGen (that will later be linked into the stage1 compiler) result in a hash of 0xdeadbeef before and after the change to Parser.y, we know we can stop recompiling Data.List with the stage1 compiler when we see that the IR passed to CodeGen didn't change, because the last compile did CodeGen with a stage1 compiler with the same hash 0xdeadbeef. The 0xdeadbeef hash is a proxy for saying "the function CodeGen stayed the same", so we can reuse its cached outputs.

Of course, that is utopic without a tool that does the "taint analysis" of which modules in GHC influence CodeGen. Probably just including all the transitive dependencies of GHC.CmmToAsm suffices, but probably that's already too crude. For another example, a change to GHC.Utils.Unique would probably entail a full rebuild of the compiler, because it basically affects all compiler phases. There are probably parallels with recompilation avoidance in a language with staged meta-programming.
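(The signing idea above, reduced to a Haskell sketch; everything here is imaginary, as the paragraph says, and all names are hypothetical.)

    type Hash = String

    -- Each phase "signs" its output in the interface file with the
    -- hash of the compiler subcomponent that implements it.
    data Signature = Signature
      { phaseHash :: Hash  -- e.g. hash of everything CodeGen links in
      , irHash    :: Hash  -- hash of the IR that was fed to the phase
      }

    -- Rerun a phase only if its implementation or its input changed;
    -- otherwise its cached output is still valid.
    needsRerun :: Hash -> Hash -> Signature -> Bool
    needsRerun phaseNow irNow old =
      phaseHash old /= phaseNow || irHash old /= irNow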
Doing "optimistic caching" like you suggest sounds very promising. A way to regain more robustness would be as follows. If the build fails while building the libraries or the stage2 compiler, this might be a false negative due to the optimistic caching. Therefore, evict the "optimistic caches" and restart building the libraries. That way we can validate that the build failure was a true build failure and not just due to the aggressive caching scheme.
Just my 2p
Josef
------------------------------ *From:* ghc-devs
on behalf of Simon Peyton Jones via ghc-devs *Sent:* Friday, February 19, 2021 8:57 AM *To:* John Ericson ; ghc-devs < ghc-devs@haskell.org> *Subject:* RE: On CI 1. Building and testing happen together. When tests failure spuriously, we also have to rebuild GHC in addition to re-running the tests. That's pure waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897 https://nam06.safelinks.protection.outlook.com/?url=https://gitlab.haskell.org/ghc/ghc/-/issues/13897&data=04%7C01%7Csimonpj@microsoft.com%7C3d503922473f4cd0543f08d8d48522b2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637493018301253098%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=%7C3000&sdata=FG2fyYCXbacp69Q8Il6GE0aX+7ZLNkH1u84NA/VMjQc=&reserved=0 tracks this more or less.
I don’t get this. We have to build GHC before we can test it, don’t we?
2 . We don't cache between jobs.
This is, I think, the big one. We endlessly build the exact same binaries.
There is a problem, though. If we make **any** change in GHC, even a trivial refactoring, its binary will change slightly. So now any caching build system will assume that anything built by that GHC must be rebuilt – we can’t use the cached version. That includes all the libraries and the stage2 compiler. So caching can save all the preliminaries (building the initial Cabal, and large chunk of stage1, since they are built with the same bootstrap compiler) but after that we are dead.
I don’t know any robust way out of this. That small change in the source code of GHC might be trivial refactoring, or it might introduce a critical mis-compilation which we really want to see in its build products.
However, for smoke-testing MRs, on every architecture, we could perhaps cut corners. (Leaving Marge to do full diligence.) For example, we could declare that if we have the result of compiling library module X.hs with the stage1 GHC in the last full commit in master, then we can re-use that build product rather than compiling X.hs with the MR’s slightly modified stage1 GHC. That **might** be wrong; but it’s usually right.
Anyway, there are big wins to be had here.
Simon
*From:* ghc-devs
*On Behalf Of *John Ericson *Sent:* 19 February 2021 03:19 *To:* ghc-devs *Subject:* Re: On CI I am also wary of us to deferring checking whole platforms and what not. I think that's just kicking the can down the road, and will result in more variance and uncertainty. It might be alright for those authoring PRs, but it will make Ben's job keeping the system running even more grueling.
Before getting into these complex trade-offs, I think we should focus on the cornerstone issue that CI isn't incremental.
1. Building and testing happen together. When tests failure spuriously, we also have to rebuild GHC in addition to re-running the tests. That's pure waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897 https://nam06.safelinks.protection.outlook.com/?url=https://gitlab.haskell.org/ghc/ghc/-/issues/13897&data=04%7C01%7Csimonpj@microsoft.com%7C3d503922473f4cd0543f08d8d48522b2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637493018301253098%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=%7C3000&sdata=FG2fyYCXbacp69Q8Il6GE0aX+7ZLNkH1u84NA/VMjQc=&reserved=0 tracks this more or less. 2. We don't cache between jobs. Shake and Make do not enforce dependency soundness, nor cache-correctness when the build plan itself changes, and this had made this hard/impossible to do safely. Naively this only helps with stage 1 and not stage 2, but if we have separate stage 1 and --freeze1 stage 2 builds, both can be incremental. Yes, this is also lossy, but I only see it leading to false failures not false acceptances (if we can also test the stage 1 one), so I consider it safe. MRs that only work with a slow full build because ABI can so indicate.
The second, main part is quite hard to tackle, but I strongly believe incrementality is what we need most, and what we should remain focused on.
John _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

There are some good ideas here, but I want to throw out another one: put all our effort into reducing compile times. There is a loud plea to do this on Discourse (https://discourse.haskell.org/t/call-for-ideas-forming-a-technical-agenda/1901/24), and it would both solve these CI problems and also help everyone else.

This isn't to say we should stop exploring the ideas here. But since time is mostly fixed, tackling compilation times in general may be the best way out of this. Ben's survey of other projects (thanks!) shows that we're way, way behind in how long our CI takes to run.

Richard

I'm not opposed to some effort going into this, but I would strongly oppose putting all our effort there. Incremental CI can cut multiple hours to mere minutes, especially with the test suite being embarrassingly parallel. There's simply no way that optimizations to the compiler, independent of sharing a cache between CI runs, can get anywhere close to that return on investment.

(FWIW, I'm also skeptical that the people complaining about GHC performance know what's hurting them most. For example, after non-incrementality, the next slowest thing is linking, which is... not done by GHC! But all that is a separate conversation.)

John

> Incremental CI can cut multiple hours to mere minutes, especially with the test suite being embarrassingly parallel. There's simply no way that optimizations to the compiler, independent of sharing a cache between CI runs, can get anywhere close to that return on investment.
I rather agree with this. I don't think there is much low-hanging fruit on compile times, aside from coercion-zapping which we are working on anyway. If we got a 10% reduction in compile time we'd be over the moon, but our users would barely notice.
To get truly substantial improvements (a factor of 2 or 10) I think we need to do less compiling - hence incremental CI.
Simon
From: Spiwack, Arnaud
Sent: 22 February 2021
Subject: Re: On CI

Let me know if I'm talking nonsense, but I believe that we are building both stages for each architecture and flavour. Do we need to build two stages everywhere? What stops us from building a single stage? And if anything, what can we change to get into a situation where we can? Even better than reusing builds incrementally is not building at all.
From: ghc-devs <ghc-devs-bounces@haskell.org> On Behalf Of John Ericson
Sent: 22 February 2021 05:53
To: ghc-devs <ghc-devs@haskell.org>
Subject: Re: On CI

I'm not opposed to some effort going into this, but I would strongly oppose putting all our effort there. Incremental CI can cut multiple hours to mere minutes, especially with the test suite being embarrassingly parallel. There is simply no way optimizations to the compiler, independent of sharing a cache between CI runs, can get anywhere close to that return on investment.
(FWIW, I'm also skeptical that the people complaining about GHC performance know what's hurting them most. For example, after non-incrementality, the next slowest thing is linking, which is...not done by GHC! But all that is a separate conversation.)
John
On 2/19/21 2:42 PM, Richard Eisenberg wrote:
There are some good ideas here, but I want to throw out another one: put all our effort into reducing compile times. There is a loud plea to do this on Discourse (https://discourse.haskell.org/t/call-for-ideas-forming-a-technical-agenda/1901/24), and it would both solve these CI problems and also help everyone else.
This isn't to say to stop exploring the ideas here. But since time is mostly fixed, tackling compilation times in general may be the best way out of this. Ben's survey of other projects (thanks!) shows that we're way, way behind in how long our CI takes to run.
Richard
On Feb 19, 2021, at 7:20 AM, Sebastian Graf wrote:

Recompilation avoidance
I think in order to cache more in CI, we first have to invest some time in fixing recompilation avoidance in our bootstrapped build system.
I just tested on a hadrian perf ticky build: adding one line of *comment* in the compiler causes

- a (pretty slow, yet negligible) rebuild of the stage1 compiler
- 2 minutes of RTS rebuilding (why do we have to rebuild the RTS? It doesn't depend in any way on the change I made)
- an apparent full rebuild of the libraries
- an apparent full rebuild of the stage2 compiler
That took 17 minutes; a full build takes ~45 minutes. So there definitely is some caching going on, but not nearly as much as there could be.
I know there have been great and boring efforts on compiler determinism in the past, but either it's not good enough or our build system needs fixing.
I think a good first step would be to assert that the hash of the stage1 compiler executable doesn't change if I only change a comment.
I'm aware there probably is stuff going on, like embedding configure dates in interface files and executables, that would need to go, but if possible this would be a huge improvement.
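That assertion is easy to check mechanically. A minimal sketch, where the two build directories are hypothetical names for "before" and "after" builds, and a plain byte-for-byte comparison stands in for hashing:

    import qualified Data.ByteString as BS
    import System.Exit (exitFailure)

    main :: IO ()
    main = do
      -- Hypothetical paths: stage1 GHC built before and after a comment-only change.
      before <- BS.readFile "_build_before/stage1/bin/ghc"
      after  <- BS.readFile "_build_after/stage1/bin/ghc"
      if before == after
        then putStrLn "stage1 unchanged: comment-only edits can reuse caches"
        else do
          putStrLn "stage1 binary differs: nondeterminism or embedded dates at work"
          exitFailure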
On the other hand, we can simply tack a [skip ci] onto the commit message, as I did for https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975. Variants like [skip tests] or [frontend] could help to identify which tests to run by default.
Lean
I had a chat with a colleague about how they do CI for Lean. Apparently, CI turnaround time including tests is generally 25 minutes (~15 minutes for the build) for a complete pipeline, testing 6 different OSes and configurations in parallel: https://github.com/leanprover/lean4/actions/workflows/ci.yml
They utilise ccache to cache the clang-based C++-backend, so that they only have to re-run the front- and middle-end. In effect, they take advantage of the fact that the "function" clang, in contrast to the "function" stage1 compiler, stays the same.
It's hard to achieve that for GHC, where a complete compiler pipeline comes as one big, fused "function": An external tool can never be certain that a change to Parser.y could not affect the CodeGen phase.
Inspired by Lean, the following is a bit abstract and imaginary, but maybe we could make it so that compiler phases "sign" parts of the interface file with the binary hash of the respective subcomponents of the phase?
E.g., if all the object files that influence CodeGen (that will later be linked into the stage1 compiler) result in a hash of 0xdeadbeef before and after the change to Parser.y, we know we can stop recompiling Data.List with the stage1 compiler when we see that the IR passed to CodeGen didn't change, because the last compile did CodeGen with a stage1 compiler with the same hash 0xdeadbeef. The 0xdeadbeef hash is a proxy for saying "the function CodeGen stayed the same", so we can reuse its cached outputs.
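A sketch of what those signatures might look like. All names here (PhaseSig, the phase list, the demo hashes) are invented for illustration; real GHC interface files have no such field:

    -- Hypothetical per-phase signatures stored alongside an interface file.
    data Phase = Parse | Typecheck | Core2Core | CodeGen
      deriving (Eq, Show)

    type Hash = String

    data PhaseSig = PhaseSig
      { implHash  :: Hash  -- hash of the compiler subcomponents for this phase
      , inputHash :: Hash  -- hash of the IR this phase consumed last time
      } deriving (Eq, Show)

    type IfaceSigs = [(Phase, PhaseSig)]

    -- Reuse a phase's cached output iff neither its implementation nor its
    -- input changed. If CodeGen still hashes to 0xdeadbeef after a Parser.y
    -- change and its incoming IR is unchanged, recompilation stops early.
    canReusePhase :: IfaceSigs -> IfaceSigs -> Phase -> Bool
    canReusePhase old new p =
      case (lookup p old, lookup p new) of
        (Just a, Just b) -> a == b
        _                -> False

    main :: IO ()
    main = do
      let old = [(CodeGen, PhaseSig "0xdeadbeef" "0xabc123")]
          new = [(CodeGen, PhaseSig "0xdeadbeef" "0xabc123")]
      print (canReusePhase old new CodeGen)  -- True: skip CodeGen for this module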
Of course, that is utopic without a tool that does the "taint analysis" of which modules in GHC influence CodeGen. Probably just including all the transitive dependencies of GHC.CmmToAsm suffices, but probably that's too crude already. For another example, a change to GHC.Utils.Unique would probably entail a full rebuild of the compiler because it basically affects all compiler phases.
There are probably parallels with recompilation avoidance in a language with staged meta-programming.
On Fri, 19 Feb 2021 at 11:42, Josef Svenningsson via ghc-devs <ghc-devs@haskell.org> wrote:

Doing "optimistic caching" like you suggest sounds very promising. A way to regain more robustness would be as follows.
If the build fails while building the libraries or the stage2 compiler, this might be a false negative due to the optimistic caching. Therefore, evict the "optimistic caches" and restart building the libraries. That way we can validate that the build failure was a true build failure and not just due to the aggressive caching scheme.
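A sketch of that evict-and-retry scheme, with stand-in IO actions in place of real CI steps (the names and messages are made up):

    data Outcome = BuildOk | BuildFailed deriving (Eq, Show)

    -- Stand-in build step; True means "use the optimistic caches".
    buildLibsAndStage2 :: Bool -> IO Outcome
    buildLibsAndStage2 useCache = do
      putStrLn (if useCache then "building with optimistic caches"
                            else "building from scratch")
      pure BuildOk  -- placeholder result

    evictOptimisticCaches :: IO ()
    evictOptimisticCaches = putStrLn "evicting optimistic caches"

    -- A failure that survives a cache-free rebuild is a true failure;
    -- one that disappears was a false negative caused by aggressive reuse.
    validatedBuild :: IO Outcome
    validatedBuild = do
      r <- buildLibsAndStage2 True
      case r of
        BuildOk     -> pure BuildOk
        BuildFailed -> do
          evictOptimisticCaches
          buildLibsAndStage2 False

    main :: IO ()
    main = validatedBuild >>= print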
Just my 2p
Josef

From: John Ericson
Subject: Re: On CI

I agree one should be able to get most of the testing value from stage1. And the tooling team at IOHK has done some work in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3652 to allow a stage 1 compiler to be tested. That's a very important first step!

But TH and GHCi require either iserv (the external interpreter) or, for the internal interpreter, a compiler whose own ABI matches the ABI of the code it produces; and ideally we should test both. I think doing a --freeze1 stage2 build *in addition* to the stage1 build would work in the majority of cases, and that would allow us to incrementally build and test both.

Remember that iserv uses the ghc library and needs to be ABI-compatible with the stage1 compiler that is using it, so for ABI changes (as opposed to mere cross-compilation) it is less of a panacea than it might seem. I opened https://github.com/ghc-proposals/ghc-proposals/issues/162 for an ABI-agnostic interpreter, a third way that would allow stage1 alone to do GHCi and TH unconditionally. This would also allow TH to be safely used in GHC itself; but for the purposes of this discussion, it's nice that it makes testing more reliable without the --freeze1 stage 2 gamble.

Bottom line: yes, building stage 2 from a freshly-built stage 1 will invalidate any cache, so we should avoid that.

John
Participants (8): Ben Gamari, John Ericson, Josef Svenningsson, Moritz Angermann, Richard Eisenberg, Sebastian Graf, Simon Peyton Jones, Arnaud Spiwack