I am not advocating dropping perf tests for merge requests; I just want them not to be fatal for marge batches. Yes, this means that a bunch of unrelated merge requests could each pass the perf checks individually while the aggregate fails them, and then the next MR against the merged aggregate will start failing. Even that is a pretty bad situation, imo.
I honestly don't have a good answer; I just see marge work on batches, over and over and over again, only to fail. Eventually marge should figure out a subset of the merges that fits into the perf window, but that might take 10 tries, so up to ~30+ hours, which means no merge request would land in GHC for 30 hours. I find that rather unacceptable.
I think we need better visualisation of perf regressions that happen on master. Ben has some WIP for this, and I think John said there might be some way to add a nice (maybe reflex) UI to it. If we can see regressions on master easily, and go from "oh, at this point in time GHC got worse" to "this is the commit", we might be able to figure it out.
But what do we expect of patch authors? Right now, if five people write patches for GHC, and each of them eventually manages to get their MR green after a long review, they finally see it assigned to marge, and then it starts failing? Each patch on its own was fine, but the aggregate with other people's code leads to regressions? So we now expect all the patch authors together to try to figure out what happened? Figuring out why something regressed is hard enough, and we have only very few people who are actually capable of debugging this. Thus I believe it would end up with Ben, Andreas, Matthew, Simon, ... or someone else from GHC HQ having to figure out why it regressed anyway, be it in the review stage, while dissecting a marge aggregate, or on master.
Thus I believe in most cases we'd have to look at the regressions anyway; right now we just make working on GHC a rather depressing job in a convoluted way. Raising the barrier to entry by also requiring everyone to have absolutely stellar perf-debugging skills is asking quite a lot.
There is also the question of whether our synthetic benchmarks actually measure real-world performance. Do regressions in the micro benchmarks translate into the same regressions when building, say, aeson, vector, or Cabal? The latter is what most practitioners care about, far more than the micro benchmarks.
Again, I'm absolutely not in favour of GHC regressing; it's slow enough as it is. I just think CI should be assisting us, not holding development back.
Cheers,
Moritz