Consistent CI failure in job nightly-i386-linux-deb9-validate

Hi all,
For the past week or so, nightly-i386-linux-deb9-validate has been failing consistently.
They show up on the failure dashboard because the logs contain the phrase "Cannot allocate memory".
I haven't looked yet to see if they always fail in the same place, but I'll do that soon. The first example I looked at, however, has the line "xz: (stdin): Cannot allocate memory", so it's not GHC (alone) causing the problem.
As a consequence of showing up on the dashboard, the jobs get restarted. Since they fail consistently, they keep getting restarted. Since the jobs keep getting restarted, the pipelines stay alive. When I checked just now, there were 8 nightly runs still running. :)
Thus I'm going to cancel the still-running nightly-i386-linux-deb9-validate jobs and let the pipelines die in peace. You can still find all examples of failed jobs on the dashboard:
https://grafana.gitlab.haskell.org/d/167r9v6nk/ci-spurious-failures?orgId=2&from=now-90d&to=now&refresh=5m&var-types=cannot_allocate
To prevent future problems, it would be good if someone could help me look into this. Otherwise I'll just disable the job. :(
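One way to check whether the jobs "always fail in the same place" is to tally which program printed the "Cannot allocate memory" line in each log. The following is only a minimal sketch: the function name, the regular expression, and the sample log are made up for illustration (only the xz line is taken from the report above), and it is not how the dashboard itself classifies failures.

```python
import re
from collections import Counter

PHRASE = "Cannot allocate memory"
# Matches a leading "program:" prefix, as in "xz: (stdin): Cannot allocate memory".
PROGRAM_PREFIX = re.compile(r"^\s*(?P<prog>[\w.+-]+):")


def tally_allocation_failures(log_text: str) -> Counter:
    """Count lines containing PHRASE, grouped by the program that printed them."""
    counts = Counter()
    for line in log_text.splitlines():
        if PHRASE not in line:
            continue
        match = PROGRAM_PREFIX.match(line)
        counts[match.group("prog") if match else "unknown"] += 1
    return counts


# Hypothetical log excerpt; only the xz line comes from the thread.
sample_log = "\n".join([
    "(hypothetical) packaging step started",
    "xz: (stdin): Cannot allocate memory",
    "Cannot allocate memory",
])

if __name__ == "__main__":
    for program, count in tally_allocation_failures(sample_log).items():
        print(f"{program}: {count}")
```

Running this over a sample of failed job logs would show whether xz accounts for all of the matches or only some of them.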

Hi Bryan,
This may be unintended fallout from !8940. Would you try starting an
i386 pipeline with it reverted to see if that solves the issue? If it
does, we should revert or fix it in master.
On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs

Yep, it seems to mostly be xz that is running out of memory. (All recent
builds that I sampled, but not all builds through all time.) Thanks for
pointing it out!
I can revert the change.
On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao

Sure, in that case please revert it. Apologies for the impact. I'm
still a bit curious, though: the i386 job did pass in the original MR.
On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter

Aha: while i386-linux-deb9-validate sets no extra XZ options,
*nightly*-i386-linux-deb9-validate (the failing job) sets "XZ_OPT = -9".
A revert would fix the problem, but presumably so would tweaking that
option. Does anyone have information that would lead to a better decision
here?
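To get a feel for why -9 combined with parallel compression could exhaust memory in a 32-bit job, here is a rough back-of-the-envelope sketch. The per-preset figures are the approximate single-threaded compressor memory values from the table in the xz(1) man page; the thread counts, the 4 GiB address-space ceiling, and the "memory scales with threads" model are assumptions for illustration, not measurements from the CI runners. Real multi-threaded usage is higher than this lower bound because of extra input and output buffers.

```python
# Approximate compressor memory per preset in MiB (xz(1) man page, single-threaded).
COMP_MEM_MIB = {0: 3, 1: 9, 2: 17, 3: 32, 4: 48, 5: 94, 6: 94, 7: 186, 8: 370, 9: 674}

# Assumption: a 32-bit process can address at most 4 GiB, usually less in practice.
ADDRESS_SPACE_MIB_32BIT = 4096


def estimated_compressor_mib(preset: int, threads: int) -> int:
    """Rough lower bound: every worker thread needs its own encoder state."""
    return COMP_MEM_MIB[preset] * threads


def fits_in_32bit(preset: int, threads: int) -> bool:
    """True if the lower-bound estimate stays below the assumed 32-bit ceiling."""
    return estimated_compressor_mib(preset, threads) < ADDRESS_SPACE_MIB_32BIT


if __name__ == "__main__":
    # Hypothetical thread counts; the thread does not say how many cores the runners have.
    for preset in (6, 9):
        for threads in (1, 4, 8):
            mib = estimated_compressor_mib(preset, threads)
            verdict = "fits" if fits_in_32bit(preset, threads) else "does not fit"
            print(f"preset -{preset}, {threads} thread(s): ~{mib} MiB, {verdict}")
```

Under these assumptions the default preset stays small at any of the thread counts shown, while -9 crosses the 32-bit ceiling once enough threads are in play, which would be consistent with only the nightly job failing.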
On Wed, Sep 28, 2022 at 2:02 PM Cheng Shao

I believe we can either modify ci.sh to disable parallel compression
for i386, or modify .gitlab/gen_ci.hs and .gitlab/jobs.yaml to disable
XZ_OPT=-9 for i386.
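The two alternatives can be sketched as two small policies that compute the xz options for a job. This is illustrative only: the function names and architecture strings are hypothetical, and it assumes the parallel compression in question is requested with xz's -T0 (use as many threads as there are cores), which the thread does not actually state. The real change would live in ci.sh or in .gitlab/gen_ci.hs, not in Python.

```python
def xz_opts_single_threaded_on_i386(arch: str, nightly: bool) -> str:
    """First alternative: keep -9 for nightlies, but disable parallel xz on i386."""
    level = ["-9"] if nightly else []
    # -T1 forces a single thread; -T0 lets xz pick one thread per core.
    threads = ["-T1"] if arch == "i386" else ["-T0"]
    return " ".join(level + threads)


def xz_opts_no_max_preset_on_i386(arch: str, nightly: bool) -> str:
    """Second alternative: keep parallel xz everywhere, but drop -9 on i386."""
    level = ["-9"] if nightly and arch != "i386" else []
    return " ".join(level + ["-T0"])


if __name__ == "__main__":
    for arch in ("i386", "x86_64"):
        for nightly in (False, True):
            first = xz_opts_single_threaded_on_i386(arch, nightly)
            second = xz_opts_no_max_preset_on_i386(arch, nightly)
            print(f"{arch:7} nightly={nightly!s:5} first: {first!r:10} second: {second!r}")
```

Either policy changes only the i386 nightly job; all other jobs keep the options they have today under the stated assumption.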
On Wed, Sep 28, 2022 at 1:21 PM Bryan Richter

Matthew pointed out that the build system already parallelizes jobs, so
it's risky to force parallelization of any individual job. That means I
should just revert.
On Wed, Sep 28, 2022 at 2:38 PM Cheng Shao

When hadrian builds the binary-dist job, invoking tar and xz is
already the last step and there'll be no other ongoing jobs. But I do
agree with reverting; this minor optimization I proposed has caused
more trouble than it's worth :/
On Thu, Sep 29, 2022 at 9:25 AM Bryan Richter
participants (2): Bryan Richter, Cheng Shao