GitLab partial outage - attempting to mitigate

I am seeing a few different problems with the GitLab server right now. I am gonna try to mitigate the issues, so the server might be unavailable for a few short periods.

I eventually resorted to a server reboot, which cleared up all the problems I was seeing. I think we're back in business.
Symptoms were:
* No new data coming into https://grafana.gitlab.haskell.org/d/iiCppweMz/marge-bot?orgId=2&from=now-7d&to=now&refresh=30m
* High-frequency repetition of the system log message "systemd-journald[1622008]: Failed to open runtime journal: Device or resource busy"
* ~50% failure rate connecting to the server with ssh
None of those are happening anymore.
-Bryan
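For anyone wanting to put a number on the ssh symptom above, here is a minimal sketch of how a connection failure rate could be measured. It is not the check that was actually used; it only tests whether a TCP connection to the ssh port opens, not whether login works. The function names, the timeout, and the attempt count are all assumptions made for illustration.

```python
import socket
from typing import Callable


def tcp_connect_ok(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def failure_rate(probe: Callable[[], bool], attempts: int = 20) -> float:
    """Run the probe `attempts` times and return the fraction that failed."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if not probe())
    return failures / attempts
```

Calling `failure_rate(lambda: tcp_connect_ok(host))` with the server's host name would then give a figure comparable to the "~50%" above.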
On Mon, 20 Mar 2023 at 14:41, Bryan Richter wrote:
I am seeing a few different problems with the GitLab server right now. I am gonna try to mitigate the issues, so the server might be unavailable for a few short periods.

Bryan Richter via ghc-devs writes:
I eventually resorted to a server reboot, which cleared up all the problems I was seeing. I think we're back in business.
The root partition was close to running out of disk space yesterday. The problem appears to be that /nix is located on the small system drive. We should really address this although moving /nix is sadly not easy and will certainly require downtime.
Cheers,
- Ben

Isn't it just "move /nix out of the way, bind mount a new one from a
larger drive, use rsync to move the data"?
On Mon, Mar 20, 2023 at 9:25 AM Ben Gamari wrote:
Bryan Richter via ghc-devs writes:
I eventually resorted to a server reboot, which cleared up all the problems I was seeing. I think we're back in business.
The root partition was close to running out of disk space yesterday. The problem appears to be that /nix is located on the small system drive. We should really address this although moving /nix is sadly not easy and will certainly require downtime.
Cheers,
- Ben
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-- brandon s allbery kf8nh allbery.b@gmail.com

Brandon Allbery writes:
Isn't it just "move /nix out of the way, bind mount a new one from a larger drive, use rsync to move the data"?
Something like that, yes [1].
Cheers,
- Ben
[1] https://nixos.wiki/wiki/Storage_optimization#Moving_the_store
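To make the three steps Brandon describes concrete, here is a rough sketch that only builds the command list as a dry run and never executes anything. It is not the procedure that was or will be used on the GitLab server. The target path `/mnt/big/nix` and the backup path `/nix.old` are hypothetical, as are the function names. On a machine whose tools live under /nix, these commands would presumably have to be run from a rescue environment, and a bind mount made this way would also need a persistent mount entry to survive a reboot.

```python
import shlex
from typing import List


def plan_store_move(store: str = "/nix",
                    new_location: str = "/mnt/big/nix",  # hypothetical path
                    backup: str = "/nix.old") -> List[List[str]]:
    """Build (but do not run) commands for relocating the store."""
    return [
        # 1. Move the existing store out of the way.
        ["mv", store, backup],
        # 2. Create the new location and an empty mount point.
        ["mkdir", "-p", new_location, store],
        # 3. Bind mount the directory on the larger drive over the old path.
        ["mount", "--bind", new_location, store],
        # 4. Copy the data across, preserving hard links, ACLs and xattrs.
        ["rsync", "-aHAX", backup + "/", store + "/"],
    ]


def render(plan: List[List[str]]) -> str:
    """Render the plan as shell-quoted lines for review."""
    return "\n".join(shlex.join(cmd) for cmd in plan)
```

Printing `render(plan_store_move())` gives a list that can be reviewed before anyone touches the server, which matters here because every step requires downtime.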

Ben Gamari
In the meantime, I have significantly reduced the number of snapshots retained in the root dataset. This brought disk usage down from 70% to 12%, which should keep us afloat for a long while.
Cheers,
- Ben
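As a small illustration of keeping an eye on the figures quoted above, here is a sketch of a disk usage check. It is an assumption about how one might monitor this, not something set up on the server. The 70% threshold is borrowed from the number above purely as an example, and the percentage reported by `shutil.disk_usage` may not match what filesystem-specific tools show.

```python
import shutil


def percent_used(path: str = "/") -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def needs_attention(percent: float, threshold: float = 70.0) -> bool:
    """Return True once usage reaches the (hypothetical) alert threshold."""
    return percent >= threshold
```

Run periodically, `needs_attention(percent_used("/"))` would flag the root partition before it gets close to full again.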
participants (3)
- Ben Gamari
- Brandon Allbery
- Bryan Richter