Hello GHC devs,

we're seeing a lot of 502s at the moment: we're getting twice the normal rate of requests. I am investigating what is going on and trying to fix it.

Best
Magnus
Hello Devs,

just a short update and some background on the GitLab issues today.

Over the week, and in particular today, the number of requests to our GitLab instance has grown steadily; it is now receiving more than double the number of requests it received last week. We have tried to address this by adjusting various GitLab/server settings (rate limits, checking for spammy IPs, the works), but with little success so far, since the requests are spread across a wide range of origins.

Given that these are not garbage requests but hits on valid endpoints, we assume it is "just" crawlers (likely driven by the AI hype) indexing our instance. While this isn't a DoS attack in the strict sense, the effect is similar, and it is unclear when we will make progress on improving the situation.

Magnus is working on resolving this and we will keep you posted.

Cheers,
Andreas

On 20/04/2026 10:52, Magnus Viernickel via ghc-devs wrote:
_______________________________________________
ghc-devs mailing list -- ghc-devs@haskell.org
To unsubscribe send an email to ghc-devs-leave@haskell.org
Hello GHC devs,

The GitLab should be fine again; it looks like the spammers have left on their own. Here is my analysis of the problem. If you have an idea of what to do, or are against my proposed "solution", please say so.

1. We had a much higher load than usual, leading to congestion on the workers.
2. No matter what, we could not have added new workers: either they would be reaped by GitLab's normal worker reaping, or they would be OOM-killed once the application as a whole exceeded the available RAM.
3. GitLab's own reverse proxy (yes, we have two) talks to the web workers via a Unix domain socket.
4. Past a certain load, the workers could not take requests off the queue quickly enough.
5. Instead of backing off, the reverse proxy kept hammering the queue.
6. While the queue didn't fill up, you would just see a lot of client cancels after a couple of seconds.
7. When the queue *did* fill up, GitLab would start throwing "service unavailable" errors *and immediately return a 502*.
8. At that point a vicious cycle begins: the clients (the spammers) would *immediately* resubmit their requests instead of timing out, which kept the queue congested. The only fix is to restart GitLab, which clears the queue until it becomes congested again, after which it stays congested. (This is why I had a couple of situations where I thought "oh, I fixed it", but half an hour and one traffic spike later we were back to 502s.)

The conclusion here is much sadder, though:

1. I analysed the origins and locked out a couple of bad /24 subnets, but they only made up a tiny fraction of the traffic. Overall, I could not identify any "big bad subnets".
2. The delegated blocks came from random cloud providers (as random as cloud providers can be, given there are only three of them; we could e.g. see AWS). This is expected: bad actors of course wouldn't use their own infrastructure, which would be easy to ban wholesale.
3. This is why IP-address-based blocking (i.e. rate limiting) did not work. We do this at the GitLab level (per IP) and at the reverse proxy level (per /24) - that's our nginx, not GitLab's own reverse proxy.
4. The only thing that would really help is more workers, which also isn't a great fix: the spammers could just hammer us with more requests, *and* the workers take up an absurd amount of RAM (currently about 50 GB; every worker is another 1-2 GB, so conservatively each extra worker costs us 2 GB of RAM).
5. Or we reintroduce Anubis, or some tarpit on the expensive routes. This may be one of the easiest fixes. We would of course only apply it to unauthenticated requests and to expensive routes, the worst offender being GraphQL requests.

Best
Magnus

On 4/20/26 15:15, Andreas Klebinger via ghc-devs wrote:
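[Editor's note: the feedback loop from points 4-8 above can be sketched with a toy queue model. All numbers below are illustrative, not measurements from our instance; the point is only that immediate resubmission of 502'd requests keeps a bounded queue saturated far worse than clients that give up.]

```python
def simulate(arrival, service, cap, ticks, immediate_retry):
    """Toy model of the congestion collapse: workers drain a bounded
    queue; bounced requests either vanish or come straight back.
    Returns the total number of 502s served over the simulation."""
    qlen = 0        # requests currently queued
    retrying = 0    # clients whose last request got a 502 and who resubmit
    rejected = 0    # total 502s served
    for _ in range(ticks):
        qlen -= min(service, qlen)        # workers drain the queue
        incoming = arrival + retrying     # fresh load + immediate resubmits
        retrying = 0
        accepted = min(incoming, cap - qlen)
        qlen += accepted
        bounced = incoming - accepted     # these requests got a 502
        rejected += bounced
        if immediate_retry:
            retrying = bounced            # they come straight back: the vicious cycle
    return rejected

# With polite clients, overload costs (arrival - service) 502s per tick;
# with immediate retries, the retry backlog grows every tick and the
# 502 count explodes, even though the fresh arrival rate is identical.
polite = simulate(12, 10, 50, 200, immediate_retry=False)
greedy = simulate(12, 10, 50, 200, immediate_retry=True)
```

This also shows why a restart only helps temporarily: it resets `qlen` to zero, but as long as the retrying clients are still out there, the queue refills and re-congests.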
I'm still getting 20-second load times for individual pages (e.g. tickets) on gitlab.haskell.org. Sometimes they error out with "something went wrong, try again".

Are we stuck with what to do? It's a bit frustrating.

Simon

On Mon, 20 Apr 2026 at 14:15, Andreas Klebinger via ghc-devs <ghc-devs@haskell.org> wrote:
Hi Simon,

since 12:00 we're getting another wave of spam. It's really hard to do anything useful here; I will keep trying more things, but they will degrade the UX for non-logged-in users more and more. I'm prioritising the UX for logged-in users, though.

I'm unsure what to do to actually "solve" this issue: we're not seeing high traffic from individual IPs, we can't require login for everything that people legitimately want to do on the GitLab, and GitLab itself is buggy in that it doesn't degrade well under load (now that I have fixed most of the other performance bottlenecks).

I agree it's really frustrating, and it's eating away at the time budget I have for working on actual GHC issues.

Best
Magnus

On 4/23/26 12:45, Simon Peyton Jones via ghc-devs wrote:
I'm not sure if this idea is any good, but: would it be possible and desirable to have "public access" to the GHC GitLab go to a mirror, and require authentication for all access to the "live" instance that actually serves developers?

(This would require the live service to sit behind a different domain name, and would be a little like hand-rolling a special-purpose CDN.)

Tom

On Thu, Apr 23, 2026 at 01:48:32PM +0200, Magnus Viernickel via ghc-devs wrote:
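[Editor's note: one possible shape of this split, sketched as an nginx config. Everything here is hypothetical - the upstream addresses, the use of the `_gitlab_session` cookie as an "is logged in" signal, and the single-domain routing are all illustrative, and this is not how gitlab.haskell.org is configured.]

```nginx
# Route requests with a GitLab session cookie to the live instance,
# everything else to a read-only mirror. Addresses are placeholders.
map $http_cookie $gl_backend {
    default            mirror;
    "~_gitlab_session" live;
}

upstream mirror { server 10.0.0.2:80; }   # hypothetical mirror host
upstream live   { server 10.0.0.3:80; }   # hypothetical live host

server {
    listen 443 ssl;
    server_name gitlab.haskell.org;

    location / {
        proxy_pass http://$gl_backend;
    }
}
```

Cookie presence is only a rough proxy for "is a developer", and keeping the mirror fresh is its own problem - which is roughly the objection raised in the reply below.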
Hi Tom,

First of all, thanks for trying to help! Here are some details; feel free to skip them.

There are several problems with the idea:

1. GitLab is not very cacheable. Requests go to e.g. /random-user-fork/ghc/commits/random-commit-hash, and then there are exponentially many diffs between those commits - and that's just one example. Even if we could enumerate them all, I don't think any cache would want to keep them warm for us. Additionally, most routes are not static.
2. We want users to be able to use the instance without logging in. That includes issues, MRs, etc.
3. It is a large amount of work that doesn't really fix the underlying issues.

The actual issues are:

- we can't reliably distinguish users from spammers,
- we have a lot of users who are not GHC developers,
- GitLab is not as performant as we'd like (and we're running it on a VERY beefy machine).

There are a few things within reach, though:

- re-enabling Anubis for unauthenticated usage (it was disabled for good reasons, though),
- tarpitting (I think this actually works well, but people had concerns about whether it would be too "mean"),
- moving some other services off the machine.

The rate-limiting route is exhausted, and so is the machine-performance route. Adding yet more RAM seems like a waste of resources at this point: every 2 GB of RAM buys us one more Puma worker, and we're already running 36 of them. The machine has an absurd amount of RAM either way - certainly enough to serve the ~200 actual people who are, at worst, using the instance at the same time.

Best
Magnus
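[Editor's note: the tarpit option mentioned above could look roughly like this in nginx, using its standard `limit_req` machinery to delay and then reject excess unauthenticated traffic on an expensive route. This is a hedged sketch, not the actual gitlab.haskell.org configuration; the rate, burst, upstream address, and the `_gitlab_session` cookie check are all illustrative.]

```nginx
# Unauthenticated clients are keyed by IP; requests carrying a GitLab
# session cookie map to an empty key, which nginx exempts from limiting.
map $http_cookie $gl_tarpit_key {
    default            $binary_remote_addr;
    "~_gitlab_session" "";
}

limit_req_zone $gl_tarpit_key zone=graphql:10m rate=10r/m;

upstream gitlab { server 127.0.0.1:8181; }  # placeholder upstream

server {
    listen 443 ssl;
    server_name gitlab.haskell.org;

    location /api/graphql {
        # Excess requests within the burst are delayed to conform to the
        # rate (the "tarpit"); anything beyond the burst gets a 429.
        limit_req zone=graphql burst=20 delay=2;
        limit_req_status 429;
        proxy_pass http://gitlab;
    }
}
```

The delay is what makes this a tarpit rather than a plain rate limit: impatient crawlers sit in the queue instead of immediately retrying, while logged-in users are never limited at all.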
participants (4)

- Andreas Klebinger
- Magnus Viernickel
- Simon Peyton Jones
- Tom Ellis