Ansible Community Status page & Notifications

During Ansible Contributor Summit 2026 we talked about the recent Galaxy downtime and how we can do a better job at communicating.

Most of us have been sysadmins at some point, so folks recognise that systems do fail. Even so, Red Hat needs to communicate earlier and more consistently when issues do happen, post updates during the incident and, as we are all geeks, share some details of the root cause analysis.

There was some good discussion about getting the signal-to-noise ratio right. The initial ideas were:

  • New Forum tag that people can subscribe to for email notifications, possibly critical-service-status
  • New Status page for Ansible Community

Status page

Some of the folks in the room had used Gatus or Uptime Kuma.

Reporting status

We talked about how we can report status. Some ideas included:

  • Dedicated status email that people can subscribe to
  • Notifications in Matrix (IRC), Forum
  • Ansible Forum Banner

Monitoring

Help needed

  • What would you like to see?
  • Have you set up a similar system? What worked?
  • Which of these systems can be mostly (or fully) configured via a Git repo?
  • Which allow test/development branches, so the community can easily test updated monitors & reporting?
4 Likes

Gatus could be an awesome idea. It’s configuration driven via a YAML file: GitHub - TwiN/gatus: Automated developer-oriented status page with alerting and incident support

Maybe an idea could be to have the configuration in a community repository on GitHub? If any new community endpoints need monitoring then this could be done here? Community members could raise a PR to do this and a GitHub Action could run to update the config on the host running Gatus.
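To make the PR workflow a bit more concrete, here is a minimal sketch of a check a GitHub Action could run on a proposed config change before it reaches the host. It assumes the YAML has already been parsed into a list of dicts. The field names (`name`, `url`, `conditions`) and the `[STATUS] == 200` condition follow my understanding of Gatus endpoint entries and should be checked against the Gatus README. The function `validate_endpoints` and the example entries are hypothetical.

```python
from urllib.parse import urlparse


def validate_endpoints(endpoints):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    seen_names = set()
    for index, endpoint in enumerate(endpoints):
        name = endpoint.get("name")
        label = name or f"entry #{index}"

        # Every monitor needs a unique, non-empty name.
        if not name:
            problems.append(f"{label}: missing 'name'")
        elif name in seen_names:
            problems.append(f"{label}: duplicate name")
        else:
            seen_names.add(name)

        # Require an absolute https URL (an assumed policy, not a Gatus rule).
        parsed = urlparse(endpoint.get("url", ""))
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"{label}: 'url' must be an absolute https URL")

        # A monitor without conditions would never report a failure.
        if not endpoint.get("conditions"):
            problems.append(f"{label}: needs at least one condition")
    return problems


# Illustrative entries only; the second one is deliberately broken.
example = [
    {"name": "galaxy", "url": "https://galaxy.ansible.com", "conditions": ["[STATUS] == 200"]},
    {"name": "galaxy", "url": "forum.ansible.com", "conditions": []},
]

for problem in validate_endpoints(example):
    print(problem)
```

A CI job could fail the PR whenever the returned list is non-empty, so a broken monitor never gets deployed.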

1 Like

Some additional thoughts we didn’t touch on:

We heavily discussed a workflow of status page → forum → community member. I think this is a good approach, but there should also be other ways to consume the status page in case parts of the infrastructure other than Galaxy are unavailable.

Therefore:

  • Include other parts of community infrastructure on the status page
    • forum
    • docs
    • matrix
    • probably other things that don’t come to my mind
  • have additional communication channels (e.g. directly subscribe to mails from the status page)
  • SEO so that people find the status page in case of a larger blackout of default communication channels

I’m happy to help on this topic or be part of a beta users group :slight_smile:

2 Likes

Excellent suggestion (as someone who also notices galaxy being down early).

Please also consider:

  • adding important/crucial “external” dependencies such as (for example): Community managed Ansible repositories · GitHub
  • hosting the status page on a different system from the one where the current forum/docs sites are hosted
  • linking to the eventually chosen status page everywhere, so everyone has the same single pane of glass
  • any FOSS technology is fine (we care about the result, a status page), because I assume every techy will have different (tool) preferences
  • having a status page and having ‘more information’ about it (including post-mortems) on the forum, is fine (imho)

Question:

  • Can or should we assume that if something is down, the people who can act on it have already been informed?
1 Like

I think that the forum evolved into quite an important part of the Ansible Community. So a status page should cover it, too, because it also can be down. But this would somehow rule out a forum banner. If we also want to cover the status of the forum, a separate and dedicated system would be needed.

Adding a forum banner as well when other important systems are down doesn’t hurt and can even be helpful. This banner could link to the status page when anything but the forum is down, and in this way make people aware of it and advertise it. And if the forum is down, at least some will hopefully remember the status page and have a look there.
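The banner rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the status data is assumed to be available as a simple name-to-boolean mapping, `forum_banner` is a hypothetical helper, and `STATUS_PAGE_URL` is a placeholder because no status page address has been chosen yet.

```python
# Placeholder address; the real status page URL is still undecided.
STATUS_PAGE_URL = "https://status.example.org"


def forum_banner(service_up, forum="forum"):
    """Return banner text to show on the forum, or None if no banner applies.

    service_up maps a service name to True (up) or False (down).
    """
    # If the forum itself is down, a forum banner cannot reach anyone.
    if not service_up.get(forum, True):
        return None

    down = sorted(name for name, up in service_up.items() if not up)
    if not down:
        return None

    # Something other than the forum is down: point people at the status page.
    return f"Service disruption: {', '.join(down)}. Details: {STATUS_PAGE_URL}"


print(forum_banner({"forum": True, "galaxy": False, "docs": True}))
```

Whether the banner is then set automatically or by hand is a separate decision; the sketch only covers when a banner makes sense.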

3 Likes

@mariolenz You are correct. I’m thinking of a new status.ansible.com which doesn’t use the same hosting/infrastructure as The Forum (or any other Community infrastructure). If there is a service degradation, we would then (automatically or manually) add a Forum Banner.

1 Like

I’d suggest using a different TLD from ansible.com, in case that is down. Perhaps Red Hat has some spare ones that would be suitable? If not, ansible.website is available (£2.99 at Gandi.net for the first year), so status.ansible.website might work? It would also be best to use a different Registrar and DNS from ansible.com…
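As a rough illustration of that independence idea, the sketch below flags monitored hosts that share a parent domain with a candidate status page host. It is deliberately naive: it compares only the last two DNS labels, so it does not handle multi-label suffixes such as co.uk, and it says nothing about registrar or DNS provider. The helper names and the host list are hypothetical.

```python
def parent_domain(hostname):
    """Naive parent domain: the last two DNS labels of the hostname."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def shared_domain_hosts(status_host, monitored_hosts):
    """Return the monitored hosts that share a parent domain with the status page."""
    status_parent = parent_domain(status_host)
    return sorted(h for h in monitored_hosts if parent_domain(h) == status_parent)


# Illustrative host list only.
monitored = ["galaxy.ansible.com", "forum.ansible.com", "pypi.org"]

print(shared_domain_hosts("status.ansible.com", monitored))
print(shared_domain_hosts("status.ansible.website", monitored))
```

An empty result only means the names differ; checking that the registrar and DNS hosting are also separate would still need to be done by hand.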

3 Likes

Sites supporting development, CI and releases:

  • zuul
  • quay.io (or at least a link to their status page, if one exists)
  • github (link to their status page)
  • pypi.org
  • what else?

(kick the ball to the other side of the court)

3 Likes

I think linking to the status pages of related projects would be awesome.

I don’t know about the Zuul instance, but the other projects have their own status pages:

Other things that could be monitored (or linked to):

  • AZP (CI for ansible-core and some collections)
  • Ansible’s internal testing infrastructure that is used by ansible-test when handling VMs

AZP likely has its own status page as well, and the other might be too specialized (not sure how many folks outside Red Hat have access to it anyway - also I don’t remember it having problems so far, as opposed to many other things :wink: ).

3 Likes

That’s a good idea @chris

Would need to check with the team, but I think we have control of the ansible.community domain. Could be a good use case for that domain?

status.ansible.community :slightly_smiling_face:

2 Likes

If we decided on Gatus then we could have buttons on the top of the status page linking to other status pages you mentioned here.

Here is a quick mockup of what it could look like with Gatus and a couple of endpoints:

Dark Mode

Light Mode

There isn’t a Discourse alerting provider right now but we could create and contribute one to the project :slightly_smiling_face:

3 Likes

I’ll need to double check, though from memory ansible.com DNS is managed by Cloudflare. If that’s down, it’s fairly likely a large chunk of the Internet is as well.

Red Hat will happily host an EC2 instance for this, and once the prototype is working, we could look at high(er) availability with a shared DB, etc. if we feel that’s useful.

That’s the power of open source right there.

Nice, thanks for that and confirming we can link to other status pages.

1 Like

We should also add monitoring for the Forum badge granter (used at in-person events and meetups)

2 Likes

That reminds me that there’s also ansibot (ansible-core) and ansibullbot (collections) that could be monitored.

2 Likes

Also open_polls and the GitHub stats (once that’s back online)

FYI there’s a relevant issue in Galaxy here: Galaxy Status Page? · Issue #3508 · ansible/galaxy · GitHub. I’ve commented in that thread with a link to the forum. Maybe we’ll get a few more folks to join the discussion.

3 Likes

It’s pretty simplistic, but just so you all are aware there is an existing community provided status tracker here:

2 Likes