Ansible Community Status page & Notifications

During Ansible Contributor Summit 2026 we talked about the recent Galaxy downtime and how we can do a better job at communicating.

Most of us have been sysadmins at some point, so folks recognise that systems do fail. Even so, Red Hat needs to communicate earlier and more consistently when issues happen, post updates during the incident and, as we are all geeks, share some details from the root cause analysis.

There was some good discussion about getting the signal-to-noise ratio right. The initial ideas were:

  • New Forum tag that people can subscribe to for email notifications, possibly critical-service-status
  • New Status page for Ansible Community

Status page

Some of the folks in the room had used Gatus or Uptime Kuma.

Reporting status

We talked about how we can report status. Some ideas included:

  • Dedicated status email list that people can subscribe to
  • Notifications in Matrix (IRC) and the Forum
  • Ansible Forum Banner
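
To keep the signal-to-noise ratio right across these channels, one option is to notify only when a service changes state rather than on every check. The following is a minimal sketch of that idea, not a description of any existing tool: the function name `status_changes` and the `"up"`/`"down"` state strings are assumptions made for illustration.

```python
def status_changes(previous, current):
    """Return one message per service whose state differs from the last check.

    `previous` and `current` map a service name to a state string
    (hypothetical values such as "up" or "down").
    """
    messages = []
    for service in sorted(current):
        old = previous.get(service, "unknown")
        new = current[service]
        if old != new:
            # Only transitions produce a notification; steady states stay quiet.
            messages.append(f"{service}: {old} -> {new}")
    return messages
```

The resulting messages could then be sent to email, Matrix or the Forum by whatever sender each channel needs.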

Monitoring

Help needed

  • What would you like to see?
  • Have you set up a similar system? What worked?
  • Which of these systems can be mostly (or fully) configured via a Git repo?
  • Which allow test/development branches, so that the community can easily test updated monitors & reporting?

Gatus could be an awesome idea. It’s configuration driven via a YAML file: GitHub - TwiN/gatus: Automated developer-oriented status page with alerting and incident support

Maybe an idea could be to have the configuration in a community repository on GitHub? If any new community endpoints need monitoring then this could be done here? Community members could raise a PR to do this and a GitHub Action could run to update the config on the host running Gatus.
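
As a rough sketch of what such a GitHub Action could run on each PR before deploying, the check below validates a list of endpoint entries. It assumes the config has already been parsed into a Python dict (Gatus itself reads YAML, and loading YAML needs a third-party library that is left out here). The keys `endpoints`, `name`, `url` and `conditions` follow Gatus's endpoint format; the http(s)-only rule, the function name and the placeholder URL are assumptions for illustration.

```python
from urllib.parse import urlparse

REQUIRED_KEYS = ("name", "url", "conditions")


def validate_endpoints(config):
    """Return a list of problems found; an empty list means the config passed."""
    problems = []
    seen_names = set()
    for index, endpoint in enumerate(config.get("endpoints", [])):
        label = endpoint.get("name") or f"endpoint #{index}"
        for key in REQUIRED_KEYS:
            if not endpoint.get(key):
                problems.append(f"{label}: missing '{key}'")
        url = endpoint.get("url", "")
        # Hypothetical project policy: only allow http(s) monitors in PRs.
        if url and urlparse(url).scheme not in ("http", "https"):
            problems.append(f"{label}: url must be http(s)")
        if label in seen_names:
            problems.append(f"{label}: duplicate name")
        seen_names.add(label)
    return problems


# Placeholder data: the second entry is deliberately incomplete.
EXAMPLE_CONFIG = {
    "endpoints": [
        {
            "name": "galaxy",
            "url": "https://galaxy.example.invalid",
            "interval": "5m",
            "conditions": ["[STATUS] == 200"],
        },
        {"name": "forum", "url": "", "conditions": []},
    ]
}
```

A CI job could fail the PR whenever `validate_endpoints` returns a non-empty list, so a broken entry never reaches the host running Gatus.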

Some additional thoughts we didn’t touch on:

We discussed at length a workflow of status page → forum → community member. I think this is a good approach, but there should also be other ways to consume the status page in case parts of the infrastructure other than Galaxy are unavailable.

Therefore:

  • Include other parts of community infrastructure on the status page
    • forum
    • docs
    • matrix
    • probably other things that don’t come to my mind
  • have additional communication channels (e.g. directly subscribe to mails from the status page)
  • SEO so that people find the status page in case of a larger blackout of default communication channels
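
To illustrate covering several services rather than only Galaxy, here is a minimal sketch of one check pass over a set of endpoints. It is not how Gatus or Uptime Kuma work internally; the function names, the `"up"`/`"down"` strings and any URLs passed in are assumptions. The probe is injectable so the logic can be tried without network access.

```python
import http.client
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def check_services(services, probe=http_probe):
    """Map each service name to "up" or "down" using the given probe."""
    return {name: ("up" if probe(url) else "down") for name, url in services.items()}
```

Calling `check_services` with a dict of forum, docs and Matrix URLs would give one combined view that a status page or a mail subscription could be built on.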

I’m happy to help on this topic or be part of a beta users group :slight_smile:

Excellent suggestion (as someone who also notices galaxy being down early).

Please also consider:

  • adding important/crucial “external” dependencies such as (for example): Community managed Ansible repositories · GitHub
  • hosting the status page on a different system from the one that hosts the current forum/docs sites
  • linking to the eventually chosen status page everywhere, so everyone has the same single pane of glass
  • any FOSS technology is fine (we care about the result, a status page), because I assume every techy will have different (tool) preferences
  • having a status page and having ‘more information’ about it (including post-mortems) on the forum, is fine (imho)

Question:

  • Can or should we assume that, if something is down, the people who can act on it have already been informed?

I think the forum has evolved into quite an important part of the Ansible Community, so a status page should cover it too, because it can also be down. That rules out relying on a forum banner alone: if we also want to cover the status of the forum, a separate and dedicated system is needed.

Adding a forum banner as well when other important systems are down doesn’t hurt and can even be helpful. The banner could link to the status page when anything but the forum is down, which would make people aware of the status page and advertise it. And if the forum is down, at least some people will hopefully remember the status page and have a look there.
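
The banner rule described above can be sketched as a small function. This is only an illustration under assumed names: `forum_banner`, the `"up"`/`"down"` state strings and the status page URL are all hypothetical, and actually publishing the text to the forum is left out.

```python
def forum_banner(statuses, status_page_url):
    """Return banner text for the forum, or None when no banner is needed.

    The forum itself is skipped: if it is down, nobody can see a banner on it.
    """
    down = sorted(
        name
        for name, state in statuses.items()
        if state != "up" and name != "forum"
    )
    if not down:
        return None
    return f"Service issues: {', '.join(down)}. See {status_page_url} for updates."
```

With Galaxy down and everything else up, this yields a banner that names Galaxy and links to the status page; with only the forum down, it returns `None`.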