AWX takes 15+ minutes to reconnect after DB failover (multi-region PG cluster with pgpool)

fareedsk330 · September 15, 2025, 4:04pm

Setup:

AWX is deployed in Kubernetes using awx-operator, with one deployment in Region1 and one in Region2.
Both AWX deployments use a shared PostgreSQL cluster deployed across both regions.
Database cluster details:
- DB1 (Region1)
- DB2 (Region1)
- DB3 (Region2)
- PGAF (Postgres Auto Failover) in Region2
DBs are configured so that:
- Only one DB is Primary (read-write) at any time.
- The other two are Standby (read-only) and in sync.
- A cron job runs every minute on all DB nodes to promote a new primary during failover/switchover.
AWX connects to the database through pgpool on port 51902.

Observed behavior:

During failover from DB2 → DB1:
- New DB becomes Primary within ~59s.
- AWX successfully reconnects in ~1 minute.
During failover from DB1 → DB2:
- New DB becomes Primary within ~59s.
- But AWX takes ~15 minutes or more to detect and reconnect to the new primary.

Troubleshooting / Attempts so far:

- Specified all three DB IPs in the AWX connection string → no improvement.
- Set PGCONNECT_TIMEOUT=10 → no improvement.
- Manually restarted the AWX deployment pods (rollout restart) → issue still persists.
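
For reference, here is a minimal sketch of what a multi-host libpq connection string with these settings could look like. The host IPs, port 5432, database name, and user are hypothetical placeholders, not the values from this cluster, and the keepalive values are illustrative. Note that connect_timeout (the same setting as PGCONNECT_TIMEOUT) only bounds connection establishment, so it would not be expected to help with connections that were already open when the failover happened. The keepalive and tcp_user_timeout parameters target that case.

```python
def build_dsn(hosts, port, dbname, user):
    """Build a libpq keyword/value connection string that lists several hosts.

    All values passed in are hypothetical placeholders for illustration.
    """
    params = {
        # libpq tries the comma-separated hosts in order.
        "host": ",".join(hosts),
        # A single port applies to every host in the list.
        "port": str(port),
        "dbname": dbname,
        "user": user,
        # Same setting as PGCONNECT_TIMEOUT; applies per host, in seconds.
        "connect_timeout": "10",
        # Skip standbys: only accept a server that allows read-write sessions.
        "target_session_attrs": "read-write",
        # TCP keepalives so a dead established connection is noticed sooner.
        "keepalives": "1",
        "keepalives_idle": "10",
        "keepalives_interval": "5",
        "keepalives_count": "3",
        # Milliseconds that sent data may stay unacknowledged (Linux only).
        "tcp_user_timeout": "30000",
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


if __name__ == "__main__":
    # Hypothetical addresses standing in for DB1, DB2, DB3.
    print(build_dsn(["10.0.1.11", "10.0.1.12", "10.0.2.11"], 5432, "awx", "awx"))
```

Whether these parameters actually reach the driver depends on how the AWX database settings are passed through, and when connecting via pgpool the equivalent settings would have to be applied on the pgpool side as well. Neither is verified here.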

Problem:
Failover detection is inconsistent. AWX reconnects quickly in one direction (DB2 → DB1) but takes ~15 minutes in the other direction (DB1 → DB2).
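
One possibility worth ruling out (an assumption, not something confirmed in this setup): the ~15 minute figure is close to the time a Linux TCP stack takes to give up on an established connection whose peer has silently disappeared. The sketch below does the arithmetic, assuming the Linux defaults of net.ipv4.tcp_retries2=15, a 200 ms minimum retransmission timeout, and a 120 s maximum. The function name is made up for illustration.

```python
def tcp_retransmit_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Approximate seconds before Linux drops an unacknowledged connection.

    Assumes the retransmission timeout starts at rto_min, doubles after
    each attempt, and is capped at rto_max (Linux defaults assumed).
    """
    total = 0.0
    rto = rto_min
    # retries + 1 waiting intervals: the original send plus each retry.
    for _ in range(retries + 1):
        total += min(rto, rto_max)
        rto *= 2
    return total


if __name__ == "__main__":
    seconds = tcp_retransmit_timeout()
    # About 924.6 s, roughly 15.4 minutes, with the defaults above.
    print(f"{seconds:.1f} s = {seconds / 60:.1f} min")
```

If this is the cause, it would suggest that in the DB1 → DB2 direction the old connections are never reset or closed, while in the DB2 → DB1 direction they are. That is a hypothesis to test, not a diagnosis.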

Ask:
Has anyone seen similar behavior with AWX and PostgreSQL failover (with pgpool/PGAF)?

Why might AWX detect failover faster in one direction but not the other?
Are there recommended AWX/Postgres/pgpool settings to improve failover detection and reconnection times?
