Setup:
- AWX deployed in Kubernetes using awx-operator (in Region1 and Region2).
- Both AWX deployments use a shared PostgreSQL cluster deployed across both regions.
- Database cluster details:
  - DB1 (Region1)
  - DB2 (Region1)
  - DB3 (Region2)
  - PGAF (Postgres Auto Failover) in Region2
- The DBs are configured so that:
  - Only one DB is Primary (read-write) at any time.
  - The other two are Standby (read-only) and in sync.
  - A cron job runs every minute on all DB nodes to promote a new primary during failover/switchover.
- AWX connects to the database through pgpool on port 51902.
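
The topology above relies on the invariant that exactly one node is read-write. PostgreSQL's `pg_is_in_recovery()` returns false on a primary and true on a standby, so the invariant can be checked by collecting that value from each node. The helper below is a hypothetical sketch (not part of AWX, pgpool, or PGAF); it assumes the per-node results were already gathered, with `None` standing for a node that could not be reached.

```python
from typing import Mapping, Optional


def pick_primary(in_recovery: Mapping[str, Optional[bool]]) -> str:
    """Return the single primary given each node's pg_is_in_recovery() result.

    False -> primary (read-write), True -> standby, None -> unreachable.
    """
    primaries = [host for host, rec in in_recovery.items() if rec is False]
    if len(primaries) != 1:
        # Zero primaries: promotion has not happened yet.
        # More than one: split brain, which the setup is meant to prevent.
        raise RuntimeError(f"expected exactly one primary, found {primaries}")
    return primaries[0]


# State right after a DB1 -> DB2 failover, with DB1 unreachable.
print(pick_primary({"DB1": None, "DB2": False, "DB3": True}))  # DB2
```
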
Observed behavior:
- During failover from DB2 → DB1:
  - The new DB becomes Primary within ~59 s.
  - AWX successfully reconnects in ~1 minute.
- During failover from DB1 → DB2:
  - The new DB becomes Primary within ~59 s.
  - However, AWX takes ~15 minutes or more to detect and reconnect to the new primary.
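
One hypothesis worth ruling out: ~15 minutes is close to how long Linux takes, with default settings, to abort an established TCP connection whose peer silently stops answering. With `net.ipv4.tcp_retries2 = 15` (the kernel default), retransmissions back off exponentially from a 200 ms minimum RTO up to a 120 s cap, which adds up to about 924.6 s (roughly 15.4 minutes). If the old primary's host, or the path to it, drops packets instead of refusing them, a connection that was open across the failover (AWX to pgpool, or pgpool to the backend) could hang that long, whereas a peer that answers with a TCP reset fails fast. The sketch below only reproduces that arithmetic; it assumes the kernel defaults named in the comments and does not prove this is the cause here.

```python
# Approximate how long Linux keeps retransmitting unacknowledged data on an
# established TCP connection before aborting it. Assumes kernel defaults:
#   TCP_RTO_MIN = 200 ms, TCP_RTO_MAX = 120 s, net.ipv4.tcp_retries2 = 15.
# The RTO starts at the minimum and doubles after each retry until capped.

RTO_MIN_S = 0.2
RTO_MAX_S = 120.0


def retransmit_timeout(retries2: int = 15,
                       rto_min: float = RTO_MIN_S,
                       rto_max: float = RTO_MAX_S) -> float:
    """Total seconds spent retransmitting before the connection is dropped."""
    total = 0.0
    rto = rto_min
    for _ in range(retries2 + 1):  # the original send plus `retries2` retries
        total += rto
        rto = min(rto * 2, rto_max)
    return total


print(f"{retransmit_timeout():.1f} s")  # 924.6 s
```
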

 
 
Troubleshooting / attempts so far:
- Specified all three DB IPs in the AWX connection string → no improvement.
- Set `PGCONNECT_TIMEOUT=10` → no improvement.
- Manually restarted the AWX deployment pods (`kubectl rollout restart`) → the issue still persists.
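
`PGCONNECT_TIMEOUT` (libpq's `connect_timeout`) only bounds how long a *new* connection attempt may take; it has no effect on a connection that was already established when the primary moved. libpq has separate parameters for that case: `keepalives`, `keepalives_idle`, `keepalives_interval` and `keepalives_count` for idle sockets, and `tcp_user_timeout` (milliseconds, libpq from PostgreSQL 12) for sockets with unacknowledged data in flight. When several hosts are listed directly, as in the first attempt above, `target_session_attrs=read-write` makes libpq skip hosts that only accept read-only sessions. The sketch below just builds such a connection string; `pgpool.example` and the `awx` database name are placeholders, the timing values are illustrative rather than tuned recommendations, and whether your AWX/awx-operator version lets you pass these options through is something to confirm in its documentation. Note that these settings only cover the AWX-to-pgpool hop, not pgpool's own backend connections.

```python
def build_conninfo(params: dict) -> str:
    """Render a libpq key=value connection string.

    Values that are empty or contain spaces, quotes or backslashes are
    single-quoted, with ' and \\ escaped by a backslash as libpq requires.
    """
    def quote(value) -> str:
        s = str(value)
        if s == "" or any(c in s for c in " '\\"):
            s = "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"
        return s

    return " ".join(f"{key}={quote(value)}" for key, value in params.items())


EXAMPLE_PARAMS = {
    "host": "pgpool.example",     # hypothetical pgpool service name
    "port": 51902,                # pgpool port from the setup above
    "dbname": "awx",              # assumption: database name
    "connect_timeout": 10,        # new connections only (same as PGCONNECT_TIMEOUT)
    "keepalives": 1,              # enable TCP keepalives on the socket
    "keepalives_idle": 10,        # seconds idle before the first probe
    "keepalives_interval": 5,     # seconds between unanswered probes
    "keepalives_count": 3,        # unanswered probes before the socket is dropped
    "tcp_user_timeout": 30000,    # ms that sent data may stay unacknowledged
}

print(build_conninfo(EXAMPLE_PARAMS))
```

With these example values, a dead idle connection would be dropped after about 10 + 3 × 5 = 25 s, and one blocked on an unanswered query after about 30 s, instead of the default keepalive (two hours idle) or retransmission limits.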
Problem:
Failover detection is inconsistent. AWX reconnects quickly in one direction (DB2 → DB1) but takes ~15 minutes in the other direction (DB1 → DB2).
Ask:
Has anyone seen similar behavior with AWX and PostgreSQL failover (with pgpool/PGAF)?
- Why might AWX detect failover faster in one direction than in the other?
- Are there recommended AWX/Postgres/pgpool settings to improve failover detection and reconnection times?
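
To compare the two directions with numbers instead of AWX log timestamps, one option is to poll for a writable primary through pgpool during each test failover. The loop below is a hypothetical, driver-agnostic sketch: `probe` is a function you would supply (for example, one that opens a fresh connection to pgpool on port 51902 with a short `connect_timeout`, runs `SELECT pg_is_in_recovery()`, and returns True when the result is false). The clock and sleep functions are injectable only so the loop can be exercised without a database.

```python
import time
from typing import Callable


def measure_recovery(probe: Callable[[], bool],
                     interval: float = 1.0,
                     deadline: float = 1800.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll `probe` until it reports a writable primary; return elapsed seconds.

    Any exception raised by `probe` (refused, reset, timed out) counts as
    "not yet". Raises TimeoutError once `deadline` seconds have passed.
    """
    start = clock()
    while True:
        try:
            if probe():
                return clock() - start
        except Exception:
            pass  # connection failed: keep polling
        if clock() - start >= deadline:
            raise TimeoutError(f"no writable primary after {deadline} s")
        sleep(interval)
```

Because each probe uses a new connection, this measures how quickly pgpool starts serving the new primary, separately from connections AWX keeps open. If it reports about a minute in both directions while AWX still takes ~15 minutes, the delay is more likely in connections that stay open across the failover; if it is slow only for DB1 → DB2, pgpool's own backend health checking (`health_check_period`, `health_check_timeout`, `health_check_max_retries`) and how it discovers the new primary would be the next place to look.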