I’m running into trouble scaling AWX on Kubernetes to higher instance counts: it overwhelms our external PostgreSQL 9.6 database with connections. Our target is 16 instances, and we’re seeing PostgreSQL connection counts climb into the hundreds until we hit max_connections. Because PostgreSQL runs a dedicated backend process for every connection, connections are expensive, so we’re trying to keep the total to a reasonable number.
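To put the numbers in perspective, here is a rough connection-budget calculation in Python. The per-instance connection count is a hypothetical placeholder, not a measured AWX figure; max_connections = 100 and superuser_reserved_connections = 3 are PostgreSQL’s defaults and may not match our server.

```python
# Rough connection-budget estimate for a multi-instance AWX deployment.
# Per-instance numbers are hypothetical placeholders, not measured AWX
# values; substitute figures observed in pg_stat_activity.


def total_connections(instances, conns_per_instance):
    """Connections the cluster opens if every instance holds the same number."""
    return instances * conns_per_instance


def max_conns_per_instance(max_connections, reserved, instances):
    """Largest equal per-instance share that stays under max_connections.

    `reserved` covers superuser_reserved_connections plus any admin or
    monitoring sessions that must always be able to connect.
    """
    return (max_connections - reserved) // instances


if __name__ == "__main__":
    instances = 16
    conns_per_instance = 25   # hypothetical
    max_connections = 100     # PostgreSQL default
    reserved = 3              # PostgreSQL default superuser_reserved_connections

    print(total_connections(instances, conns_per_instance))              # 400
    print(max_conns_per_instance(max_connections, reserved, instances))  # 6
```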
Since most of these connections seem short-lived and idle, I’ve approached the issue from two angles: reusing short-lived connections, or creating persistent ones.
To reuse connections, I’ve tried multiplexing them through pgbouncer. In session pool_mode, little changes: AWX keeps opening more and more connections until new ones are queued by pgbouncer instead of being refused outright. In transaction pool_mode, the cluster heartbeat tasks fail intermittently because they rely on advisory locks, so the usual ways of multiplexing connections seem to be non-starters.
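For anyone unfamiliar with the transaction-mode problem, here is a toy Python model of it. It is not pgbouncer’s code, and it assumes (unverified against AWX’s source) that the heartbeat takes a session-level advisory lock in one transaction and releases or relies on it in a later one. Round-robin assignment is a simplification of a pooler handing each transaction to whichever server connection is free.

```python
# Toy model (not pgbouncer's actual code) of why session-level advisory
# locks break under transaction pooling: a lock belongs to the server
# session that took it, but the client's next transaction may run on a
# different server session.
import itertools


class Backend:
    """Stands in for one PostgreSQL server session."""

    def __init__(self, name):
        self.name = name
        self.locks = set()  # advisory lock keys held by this session


class TransactionPool:
    """Assigns each new transaction to the next backend, round-robin.

    Simplification: a real pooler picks whichever server connection is free.
    """

    def __init__(self, size):
        self.backends = [Backend(f"backend-{i}") for i in range(size)]
        self._cycle = itertools.cycle(self.backends)

    def run_transaction(self):
        return next(self._cycle)

    def lock_holder(self, key):
        for backend in self.backends:
            if key in backend.locks:
                return backend
        return None


def advisory_lock(pool, backend, key):
    """Like pg_try_advisory_lock: fails if another session holds the key."""
    holder = pool.lock_holder(key)
    if holder is not None and holder is not backend:
        return False
    backend.locks.add(key)
    return True


def advisory_unlock(backend, key):
    """Like pg_advisory_unlock: only the holding session can release it."""
    if key in backend.locks:
        backend.locks.remove(key)
        return True
    return False


if __name__ == "__main__":
    pool = TransactionPool(size=2)
    HEARTBEAT_KEY = 42  # hypothetical lock key

    first = pool.run_transaction()                    # backend-0
    print(advisory_lock(pool, first, HEARTBEAT_KEY))  # True
    second = pool.run_transaction()                   # backend-1
    print(advisory_unlock(second, HEARTBEAT_KEY))     # False: wrong session
    print(pool.lock_holder(HEARTBEAT_KEY).name)       # backend-0: lock leaked
```

In a real pooler the hand-off only goes wrong when a later transaction lands on a different server connection, which would fit the failures being intermittent rather than constant.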
I’ve also tried both persistent connections (CONN_MAX_AGE: None) and closing connections immediately (CONN_MAX_AGE: 0) in combination with session pool_mode, but I haven’t found a configuration that stops the connection count from growing until new connections are either queued or refused.
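For reference, CONN_MAX_AGE is part of Django’s DATABASES setting. The fragment below is only a sketch: the host, credentials, and the choice to point it at pgbouncer are hypothetical, and how AWX exposes this setting depends on the install method.

```python
# Sketch of the Django DATABASES setting that CONN_MAX_AGE belongs to.
# Host, port, and credentials are hypothetical placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "pgbouncer.example.internal",  # hypothetical pgbouncer service
        "PORT": "6432",                        # pgbouncer's default listen port
        "NAME": "awx",
        "USER": "awx",
        "PASSWORD": "change-me",
        # None: keep the connection open indefinitely and reuse it.
        # 0: close the connection at the end of each request (Django default).
        # N > 0: reuse a connection for up to N seconds.
        "CONN_MAX_AGE": None,
    }
}
```

One caveat: Django enforces CONN_MAX_AGE when HTTP requests start and finish, so connections held by long-running processes outside the request cycle may not be affected by either value.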
Are there any scaling strategies that keep the number of AWX connections low and sustainable? Is there anything Tower does under the hood that could apply here?