We are running AWX 19.3.0 deployed with AWX Operator 0.13.0, including Postgres. Recently we’ve been getting a lot of log messages on the Postgres pod saying “FATAL: sorry, too many clients already”. The Postgres DB seems to be set with the default value of “max_connections”, which is 100. The value of “idle_in_transaction_session_timeout” is 0. We notice that over time the number of idle connections rises (counted with “select count(*) from pg_stat_activity where state = 'idle';”), and when it reaches the value of “max_connections” AWX becomes inoperable. This behaviour can be correlated with the recent addition of several instance_groups, some of which are for API endpoints that are not (yet) accessible.
Can the addition of an instance_group trigger a Postgres connection? Can such a connection be stuck in idle state?
Is it possible to increase the value of “max_connections” with AWX Operator 0.13.0?
Should the value of “idle_in_transaction_session_timeout” be something other than 0? If so, how can this be set with AWX Operator 0.13.0?
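For anyone wanting to watch the connection count described above, here is a minimal Python sketch, not an official AWX or Operator tool. It breaks pg_stat_activity down by state and estimates how much room is left under “max_connections”. The function name summarize_connections and the sample numbers are made up for illustration. One thing worth checking with it: “idle_in_transaction_session_timeout” only applies to sessions in the “idle in transaction” state, so it would not close the plain “idle” sessions that the query in the post counts.

```python
from collections import Counter

# pg_stat_activity and its "state" column are standard PostgreSQL.
STATE_QUERY = "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"


def summarize_connections(rows, max_connections=100):
    """Summarize (state, count) rows as returned by STATE_QUERY.

    Rows whose state is NULL (typically background processes) are skipped,
    so the headroom figure is only a rough estimate.
    """
    counts = Counter()
    for state, n in rows:
        if state is None:
            continue
        counts[state] += n
    total = sum(counts.values())
    return {
        "idle": counts["idle"],
        "idle_in_transaction": counts["idle in transaction"],
        "total": total,
        "headroom": max_connections - total,
    }


# Hypothetical sample data; real rows could be fetched with any driver, e.g.
#   import psycopg2
#   with psycopg2.connect("dbname=awx") as conn, conn.cursor() as cur:  # DSN is an assumption
#       cur.execute(STATE_QUERY)
#       rows = cur.fetchall()
sample_rows = [("idle", 90), ("active", 5), ("idle in transaction", 2), (None, 6)]
print(summarize_connections(sample_rows))
```

If “idle” dominates and “idle_in_transaction” stays near zero, changing “idle_in_transaction_session_timeout” is unlikely to help much on its own.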
Sorry for using an old thread for my query, but my issue is the same as above. I see a lot of idle connections from the dispatcher service, and because of this I get frequent alerts for “could not receive data from client: Connection reset by peer”. We are using a PostgreSQL database with AWX.
A point to note is that, due to this behavior, I am seeing a rise in memory usage, as we get this message in the logs every second.
My questions are:
How can I stop this behavior so that I no longer see such frequent log entries for “could not receive data from client: Connection reset by peer”?
Is it normal behavior for the dispatcher service to create idle sessions every second?
I understand you are on 21.0.0. Have you considered upgrading? Significant work has been done in more recent releases to decrease the number of idle connections that AWX keeps open, notably in 21.6.0. Work has continued since then, so upgrading to the latest release is always preferable.
Note that database connections are still opened at the beginning and at the end of a job to update status and other information, but the win is that we no longer keep a fixed number of database connections open for the whole duration of the job.
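To illustrate why that matters for “max_connections”, here is a toy Python model. It is not AWX code, and the job timings, the update_time parameter, and all function names are invented for the example. It compares the peak number of simultaneously open connections when each job holds one connection for its whole run with the peak when a job only connects briefly at its start and end.

```python
def peak_open(intervals):
    """Return the maximum number of (start, end) intervals open at once."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # connection opened
        events.append((end, -1))    # connection closed
    # At equal timestamps, process closes (-1) before opens (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    open_now = peak = 0
    for _, delta in events:
        open_now += delta
        peak = max(peak, open_now)
    return peak


def held_for_whole_job(jobs):
    """One connection per job, kept open from job start to job end."""
    return [(start, end) for start, end in jobs]


def opened_at_start_and_end(jobs, update_time=1):
    """Two short connections per job: one at the start, one at the end."""
    spans = []
    for start, end in jobs:
        spans.append((start, start + update_time))
        spans.append((end - update_time, end))
    return spans


# Hypothetical (start, end) times for four overlapping jobs.
jobs = [(0, 100), (10, 120), (20, 90), (30, 150)]
print(peak_open(held_for_whole_job(jobs)))       # 4
print(peak_open(opened_at_start_and_end(jobs)))  # 1
```

With these made-up timings, the four overlapping jobs need four connections at once in the first model but never more than one in the second, which is the kind of reduction described above.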