Setting max_connections in Postgres deployed through AWX operator 0.13.0 with AWX 19.3.0

Hi All,

We are running AWX 19.3.0 deployed with AWX Operation 0.13.0 including Postgres. Recently we’ve been getting a lot of logs messages on the Postgres pod saying “FATAL: sorry, too many clients already”. The Postgres DB seems to be set with the default value of “max_connections” at 100. The value of “idle_in_transaction_session_timeout” is 0. We notice that over time the number of idle connections rises “select count(*) from pg_stat_activity where (state = ‘idle’);” and when it reached the value of “max_connections” AWX becomes inoperable. This behaviour can be correlated to the recent addition of several instance_groups, some of which are for API endpoints which are not (yet) accessible

  1. Can the addition of an instance_group trigger a Postgres connection? Can such a connection be stuck in idle state?
  2. Is it possible to increase the value of “max_connections” with AWX Operation 0.13.0?
  3. Should the value of “idle_in_transaction_session_timeout” be something other than 0? If so how can this be set with AWX Operation 0.13.0?

Best regards

Rod

Hi All,

Following up on my own post. I built a custom Postgres image for use in AWX deployment with a Dockerfile as below.

FROM postgres:12.11
COPY init.sql /docker-entrypoint-initdb.d/

With init.sql being:

ALTER system SET max_connections = 1024;

When using this custom image with AWX operator 0.13.0 max_connections becomes 1024.

I’m trying to figure out what is causing all the idle connections, which almost all are from ‘/usr/bin/awx-manage run_dispatcher’.

Any ideas?

Rod

Hi,

On the newest version of operator, you should be able to set max_connections on the awx-demo.yml file, see this section in the README

https://github.com/fosterseth/awx-operator#managed-postgresql-service

postgres_extra_args:

  • ‘-c’
  • ‘max_connections=1000’

After bumping max_connections up to 1024, do you still see the “FATAL: sorry, too many clients already” errors?

We’d be curious to know if you also see the extra idle connections from the dispatcher service on the latest awx and awx-operator versions.

AWX Team

Sorry that I am using a old thread for my query. But my issue is the same as above. I see a lot of idle connections from the dispatcher service and due to this I see frequent alerts for “could not receive data from client: Connection reset by peer”. We are using Postgresql database with awx.

Point to note is that due to this behavior I am seeing a rise in memory usage as every second we get this message in the logs.

My question is:

  1. How can I stop this behavior and do not see so many frequent logs for “could not receive data from client: Connection reset by peer”?
  2. Is this normal behavior of dispatcher service to create idle sessions every second?

AWX version is : 21.0.0

It(AWX) is a pod running in openshift.

Attached screenshot of error message.

(attachments)

connection reset by peer.jpg

Imran,

I understand you are on 21.0.0 Have you considered upgrading? Significant work has been done regarding decreasing the number of idle connections that AWX keeps open in more recent releases, namely 21.6.0, though work has continued since then so upgrading to the latest release is always preferable.

In the past, multiple database connections were left open for the life of a job. Since https://github.com/ansible/awx/releases/tag/21.6.0 which included https://github.com/ansible/awx/pull/11745 these should be closed once the job is started.

Note, database connections are opened at the beginning of a job and at the end of the job to update status and other information, but the win is we don’t have a fixed number of database connections open for the whole duration of the job.