Reduce Postgres memory usage for AWX

My AWX deployment is returning HTTP 499 errors from the AWX UI. I suspect the cause is slow responses from Postgres.

I had one big job run that took about 10 hours to finish. While this job was running, Postgres’s memory usage was quite high and the 499 errors were visible. The memory usage did not go back down even after the job finished.

Should I modify any configs, or add a VACUUM FULL job for PG?

2024-02-13T01:48:07.889762187Z stdout F X.X.X.X - - [13/Feb/2024:01:48:07 +0000] "GET /api/v2/metrics/ HTTP/1.1" 499 0 "http://X.X.X.X:8052/api/v2/metrics" "Prometheus/2.48.0" "-"
2024-02-13T01:47:59.115374436Z stdout F X.X.X.X - - [13/Feb/2024:01:47:59 +0000] "GET / HTTP/1.1" 499 0 "-" "Blackbox Exporter/0.24.0" "X.X.X.X"

This causes slow AWX UI responses, and Prometheus can’t scrape metrics.
My platform: AKS
AWX-operator: 2.10.0
AWX: 23.6.0
AWX-EE: 23.6.0

My current pg config:

  postgres_resource_requirements:
    requests:
      cpu: 2000m
      memory: 8Gi
    limits:
      cpu: 4000m
      memory: 16Gi
  postgres_storage_requirements:
    requests:
      storage: 64Gi
    limits:
      storage: 100Gi
  postgres_image: docker.io/library/postgres
  postgres_image_version: "13.13"
  postgres_extra_args:
    - "-c"
    - "max_locks_per_transaction=1024"
    - "-c"
    - "max_connections=4000"
    - "-c"
    - "maintenance_work_mem=512MB"
    - "-c"
    - "shared_buffers=2048MB"
    - "-c"
    - "min_wal_size=2048MB"
    - "-c"
    - "max_wal_size=4096MB"
    - "-c"
    - "effective_io_concurrency=200"
    - "-c"
    - "checkpoint_completion_target=0.9"
    - "-c"
    - "effective_cache_size=6144MB"
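
A side note on the settings above: `max_connections=4000` is very high for a single instance. Every allowed connection reserves per-backend memory, and each backend can additionally use up to `work_mem` per sort or hash operation, so a large connection ceiling can inflate memory usage even when most slots are idle. Here is a hedged sketch of more conservative values (the numbers are illustrative assumptions, not tested recommendations for this workload):

```yaml
  postgres_extra_args:
    - "-c"
    - "max_connections=500"        # assumption: AWX rarely needs thousands of backends
    - "-c"
    - "shared_buffers=4096MB"      # ~25% of the 16Gi limit is a common starting point
    - "-c"
    - "autovacuum_vacuum_scale_factor=0.05"  # vacuum large, churny tables more often
```

On the VACUUM question: plain `VACUUM` (which autovacuum runs for you) reclaims dead rows for reuse without an exclusive lock, while `VACUUM FULL` rewrites the whole table under an exclusive lock and blocks all access, so it is usually a last resort rather than a scheduled job.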

Do you have any idea why one job running too long can cause Postgres CPU throttling? :thinking: My theory is that because this job ran so long, it kept inserting job output into the database, and that caused the high CPU.

Does your long-running job spit out a lot of output events?
Those all get stored in the database and can definitely be expensive.
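
If you want to check whether the event tables are what is taking the space, a query like this against the awx database should list the biggest tables (in recent AWX schemas, job output rows live in `main_jobevent`; treat that name as an assumption for your version):

```sql
-- Run inside the awx database, e.g. via psql.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_live_tup
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```

AWX also ships a built-in "Cleanup Job Details" system job that purges old job events, which is the supported way to keep those tables from growing without bound.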

Sorry for the late response. I’ve been quite busy recently with a ton of work :sweat_smile:
Here was my Ansible job:

    - name: Start restore database
      ansible.windows.win_powershell:
        script: |
          ### do restore here
      async: 86400
      poll: 10

The output looked something like this:

ASYNC POLL on XXXX: jid=j422747873893.12560 started=1 finished=0
ASYNC POLL on XXXX: jid=j422747873893.12560 started=1 finished=0
ASYNC POLL on XXXX: jid=j422747873893.12560 started=1 finished=0
ASYNC POLL on XXXX: jid=j422747873893.12560 started=1 finished=0
ASYNC POLL on XXXX: jid=j422747873893.12560 started=1 finished=0
ASYNC POLL on XXXX: jid=j422747873893.12560 started=1 finished=0
ASYNC POLL on XXXX: jid=j422747873893.12560 started=1 finished=0
ASYNC POLL on XXXX: jid=j422747873893.12560 started=1 finished=0

The job executed for 10 hours.
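
Each of those `ASYNC POLL` lines becomes a job event row in the database. With `poll: 10` over roughly 10 hours (36,000 seconds), that is on the order of 3,600 events for this single task. If fine-grained progress isn't needed, raising the poll interval cuts the event volume proportionally; a hedged tweak of the task above:

```yaml
    - name: Start restore database
      ansible.windows.win_powershell:
        script: |
          ### do restore here
      async: 86400
      poll: 300   # check every 5 minutes instead of every 10 s: ~120 poll events instead of ~3600
```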

What I’m wondering is: this job executes on a remote server, so why was Postgres CPU usage so high (~3.11 cores)? The number of write requests shouldn’t be very large if it only writes a few lines of logs to the DB. Normally it only consumes about 1.5 cores.
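
One way to see what Postgres is actually spending CPU on while such a job runs is to look at the currently active queries. A hedged sketch (requires a role that can read `pg_stat_activity`, e.g. the awx database owner or a superuser):

```sql
-- Show the longest-running non-idle queries at this moment.
SELECT pid,
       now() - query_start AS duration,
       state,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY duration DESC NULLS LAST
LIMIT 10;
```

Running this a few times during the job should show whether the load comes from event inserts, from UI/API reads, or from something else entirely.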