My AWX platform is experiencing error 499 from awx UI. I think the issue comes from the slow response from Postgres.
I had one big job run, it took about 10 hours to finish. However, when this job started, Postgres’s memory use was quite high and error 499 was visible . But it did not go down even after this task was finished.
Should I modify any configs? Or add vacuum full job for PG?
2024-02-13T01:48:07.889762187Z stdout F X.X.X.X - - [13/Feb/2024:01:48:07 +0000] "GET /api/v2/metrics/ HTTP/1.1" 499 0 "http://X.X.X.X:8052/api/v2/metrics" "Prometheus/2.48.0" "-"
2024-02-13T01:47:59.115374436Z stdout F X.X.X.X - - [13/Feb/2024:01:47:59 +0000] "GET / HTTP/1.1" 499 0 "-" "Blackbox Exporter/0.24.0" "X.X.X.X"
This causes: AWX UI response slow, prometheus can’t scape metrics.
My platform: AKS
AWX-operator: 2.10.0
AWX: 23.6.0
AWX-EE: 23.6.0
My current pg config:
postgres_resource_requirements:
requests:
cpu: 2000m
memory: 8Gi
limits:
cpu: 4000m
memory: 16Gi
postgres_storage_requirements:
requests:
storage: 64Gi
limits:
storage: 100Gi
postgres_image: docker.io/library/postgres
postgres_image_version: "13.13"
postgres_extra_args:
- "-c"
- "max_locks_per_transaction=1024"
- "-c"
- "max_connections=4000"
- "-c"
- "maintenance_work_mem=512MB"
- "-c"
- "shared_buffers=2048MB"
- "-c"
- "min_wal_size=2048MB"
- "-c"
- "max_wal_size=4096MB"
- "-c"
- "effective_io_concurrency=200"
- "-c"
- "checkpoint_completion_target=0.9"
- "-c"
- "effective_cache_size=6144MB"