Hello,
We run jobs in task containers; our k8s cluster runs on EKS with the cluster autoscaler.
We are running 100 jobs on the cluster, and each job takes on average 22-25 minutes to complete.
Currently the replica count for my AWX deployment is set to 3.
AWX configuration of note:
SYSTEM_TASK_ABS_CPU = 12 # ← This was auto-calculated by the installation script
SYSTEM_TASK_ABS_MEM = 61 # ← This was auto-calculated by the installation script
AWX_CONTAINER_GROUP_K8S_API_TIMEOUT = 30
AWX_CONTAINER_GROUP_POD_LAUNCH_RETRIES = 300
AWX_CONTAINER_GROUP_POD_LAUNCH_RETRY_DELAY = 30
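For reference, here is a minimal sketch of how I can check what capacity each AWX instance reports via /api/v2/instances/ (assuming an admin OAuth token; the hostname and token below are placeholders):

```python
# Minimal sketch: list AWX instances and the capacity each one reports.
# Assumes an admin OAuth token; hostname and token are placeholders.
import requests

AWX_URL = "https://awx.example.com"   # placeholder
TOKEN = "REPLACE_ME"                  # placeholder

resp = requests.get(
    f"{AWX_URL}/api/v2/instances/",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

for inst in resp.json()["results"]:
    print(
        inst["hostname"],
        "capacity:", inst["capacity"],
        "consumed:", inst.get("consumed_capacity"),
        "jobs_running:", inst.get("jobs_running"),
    )
```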
There is more than enough time for all jobs to start, but I've noticed that all the tasks are dispatched through only one task container. That container's memory usage keeps growing until it either crashes or, if there happens to be enough memory, the jobs finish correctly.
With 100 jobs, that single task container grows to 12GB of memory, and if it runs out, all the jobs crash afterwards.
The container resource limits are:
Web:
- Memory [1, 2]
- CPU [1, 1.5]
Task:
- Memory [6, 12]
- CPU [3, 6]
Redis:
- Memory [0.5, 2]
- CPU [0.5, 1.5]
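In case it's useful, a minimal sketch of dumping the requests/limits that are actually applied to each container (assuming the kubernetes Python client and that the deployment is named "awx" in namespace "awx"; both names are assumptions):

```python
# Minimal sketch: print the requests/limits actually applied to each AWX container.
# Assumes the `kubernetes` Python client and that the deployment is named "awx"
# in namespace "awx" (both names are assumptions).
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

dep = apps.read_namespaced_deployment(name="awx", namespace="awx")
for c in dep.spec.template.spec.containers:
    res = c.resources
    print(c.name, "requests:", res.requests, "limits:", res.limits)
```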
I confirmed that the jobs were created through different web API containers, as I can see in the logs that the POST for each new job is randomised across all the containers.
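To see which task container actually picks the jobs up, here is a minimal sketch that counts recent jobs per controller_node via the jobs API (assuming an admin OAuth token; the hostname and token are placeholders):

```python
# Minimal sketch: count how many recent jobs each control node handled.
# Assumes an admin OAuth token; hostname and token are placeholders.
from collections import Counter
import requests

AWX_URL = "https://awx.example.com"   # placeholder
TOKEN = "REPLACE_ME"                  # placeholder

counts = Counter()
url = f"{AWX_URL}/api/v2/jobs/?order_by=-finished&page_size=200"
resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"}, timeout=30)
resp.raise_for_status()

for job in resp.json()["results"]:
    counts[job.get("controller_node") or "unknown"] += 1

for node, n in counts.most_common():
    print(node, n)
```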
I’ve also tried using a single Redis (ElastiCache), but it’s the same issue: only one container picks up the jobs, and if it runs out of memory it crashes, and there is also some other weirdness happening at that time.
Does anyone know if I’m missing something obvious, or what I need to change in the code to make the jobs run across multiple task containers?
Is there any other information needed?