How is Ansible capacity calculated?

Hi

We were testing AWX’s ability to handle concurrent Ansible jobs. We ran a single Ansible job against 1000 hosts with various numbers of forks (100, 200, 500, etc.). The AWX server’s hardware spec is very good (8 CPUs and 64 GB of memory). The job is a simple one whose shell command is “echo start; sleep 20; echo”. In the results, AWX_web shows that, inconsistently, it doesn’t get a response from some of the 1000 hosts, so the whole job gets stuck in a running state. While it is in this state, the next job in the AWX queue does not get executed either. Since the job ran on almost all of the hosts and only a few hosts didn’t complete, there should be ample capacity for the next Ansible job to run. It appears that the way AWX node capacity is calculated prevents other AWX jobs from being executed. Can you please tell us how the maximum capacity and used capacity are calculated? Also, in the UI, the used capacity value calculated at the start of the job execution doesn’t change throughout the execution of the Ansible job.

Jae Kim

See get_system_task_capacity in https://github.com/ansible/awx/blob/devel/awx/main/utils/common.py#L636 for the algorithm used, which can be overridden by an explicit SYSTEM_TASK_CAPACITY setting.

Also look at https://github.com/ansible/awx/blob/devel/awx/main/scheduler/task_manager.py#L568 for how consumed capacity is calculated for all running jobs and https://github.com/ansible/awx/blob/devel/awx/main/models/jobs.py#L619 for how the number of hosts/forks is used to determine the capacity for a single job.

The capacity for a job is fixed while it is running, and not based on actual ansible-playbook activity. Even though your job is mostly finished running and may not be using many actual system resources, it is still “consuming” most/all of the capacity allocated for jobs in AWX.
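
To make that concrete, here is a rough sketch of the bookkeeping in Python. It is not the AWX code: the Job class, the min(hosts, forks) impact formula, and the TOTAL_CAPACITY value are all assumptions made up for illustration, and the real formulas are in the files linked above. It only shows the general idea that a job's reservation is computed once from hosts/forks and held until the job leaves the running state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    host_count: int
    forks: int
    running: bool = False

    def impact(self) -> int:
        # Hypothetical formula (NOT AWX's): reserve one unit per fork,
        # capped at the number of hosts. It depends only on the job's
        # settings, never on how many hosts are still actually busy.
        return min(self.host_count, self.forks)


def consumed_capacity(jobs: List[Job]) -> int:
    # Sum the fixed reservations of every job still marked as running.
    return sum(job.impact() for job in jobs if job.running)


def can_start(job: Job, jobs: List[Job], total_capacity: int) -> bool:
    # A queued job starts only if its reservation fits in what is left.
    return consumed_capacity(jobs) + job.impact() <= total_capacity


TOTAL_CAPACITY = 600  # hypothetical node capacity, for illustration only

first = Job("sleep-test", host_count=1000, forks=500, running=True)
second = Job("next-job", host_count=1000, forks=200)
jobs = [first, second]

# Even if only a handful of hosts are still pending, the first job keeps
# its full reservation for as long as it is in the running state.
blocked_while_running = not can_start(second, jobs, TOTAL_CAPACITY)

# Once the first job finishes (or is canceled), its reservation is released.
first.running = False
startable_after_finish = can_start(second, jobs, TOTAL_CAPACITY)

print(blocked_while_running, startable_after_finish)
```

Under these made-up numbers the second job is held back while the first one is stuck in running, and becomes startable as soon as the first one ends, which matches the behavior you are seeing.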