AWX Container Group Job Queuing

AnsibleAndChill · December 9, 2024, 11:44pm

AWX 24.6.1

I have two separate k8s clusters, one for my AWX instance group and one for my container group. I’m currently running 1 web and 1 task container for testing. The container group is 10 on-prem k8s nodes with plenty of dedicated CPU/memory. AWX reports at most 8% capacity used on my instance group, and each node should run 100+ jobs without issue per the math from the white paper. I created a test job template that pauses for 60 minutes, and I can run 50 jobs at a time (the limit in my config) if the jobs run long enough to overlap.

I have a basic template that takes 30 seconds to run, and a report may make ~500 API calls to AWX to launch it. AWX will run all 500 jobs, but it starts them one at a time, roughly one job every 30 seconds. The result is 450 pending jobs and 50 marked as running, but only 1-5 of those are actually executing at any given time. So instead of burning through 500 jobs in a few minutes, it generally takes 3-4 hours to complete all of the queued jobs, about the same as if they had been queued to run one after another.
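A rough sketch of the arithmetic behind those numbers, using only the figures above (500 jobs, 30-second runtime, about one start every 30 seconds, 50 concurrent slots). The function names are made up for illustration, and the "batched" case assumes an idealized scheduler with no startup overhead:

```python
import math


def serial_drain_hours(total_jobs: int, start_interval_s: float, job_s: float) -> float:
    """Hours until the last job finishes when jobs start one at a time
    at a fixed interval (the behavior described above)."""
    last_start_s = (total_jobs - 1) * start_interval_s
    return (last_start_s + job_s) / 3600


def batched_drain_minutes(total_jobs: int, max_concurrent: int, job_s: float) -> float:
    """Minutes to finish if every free slot were filled immediately
    (idealized: ignores pod startup and scheduling overhead)."""
    waves = math.ceil(total_jobs / max_concurrent)
    return waves * job_s / 60


if __name__ == "__main__":
    print(f"one start every 30 s: {serial_drain_hours(500, 30, 30):.2f} hours")
    print(f"50 slots kept full:   {batched_drain_minutes(500, 50, 30):.1f} minutes")
```

With these inputs the one-at-a-time case comes out at a little over 4 hours and the idealized case at 5 minutes, which lines up with the "3-4 hours" versus "a few minutes" gap described in the post.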

Is this a limitation of my k8s cluster or some setting in AWX? I would expect AWX to fire the launches at my k8s cluster in a batch and be done with it, but AWX appears to send jobs to the container group one at a time. With my 30-second test job, the most I’ve ever seen running simultaneously (via kubectl) is ~5 jobs, and there isn’t anything “waiting” on my k8s cluster.

Thanks in advance!

mcen1 · December 10, 2024, 1:08am

This may be a silly question, but have you enabled concurrent job runs in your job template’s settings?

AnsibleAndChill · December 10, 2024, 1:25am

Yep, I can get 50 of the 60-minute test jobs running at the same time; it just takes ~30 minutes for all of them to start as they queue up.

The most I’ve seen of the 30-second template is 5 at once. There are 50 jobs “running”, but it’s only starting one at a time. 5 isn’t a magic number here; it just reflects how long it takes to send and start a job. As soon as job #1 is wrapping up, job #6 is ready to send.
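The two observations can be tied together with a back-of-the-envelope calculation (Little's law: jobs in flight ≈ time each job spends in flight ÷ time between starts). This is only a sketch: the function names are hypothetical, and the 180-second pod lifetime is an assumed value picked to illustrate the point, not something measured in this setup.

```python
def start_interval_s(jobs_started: int, elapsed_s: float) -> float:
    """Average seconds between job starts, from an observed ramp-up."""
    return elapsed_s / jobs_started


def expected_in_flight(pod_lifetime_s: float, interval_s: float, cap: int) -> float:
    """Little's law estimate of simultaneous pods, limited by the
    configured concurrency cap."""
    return min(cap, pod_lifetime_s / interval_s)


if __name__ == "__main__":
    # 50 long-running jobs took ~30 minutes to all start.
    interval = start_interval_s(50, 30 * 60)
    print(f"implied start interval: {interval:.0f} s")

    # 60-minute jobs outlive the ramp-up, so the cap of 50 is reached.
    print(f"60-minute jobs in flight: {expected_in_flight(3600, interval, 50):.0f}")

    # ASSUMPTION: a short job's pod lives ~180 s end to end
    # (scheduling, startup, 30 s playbook, teardown). Hypothetical value.
    print(f"short jobs in flight: {expected_in_flight(180, interval, 50):.0f}")
```

The implied interval of about 36 seconds per start matches the "one job every 30 seconds" estimate, and at that start rate short-lived jobs can never pile up to anywhere near 50, whatever the exact pod lifetime is.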

Like I said, it’s as if it’s sending the jobs to the container group one at a time for some reason.
