Hi guys…
I’ve been trying to move our Docker setup to Kubernetes to benefit from the Instance Groups allocation feature, but we’re running into some problems so far.
- Our bare-metal Kubernetes cluster has 4 nodes (12 GB RAM / 4 CPUs each).
- We want to run one playbook across all 500 of our inventories once a day. Inventory host counts range from 2 to 100.
- We want to deploy 4 replicas (1 per node), but we’ve seen some odd behaviour where instances are not automatically provisioned or removed, so for now I’m deploying only 2 instances for testing, which has been more stable in that regard.
In AWX, I’ve created a new Instance Group for the playbook I want to run daily and assigned one instance to it, leaving the default tower instance group with the 2 provisioned instances.
I’ve explicitly set the job template I want to run to use the newly created instance group.
If I batch-schedule the playbook across all inventories, the scheduler runs the inventory syncs correctly using the default tower instance group, but when the time comes to run the playbook on the other instance group, the job fails with the explanation: "Task was marked as running in Tower but was not present in the job queue, so it has been marked as failed."
If I remove the new instance group and let AWX handle everything with the default instance group, it works: jobs are scheduled and executed correctly until the maximum schedulable job capacity is reached (25 in my case). After that, we’re completely blocked from running any other playbook until capacity is freed (the main reason we’re looking at this feature in the first place).
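For context, here’s a rough per-instance capacity estimate based on my reading of the documented Tower/AWX capacity algorithm. The 2048 MB system reserve, 100 MB per fork, and 4 forks per CPU are the defaults as I understand them from the docs; treat the exact constants as assumptions rather than verified values:

```python
# Sketch of the Tower/AWX instance capacity calculation
# (constants are the documented defaults, to my understanding).
MEM_RESERVE_MB = 2048   # memory held back for the system
MEM_PER_FORK_MB = 100   # memory budgeted per ansible fork
FORKS_PER_CPU = 4       # CPU-based fork budget

def mem_capacity(total_mem_mb: int) -> int:
    """Fork capacity limited by memory."""
    return (total_mem_mb - MEM_RESERVE_MB) // MEM_PER_FORK_MB

def cpu_capacity(cpus: int) -> int:
    """Fork capacity limited by CPU count."""
    return cpus * FORKS_PER_CPU

# One of our 12 GB / 4-CPU nodes:
node_mem_mb = 12 * 1024
print(mem_capacity(node_mem_mb))  # 102 forks by memory
print(cpu_capacity(4))            # 16 forks by CPU
```

If those defaults hold, the instance’s effective capacity lands somewhere between the CPU-bound and memory-bound figures (depending on the capacity-adjustment setting), and with each job consuming several forks’ worth of impact, a ceiling of around 25 concurrent jobs per instance would be consistent with what we’re seeing.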
Any ideas, or known bugs tracking this issue? Any suggestions or workarounds?
Thanks,
-Cesar