Hi guys…
I’ve been trying to move our Docker setup to Kubernetes to benefit from the Instance Groups allocation feature, but we’re running into some problems so far.
- Our bare-metal Kubernetes cluster has 4 nodes (12 GB RAM / 4 CPUs each).
- We want to run one playbook across all 500 of our inventories once a day. Inventory host counts range from 2 to 100.
- We want to deploy 4 replicas (1 per node), but we’ve seen some odd behaviour where instances are not automatically provisioned or removed, so for now I’m deploying only 2 instances for testing, which has been more stable in that regard.
In AWX, I’ve created a new Instance Group for the playbook I want to run daily and assigned one instance to it, leaving the default tower instance group with the 2 provisioned instances.
I’ve explicitly set the job template I want to run to use the newly created instance group.
If I batch-schedule the playbook across all inventories, the scheduler runs the inventory syncs correctly using the default tower instance group, but when the time comes to run the playbook on the other instance group, the job fails with the explanation: "Task was marked as running in Tower but was not present in the job queue, so it has been marked as failed."
If I remove the new instance group and let AWX handle everything with the default instance group, it works: jobs are scheduled and executed correctly until the maximum schedulable job capacity is reached (25 in my case). After that, we’re completely blocked from running any other playbook until capacity is freed (the main reason we’re looking at this feature in the first place).
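For context, here’s a rough per-instance capacity estimate based on my reading of the documented Tower/AWX capacity algorithm. The 2048 MB system reserve, 100 MB per fork, and 4 forks per CPU are the defaults as I understand them from the docs; treat the exact constants as assumptions rather than verified values:

```python
# Sketch of the Tower/AWX instance capacity calculation
# (constants are the documented defaults, to my understanding).
MEM_RESERVE_MB = 2048   # memory held back for the system
MEM_PER_FORK_MB = 100   # memory budgeted per ansible fork
FORKS_PER_CPU = 4       # CPU-based fork budget

def mem_capacity(total_mem_mb: int) -> int:
    """Fork capacity limited by memory."""
    return (total_mem_mb - MEM_RESERVE_MB) // MEM_PER_FORK_MB

def cpu_capacity(cpus: int) -> int:
    """Fork capacity limited by CPU count."""
    return cpus * FORKS_PER_CPU

# One of our 12 GB / 4-CPU nodes:
node_mem_mb = 12 * 1024
print(mem_capacity(node_mem_mb))  # 102 forks by memory
print(cpu_capacity(4))            # 16 forks by CPU
```

If those defaults hold, the instance’s effective capacity lands somewhere between the CPU-bound and memory-bound figures (depending on the capacity-adjustment setting), and with each job consuming several forks’ worth of impact, a ceiling of around 25 concurrent jobs per instance would be consistent with what we’re seeing.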
Any ideas, or known bugs tracking this issue? Any suggestions or workarounds?
Thanks,
-Cesar