AWX Operator v0.13.0 and lots of AWX instances

We have been using AWX 19.0.0 with AWX Operator 0.13.0 for a while. The operator is installed in a dedicated namespace, and each member of our team has their own AWX instance in its own namespace. This worked fine until the number of AWX instances approached 20. I’ve noticed that instance spawning now takes quite a long time and is becoming unreliable. I’ve also noticed that if the operator pod is deleted, it appears to re-reconcile the current deployment state of every existing AWX instance, and while that is running it cannot deploy new AWX instances.

Is there a way to give the operator more resources so that it can manage a large number of instances?

I’ve seen that from AWX Operator 0.14.0 onwards, the operator must be installed in the same namespace as the AWX instance. I’m wondering whether one of the reasons for this change was to relieve the strain of the 1:N ratio between the operator and AWX instances, and whether, given the number of instances we run, moving to 0.14.0+ would be a good idea. Any thoughts?

Hi there,

Is there a reason for each person having their own instance of AWX? AWX has robust RBAC and user permission settings that I think would allow you to cut down on the number of dedicated instances, and therefore cut down on the stress on the operator. AWX wasn’t really designed to have one user per instance (although it certainly can), and maintaining the resources for each of those individual instances is quite a computational burden.
All that being said, if you could give me more info about how it’s becoming unreliable, that would be helpful.
You can certainly give the operator more dedicated capacity (advanced configuration info is here: https://github.com/ansible/awx-operator#advanced-configuration), but I think I need a bit more info to give you better/more specific help.
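
In the meantime, if you want to try raising the operator's own requests and limits, something like the following should work on a 0.13.0-style install. Note that the namespace, deployment, and container names below are assumptions for a typical install; verify yours with kubectl get deployments --all-namespaces first:

# Assumed names: adjust the namespace, deployment, and container to match your install
kubectl -n awx-operator set resources deployment awx-operator \
  --containers=awx-operator \
  --requests=cpu=250m,memory=256Mi \
  --limits=cpu=1000m,memory=1Gi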

Cheers
Beccah & The AWX Team

Hello,

Beccah brings up a good point. But if you are already locked into the “1 deployment per user” workflow, I would recommend customizing your container resources via the AWX spec.

spec:
  ee_resource_requirements:
    requests:
      cpu: 50m
      memory: 50Mb
  task_resource_requirements:
    requests:
      cpu: 50m
      memory: 50Mb
  web_resource_requirements:
    requests:
      cpu: 50m
      memory: 50Mb

I expect that if you were to describe the pods for your new deployments, you would see resource or taint errors showing that the pods are not being scheduled. By minimizing the resources requested per container in the pod, you should be able to squeeze some more instances out of your cluster. I also recommend reading this write-up on requests and limits: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits. The general advice is to keep your requests low and increase replicas as needed for more capacity.
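
To confirm that scheduling is what’s failing, check the events on a Pending pod in one of the new instance namespaces (the names in angle brackets are placeholders):

# List pods; new AWX pods stuck in Pending are the suspects
kubectl -n <awx-instance-namespace> get pods
# The Events section at the bottom will show FailedScheduling messages
# such as "Insufficient cpu" or "Insufficient memory"
kubectl -n <awx-instance-namespace> describe pod <pending-pod-name>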

I hope this helps,
Christian & The AWX Team

I just noticed there was a typo in the YAML I pasted above. For memory, the unit should be Mi, not Mb:

spec:
  ee_resource_requirements:
    requests:
      cpu: 50m
      memory: 50Mi
  task_resource_requirements:
    requests:
      cpu: 50m
      memory: 50Mi
  web_resource_requirements:
    requests:
      cpu: 50m
      memory: 50Mi

Also, I did a test deployment with cpu set to 10m (millicores) for all of these containers and was able to deploy and run a job without issue. Performance will of course vary with the hardware and resources your cluster actually has.
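
For context, here is roughly what a complete AWX resource with those trimmed requests could look like; the metadata values are example names only:

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-user1          # example instance name
  namespace: awx-user1     # example per-user namespace
spec:
  ee_resource_requirements:
    requests:
      cpu: 10m
      memory: 50Mi
  task_resource_requirements:
    requests:
      cpu: 10m
      memory: 50Mi
  web_resource_requirements:
    requests:
      cpu: 10m
      memory: 50Mi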

Thanks,
Christian
