Hi Team,
We deployed AWX on EKS and configured the Cluster Autoscaler and an HPA (Horizontal Pod Autoscaler).
When we increase the replica count, the new pods fail with "insufficient cpu", and no nodes are added: the node group does not scale out.
My question: is there anything else we need to add to the cluster for the nodes to autoscale?
Please help! Thanks in advance.
Best wishes & Regards,
AB
Hi AB,
At the moment, the awx-operator does not support being used with a HorizontalPodAutoscaler. The problem is that the operator tries to maintain the replicas value set in the spec (or the default of 1). So as the service comes under load, the HPA tries to scale up, but the operator’s reconciliation loop then overwrites the changes the HPA made.
One approach to supporting HPAs would be to set watchDependentResources to False in the watches.yaml, but we have other logic in the operator that depends upon that being true…
More info: https://master.sdk.operatorframework.io/docs/building-operators/ansible/reference/dependent-watches/
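For illustration only, here is a minimal sketch of what that setting could look like in watches.yaml. The version, group, kind and role values are assumptions about how the AWX entry is declared, not copied from the operator; check them against the watches.yaml shipped with your operator version. Since other operator logic relies on this being true, this is not a recommended change.

```yaml
# watches.yaml (sketch; the entry values below are assumed)
- version: v1beta1
  group: awx.ansible.com
  kind: AWX
  role: installer
  # Changes to dependent resources (such as the Deployment an HPA scales)
  # no longer trigger a reconcile. Reconciles triggered in other ways may
  # still reset the replica count to the value in the AWX spec.
  watchDependentResources: False
```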
We are interested in other potential solutions. For now, if you want to scale up, you can change the value for replicas on the AWX spec and the operator will reconcile that change.
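As a rough sketch, scaling manually means editing the AWX custom resource rather than the Deployment. The instance name and the replica count below are example values, and the apiVersion is an assumption that should match the CRD installed in your cluster.

```yaml
apiVersion: awx.ansible.com/v1beta1   # assumed; match your installed CRD
kind: AWX
metadata:
  name: awx                           # hypothetical instance name
spec:
  # The operator reconciles the deployment to this replica count.
  replicas: 4
```

Applying the file with kubectl apply -f should cause the operator to pick up the new value on its next reconcile.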
Thank you,
AWX Team
Thank you very much for responding and sharing great info regarding HPA.
We have set the replica count to 4 for AWX, but two of the pods go into the Pending state due to insufficient CPU.
We have also deployed and configured the Cluster Autoscaler and the metrics server in the EKS cluster, so we expected the node group to scale out, i.e. to add new nodes. Why isn’t that happening?
Can you please help us in this regard?
Thanks in advance,
AB
I am not sure about cluster autoscaling. It sounds like the AWX pods are hitting a resource constraint. You could try specifying lower CPU requests on the AWX spec: https://github.com/ansible/awx-operator#containers-resource-requirements
Honestly, the default values in that doc are pretty high in my opinion. We should probably change those to 50m each.
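A minimal sketch of lowering the CPU requests to the 50m mentioned above. The field names web_resource_requirements and task_resource_requirements are assumptions based on the README section linked above; they have differed between operator versions, so confirm them for the version you run. The instance name is hypothetical.

```yaml
apiVersion: awx.ansible.com/v1beta1   # assumed; match your installed CRD
kind: AWX
metadata:
  name: awx                           # hypothetical instance name
spec:
  replicas: 4
  # Field names are assumed; check the README for your operator version.
  web_resource_requirements:
    requests:
      cpu: 50m
  task_resource_requirements:
    requests:
      cpu: 50m
```

With smaller requests, more AWX pods fit on the existing nodes, so fewer of them should be left Pending for lack of CPU.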
Thanks,
Christian