I’m trying to deploy AWX with awx-operator. When I apply my kustomization, everything starts up, but then all the pods are terminated and only postgres is restarted. The awx-task and awx-web deployments are left with replicas set to 0, and I have to manually scale them to 1 before those containers start. If I make any change to my config and reapply, the same thing happens. How can I get it to keep awx-task and awx-web running?
K8s 1.27 (k3s)
Kustomize.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Find the latest tag here: https://github.com/ansible/awx-operator/releases
  - github.com/ansible/awx-operator/config/default?ref=2.19.1
  # - secrets.yaml
  - tls.yaml
  - awx.yaml
# Set the image tags to match the git version from above
images:
  - name: quay.io/ansible/awx-operator
    newTag: 2.19.1
# Specify a custom namespace in which to install AWX
namespace: sea
This may be unrelated, but when I deleted my namespace and tried to recreate it, awx-task wouldn’t start at all. It waits endlessly for database migrations, but no migration job is ever created, so it seems completely stuck here.
{"level":"error","ts":"2024-09-17T14:21:20Z","msg":"Reconciler error","controller":"awx-controller","object":{"name":"awx","namespace":"sea"},"namespace":"sea","name":"awx","reconcileID":"1fa7a53a-ca3a-46db-9049-dcd78f6e1cbb","error":"event runner on failed","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
I think this is the problem, but I don’t know what to do about it.
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined\n\nThe error appears to be in '/opt/ansible/roles/installer/tasks/resources_configuration.yml': line 248, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Apply deployment resources\n ^ here\n"}
I saw an issue report on GitHub saying this was a problem when upgrading from 2.18 to 2.19, but I didn’t upgrade from 2.18; I installed 2.19.1 fresh. If I install version 2.18 instead, I don’t get this error, and the deployments get scaled back up after the initial “migration”.
Hi, I see the same error when trying to go from 2.15.0 to 2.19.1. I tried adding this to the spec in my awx-operator Helm deployment: web_manage_replicas: true
But the operator still reports it as undefined.
P.S. The awxs.awx.ansible.com CRD needs to be forcibly upgraded, or AWX completely redeployed.
Yes, the problem for me was that someone else had already installed an older awx-operator on the cluster with Helm, so the CRDs were wrong. Even when I updated the CRDs, the Helm installer would revert them while my AWX was deploying, so whether things worked was a bit random, depending on when the CRDs got broken again.
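For anyone who lands here with the same 'web_manage_replicas' is undefined error, here is a rough way to check whether the CRD installed in the cluster is older than the operator. This is only a sketch: crd_declares is a hypothetical helper I made up for illustration, and the reapply command it prints assumes the upstream repo's config/crd kustomization and the 2.19.1 tag from the original post, so adjust both to your setup.

```shell
#!/bin/sh
# Sketch only: crd_declares is a hypothetical helper, not part of kubectl or awx-operator.
# It succeeds when the CRD YAML read from stdin declares the given field as a schema key.
crd_declares() {
  grep -Eq "^[[:space:]]*$1:"
}

# Only query the cluster when kubectl is actually installed.
if command -v kubectl >/dev/null 2>&1; then
  if kubectl get crd awxs.awx.ansible.com -o yaml | crd_declares web_manage_replicas; then
    echo "CRD declares web_manage_replicas; look elsewhere for the cause."
  else
    # This branch is also reached if the CRD is missing or the cluster is unreachable.
    echo "CRD does not declare web_manage_replicas (stale or missing)."
    echo "Assuming the upstream config/crd kustomization and tag 2.19.1, it could be reapplied with:"
    echo '  kubectl apply --server-side --force-conflicts -k "github.com/ansible/awx-operator/config/crd?ref=2.19.1"'
  fi
fi
```

If an older Helm release of the operator is still installed on the cluster, it may revert the CRD again as described above, so that release probably has to be removed or upgraded first.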