I did a Kubernetes upgrade today (from 1.34.1 to 1.34.3), and now the operator-controller-manager won’t start completely.
Events:

```
Type     Reason     Age                    From               Message
----     ------     ----                   ----               -------
Normal   Scheduled  8m49s                  default-scheduler  Successfully assigned awxtest/awx-operator-controller-manager-687b856498-t5rxf to aks-agentpool2-21415998-vmss00000h
Normal   Pulling    8m48s                  kubelet            Pulling image "quay.io/ansible/awx-operator:2.19.1"
Normal   Pulled     8m14s                  kubelet            Successfully pulled image "quay.io/ansible/awx-operator:2.19.1" in 33.948s (33.948s including waiting). Image size: 203955289 bytes.
Normal   Created    8m12s                  kubelet            Created container: awx-manager
Normal   Started    8m12s                  kubelet            Started container awx-manager
Normal   Pulling    5m22s (x5 over 8m48s)  kubelet            Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0"
Warning  Failed     5m22s (x5 over 8m48s)  kubelet            Failed to pull image "gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0": rpc error: code = NotFound desc = failed to pull and unpack image "gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0": failed to resolve reference "gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0": gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0: not found
Warning  Failed     5m22s (x5 over 8m48s)  kubelet            Error: ErrImagePull
Normal   BackOff    3m45s (x18 over 8m11s) kubelet            Back-off pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0"
Warning  Failed     3m45s (x18 over 8m11s) kubelet            Error: ImagePullBackOff
```
Is this a temporary problem on the side of gcr.io or is this version of kube-rbac-proxy no longer supported?
And what are the exact consequences if this proxy can’t start? The web and task pods are running, automation-job pods are created when running a job, and as far as I can see everything seems to work fine.
The kube-rbac-proxy image on gcr.io has already been discontinued, so you should replace it with the drop-in replacement from another registry, such as quay.io/brancz/kube-rbac-proxy.
I don’t know how you’ve deployed your AWX Operator, but if you’ve used a kustomization.yaml, modifying it as shown below may solve the issue.
Note that this proxy only provides RBAC for the /metrics endpoint of the AWX Operator.
In other words, if you don’t run any Prometheus in the same cluster to gather metrics from the AWX Operator (not from AWX itself), this proxy is not used at all in your environment.
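If you want to double-check whether anything in your cluster scrapes the Operator’s metrics, you can look for ServiceMonitor objects (this assumes you use the Prometheus Operator; if its CRDs are not installed, the command simply errors out, which itself tells you nothing is scraping that way):

```shell
# Lists all ServiceMonitors across namespaces; if none targets the
# awx-operator metrics Service, the kube-rbac-proxy sidecar is unused.
kubectl get servicemonitors --all-namespaces
```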
```yaml
...
resources:
  - github.com/ansible/awx-operator/config/default?ref=2.19.1
images:
  - name: quay.io/ansible/awx-operator
    newTag: 2.19.1
  # 👇👇👇 These lines are no longer required; you can comment (or wipe) them out.
  # - name: gcr.io/kubebuilder/kube-rbac-proxy
  #   newName: quay.io/brancz/kube-rbac-proxy
# 👇👇👇 Instead, add the following patch.
patches:
  - target:
      kind: Deployment
      name: awx-operator-controller-manager
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: awx-operator-controller-manager
      spec:
        template:
          spec:
            containers:
              - name: kube-rbac-proxy
                $patch: delete
```
The above patch removes the kube-rbac-proxy container from the AWX Operator’s Deployment, so the Pod no longer includes it.
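After updating kustomization.yaml, re-applying it should roll out the Operator without the sidecar (assuming the kustomization lives in the current directory; your events show the Operator running in the awxtest namespace):

```shell
# Re-apply the kustomization, then confirm the pod becomes READY
# with only the awx-manager container, instead of waiting on kube-rbac-proxy.
kubectl apply -k .
kubectl -n awxtest get pods
```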
Strictly speaking, after applying this, the following resources also become unnecessary and can be deleted:
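In a default deployment these are the metrics Service and the three proxy-related RBAC objects. The names below assume the default `awx-operator-` name prefix and your `awxtest` namespace; verify the exact names in your cluster before deleting:

```shell
# These resources only existed to serve and protect the /metrics endpoint.
kubectl -n awxtest delete service awx-operator-controller-manager-metrics-service
kubectl delete clusterrolebinding awx-operator-proxy-rolebinding
kubectl delete clusterrole awx-operator-proxy-role
kubectl delete clusterrole awx-operator-metrics-reader
```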
If you’re concerned that these four deleted resources might come back when you redeploy the kustomization.yaml for the Operator, you can make the patch more comprehensive, though it gets a bit lengthy.
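For reference, Kustomize can also delete whole resources with a `$patch: delete` patch, not just containers. A sketch for one of the four resources (repeat a similar patch for the other three; the name again assumes the default `awx-operator-` prefix):

```yaml
patches:
  - target:
      kind: Service
      name: awx-operator-controller-manager-metrics-service
    patch: |-
      apiVersion: v1
      kind: Service
      metadata:
        name: awx-operator-controller-manager-metrics-service
      $patch: delete
```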