Operator 2.19.1 not starting completely

Hello,

I did a Kubernetes upgrade today (from 1.34.1 to 1.34.3), and now the operator-controller-manager won’t start completely.

Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  8m49s                   default-scheduler  Successfully assigned awxtest/awx-operator-controller-manager-687b856498-t5rxf to aks-agentpool2-21415998-vmss00000h
  Normal   Pulling    8m48s                   kubelet            Pulling image "quay.io/ansible/awx-operator:2.19.1"
  Normal   Pulled     8m14s                   kubelet            Successfully pulled image "quay.io/ansible/awx-operator:2.19.1" in 33.948s (33.948s including waiting). Image size: 203955289 bytes.
  Normal   Created    8m12s                   kubelet            Created container: awx-manager
  Normal   Started    8m12s                   kubelet            Started container awx-manager
  Normal   Pulling    5m22s (x5 over 8m48s)   kubelet            Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0"
  Warning  Failed     5m22s (x5 over 8m48s)   kubelet            Failed to pull image "gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0": rpc error: code = NotFound desc = failed to pull and unpack image "gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0": failed to resolve reference "gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0": gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0: not found
  Warning  Failed     5m22s (x5 over 8m48s)   kubelet            Error: ErrImagePull
  Normal   BackOff    3m45s (x18 over 8m11s)  kubelet            Back-off pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0"
  Warning  Failed     3m45s (x18 over 8m11s)  kubelet            Error: ImagePullBackOff

Is this a temporary problem on gcr.io's side, or is this version of kube-rbac-proxy no longer available?

And what are the exact consequences if this proxy can't start? The web and task pods are running, automation-job pods are created when I run a job, and as far as I can see everything works fine.

@ildjarn
Hi,

The kube-rbac-proxy image on gcr.io has been discontinued, so you should replace it with a drop-in replacement from another registry, such as quay.io/brancz/kube-rbac-proxy.
I don’t know how you’ve deployed your AWX Operator, but if you’ve used kustomization.yaml, adding the following to your configuration may solve the issue:

images:
  - name: quay.io/ansible/awx-operator
    newTag: 2.19.1
  - name: gcr.io/kubebuilder/kube-rbac-proxy     👈👈👈
    newName: quay.io/brancz/kube-rbac-proxy     👈👈👈

This proxy provides RBAC for the /metrics endpoint of the AWX Operator.
In other words, if you don't run Prometheus in the same cluster to gather metrics from the AWX Operator (not from AWX itself), this proxy is unused in your environment.
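For context, if you *did* want Prometheus to scrape those Operator metrics through the proxy, the wiring would look roughly like the sketch below. This is a minimal example assuming the Prometheus Operator's ServiceMonitor CRD is installed and the Operator runs in the `awx` namespace; the port name `https` and the label selector are assumptions based on the usual operator-sdk scaffolding, not something confirmed in this thread.

```yaml
# Hypothetical ServiceMonitor for the Operator's metrics service.
# Assumptions: namespace "awx", port name "https", label
# control-plane=controller-manager on the metrics Service.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: awx-operator-metrics
  namespace: awx
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
  endpoints:
    - port: https
      scheme: https
      # kube-rbac-proxy checks the scraper's ServiceAccount token via RBAC.
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
```

If nothing like this exists in your cluster, nothing is talking to the proxy.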

Yes, thank you, adding those two lines to my kustomization.yaml worked.
If I don’t use metrics, is there a way to completely disable the proxy?

@ildjarn

You can :slight_smile:

...
resources:
  - github.com/ansible/awx-operator/config/default?ref=2.19.1

images:
  - name: quay.io/ansible/awx-operator
    newTag: 2.19.1
  # 👇👇👇 These lines are no longer required; you can comment them out or remove them.
  # - name: gcr.io/kubebuilder/kube-rbac-proxy
  #   newName: quay.io/brancz/kube-rbac-proxy

# 👇👇👇 Instead, add the following patch
patches:
  - target:
      kind: Deployment
      name: awx-operator-controller-manager
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: awx-operator-controller-manager
      spec:
        template:
          spec:
            containers:
            - name: kube-rbac-proxy
              $patch: delete

The above patch modifies the AWX Operator's Pod so that the kube-rbac-proxy container is not included.
Strictly speaking, after applying this, the following resources also become unnecessary and can be deleted:

kubectl -n awx delete \
  service/awx-operator-controller-manager-metrics-service

kubectl delete \
  clusterrole/awx-operator-proxy-role \
  clusterrole/awx-operator-metrics-reader \
  clusterrolebinding/awx-operator-proxy-rolebinding

If you're concerned that these four deleted resources might come back when you redeploy the kustomization.yaml for the Operator, you can make the patch more comprehensive.
It would get a bit lengthy though :stuck_out_tongue:

Complete patch:
patches:
  - target:
      kind: Deployment
      name: awx-operator-controller-manager
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: awx-operator-controller-manager
      spec:
        template:
          spec:
            containers:
            - name: kube-rbac-proxy
              $patch: delete
  - target:
      kind: Service
      name: awx-operator-controller-manager-metrics-service
    patch: |-
      $patch: delete
      apiVersion: v1
      kind: Service
      metadata:
        name: awx-operator-controller-manager-metrics-service
  - target:
      kind: ClusterRole
      name: awx-operator-proxy-role
    patch: |-
      $patch: delete
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: awx-operator-proxy-role
  - target:
      kind: ClusterRole
      name: awx-operator-metrics-reader
    patch: |-
      $patch: delete
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: awx-operator-metrics-reader
  - target:
      kind: ClusterRoleBinding
      name: awx-operator-proxy-rolebinding
    patch: |-
      $patch: delete
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: awx-operator-proxy-rolebinding