Jobs are not following their output correctly after upgrade?

I’m running the AWX Operator on OpenShift, managing five instances and tracking the current release. The Operator upgraded itself, and my instances now report themselves as AWX 23.8.0. That’s all good. Everything functions as expected, except the follow function.

Running jobs no longer “follow” correctly. I launch a job, the GUI switches to the job page correctly, and the status button correctly reports it as running. I can watch the automation pod start, run, and stop in OpenShift, but the whole time the job page shows the job as running with no output, and it stays in that state indefinitely. If I switch to the Details tab, the status button still says running. If I then switch back to the Output tab, the status button keeps spinning and saying running, but the correct job output, showing success and closure, appears. So the Output tab has the right output while the status button is still spinning on running.

If I go back to the jobs tab, the job accurately shows successful. If I open the job again, the status button reports successful. Everything is working, except the follow function. I’ve restarted the deployments and I’ve bounced the pods.

Today, I saw a couple of jobs intermittently follow correctly. How do I even begin to see what’s happening here?
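For reference, this is my rough plan for digging in. As far as I understand, the follow view streams job output to the browser over a websocket served by the web pod, so that seems like the place to start. The instance/namespace names below are from my deployment; adjust for yours:

```shell
# Rough debugging sketch; "awx-i-sid" / "awx" are my instance name and
# namespace -- substitute your own.

# Check recent web pod logs for websocket handshake or streaming errors:
oc logs -n awx deployment/awx-i-sid-web --tail=200

# Follow the web pod live while relaunching a job, filtering for
# websocket/daphne-related lines:
oc logs -n awx deployment/awx-i-sid-web -f | grep -iE 'websocket|daphne|error'
```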

I’m thinking this says my Operator is correctly loaded, right?

 containerStatuses:
    - restartCount: 0
      started: true
      ready: true
      name: awx-manager
      state:
        running:
          startedAt: '2024-02-20T13:48:55Z'
      imageID: >-
        quay.io/ansible/awx-operator@sha256:0274c3ca399fde5a22c4d8ea4199f1474e62092acac788c9023e703d76b5ec2d
      image: 'quay.io/ansible/awx-operator:2.12.0'
      lastState: {}
      containerID: 'cri-o://7f39ab451165c30c1b5ada7d7a248711954b19b1d727d3b08ba8c0a5839d6a35'
    - restartCount: 0
      started: true
      ready: true
      name: kube-rbac-proxy
      state:
        running:
          startedAt: '2024-02-20T13:48:54Z'
      imageID: >-
        gcr.io/kubebuilder/kube-rbac-proxy@sha256:a3768b8f9d259df714ebbf176798c380f4d929216e656dc30754eafa03a74c41
      image: 'gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0'
      lastState: {}
      containerID: 'cri-o://fde493f417b7b32b4b7e106aa6324db8c5af2247b9bb2ddb675a54575ae19b91'
  qosClass: Burstable

Does this ring any bells for anyone?

{"level":"error","ts":"2024-02-20T18:52:19Z","msg":"Reconciler error","controller":"awx-controller","object":{"name":"awx-i-sid","namespace":"awx-operator"},"namespace":"awx-operator","name":"awx-i-sid","reconcileID":"637da168-d0cb-47e0-ab9e-c50c339e4c5a","error":"event runner on failed","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.

@kevin_codey
Hi, I don’t think this is the root cause of your follow issue, but I suggest you upgrade your Operator to 2.12.1.

2.12.0 has a known issue and is already marked as NOT RECOMMENDED.
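If you installed with the kustomize-based method from the Operator docs, bumping the pinned tag is usually enough. A minimal sketch, assuming the `awx-operator` namespace from your log line (adjust if yours differs):

```shell
# Sketch of pinning the Operator at 2.12.1 via the documented
# kustomize-based install.
cat > kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: awx-operator
resources:
  - github.com/ansible/awx-operator/config/default?ref=2.12.1
images:
  - name: quay.io/ansible/awx-operator
    newTag: 2.12.1
EOF
oc apply -k .
```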

For your Reconciler error, we’d like to see more logs from the Operator, not just that single line.
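For example (the deployment name below is the default from the kustomize-based install; adjust if your deployment is named differently):

```shell
# Capture a window of the reconcile loop around the error,
# not just the single error line:
oc logs -n awx-operator deployment/awx-operator-controller-manager \
  -c awx-manager --since=2h > awx-operator.log
```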


Thank you, @kurokobo.

We are trying a new method of installing AWX (as a tenant on OpenShift). We are starting with 2.12.1, which eliminates that variable. I am also going to use the more familiar way of creating secrets, which removes another variable; I suspect that’s the source of the reconcile error. Time will tell.
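As one example of pre-creating a secret the Operator would otherwise generate, here is a sketch for the admin password secret. The names are illustrative; `admin_password_secret` is the spec field the Operator docs use for this:

```shell
# Sketch: pre-create the admin password secret instead of letting the
# Operator generate one. "awx-i-sid" / "awx" are illustrative names.
oc create secret generic awx-i-sid-admin-password \
  -n awx --from-literal=password='changeme'

# Then reference it in the AWX custom resource:
#   spec:
#     admin_password_secret: awx-i-sid-admin-password
```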