Jobs are not following their output correctly after upgrade?

I’m running the AWX Operator on OpenShift, managing five instances and tracking the current release. The Operator upgraded itself, and my instances now report themselves as AWX 23.8.0. That’s all good. Everything functions as expected, except the follow function.

Running jobs no longer “follow” correctly. I launch a job, the GUI switches to the job page correctly, and the status button correctly reports it as running. I can watch the automation pod start, run, and stop in OpenShift, but the whole time the job page shows the job as running with no output, and it stays in that state indefinitely. If I switch to the Details tab, the status button still says running. If I then switch back to the Output tab, the status button keeps spinning and saying running, but the correct job output, showing success and closure, appears. So the Output tab has the right output while the status button is still spinning on running.

If I go back to the jobs tab, the job accurately shows successful. If I open the job again, the status button reports successful. Everything is working, except the follow function. I’ve restarted the deployments and I’ve bounced the pods.

Today, I saw a couple of jobs intermittently follow correctly. How do I even begin to see what’s happening here?
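For reference, this is my rough plan for digging in. As far as I understand, the follow view streams job output to the browser over a websocket served by the web pod, so that seems like the place to start. The instance/namespace names below are from my deployment; adjust for yours:

```shell
# Rough debugging sketch; "awx-i-sid" / "awx" are my instance name and
# namespace -- substitute your own.

# Check recent web pod logs for websocket handshake or streaming errors:
oc logs -n awx deployment/awx-i-sid-web --tail=200

# Follow the web pod live while relaunching a job, filtering for
# websocket/daphne-related lines:
oc logs -n awx deployment/awx-i-sid-web -f | grep -iE 'websocket|daphne|error'
```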

I’m thinking this says my Operator is correctly loaded, right?

 containerStatuses:
    - restartCount: 0
      started: true
      ready: true
      name: awx-manager
      state:
        running:
          startedAt: '2024-02-20T13:48:55Z'
      imageID: >-
        quay.io/ansible/awx-operator@sha256:0274c3ca399fde5a22c4d8ea4199f1474e62092acac788c9023e703d76b5ec2d
      image: 'quay.io/ansible/awx-operator:2.12.0'
      lastState: {}
      containerID: 'cri-o://7f39ab451165c30c1b5ada7d7a248711954b19b1d727d3b08ba8c0a5839d6a35'
    - restartCount: 0
      started: true
      ready: true
      name: kube-rbac-proxy
      state:
        running:
          startedAt: '2024-02-20T13:48:54Z'
      imageID: >-
        gcr.io/kubebuilder/kube-rbac-proxy@sha256:a3768b8f9d259df714ebbf176798c380f4d929216e656dc30754eafa03a74c41
      image: 'gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0'
      lastState: {}
      containerID: 'cri-o://fde493f417b7b32b4b7e106aa6324db8c5af2247b9bb2ddb675a54575ae19b91'
  qosClass: Burstable

Does this ring any bells for anyone?

{"level":"error","ts":"2024-02-20T18:52:19Z","msg":"Reconciler error","controller":"awx-controller","object":{"name":"awx-i-sid","namespace":"awx-operator"},"namespace":"awx-operator","name":"awx-i-sid","reconcileID":"637da168-d0cb-47e0-ab9e-c50c339e4c5a","error":"event runner on failed","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.

@kevin_codey
Hi, I don’t think this is the root cause of your follow issue, but I suggest you upgrade your Operator to 2.12.1.

2.12.0 has a known issue and is already marked as NOT RECOMMENDED.
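If you installed with the kustomize-based method from the Operator docs, bumping the pinned tag is usually enough. A minimal sketch, assuming the `awx-operator` namespace from your log line (adjust if yours differs):

```shell
# Sketch of pinning the Operator at 2.12.1 via the documented
# kustomize-based install.
cat > kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: awx-operator
resources:
  - github.com/ansible/awx-operator/config/default?ref=2.12.1
images:
  - name: quay.io/ansible/awx-operator
    newTag: 2.12.1
EOF
oc apply -k .
```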

For your Reconciler error, we’d like to see more logs from the Operator, not just that single line.
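For example (the deployment name below is the default from the kustomize-based install; adjust if your deployment is named differently):

```shell
# Capture a window of the reconcile loop around the error,
# not just the single error line:
oc logs -n awx-operator deployment/awx-operator-controller-manager \
  -c awx-manager --since=2h > awx-operator.log
```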


Thank you, @kurokobo.

We are trying a new method of installing AWX (as a tenant on OpenShift). We are starting with 2.12.1, which eliminates that variable. I am also going to use the more familiar way of creating secrets, which removes another variable; I suspect that’s the source of the reconcile error. Time will tell.
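As one example of pre-creating a secret the Operator would otherwise generate, here is a sketch for the admin password secret. The names are illustrative; `admin_password_secret` is the spec field the Operator docs use for this:

```shell
# Sketch: pre-create the admin password secret instead of letting the
# Operator generate one. "awx-i-sid" / "awx" are illustrative names.
oc create secret generic awx-i-sid-admin-password \
  -n awx --from-literal=password='changeme'

# Then reference it in the AWX custom resource:
#   spec:
#     admin_password_secret: awx-i-sid-admin-password
```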