AWX Jobs Constantly Failing

I have migrated AWX to a new K3s cluster, and find that I now get the following errors for almost every job run:

Error opening pod stream: Get "https://awx.k3s.net:10250/containerLogs/awx/automation-job-12053-blqxd/worker?follow=true": EOF

Sometimes there are no errors at all, the job simply has a status of failed with:

“No output for this job”

What could be causing this?

Also, I can see the automation pods being created; they are then terminated with the reason 'Killing'.

I cannot see any OOM or CPU errors; there is plenty of overhead resource-wise.
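For reference, this is roughly how I ruled out an OOM kill. The state block below is illustrative sample text in the shape `kubectl describe pod` prints, not output from my cluster; a pod killed by the OOM killer would show `Reason: OOMKilled` and exit code 137 there:

```shell
# Illustrative 'Last State' block in the format printed by `kubectl describe pod`.
# A pod killed by the OOM killer would show Reason: OOMKilled, Exit Code: 137.
last_state='Last State:  Terminated
  Reason:      Error
  Exit Code:   1'

if echo "$last_state" | grep -q 'OOMKilled'; then
  echo "killed by the OOM killer"
else
  echo "not an OOM kill - check kubelet logs instead (journalctl -u k3s)"
fi
```

On k3s the kubelet runs inside the k3s service itself, so `journalctl -u k3s` is where the port-10250 `containerLogs` errors would surface.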

I encountered this too, although on RKE2. What version of AWX are you running?

Glad to know I’m not the only one. Versions:

AWX - 21.2.0
Operator - 0.23.0
K3s - v1.23.7+k3s1

Hello!

A similar issue was reported here: https://github.com/ansible/awx/issues/12288

Sounds like an issue that occurs when too many jobs are running at the same time and resource limits are being hit.

I cannot see any OOM or CPU errors; there is plenty of overhead resource-wise.

What metrics are you using from the k8s engine to see overhead?

AWX Team

Hello

Thanks for getting back to me.

This appears to occur only when I run a playbook which requires dynamic inventories to be updated. The inventories (3 sources) appear to be updated one at a time (each inventory source creates a pod for the update, and the next source waits for the previous one before starting its own pod).

I am using k9s, which I believe gets its metrics from the k8s metrics API. This shows, prior to the playbook run, CPU at 5% and memory at 49%; there are no spikes when the errors occur.
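To put numbers on "plenty of overhead", this is how I read the figures. The line below is a made-up sample in the column layout `kubectl top nodes` prints (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%), with values matching the 5% / 49% above:

```shell
# Parse one sample line of `kubectl top nodes`-style output (columns assumed:
# NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); values mirror those above.
sample='k3s-node1   250m   5%   3900Mi   49%'

cpu_pct=$(echo "$sample" | awk '{gsub("%","",$3); print $3}')
mem_pct=$(echo "$sample" | awk '{gsub("%","",$5); print $5}')

echo "CPU headroom: $((100 - cpu_pct))%  memory headroom: $((100 - mem_pct))%"
```

With roughly 95% CPU and 51% memory headroom on the node, the kills don't look resource-driven.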

I must also mention that no other AWX tasks are running at the time.

Regardless, I have increased the memory/CPU of the nodes and alas the same issue occurs.

Thanks

Hi

Any idea where to take this?

Thanks

I have migrated AWX to a new K3s cluster

what k8s cluster (and version) were you using previously where AWX was working?

Anything special in your awx-demo.yml (or equivalent), feel free to copy and paste it here (remove sensitive info first please)

AWX Team

Unfortunately I can't remember the old k3s version, and that cluster is now gone.

I can say the old version of AWX was 19.5.0 (operator 0.15.0).

You can find my configurations in the github issue:

https://github.com/ansible/awx/issues/12549

Thanks

Hi Poloniuns,

Since we don’t have too much insight into your environment, are you able to try out Tower in another isolated k8s environment such as minikube?

If so, may we kindly ask that you do so, and let us know if the issue persists. Try running the demo OOTB project on a fresh instance of AWX. This might help us discern if the issue is within our product or an environmental issue.

Thanks,
AWX Team

Hi

Apologies for the delay.

To be frank, I don't see the point in doing this; I assume you already know that OOTB AWX works on a minikube instance.

What I was hoping for in raising this issue was some pointers as to where I might look to identify where the issue is coming from.

Thanks

I looked at your issue again. I solved my problem by upgrading the Kubernetes version, so I suggest you upgrade k3s to the latest version first.

Thanks, but I'm afraid I'm not able to upgrade k3s, as another app depends on the earlier version.

Then unfortunately I can't help you with that. The newer versions of AWX require a certain version of Kubernetes that is only provided by upgrading to the latest version of k3s.

The only option I have for you is to remain on AWX 19.5, which I believe is the version that works with your version of k3s, until you are ready to move up.
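Since nobody here has pinned down the exact floor, here is only a sketch of how you could compare the running version against a minimum using `sort -V`; the `v1.24.0` below is an invented placeholder, not a documented AWX requirement:

```shell
# Compare the running k3s Kubernetes version against a hypothetical minimum.
# 'need' is a made-up placeholder; no documented version floor is cited here.
have='v1.23.7'
need='v1.24.0'

# sort -V orders version strings numerically; the oldest sorts first.
oldest=$(printf '%s\n%s\n' "$have" "$need" | sort -V | head -n1)

if [ "$oldest" = "$have" ] && [ "$have" != "$need" ]; then
  echo "$have is older than $need"
else
  echo "$have satisfies $need"
fi
```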

Which version of k8s is required and why?

It's the latest one. I am just sharing my experience. I had an older version of Kubernetes, and as I upgraded AWX to later versions I increasingly found the errors the OP mentioned. I then read a changelog mentioning this and recommending an upgrade to a later Kubernetes version.

The moment I upgraded the problems I had immediately went away

Thanks, that is good information. Unfortunately now I need to find a way to downgrade AWX… I imagine this is not easy.

Does AWX keep a kubernetes version requirement? I can’t seem to find one.

I kept my AWX configuration as code using the redhat-cop controller configuration project. Then I just wiped my AWX, deployed a lesser version, and ran the controller configuration against my AWX installation. Boom. All my AWX configuration came back.

That’s my suggestion to you

Thanks for the tip. I see passwords aren't exported with the tool you mentioned (too many to repopulate), so I have opted to try upgrading the cluster instead.

AWX team - is there any documentation around supported kubernetes versions? I can’t find any myself.

Thanks