AWX Jobs Constantly Failing

I have migrated AWX to a new K3s cluster, and find that I now get the following errors for almost every job run:

Error opening pod stream: Get "https://awx.k3s.net:10250/containerLogs/awx/automation-job-12053-blqxd/worker?follow=true": EOF

Sometimes there are no errors at all, the job simply has a status of failed with:

“No output for this job”

What could be causing this?

Also, I can see the automation pods being created; they are then terminated with the reason 'Killing'.

I cannot see any OOM or CPU errors; there is plenty of overhead resource-wise.
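For reference, this is roughly how I ruled out an OOM kill. The state block below is illustrative sample text in the shape `kubectl describe pod` prints, not output from my cluster; a pod killed by the OOM killer would show `Reason: OOMKilled` and exit code 137 there:

```shell
# Illustrative 'Last State' block in the format printed by `kubectl describe pod`.
# A pod killed by the OOM killer would show Reason: OOMKilled, Exit Code: 137.
last_state='Last State:  Terminated
  Reason:      Error
  Exit Code:   1'

if echo "$last_state" | grep -q 'OOMKilled'; then
  echo "killed by the OOM killer"
else
  echo "not an OOM kill - check kubelet logs instead (journalctl -u k3s)"
fi
```

On k3s the kubelet runs inside the k3s service itself, so `journalctl -u k3s` is where the port-10250 `containerLogs` errors would surface.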

I encountered this too, although on RKE2. What version of AWX are you running?

Glad to know I’m not the only one. Versions:

AWX - 21.2.0
Operator - 0.23.0
K3s - v1.23.7+k3s1

Hello!

A similar issue was reported here: https://github.com/ansible/awx/issues/12288

Sounds like an issue that occurs when too many jobs are running at the same time and resource limits are being hit.

I cannot see any OOM or CPU errors; there is plenty of overhead resource-wise.

What metrics are you using from the k8s engine to see overhead?

AWX Team

Hello

Thanks for getting back to me.

This appears to occur only when I run a playbook which requires dynamic inventories to be updated. The inventories (3 sources) appear to be updated one at a time (each inventory source creates a pod for the update, and the next source waits for the previous one before starting its own pod).

I am using k9s, which I believe gets its metrics from the k8s metrics API. This shows, prior to the playbook run, CPU at 5% and memory at 49%; there are no spikes when the errors occur.
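To put numbers on "plenty of overhead", this is how I read the figures. The line below is a made-up sample in the column layout `kubectl top nodes` prints (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%), with values matching the 5% / 49% above:

```shell
# Parse one sample line of `kubectl top nodes`-style output (columns assumed:
# NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); values mirror those above.
sample='k3s-node1   250m   5%   3900Mi   49%'

cpu_pct=$(echo "$sample" | awk '{gsub("%","",$3); print $3}')
mem_pct=$(echo "$sample" | awk '{gsub("%","",$5); print $5}')

echo "CPU headroom: $((100 - cpu_pct))%  memory headroom: $((100 - mem_pct))%"
```

With roughly 95% CPU and 51% memory headroom on the node, the kills don't look resource-driven.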

I must also mention that no other AWX tasks are running at the time.

Regardless, I have increased the memory/CPU of the nodes and alas the same issue occurs.

Thanks

Hi

Any idea where to take this?

Thanks

I have migrated AWX to a new K3s cluster

what k8s cluster (and version) were you using previously where AWX was working?

Anything special in your awx-demo.yml (or equivalent), feel free to copy and paste it here (remove sensitive info first please)

AWX Team

Unfortunately I can't remember the old k3s version, and that cluster is now gone.

I can say the old version of AWX was 19.5.0 (operator 0.15.0).

You can find my configurations in the github issue:

https://github.com/ansible/awx/issues/12549

Thanks

Hi Poloniuns,

Since we don’t have too much insight into your environment, are you able to try out Tower in another isolated k8s environment such as minikube?

If so, may we kindly ask that you do so, and let us know if the issue persists. Try running the demo OOTB project on a fresh instance of AWX. This might help us discern if the issue is within our product or an environmental issue.

Thanks,
AWX Team

Hi

Apologies for the delay.

To be frank, I don't see the point in doing this; I assume you already know that OOTB AWX works on a minikube instance.

What I was hoping for in raising this issue was some pointers as to where I might look to identify where the issue is coming from.

Thanks

I looked at your issue again. I solved my problem by upgrading the Kubernetes version, so I suggest you upgrade k3s to the latest version first.

Thanks, but I'm afraid I'm not able to upgrade k3s, as another app depends on the earlier version.

Then unfortunately I can't help you with that. The newer versions of AWX require a certain version of Kubernetes that is only provided by upgrading to the latest version of k3s.

The only option I have for you is to remain on AWX 19.5, which I believe is the version that works with your version of k3s, until you are ready to move up.
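Since nobody here has pinned down the exact floor, here is only a sketch of how you could compare the running version against a minimum using `sort -V`; the `v1.24.0` below is an invented placeholder, not a documented AWX requirement:

```shell
# Compare the running k3s Kubernetes version against a hypothetical minimum.
# 'need' is a made-up placeholder; no documented version floor is cited here.
have='v1.23.7'
need='v1.24.0'

# sort -V orders version strings numerically; the oldest sorts first.
oldest=$(printf '%s\n%s\n' "$have" "$need" | sort -V | head -n1)

if [ "$oldest" = "$have" ] && [ "$have" != "$need" ]; then
  echo "$have is older than $need"
else
  echo "$have satisfies $need"
fi
```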

Which version of k8s is required and why?

It's the latest one. I am just sharing my experience. I had an older version of Kubernetes, and as I upgraded AWX to later versions I increasingly found the errors the OP mentioned. I then read a changelog mentioning this and recommending an upgrade to a later Kubernetes version.

The moment I upgraded the problems I had immediately went away

Thanks, that is good information. Unfortunately now I need to find a way to downgrade AWX… I imagine this is not easy.

Does AWX keep a kubernetes version requirement? I can’t seem to find one.

I kept my AWX configuration as code using the redhat-cop controller configuration project. Then I just wiped my AWX, deployed a lesser version, and ran the controller configuration against my AWX installation. Boom. All my AWX configuration came back.

That’s my suggestion to you

Thanks for the tip. I see passwords aren't exported with the tool you mentioned (too many to repopulate), so I have opted to try upgrading the cluster instead.

AWX team - is there any documentation around supported kubernetes versions? I can’t find any myself.

Thanks