Scheduled jobs failing to execute

Hi team,

I recently deployed version 21.4 of AWX using the 0.26 operator on a single-node K3s cluster.
Everything seems to be working as expected, but some scheduled jobs fail with this error:

"Stream error running pod: stdin: error dialing backend: EOF, stdout: http2: response body closed"

The job is scheduled to run every 5 minutes and runs a simple playbook that executes a Python script.
More concerning, though, is that this also occurs randomly when executing scheduled backup jobs.

Can anyone help identify what might be causing this issue?

Kind regards,
Michael

Hey there,
It sounds like a couple of other people have run into similar issues. It might be that your cluster is being restarted, since that error is a k3s error.
https://github.com/ansible/awx/issues/10219 ← the last comment in here has an explanation of what was going on with their setup, maybe it will help you out.
AWX Team

What version of k3s are you running? And have you upgraded AWX through multiple versions?

Hi,

I did see that GitHub thread. I ran “watch kubectl get pods -n awx” while the tasks were running, and all the pods stayed up. I also see the automation-job-xxxx container being created and started, but after a few seconds it terminates with this error.

Would the pods still stay up while the cluster is restarted?

Thanks for the help,

Michael

Hi Team,

I have been monitoring the k3s logs and noticed the following error when jobs fail:

log.go:195] http: TLS handshake error from 127.0.0.1:39512: tls: first record does not look like a TLS handshake

Has anyone seen this behavior?

Any pointers to what I should check?

Regards,

Hello,
We were able to locate a GitHub issue that seems similar to yours: https://github.com/kubernetes/dashboard/issues/2895. This resolution may prove to be helpful to you, as what you have described appears to be an issue with the Kubernetes configuration.
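
To see whether the handshake errors line up with the failed jobs, counting them in the k3s logs over a given time window may help. This is only a minimal sketch: the helper name count_tls_handshake_errors is hypothetical, and the usage lines assume k3s runs as a systemd unit named "k3s" and that AWX lives in the "awx" namespace.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print how many of them
# contain the TLS handshake error quoted above.
count_tls_handshake_errors() {
    # grep -c prints the count but exits non-zero when the count is 0,
    # hence the `|| true` so callers do not see a failure.
    grep -c 'TLS handshake error' || true
}

# Usage on the node (assumes the systemd unit is called "k3s"):
#   journalctl -u k3s --since "1 hour ago" --no-pager | count_tls_handshake_errors
# Recent events in the AWX namespace can be listed alongside it:
#   kubectl get events -n awx --sort-by=.lastTimestamp
```

If the count rises each time a scheduled job fails, that points at the node or API server rather than at AWX itself.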

Thanks,
AWX Team

Hi Team,

I was able to resolve this problem by downgrading to an earlier version of k3s (v1.21.9+k3s1).

Below are the commands I used.

To uninstall the (latest) version I was running (note that all persistent data is lost :slightly_frowning_face:):

/usr/local/bin/k3s-uninstall.sh

To install the version that works perfectly with the latest version of the operator (0.26.0) and AWX 21.4.0:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.9+k3s1 sh -
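
After reinstalling, it may be worth confirming that the pinned version is the one actually running. A minimal sketch, assuming the first line of `k3s --version` has the form "k3s version <version> (<commit>)"; the helper name k3s_version_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `k3s --version` output passed as $1
# reports exactly the version passed as $2.
k3s_version_matches() {
    # The first line is assumed to look like: k3s version v1.21.9+k3s1 (abcdef12)
    reported=$(printf '%s\n' "$1" | awk 'NR == 1 && $1 == "k3s" && $2 == "version" { print $3 }')
    [ "$reported" = "$2" ]
}

# Usage on the node (not executed here):
#   k3s_version_matches "$(k3s --version)" "v1.21.9+k3s1" && echo "pinned version is running"
```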

Hope this helps others.

Thanks for your time and help.

Regards,
