AWX job terminated - task timeout?

Hello,
I’m having an issue with a job template on AWX that terminated unexpectedly.
AKS version: 1.23.12
AWX version: 21.6.0

The automation job pod terminated on AKS after 5 minutes, while waiting for a task to finish, without any errors on AWX.
I can reproduce the issue every time I run a playbook with a task that takes a long time to execute (about 5 minutes… not so long :frowning: )

Is there a timeout for task execution? Why was the pod terminated?

And on AWX I see an error:

Any suggestions?

Thank you very much.

Elia

I’ve replicated the issue with a simple Ansible task:

- name: Sleep for seconds and continue with play
  wait_for:
    timeout: 600
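
For anyone who wants to reproduce this, the task can be wrapped in a complete playbook. This is a minimal sketch: the play name, `hosts: localhost`, and `gather_facts: false` are assumptions added for illustration and were not part of the original job template.

```yaml
---
# Hypothetical wrapper play; only the wait_for task comes from the original report.
- name: Reproduce the job pod termination
  hosts: localhost
  gather_facts: false
  tasks:
    # With no host, port, or path given, wait_for simply sleeps for `timeout` seconds.
    - name: Sleep for seconds and continue with play
      wait_for:
        timeout: 600
```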

After 5 minutes the pod was terminated and the automation job failed.
Is there any setting to increase this timeout?

Thank you very much for your help.

Elia

This is a known issue in AKS and has been reported in the AWX GitHub issues. See this comment, which describes the problem: https://github.com/ansible/awx/issues/12530#issuecomment-1279364075

This PR may help https://github.com/ansible/receptor/pull/683

The latest awx-ee image already includes this fix, but for the fix to take effect you’ll need to be on one of the k8s versions specified here: https://github.com/ansible/receptor/pull/683/files#diff-792611baeb730234abface9c9bc33204ea86453469e7fcdfbe8de8d4e04f2598R659

Please try it and let us know if it fixes the 5 minute idle timeout issue you are experiencing.
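
To check whether the failure really depends on the job being idle rather than on a hard limit on task duration, one option is to keep the total wait at 600 seconds but split it into shorter sleeps, so the job prints output along the way. This is only a diagnostic sketch, not the fix from the PR above: the play, the 60-second interval, and the loop are assumptions, and it relies on Ansible reporting each loop item as it completes.

```yaml
---
# Hypothetical diagnostic play: same total duration as the failing task,
# but a result line should be emitted after every 60-second slice.
- name: Check whether regular output avoids the termination
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Sleep in 60-second slices (10 x 60 = 600 seconds in total)
      wait_for:
        timeout: 60
      # range(10) yields the items 0..9, so the sleep runs ten times.
      loop: "{{ range(10) | list }}"
```

If this variant runs to completion while the single 600-second sleep fails, that points to the idle timeout rather than a task execution limit.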

AWX Team

Thanks for the reply.
Meanwhile I’ve found and tested a workaround for this AKS issue with AWX: https://github.com/ansible/awx/issues/12530#issuecomment-1192616101

Unfortunately my AKS cluster is on 1.23.12, and the Azure portal does not allow me to upgrade it to 1.23.14 or any of the other versions specified.

So I can’t test it right now… but I hope it will fix the issue. As soon as I can update the cluster to one of the required versions I’ll report back.

Elia