Hello,
I’m having an issue with a job template on AWX that terminates unexpectedly.
AKS version: 1.23.12
AWX version: 21.6.0
It’s a long automation: after about 31 minutes and roughly 7,000 executed tasks it fails every time, as you can see from the picture.
No errors are displayed in the stdout view of the AWX UI.
Running:
kubectl logs awx-c7647dbf5-2mt88 -n awx -c awx-task | grep error
I see this error:
2022-10-20 11:24:53,613 WARNING [a5fc6bd17e2146a883b741b0bd8a3bbd] awx.main.dispatch job 142359 (error) encountered an
error (rc=None), please see task stdout for details.
I’ve also tried to get the logs directly from the automation pod:
kubectl logs automation-job-142359-nxxpb -n awx -f
The automation stops without any error being displayed, and the automation pod is destroyed on AKS.
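Next I plan to check the namespace events while the job is running, since an eviction or OOM kill should show up there (the pod name is from my failed run above):

```shell
# Recent events in the awx namespace, newest last; evictions, OOM kills and
# node-pressure terminations are all reported here:
kubectl get events -n awx --sort-by=.lastTimestamp | tail -50

# While the job is still running, inspect the pod spec/status and resource usage:
kubectl describe pod automation-job-142359-nxxpb -n awx
kubectl top pod automation-job-142359-nxxpb -n awx
```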
This is the last line before the crash:
{"uuid": "5fce5d17-9f63-4705-bf05-adf9b4b2a73b", "counter": 26487, "stdout": "", "start_line": 24454, "end_line": 24454, "runner_ident": "142359", "event": "runner_on_start", "job_id": 142359, "pid": 21, "created": "2022-10-20T11:19:39.658349", "parent_uuid": "02901256-661c-2a55-e472-000000001bc8", "event_data": {"playbook": "log_query_for_all_cids_from_appservers.yml", "playbook_uuid": "ef3e2675-daf8-4ac4-84db-9b797d3ffc6e", "play": "log query on tse db for all customers", "play_uuid": "02901256-661c-2a55-e472-000000000066", "play_pattern": "res_alyante_srv_grp_prod", "task": "set facts", "task_uuid": "02901256-661c-2a55-e472-000000001bc8", "task_action": "set_fact", "task_args": "", "task_path": "/runner/project/roles/common/exec_query_with_logs_on_win_vm/tasks/log_query.yml:2", "role": "exec_query_with_logs_on_win_vm", "host": "aly110-srv", "uuid": "5fce5d17-9f63-4705-bf05-adf9b4b2a73b"}}
What could be the problem? Why is the automation-job pod destroyed unexpectedly? Could it be something like a timeout after about 31 minutes?
What actually runs the command that deletes the automation-job pod?
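In case the pod is being deleted by the cluster rather than by AWX, I will also try watching the pod status live to catch the termination reason before the pod object disappears (same pod name as above, from my run):

```shell
# Watch the pod until it terminates; the STATUS column should briefly show
# a reason such as OOMKilled / Error / Completed before the pod is removed:
kubectl get pod automation-job-142359-nxxpb -n awx -w

# If the pod object survives long enough, the container exit reason is in:
kubectl get pod automation-job-142359-nxxpb -n awx \
  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
```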
Thank you for your help.