Any help here?
I have seen the related #12297 and have now set RECEPTOR_KUBE_SUPPORT_RECONNECT to off according to #13380 (comment), but it does not seem to help.
Hello @mogamal1 welcome to the Ansible Community Forum!
Could you kindly provide a bit more information about your issue? This would greatly assist us in pinpointing the root cause of the problem.
What error do you get on AWX?
Are you using the default control plane EE on AWX, or did you just build your own with ansible-builder? What ansible-core version does it run?
Knowing the AWX version may also help.
Also, if you could share your role’s implementation, or at least part of it, that would be helpful.
I’m guessing you set up the credentials and user elevation properly on the AWX job template, since I understand from your post that the kernel update task did finish successfully. Is that right?
For debugging purposes, I recommend running the role or playbook from the command line interface (CLI). This way, you can check whether the issue is connected to your AWX setup. If it is not, something else may be affecting the role’s execution. I’m sharing a little test I performed using ansible-navigator; see if it helps with your debugging (feel free to use my EE if you need to, it’s publicly available on quay.io):
Playbook:
---
- name: Update Kernel and reboot RHEL 8.6
  hosts: all
  gather_facts: false
  remote_user: root
  tasks:
    - name: Install new RHEL Kernel
      ansible.builtin.dnf:
        enablerepo:
          - rhel-8-for-x86_64-baseos-rpms
        name:
          - kernel.x86_64
        update_only: true
      notify: Reboot RHEL Host
  handlers:
    - name: Reboot RHEL Host
      ansible.builtin.reboot:
        pre_reboot_delay: 10
        msg: "System will be rebooting in 10 seconds..."
        reboot_command: reboot
...
PS: Just a heads up—I’m planning to run the same playbook from my AWX host in the meantime. However, it might take a bit longer because my lab server is currently undergoing maintenance. So, I’ll get to it as soon as I can.
This issue seems to happen on any task that takes more than 10 minutes to execute, even a long-running dnf install ... (one that intentionally takes more than 10 minutes to install).
AWX version: 23.0.0
Ansible-core version: 2.14.9
AWX task POD output:
2023-12-13 16:48:30,374 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.analytics.job_lifecycle job-7790 waiting
2023-12-13 16:48:31,288 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.analytics.job_lifecycle job-7790 pre run
2023-12-13 16:48:31,452 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.analytics.job_lifecycle job-7790 notifications sent
2023-12-13 16:54:59,484 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.analytics.job_lifecycle job-7790 preparing playbook
2023-12-13 16:55:02,456 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.analytics.job_lifecycle job-7790 running playbook
2023-12-13 16:55:11,892 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.analytics.job_lifecycle job-7790 work unit id received
2023-12-13 16:55:11,993 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.analytics.job_lifecycle job-7790 work unit id assigned
2023-12-13 17:04:07,233 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 7790
2023-12-13 17:04:07,276 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.analytics.job_lifecycle job-7790 post run
2023-12-13 17:04:12,877 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.analytics.job_lifecycle job-7790 finalize run
2023-12-13 17:04:13,173 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.analytics.job_lifecycle job-7790 notifications sent
2023-12-13 17:04:13,849 WARNING [08f4137fcb424101ae2f2037d88b5c57] awx.main.dispatch job 7790 (error) encountered an error (rc=None), please see task stdout for details.
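To make the timing in a log like this easier to read, here is a minimal Python sketch that measures the gap between "work unit id assigned" and "Starting EOF event processing". The function names (`parse_ts`, `gap_seconds`) are made up for illustration, and the sample input is just two lines copied from the output above; it assumes every line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp.

```python
from datetime import datetime

# Two lines copied from the task pod output above.
SAMPLE_LOG = """\
2023-12-13 16:55:11,993 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.analytics.job_lifecycle job-7790 work unit id assigned
2023-12-13 17:04:07,233 INFO [08f4137fcb424101ae2f2037d88b5c57] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 7790
"""


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def gap_seconds(log_text, start_marker, end_marker):
    """Seconds between the first line containing start_marker and the
    first later line containing end_marker; None if either is missing."""
    start = None
    for line in log_text.splitlines():
        if start is None and start_marker in line:
            start = parse_ts(line)
        elif start is not None and end_marker in line:
            return (parse_ts(line) - start).total_seconds()
    return None


gap = gap_seconds(SAMPLE_LOG, "work unit id assigned", "Starting EOF event processing")
print(f"{gap:.2f} s between work unit assignment and EOF processing")
```

For job 7790 this gives roughly 535 seconds, i.e. just under 9 minutes between the work unit being assigned and the callback receiver seeing EOF.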
Thanks for the proposal. Setting K8S Ansible Runner Keep-Alive Message Interval to 180 seems to have solved this issue for tasks that take more than 10 minutes to execute.
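For anyone who prefers to change this setting through the API rather than the UI, here is a rough sketch of the request. It rests on assumptions that should be checked against your own instance: that the UI field corresponds to the setting key `AWX_RUNNER_KEEPALIVE_SECONDS`, that it is exposed under `/api/v2/settings/jobs/`, and that a bearer token is used for authentication. The host name, token, and helper name `build_keepalive_request` are placeholders. The code only builds the request and does not send it.

```python
import json
import urllib.request


def build_keepalive_request(base_url, token, seconds=180):
    """Build (but do not send) a PATCH request for the keep-alive interval.

    Assumption: the setting key and endpoint below match your AWX version;
    verify them by browsing /api/v2/settings/ on your instance first.
    """
    body = json.dumps({"AWX_RUNNER_KEEPALIVE_SECONDS": seconds}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v2/settings/jobs/",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder host and token, for illustration only.
req = build_keepalive_request("https://awx.example.com", "REPLACE_ME")
print(req.get_method(), req.full_url)
# To actually apply it: urllib.request.urlopen(req)
```

Keeping the request construction separate from sending makes it easy to inspect the URL and payload before touching a live instance.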
We do have another error sample where the exact same rows are printed in the task POD log, not after nearly 9 minutes but after about 20 seconds:
2024-01-05 20:10:45,652 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.analytics.job_lifecycle job-7972 waiting
2024-01-05 20:10:46,063 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.analytics.job_lifecycle job-7972 pre run
2024-01-05 20:10:46,213 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.analytics.job_lifecycle job-7972 notifications sent
2024-01-05 20:16:56,678 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.analytics.job_lifecycle job-7972 preparing playbook
2024-01-05 20:16:58,622 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.analytics.job_lifecycle job-7972 running playbook
2024-01-05 20:17:25,924 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.analytics.job_lifecycle job-7972 work unit id received
2024-01-05 20:17:26,009 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.analytics.job_lifecycle job-7972 work unit id assigned
2024-01-05 20:17:45,313 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 7972
2024-01-05 20:17:45,424 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.analytics.job_lifecycle job-7972 post run
2024-01-05 20:17:50,787 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.analytics.job_lifecycle job-7972 finalize run
2024-01-05 20:17:50,945 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.analytics.job_lifecycle job-7972 notifications sent
2024-01-05 20:17:52,009 WARNING [a5d9366112384c909ab58a74d02f6ac6] awx.main.dispatch job 7972 (error) encountered an error (rc=None), please see task stdout for details.
2024-01-05 20:17:26,009 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.analytics.job_lifecycle job-7972 work unit id assigned
2024-01-05 20:17:45,313 INFO [a5d9366112384c909ab58a74d02f6ac6] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 7972
Do you know of any other case that leads to the following?
awx.main.commands.run_callback_receiver Starting EOF event processing for Job 7972
Hey @yhzs8, were you able to resolve the initial issue that you ran into? If this new error is separate from the original one, can you create a new thread so that we don’t get our wires crossed?