Specific command results in 'unreachable' error

Hi everyone!

I am creating roles for automatic patching of machines. As part of the patching, we also clean up old kernels after a reboot. However, the command I use for that doesn’t work and consistently results in an “unreachable” error.

Here are the role’s tasks:

---
- name: Reboot machine after updates have taken place
  ansible.builtin.reboot:

- name: Determine active kernel
  ansible.builtin.command:
    cmd: "uname -r"
  changed_when: false
  register: cleanup_kernels_active_kernel

- name: Determine newest available kernel
  ansible.builtin.shell:
    cmd: "set -o pipefail && dnf search --showduplicates kernel-core | tail -n 1"
  changed_when: false
  register: cleanup_kernels_newest_kernel

- name: Check if the current kernel is the newest kernel
  ansible.builtin.fail:
  when: cleanup_kernels_active_kernel.stdout not in cleanup_kernels_newest_kernel.stdout

- name: Remove old kernel and module files
  ansible.builtin.command:
    cmd: dnf remove --oldinstallonly --setopt installonly_limit=2 kernel
  changed_when: false

All tasks run normally except the final one:

TASK [company.update_roles.cleanup_kernels : Reboot machine after updates have taken place] ***
changed: [devweb1001]
changed: [devapp1001]
TASK [company.update_roles.cleanup_kernels : Determine active kernel] ***********
ok: [devweb1001]
ok: [devapp1001]
TASK [company.update_roles.cleanup_kernels : Determine newest available kernel] ***
ok: [devapp1001]
ok: [devweb1001]
TASK [company.update_roles.cleanup_kernels : Check if the current kernel is the newest kernel] ***
skipping: [devweb1001]
skipping: [devapp1001]
TASK [company.update_roles.cleanup_kernels : Remove old kernel and module files] ***
fatal: [devapp1001]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to 10.0.0.19 closed.", "unreachable": true}
fatal: [devweb1001]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to 10.0.0.18 closed.", "unreachable": true}

I have investigated the issue and have some observations:

  1. The other tasks succeed, and the firewall shows no blocked traffic between the controller and the hosts, which rules out connection issues.
  2. Running the command manually works fine, and rerunning the playbook gets as far as the final step, which rules out a lockout.
  3. Replacing the command with a simple echo works, so the issue is tied to this specific command.
  4. Running the command with the shell module produces the same behaviour, so the command module is not at fault either.
  5. Running this against other machines produces the same behaviour. All machines run RHEL 9 with most of the CIS-recommended hardening applied.
  6. Running with the shell module leaves a Python script generated by Ansible behind on the host in /home/user/.ansible/tmp/ansible-tmp-<long-number>/AnsiballZ_command.py. Running this script manually hangs the shell. I suspect this may be (part of) the cause, but I might also be executing it incorrectly.

As for our environment: we run Ansible in a container based on alpine/ansible:2.20.0.

Does anyone have an inkling as to what may be causing these issues? I am happy to provide more information, but at this point I just don’t know what else to look for.

Is it waiting for user confirmation? The command doesn’t include --assumeyes/-y.
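A minimal sketch of the corrected final task, assuming the rest of the role stays as posted: adding `--assumeyes` lets dnf proceed without stopping at its interactive `Is this ok [y/N]:` prompt. The task-level `timeout` keyword is an optional extra, and the 600-second value is an arbitrary assumption; it makes a future hang fail the task after a bounded time instead of stalling indefinitely.

```yaml
- name: Remove old kernel and module files
  ansible.builtin.command:
    # --assumeyes answers dnf's confirmation prompt non-interactively
    cmd: dnf remove --assumeyes --oldinstallonly --setopt installonly_limit=2 kernel
  changed_when: false
  # Optional, assumed value: interrupt and fail the task if it runs longer than 600 seconds
  timeout: 600
```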

Man, that was a rookie mistake on my part. Thank you for catching it!