I have a host where Ansible used to run correctly.
A while ago, every Ansible run against this host started to hang at the Gathering Facts step.
I searched online for a solution and found some tips on how to debug this.
So I ran Ansible with ANSIBLE_KEEP_REMOTE_FILES=1 and used
python3 AnsiballZ_setup.py explode
to unpack the module on the host and look into it further.
Running the module under strace did not yield anything useful, so I set the gather_subset option to !all,!min and enabled the collectors one by one.
Testing every collector in the min subset this way, I found three that never terminate: env, service_mgr, and ssh_pub_keys.
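For anyone repeating this, the one-collector-at-a-time test can be scripted. The sketch below is only an illustration, not what was actually run here: it drives the ad hoc ansible command with the setup module instead of the exploded AnsiballZ module, and it flags any collector whose run does not finish within a timeout. The host name and the 60-second default timeout are hypothetical placeholders.

```python
import subprocess
from typing import Callable, List, Sequence


def ansible_setup_cmd(host: str, collector: str) -> List[str]:
    """Build an ad hoc `setup` call that gathers a single collector.

    `!all,!min` switches off the default subsets, so only `collector` runs.
    """
    return ["ansible", host, "-m", "setup",
            "-a", f"gather_subset=!all,!min,{collector}"]


def find_hanging(collectors: Sequence[str],
                 make_cmd: Callable[[str], List[str]],
                 timeout: float = 60.0) -> List[str]:
    """Run one command per collector and return the ones that time out."""
    hanging = []
    for name in collectors:
        try:
            # Output is discarded; only whether the run finishes matters here.
            subprocess.run(make_cmd(name), stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the local child at this point.
            hanging.append(name)
    return hanging
```

Usage, with a hypothetical host name: find_hanging(["env", "service_mgr", "ssh_pub_keys"], lambda c: ansible_setup_cmd("myhost", c)). Taking the command builder as a parameter keeps the timeout logic independent of Ansible. Note that killing the local ansible process may leave the remote module process running on the host, so check for leftovers with ps afterwards.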
I don’t know how to proceed from here. My searches point to stale network mounts as the main cause of this behavior, but the affected host has none.
I have tried rebooting and upgrading all packages, but the problem persists.
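To double-check the "no network mounts" assumption, the kernel’s mount table can be listed without touching the mounted filesystems themselves, so reading it should not block the way df or ls can on a stale mount. This is a minimal sketch assuming a Linux host with /proc/mounts; the set of "network" filesystem types below is an assumed, non-exhaustive list.

```python
from typing import List, Tuple

# Filesystem types treated as network mounts here (assumed, not exhaustive).
NETWORK_FSTYPES = {"nfs", "nfs4", "cifs", "smb3", "smbfs", "fuse.sshfs",
                   "glusterfs", "ceph", "9p"}


def network_mounts(mounts_text: str) -> List[Tuple[str, str, str]]:
    """Return (device, mount point, fstype) for network filesystems.

    `mounts_text` is the content of /proc/mounts: one mount per line with
    whitespace-separated fields device, mount point, fstype, options, ...
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype = fields[:3]
        if fstype in NETWORK_FSTYPES:
            found.append((device, mountpoint, fstype))
    return found
```

On the host, network_mounts(open("/proc/mounts").read()) returning an empty list would confirm that no such mounts are active. Mount points containing spaces appear with octal escapes (\040) in that file and are returned unescaped-as-is by this sketch.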
Is there a way to debug this further?
Simply turning off fact gathering is not an option, as we rely heavily on host-dependent facts in some of our roles.
Since the affected hosts are remotely managed physical devices, I can’t simply destroy them and spin up new ones.
I would therefore like to find at least the cause, if not a solution, so that we can avoid this in the future.
Do you have any ideas on how to further debug this?
Hi Sven,
I don’t have an Ansible-specific solution for you, other than making the Ansible output very verbose with -vvvv, but I would suggest looking for more traditional host-based answers.
If it used to work, when did it stop? What changed on the host around that time? Package updates and config changes are prime candidates.
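To narrow down "what changed around that time", the package manager’s log can be filtered to a date window. The sketch below assumes a Debian or Ubuntu host, where dpkg writes lines like "2024-03-10 12:00:01 upgrade openssh-server:amd64 <old> <new>" to /var/log/dpkg.log; the distribution of the affected host is not stated, and rotated or compressed logs are not handled. On RPM-based systems, rpm -qa --last gives a similar install-time listing.

```python
from datetime import datetime
from typing import List, Tuple

# dpkg actions that actually change the installed package set.
ACTIONS = {"install", "upgrade", "remove", "purge"}


def package_changes(log_text: str, start: datetime, end: datetime
                    ) -> List[Tuple[datetime, str, str]]:
    """Return (timestamp, action, package) for changes between start and end."""
    changes = []
    for line in log_text.splitlines():
        parts = line.split()
        # Skip "status", "configure", "startup" and malformed lines.
        if len(parts) < 4 or parts[2] not in ACTIONS:
            continue
        try:
            when = datetime.strptime(f"{parts[0]} {parts[1]}",
                                     "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
        if start <= when <= end:
            changes.append((when, parts[2], parts[3]))
    return changes
```

Feeding it the log text with a window of a few days before the first hanging run gives a short list of candidate upgrades to compare against the three collectors.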
Look into your security tooling (SELinux, AppArmor, or even anti-virus tools), particularly their log files, to see whether they interfere with the Ansible run or with those three collectors you mentioned.
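As a starting point for that log review, denial records can be pulled out and optionally narrowed to the processes involved. This sketch assumes the usual record formats: SELinux AVC records ("avc:  denied") in /var/log/audit/audit.log when auditd is running, and AppArmor denials (apparmor="DENIED") in the kernel log, for example from journalctl -k. The process names to filter on are your choice; python3 is only an illustrative guess at the interpreter running the module. For SELinux, ausearch -m avc does the same job natively.

```python
import re
from typing import Iterable, List, Optional, Set

# Matches SELinux AVC denials and AppArmor denial records.
DENIAL = re.compile(r'avc:\s+denied|apparmor="DENIED"')
# Extracts the command name that both record types carry as comm="...".
COMM = re.compile(r'\bcomm="([^"]*)"')


def denials(lines: Iterable[str], comms: Optional[Set[str]] = None) -> List[str]:
    """Return denial lines, optionally only those whose comm is in `comms`."""
    out = []
    for line in lines:
        if not DENIAL.search(line):
            continue
        match = COMM.search(line)
        if comms is None or (match is not None and match.group(1) in comms):
            out.append(line)
    return out
```

Any hit whose timestamp lines up with a hanging Gathering Facts run would be worth a closer look.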
Fall back to the system logs and look for any other odd behavior. Since it’s a physical host, does it have ECC or registered memory? Bad memory or a failing power supply can cause odd failures, particularly on a stressed system.
Regards,