We've observed intermittent cases where the first step of the Ansible playbook, which gathers facts, hangs for certain VMs. Excerpts of the playbook are given below. When the hang occurs, the debug statement that prints all the environment variables passed to the playbook, i.e. msg: "{{ env }}", is never printed. After killing the hung processes and retrying the operation, it goes through successfully.
Any pointers for debugging this issue would be appreciated, as it is causing serious concerns about the predictability of operations. Thank you in advance.
This is one of the "fun" issues that often appear with fact gathering. Most of the time it comes down to a problem with some hardware device, probably a mount or network issue.
To debug fact gathering, I would suggest using gather_subset to eliminate possible culprits and also setting a gather_timeout (which applies to mount facts). This should narrow down the possible culprits.
Another option is to strace/dtrace the process on the target machine while it is stuck, see what it is stuck on, and try to derive the source of the problem from there.
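As a rough sketch of the gather_subset approach, a play along these lines could help bisect the problem. It turns off implicit fact gathering and calls the setup module explicitly with the hardware subset (which includes mount facts) excluded. The host pattern `all` and the timeout value of 10 are assumptions for illustration, not taken from the original playbook.

```yaml
# Sketch only: "hosts: all" and the timeout value are placeholders.
- hosts: all
  gather_facts: false          # skip the implicit fact-gathering step
  tasks:
    - name: Gather facts without the hardware subset (which includes mounts)
      setup:
        gather_subset:
          - "!hardware"        # replace with "!network" to test the other suspect
        gather_timeout: 10     # seconds; per-collector timeout

    - name: Confirm the play got past fact gathering
      debug:
        msg: "Facts gathered on {{ inventory_hostname }}"
```

If the hang disappears with one subset excluded, that subset is the likely culprit. If it still hangs, try `"!all"` to collect only the minimal facts and add subsets back one at a time.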
Thank you, Jorn, for taking the time to respond. Yes, {{ env }} is not related to Ansible facts, but the point I was highlighting is that execution never reached that statement because it got stuck during the fact-gathering step.
Another observation: I could see only the following statement in /var/log/messages, and after it nothing is logged from the gather_facts task or from any of the shell commands in the playbook.
ansible-setup: Invoked with gather_timeout=10 gather_subset=['all'] filter=* fact_path=/etc/ansible/facts.d