Out of hundreds of hosts, we have one that always hangs when gathering facts with become.
To reproduce, create a playbook with the following:
- hosts: all
Then call the playbook with the -b flag.
The last thing reported in the log is “Escalation succeeded”.
No matter how long I wait, it never returns to the prompt, and two processes are left running on the remote host: one as my username and one as root.
There is no NFS (I have seen other issues where facts hang on NFS).
This is a RHEL5 host with Python 2.7 installed. A quarter of our hosts are RHEL5 with Python 2.7 installed, and they don’t have this issue.
Has anyone seen this before? Any ideas on how to troubleshoot it further?
I have tried the highest verbosity, but there doesn’t appear to be any helpful information; I compared the output to successful runs and nothing is different.
It is almost always I/O related: mounts, LVM, filesystem...
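If a stuck mount is the suspect, one way to check is to probe each mount point from a throwaway child process with a timeout, so a hang does not take the checking script down with it. This is only a rough sketch, not anything Ansible ships: the names parse_mount_points and probe are made up here, and it assumes the host exposes /proc/mounts and that a statvfs call is enough to trigger the hang.

```python
import os
import subprocess
import sys
import time


def parse_mount_points(mounts_text):
    """Return the mount point (second field) of each /proc/mounts line."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Note: /proc/mounts escapes spaces in paths as \040.
            points.append(fields[1])
    return points


def probe(path, timeout=5.0):
    """statvfs() the path in a child process; return 'ok', 'error' or 'hung'."""
    code = "import os, sys; os.statvfs(sys.argv[1])"
    with open(os.devnull, "w") as devnull:
        child = subprocess.Popen([sys.executable, "-c", code, path],
                                 stdout=devnull, stderr=devnull)
        deadline = time.time() + timeout
        while time.time() < deadline:
            status = child.poll()
            if status is not None:
                return "ok" if status == 0 else "error"
            time.sleep(0.1)
        # Best effort only: a child stuck in uninterruptible I/O may not die.
        child.kill()
        return "hung"


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        for mount_point in parse_mount_points(handle.read()):
            print("%-6s %s" % (probe(mount_point), mount_point))
```

Any mount point reported as "hung" would be the first place to look.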
Thank you. I ran strace, and it shows the process just repeating the same two lines over and over:
select(7, [4 6], [], [4 6], {1, 0}) = 0 (Timeout)
wait4(29548, 0x7fff6a145c84, WNOHANG, NULL) = 0
When I checked the file descriptors from the select call, I got the following:
lsof -p 10984 -ad 4,6
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF      NODE NAME
python2.7 10984 root    4r  FIFO    0,7      0t0 819132068 pipe
python2.7 10984 root    6r  FIFO    0,7      0t0 819132069 pipe
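For what it is worth, that pair of calls looks like a parent process polling a child: a one-second select on two read pipes (most likely the child's stdout and stderr), followed by a non-blocking wait4 on the child's pid. The sketch below reproduces that pattern so the trace is easier to read. It is an illustration only, not the actual Ansible code, and the function name run_and_poll is made up.

```python
import os
import select
import subprocess
import sys


def run_and_poll(argv, poll_interval=1.0):
    """Run argv and poll its output pipes, mimicking the loop seen in strace."""
    child = subprocess.Popen(argv, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
    open_pipes = [child.stdout, child.stderr]
    chunks = []
    timeouts = 0
    while open_pipes:
        # Appears in strace as: select(7, [4 6], [], [4 6], {1, 0})
        readable, _, _ = select.select(open_pipes, [], open_pipes,
                                       poll_interval)
        if not readable:
            # "= 0 (Timeout)": the child wrote nothing in this interval.
            timeouts += 1
        for pipe in readable:
            data = os.read(pipe.fileno(), 4096)
            if data:
                chunks.append(data)
            else:
                open_pipes.remove(pipe)  # EOF: the write end was closed
        # Appears in strace as: wait4(<child pid>, ..., WNOHANG, NULL)
        child.poll()
    child.wait()
    child.stdout.close()
    child.stderr.close()
    return child.returncode, b"".join(chunks), timeouts


if __name__ == "__main__":
    slow_child = [sys.executable, "-c",
                  "import time; time.sleep(3); print('done')"]
    print(run_and_poll(slow_child))
```

If that reading is right, the looping process is only the waiter. The process doing the actual work would be the child whose pid appears in wait4 (29548 in the trace above), so that may be the more useful one to strace.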
What happens right before it goes into this loop is probably the interesting part and can identify what it is trying to access.
If not, you probably need to add print statements in the Python code to identify where it hangs.
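A small helper along these lines could stand in for bare print statements. It appends to a file instead of printing, on the assumption that the module's stdout is captured by Ansible and should stay clean. The helper name trace and the log file name are hypothetical, and where to place the calls in the Python code is left open.

```python
import os
import tempfile
import time

# Hypothetical log location; any path writable on the remote host would do.
TRACE_PATH = os.path.join(tempfile.gettempdir(), "fact_trace.log")


def trace(message, path=TRACE_PATH):
    """Append a timestamped line and force it to disk before continuing."""
    line = "%s pid=%d %s\n" % (time.strftime("%H:%M:%S"), os.getpid(), message)
    with open(path, "a") as handle:
        handle.write(line)
        handle.flush()
        # fsync so the line survives even if the process hangs right after.
        os.fsync(handle.fileno())


if __name__ == "__main__":
    trace("before suspect step")
    trace("after suspect step")
```

Calling trace() before and after each suspect step means the last line in the file shows the last step that was reached before the hang.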