Hi all,
Ansible seems to hang on the gather facts phase.
Some details:
- The problem does not seem to be related to password prompts, and it was
not hanging this way until now anyway.
- The behavior seems 100% reproducible on LAN or localhost hosts but
*not* on WAN ones.
- The LAN hosts are VMs running on SmartOS; the WAN hosts are basically the
same, but running on bare metal or other virtualized environments.
- I’m using the devel version because I first hit the problem with
ansible 1.7 and tried to solve it by updating.
Here’s an excerpt of the verbose output:
<dev-vm01.local> ESTABLISH CONNECTION FOR USER: root
<dev-vm01.local> REMOTE_MODULE setup
<dev-vm01.local> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/Users/giorgio/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 dev-vm01.local /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431 && echo $HOME/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431'
<dev-vm01.local> PUT /var/folders/yq/ydckqkv92jz4dhhlvd1ry5yr0000gn/T/tmpya9Vyg TO /root/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431/setup
<dev-vm01.local> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/Users/giorgio/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 dev-vm01.local /bin/sh -c 'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /opt/local/bin/python /root/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431/setup; rm -rf /root/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431/ >/dev/null 2>&1'
Any help would be really appreciated.
I think the easiest thing to do here would be to check out the Ansible source on the machine you are managing and then run:
source ./hacking/env-setup
./hacking/test-module -m setup
And see if that hangs. If it does, it’s most likely something in the setup module rather than a connection issue, which I would say with a large degree of confidence is the case.
From there, we can help with some suggestions about how to figure out what part is taking so long to return. It might not be hanging but just taking a very long time, but we’ll see.
You’re right, it does not hang.
I retried from my control machine and the gathering facts phase does
complete, but only after a very long time, a minute or so.
How can I collect additional info in order to help you track down the
issue?
Giorgio Valoti <giorgio_v@me.com> writes:
I’ve tried to gather facts with the setup module from my machine against a
SmartOS zone. I get the facts back, but some are wrong, and I am quite
sure that this was *not* the case in the past.
Here are the most obvious errors:
ansible vm-01.local -i hosts/dev -u root -m setup
vm-01.local | success >> {
"ansible_facts": {
"ansible_distribution": "NA",
"ansible_distribution_major_version": "NA",
"ansible_distribution_release": "NA",
"ansible_distribution_version": "NA",
"ansible_os_family": "NA",
"ansible_pkg_mgr": "macports",
},
"changed": false
}
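One way to spot placeholder values like these quickly is to scan the returned facts for them. The following is only a minimal sketch and not part of Ansible: the helper name `unresolved_facts` is hypothetical, and the dictionary simply repeats the values from the excerpt above.

```python
# Hypothetical helper: list the fact names whose value is the "NA" placeholder.
def unresolved_facts(facts, placeholder="NA"):
    """Return the sorted names of facts whose value equals the placeholder."""
    return sorted(name for name, value in facts.items() if value == placeholder)


# Values copied from the setup output excerpt above.
facts = {
    "ansible_distribution": "NA",
    "ansible_distribution_major_version": "NA",
    "ansible_distribution_release": "NA",
    "ansible_distribution_version": "NA",
    "ansible_os_family": "NA",
    "ansible_pkg_mgr": "macports",
}

for name in unresolved_facts(facts):
    print(name)
```

In practice the dictionary would come from the `ansible_facts` key of the real setup output rather than being typed in by hand.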
Thanks for the info. How long does this operation take?
I’m curious whether you can debug it down to the individual methods that are slow. If not, that’s fine; maybe another SmartOS user could help.
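As a rough illustration of what timing individual methods could look like, here is a minimal sketch using a timing decorator. It makes assumptions throughout: `timed`, `timings`, and the two `gather_*` functions are hypothetical stand-ins written for this example, not real methods of the setup module.

```python
import functools
import time

timings = {}  # function name -> elapsed seconds for the most recent call


def timed(func):
    """Record how long each call to func takes, keyed by function name."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__name__] = time.perf_counter() - start
    return wrapper


# Hypothetical stand-ins for fact-gathering steps (not real setup module code).
@timed
def gather_network_facts():
    time.sleep(0.05)  # simulate a slow lookup
    return {"interfaces": []}


@timed
def gather_distribution_facts():
    return {"distribution": "NA"}


gather_network_facts()
gather_distribution_facts()

# Print the slowest step first.
for name, seconds in sorted(timings.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {seconds:.3f}s")
```

The same idea could be applied by temporarily wrapping the real fact-gathering methods in a checkout and running the module locally, so the slow step shows up at the top of the list.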
Michael DeHaan <michael@ansible.com> writes:
Thanks for info -- how long does this operation take?
A simple ssh command as a baseline:
time ssh root@dev-vm01.local ls -a
Warning: Permanently added 'dev-vm01.local,192.168.38.147' (RSA) to the list of known hosts.
.
..
.ansible
.bash_profile
.bashrc
.cshrc
.irbrc
.login
.profile
.ssh
.vimrc
ssh root@dev-vm01.local ls -a 0,01s user 0,00s system 0% cpu 1,235 total
And with ansible:
time ansible dev-vm01.local -i hosts/localhost-dev -u root -m setup
<lots of output...>
ansible dev-vm01.local -i hosts/localhost-dev -u root -m setup 0,18s user 0,17s system 0% cpu 2:05,37 total
Hope this helps
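The comparison above (about 1.2 seconds for plain ssh against just over two minutes for the setup run) can be repeated in a scripted way. This is a minimal sketch under stated assumptions: `time_command` is a hypothetical helper, and the command list holds a harmless placeholder so the example runs anywhere; the ssh and ansible command lines shown above would be substituted for it.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv, discard its output, and return (exit code, elapsed seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode, time.perf_counter() - start


# Placeholder command; replace with the real ssh and ansible argument lists.
commands = [[sys.executable, "-c", "pass"]]

for argv in commands:
    code, seconds = time_command(argv)
    print(f"{' '.join(argv)}: exit {code}, {seconds:.2f}s")
```

Running both commands several times this way would show whether the two-minute figure is consistent or varies between runs.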