Ansible 1.9 dev 34ec877fe3 hangs on gather facts phase

Hi all,

Ansible seems to hang on the gather facts phase.

Some details:

- The problem does not seem to be related to password prompts, and it
was not hanging this way until now anyway.
- The behavior seems 100% reproducible on LAN or localhost hosts but
*not* on WAN ones.
- LAN hosts are VMs running on SmartOS; WAN hosts are basically the same,
but running on bare metal or other virtualized environments.
- I’m using the devel version because I encountered the problem with
ansible 1.7 and tried to solve it by updating.

Here’s an excerpt of the verbose output:

<dev-vm01.local> ESTABLISH CONNECTION FOR USER: root
<dev-vm01.local> REMOTE_MODULE setup
<dev-vm01.local> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/Users/giorgio/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 dev-vm01.local /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431 && echo $HOME/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431'
<dev-vm01.local> PUT /var/folders/yq/ydckqkv92jz4dhhlvd1ry5yr0000gn/T/tmpya9Vyg TO /root/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431/setup
<dev-vm01.local> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/Users/giorgio/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 dev-vm01.local /bin/sh -c 'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /opt/local/bin/python /root/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431/setup; rm -rf /root/.ansible/tmp/ansible-tmp-1417279179.55-12830134400431/ >/dev/null 2>&1'

Any help would be really appreciated.

I think the easiest thing to do here would be to do a checkout on the machine you are managing and then:

source ./hacking/env-setup
./hacking/test-module -m setup

And see if that hangs. If it does, it’s most likely something in the setup module rather than a connection issue, which I would say with a large degree of confidence is the case.

From there, we can help with some suggestions about how to figure out which part is taking so long to return. It might not be hanging but just taking a very long time, but we’ll see.
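
One way to tell a genuine hang from a slow run is to execute the command under a time limit and record how long it took. The following is only a sketch: `time_command` is a hypothetical helper that is not part of Ansible, and the timeout values are arbitrary assumptions.

```python
import subprocess
import sys
import time


def time_command(cmd, timeout):
    """Run cmd and return (elapsed_seconds, finished).

    finished is False when the command was still running at the timeout,
    which points to a hang (or a run slower than the limit we chose).
    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    try:
        # Output is discarded; only the duration matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
        finished = True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        finished = False
    return time.monotonic() - start, finished


# Quick demonstration with a command that returns immediately.
elapsed, finished = time_command([sys.executable, "-c", "pass"], timeout=30)
print(f"finished={finished} after {elapsed:.2f}s")
```

The same helper could be pointed at the test-module command quoted above, for example `time_command(["./hacking/test-module", "-m", "setup"], timeout=300)`, run from the root of the checkout after sourcing `hacking/env-setup`.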

Michael DeHaan <michael@ansible.com> writes:

I think the easiest thing to do here would be to do a checkout on the machine
you are managing and then:

source ./hacking/env-setup
./hacking/test-module -m setup

And see if that hangs. If it does, it's most likely something in the
setup module, rather than a connection issue - which I would say with a
large degree of confidence is the case.

From there, we can help with some suggestions about how to figure out what
part is taking so long to return -- it might not be hanging but just taking
a very long time, but we'll see.

You’re right, it does not hang.

I retried from my control machine and the gathering facts phase does
complete after a very long time, a minute or so.

How can I collect additional info in order to help you track down the
issue?

Giorgio Valoti <giorgio_v@me.com> writes:

I’ve tried to gather facts with the setup module from my machine against a
SmartOS zone. I get the facts back, but some are wrong, and I am quite
sure that this was *not* the case in the past.

Here are the most obvious errors:

ansible vm-01.local -i hosts/dev -u root -m setup

vm-01.local | success >> {
    "ansible_facts": {
        "ansible_distribution": "NA",
        "ansible_distribution_major_version": "NA",
        "ansible_distribution_release": "NA",
        "ansible_distribution_version": "NA",
        "ansible_os_family": "NA",
        "ansible_pkg_mgr": "macports",
    },
    "changed": false
}
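
To list every fact that came back as a placeholder, a short script can scan the result. This is a sketch under assumptions: `placeholder_facts` is a hypothetical helper, and it expects the JSON body already parsed into a dict (the `vm-01.local | success >>` prefix is not JSON and has to be stripped first). The sample data is the excerpt above.

```python
import json


def placeholder_facts(setup_result, placeholder="NA"):
    """Return the sorted names of facts whose value equals the placeholder.

    Hypothetical helper: setup_result is the parsed JSON body printed by
    the setup module, with the "host | success >>" prefix already removed.
    """
    facts = setup_result.get("ansible_facts", {})
    return sorted(name for name, value in facts.items() if value == placeholder)


# Sample taken from the excerpt in this thread.
sample = json.loads("""
{
    "ansible_facts": {
        "ansible_distribution": "NA",
        "ansible_distribution_major_version": "NA",
        "ansible_distribution_release": "NA",
        "ansible_distribution_version": "NA",
        "ansible_os_family": "NA",
        "ansible_pkg_mgr": "macports"
    },
    "changed": false
}
""")

for name in placeholder_facts(sample):
    print(name)
```

A check like this only catches facts that were never detected. It would not flag `ansible_pkg_mgr`, which has a value but is also listed among the wrong ones.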

Thanks for the info. How long does this operation take?

I’m curious whether you can debug this to find which individual methods are slow. If not, that’s fine; maybe another SmartOS user could help.

Michael DeHaan <michael@ansible.com> writes:

Thanks for info -- how long does this operation take?

A simple ssh command as a baseline:

time ssh root@dev-vm01.local ls -a

Warning: Permanently added 'dev-vm01.local,192.168.38.147' (RSA) to the list of known hosts.
.
..
.ansible
.bash_profile
.bashrc
.cshrc
.irbrc
.login
.profile
.ssh
.vimrc
ssh root@dev-vm01.local ls -a 0,01s user 0,00s system 0% cpu 1,235 total

And with ansible:

time ansible dev-vm01.local -i hosts/localhost-dev -u root -m setup

<lots of output...>

ansible dev-vm01.local -i hosts/localhost-dev -u root -m setup 0,18s user 0,17s system 0% cpu 2:05,37 total

Hope this helps.
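
To compare the two timings numerically, the `total` fields can be converted to seconds. This is a minimal sketch, assuming the fields use a decimal comma and an optional `minutes:` prefix, as they appear to in the output above. `parse_total` is a hypothetical helper.

```python
def parse_total(field):
    """Convert an elapsed-time field such as '1,235' or '2:05,37' to seconds.

    Assumes a decimal comma and colon-separated hours/minutes/seconds.
    Hypothetical helper for illustration only.
    """
    seconds = 0.0
    for part in field.replace(",", ".").split(":"):
        # Each colon-separated part is worth 60 times the one after it.
        seconds = seconds * 60 + float(part)
    return seconds


baseline = parse_total("1,235")   # plain ssh, from the output above
setup = parse_total("2:05,37")    # ansible -m setup, from the output above
print(f"ssh baseline: {baseline:.1f}s")
print(f"setup module: {setup:.1f}s")
print(f"extra time:   {setup - baseline:.1f}s")
```

With the figures from this thread, the setup run takes about two minutes longer than the plain ssh round trip, so connection setup alone does not account for the delay.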