ansible -m setup takes extra 2 min on Fedora 20

The setup module (and therefore any playbook) takes an extra 2 minutes against a Fedora 20 based VM. Other Ansible modules, at least command (date) and yum (upgrade all), take less than a second on an up-to-date machine.

Tried from OS X 10.10 (Homebrew) and Ubuntu 14.04, both running Ansible 1.7.2. Here's the -vvvv output:

ESTABLISH CONNECTION FOR USER: pixel
REMOTE_MODULE setup
EXEC ['ssh', '-C', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/Users/pixel/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10', 'akaran', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1414929572.02-34091227529968 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1414929572.02-34091227529968 && echo $HOME/.ansible/tmp/ansible-tmp-1414929572.02-34091227529968'"]
PUT /var/folders/2v/w9x69ytx017ckmpz4hmjn5qh0000gn/T/tmpPQYuj0 TO /home/pixel/.ansible/tmp/ansible-tmp-1414929572.02-34091227529968/setup
EXEC ['ssh', '-C', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/Users/pixel/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10', 'akaran', u"/bin/sh -c 'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/pixel/.ansible/tmp/ansible-tmp-1414929572.02-34091227529968/setup; rm -rf /home/pixel/.ansible/tmp/ansible-tmp-1414929572.02-34091227529968/ >/dev/null 2>&1'"]
… facts …

Hmm, that’s quite curious and not something I’ve heard reported much.

There’s also not much logic to get stuck in a loop in there.

If you have Python skills, running the setup module through "./hacking/test-module" from a checkout on that machine would let you insert some debug output to help isolate why it may be taking longer, if it's in fact the setup module that is spending the time.
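
For example, a minimal debugging sketch, assuming a devel checkout on the target machine: a generic timer you can temporarily wrap suspect fact-gathering methods with (e.g. in module_utils/facts.py) while running the module through ./hacking/test-module. The wrapper is plain standard library; which methods to wrap is your call.

import time
from functools import wraps

def timed(fn):
    # Print how long each wrapped call takes; apply as @timed above a
    # suspect method, then re-run the module, e.g.:
    #   ./hacking/test-module -m <path to the setup module in the checkout>
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return fn(*args, **kwargs)
        finally:
            print("%s took %.2fs" % (fn.__name__, time.time() - start))
    return wrapper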

Do you have a valid FQDN entry in /etc/hosts? Without that, the setup module slows down considerably.

“Do you have a valid FQDN entry in /etc/hosts? Without that, the setup module slows down considerably.”

I assume you're referring to the one socket.gethostname() equivalent call being a DNS hit?

I'm not sure it would be considerable… though I'm interested in any timing info you might have.

(May imply DNS issues?)
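
For reference, the fqdn fact presumably boils down to something like socket.getfqdn(), and that call, unlike socket.gethostname(), can go out to the resolver. A rough sketch of the chain (standard library only; that the module uses exactly this call is an assumption on my part):

import socket

# getfqdn() is roughly gethostbyaddr(gethostname()): take the kernel
# hostname (purely local, no DNS) and ask the resolver for its canonical
# name. The resolver consults /etc/hosts first and then DNS (per
# /etc/nsswitch.conf), so a hostname with no /etc/hosts entry can turn
# this into a network query that waits out resolver timeouts.
short = socket.gethostname()                                 # always fast
try:
    canonical, aliases, addrs = socket.gethostbyaddr(short)  # may block
except socket.error:
    canonical = short
print("%s -> %s" % (short, canonical))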

Honestly I haven't bothered digging into the Ansible source, but here are my results on Fedora 20, same as the OP:

$ ansible --version
ansible 1.7.2

Without valid FQDN entry

$ time ansible all -i "localhost," -m setup -c local
0.32s user 0.11s system 2% cpu 20.528 total

With valid FQDN entry

$ time ansible all -i "localhost," -m setup -c local

0.26s user 0.08s system 83% cpu 0.399 total

I'd call a twenty-second speed-up considerable.

I get the same behavior with devel:

$ ansible --version
ansible 1.8 (devel e1662422bf) last updated 2014/11/03 12:22:34 (GMT -600)
lib/ansible/modules/core: (detached HEAD 7f611468a8) last updated 2014/10/24 11:15:28 (GMT -600)
lib/ansible/modules/extras: (detached HEAD a0df36c6ab) last updated 2014/10/24 11:15:31 (GMT -600)
v2/ansible/modules/core: (detached HEAD cb69744bce) last updated 2014/10/24 11:15:35 (GMT -600)
v2/ansible/modules/extras: (detached HEAD 8a4f07eecd) last updated 2014/10/24 11:15:38 (GMT -600)
configured module search path = None

Without valid FQDN entry

$ time ansible all -i "localhost," -m setup -c local

0.37s user 0.14s system 2% cpu 20.614 total

Without valid FQDN entry

$ time python -c 'import socket; socket.gethostname()'
python -c 'import socket; socket.gethostname()'  0.01s user 0.01s system 97% cpu 0.021 total

This last one implies that socket.gethostname() is not to blame.
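
It may be worth timing the reverse-lookup path as well, since that is the call that can actually consult the resolver; a quick sketch in the same spirit as the measurement above (again assuming the fqdn fact goes through socket.getfqdn() or an equivalent):

import socket, time

start = time.time()
fqdn = socket.getfqdn()   # checks /etc/hosts first, then may go to DNS
print("getfqdn -> %s (%.2fs)" % (fqdn, time.time() - start))

If that single call accounts for most of the ~20 seconds in the no-FQDN case, the time is going to the resolver rather than to anything Ansible-specific.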

If someone can help isolate the slow methods on those systems, that would be great.

We are not currently seeing this.

I've been able to reproduce but not entirely consistently. If I have
only one line in my /etc/hosts file:

127.0.0.1 localhost

Running: time ansible all -i "localhost," -m setup -c local

takes anywhere from 0.9 seconds to 15 seconds.

Adding a second line to /etc/hosts:
127.0.0.1 roan.lan

brings runtime into the 0.3 to 0.7 second range. So I think there's
definitely something else going on in the system besides ansible
that's making this noticeable; perhaps some non-parallelizable
segment of Fedora's DNS lookup mechanism.
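
One way to separate the two, as a sketch (it assumes getent is available, which it normally is on Fedora), is to time the same lookup with Ansible out of the picture entirely:

import socket, subprocess, time

host = socket.gethostname()

start = time.time()
try:
    socket.gethostbyname(host)                   # forward lookup via Python's resolver
except socket.error:
    pass
print("gethostbyname(%s): %.2fs" % (host, time.time() - start))

start = time.time()
subprocess.call(["getent", "hosts", host])       # the same lookup via glibc NSS
print("getent hosts %s:   %.2fs" % (host, time.time() - start))

If these show the same 15-second stalls with only the 127.0.0.1 localhost line present, the variability is in the resolver configuration rather than in anything Ansible does.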

I do note that having the second line always makes things faster (when
the system is in the 0.9 second range for the non-fqdn case, it's in
the 0.3 second range for the fqdn case; when it's in the 15 second
range for non-fqdn, it's in the 0.7 second range for fqdn).

I don't know if this is something that can be addressed within
ansible, if it's a problem that the OS needs to look into, or if it's
just considered a host configuration bug.

-Toshio

If it's not something Fedora can resolve, we MAY have to remove this check on Fedora 20+ until they can, as we tend to get hammered repeatedly for issues that actually live in other software.

OTOH, 20 seconds is slow, but not insurmountably terrible.

If we can pin it down to a python function (or better yet, a shell command), I think we should definitely file a bugzilla on it.
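
One way to get a concrete function name for a bugzilla, assuming the fqdn lookup really is the slow path (which is still an assumption at this point), is to profile the suspect call and see where the cumulative time lands:

import cProfile

# On an affected system the cumulative time should land in the resolver
# call (gethostbyaddr), which is the kind of specific detail a distro
# bug report needs.
cProfile.run("import socket; socket.getfqdn()", sort="cumulative")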

If I read between the lines correctly, the faster operation happens when
you add that line to /etc/hosts on your Ansible machine, not on all the
target machines.

If that's true, it appears an Ansible process running on the Ansible
machine is triggering a forward lookup of the hostname (i.e., trying to
obtain the IP address) as connections to target nodes are initiated. I'm
not familiar with Red Hat based distros, so I don't know if it could be
an ssh client config thing, an ssh agent config thing, a paramiko thing,
or something else.

  -Greg

Interesting - so it’s not about fact gathering at all.

Experimentation to see if regular SSH commands have issues with first connections would be welcome.
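
A quick experiment along those lines, as a sketch (akaran is the host from the -vvvv output above, so substitute your own; ControlMaster is disabled so every attempt is a fresh connection):

import subprocess, time

host = "akaran"   # replace with your Fedora 20 target
for attempt in (1, 2, 3):
    start = time.time()
    subprocess.call(["ssh", "-o", "ControlMaster=no", host, "true"])
    print("attempt %d: %.2fs" % (attempt, time.time() - start))

If the first connection is consistently the slow one, the delay is in connection setup (for example GSSAPI or DNS on the client side) rather than in fact gathering.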

It might result in the need to file a ticket on the distro.

Correct, but we're only specifying localhost in the inventory, so in
this particular example it's also on "all target machines". We'd
probably need to do more involved testing before drawing a conclusion
about whether this is happening due to /etc/hosts on the managing
machine or the managed machine.

-Toshio

I'm still not sure, and unfortunately I'm still not able to reliably
reproduce this to narrow it down. On my system, at least, there's
definitely something non-ansible-related contributing to this.

-Toshio

Circling around to where we started here, @pixelfairy, was Scott's
suggestion about FQDN helpful? Were you able to work around the issue
by adding one for the VM?

-Toshio