Issues with connecting to a host

I’m having a weird issue with just one of the hosts I’m trying to connect to. I initially had this issue with Ansible 1.3, but I decided to test it using the dev trunk (ansible 1.5 (devel dc41912158) last updated 2013/12/12 17:55:33 (GMT -500)) and it still doesn’t work, so I figured I’d reach out to the group.

$ ssh ddecker@host
Last login: Thu Dec 12 17:42:37 2013
[ddecker@host ~]$

As you can see above, I can SSH into it using SSH keys with no issues. However, no matter what command I run with Ansible, it just hangs and I have to Ctrl+C it to get my prompt back. When I do, Python prints this traceback:

$ ansible host -u ddecker -s -m raw -a "ls"
Traceback (most recent call last):
  File "/home/ddecker/ansible/bin/ansible", line 157, in <module>
    (runner, results) = cli.run(options, args)
  File "/home/ddecker/ansible/bin/ansible", line 131, in run
    results = runner.run()
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 992, in run
    results = [ self._executor(h, None) for h in hosts ]
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 394, in _executor
    exec_rc = self._executor_internal(host, new_stdin)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 485, in _executor_internal
    return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 685, in _executor_internal_inner
    result = handler.run(conn, tmp, module_name, module_args, inject, complex_args)
  File "/home/ddecker/ansible/lib/ansible/runner/action_plugins/raw.py", line 47, in run
    result=self.runner._low_level_exec_command(conn, module_args, tmp, sudoable=True, executable=executable)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 771, in _low_level_exec_command
    rc, stdin, stdout, stderr = conn.exec_command(cmd, tmp, sudo_user, sudoable=sudoable, executable=executable)
  File "/home/ddecker/ansible/lib/ansible/runner/connection_plugins/ssh.py", line 221, in exec_command
    rfd, wfd, efd = select.select(rpipes, [], rpipes, 1)
KeyboardInterrupt

Not sure why this is happening - could someone give me some sort of idea on what else I can do to help provide some sort of fix to this issue?

Thanks,
Drew

Try running without -s (it forces sudo, which might be the issue here).

Also, -vvv is helpful when debugging.

I don’t get much with “-vvv”, and I’ve also removed “-s”. It still hangs; this is the output after several minutes of waiting for it to do something:

$ ansible host -u ddecker -m raw -a "ls" -vvvv
ESTABLISH CONNECTION FOR USER: ddecker
EXEC ['ssh', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/ddecker/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'Port=22', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10', 'host', 'ls']

Traceback (most recent call last):
  File "/home/ddecker/ansible/bin/ansible", line 157, in <module>
    (runner, results) = cli.run(options, args)
  File "/home/ddecker/ansible/bin/ansible", line 131, in run
    results = runner.run()
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 992, in run
    results = [ self._executor(h, None) for h in hosts ]
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 394, in _executor
    exec_rc = self._executor_internal(host, new_stdin)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 485, in _executor_internal
    return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 685, in _executor_internal_inner
    result = handler.run(conn, tmp, module_name, module_args, inject, complex_args)
  File "/home/ddecker/ansible/lib/ansible/runner/action_plugins/raw.py", line 47, in run
    result=self.runner._low_level_exec_command(conn, module_args, tmp, sudoable=True, executable=executable)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 771, in _low_level_exec_command
    rc, stdin, stdout, stderr = conn.exec_command(cmd, tmp, sudo_user, sudoable=sudoable, executable=executable)
  File "/home/ddecker/ansible/lib/ansible/runner/connection_plugins/ssh.py", line 221, in exec_command
    rfd, wfd, efd = select.select(rpipes, [], rpipes, 1)
KeyboardInterrupt

So, it appears that I’ve figured it out, but I don’t really understand why. As you can see from my previous posts, I was able to SSH into the host, but Ansible just wouldn’t return anything for one seemingly random host. Another thing I noticed was that it was also a bit slow. In our environment we have about 4 main domains, for different locations. In /etc/resolv.conf we have search domains defined so we can use the short hostnames that are in DNS. This doesn’t cause any issues whatsoever, except sometimes in Ansible; I’m not sure why.

What I ended up doing was changing each host entry in /etc/ansible/hosts to its FQDN. When I did that, I got a HUGE performance increase and I no longer run into this problem.

What was in the hosts file prior to the FQDNs?

Just the short hostnames.

Prior to FQDN:

host1
host2
host3

(resolv.conf had the search domains) - regular SSHing into machines worked, but ansible was either slow or never returned a value on “-m ping”.

After adding FQDN:

host1.domain.com
host2.domain.com
host3.domain.com

(resolv.conf still has the search domains, but they aren’t needed for Ansible now because I’m using the FQDN) - huge performance increase and no more failures.
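
A rough way to see the difference between the two inventories is to time how long each form of the name takes to resolve on the control node. This is only a sketch: time_lookup and compare are made-up helper names, the resolver argument is injectable so the helpers can be tried without a network, and it measures the system resolver through Python's socket module rather than anything Ansible or paramiko does internally.

```python
import socket
import time


def time_lookup(name, resolver=socket.getaddrinfo):
    """Resolve `name` once and return (seconds_taken, error_or_None).

    `resolver` defaults to the system resolver via socket.getaddrinfo;
    pass a stand-in callable to try this without a network.
    """
    start = time.perf_counter()
    try:
        resolver(name, 22)  # port 22 only because SSH is the use case here
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = exc
    return time.perf_counter() - start, error


def compare(names, resolver=socket.getaddrinfo):
    """Map each name to its (seconds_taken, error_or_None) result."""
    return {name: time_lookup(name, resolver) for name in names}


# Example use on the control node (placeholder names from the listing above):
#
#   for name, (seconds, error) in compare(["host1", "host1.domain.com"]).items():
#       print(name, round(seconds * 1000, 1), "ms", "ok" if error is None else error)
```

If the short name takes noticeably longer than the FQDN, the time is probably going into walking the search list rather than into SSH itself.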

Sounds like you have sketchy DNS then?

Might want to look into that…

Having the search domains and nameservers in resolv.conf in an optimal order should also help in this case.

We do not, because like I said, regular SSH is working. I can use the short hostnames with anything else: applications, SNMP, backups, etc. I think it might be an issue with something either in Ansible (I doubt it’s Ansible directly) or, more likely, a Python dependency such as paramiko (since that appears to be the module actually performing the SSH). When everything else runs fine without the FQDN but I have weird issues with a single application, I’m more than likely going to blame that single application; otherwise I would be having a lot more issues across the entire environment.

Yup, this has been done: we have the Ansible “Control Node” search domain first, and then the other environments next. It doesn’t get any more configurable than that, unfortunately.
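
To make the ordering point concrete, here is a simplified sketch of how a resolver expands a short hostname using the search list. It is a model of the usual resolv.conf rules (search domains tried in listed order, the bare name tried first only when it already has at least ndots dots), not a reimplementation of any particular resolver. The function names and the a.example / b.example domains are invented for illustration.

```python
def parse_search_domains(resolv_conf_text):
    """Return the search list from resolv.conf text (the last `search` line wins)."""
    domains = []
    for line in resolv_conf_text.splitlines():
        if line.startswith(("#", ";")):  # comment lines
            continue
        fields = line.split()
        if fields and fields[0] == "search":
            domains = fields[1:]
    return domains


def candidate_names(name, search_domains, ndots=1):
    """List the names a resolver would try for `name`, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search list is applied.
        return [name]
    expanded = [f"{name}.{domain}" for domain in search_domains]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name] + expanded
    # Short name: walk the search list first, bare name last.
    return expanded + [name]


# Hypothetical resolv.conf content, not taken from the thread.
EXAMPLE = "nameserver 10.0.0.1\nsearch a.example b.example\n"
print(candidate_names("host1", parse_search_domains(EXAMPLE)))
print(candidate_names("host1.domain.com", parse_search_domains(EXAMPLE)))
```

In this model a short name whose real domain is listed last costs one failed lookup per earlier search domain, while an FQDN is tried as-is on the first attempt.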

Are the speeds between -c paramiko and -c ssh similar, or is there a drastic difference (with short names in the hosts file)?

Oh wow - yup, there’s a huge difference. When I use “-c ssh” it’s REALLY fast, and I get no errors at all.
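
For anyone wanting to repeat that comparison with numbers, a small timing wrapper along these lines might help. It is a sketch only: build_command and time_command are invented helper names, the command uses just the flags already seen in this thread (-m, -c, -u), and the timeout is there because the original symptom was a hang.

```python
import subprocess
import time


def build_command(host, connection, user=None):
    """Build an `ansible <host> -m ping -c <connection>` command line as a list."""
    cmd = ["ansible", host, "-m", "ping", "-c", connection]
    if user:
        cmd += ["-u", user]
    return cmd


def time_command(cmd, run=subprocess.call, timeout=120):
    """Run `cmd` and return (return_code, seconds); return_code is None on timeout.

    `run` is injectable so the wrapper can be tried without Ansible installed.
    """
    start = time.perf_counter()
    try:
        rc = run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        rc = None  # treated as a hang
    return rc, time.perf_counter() - start


# Example use on the control node:
#
#   for connection in ("paramiko", "ssh"):
#       rc, seconds = time_command(build_command("host", connection, user="ddecker"))
#       print(connection, rc, round(seconds, 2), "s")
```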

So this really seems to be a paramiko DNS performance issue in your case.

I always add “usedns no” to my sshd_config for this reason.

Brian Coca

I also set
KerberosAuthentication no
GSSAPIAuthentication no

Both have a tendency to use DNS and slow things down.
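
A quick way to check whether a server's sshd_config already carries the three settings mentioned above is sketched below. Treat it as an assumption-laden helper rather than a real parser: the function names are invented, it only handles the whitespace-separated "Keyword value" form, it keeps the first value seen for each keyword, and it stops at the first Match block so only global settings are considered.

```python
# Settings suggested in this thread, keyed by lower-cased keyword.
WANTED = {
    "usedns": "no",
    "kerberosauthentication": "no",
    "gssapiauthentication": "no",
}


def global_settings(sshd_config_text):
    """Return {keyword_lowercase: value} for the global part of an sshd_config."""
    settings = {}
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(keyword, value)  # first occurrence is kept
    return settings


def missing_or_different(sshd_config_text, wanted=WANTED):
    """Return {keyword: current_value_or_None} for settings that do not match `wanted`."""
    found = global_settings(sshd_config_text)
    return {
        keyword: found.get(keyword)
        for keyword, expected in wanted.items()
        if (found.get(keyword) or "").lower() != expected
    }


# Hypothetical config text, not taken from the thread.
EXAMPLE = "UseDNS no\n# GSSAPIAuthentication no\nKerberosAuthentication yes\n"
print(missing_or_different(EXAMPLE))
```

An empty result means all three are explicitly set to "no"; a value of None means the keyword is not set in the global section, so the server falls back to its built-in default.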