Issues with connecting to a host

Drew_Decker · December 13, 2013, 12:04am

I’m having a weird issue with only one of my hosts that I’m trying to connect with. I initially had this issue with Ansible 1.3, but I decided to test it using the dev trunk (ansible 1.5 (devel dc41912158) last updated 2013/12/12 17:55:33 (GMT -500)) and it still doesn’t work, so I figured I’d reach out to the group.

$ ssh ddecker@host
Last login: Thu Dec 12 17:42:37 2013
[ddecker@host ~]$

As you can see above, I can SSH into it using SSH keys with no issues. However, no matter what command I run with Ansible, it just hangs and I have to Ctrl+C it to get it to come back. When I do, Python spits out a traceback:

$ ansible host -u ddecker -s -m raw -a "ls"
Traceback (most recent call last):
  File "/home/ddecker/ansible/bin/ansible", line 157, in <module>
    (runner, results) = cli.run(options, args)
  File "/home/ddecker/ansible/bin/ansible", line 131, in run
    results = runner.run()
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 992, in run
    results = [ self._executor(h, None) for h in hosts ]
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 394, in _executor
    exec_rc = self._executor_internal(host, new_stdin)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 485, in _executor_internal
    return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 685, in _executor_internal_inner
    result = handler.run(conn, tmp, module_name, module_args, inject, complex_args)
  File "/home/ddecker/ansible/lib/ansible/runner/action_plugins/raw.py", line 47, in run
    result=self.runner._low_level_exec_command(conn, module_args, tmp, sudoable=True, executable=executable)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 771, in _low_level_exec_command
    rc, stdin, stdout, stderr = conn.exec_command(cmd, tmp, sudo_user, sudoable=sudoable, executable=executable)
  File "/home/ddecker/ansible/lib/ansible/runner/connection_plugins/ssh.py", line 221, in exec_command
    rfd, wfd, efd = select.select(rpipes, [], rpipes, 1)
KeyboardInterrupt

Not sure why this is happening - could someone give me some sort of idea on what else I can do to help provide some sort of fix to this issue?

Thanks,
Drew

Brian_Coca1 · December 13, 2013, 2:18am

try running without -s (that forces sudo and might be the issue here).

Brian_Coca1 · December 13, 2013, 2:18am

also -vvv is helpful when debugging.

Drew_Decker · December 13, 2013, 3:04am

I don’t get much with “-vvv” and I’ve also removed “-s”:

It still hangs and this is the output (after several minutes of waiting for it to do something):

$ ansible host -u ddecker -m raw -a "ls" -vvvv
ESTABLISH CONNECTION FOR USER: ddecker
EXEC ['ssh', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/ddecker/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'Port=22', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10', 'host', 'ls']

Traceback (most recent call last):
  File "/home/ddecker/ansible/bin/ansible", line 157, in <module>
    (runner, results) = cli.run(options, args)
  File "/home/ddecker/ansible/bin/ansible", line 131, in run
    results = runner.run()
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 992, in run
    results = [ self._executor(h, None) for h in hosts ]
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 394, in _executor
    exec_rc = self._executor_internal(host, new_stdin)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 485, in _executor_internal
    return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 685, in _executor_internal_inner
    result = handler.run(conn, tmp, module_name, module_args, inject, complex_args)
  File "/home/ddecker/ansible/lib/ansible/runner/action_plugins/raw.py", line 47, in run
    result=self.runner._low_level_exec_command(conn, module_args, tmp, sudoable=True, executable=executable)
  File "/home/ddecker/ansible/lib/ansible/runner/__init__.py", line 771, in _low_level_exec_command
    rc, stdin, stdout, stderr = conn.exec_command(cmd, tmp, sudo_user, sudoable=sudoable, executable=executable)
  File "/home/ddecker/ansible/lib/ansible/runner/connection_plugins/ssh.py", line 221, in exec_command
    rfd, wfd, efd = select.select(rpipes, [], rpipes, 1)
KeyboardInterrupt

Drew_Decker · December 13, 2013, 3:38pm

So, it appears that I’ve figured it out, but I don’t really understand why. As you can see from my previous posts, I was able to SSH into the host, but Ansible just wouldn’t return anything for that one seemingly random host. Another thing I noticed was that it was also a bit slow. In our environment, we have about 4 main domains for different locations. In the /etc/resolv.conf file, we have search domains defined so we can use the short hostnames that are in DNS. This doesn’t cause any issues anywhere else, only sometimes in Ansible; not sure why.

What I ended up doing was changing each host entry in the /etc/ansible/hosts file to its FQDN, and when I did that I got a HUGE performance increase and I also no longer run into this problem.
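
One way to check whether name resolution accounts for the difference is to time lookups of the short name and the FQDN from the control machine. The following is only a rough sketch: `time_lookup` and `compare` are hypothetical helper names, port 22 is passed simply because the thread is about SSH, and the system resolver may cache results, so repeated runs can look faster than the first.

```python
import socket
import time


def time_lookup(name, resolver=socket.getaddrinfo):
    """Return (seconds, succeeded) for a single lookup of `name`."""
    start = time.perf_counter()
    try:
        resolver(name, 22)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return time.perf_counter() - start, ok


def compare(names, resolver=socket.getaddrinfo):
    """Time each name once and return {name: (seconds, succeeded)}."""
    return {name: time_lookup(name, resolver) for name in names}
```

Calling `compare(["host1", "host1.domain.com"])` with real hostnames in place of these placeholders would show whether the short name is noticeably slower to resolve than the fully qualified one.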

tannerjc · December 13, 2013, 3:41pm

What was in the hosts file prior to the FQDNs?

Drew_Decker · December 13, 2013, 3:44pm

Just the short hostnames.

Prior to FQDN:

host1
host2
host3

(resolv.conf had the search domains) - regular SSHing into machines worked, but ansible was either slow or never returned a value on “-m ping”.

After adding FQDN:

host1.domain.com
host2.domain.com
host3.domain.com

(resolv.conf still has the search domains, but they aren’t needed for Ansible now because I’m using the FQDN) - huge performance increase and no more failures.
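
For a longer inventory, the same before/after change could be scripted. This is a minimal sketch that assumes every entry is a bare hostname on its own line (no groups, ranges, or per-host variables); `qualify` is a hypothetical helper, and `domain.com` is the placeholder domain used above.

```python
def qualify(hosts, domain):
    """Append `domain` to entries that contain no dot; leave the rest unchanged."""
    suffix = "." + domain.strip(".")
    return [h if "." in h else h + suffix for h in hosts]


short_names = ["host1", "host2", "host3"]
print("\n".join(qualify(short_names, "domain.com")))
```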

Michael_DeHaan2 · December 13, 2013, 3:52pm

Sounds like you have sketchy DNS then?

Might want to look into that…

tannerjc · December 13, 2013, 3:56pm

Having the search domains and nameservers in resolv.conf in an optimal order should also help in this case.
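
To illustrate why the order matters, here is a simplified model of how a glibc-style resolver expands a name using the `search` list and the `ndots` option from resolv.conf. It is a sketch of the documented behaviour, not the resolver's actual code, and the search domains in the example are hypothetical.

```python
def candidate_names(name, search, ndots=1):
    """Return the names a resolver would try, in order, for `name`.

    Simplified model: a trailing dot means "absolute, no search list";
    a name with at least `ndots` dots is tried as-is first; otherwise
    the search domains are tried first and the bare name last.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


# Hypothetical search list with four domains, as in the environment described.
search = ["a.example.com", "b.example.com", "c.example.com", "d.example.com"]
print(candidate_names("host1", search))
print(candidate_names("host1.domain.com", search))
```

Under this model a short name whose domain is last in the list costs several failed queries before the right one is tried, while an FQDN is tried as-is first.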

Drew_Decker · December 13, 2013, 3:56pm

We do not - because like I said - regular SSH is working. I can use the short hostnames with everything else, such as applications, SNMP, backups, etc. I think it might be an issue with something either in Ansible (I doubt it’s Ansible directly) or, more likely, a Python dependency such as paramiko (since it appears that’s the module that is actually performing the SSH). When I can run everything else on our servers without the FQDN but have weird issues with a single application, I’m more than likely going to blame that single application; otherwise I would be having a lot more issues across the entire environment.

Drew_Decker · December 13, 2013, 3:57pm

Yup, this has been done - we have the Ansible “Control Node”’s search domain first, and then the other environments next; however, it doesn’t get any more configurable than that, unfortunately.

tannerjc · December 13, 2013, 4:00pm

Are the speeds between -c paramiko and -c ssh similar, or is there a drastic difference (with short names in the hosts file)?

Drew_Decker · December 13, 2013, 4:11pm

Oh wow - yup, there’s a huge difference. When I type “-c ssh” it’s REALLY fast, and I get no errors at all.
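
The comparison between the two connection types can be made repeatable with a small timing wrapper. This sketch uses only flags that appear in this thread (`-m ping`, `-c`, `-u`); `build_command` and `time_command` are hypothetical helpers, and the 120-second timeout is an arbitrary assumption so that a hanging run is reported rather than waited on indefinitely.

```python
import subprocess
import time


def build_command(host, connection, user=None):
    """Build an `ansible ... -m ping` command line for one connection type."""
    cmd = ["ansible", host, "-m", "ping", "-c", connection]
    if user:
        cmd += ["-u", user]
    return cmd


def time_command(cmd, runner=subprocess.run, timeout=120):
    """Run `cmd` and return (seconds, return code); the code is None on timeout."""
    start = time.perf_counter()
    try:
        rc = runner(cmd, capture_output=True, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        rc = None
    return time.perf_counter() - start, rc
```

Running `time_command(build_command("host", c, user="ddecker"))` for `c` in `("paramiko", "ssh")` gives one timing per connection type to compare.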

tannerjc · December 13, 2013, 4:23pm

So this really seems to be a paramiko DNS performance issue in your case.

Brian_Coca1 · December 14, 2013, 7:03pm

I always add “usedns no” to my sshd_config for this reason.

Brian Coca

Lars_Hansson · December 15, 2013, 6:11pm

I also set
KerberosAuthentication no
GSSAPIAuthentication no

Both have a tendency to use DNS and slow things down.
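
To see how a given server is currently configured, something like the sketch below could report the three options mentioned here. It assumes all of them live in sshd_config, relies on sshd's rule that the first value obtained for a keyword wins and that keywords are case-insensitive, and stops at the first `Match` block, which it does not try to interpret. `dns_related_settings` is a hypothetical helper name.

```python
DNS_RELATED = ("usedns", "kerberosauthentication", "gssapiauthentication")


def dns_related_settings(config_text):
    """Return {lowercased keyword: value} for the options listed in DNS_RELATED."""
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks are outside the scope of this sketch
        if keyword in DNS_RELATED and keyword not in found and len(parts) == 2:
            found[keyword] = parts[1].strip()
    return found


sample = """\
# example fragment
UseDNS no
KerberosAuthentication no
GSSAPIAuthentication no
"""
print(dns_related_settings(sample))
```

An option missing from the result is not set explicitly in the file, so the server's built-in default applies.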
