Ansible adhoc command stuck on host pingable but not reachable on port 22

Hi

When I run an ad hoc command (via AWX or the command line) against a host that is pingable but not reachable via ssh, my job/ansible command hangs.

How can I avoid such behavior?

[root@d8dc57f649cd project]# ansible --version
ansible [core 2.15.8]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.18 (main, Sep  7 2023, 00:00:00) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True
Steps to Reproduce
ansible -i client1, -a "uptime" all

Note that client1 should be pingable but not reachable via ssh

Expected Results
The ad hoc command should return immediately.

Actual Results
The ssh command hangs:

root     3864278  0.0  0.0  11128  7476 pts/0    S+   14:52   0:00 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User="root" -o ConnectTimeout=600 -o ControlPath="/root/.ansible/cp/679112afb7" vmtesttat1cdint3552 /bin/sh -c 'echo ~root && sleep 0'

Thanks for your support

Your best bet to figure out what is going on is to strace the ansible process on the remote end; lsof might also help to see whether it is stuck accessing a file.

But reading this:

Note that client1 should be pingable but not reachable via ssh

Makes me think it is getting stuck on the firewall while trying to ssh. Ansible uses ssh by default; unless you give it another connection plugin/method of accessing the machine, Ansible won’t be able to function.

Hi Bcoca

My question is: why do the ssh ConnectTimeout and the Ansible ssh timeout option not work?
Thanks for your support

Those timeouts apply to some specific parts of the connection, not the total execution (see task timeout for that), so you can get to a point where those timeouts are not relevant anymore and still get stuck.

Hi

Nothing stands out in strace except that the process is stuck:

root     3046686  0.0  0.0  50412  6304 pts/27   S+   18:03   0:00 ssh -T -C -o ControlMaster=auto -o ControlPersist=120s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User="root" -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/9c81620aa6 vmtesttat1cdbin9944 /bin/sh -c 'echo PLATFORM; uname; echo FOUND; command -v '"'"'/usr/bin/python'"'"'; command -v '"'"'python3.7'"'"'; command -v '"'"'python3.6'"'"'; command -v '"'"'python3.5'"'"'; command -v '"'"'python2.7'"'"'; command -v '"'"'python2.6'"'"'; command -v '"'"'/usr/libexec/platform-python'"'"'; command -v '"'"'/usr/bin/python3'"'"'; command -v '"'"'python'"'"'; echo ENDFOUND && sleep 0'
[root@vmgobemouche playbooks]# 
[root@vmgobemouche playbooks]# strace -v -p 3046686
strace: Process 3046686 attached
select(4, [3], NULL, NULL, NULL^Cstrace: Process 3046686 detached
 <detached ...>

[root@vmgobemouche playbooks]# strace -vvv -p 3046686
strace: Process 3046686 attached
select(4, [3], NULL, NULL, NULL^Cstrace: Process 3046686 detached
 <detached ...>

[playbooks]# strace -v -ff -p 3046686
strace: Process 3046686 attached
select(4, [3], NULL, NULL, NULL

thanks for your support

It seems to be stuck waiting for input, which could be a firewall issue or a resource issue. Since you already said ssh is not allowed, I’m going to guess it’s a firewall issue.

Hi

There is no firewall. It seems the host is simply not reachable via ssh, only pingable.

From time to time the ansible command gets stuck on ssh.

If I reboot the server, the issue disappears.

[ playbooks]# ansible -i vmtesttat1cdbin9944, -a "rpm -q falcon-sensor" all -v
Using /playbooks/ansible.cfg as config file
[WARNING]: Unhandled error in Python interpreter discovery for host vmtesttat1cdbin9944: Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host
vmtesttat1cdbin9944 | UNREACHABLE! => {
    "changed": false,
    "msg": "Data could not be sent to remote host \"vmtesttat1cdbin9944\". Make sure this host can be reached over ssh: kex_exchange_identification: Connection closed by remote host\r\n",
    "unreachable": true
}
[ playbooks]# ansible -i vmtesttat1cdbin9944, -a "rpm -q falcon-sensor" all -v
Using /playbooks/ansible.cfg as config file
[WARNING]: Unhandled error in Python interpreter discovery for host vmtesttat1cdbin9944: Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host
vmtesttat1cdbin9944 | UNREACHABLE! => {
    "changed": false,
    "msg": "Data could not be sent to remote host \"vmtesttat1cdbin9944\". Make sure this host can be reached over ssh: kex_exchange_identification: Connection closed by remote host\r\n",
    "unreachable": true
}
[ playbooks]# ansible -i vmtesttat1cdbin9944, -a "rpm -q falcon-sensor" all -v
Using /playbooks/ansible.cfg as config file
[WARNING]: Unhandled error in Python interpreter discovery for host vmtesttat1cdbin9944: Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host
vmtesttat1cdbin9944 | UNREACHABLE! => {
    "changed": false,
    "msg": "Data could not be sent to remote host \"vmtesttat1cdbin9944\". Make sure this host can be reached over ssh: kex_exchange_identification: Connection closed by remote host\r\n",
    "unreachable": true
}
[ playbooks]# ansible -i vmtesttat1cdbin9944, -a "rpm -q falcon-sensor" all -v
Using /playbooks/ansible.cfg as config file

If it is not reachable by ssh, I don’t know how you expect Ansible to work at all.

This is not an Ansible problem. You’re literally being denied access by the remote host. Either the firewall is blocking you, or PAM is. Instead of testing with ansible ad-hoc commands, test with ssh.

Hi

What I expect is for Ansible to time out and not stay hung.

Thanks for your help

Set a task timeout using the playbook keyword timeout, or configure a default task timeout.
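
For an ad hoc run there is no play to attach the timeout keyword to, so the default has to come from configuration. A minimal sketch, assuming the ANSIBLE_TASK_TIMEOUT environment variable (the environment form of the task_timeout setting) and a 30-second limit picked only for illustration:

```shell
# Default task timeout in seconds; 0 (the default) means no limit.
# 30 is an arbitrary illustrative value, not a recommendation.
export ANSIBLE_TASK_TIMEOUT=30

# Then rerun the ad hoc command from the first post, for example:
#   ansible -i client1, -a "uptime" all
```

The same default can be set in ansible.cfg as task_timeout under [defaults]. Either way it only limits how long a task may run; it does not make the ssh connection succeed.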

Sorry, but you do not understand my issue (maybe I did not explain it well enough).

The issue occurs during the first connection to the server, so no task has started yet.

As I already said, the server is pingable, so Ansible tries to connect, but since ssh is not available the ad hoc command hangs.

I am wondering whether Ansible can handle such a case and time out, so that the whole command does not hang.

Thanks for your help

As I said before, this is not really an Ansible issue. Just because you can ping the host over ICMP, that doesn’t mean you can establish a tcp connection to port 22 on that host. Something is denying your ssh connection. If you can prove that you can ssh <username>@vmtesttat1cdbin9944 successfully from where you’re trying to run Ansible, then we can at least rule out networking/firewall issues.
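
To separate "pingable" from "ssh works", the checks can be run by hand from the control node. A rough sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are installed; the function names tcp_port_open and ssh_banner are made up for this example:

```shell
# Return 0 if a TCP connection to host:port is established within the limit.
# Usage: tcp_port_open HOST [PORT] [SECONDS]
tcp_port_open() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "${2:-22}" 2>/dev/null
}

# Print the server's identification line (an ssh server sends "SSH-2.0-..."
# right after the TCP connection opens); fail if nothing arrives in time.
# Usage: ssh_banner HOST [PORT] [SECONDS]
ssh_banner() {
    timeout "${3:-5}" bash -c '
        exec 3<>"/dev/tcp/$0/$1" || exit 1
        IFS= read -r line <&3 || exit 1
        printf "%s\n" "$line"
    ' "$1" "${2:-22}" 2>/dev/null
}

# Example (hostname taken from this thread):
#   tcp_port_open vmtesttat1cdbin9944 22 5 && echo "port 22 accepts connections"
#   ssh_banner    vmtesttat1cdbin9944 22 5 || echo "no ssh banner within 5s"
#   ssh -o BatchMode=yes -o ConnectTimeout=10 root@vmtesttat1cdbin9944 true
```

A successful ping together with a failing tcp_port_open points at filtering or a dead sshd. An open port with no banner suggests that something accepts the connection but never completes the ssh handshake.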

Did you try it? Apologies if you already looked into it, but you didn’t acknowledge it when responding to Ansible adhoc command stuck on host pingable but not reachable on port 22 - #4 by bcoca.

Testing locally, setting the timeout at the play level does appear to apply to the implicit gather_facts tasks too, but it’s possible it’s a bug in the version you’re using…

(obviously, if you’re testing via adhoc there is no playbook, so you have to use the second link I shared to configure it)

(and to be clear, it won’t “make it work”, it will just prevent the hang, since you said you want Ansible to time out rather than hang)

I think @birb is saying that they expect some hosts to fail the SSH connection from time to time, but they don’t want that failure to cause such a long delay in the execution of the rest of the playbook or workflow.

That is what the task timeout can do (big hammer), but not the connection timeout.

I know this might be counterintuitive, but connections have many phases, each with its own timeout. The timeout in the connection plugins normally maps to ‘how long does each interaction/response take w/o a ping/alive’ … but it is stuck long before that applies, in ‘initial connection waiting for response’.

The ssh client configuration has dozens of timeouts; one of them applies to this case, but it is not the ‘connection_timeout’.

You probably want to set ConnectTimeout to a value different from the TCP timeout, which sometimes does not trigger because of how the firewall blocks ssh access.
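
If no ssh-level timeout fires, one blunt workaround outside Ansible (not something suggested earlier in this thread) is to wrap the whole command in the coreutils timeout utility. A sketch, assuming GNU coreutils; run_with_deadline is a hypothetical helper name and the 60-second limit in the example is arbitrary:

```shell
# Run a command with a hard wall-clock limit.
# Sends TERM at the deadline and KILL 5 seconds later if it is still alive.
# Exit status is 124 when the deadline was hit (137 if KILL was needed).
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
    limit=$1
    shift
    timeout -k 5 "$limit" "$@"
}

# Example:
#   run_with_deadline 60 ansible -i client1, -a "uptime" all
#   [ $? -eq 124 ] && echo "ansible did not finish within 60s"
```

This only bounds the total run time of the command. It does not explain why ssh is stuck, and any ssh processes left behind may still need to be cleaned up separately.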

:thinking: Well don’t mind me then…
