Your best bet for figuring out what is going on is to strace the ansible process on the remote end. lsof might also help to see if it is stuck accessing a file.
But reading this:
Note that client1 should be pingable but not reachable via ssh
Makes me think it is getting stuck on the firewall while trying to ssh. Ansible uses ssh by default; unless you give it another connection plugin/method of accessing the machine, Ansible won't be able to function.
Those timeouts apply to some specific parts of the connection, not the total execution (see task timeout for that), so you can get to a point where those timeouts are not relevant anymore and still get stuck.
It seems it is stuck waiting for input, which could be a firewall issue or a resource issue. Since you already said ssh is not allowed, I'm going to guess it's a firewall issue.
There is no firewall; it seems the host is simply not reachable via ssh, only pingable.
From time to time the ansible command gets stuck on ssh.
If I reboot the server, the issue disappears.
[ playbooks]# ansible -i vmtesttat1cdbin9944, -a "rpm -q falcon-sensor" all -v
Using /playbooks/ansible.cfg as config file
[WARNING]: Unhandled error in Python interpreter discovery for host vmtesttat1cdbin9944: Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host
vmtesttat1cdbin9944 | UNREACHABLE! => {
    "changed": false,
    "msg": "Data could not be sent to remote host \"vmtesttat1cdbin9944\". Make sure this host can be reached over ssh: kex_exchange_identification: Connection closed by remote host\r\n",
    "unreachable": true
}
[ playbooks]# ansible -i vmtesttat1cdbin9944, -a "rpm -q falcon-sensor" all -v
Using /playbooks/ansible.cfg as config file
[WARNING]: Unhandled error in Python interpreter discovery for host vmtesttat1cdbin9944: Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host
vmtesttat1cdbin9944 | UNREACHABLE! => {
    "changed": false,
    "msg": "Data could not be sent to remote host \"vmtesttat1cdbin9944\". Make sure this host can be reached over ssh: kex_exchange_identification: Connection closed by remote host\r\n",
    "unreachable": true
}
[ playbooks]# ansible -i vmtesttat1cdbin9944, -a "rpm -q falcon-sensor" all -v
Using /playbooks/ansible.cfg as config file
[WARNING]: Unhandled error in Python interpreter discovery for host vmtesttat1cdbin9944: Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host
vmtesttat1cdbin9944 | UNREACHABLE! => {
    "changed": false,
    "msg": "Data could not be sent to remote host \"vmtesttat1cdbin9944\". Make sure this host can be reached over ssh: kex_exchange_identification: Connection closed by remote host\r\n",
    "unreachable": true
}
[ playbooks]# ansible -i vmtesttat1cdbin9944, -a "rpm -q falcon-sensor" all -v
Using /playbooks/ansible.cfg as config file
This is not an Ansible problem. You're literally being denied access by the remote host. Either the firewall is blocking you, or PAM is. Instead of testing with ansible ad-hoc commands, test with ssh.
As I said before, this is not really an Ansible issue. Just because you can ping the host over ICMP, that doesn't mean you can establish a tcp connection to port 22 on that host. Something is denying your ssh connection. If you can prove that you can ssh <username>@vmtesttat1cdbin9944 successfully from where you're trying to run Ansible, then we can at least rule out networking/firewall issues.
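To separate "pingable" from "port 22 actually answers", a small probe can help. The following is only a rough sketch, not something Ansible does itself: the function name `probe_ssh_banner` and the status strings are made up for illustration, and it assumes the server sends its identification line right after the TCP connection opens (as OpenSSH normally does).

```python
import socket


def probe_ssh_banner(host, port=22, timeout=5.0):
    """Open a TCP connection and try to read the SSH identification line.

    Returns a (status, detail) tuple. The status names are illustrative only.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        # No answer to the TCP handshake within the timeout (packets dropped?)
        return ("connect_timeout", "")
    except OSError as exc:
        # Refused, no route, name resolution failure, etc.
        return ("connect_failed", str(exc))
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)
        except socket.timeout:
            # TCP is open but nothing was sent: this is where a client can hang
            return ("banner_timeout", "")
        except OSError as exc:
            return ("read_failed", str(exc))
    if not data:
        # Remote end closed before sending anything, similar in spirit to
        # "kex_exchange_identification: Connection closed by remote host"
        return ("closed_by_remote", "")
    return ("banner", data.decode("ascii", errors="replace").strip())
```

Running `print(probe_ssh_banner("vmtesttat1cdbin9944"))` from the control node would show whether the failure is at the TCP connect, at the banner, or a close by the remote side. It does not replace an actual ssh login test.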
Testing locally, setting the timeout at the play level does appear to apply to the implicit gather_facts tasks too, but it's possible it's a bug in the version you're using...
(obviously, if you're testing via adhoc there is no playbook, so you have to use the second link I shared to configure it)
(and to be clear, it won't "make it work", it will just prevent the hang since you said "What I am waiting is that Ansible timeout and do no stay hang")
I think @birb is saying that they expect some hosts to fail the SSH connection from time to time, but they don't want that failure to cause such a long delay in the execution of the rest of the playbook or workflow.
That is what the task timeout can do (big hammer), but not the connection timeout.
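The same "big hammer" idea can be illustrated outside Ansible with a hard wall-clock limit around a command. This is not the Ansible task timeout itself, just a sketch of the concept; `run_with_deadline` is a hypothetical helper name.

```python
import subprocess


def run_with_deadline(cmd, seconds):
    """Run cmd and give up after a wall-clock limit, whatever phase it is in.

    Output is deliberately not captured, so only the direct child process
    has to exit for this function to return after a timeout.
    """
    try:
        done = subprocess.run(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the direct child at this point;
        # any processes that child spawned are not handled here
        return ("timed_out", None)
    return ("finished", done.returncode)
```

For example, `run_with_deadline(["ansible", "-i", "vmtesttat1cdbin9944,", "-a", "rpm -q falcon-sensor", "all"], 60)` would stop waiting after 60 seconds regardless of which connection phase is stuck. Like the task timeout, it does not fix the connection, it only bounds the wait.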
I know this might be counterintuitive, but connections have many phases, each with its own timeout. The timeout in the connection plugins normally maps to "how long does each interaction/response take w/o a ping/alive" ... but it is stuck long before that applies, in "initial connection waiting for response".
The ssh client configuration has dozens of timeouts; one of them applies to this case, but not the "connection_timeout".
You probably want to set ConnectTimeout to not match the TCP timeout, which sometimes does not trigger due to how the firewall prevents ssh access.
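As a rough illustration of passing `ConnectTimeout` explicitly when testing with plain ssh, the sketch below only builds the command line. `build_ssh_probe` is a hypothetical helper; `BatchMode` and `ConnectTimeout` are standard ssh client options, and the values shown are assumptions to adjust.

```python
def build_ssh_probe(host, user=None, connect_timeout=10):
    """Build an ssh command that fails fast instead of waiting on TCP defaults."""
    target = f"{user}@{host}" if user else host
    return [
        "ssh",
        "-o", "BatchMode=yes",  # never stop to prompt for a password
        "-o", f"ConnectTimeout={connect_timeout}",  # seconds, instead of the system TCP timeout
        target,
        "true",  # trivial remote command, only the exit status matters
    ]
```

The resulting list can be handed to `subprocess.run(...)`, or the same `-o` options can be typed directly on an ssh command line to check how the host behaves before wiring anything into the Ansible ssh arguments.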