Ansible playbook fails on SSH partway through the tasks

Hi!

After 5 of the 11 Ansible tasks ran successfully, my playbook failed with the error below:

failed: [<ip>] => {"changed": true, "rc": 1}
stderr: OpenSSH_6.2p2, OpenSSL 1.0.1k-fips 8 Jan 2015
debug1: Reading configuration data /var/lib/jenkins/.ssh/config
debug1: /var/lib/jenkins/.ssh/config line 1: Applying options for *
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 50: Applying options for *
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 20175
debug3: mux_client_request_session: session request sent
debug1: mux_client_request_session: master session id: 2
debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Received exit status from master 1
Shared connection to <ip> closed.

I want to know the reason for the failure. How can ssh fail all of a sudden when it worked for the previous tasks?

Thank you!

I had this problem and I did two things to ensure it wouldn’t happen:

  1. Increase the ControlPersist timeout. Check the ssh_args setting in your ansible.cfg; you might need to bump up the ControlPersist value. I set mine to 10m.

  2. Update your ssh config to decrease ServerAliveInterval (in my case to 30) and increase ServerAliveCountMax (in my case to 4). I added these to the appropriate user’s ~/.ssh/config (for the Jenkins user in the log above, that is /var/lib/jenkins/.ssh/config).
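
For reference, here is a rough sketch of what those two changes could look like if you keep everything in ansible.cfg. Treat it as an assumption about your setup rather than a drop-in fix: the values are just the ones I mentioned above, and I put the keepalive options directly in ssh_args instead of the user's ssh config, which has the same effect for connections made by Ansible. Note that setting ssh_args replaces Ansible's defaults, so ControlMaster=auto has to be repeated.

```ini
# ansible.cfg (sketch; the values are the ones from my setup, adjust to taste)
[ssh_connection]
# ControlPersist=10m keeps the shared master connection open for 10 minutes of idle time.
# ServerAliveInterval=30 sends a keepalive probe after 30 seconds of silence from the server.
# ServerAliveCountMax=4 gives up only after 4 consecutive unanswered probes.
ssh_args = -o ControlMaster=auto -o ControlPersist=10m -o ServerAliveInterval=30 -o ServerAliveCountMax=4
```

If you would rather keep the keepalive settings in the ssh config as I did, the equivalent lines under a "Host *" entry are "ServerAliveInterval 30" and "ServerAliveCountMax 4".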

Why are we decreasing ServerAliveInterval?