SSH connections to EC2 hang sporadically

Hello,

I’m having problems with SSH connections hanging seemingly indefinitely. I don’t recognize any patterns as to what tasks or handlers this happens on. I’m running ansible 1.7.1 on OS X Yosemite against a bunch of ec2 instances. I also deploy to vagrant but don’t think it has had similar SSH hangs.

Today I determined to kill -HUP the ssh process that was hung, and then ansible resumed happily.

I’d be grateful if anyone has troubleshooting tips for this.

-Tore

Just thought I’d follow up with a typical output with -vvvv; This is a play where I’m installing a couple of apt packages to three AWS hosts (shortened host1-3 for brevity). You see one of the hosts succeed while the other two apparently do not. The one that succeeds is in EU (Ireland) while the unfinished hosts are both in different AZs in East US.

I logged in to the hosts which did not resume play and verified that the temporary directory actually had been deleted.

TASK: [common | Install common packages] **************************************
ESTABLISH CONNECTION FOR USER: ubuntu
ESTABLISH CONNECTION FOR USER: ubuntu
ESTABLISH CONNECTION FOR USER: ubuntu
REMOTE_MODULE apt pkg=git,supervisor update_cache=yes state=present
REMOTE_MODULE apt pkg=git,supervisor update_cache=yes state=present
REMOTE_MODULE apt pkg=git,supervisor update_cache=yes state=present
EXEC [‘ssh’, ‘-C’, ‘-tt’, ‘-vvv’, ‘-o’, ‘ForwardAgent=yes’, ‘-o’, ‘IdentityFile=“/Users/toreo/.ssh/cargame-ssh.pem”’, ‘-o’, ‘KbdInteractiveAuthentication=no’, ‘-o’, ‘PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey’, ‘-o’, ‘PasswordAuthentication=no’, ‘-o’, ‘User=ubuntu’, ‘-o’, ‘ConnectTimeout=10’, u’host2’, “/bin/sh -c ‘mkdir -p $HOME/.ansible/tmp/ansible-tmp-1415964802.72-81109603826031 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1415964802.72-81109603826031 && echo $HOME/.ansible/tmp/ansible-tmp-1415964802.72-81109603826031’”]
EXEC [‘ssh’, ‘-C’, ‘-tt’, ‘-vvv’, ‘-o’, ‘ForwardAgent=yes’, ‘-o’, ‘IdentityFile=“/Users/toreo/.ssh/cargame-ssh.pem”’, ‘-o’, ‘KbdInteractiveAuthentication=no’, ‘-o’, ‘PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey’, ‘-o’, ‘PasswordAuthentication=no’, ‘-o’, ‘User=ubuntu’, ‘-o’, ‘ConnectTimeout=10’, u’host1’, “/bin/sh -c ‘mkdir -p $HOME/.ansible/tmp/ansible-tmp-1415964802.72-242263817184790 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1415964802.72-242263817184790 && echo $HOME/.ansible/tmp/ansible-tmp-1415964802.72-242263817184790’”]
EXEC [‘ssh’, ‘-C’, ‘-tt’, ‘-vvv’, ‘-o’, ‘ForwardAgent=yes’, ‘-o’, ‘IdentityFile=“/Users/toreo/.ssh/cargame-ssh.pem”’, ‘-o’, ‘KbdInteractiveAuthentication=no’, ‘-o’, ‘PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey’, ‘-o’, ‘PasswordAuthentication=no’, ‘-o’, ‘User=ubuntu’, ‘-o’, ‘ConnectTimeout=10’, u’host3’, “/bin/sh -c ‘mkdir -p $HOME/.ansible/tmp/ansible-tmp-1415964802.72-65992156942899 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1415964802.72-65992156942899 && echo $HOME/.ansible/tmp/ansible-tmp-1415964802.72-65992156942899’”]
PUT /var/folders/f6/hp9f9y0933q95_9mlbhk27dm0000gn/T/tmpQGGHOr TO /home/ubuntu/.ansible/tmp/ansible-tmp-1415964802.72-65992156942899/apt
PUT /var/folders/f6/hp9f9y0933q95_9mlbhk27dm0000gn/T/tmp38enSO TO /home/ubuntu/.ansible/tmp/ansible-tmp-1415964802.72-242263817184790/apt
PUT /var/folders/f6/hp9f9y0933q95_9mlbhk27dm0000gn/T/tmpYViEBl TO /home/ubuntu/.ansible/tmp/ansible-tmp-1415964802.72-81109603826031/apt
EXEC [‘ssh’, ‘-C’, ‘-tt’, ‘-vvv’, ‘-o’, ‘ForwardAgent=yes’, ‘-o’, ‘IdentityFile=“/Users/toreo/.ssh/cargame-ssh.pem”’, ‘-o’, ‘KbdInteractiveAuthentication=no’, ‘-o’, ‘PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey’, ‘-o’, ‘PasswordAuthentication=no’, ‘-o’, ‘User=ubuntu’, ‘-o’, ‘ConnectTimeout=10’, u’host3’, u’/bin/sh -c 'sudo -k && sudo -H -S -p “[sudo via ansible, key=uuhortuzkrrxndipfnakoyezwvmlbrrg] password: " -u root /bin/sh -c '”'“'echo SUDO-SUCCESS-uuhortuzkrrxndipfnakoyezwvmlbrrg; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ubuntu/.ansible/tmp/ansible-tmp-1415964802.72-65992156942899/apt; rm -rf /home/ubuntu/.ansible/tmp/ansible-tmp-1415964802.72-65992156942899/ >/dev/null 2>&1'”'“''']
EXEC [‘ssh’, ‘-C’, ‘-tt’, ‘-vvv’, ‘-o’, ‘ForwardAgent=yes’, ‘-o’, 'IdentityFile=”/Users/toreo/.ssh/cargame-ssh.pem"‘, ‘-o’, ‘KbdInteractiveAuthentication=no’, ‘-o’, ‘PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey’, ‘-o’, ‘PasswordAuthentication=no’, ‘-o’, ‘User=ubuntu’, ‘-o’, ‘ConnectTimeout=10’, u’host1’, u’/bin/sh -c 'sudo -k && sudo -H -S -p “[sudo via ansible, key=kmkiimakutvlojnbqartowszmpdbndzk] password: " -u root /bin/sh -c '”'“'echo SUDO-SUCCESS-kmkiimakutvlojnbqartowszmpdbndzk; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ubuntu/.ansible/tmp/ansible-tmp-1415964802.72-242263817184790/apt; rm -rf /home/ubuntu/.ansible/tmp/ansible-tmp-1415964802.72-242263817184790/ >/dev/null 2>&1'”'“''']
EXEC [‘ssh’, ‘-C’, ‘-tt’, ‘-vvv’, ‘-o’, ‘ForwardAgent=yes’, ‘-o’, 'IdentityFile=”/Users/toreo/.ssh/cargame-ssh.pem"‘, ‘-o’, ‘KbdInteractiveAuthentication=no’, ‘-o’, ‘PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey’, ‘-o’, ‘PasswordAuthentication=no’, ‘-o’, ‘User=ubuntu’, ‘-o’, ‘ConnectTimeout=10’, u’host2’, u’/bin/sh -c 'sudo -k && sudo -H -S -p “[sudo via ansible, key=iyzsrbntouwziaxcxllqidtsopvifmac] password: " -u root /bin/sh -c '”'“'echo SUDO-SUCCESS-iyzsrbntouwziaxcxllqidtsopvifmac; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ubuntu/.ansible/tmp/ansible-tmp-1415964802.72-81109603826031/apt; rm -rf /home/ubuntu/.ansible/tmp/ansible-tmp-1415964802.72-81109603826031/ >/dev/null 2>&1'”'"''']
ok: [host3] => (item=git,supervisor) => {“changed”: false, “item”: “git,supervisor”}

If it helps, I just kill -HUP’ed one of the ssh processes and got a lot of debug output:

failed: [host1] => (item=git,supervisor) => {“failed”: true, “item”: “git,supervisor”, “parsed”: false}

invalid output was: SUDO-SUCCESS-kmkiimakutvlojnbqartowszmpdbndzk

OpenSSH_6.2p2, OSSLShim 0.9.8r 8 Dec 2011

debug1: Reading configuration data /Users/toreo/.ssh/config

debug1: Reading configuration data /etc/ssh_config

debug1: /etc/ssh_config line 20: Applying options for *

debug1: /etc/ssh_config line 53: Applying options for *

debug2: ssh_connect: needpriv 0

debug1: Connecting to host1 [] port 22.

debug2: fd 3 setting O_NONBLOCK

debug1: fd 3 clearing O_NONBLOCK

debug1: Connection established.

debug3: timeout: 9894 ms remain after connect

debug3: Incorrect RSA1 identifier

debug3: Could not load “/Users/toreo/.ssh/cargame-ssh.pem” as a RSA1 public key

debug1: identity file /Users/toreo/.ssh/cargame-ssh.pem type -1

debug1: identity file /Users/toreo/.ssh/cargame-ssh.pem-cert type -1

debug1: Enabling compatibility mode for protocol 2.0

debug1: Local version string SSH-2.0-OpenSSH_6.2

debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1p1 Ubuntu-2ubuntu2

debug1: match: OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 pat OpenSSH*

debug2: fd 3 setting O_NONBLOCK

debug3: load_hostkeys: loading entries for host “host1” from file “/Users/toreo/.ssh/known_hosts”

debug3: load_hostkeys: found key type RSA in file /Users/toreo/.ssh/known_hosts:42

debug3: load_hostkeys: loaded 1 keys

debug3: order_hostkeyalgs: prefer hostkeyalgs: ssh-rsa-cert-v01@openssh.com,ssh-rsa-cert-v00@openssh.com,ssh-rsa

debug1: SSH2_MSG_KEXINIT sent

debug1: SSH2_MSG_KEXINIT received

debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1

debug2: kex_parse_kexinit: ssh-rsa-cert-v01@openssh.com,ssh-rsa-cert-v00@openssh.com,ssh-rsa,ssh-dss-cert-v01@openssh.com,ssh-dss-cert-v00@openssh.com,ssh-dss

debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se

debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se

debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96

debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96

debug2: kex_parse_kexinit: zlib@openssh.com,zlib,none

debug2: kex_parse_kexinit: zlib@openssh.com,zlib,none

debug2: kex_parse_kexinit:

debug2: kex_parse_kexinit:

debug2: kex_parse_kexinit: first_kex_follows 0

debug2: kex_parse_kexinit: reserved 0

debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1

debug2: kex_parse_kexinit: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256

debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se

debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se

debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96

debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96

debug2: kex_parse_kexinit: none,zlib@openssh.com

debug2: kex_parse_kexinit: none,zlib@openssh.com

debug2: kex_parse_kexinit:

debug2: kex_parse_kexinit:

debug2: kex_parse_kexinit: first_kex_follows 0

debug2: kex_parse_kexinit: reserved 0

debug2: mac_setup: found hmac-md5-etm@openssh.com

debug1: kex: server->client aes128-ctr hmac-md5-etm@openssh.com zlib@openssh.com

debug2: mac_setup: found hmac-md5-etm@openssh.com

debug1: kex: client->server aes128-ctr hmac-md5-etm@openssh.com zlib@openssh.com

debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent

debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP

debug2: dh_gen_key: priv key bits set: 126/256

debug2: bits set: 494/1024

debug1: SSH2_MSG_KEX_DH_GEX_INIT sent

debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY

debug1: Server host key: RSA 90:5b:60:eb:fd:c3:96:b1:83:b7:9f:5e:b4:bd:c0:b3

debug3: load_hostkeys: loading entries for host “host1” from file “/Users/toreo/.ssh/known_hosts”

debug3: load_hostkeys: found key type RSA in file /Users/toreo/.ssh/known_hosts:42

debug3: load_hostkeys: loaded 1 keys

debug3: load_hostkeys: loading entries for host “” from file “/Users/toreo/.ssh/known_hosts”

debug3: load_hostkeys: found key type RSA in file /Users/toreo/.ssh/known_hosts:42

debug3: load_hostkeys: loaded 1 keys

debug1: Host ‘host1’ is known and matches the RSA host key.

debug1: Found key in /Users/toreo/.ssh/known_hosts:42

debug2: bits set: 519/1024

debug1: ssh_rsa_verify: signature correct

debug2: kex_derive_keys

debug2: set_newkeys: mode 1

debug1: SSH2_MSG_NEWKEYS sent

debug1: expecting SSH2_MSG_NEWKEYS

debug2: set_newkeys: mode 0

debug1: SSH2_MSG_NEWKEYS received

debug1: Roaming not allowed by server

debug1: SSH2_MSG_SERVICE_REQUEST sent

debug2: service_accept: ssh-userauth

debug1: SSH2_MSG_SERVICE_ACCEPT received

debug2: key: /Users/toreo/.ssh/id_rsa (0x7f816ab01280),

debug2: key: /Users/toreo/.ssh/cargame-ssh.pem (0x0), explicit

debug1: Authentications that can continue: publickey

debug3: start over, passed a different list publickey

debug3: preferred gssapi-with-mic,gssapi-keyex,hostbased,publickey

debug3: authmethod_lookup publickey

debug3: remaining preferred: ,gssapi-keyex,hostbased,publickey

debug3: authmethod_is_enabled publickey

debug1: Next authentication method: publickey

debug1: Offering RSA public key: /Users/toreo/.ssh/id_rsa

debug3: send_pubkey_test

debug2: we sent a publickey packet, wait for reply

debug1: Authentications that can continue: publickey

debug1: Trying private key: /Users/toreo/.ssh/cargame-ssh.pem

debug1: read PEM private key done: type RSA

debug3: sign_and_send_pubkey: RSA fd:98:17:09:b8:20:08:c2:66:d9:54:07:e6:04:b9:23

debug2: we sent a publickey packet, wait for reply

debug1: Enabling compression at level 6.

debug1: Authentication succeeded (publickey).

Authenticated to host1 ([]:22).

debug2: fd 6 setting O_NONBLOCK

debug2: fd 7 setting O_NONBLOCK

debug1: channel 0: new [client-session]

debug3: ssh_session2_open: channel_new: 0

debug2: channel 0: send open

debug1: Requesting no-more-sessions@openssh.com

debug1: Entering interactive session.

debug2: callback start

debug1: Requesting authentication agent forwarding.

debug2: channel 0: request auth-agent-req@openssh.com confirm 0

debug2: fd 3 setting TCP_NODELAY

debug3: packet_set_tos: set IP_TOS 0x10

debug2: client_session2_setup: id 0

debug2: channel 0: request pty-req confirm 1

debug1: Sending environment.

debug3: Ignored env VIRTUALENVWRAPPER_PROJECT_FILENAME

debug3: Ignored env TERM_PROGRAM

debug3: Ignored env TERM

debug3: Ignored env SHELL

debug3: Ignored env TMPDIR

debug3: Ignored env Apple_PubSub_Socket_Render

debug3: Ignored env TERM_PROGRAM_VERSION

debug3: Ignored env TERM_SESSION_ID

debug3: Ignored env USER

debug3: Ignored env SSH_AUTH_SOCK

debug3: Ignored env __CF_USER_TEXT_ENCODING

debug3: Ignored env WORKON_HOME

debug3: Ignored env PATH

debug3: Ignored env VIRTUALENVWRAPPER_HOOK_DIR

debug3: Ignored env PWD

debug3: Ignored env XPC_FLAGS

debug3: Ignored env PS1

debug3: Ignored env XPC_SERVICE_NAME

debug3: Ignored env HOME

debug3: Ignored env SHLVL

debug3: Ignored env LOGNAME

debug1: Sending env LC_CTYPE = UTF-8

debug2: channel 0: request env confirm 0

debug3: Ignored env GOPATH

debug3: Ignored env DISPLAY

debug3: Ignored env VIRTUALENVWRAPPER_LOG_DIR

debug3: Ignored env SECURITYSESSIONID

debug3: Ignored env OLDPWD

debug3: Ignored env _

debug3: Ignored env VERSIONER_PYTHON_VERSION

debug3: Ignored env VERSIONER_PYTHON_PREFER_32_BIT

debug1: Sending command: /bin/sh -c ‘sudo -k && sudo -H -S -p “[sudo via ansible, key=kmkiimakutvlojnbqartowszmpdbndzk] password: " -u root /bin/sh -c '”’“‘echo SUDO-SUCCESS-kmkiimakutvlojnbqartowszmpdbndzk; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ubuntu/.ansible/tmp/ansible-tmp-1415964802.72-242263817184790/apt; rm -rf /home/ubuntu/.ansible/tmp/ansible-tmp-1415964802.72-242263817184790/ >/dev/null 2>&1’”‘"’’

debug2: channel 0: request exec confirm 1

debug2: callback done

debug2: channel 0: open confirm rwindow 0 rmax 32768

debug2: channel_input_status_confirm: type 99 id 0

debug2: PTY allocation request accepted on channel 0

debug2: channel 0: rcvd adjust 2097152

debug2: channel_input_status_confirm: type 99 id 0

debug2: exec request accepted on channel 0

debug1: channel 0: free: client-session, nchannels 1

debug3: channel 0: status: The following connections are open:

#0 client-session (t4 r0 i0/0 o0/0 fd 5/6 cc -1)

debug1: fd 1 clearing O_NONBLOCK

debug1: fd 2 clearing O_NONBLOCK

Killed by signal 1.

I’m checking output of your commands on the phone in the bus so I could miss something, but it looks like ssh is actually fine. Have you try to debug on remote host?

Have you compare ansible with normal ssh?

Cherers,
Wawrzek

Hi Wawrzek,

Thanks for your response. I believe my problem may have been caused by SSH connections being silently broken by NAT. I just added “ServerAliveInterval 50” to my ~/.ssh/config file and will see how it works out. Regular SSH connections would sometimes fail with a broken pipe, but of couse that happened because I would type something. Ansible is just waiting for data from the peer, so it won’t detect this situation.

Unless I follow up with more information, I think we can consider this solved.

-Tore