The short version:
I’m running tests against Fedora 22 Alpha Atomic host and Server, and am seeing various connection failures depending on the ssh_connection settings. They all look like something reported against openssh-5.3 with the ControlPersist backport. I played around with various values and found a workaround: dropping the ControlPersist value, i.e. setting
[ssh_connection]
ssh_args = -o ControlMaster=auto
in ~/.ansible.cfg allows everything to work. The question is why?
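To make the difference between the stock settings and the workaround concrete, here is a small Python sketch that reads an ansible.cfg-style fragment and lists the -o options ssh would receive. The helper name ssh_options is made up for illustration, and DEFAULT_SSH_ARGS is an assumption taken from the options visible in the verbose output further down in this post, not from the Ansible source.

```python
import configparser
import shlex

# Assumption: the defaults as they appear in the verbose output in this post
# (ControlMaster=auto ControlPersist=60s) when ssh_args is not set.
DEFAULT_SSH_ARGS = "-o ControlMaster=auto -o ControlPersist=60s"


def ssh_options(cfg_text):
    """Return the -o options from [ssh_connection] ssh_args as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # Fall back to the assumed defaults if the section or key is missing.
    args = parser.get("ssh_connection", "ssh_args", fallback=DEFAULT_SSH_ARGS)
    tokens = shlex.split(args)
    options = {}
    # Walk adjacent token pairs and keep those of the form: -o Key=Value
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-o" and "=" in value:
            key, _, val = value.partition("=")
            options[key] = val
    return options


workaround = """
[ssh_connection]
ssh_args = -o ControlMaster=auto
"""

print("workaround:", ssh_options(workaround))
print("defaults:  ", ssh_options("[ssh_connection]\n"))
```

With the workaround in place only ControlMaster is left, so no persistent master is requested; an empty ssh_args line yields no options at all.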
The long version:
Looking up the connection issues (SFTP failures, plain connection resets, etc.) I came across this post from last year https://groups.google.com/forum/#!msg/ansible-project/QUdxNK1zEH0/rQKnO827FUgJ which had similar-looking failures and led to this RHT Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1160487.
I went through the various options mentioned in the thread in a local ansible.cfg: disabling pipelining, switching to scp, and clearing ssh_args entirely. The last option worked, so I dug further.
The ansible version is:
[fedora@atomic-master ~]$ rpm -q ansible
ansible-1.8.4-1.fc22.noarch
OpenSSH versions on each system:
192.168.122.10 | success | rc=0 >>
openssh-6.7p1-11.fc22.x86_64
192.168.122.11 | success | rc=0 >>
openssh-6.7p1-10.fc22.x86_64
No user-based ssh settings; all values are the defaults from /etc/ansible (p1-11 succeeds, p1-10 fails):
<192.168.122.10> PubkeyAuthentication=no ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
<192.168.122.10>
fatal: [192.168.122.10] => failed to transfer file to /home/fedora/.ansible/tmp/ansible-tmp-1427209312.33-30225738744475/setup:
Couldn't read packet: Connection reset by peer
<192.168.122.11> PubkeyAuthentication=no ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
ok: [192.168.122.11]
User based ~/.ansible.cfg, Pipelining enabled (p1-11 succeeds, p1-10 fails)
[defaults]
host_key_checking = False
[ssh_connection]
#ssh_args = -o ControlMaster=auto
#ssh_args =
pipelining = True
<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=irpxmyfkxjqtjkyqbfmyvogzyeygovsm] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-irpxmyfkxjqtjkyqbfmyvogzyeygovsm; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
fatal: [192.168.122.10] => ssh connection error waiting for sudo or su password prompt
<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=hdearqulmxcbkpjxjlgwpyiiapebebju] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-hdearqulmxcbkpjxjlgwpyiiapebebju; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
ok: [192.168.122.11]
User based ~/.ansible.cfg, Pipelining with ControlPersist removed (p1-11 succeeds, p1-10 fails)
[defaults]
host_key_checking = False
[ssh_connection]
ssh_args = -o ControlMaster=auto
#ssh_args =
pipelining = True
<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=gbcczsohsqeiuyzxezohqbikibenkeun] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-gbcczsohsqeiuyzxezohqbikibenkeun; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
fatal: [192.168.122.10] => ssh connection error waiting for sudo or su password prompt
<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=xjcccpqinmchipiubhsmjwdjvgmytbww] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-xjcccpqinmchipiubhsmjwdjvgmytbww; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
ok: [192.168.122.11]
User based ~/.ansible.cfg, Remove ControlPersist and pipelining (p1-11 succeeds, p1-10 succeeds)
[defaults]
host_key_checking = False
[ssh_connection]
ssh_args = -o ControlMaster=auto
#ssh_args =
#pipelining = True
<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=dyqtvojqcsjschscszupwxnithfwjhqs] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-dyqtvojqcsjschscszupwxnithfwjhqs; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/fedora/.ansible/tmp/ansible-tmp-1427210773.45-273681836317736/setup; rm -rf /home/fedora/.ansible/tmp/ansible-tmp-1427210773.45-273681836317736/ >/dev/null 2>&1'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
ok: [192.168.122.10]
<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=gixgtyagbeoctcldxprcqrjwuxdhscvr] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-gixgtyagbeoctcldxprcqrjwuxdhscvr; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/fedora/.ansible/tmp/ansible-tmp-1427210773.47-126330968681284/setup; rm -rf /home/fedora/.ansible/tmp/ansible-tmp-1427210773.47-126330968681284/ >/dev/null 2>&1'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
ok: [192.168.122.11]
Manually testing the ControlPersist values on the command line works as expected for both hosts; the master connection times out after 60s:
[fedora@atomic-master ansible-atomic]$ ssh -F /dev/null -o ControlMaster=auto -o ControlPersist=60s -S Test_Master_Socket fedora@192.168.122.11 echo Hello
fedora@192.168.122.11's password:
Hello
[fedora@atomic-master ansible-atomic]$ ps -fu `whoami` | grep "[s]sh.*Test_Master_Socket"
fedora 1015 1 0 11:29 ? 00:00:00 ssh: Test_Master_Socket [mux]
[fedora@atomic-master ansible-atomic]$ ssh -F /dev/null -S Test_Master_Socket -O check 192.168.122.11
Master running (pid=1015)
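For anyone who wants to repeat this manual check against both hosts, here is a rough Python sketch that builds the same two ssh invocations. The helper names (master_cmd, check_cmd, master_alive) are hypothetical, and master_alive assumes that ssh -O check exits with status 0 while the master is running; I have not verified that against these particular builds.

```python
import shlex
import subprocess


def master_cmd(host, socket, persist="60s", user="fedora"):
    """argv for opening a persistent master, as in the manual test above."""
    return [
        "ssh", "-F", "/dev/null",
        "-o", "ControlMaster=auto",
        "-o", f"ControlPersist={persist}",
        "-S", socket,
        f"{user}@{host}", "echo", "Hello",
    ]


def check_cmd(host, socket):
    """argv for asking the master behind `socket` whether it is running."""
    return ["ssh", "-F", "/dev/null", "-S", socket, "-O", "check", host]


def master_alive(host, socket):
    """Run the check; assumes a zero exit status means the master is up."""
    result = subprocess.run(check_cmd(host, socket), capture_output=True)
    return result.returncode == 0


# Only print the commands here; nothing is executed until master_alive()
# or subprocess.run(master_cmd(...)) is called explicitly.
for host in ("192.168.122.10", "192.168.122.11"):
    print(shlex.join(master_cmd(host, "Test_Master_Socket")))
    print(shlex.join(check_cmd(host, "Test_Master_Socket")))
```

Calling master_alive() once right after the master is opened and again a little over 60 seconds later should show whether the persist timeout is honoured on each host.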
I looked at the Koji page for openssh and don’t see anything specific to ControlPersist in the changelog, but I’m not an OpenSSH ControlPersist guru. http://koji.fedoraproject.org/koji/buildinfo?buildID=619696
Bottom line, I’m out of troubleshooting steps, not sure what the impact of the workaround is, and I think someone who has more depth should take a look. I’m cc’ing ansible-devel because I wasn’t sure what the right forum for this sort of issue was. Hopefully this was clear!
Cheers,
-Matt M