Connection issues with openssh-6.7p1-11 in Fedora 22 Alpha

The short version:

I’m running tests against Fedora 22 Alpha Atomic Host and Server, and am seeing various connection failures depending on the ssh_connection settings. They all look like an issue previously reported against openssh-5.3 with the ControlPersist backport. After experimenting with various values I found a workaround: dropping ControlPersist from ssh_args. With the following in ~/.ansible.cfg, everything works:

[ssh_connection]

ssh_args = -o ControlMaster=auto

The question is why?

The long version:

Looking up the connection issues (SFTP failures, plain connection resets, etc.), I came across this post from last year, https://groups.google.com/forum/#!msg/ansible-project/QUdxNK1zEH0/rQKnO827FUgJ, which had similar-looking failures and led to this Red Hat Bugzilla entry: https://bugzilla.redhat.com/show_bug.cgi?id=1160487.

I went through the various options mentioned in the thread in a local ansible.cfg: disabling pipelining, switching to scp, and clearing ssh_args entirely. The last option worked, so I dug further.
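For anyone wanting to repeat this, the three variants can be kept as separate config files and selected one at a time. The following is only a sketch: the scratch directory and file names are made up for illustration, and it assumes the standard [ssh_connection] settings pipelining, scp_if_ssh and ssh_args. It prints the ansible commands instead of running them.

```shell
#!/bin/sh
# Sketch: one ansible.cfg per variant mentioned in the thread.
# The scratch directory and file names are illustrative assumptions.
cfgdir=$(mktemp -d)

# Variant 1: pipelining disabled
cat > "$cfgdir/no-pipelining.cfg" <<'EOF'
[ssh_connection]
pipelining = False
EOF

# Variant 2: transfer files with scp instead of sftp
cat > "$cfgdir/scp.cfg" <<'EOF'
[ssh_connection]
scp_if_ssh = True
EOF

# Variant 3: ssh_args cleared entirely
cat > "$cfgdir/empty-ssh-args.cfg" <<'EOF'
[ssh_connection]
ssh_args =
EOF

# Print (not run) one command per variant; ANSIBLE_CONFIG selects the file.
for cfg in "$cfgdir"/*.cfg; do
    echo "ANSIBLE_CONFIG=$cfg ansible all -m setup -vvvv"
done
```

Running each printed command in turn keeps the variants isolated, so a result cannot be blamed on a leftover setting from the previous attempt.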

The ansible version is:

[fedora@atomic-master ~]$ rpm -q ansible
ansible-1.8.4-1.fc22.noarch

OpenSSH versions on each system:

192.168.122.10 | success | rc=0 >>
openssh-6.7p1-11.fc22.x86_64

192.168.122.11 | success | rc=0 >>
openssh-6.7p1-10.fc22.x86_64

No user-based ssh settings, all values default from /etc/ansible (p1-11 fails, p1-10 succeeds):

<192.168.122.10> PubkeyAuthentication=no ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s

<192.168.122.10>

fatal: [192.168.122.10] => failed to transfer file to /home/fedora/.ansible/tmp/ansible-tmp-1427209312.33-30225738744475/setup:

Couldn't read packet: Connection reset by peer

<192.168.122.11> PubkeyAuthentication=no ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s

ok: [192.168.122.11]

User-based ~/.ansible.cfg, pipelining enabled (p1-11 fails, p1-10 succeeds):

[defaults]

host_key_checking = False

[ssh_connection]

#ssh_args = -o ControlMaster=auto

#ssh_args =

pipelining = True

<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=irpxmyfkxjqtjkyqbfmyvogzyeygovsm] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-irpxmyfkxjqtjkyqbfmyvogzyeygovsm; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s

fatal: [192.168.122.10] => ssh connection error waiting for sudo or su password prompt

<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=hdearqulmxcbkpjxjlgwpyiiapebebju] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-hdearqulmxcbkpjxjlgwpyiiapebebju; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s

ok: [192.168.122.11]

User-based ~/.ansible.cfg, pipelining enabled with ControlPersist removed (p1-11 fails, p1-10 succeeds):

[defaults]

host_key_checking = False

[ssh_connection]

ssh_args = -o ControlMaster=auto

#ssh_args =

pipelining = True

<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=gbcczsohsqeiuyzxezohqbikibenkeun] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-gbcczsohsqeiuyzxezohqbikibenkeun; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto

fatal: [192.168.122.10] => ssh connection error waiting for sudo or su password prompt

<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=xjcccpqinmchipiubhsmjwdjvgmytbww] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-xjcccpqinmchipiubhsmjwdjvgmytbww; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto

ok: [192.168.122.11]

User-based ~/.ansible.cfg, ControlPersist and pipelining both removed (p1-11 succeeds, p1-10 succeeds):

[defaults]

host_key_checking = False

[ssh_connection]

ssh_args = -o ControlMaster=auto

#ssh_args =

#pipelining = True

<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=dyqtvojqcsjschscszupwxnithfwjhqs] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-dyqtvojqcsjschscszupwxnithfwjhqs; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/fedora/.ansible/tmp/ansible-tmp-1427210773.45-273681836317736/setup; rm -rf /home/fedora/.ansible/tmp/ansible-tmp-1427210773.45-273681836317736/ >/dev/null 2>&1'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto

ok: [192.168.122.10]

<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=gixgtyagbeoctcldxprcqrjwuxdhscvr] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-gixgtyagbeoctcldxprcqrjwuxdhscvr; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/fedora/.ansible/tmp/ansible-tmp-1427210773.47-126330968681284/setup; rm -rf /home/fedora/.ansible/tmp/ansible-tmp-1427210773.47-126330968681284/ >/dev/null 2>&1'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto

ok: [192.168.122.11]

Manually testing the ControlPersist values on the command line works as expected for both hosts; the master connection times out after 60s:

[fedora@atomic-master ansible-atomic]$ ssh -F /dev/null -o ControlMaster=auto -o ControlPersist=60s -S Test_Master_Socket fedora@192.168.122.11 echo Hello

fedora@192.168.122.11's password:

Hello

[fedora@atomic-master ansible-atomic]$ ps -fu $(whoami) | grep "[s]sh.*Test_Master_Socket"

fedora 1015 1 0 11:29 ? 00:00:00 ssh: Test_Master_Socket [mux]

[fedora@atomic-master ansible-atomic]$ ssh -F /dev/null -S Test_Master_Socket -O check 192.168.122.11

Master running (pid=1015)
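When repeating the manual test, a master left over from an earlier attempt could skew the result, so it may help to check for it and shut it down explicitly rather than wait out the 60s. A minimal sketch, reusing the socket name and host from the commands above; the function names are hypothetical, while -O check and -O exit are the standard OpenSSH multiplexing control commands.

```shell
#!/bin/sh
# Sketch: query and stop the multiplexing master used in the manual test.
# SOCKET and HOST come from the commands above; function names are illustrative.
SOCKET=Test_Master_Socket
HOST=192.168.122.11

# Ask the master process behind the socket whether it is still running.
check_master() {
    ssh -F /dev/null -S "$SOCKET" -O check "$HOST"
}

# Ask the master to exit instead of waiting for ControlPersist to expire.
stop_master() {
    ssh -F /dev/null -S "$SOCKET" -O exit "$HOST"
}
```

Calling stop_master before switching between ssh_args variants should ensure each run starts from a fresh connection.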

I looked at the Koji page for openssh and don’t see anything specific to ControlPersist in the changelog, but I’m not an OpenSSH ControlPersist guru: http://koji.fedoraproject.org/koji/buildinfo?buildID=619696

Bottom line: I’m out of troubleshooting steps, I’m not sure what the impact of the workaround is, and I think someone with more depth in this area should take a look. I’m cc’ing ansible-devel because I wasn’t sure what the right forum for this sort of issue was. Hopefully this was clear!

Cheers,
-Matt M

Looks like it might be:
https://bugzilla.redhat.com/show_bug.cgi?id=1203900

It might be worth noting there that it's affecting f22 as well and asking for an update?
(Looks like they only updated rawhide.)

kevin