template + delegate_to + ssh shared connections = closed connections

I'm finding that when I delegate the template module to a single host, I end up with: SSH Error: Shared connection to xxxxxx closed.

Maybe one or two servers will make it through fine, but the other 5 or so will fail.

This is between Ubuntu (ansible 1.9.0.1) and RHEL 6.x servers.

The objective is to install a configuration file onto a monitoring server for each host in the inventory; the target file name is {{ ansible_fqdn }}.cfg. I solved this by creating another role and setting serial: 1 in the playbook, but that's somewhat of a pain.
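A minimal sketch of the failing pattern, with a hypothetical monitoring host name and destination path:

    # Renders one config file per inventory host onto the monitoring
    # server; "monitor01" and the dest path are illustrative.
    - name: install per-host monitoring config
      template: src=host.cfg.j2 dest=/etc/monitoring/conf.d/{{ ansible_fqdn }}.cfg
      delegate_to: monitor01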

And yes, my monitoring server is also in that list, so it generates a configuration file for itself (as it runs other services).

Anyone else see this or able to reproduce?

I noticed the same issue yesterday with 1.9.0.1 and 1.9.1. I have one role that runs a bunch of tasks from the app server and delegates them to another node. It runs through a handful of tasks, and then some or all hosts will randomly fail on various tasks, like this:

    fatal: [mdmsapp01 -> cnsasg2] => SSH Error: Shared connection to cnsasg2 closed.
    It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
    fatal: [mdmclwapp1 -> cnsasg2] => SSH Error: Shared connection to cnsasg2 closed.
    It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
    fatal: [mdmngaapp1 -> cnsasg2] => SSH Error: Shared connection to cnsasg2 closed.
    It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
    fatal: [mdmkeuapp1 -> cnsasg2] => SSH Error: Shared connection to cnsasg2 closed.
    It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
    fatal: [mdmvecapp1 -> cnsasg2] => SSH Error: Shared connection to cnsasg2 closed.
    It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

    FATAL: all hosts have already failed -- aborting

This sounds like a probable race condition with the per-host forks interfering with each other; try setting serial: 1 to confirm.
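For reference, a minimal sketch of where serial: 1 goes (play target and role name are illustrative):

    # Run the play one host at a time, so only one fork touches the
    # delegate's shared SSH connection at any moment.
    - hosts: all
      serial: 1
      roles:
        - monitoring_config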

I didn’t try with serial, but I have not had an issue when running the same playbook against one or two delegates.

I can confirm this problem.

In my case it is a playbook which creates a configuration file for each server on the backup server.

If I run this playbook against one host, it works.
If I run it against a group of hosts, it fails.
If I run it against a group of hosts using -f 1 as a parameter, it works (see the invocation below).
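For example, limiting Ansible to a single fork (playbook name is illustrative):

    # A single fork means only one process talks to the backup
    # server at a time, so the control socket is never contended.
    ansible-playbook backup-config.yml -f 1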

The error is always the same: SSH Error: Shared connection to xxxxx closed.

Ansible version is 1.9-0.git201503262158~unstable

I have this issue as well: running a play which delegates some action to the same host pretty consistently fails with the above error. My findings are exactly the same as described above, including the -f 1 behavior.

Any ideas on workarounds?

Have you tried switching to paramiko as your connection plugin?
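For anyone who wants to try it, there are two standard ways to select paramiko (playbook name is illustrative):

    # Per run, on the command line:
    ansible-playbook site.yml -c paramiko

or persistently, in ansible.cfg:

    [defaults]
    transport = paramiko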

I was just looking into that. Is it a reliable workaround?

Will try soon and report the results.

I can confirm that switching to paramiko appears to solve the problem in my case. It seems like a bit of a shame, though, since OpenSSH is the recommended connection type (it has more features).

The issue is that under normal operation you are hitting different hosts; when delegating, you may have several forks hitting the same host, and once you factor in ControlMaster SSH sockets, it can get complicated. Paramiko avoids this by not having a shared resource.
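To make the shared resource concrete: by default, Ansible 1.9 passes SSH options along these lines (visible with -vvvv), so every fork targeting the same delegate reuses one control socket:

    # Roughly the ControlMaster options Ansible adds to each ssh call:
    -o ControlMaster=auto
    -o ControlPersist=60s
    -o ControlPath="~/.ansible/cp/ansible-ssh-%h-%p-%r"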

In 2.0 we've added some code that does a better job of dealing with this kind of concurrency when using ssh.

Great to hear, thanks!

FYI, after working with paramiko for a while I still have issues when delegating multiple tasks to the same host, this time with the error “Error reading SSH protocol banner”.

I ran into this today and played around with different server and client versions of OpenSSH. My environment consists of servers running Ubuntu 14.04 and clients running OS X.

If you run OS X, this problem goes away if you brew install openssh.

On the server I tested these OpenSSH versions:

  • Default: OpenSSH_6.6.1p1 Ubuntu-2ubuntu2, OpenSSL 1.0.1f 6 Jan 2014
  • Latest: OpenSSH_6.9p1, OpenSSL 1.0.1f 6 Jan 2014

On the client I tested these OpenSSH versions:

  • Default: OpenSSH_6.2p2, OSSLShim 0.9.8r 8 Dec 2011
  • Latest (brew): OpenSSH_6.8p1, OpenSSL 1.0.2a 19 Mar 2015

Actually, I was wrong. I forgot to tear down my control sockets when swapping server versions.
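For anyone repeating the test, the persisted sockets have to be cleared between runs, for example (default control path assumed, "somehost" is a placeholder):

    # Remove Ansible's ControlMaster sockets so the next run opens
    # fresh connections against the new sshd:
    rm -f ~/.ansible/cp/ansible-ssh-*
    # Or shut down a single master connection cleanly:
    ssh -O exit -o ControlPath=~/.ansible/cp/ansible-ssh-%h-%p-%r somehost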

This is entirely the fault of the ConsoleKit patch, at least on Ubuntu 14.04:

https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/1334916/

Thanks for pointing out the bug Nathan! I’ll add my vote.

I'm having this same issue. Is there any fix for this yet, or is the workaround still to switch to paramiko or to run one node at a time?

--user root

For me at least (and I know this won't be available for everyone), connecting as root seems to get around this particular issue.
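That is, passing the flag on the command line (playbook name is illustrative):

    # Connect to the targets as root instead of the usual remote user:
    ansible-playbook site.yml --user root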