# of ssh connections / timeout function?

This is a two-part question. I sort of worked around the problem I asked about yesterday (the third task crashes on the third of 4 machines) by splitting my playbook into 3 parts. Ugly, but for now it works, at least on four machines at a time; I'm testing a bigger inventory now.

Possibly pertinent information: we deploy into a closed network (behind a VPN) composed of over 500 VMs on hardware spread across the US. The SSH connection to these machines is SLOW due to a config problem that I am not, at the moment, allowed to correct. As such, it can take as long as two-plus minutes from when I request an SSH connection to when it actually establishes. Again, I know what the issue is and how to fix it, but am not allowed to at the moment.

What I’ve noticed: as Ansible moves through the playbook, it moves up and down the inventory list. In other words, it will start task one on machine one (of 5, let’s say) and move sequentially (seemingly in pairs) through to machine 5; then the next task starts on machine 5 and moves up the list from 4 to 3 to 2 to 1. I’ve noticed that each task executes quickly on the first two machines of a task, and that they were the last two machines of the previous task. I’ve also noted that each task executes quite quickly on a consecutive pair of machines (regardless of geographic location, btw), but then there is a long delay before the same task executes on the next pair.

Questions: this leads me to wonder, is there a timeout occurring in Ansible? Does it actually open SSH connections in pairs?

I can’t fix the built-in delay in the VMs right now, so I’m hoping to make changes in Ansible until this can be fixed on the hosts. My suspicion is that the long gap between a pair of hosts being the starting pair of one task and the ending pair of the next is causing some sort of timeout.

Then again my theory could be hooey and we’re just cursed. :frowning:

regards, Richard

PS: while I wrote the above, I was running a playbook that makes a directory, then moves and untars a 14.5 MB file using unarchive. This worked fine on a 4-machine inventory. It crashed on the unarchive task after the third machine.

Another one of those “it depends” answers.
You can configure Ansible to use either the OpenSSH client or the Paramiko client (a Python SSH implementation). By default Ansible picks "smart" mode, preferring OpenSSH where it supports ControlPersist and falling back to Paramiko; Paramiko usually has fewer permissions headaches (though it still uses ~/.ssh/known_hosts, ~/.ssh/id_rsa, etc.).
My observation is that Ansible picks the order in which it connects to managed hosts in a non-deterministic fashion. So if your playbook performs a task on a host, then, depending on timing, selection of the next host/task, or whatever variable in your network causes connections to initiate slowly, it may turn around and immediately reconnect to a host on which it just completed a task in order to perform the next one.
There are parameters you can tune in your ansible.cfg that will allow you to pipeline multiple tasks through the same SSH connection. I played around with them a little bit when having some issues on our network, but ultimately went back to the defaults due to some misguided "requiretty" sudo settings on the managed hosts, on the part of SAs that predate my tenure at {POE}.
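As a rough sketch of what that tuning looks like (option names are from the Ansible docs of that era; verify them against your version, and note that pipelining requires "requiretty" to be disabled in sudoers on the managed hosts):

```ini
; ansible.cfg -- sketch only, not a tested recommendation
[defaults]
forks = 5          ; number of parallel host connections (default is 5)
timeout = 300      ; connection timeout in seconds; worth raising when the
                   ; SSH handshake itself can take two-plus minutes

[ssh_connection]
pipelining = True  ; push multiple module operations through one SSH
                   ; connection instead of opening a new one per operation
```

Raising `timeout` is the most direct test of the "Ansible is timing out on slow handshakes" theory; `forks` may also explain the pair-wise behavior you saw if it is set to 2 somewhere in your config.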

See http://docs.ansible.com/ansible/intro_configuration.html#paramiko-specific-settings (and the next two sections, on OpenSSH-specific settings and Accelerated Mode settings) for more information.
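If you stay on the OpenSSH transport, connection multiplexing can also hide a slow handshake: only the first connection to each host pays the full setup cost, and later tasks reuse the open master connection. A hedged sketch (the `control_path` value shown is Ansible's own default; the 30-minute ControlPersist is an assumption, tune it to your playbook length):

```ini
; ansible.cfg -- sketch only; requires the OpenSSH client, not Paramiko
[ssh_connection]
; keep the master connection alive for 30 minutes so subsequent tasks
; against the same host skip the slow handshake entirely
ssh_args = -o ControlMaster=auto -o ControlPersist=30m
control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r
```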