ansible performance tuning

Hello,

I am trying to speed up my Ansible setup. I went from 20 minutes of execution time down to 10 minutes by switching the control machine from CentOS 6.4 (paramiko) to Fedora 19 (ssh + ControlPersist). Enabling accelerate=true did not bring any further speed-up (??).

Here are the tests I ended up doing; can you help me understand the difference between the debug:, script: and command: measurements, and what my options are to speed things up?

What are your numbers on the same tests? How can I tell what is a reasonable time from an Ansible point of view versus the effect of bad configuration / bad VM hosting choices?

What do you think is taking 3+ seconds per task in accelerate mode with command: echo "ping"? Knowing that I have approximately 200 tasks in my actual playbook (output from --list-tasks), 200 tasks * 3 sec ~ 10 minutes, which is what I am observing.
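
For reference, the shape of the test play I mean is roughly this (a minimal sketch; the host group name is just illustrative):

---
# Minimal per-task timing test: accelerate enabled, facts skipped,
# one trivial task so only the per-task overhead is measured.
- hosts: webservers        # illustrative group name from my inventory
  accelerate: true         # the setting that gave me no speed-up
  gather_facts: no
  tasks:
    - name: trivial remote command
      command: echo "ping"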

Thank you for your help!
Jerome

Sounds like you have some interesting DNS or network problems to figure out.

The debug module doesn’t do anything with the remote host, BTW.

The script module pushes a local script to the remote host and executes it; the command module pushes the command module itself and executes it with the given arguments – they are slightly different in the way they work.

One piece of advice I usually give to people: if you are managing machines inside EC2, put your control machine inside EC2 as well – the networking from outside can be quite bad.

The fact that accelerate doesn't offer any benefit for a multi-task playbook is curious, but it may be a sign of other networking difficulties.

Comparing performance between debug, raw, command, shell and script may give you a good view of the different overheads of SSH, SSH+python, SSH+python+shell and SSH+python+upload+shell.
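
As a rough sketch of what such a comparison play could look like (the task bodies and the script name are only placeholders):

---
- hosts: all
  gather_facts: no
  tasks:
    # No remote connection at all: baseline for Ansible's own per-task cost.
    - debug: msg="ping"

    # SSH only: runs the command directly, nothing is copied to the target.
    - raw: echo ping

    # SSH + python: the command module is transferred and executed with these args.
    - command: echo ping

    # SSH + python + shell: like command, but run through a shell on the target.
    - shell: echo ping

    # SSH + python + upload + shell: additionally uploads a local ping.sh first.
    - script: ping.sh

Timing the run and comparing the gaps between tasks in the -vvv output should show where the overhead sits.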

When using -vvv you can see exactly what the SSH-related overhead is. The fact that Ansible only starts the next task after the previous one has finished (for all hosts) does not make it very fast, but it is perfect for orchestration and predictability. If you don't need this, look at async to speed things up, or run Ansible in more "slices" with fewer systems per "slice".
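
For instance, the async part could look like this (values and the job name are only illustrative); the "slices" would then be separate runs of the playbook against subsets of hosts, e.g. with --limit:

---
- hosts: all
  gather_facts: no
  tasks:
    # Fire-and-forget: Ansible kicks this off and moves on immediately,
    # so a slow host no longer holds the whole task back for everyone.
    - name: long running job
      command: /usr/local/bin/long_job.sh    # illustrative long-running task
      async: 600     # allow up to 10 minutes in the background
      poll: 0        # 0 = do not wait; poll > 0 checks back periodically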

Personally, I think there's some merit in optimizing the paramiko transport, because Ansible knows best when a channel is going to be reused, and when it can be closed. So not adding this layer to reuse SSH-connections in Ansible/paramiko is a lost opportunity to have a great out-of-the-box experience.

Besides, in large enterprises it is easier to use a newer paramiko library from a different Python library path (or ship it with Ansible?!) rather than expecting a current OpenSSH on the system.

(No, for security reasons we are not going to switch to a different OpenSSH on RHEL, especially not on a system that gives access to our complete datacenter ;-))

Yes, the control machine is not located on premises (but I do not use EC2).
In fact I deliberately try to control hosts that are located at several distinct providers, so I use my laptop as the control machine.

I expect the latency to be higher, but I thought that accelerate would make up for it.

Just to understand: does accelerate work by making only one connection to port 5099 and then sending all the tasks through that tunnel, or is there still one connection on 5099 per task?

My ping to the remote hosts is around 50 ms.

What timing should I expect as reasonable for 10 x command: echo on a remote host?
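
For reference, by that I mean a play along these lines, with the same task repeated ten times (trimmed here; the host name is illustrative):

---
- hosts: remotehost
  gather_facts: no
  tasks:
    - command: echo ping    # task 1
    - command: echo ping    # task 2
    # ... the same task repeated up to task 10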

Thank you

"Personally, I think there’s some merit in optimizing the paramiko transport, because Ansible knows best when a channel is going to be reused, and when it can be closed. "

Keeping these open is possible but requires "fork affinity" – the idea that the next time the host is dealt with, it goes to the same fork.

I really think OpenSSH is the one true way; paramiko is only kept around because RHEL hasn't updated their OpenSSH clients yet to support ControlPersist.