Protocol is slow

I like Ansible's approach to management. You need almost nothing on the
managed hosts and no central daemon. Copy the module, run it, get the
results, remove it, repeat. Easy.

But unfortunately it can be slow.

I have a playbook with ~200 actions. It takes 20 seconds to run with
'-c local', and about a minute if I use SSH to localhost. Connecting and
disconnecting for every step really slows things down... And it takes 12
minutes with a remote server. Shared SSH connections (ControlMaster /
ControlPersist) are enabled, and the servers aren't too busy, but they
are located in different datacenters.
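For reference, shared SSH connections of the kind mentioned here are
enabled in ~/.ssh/config; a minimal sketch (the host pattern and the
persist timeout are placeholders, pick your own):

```
Host *
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
```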

ansible-pull is nice, but not really what I would like. I don't need to
run tasks via cron; I need to trigger them manually. I would also need a
central Git repository for the playbooks. Being able to run Ansible,
look at the output, and fix any problems is much more desirable than
getting error reports by e-mail or something similar.

No need to explain; everyone here knows the advantages of the usual way
of using Ansible :slight_smile:

So: are there any plans to change the usual process, which currently
requires additional connections and steps? I understand that it keeps
things simple, but still...

Require Ansible to be installed on the remote computer, copy the
playbook over, run it with '-c local', and show the results the same way
as now? Or just copy Ansible there together with the playbook and remove
it when execution finishes? Or don't copy all of Ansible, but some
smaller script (with fewer dependencies), and provide it with a playbook
already parsed on the central server (I don't know whether this is
reasonable)?

These would be quite big changes. A simpler one would be to create the
temp directory and upload all the modules only once, instead of
repeating this for every 'action' line in the playbook. Currently the
debug (-vvv) output shows 5 network-related actions:
<127.0.0.1> EXEC mkdir -p
$HOME/.ansible/tmp/ansible-1348675481.15-70644186131164 && chmod a+rx
$HOME/.ansible/tmp/ansible-1348675481.15-70644186131164 && echo
$HOME/.ansible/tmp/ansible-1348675481.15-70644186131164
<127.0.0.1> REMOTE_MODULE setup
<127.0.0.1> PUT /tmp/tmpLYtBRt TO
/home/alopropoz/.ansible/tmp/ansible-1348675481.15-70644186131164/setup
<127.0.0.1> EXEC chmod u+x
/home/alopropoz/.ansible/tmp/ansible-1348675481.15-70644186131164/setup
<127.0.0.1> EXEC
/home/alopropoz/.ansible/tmp/ansible-1348675481.15-70644186131164/setup
<127.0.0.1> EXEC rm -rf
/home/alopropoz/.ansible/tmp/ansible-1348675481.15-70644186131164/

As I see it, creating the directory, uploading the module, making it
executable, and removing the directory could each be done only once per
playbook, eliminating 4 of the 5 network-related actions. Does this make
sense?

You can use a hybrid approach: install ansible-pull on each system, but
instead of triggering it via cron, trigger it using ansible on your
control node (see the example playbook below).

Replies inline -- lots of good ideas, some we can't really do, but many we can.

Read on for details.

I've been waiting for things to stabilize with the new module logic
and I think we are at a point where we can start doing a lot of this
kind of stuff to core again.

> I like Ansible's approach to management. You need almost nothing on
> the managed hosts and no central daemon. Copy the module, run it, get
> the results, remove it, repeat. Easy.
>
> But unfortunately it can be slow.
>
> I have a playbook with ~200 actions. It takes 20 seconds to run with
> '-c local', and about a minute if I use SSH to localhost. Connecting
> and disconnecting for every step really slows things down... And it
> takes 12 minutes with a remote server. Shared SSH connections
> (ControlMaster / ControlPersist) are enabled, and the servers aren't
> too busy, but they are located in different datacenters.

BTW -- if you aren't already using with_items for your package
installs, definitely switch to that.

If your 12 minutes includes a lot of package installations, I would
also think that's reasonably fast, and you may be seeing some speedups
locally because you're using a local yum mirror? (Perhaps not?)
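For readers unfamiliar with it, a with_items package task looks roughly
like this (the package names are placeholders, and this uses the
2012-era $item action syntax from this thread, so treat it as a sketch):

```
- name: install common packages
  action: yum name=$item state=installed
  with_items:
    - httpd
    - ntp
```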

> ansible-pull is nice, but not really what I would like. I don't need
> to run tasks via cron; I need to trigger them manually. I would also
> need a central Git repository for the playbooks. Being able to run
> Ansible, look at the output, and fix any problems is much more
> desirable than getting error reports by e-mail or something similar.
>
> No need to explain; everyone here knows the advantages of the usual
> way of using Ansible :slight_smile:
>
> So: are there any plans to change the usual process, which currently
> requires additional connections and steps? I understand that it keeps
> things simple, but still...

A mode that uses paramiko and leaves connections open in a configurable
(Least-Recently-Used) way is being considered right now.

Given that paramiko and SSH with ControlMaster/ControlPersist are on par
right now, this has a lot of promise if you are willing to set the
number of open connections reasonably high (i.e. equal to the number of
hosts in a play).
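Such an LRU connection cache could be sketched like this (hypothetical
code, not Ansible's; the connect/close callables stand in for something
like paramiko SSHClient setup and teardown):

```python
from collections import OrderedDict

class LRUConnectionPool:
    """Keep at most max_open connections alive, evicting the
    least-recently-used one when the limit is exceeded."""

    def __init__(self, connect, close, max_open=10):
        self.connect = connect      # host -> connection object
        self.close = close          # connection object -> None
        self.max_open = max_open
        self.pool = OrderedDict()   # host -> connection, in LRU order

    def get(self, host):
        if host in self.pool:
            self.pool.move_to_end(host)       # mark as recently used
        else:
            if len(self.pool) >= self.max_open:
                _, oldest = self.pool.popitem(last=False)
                self.close(oldest)            # evict least-recently-used
            self.pool[host] = self.connect(host)
        return self.pool[host]
```

With max_open set to the number of hosts in a play, no connection would
ever be torn down mid-run, which is exactly the promising case above.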

> Require Ansible to be installed on the remote computer, copy the
> playbook over, run it with '-c local', and show the results the same
> way as now? Or just copy Ansible there together with the playbook and
> remove it when execution finishes?

This is basically what ansible-pull does, and you wouldn't have to run
it out of cron. You could use ansible to invoke ansible-pull, in
fact.

> Don't copy all of Ansible, but some smaller script (with fewer
> dependencies), and provide it with a playbook already parsed on the
> central server (I don't know whether this is reasonable)?
>
> These would be quite big changes. A simpler one would be to create the
> temp directory and upload all the modules only once, instead of
> repeating this for every 'action' line in the playbook. Currently the
> debug (-vvv) output shows 5 network-related actions:

Yeah, this is a non-starter, because Ansible needs to look at each
module's results to decide what to run next.

> <127.0.0.1> EXEC mkdir -p
> $HOME/.ansible/tmp/ansible-1348675481.15-70644186131164 && chmod a+rx
> $HOME/.ansible/tmp/ansible-1348675481.15-70644186131164 && echo
> $HOME/.ansible/tmp/ansible-1348675481.15-70644186131164
> <127.0.0.1> REMOTE_MODULE setup
> <127.0.0.1> PUT /tmp/tmpLYtBRt TO
> /home/alopropoz/.ansible/tmp/ansible-1348675481.15-70644186131164/setup
> <127.0.0.1> EXEC chmod u+x
> /home/alopropoz/.ansible/tmp/ansible-1348675481.15-70644186131164/setup
> <127.0.0.1> EXEC
> /home/alopropoz/.ansible/tmp/ansible-1348675481.15-70644186131164/setup
> <127.0.0.1> EXEC rm -rf
> /home/alopropoz/.ansible/tmp/ansible-1348675481.15-70644186131164/

> As I see it, creating the directory, uploading the module, making it
> executable, and removing the directory could each be done only once
> per playbook, eliminating 4 of the 5 network-related actions. Does
> this make sense?

Actually this is QUITE true. It's not done because failed hosts are not
contacted again, but if cleanup is added in a try/except/finally sort of
way, and the decision of which temp-dir name to use is made in PLAYBOOK
code and passed to the Runner, this could be beautiful, and it is QUITE
easy to do.
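The shape of that change can be sketched with hypothetical helpers
(make_tmp, execute, and remove_tmp are stand-ins, not real Ansible
APIs): the playbook layer picks one temp-dir name, every task reuses it,
and cleanup runs in a finally block so even failed runs remove the
directory.

```python
import time

def run_playbook(tasks, execute, make_tmp, remove_tmp):
    """Create the remote temp dir once, run every task in it,
    and always clean up, even if a task raises."""
    tmp = "$HOME/.ansible/tmp/ansible-%s" % time.time()  # chosen once, playbook-side
    make_tmp(tmp)               # 1 round trip instead of 1 per task
    try:
        results = [execute(task, tmp) for task in tasks]
    finally:
        remove_tmp(tmp)         # 1 round trip, runs even on failure
    return results
```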

The other hack is that if we know which interpreter to use (AND WE DO --
we look at ansible_python_interpreter and the shebang line), there is no
need for the chmod +x, as we can just execute the module via the
interpreter directly.
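To illustrate the point outside of Ansible: a file with no execute bit
runs fine if you hand it to the interpreter yourself, so the chmod round
trip can be skipped (the "module" here is just a stand-in script):

```python
import os
import subprocess
import sys
import tempfile

# Write a stand-in "module" with the execute bit deliberately unset.
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write('print("module ran")\n')
os.chmod(path, 0o644)  # readable, but NOT executable

# No chmod +x needed: invoke the interpreter directly on the file.
out = subprocess.check_output([sys.executable, path])
os.unlink(path)
print(out.decode().strip())
```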

Another trick I have not added yet: if we know we are executing only
new-style modules (those written using the ANSIBLE_MODULE_COMMON code)
and they do not involve file transfer, there is no need for a directory;
we can just move a *file* over and execute it.

That would carve out about half of the operations right there.

>   - hosts: all
>     user: root
>     tasks:
>       - name: do stuff
>         action: shell ansible-pull ...
>
> That way you still get the on-demand control but each system runs
> independently. You still have the (potential) problem of collecting
> the logs, but that's an easier problem to solve.

Yep. The fetch module is pretty good for this! Analyzing the logs is a
bit more of an effort, but also not insurmountable.
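For the log-collection side, a fetch task along these lines would pull a
file back from each host to the control node (both paths are
placeholders, in the same old-style action syntax as the playbook
above):

```
- name: collect the run log from each host
  action: fetch src=/var/log/myapp-run.log dest=/tmp/collected-logs
```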

There is a patch I need to clean up that will also allow per-hostname
top-level playbooks for ansible-pull, instead of just requiring them to
be named local; alternatively, you can pass in the name of the playbook
file in the repo.

I think I will stick with this for now. Not ansible-pull but plain
ansible, though, to avoid having to maintain a central repository with
playbooks available to all hosts. The central server can copy the
playbook, run ansible, and wait for the results.

Since (hopefully) most of the time there will be no errors, and Ansible
will let me know if an error does happen, I will start without
collecting logs.

Thank you for the nice idea.

> BTW -- if you aren't already using with_items for your package
> installs, definitely switch to that.
>
> If your 12 minutes includes a lot of package installations, I would
> also think that's reasonably fast, and you may be seeing some speedups
> locally because you're using a local yum mirror? (Perhaps not?)

I use 'with_items', thank you.
But this is not the first run of the playbook. Everything is configured
and all packages are installed, so no actual actions are executed
(except maybe a few simple shell commands). The difference between 20
seconds for '-c local' and 12 minutes for a remote machine is entirely
due to network interactions.

> Require Ansible to be installed on the remote computer, copy the
> playbook over, run it with '-c local', and show the results the same
> way as now? Or just copy Ansible there together with the playbook and
> remove it when execution finishes?

> This is basically what ansible-pull does, and you wouldn't have to
> run it out of cron. You could use ansible to invoke ansible-pull, in
> fact.

Then it would be executed like any other external program: you wait
until ansible on the remote side is done and then get a lot of output,
first from one host, then from the next, and so on. That is quite
difficult to follow. The usual ansible mode, with results shown after
each task, is much handier.
I don't know whether there is a point in implementing this, though.

I think this thread predates fireball mode, so I should mention it
here. You should read these:

http://michaeldehaan.net/post/32378722265/ansible-learns-to-fly-0mq-that-sets-up-itself

http://jpmens.net/2012/10/01/dramatically-speeding-up-ansible-runs/

> But this is not the first run of the playbook. Everything is
> configured and all packages are installed, so no actual actions are
> executed (except maybe a few simple shell commands). The difference
> between 20 seconds for '-c local' and 12 minutes for a remote machine
> is entirely due to network interactions.

It sounds like you need to run ansible on a head node inside your
target network, instead of from really far away over slow pipes?