parallel execution

Dear all!

I’m in the process of exploring ansible and already found it pretty cool.

There is one thing, however, which I could not figure out: parallel execution.

I have a simple play:

  - hosts: all
    serial: 5
    tasks:
      - name: parallel
        command: sleep 10
Which I try to run with ‘ansible-playbook -f 20 -i infra ./paralell-test.yml’

It seems that the commands are executed sequentially on all hosts in the inventory, regardless of whether I set the -f or the serial: parameter.

Any clues how to enable parallel task execution?

I’m using 1.2.1.

Thanks!

--forks 20 is indeed parallel.

You should see all of the hosts returning at the same time.

If you have serial set it will say “do all machines in this batch at the same time, then move on”, so if serial is set to 5, and you have --forks 20, it will still only do 5 at a time.

serial says “complete the playbook entirely on these hosts before moving on to the next few”.

Thus, serial: 1 will make things non-parallel. The default is non-serial.

BTW, the default value for --forks is 5.
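Taken together, the effective batch size for a task is min(serial, --forks), which gives a rough lower bound on wall-clock time. Here is a minimal sketch of that arithmetic (my own illustration; playbook_runtime is a made-up helper, not part of Ansible, and it ignores SSH and fact-gathering overhead):

```python
import math

def playbook_runtime(hosts, task_secs, forks=5, serial=None):
    """Rough lower bound on wall-clock time for one task across all hosts,
    ignoring overhead: hosts run in batches of min(serial, forks)."""
    batch = min(serial, forks) if serial else forks
    return math.ceil(hosts / batch) * task_secs

# 7 hosts, sleep 10, default --forks 5, no serial: two batches of <=5 -> 20s
assert playbook_runtime(7, 10) == 20
# serial: 5 with --forks 20 still runs only 5 at a time
assert playbook_runtime(7, 10, forks=20, serial=5) == 20
# serial: 1 is fully sequential
assert playbook_runtime(7, 10, serial=1) == 70
```

This also explains the timings below: with everything serialized, 7 hosts × sleep 10 is at least 70 seconds before overhead.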

Hi,

Thanks for the answers.

Still, I don’t get something right, because when I execute the playbook I get (without -f option and without serial var):

7 hosts, command sleep 10: real 1m22.282s
7 hosts, command sleep 1: real 0m25.313s

Based on the time difference it seems that the execution is sequential.

What could I be missing?

Thanks!

user@host:/work/mysql-ansible/mysql-cluster$ time ansible-playbook -i infra ./paralell-test.yml

PLAY [all] ********************************************************************

GATHERING FACTS ***************************************************************
ok: [ec2-54-216-187-11.eu-west-1.compute.amazonaws.com]
ok: [ec2-54-216-223-17.eu-west-1.compute.amazonaws.com]
ok: [ec2-54-228-130-185.eu-west-1.compute.amazonaws.com]
ok: [ec2-54-228-154-89.eu-west-1.compute.amazonaws.com]
ok: [ec2-54-216-199-59.eu-west-1.compute.amazonaws.com]
ok: [ec2-79-125-51-180.eu-west-1.compute.amazonaws.com]
ok: [ec2-54-228-46-230.eu-west-1.compute.amazonaws.com]

TASK: [parallel] **************************************************************
changed: [ec2-54-216-187-11.eu-west-1.compute.amazonaws.com]
changed: [ec2-54-228-130-185.eu-west-1.compute.amazonaws.com]
changed: [ec2-54-216-223-17.eu-west-1.compute.amazonaws.com]
changed: [ec2-54-228-154-89.eu-west-1.compute.amazonaws.com]
changed: [ec2-54-216-199-59.eu-west-1.compute.amazonaws.com]
changed: [ec2-54-228-46-230.eu-west-1.compute.amazonaws.com]
changed: [ec2-79-125-51-180.eu-west-1.compute.amazonaws.com]

PLAY RECAP ********************************************************************
ec2-54-216-187-11.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-54-216-199-59.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-54-216-223-17.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-54-228-130-185.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-54-228-154-89.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-54-228-46-230.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-79-125-51-180.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0

real 0m25.313s
user 0m0.680s
sys 0m0.532s

user@host:/work/mysql-ansible/mysql-cluster$ time ansible-playbook -i infra ./paralell-test.yml

PLAY [all] ********************************************************************

GATHERING FACTS ***************************************************************
ok: [ec2-54-216-187-11.eu-west-1.compute.amazonaws.com]
ok: [ec2-54-216-223-17.eu-west-1.compute.amazonaws.com]
ok: [ec2-54-216-199-59.eu-west-1.compute.amazonaws.com]
ok: [ec2-54-228-130-185.eu-west-1.compute.amazonaws.com]
ok: [ec2-54-228-154-89.eu-west-1.compute.amazonaws.com]
ok: [ec2-54-228-46-230.eu-west-1.compute.amazonaws.com]
ok: [ec2-79-125-51-180.eu-west-1.compute.amazonaws.com]

TASK: [parallel] **************************************************************
changed: [ec2-54-216-187-11.eu-west-1.compute.amazonaws.com]
changed: [ec2-54-216-223-17.eu-west-1.compute.amazonaws.com]
changed: [ec2-54-228-130-185.eu-west-1.compute.amazonaws.com]
changed: [ec2-54-216-199-59.eu-west-1.compute.amazonaws.com]
changed: [ec2-54-228-154-89.eu-west-1.compute.amazonaws.com]
changed: [ec2-54-228-46-230.eu-west-1.compute.amazonaws.com]
changed: [ec2-79-125-51-180.eu-west-1.compute.amazonaws.com]

PLAY RECAP ********************************************************************
ec2-54-216-187-11.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-54-216-199-59.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-54-216-223-17.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-54-228-130-185.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-54-228-154-89.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-54-228-46-230.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0
ec2-79-125-51-180.eu-west-1.compute.amazonaws.com : ok=2 changed=1 unreachable=0 failed=0

real 1m22.282s
user 0m0.568s
sys 0m0.476s

So with 7 hosts and --forks 5 you should wait 20 seconds on a sleep 10.

Perhaps you have set serial to 1 in your ansible.cfg

You have also misspelled “parallel”, so there is a chance you have two playbooks and serial: 1 is set in the one with the other spelling.

It turned out that the host key checking feature and my local ssh setup introduced the serial behaviour when using ssh as the transport:

in ssh.py:

  if C.HOST_KEY_CHECKING and not_in_host_file:
      # lock around the initial SSH connectivity so the user prompt about whether to add
      # the host to known hosts is not intermingled with multiprocess output.
      fcntl.lockf(self.runner.process_lockfile, fcntl.LOCK_EX)
      fcntl.lockf(self.runner.output_lockfile, fcntl.LOCK_EX)
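The serializing effect of an exclusive fcntl lock like that can be demonstrated with a small standalone sketch (my own illustration, not Ansible code): four forked processes each take the lock before their "connection", so they end up running one at a time even though they were started in parallel.

```python
import fcntl
import multiprocessing
import os
import tempfile
import time

def locked_connect(lock_path, secs):
    # Each worker takes an exclusive lock before its "connection", the way
    # ssh.py does when a host-key prompt might be needed.
    with open(lock_path, "r+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)
        time.sleep(secs)              # stands in for the SSH handshake
        fcntl.lockf(f, fcntl.LOCK_UN)

def run_forks(n_workers=4, secs=0.2):
    """Run n_workers 'connections' in parallel processes; return wall time."""
    fd, lock_path = tempfile.mkstemp()
    os.close(fd)
    start = time.time()
    procs = [multiprocessing.Process(target=locked_connect,
                                     args=(lock_path, secs))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    os.remove(lock_path)
    return time.time() - start

# Despite 4 forked processes, the exclusive lock serializes them:
# total time is roughly 4 * 0.2s, not 0.2s.
```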

Now I managed to change my environment, so that everything is run in parallel. Cool.

Maybe a vvv before the actual locking would be useful (as I did not get the ‘add to known hosts prompt’).

Yep, host key checking only introduces serial locking until all host keys are approved.

So it seems your configuration is using a /different/ known hosts location and it wasn’t picking up that location? Would be interested in hearing more about your configuration.

Thanks!

Hello!

I have ssh configured with StrictHostKeyChecking=no for EC2, so my known_hosts file does not grow forever ;) The downside, besides being less secure, is that ansible will not find the host in the known_hosts file. Fortunately, the ANSIBLE_HOST_KEY_CHECKING=no option works well to disable this feature.

Istvan

Is there a problem I need to help with above?

FYI, if set, ANSIBLE_HOST_KEY_CHECKING=no also passes along StrictHostKeyChecking=no, so there’s no extra reason to define it in ansible.cfg or your SSH config if you don’t want to.

Reviving because I was recently bitten by this myself, and found it difficult to debug. Like the OP I assumed ansible forking was simply broken. I finally solved the problem via the discussion at http://stackoverflow.com/questions/17958949/how-do-i-drive-ansible-programmatically-and-concurrently - and then I found this thread.

More and more folks are using ansible with AWS and EC2, so I expect this to become a common problem. Folks with large numbers of hosts in EC2, and large churn of those hosts, won’t want ever-growing known_hosts files. Yet they may still want host key checking when talking to non-EC2 hosts. I know some consider it dangerous to disable host key checking under any circumstances, but imagine typing ‘yes’ 50 times for an inventory group that will only exist for a day or two.

The ansible serialization happens automatically, so the first step might be to emit a warning. Could ansible warn when --forks is set to a non-default value, the command or playbook applies to more than one host, and forking is not possible? For bonus points the warning could suggest ANSIBLE_HOST_KEY_CHECKING=no. But I would be happy as long as there is a warning, and searching for the warning text turns up something useful. As it is, the problem is difficult to debug.

It would be even nicer to do something like ssh_config does, and allow ansible.cfg to disable host_key_checking for hosts that match a regex or wildcard pattern. That way I could set up my environment so that *.amazonaws.com never does host key checking, while all other hosts do.
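For illustration, such a per-pattern check could work the way ssh_config’s Host blocks do, using shell-style wildcards. This is a hypothetical sketch of the idea, not an existing Ansible option; host_key_checking and NO_CHECK_PATTERNS are invented names:

```python
import fnmatch

# Hypothetical: patterns for which host key checking would be skipped,
# as an ansible.cfg setting might express it.
NO_CHECK_PATTERNS = ["*.amazonaws.com"]

def host_key_checking(host):
    """Return True if this host should still get host key checking."""
    return not any(fnmatch.fnmatch(host, pat) for pat in NO_CHECK_PATTERNS)

# Static hosts keep checking; ephemeral EC2 hosts skip it.
assert host_key_checking("db1.example.org") is True
assert host_key_checking(
    "ec2-54-216-187-11.eu-west-1.compute.amazonaws.com") is False
```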

That SO thread is about an old bug with hashed hosts, which should no longer apply in recent versions of Ansible.

We can and do fork once host keys are accepted so there’s no need for a warning.

Also Ansible already reads in ssh_config, so you can just put that setting there if you like.


The point isn’t any particular bug or feature. The point is that I’m specifically asking ansible to run concurrently using --forks, and it can’t, but it doesn’t let me know about the problem. If I specifically set --forks, that’s a signal that I’m serious about it and want to know if it can’t be done.

My ssh_config definitely triggers the global lock, using ansible installed from the dev branch, current as of a few minutes ago. Here’s what I have in my ~/.ssh/config file.

Host *.amazonaws.com
PasswordAuthentication no
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
User ec2-user

I’m using this configuration because I create and discard many EC2 instances: I never want to see chatter from ssh about them, and I never want to remember their key info. Like the OP in this thread, I can easily demonstrate that this single-threads, whatever value of --forks I set. But the cause of this behavior was utterly mysterious, and until I happened across that old stackoverflow thread I was at a loss to debug it.

But again the point isn’t any particular bug or feature that triggers the lock. The point is that if the lock is triggered, the user experiences poor performance without knowing why. Even if there is a better configuration option for my use-case, I still think a nice, clear warning is in order. This is a situation where I’ve specifically asked ansible to run concurrently, using the --forks option. It can’t do that, understandably. But equally I need to know when ansible can’t do what I’m asking it to do.

If I create a pull request that patches the ssh connection plugin to do something like this, what are the chances you would accept it? Looking at the code it may be a bit tricky: the ssh connection plugin only seems to know about the current host, so it doesn’t have the full context of the command. But it does have the runner, which should provide enough context to decide whether or not --forks was used.

“The point is that I’m specifically asking ansible to run concurrently using --forks, and it can’t, but it doesn’t let me know about the problem.”

This self-resolves after you approve the hosts.

Again, the hashing host key problem is no longer applicable.

The lock should not be a point of contention if there are no questions to ask.

I’ll have James look into it though.

I believe the initial iteration through the hosts is single-threaded, as that occurs before the forks are created. However, can you demonstrate that your configuration is causing single-threaded behavior after the forks are running? The only thing that would cause that would be if each task acquired a global lock at the start of its processing and didn’t release it until it was done, which definitely does not happen in the code anywhere that I can see. The only lock in the ssh connection plugin occurs when user input is requested - we rely on ssh’s built-in file locking around known_hosts (which occurs even if you’re using /dev/null). Incidentally, we have heard reports of that slowing things down even compared to strict host key checking, so that might be something worth looking into.

Yes, I think so. I observe single-threading for every command throughout long playbooks. Setting ANSIBLE_HOST_KEY_CHECKING=no resolves that.

Does the output from this single command help?

$ ansible -i ec2.py tag_Name_test -f 9 -a date
ec2-54-200-43-114.us-west-2.compute.amazonaws.com | success | rc=0 >>
Thu Sep 12 21:23:33 UTC 2013
ec2-54-200-40-223.us-west-2.compute.amazonaws.com | success | rc=0 >>
Thu Sep 12 21:23:35 UTC 2013
ec2-54-200-33-219.us-west-2.compute.amazonaws.com | success | rc=0 >>
Thu Sep 12 21:23:36 UTC 2013
ec2-54-200-40-249.us-west-2.compute.amazonaws.com | success | rc=0 >>
Thu Sep 12 21:23:38 UTC 2013
ec2-54-200-43-44.us-west-2.compute.amazonaws.com | success | rc=0 >>
Thu Sep 12 21:23:40 UTC 2013
ec2-54-200-43-42.us-west-2.compute.amazonaws.com | success | rc=0 >>
Thu Sep 12 21:23:42 UTC 2013
ec2-54-200-40-224.us-west-2.compute.amazonaws.com | success | rc=0 >>
Thu Sep 12 21:23:41 UTC 2013
ec2-54-200-42-181.us-west-2.compute.amazonaws.com | success | rc=0 >>
Thu Sep 12 21:23:43 UTC 2013
ec2-54-200-42-164.us-west-2.compute.amazonaws.com | success | rc=0 >>
Thu Sep 12 21:23:44 UTC 2013

With ANSIBLE_HOST_KEY_CHECKING=no, the results return much more quickly and all nine hosts display the same time (within 1-2 sec anyway).

With respect, not for me. It doesn’t self-resolve because I never add these EC2 instances to my known_hosts file. They are ephemeral, and I don’t want thousands of them clogging up my known_hosts file. The effect of my ssh_config is that amazonaws.com host keys end up in /dev/null.

Again, the point is to make it obvious why --forks isn’t doing anything, whatever might cause that to happen in any given situation. Today it is difficult to debug, and has bitten at least three users. There are probably more who simply wondered why ansible feels so slow, but didn’t raise the matter.

I’ve got a crude patch that does more or less what I want:

diff --git a/lib/ansible/runner/connection_plugins/ssh.py b/lib/ansible/runner/connection_plugins/ssh.py
index 02d47e0..c000765 100644
--- a/lib/ansible/runner/connection_plugins/ssh.py
+++ b/lib/ansible/runner/connection_plugins/ssh.py
@@ -28,7 +28,7 @@ import pwd
 import gettext
 from hashlib import sha1
 import ansible.constants as C
-from ansible.callbacks import vvv
+from ansible.callbacks import vvv, display
 from ansible import errors
 from ansible import utils
@@ -170,8 +170,14 @@ class Connection(object):
             # the host to known hosts is not intermingled with multiprocess output.
             fcntl.lockf(self.runner.process_lockfile, fcntl.LOCK_EX)
             fcntl.lockf(self.runner.output_lockfile, fcntl.LOCK_EX)

Ok, so I understand now.

Because we do not see it in known hosts AND you have host key checking enabled globally in Ansible, but not disabled for these specific hosts, the locking code engages in anticipation of SSH prompting a question for these hosts.

The answer seems to be adding an inventory variable that allows disabling host key checking on a per-inventory basis.

It wouldn’t be able to tell what your SSH client would do in advance, but could easily not check for that particular host.

I disagree entirely that --forks should be disabled with unknown host keys, as that’s simply not true – most users will just approve the hosts they need.

However, I’m ok with being able to turn them off for a particular group, which you could set for the ec2_tag_foo group.

Ok, so I understand now.

Because we do not see it in known hosts AND you have host key checking enabled globally in Ansible, but not disabled for these specific hosts, the locking code engages in anticipation of SSH prompting a question for these hosts.

The answer seems to be adding an inventory variable that allows disabling host key checking on a per-inventory basis.

That sounds interesting. So that means modifying the ec2.py script to supply that variable, probably as an ec2.ini option? That could work. If the ec2.py output could be set to automatically disable host key checks for each host in my ec2 inventory, then I could still use host key checking for my small inventory of static hosts. That would be nice.

It wouldn’t be able to tell what your SSH client would do in advance, but could easily not check for that particular host.

I disagree entirely that --forks should be disabled with unknown host keys, as that’s simply not true – most users will just approve the hosts they need.

I don’t think I proposed anything of the kind. That isn’t what my crude patch does, either. The text of the message might be misleading: it’s telling you that --forks isn’t doing much of anything. But it isn’t changing the existing functionality at all.

What I’ve observed is that --forks doesn’t do any good if the hosts are unknown, and that it’s difficult to debug the problem if the prompts are disabled. When this happens I think the user should see what’s happening, and why. When users set --forks, I suspect they’re usually targeting hosts that they’ve already accepted - or as in my situation they don’t care about key checks. So it feels about right to display a warning when there are unknown hosts and --forks is set.

However, I’m ok with being able to turn them off for a particular group, which you could set for the ec2_tag_foo group.

For me those groups are also ephemeral. I’d want to disable host key checking for my entire ec2 inventory, probably as an option in ec2.ini.

I like this idea, but I don’t think it addresses the root problem. When host key checks happen, --forks is effectively ignored. Because this happens without any feedback to that effect, the problem is difficult to debug.

With this inventory variable idea, the ec2.ini default would probably be normal host key checking. So if someone like me sets up ssh_config to ignore amazonaws.com hosts, then the same problem will show up and will be just as difficult to track down. All commands will be serialized, and the user won’t have a clue why that’s happening.

What about a more generic message? A single line like “Checking host key for unknown host %s” would not add much noise on top of ordinary host-key acceptance, while still showing some visible sign of a problem when host-key storage is suppressed. If I had had that extra message logged even when I knew that I had customized ssh_config, I might have found the solution more quickly.

If you’re setting the known hosts to /dev/null in your ssh config, you should disable strict host key checking - it makes no sense to leave that enabled in that situation. If you do that, I believe you said you did not see any performance degradation?

Michael,

The message you proposed is unnecessary when we can get better technical solutions.

The group variable is one of those.

One recent proposal by someone else was to make the ec2 plugin always put hosts in an ec2 group, in which case it would be as simple as

group_vars/ec2: