Parallel execution

I looked at the code in ssh.py, and I don't understand why you don't take /etc/ssh/ssh_known_hosts2 into consideration when deciding whether the host is in known_hosts. The only file taken into consideration is ~$USER/.ssh/known_hosts.

Do you think that including standard locations is a good idea?


```python
def not_in_host_file(self, host):
    host_file = os.path.expanduser(os.path.expandvars("~${USER}/.ssh/known_hosts"))
    if not os.path.exists(host_file):
        print "previous known host file not found"
        return True
```
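For illustration, here is a minimal sketch of a check that also consults the standard system-wide files. The location list and the naive line matching are assumptions for the sketch, not the contents of the actual patch:

```python
import os

# Assumed candidate locations, per-user first, then system-wide
# (OpenSSH also honors /etc/ssh/ssh_known_hosts by default).
KNOWN_HOSTS_LOCATIONS = [
    "~/.ssh/known_hosts",
    "~/.ssh/known_hosts2",
    "/etc/ssh/ssh_known_hosts",
    "/etc/ssh/ssh_known_hosts2",
]

def not_in_host_file(self, host):
    for location in KNOWN_HOSTS_LOCATIONS:
        host_file = os.path.expanduser(location)
        if not os.path.exists(host_file):
            continue
        with open(host_file) as f:
            for line in f:
                # Naive substring match; real code would also need to
                # handle hashed (HashKnownHosts) entries.
                if line.strip() and not line.startswith("#") and host in line:
                    return False
    return True
```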

Thanks!
iordan

I’ve got nothing against additions that look in both locations.

I’d be happy to see pull requests to this effect if you want to pass them along, or if not, you can file a ticket so we don’t forget.

Thanks!

–Michael

Hi Michael,

I'm not sure how you want to deal with this pull request, so to give you multiple options, here is a link to the actual patch:

https://github.com/ansible/ansible/pull/6156.patch

Cheers,
iordan

I seem to be experiencing the same or similar issue.

My ansible.cfg:

```
[si-cluster-settings]
host_key_checking = False
hostfile = hosts
forks = 15
```

When I run a playbook on 10 nodes, they are definitely running serially, as I see large delays between results coming back on each node. I also tried setting serial: 10 in the actual playbook.

The group you have in inventory called “si-cluster-settings” is the problem.

The Ansible inventory parser doesn’t know anything about that category of settings.

They go in a section called “defaults” like so: https://github.com/ansible/ansible/blob/devel/examples/ansible.cfg
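In other words, the same settings under the section header the config parser actually reads:

```
[defaults]
host_key_checking = False
hostfile = hosts
forks = 15
```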

Just an update at Michael’s request - I’m seeing the exact same situation, with EC2.

Setting this environment variable fixes this.

Any chance I can get a copy of your known_hosts file?

Off list would be preferred.

I’m not sure that’s it, but I suspect it could be.

Not sure how I’d send you a copy of /dev/null, unless ansible is attempting to parse the contents of ~/.ssh/known_hosts outside of ssh.

Hi Vincent, could you share a sample of the playbook you’re running as well as the results of running it with -f1, -f2 and -f4? That should determine if the playbook is indeed being serialized at some point.

Do note, however, if you’re doing something like this:

```yaml
- local_action: ec2 …
  with_items:
```

you will see serialized performance. This is because each pass through a with_* loop must complete on all hosts before the next pass begins, and with local_action you are only executing on a single host (localhost), so this constrains the playbook to serial-like performance.
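For concreteness, a hypothetical task of that shape (the AMI IDs and item list are invented for the example):

```yaml
- hosts: all
  tasks:
    # Each pass through the loop must complete before the next begins,
    # and local_action runs only on localhost, so these launches happen
    # one at a time no matter how many forks are configured.
    - local_action: ec2 image={{ item }} instance_type=m1.small
      with_items:
        - ami-123456
        - ami-789abc
```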

Thanks!

Exactly like what was described at the start of this thread. :( Setting the environment variable produces the desired parallel execution.

Ansible does read ~/.ssh/known_hosts because it needs to know whether to lock itself down to 1 process to ask you the question about adding a new host to known_hosts.

This only happens when it detects a host isn’t already there, because it must detect this before SSH asks.

And this only happens with -c ssh; -c paramiko has its own handling (and its own issues; I prefer the SSH implementation if folks have a new enough SSH to use ControlPersist).
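For illustration, a rough sketch of the locking behavior being described, with an invented lock path and helper name (this is not Ansible's actual implementation):

```python
import fcntl

LOCKFILE = "/tmp/ansible_known_hosts.lock"  # hypothetical lock path

def exec_command(self, host):
    # Sketch of a connection method: if the host key is unknown, SSH is
    # about to prompt "Are you sure you want to continue connecting?",
    # so serialize on an exclusive lock first; otherwise N forks would
    # all prompt at once and fight over the terminal.
    if self.host_key_checking and self.not_in_host_file(host):
        with open(LOCKFILE, "w") as lock:
            fcntl.lockf(lock, fcntl.LOCK_EX)
            return self._run_ssh(host)  # _run_ssh is an invented helper
    # Host already known (or checking disabled): connect fully in parallel.
    return self._run_ssh(host)
```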

Vincent, I now use a slightly different workaround. Instead of routing known_hosts to /dev/null I route it to a temp file. This keeps the EC2 noise out of my default known_hosts file, and seems to play well with ansible.

```
Host *.amazonaws.com
    PasswordAuthentication no
    StrictHostKeyChecking no
    UserKnownHostsFile /tmp/ec2_known_hosts
    User ec2-user
```

Hope that helps you.

– Mike

Hi James,

Each loop DOES happen within the host loop.

If you have 50 hosts each running a with_items loop, that loop still executes across 50 hosts at a time.

So I’m confused - are you saying you are using known_hosts that are empty?

This seems to be a completely unrelated question.

The mention of /dev/null above seemed to be based on confusion about whether we read known_hosts, not on it actually being symlinked to /dev/null.

Can each of you clarify?

I took it that Vincent was referring to my message of 2013-09-12. In that post I mentioned using /dev/null for the ssh UserKnownHostsFile configuration key, scoped to Host *.amazonaws.com.

This configuration triggers single-threaded behavior from ansible because ssh never stores any record of connecting to the EC2 hosts: not the first time, not ever. Because known_hosts is /dev/null.

– Mike

Ansible does not pick up your known_hosts location from ~/.ssh/config on a per-host basis, and it does read your ~/.ssh/known_hosts.

It does this because it needs to know, in advance of SSH asking, whether it needs to lock.

Assume it’s running at 50 or 200 forks and needs to ask a question interactively; that’s why it needs to know in advance.

So if your SSH config puts known_hosts in a different file, that may be EXACTLY the problem. With host key checking on and the data going elsewhere, the entries can’t be found, and Ansible locks pre-emptively.

I’m wondering if we can detect configuration of alternative known_hosts locations in the ~/.ssh/config and issue a warning, which should be able to key people in to turn off the checking feature.

This should close this out, I’d think.

Sounds like some great possible solutions.

Either:

  1. Reading the SSH config to pick up the correct known_hosts locations (and perhaps setting ‘host_key_checking’ to false if the location is ‘/dev/null’, since that’s a common pattern; for instance, Vagrant does this by default, see https://docs.vagrantup.com/v2/cli/ssh_config.html ), or

  2. A simple warning message when serialization is triggered due to known_hosts, in order to save folks from some really tough debugging (see the sketch after this list).
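A minimal sketch of what that check could look like, assuming a naive line-based parse of ~/.ssh/config (the function name and warning text are invented; Ansible does not currently do this):

```python
import os

def warn_on_custom_known_hosts(ssh_config="~/.ssh/config"):
    # Hypothetical helper: scan ~/.ssh/config for UserKnownHostsFile
    # overrides that Ansible's known_hosts check would never see.
    path = os.path.expanduser(ssh_config)
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            parts = line.strip().split(None, 1)
            if len(parts) == 2 and parts[0].lower() == "userknownhostsfile":
                print("[WARNING] ~/.ssh/config routes known_hosts to %s, "
                      "which Ansible does not read; host key checking may "
                      "serialize execution. Consider host_key_checking=False."
                      % parts[1])
```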

Just lost a few hours debugging this issue. For several environments, a client’s SSH config sets known_hosts to custom locations, so everything was running serially (a 3-minute process * 20 servers = 60 minutes!). Persistence and sweat finally led me to try “host_key_checking = False”, and it finally ran in parallel - so nice to see, since I’d tried just about everything else I could imagine (forks, serial, ssh options, restructuring inventory, removing inventory groups, etc.).

Thanks,
Matt

Any updates on this? I took a gander through the GitHub issues but didn’t see one that seemed related.