We’re evaluating Ansible and other config management tools. I have two issues on which I would like input from others:
if you have to change SSH keys, what’s the best way to do that across tens of thousands of machines?
if you have tens of thousands of servers under Ansible management, how do you scale this to run across them all quickly? Ideally, I want to be able to run a playbook across several thousand systems at once (assuming the playbooks will not be downloading additional packages from other hosts). It would be great if Ansible could have multiple controlling hosts, but I don’t think this is a feature.
There are various key management tools, and each has its purpose. On CoreOS, for example, you could use cloud-config/cloud-init. You can also use Ansible's raw module to install the keys as part of a bootstrap step.
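For the key-rotation question itself, a minimal sketch of how this might look with Ansible's authorized_key module (the user name, key file paths, and host group are assumptions, not part of the original thread):

```yaml
# Hypothetical key-rotation play: push the new key everywhere first,
# then remove the old one once you've verified the new key works.
- hosts: all
  become: true
  tasks:
    - name: Install the new SSH public key
      ansible.posix.authorized_key:
        user: deploy                      # assumed user name
        key: "{{ lookup('file', 'keys/id_ed25519_new.pub') }}"
        state: present

    - name: Remove the old SSH public key
      ansible.posix.authorized_key:
        user: deploy
        key: "{{ lookup('file', 'keys/id_ed25519_old.pub') }}"
        state: absent
```

The two-phase approach (add everywhere, verify, then remove) avoids locking yourself out if a batch fails partway through.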
Ansible is used at large sites to control great numbers of hosts. I recall several talks from Rackspace a few years back citing runs of a single playbook across thousands of hosts. If you are looking at that scale, batching the work will help control the impact of the playbooks.
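Batching can be expressed directly in a playbook with the `serial` keyword, combined with a higher `forks` setting in ansible.cfg to raise the number of parallel worker processes on the control node (the numbers below are illustrative, not recommendations):

```yaml
# ansible.cfg (on the control node):
#   [defaults]
#   forks = 100        # default is 5; raise to run more hosts in parallel
#
# Playbook sketch — roll through the fleet in batches:
- hosts: webservers          # assumed group name
  serial: 500                # process 500 hosts per batch
  max_fail_percentage: 10    # abort if more than 10% of a batch fails
  tasks:
    - name: Ensure ntp is installed
      ansible.builtin.package:
        name: ntp
        state: present
```

Each batch runs the play to completion before the next batch starts, which keeps the load on both the control node and any shared infrastructure bounded.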
Ansible is a great tool and I hope it fits your needs.
Thanks for the comments. I would be interested to see how others scale out the control node(s). Obviously you can run the playbooks in batches, but this could still take a very long time to execute across tens of thousands of hosts. Plus, if the batch is too large it would overwhelm the control node. Would be nice to see how others are solving this problem.
If you are actually managing tens of thousands of hosts, you're
probably dealing with other issues that would make it worth your while
to consider buying Ansible Tower.
"""Asynchronous Actions and Polling
By default tasks in playbooks block, meaning the connections stay open
until the task is done on each node. This may not always be desirable, or
you may be running operations that take longer than the SSH timeout.
The easiest way to do this is to kick them off all at once and then poll
until they are done.
You will also want to use asynchronous mode on very long running operations
that might be subject to timeout."""
I read about that briefly yesterday. Thanks. Will need to read up more about this mode to see how the coordination works. I guess you just keep polling at the end of all the batches?