Parallel execution of a playbook across multiple hosts

I don't know if this is a lack of memory; that normally produces a
kernel message about killing off processes. This looks like something
much worse that is causing segfaults all over.

[2220566.328031] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:1838!
[2220566.328031] invalid opcode: 0000 [#3] SMP

looks like some nasty kernel bug related to memory allocation.

How do I upgrade to the latest version of ansible, and how do I solve the following forks issue? I am really struggling with it.
ansible-playbook ssh.yml --force-handlers --forks=100

PLAY [Transfer and execute a script.] *****************************************

TASK: [Transfer the script] ***************************************************
changed: [dsrv493 → 127.0.0.1]
changed: [dsrv487 → 127.0.0.1]
changed: [dsrv486 → 127.0.0.1]
changed: [dsrv209 → 127.0.0.1]
changed: [dsrv488 → 127.0.0.1]
changed: [dsrv531 → 127.0.0.1]
Process SyncManager-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/managers.py", line 558, in _run_server
server.serve_forever()
File "/usr/lib/python2.7/multiprocessing/managers.py", line 184, in serve_forever
t.start()
File "/usr/lib/python2.7/threading.py", line 745, in start
_start_new_thread(self.__bootstrap, ())
error: can't start new thread
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 85, in _executor_hook
Process Process-85:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
File "/usr/bin/ansible-playbook", line 324, in <module>
self.run()
sys.exit(main(sys.argv[1:]))
File "/usr/bin/ansible-playbook", line 264, in main
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 81, in _executor_hook
result_queue.put(return_data)
pb.run()
File "/usr/lib/pymodules/python2.7/ansible/playbook/__init__.py", line 348, in run
File "<string>", line 2, in put
if not self._run_play(play):
File "/usr/lib/pymodules/python2.7/ansible/playbook/__init__.py", line 789, in _run_play
File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
if not self._run_task(play, task, False):
File "/usr/lib/pymodules/python2.7/ansible/playbook/__init__.py", line 497, in _run_task
results = self._run_task_internal(task, include_failed=include_failed)
File "/usr/lib/pymodules/python2.7/ansible/playbook/__init__.py", line 439, in _run_task_internal
results = runner.run()
File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 1485, in run
Process Process-86:
while not job_queue.empty():
File "<string>", line 2, in empty
File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
conn.send((self._id, methodname, args, kwds))
results = self._parallel_exec(hosts)
File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 1393, in _parallel_exec
IOError: [Errno 32] Broken pipe
prc.start()
File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
self._connect()
File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 175, in Client
Traceback (most recent call last):
self._popen = Popen(self)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
changed: [dsrv449 → 127.0.0.1]

Can you please tell me how to solve this issue? It is causing me a lot of problems when running against these 100 servers.
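(For reference, two generic things that are often tried in this situation, sketched here only as a starting point: upgrading ansible with pip, and lowering the fork count so the control host spawns fewer worker processes. Neither of these addresses the kernel BUG shown above.)

    # upgrade ansible (assumes pip is available on the control host)
    $ sudo pip install --upgrade ansible

    # run with fewer forks
    $ ansible-playbook ssh.yml --force-handlers --forks=20

    # or set a lower default in ansible.cfg
    [defaults]
    forks = 20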

I'm not sure your issues are ansible related; they are just triggered
by ansible forks. I don't think upgrading to the latest version will
solve anything for you: you need to track down why your kernel is
hitting those segfault issues.

@Anand: you clearly have issues with your control host (hardware/OS). Try testing the hardware and get rid of these messages before trying again with ansible (or any other software) on this host.
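(For illustration, a couple of generic checks along those lines; the most thorough memory check is still booting memtest86+ from the boot menu:)

    # look for earlier memory/segfault messages in the kernel log
    $ dmesg | grep -iE 'bug|segfault|oops'

    # quick userspace memory test (assumes the memtester package is installed)
    $ sudo memtester 512M 1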

@Brian:
TL;DR: Good news: I have no DNS issue anymore. Bad news: it seems the root cause of my previous observations is that ansible has trouble dealing with host errors (unreachable hosts, etc.), and that hurts its parallelism very badly.

1- Not a DNS issue

I got rid of all faulty hosts and ran a new bunch of tests. Execution times are now identical between an inventory only with hostnames and an inventory with explicit IP addresses in ansible_ssh_host variable. It seems that was environmental. I’ll keep an eye on it on my side.
[reminder: still with pipelining enabled but ControlMaster disabled]
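(Roughly, that corresponds to something like the following in ansible.cfg; the exact option syntax may vary slightly between ansible versions:)

    [ssh_connection]
    pipelining = True
    ssh_args = -o ControlMaster=no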

  • $ time ansible all -i inventory_without_ip -m ping
    real 0m9.145s
    user 0m8.239s
    sys 0m2.787s
  • $ time ansible all -i inventory_with_ip -m ping
    real 0m9.040s
    user 0m7.570s
    sys 0m2.723s

I additionally checked the items you suggested:

  • DNS resolution seems fine (on one host there is a local DNS cache; the other host talks directly to the DNS servers):
  • time dig @127.0.1.1 -f inventory.yml → real 0m0.114s
  • time dig @datacenter_dns_server -f inventory.yml → real 0m0.118s
  • conclusion: so none of the control machines has DNS issues.
  • RAM: no problem there (one control host has 1 GB with 0 swap used during the tests, the other has 8 GB).
  • CPU: one core can briefly be maxed out, but most of the time CPU usage is <10% of one core. That explains only the slight delta in execution time between my 2 control hosts (the first has 1 core, the other 4).

Anyway, DNS topic closed.

2- Ansible handling host errors

What I can easily see, though, is that ansible does not handle host errors gracefully, and that badly hurts the intended parallelism: there is a time penalty for each host in error (unreachable, etc.). To simulate this, I added fake host entries to my inventories (format: '<non_existing_hostname> ansible_ssh_host='), and that gives the following results (forks=100, so everything should be executed in parallel):
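(For illustration, such a test inventory could look like the snippet below; the names and the address are made-up placeholders:)

    # reachable hosts
    realhost01
    realhost02
    # fake hosts that never answer
    fakehost01 ansible_ssh_host=192.0.2.1
    fakehost02 ansible_ssh_host=192.0.2.2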

  • 0 fake hosts: time ansible all -i inventory_test -m ping
    real 0m8.942s
    user 0m7.208s
    sys 0m2.830s

  • 1 fake host: time ansible all -i inventory_test -m ping
    real 0m18.951s
    user 0m7.337s
    sys 0m2.733s
    The SSH timeout is set to 10 in ansible.cfg. It seems to add 10 seconds to the previous execution time. OK, fair enough.

  • 2 fake hosts: time ansible all -i inventory_test -m ping
    real 0m21.471s
    user 0m7.720s
    sys 0m2.910s
    Now there is something weird.

  • 3 fake hosts: time ansible all -i inventory_test -m ping
    real 0m31.229s
    user 0m7.600s
    sys 0m2.832s
    Ouch!

  • 4 fake hosts: time ansible all -i inventory_test -m ping
    real 0m41.139s
    user 0m7.591s
    sys 0m2.847s
    Ok, there is a pattern now.

  • 5 fake hosts: time ansible all -i inventory_test -m ping
    real 0m51.172s
    user 0m7.563s
    sys 0m2.939s
    It’s confirmed.

That is not what I expected: the duration of the whole run should not grow past a certain value (the maximum of the slowest healthy host and the timeout for a host in error). Instead, each additional faulty host adds roughly the whole timeout, which strongly suggests some sequential handling rather than parallel execution.
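(In numbers: with the 10 s SSH timeout, the run takes roughly 21 s, 31 s, 41 s, 51 s for 2, 3, 4, 5 fake hosts, i.e. about one extra full timeout per faulty host, whereas a fully parallel run should stay near max(~9 s baseline, 10 s timeout) no matter how many hosts are unreachable.)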

3- Conclusion

That may explain my original observations: faulty hosts increase execution time linearly, showing a "serial/sequential" behaviour instead of handling all hosts in parallel.

Finally, consistent facts to work on! :)

Brian, are you able to reproduce this?

Regards,

Florent.

When making first contact with hosts, ansible must handle them
sequentially, as it might need to update the known_hosts file;
otherwise that file would get corrupted. That is probably what you
are seeing with your 'invalid hosts'.
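(For anyone who wants to rule that serialization out in a test environment, host key checking can be disabled so ansible does not have to touch known_hosts; a sketch, with the usual security caveats:)

    # ansible.cfg
    [defaults]
    host_key_checking = False

    # or for a single run
    $ ANSIBLE_HOST_KEY_CHECKING=False ansible all -i inventory_test -m ping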

Thank you Brian for the explanation.