Parallel execution of a playbook across multiple hosts at the same time

While executing the playbook I obtained the following output. The tasks below are executed sequentially. That is not what I want, because when I run this playbook against multiple hosts at once it takes a long time. I have tried several things, such as the forks and serial parameters, but I see no change. So please suggest how to execute one playbook on multiple hosts at the same time, because I am running it against hundreds of hosts.

root@system:~# ansible-playbook main.yml --limit test

PLAY [test] *******************************************************************

TASK: [mail | service postfix restart] ****************************************
changed: [srv613]
changed: [srv612]
changed: [dsrv143]

TASK: [mail | service dovecot restart] ****************************************
changed: [srv613]
changed: [dsrv143]
changed: [srv612]

TASK: [mail | service opendkim restart] ***************************************
changed: [dsrv143]
changed: [srv613]
changed: [srv612]

TASK: [mail | service apache2 restart] ****************************************
changed: [srv613]
changed: [srv612]
changed: [dsrv143]

TASK: [mail | service cron restart] *******************************************
changed: [srv613]
changed: [dsrv143]
changed: [srv612]

PLAY RECAP ********************************************************************
dsrv143 : ok=5 changed=5 unreachable=0 failed=0
srv612 : ok=5 changed=5 unreachable=0 failed=0
srv613 : ok=5 changed=5 unreachable=0 failed=0

Playbooks run on a number of hosts in parallel, but the tasks are in
lockstep: all hosts complete task #1 before going on to task #2, and
each task is run in parallel on a number of hosts equal to the number
of forks defined (default 5). So with forks = 5, each task will be
done in parallel on 5 hosts at a time until all hosts are done.
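For reference, the default of 5 comes from the forks setting in ansible.cfg; a minimal sketch of raising it (20 below is just an example value, size it to your control machine):

[defaults]
forks = 20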

serial controls how many hosts go in each batch for the full play, so
if you set serial = 5, 5 hosts will do each task in lockstep until the
end of the play, then the next 5 hosts start the play.
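For example, a sketch of how serial would sit in the play from the first post (the role name is inferred from the "mail |" task prefixes, so treat it as an assumption):

- hosts: test
  serial: 5
  roles:
    - mail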

In 2.0 we introduce strategies that control play execution; the
default (linear) behaves as described above, while a new one called
'free' allows each host to run to the end of the play without waiting
for the other hosts to complete the same task.
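A minimal sketch of what that looks like in 2.0, again assuming the play from the first post (strategy is a play-level keyword):

- hosts: test
  strategy: free
  roles:
    - mail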

Thanks for the response. My Ansible version is 1.9.2, so shall I update to a newer version, or does something need to change in my configuration? The tasks are executing on the hosts one after another. I want one particular task to run on all hosts at the same time, but for me a task runs on one host while the remaining hosts wait for that host to finish.

So please tell me what modification needs to be made.

none, the tasks are executing on the 3 hosts in parallel already
(unless you set --forks to 1)

It seems like parallel execution, but it happens one after another. If it really were parallel, the outputs of the three hosts for a particular task would be displayed at the same time, but they appear one after the other.
timings with forks=1

real 0m45.364s
user 0m0.535s
sys 0m0.246s
timings with forks=3

real 0m20.228s
user 0m0.646s
sys 0m0.358s

display is serialized, execution is not

Thank you. My other issue is that I have one playbook for the complete installation of a server. Executing this playbook for one server takes 40 minutes. With Ansible we should be able to build multiple servers in the same amount of time, but for me, if I have 10 servers it takes 10*40 = 400 minutes. So how do I optimize this server creation? My main.yml is

-f 10 should take slightly over 40 mins, not 400

to be more specific, all hosts will complete in the time of your
slowest host, as they wait for it on each task.

In v2 you can get around this using the free strategy which will allow
each host to complete as fast as it can, though the play itself will
still take as long as the slowest host.
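For example, on 1.9 the quickest thing to try with the playbook from the first post is raising the fork count for a single run (-f overrides the ansible.cfg value):

time ansible-playbook main.yml --limit test -f 10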

Shall I specify the fork count when running the playbook, or shall I modify the Ansible config file? When I change forks=5 (the default) in ansible.cfg to forks=20, it shows errors. So please clarify this.

it should work either way, i normally use the -f option in the command
line, but ansible.cfg should also work. What specific error do you
get?

Thank you very much. I have a small doubt about forks: suppose I have 200 hosts, may I set forks=500? Is there any problem with setting forks equal to or greater than the number of hosts?
My other query is: is there any limit on SSH connections? If so, how do I increase it? And does SSH depend on ufw in any way?

I have got the following error:

Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 85, in _executor_hook
    result_queue.put(return_data)
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
Process Process-104:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 81, in _executor_hook
    while not job_queue.empty():
  File "<string>", line 2, in empty
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
32

can you give an idea to solve this issue?

I have a similar issue. I've taken the playbooks I usually use out of the picture and simply use the ping module to isolate the issue.
Execution context:

  • my inventory has fewer than 100 hosts, with only their names in it

  • ssh connection type left at the default ("smart"), but with public key authentication

  • the SSH ControlMaster feature is disabled so that connection times stay constant (see the sketch just below).
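For example, one way to disable ControlMaster in ansible.cfg looks roughly like this (note that overriding ssh_args replaces Ansible's default SSH options):

[ssh_connection]
ssh_args = -o ControlMaster=no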

What I expect: with forks > number of machines, as you said, the total execution time should be equal to that of the slowest host, or to the connection timeout (I have intentionally left unreachable hosts in my inventory).

What I have:

time ansible all -i inventory.yml -m ping --forks 5

real 2m40.552s
user 0m3.989s
sys 0m1.760s

time ansible all -i inventory.yml -m ping --forks 100

real 3m6.231s
user 0m8.267s
sys 0m4.404s

This is 100% reproducible: forks = 100 is always slower than forks = 5 or less. The control machine (a laptop with an i5 and 8 GB of RAM) has no RAM or CPU issue while the command runs, and the same thing happens in a dedicated Ansible virtual machine.

With forks set to 100, the output strongly suggests some kind of sequential bottleneck (very slow output, one host at a time), while with forks = 2 I can see results displayed two by two, each step taking at most the connection timeout for unreachable hosts.

If you have any suggestion concerning any detail I could/should dig deeper into, please share :slight_smile:

@anandkumar in 1.9 the number of forks is automatically reduced to
the number of hosts, so specifying a larger number should not be an
issue. You should just adjust it to the resources of your 'master'.

The number of ssh connections from a client is limited only by the
resources available on that machine; on the servers/targets you should
only be making a single connection at a time, so there is no need to
tweak anything there.

The error you get can be caused by hitting a resource limit on your
'master' (check logs, dmesg). Consider that it is not only ssh
connections but ansible forks, which need to copy over the data,
execute it, receive the results and then display them and update
shared variables while still handling all inventory and vars data
provided.
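For example, a few quick checks on the control machine (these are generic Linux commands, not Ansible-specific, so adapt as needed):

ulimit -n           # open file descriptor limit for the current shell
ulimit -u           # maximum number of user processes
free -m             # memory currently available
dmesg | tail -n 50  # recent kernel messages (OOM kills, segfaults, ...)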

@florent, it seems like you are hitting resource constraints on your
ansible machine; try a lower number of forks.

What type of resource constraint would we be talking about?

Using Wireshark, it seems Ansible is not firing off enough DNS requests at the beginning (and only small batches of requests during the rest of the execution) to honor "forks" simultaneous SSH connections.

So I went down the hypothesis that something is wrong with DNS resolution or with the handling of unreachable hosts: I added ansible_ssh_host= for each host (see the sketch at the end of this message), removed the unreachable hosts from the inventory, and ran a ping again:

$ time ansible all -i inventory_test -m ping --forks 5
real 0m21.618s
user 0m3.762s
sys 0m1.391s

$ time ansible all -i inventory_test -m ping --forks 20
real 0m17.872s
user 0m5.063s
sys 0m1.840s

$ time ansible all -i inventory_test -m ping --forks 100
real 0m17.341s
user 0m7.701s
sys 0m2.968s

OK, there could be a very slow DNS resolver on my side, but that would not prevent Ansible from issuing "forks" requests at a time. The good point is that forks=100 is faster than forks=5 for 73 hosts, which is expected.

Is the handling of DNS requests by Ansible "costly" in terms of resources, such that it would imply reducing --forks on the control machine?
Is it something that would alter the code path and induce some sort of lock or degraded parallelism, compared to the same tasks/module calls when the inventory deals directly with IP addresses?
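For reference, the inventory entries mentioned above take roughly this form (host names and addresses are placeholders):

web01 ansible_ssh_host=192.0.2.11
web02 ansible_ssh_host=192.0.2.12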

No, by resources I meant CPU, RAM, bandwidth, etc. Slow DNS resolution
might make some forks, and the overall performance, degrade, but it
should not factor too much into how many ssh connections you can open
at the same time, unless you are using a single proc/thread resolver.

Thanks for your suggestion Brian.

As I stated, I’ve looked at the CPU & RAM usage, they were perfectly fine.
I'll see if the network is limiting anything, but I have some doubts: it's an enterprise wired LAN, and the logical behaviour of a bandwidth-limited Ansible would be:

  • phase 1: many DNS requests, some of them slow, timing out perhaps, and being re-emitted
  • phase 2: some SSH connections begin, while still struggling with DNS requests
  • phase 3: DNS requests have mostly been done, and SSH connections now struggle to be established

But that is not what I saw with Wireshark (see my previous message: only small batches of dozens of DNS requests at a time, after an initial phase with very few DNS requests).

The idea of a mono threaded resolver is really interesting. You mean at the control machine OS level?

The mono-threaded resolver was hypothetical; I doubt any really
exist, unless someone was debugging the resolver and forgot to revert
the concurrency settings.

what do you mean by 'struggles to be established'?

Thank you for the suggestion. I have a small issue: my master node has 1 GB of RAM, so shall I extend it to 2 GB or more?
You mentioned previously that my issue is CPU and RAM usage, so is expanding to 2 GB enough, or do I need more than that?
When I check dmesg I obtain the following result.

[2038015.659615] show_signal_msg: 11 callbacks suppressed
[2038015.663931] python[10604]: segfault at 24 ip 0000000000558077 sp 00007ffc1ddaf460 error 6
[2038015.667147] python[10575]: segfault at 24 ip 0000000000558077 sp 00007ffe94bcdbe0 error 6python[10588]: segfault at 24 ip 0000000000557db8 sp 00007ffd66ad0630 error 6
[2038015.706367] python[10594]: segfault at 24 ip 0000000000558077 sp 00007ffcceee5500 error 6python[10586]: segfault at 24 ip 0000000000558077 sp 00007ffe0ba38250 error 6 in python2.7[400000+2bc000]
[2038015.720197] in python2.7[400000+2bc000]
[2038015.770903] python[10625]: segfault at 24 ip 0000000000537388 sp 00007ffe765d3c80 error 6 in python2.7[400000+2bc000]
[2038015.826069] in python2.7[400000+2bc000]
[2038015.828951] in python2.7[400000+2bc000]
[2038015.872324] python[10592]: segfault at 24 ip 0000000000557db8 sp 00007ffd81a04310 error 6 in python2.7[400000+2bc000]
[2038015.920538] Core dump to |/usr/share/apport/apport 10592 11 0 10592 pipe failed
[2038015.933555] Core dump to |/usr/share/apport/apport 10586 11 0 10586 pipe failed
[2038015.950079] in python2.7[400000+2bc000]
[2038523.352829] python[11774]: segfault at 24 ip 0000000000537388 sp 00007ffd64a8bfc0 error 6 in python2.7[400000+2bc000]
[2038523.360200] ------------[ cut here ]------------
[2038523.361416] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:1838!
[2038523.362618] invalid opcode: 0000 [#2] SMP
[2038523.363800] Modules linked in: kvm_intel(X) kvm(X) crct10dif_pclmul(X) crc32_pclmul(X) ghash_clmulni_intel(X) aesni_intel(X) aes_x86_64(X) lrw(X) gf128mul(X) glue_helper(X) ablk_helper(X) cryptd(X) cirrus(X) ttm(X) serio_raw(X) drm_kms_helper(X) drm(X) syscopyarea(X) sysfillrect(X) sysimgblt(X) i2c_piix4(X) lp(X) mac_hid(X) parport(X) psmouse floppy pata_acpi
[2038523.364010] CPU: 0 PID: 11812 Comm: kworker/u2:1 Tainted: G D X 3.13.0-52-generic #85-Ubuntu
[2038523.364010] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[2038523.364010] task: ffff880027801800 ti: ffff880027810000 task.ti: ffff880027810000
[2038523.364010] RIP: 0010:[] [] __get_user_pages+0x351/0x5e0
[2038523.364010] RSP: 0018:ffff880027811d20 EFLAGS: 00010246
[2038523.364010] RAX: 0000000000000040 RBX: 0000000000000017 RCX: 0000800000000000
[2038523.364010] RDX: 00007fffffe00000 RSI: 0000000008118173 RDI: ffff88000dac0780
[2038523.364010] RBP: ffff880027811db0 R08: ffffffff81c3f820 R09: 0000000000000001
[2038523.364010] R10: 0000000000000040 R11: ffff880009492640 R12: ffff88000dac0780
[2038523.364010] R13: ffff880027801800 R14: ffff88000ac1fc00 R15: 0000000000000000
[2038523.364010] FS: 0000000000000000(0000) GS:ffff88003ea00000(0000) knlGS:0000000000000000
[2038523.364010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2038523.364010] CR2: 0000000000435470 CR3: 0000000001c0e000 CR4: 00000000001407f0
[2038523.364010] Stack:
[2038523.364010] 0000000000000000 0000000000000080 0000000000000000 ffff880027801800
[2038523.364010] ffff880027811fd8 ffff880027801800 ffff880027811e38 0000000000000000
[2038523.364010] 0000000000000020 000000170936c888 0000000000000001 00007fffffffefdf
[2038523.364010] Call Trace:
[2038523.364010] [] get_user_pages+0x52/0x60
[2038523.364010] [] copy_strings.isra.17+0x256/0x2e0
[2038523.364010] [] copy_strings_kernel+0x34/0x40
[2038523.364010] [] do_execve_common.isra.23+0x4fc/0x7e0
[2038523.364010] [] do_execve+0x18/0x20
[2038523.364010] [] ____call_usermodehelper+0x108/0x170
[2038523.364010] [] ? generic_block_bmap+0x50/0x50
[2038523.364010] [] ? ____call_usermodehelper+0x170/0x170
[2038523.364010] [] call_helper+0x1e/0x20
[2038523.364010] [] ret_from_fork+0x7c/0xb0
[2038523.364010] [] ? ____call_usermodehelper+0x170/0x170
[2038523.364010] Code: 45 a8 48 85 c0 0f 85 01 ff ff ff 8b 45 bc 25 00 01 00 00 83 f8 01 48 19 c0 83 e0 77 48 2d 85 00 00 00 e9 e5 fe ff ff a8 02 75 ae <0f> 0b 0f 1f 44 00 00 48 8b 55 c8 48 81 e2 00 f0 ff ff f6 45 bc
[2038523.364010] RIP [] __get_user_pages+0x351/0x5e0
[2038523.364010] RSP
[2038523.412198] ---[ end trace a315abcee87673a6 ]---
[2038523.433078] python[11666]: segfault at 24 ip 0000000000558077 sp 00007ffff9628860 error 6 in python2.7[400000+2bc000]
[2038523.441081] python[11779]: segfault at 24 ip 0000000000537388 sp 00007ffcacc2e900 error 6 in python2.7[400000+2bc000]
[2038523.478649] python[11368]: segfault at 24 ip 00000000004c4bce sp 00007ffc59fb5f00 error 6 in python2.7[400000+2bc000]
[2038523.486958] python[10938]: segfault at 24 ip 00000000004c4bce sp 00007ffc59fb5bc0 error 6 in python2.7[400000+2bc000]
[2038523.492400] python[11313]: segfault at 24 ip 00000000004c4bce sp 00007ffc59fb5f00 error 6 in python2.7[400000+2bc000]
[2038523.494763] python[11431]: segfault at 24 ip 00000000004c4bce sp 00007ffc59fb5f00 error 6 in python2.7[400000+2bc000]
[2038523.500881] Core dump to |/usr/share/apport/apport 11313 11 0 11313 pipe failed
[2038523.502892] Core dump to |/usr/share/apport/apport 10938 11 0 10938 pipe failed
[2038523.508463] Core dump to |/usr/share/apport/apport 11368 11 0 11368 pipe failed
[2038523.601005] Core dump to |/usr/share/apport/apport 11431 11 0 11431 pipe failed
[2038523.819813] Core dump to |/usr/share/apport/apport 11666 11 0 11666 pipe failed
[2038523.843761] Core dump to |/usr/share/apport/apport 11779 11 0 11779 pipe failed
[2150747.189478] Request for unknown module key 'Magrathea: Glacier signing key: 1981bc916ffc00599231ec5630e666e0256fd6f1' err -11
[2150747.217323] Request for unknown module key 'Magrathea: Glacier signing key: 1981bc916ffc00599231ec5630e666e0256fd6f1' err -11
[2150747.225605] ip_tables: (C) 2000-2006 Netfilter Core Team
[2150747.236293] Request for unknown module key 'Magrathea: Glacier signing key: 1981bc916ffc00599231ec5630e666e0256fd6f1' err -11
[2220566.325680] ------------[ cut here ]------------
[2220566.328031] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:1838!
[2220566.328031] invalid opcode: 0000 [#3] SMP
[2220566.328031] Modules linked in: iptable_filter(X) ip_tables(X) x_tables(X) kvm_intel(X) kvm(X) crct10dif_pclmul(X) crc32_pclmul(X) ghash_clmulni_intel(X) aesni_intel(X) aes_x86_64(X) lrw(X) gf128mul(X) glue_helper(X) ablk_helper(X) cryptd(X) cirrus(X) ttm(X) serio_raw(X) drm_kms_helper(X) drm(X) syscopyarea(X) sysfillrect(X) sysimgblt(X) i2c_piix4(X) lp(X) mac_hid(X) parport(X) psmouse floppy pata_acpi
[2220566.328031] CPU: 0 PID: 22634 Comm: python Tainted: G D X 3.13.0-52-generic #85-Ubuntu
[2220566.328031] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[2220566.328031] task: ffff88000e970000 ti: ffff88000f3e6000 task.ti: ffff88000f3e6000
[2220566.328031] RIP: 0010:[] [] __get_user_pages+0x351/0x5e0
[2220566.328031] RSP: 0018:ffff88000f3e7d40 EFLAGS: 00010246
[2220566.328031] RAX: 0000000000000040 RBX: 0000000000000017 RCX: 0000800000000000
[2220566.328031] RDX: 00007fffffe00000 RSI: 0000000008118173 RDI: ffff88003b3f8540
[2220566.328031] RBP: ffff88000f3e7dd0 R08: ffffffff81c3f820 R09: 0000000000000001
[2220566.328031] R10: 0000000000000040 R11: ffff880004ac4740 R12: ffff88003b3f8540
[2220566.328031] R13: ffff88000e970000 R14: ffff880004a10a80 R15: 0000000000000000
[2220566.328031] FS: 00007fe4f2285740(0000) GS:ffff88003ea00000(0000) knlGS:0000000000000000
[2220566.328031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2220566.328031] CR2: 00000000187a0000 CR3: 0000000004b56000 CR4: 00000000001407f0
[2220566.328031] Stack:
[2220566.328031] 0000000000000000 0000000000000080 0000000000000000 ffff88000e970000
[2220566.328031] ffff88000f3e7fd8 ffff88000e970000 ffff88000f3e7e58 0000000000000000
[2220566.328031] 0000000000000020 0000001701958b40 0000000000000001 00007fffffffefea
[2220566.328031] Call Trace:
[2220566.328031] [] get_user_pages+0x52/0x60
[2220566.328031] [] copy_strings.isra.17+0x256/0x2e0
[2220566.328031] [] copy_strings_kernel+0x34/0x40
[2220566.328031] [] do_execve_common.isra.23+0x4fc/0x7e0
[2220566.328031] [] SyS_execve+0x36/0x50
[2220566.328031] [] stub_execve+0x69/0xa0
[2220566.328031] Code: 45 a8 48 85 c0 0f 85 01 ff ff ff 8b 45 bc 25 00 01 00 00 83 f8 01 48 19 c0 83 e0 77 48 2d 85 00 00 00 e9 e5 fe ff ff a8 02 75 ae <0f> 0b 0f 1f 44 00 00 48 8b 55 c8 48 81 e2 00 f0 ff ff f6 45 bc
[2220566.328031] RIP [] __get_user_pages+0x351/0x5e0
[2220566.328031] RSP
[2220568.259276] ---[ end trace a315abcee87673a7 ]---
[2220568.453180] python[21669]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.496568] python[21668]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.635851] python[21687]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.647618] python[21653]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.858297] python[21657]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.869836] python[21801]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220568.890070] python[22088]: segfault at 24 ip 00000000005377c7 sp 00007ffc4ac1ba70 error 6 in python2.7[400000+2bc000]
[2220569.041938] Core dump to |/usr/share/apport/apport 21668 11 0 21668 pipe failed
[2220569.088505] Core dump to |/usr/share/apport/apport 21669 11 0 21669 pipe failed
[2220569.194507] python[21679]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220569.208311] python[21821]: segfault at 24 ip 000000000052a36c sp 00007ffc4ac1ddb0 error 6 in python2.7[400000+2bc000]
[2220569.234079] Core dump to |/usr/share/apport/apport 22088 11 0 22088 pipe failed
[2220569.252954] Core dump to |/usr/share/apport/apport 21657 11 0 21657 pipe failed
[2220569.388225] python[21692]: segfault at 24 ip 00000000004c4bce sp 00007ffc4ac1bea0 error 6 in python2.7[400000+2bc000]
[2220569.390181] Core dump to |/usr/share/apport/apport 21801 11 0 21801 pipe failed
[2220569.536051] Core dump to |/usr/share/apport/apport 21653 11 0 21653 pipe failed
[2220569.939338] Core dump to |/usr/share/apport/apport 21679 11 0 21679 pipe failed
[2220570.153310] Core dump to |/usr/share/apport/apport 21692 11 0 21692 pipe failed
What does this error mean?

And how do I upgrade Ansible from version 1.9.2 to the latest 2.0 version?