I have an inventory of 3k+ devices and I am running a playbook against it with a serial of 10.
I do not know exactly where the playbook stops running, but it is after fewer than 1k hosts.
Is this a known issue, or a feature?
What can I do to resolve it?
Hi Eliezer. In what way does your playbook stop, and with what error message? Without more info we can’t even make an educated guess.
In my experience, dealing with a large number of hosts/devices can be memory intensive, so check your memory consumption while Ansible is running and see whether Ansible is being killed by the OOM killer.
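One way to check for OOM kills is to scan the kernel log (for example the output of `dmesg` or `journalctl -k`) for the kernel's "Out of memory" messages. The sketch below is only an illustration: the function name `find_oom_kills`, the `name_hint` parameter, and the sample log lines are all made up here, and the exact kernel wording differs between kernel versions.

```python
import re

# Kernel wording varies by version: "Killed process" on newer kernels,
# "Kill process" on older ones. This pattern is an assumption covering both.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines, name_hint=""):
    """Return (pid, process_name) for each OOM kill whose name contains name_hint.

    An empty name_hint matches every OOM kill. Note that the kernel truncates
    process names, and an Ansible run may also show up under a Python
    interpreter name, so filtering by name can miss it.
    """
    hits = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match and name_hint in match.group(2):
            hits.append((int(match.group(1)), match.group(2)))
    return hits


# Hypothetical log lines for illustration only, not real output.
sample = [
    "[  101.000000] usb 1-1: new high-speed USB device number 2 using xhci_hcd",
    "[  202.000000] Out of memory: Killed process 4242 (ansible-playboo) total-vm:1000kB",
    "[  303.000000] Out of memory: Kill process 99 (mysqld) score 900 or sacrifice child",
]

all_kills = find_oom_kills(sample)
ansible_kills = find_oom_kills(sample, name_hint="ansible")
print(all_kills)
print(ansible_kills)
```

If the list comes back empty for the time window of the run, the OOM killer is probably not the cause and the playbook output itself is the next place to look.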
serial ‘batches’ hosts and makes each batch a logical failure group (look at forks instead if you want to avoid this). If all hosts in a ‘serial batch’ fail, the play will fail. You probably hit such a batch at some point.
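To see why a run over 3k hosts can stop well before 1k, here is a toy model of that batching rule. It is not Ansible's code, just a simplified sketch: `run_in_batches`, the host names, and the position of the failing batch are all hypothetical.

```python
def run_in_batches(hosts, failed, batch_size=10):
    """Toy model of serial batching: stop as soon as a whole batch fails.

    Returns (hosts_attempted, completed), where completed is False if the
    run was aborted by a fully failed batch.
    """
    attempted = 0
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        attempted += len(batch)
        # A batch where every host failed aborts the whole run.
        if all(host in failed for host in batch):
            return attempted, False
    return attempted, True


# Hypothetical inventory of 3000 devices.
hosts = [f"dev{i:04d}" for i in range(3000)]

# Ten failing hosts that line up exactly with one batch: the run aborts.
aligned = run_in_batches(hosts, set(hosts[700:710]))

# The same number of failures spread across two batches: the run continues.
spread = run_in_batches(hosts, set(hosts[705:715]))

print(aligned)
print(spread)
```

In this model the same ten broken devices either abort the run or not depending on how they fall into batches, which would explain a stop at a seemingly arbitrary point below 1k.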
Hey,
Thanks for the detailed response.
I have verified that there is enough RAM and swap for my use case, so the OOM killer is not involved.
The RAM is not even close to full, i.e. 1.3 GB of 16 GB is in use.
The issue was found to be a batch of 10 failed hosts.
Eventually I decided to run the task with a set of scripts I wrote myself, and it’s much more efficient in memory and in other resources such as CPU and network.
The customized scripts finished the task in about 8 minutes, compared to more than 30 minutes with Ansible, which I stopped because the run was taking longer than expected.
So an unknown 30+ minutes compared to 8-9 minutes is a no-brainer for me.
Again thanks!