I am a newbie to Ansible, but I would like to explore how to run tasks in parallel by spawning a thread for each task instead of a process. My requirement is to run the playbook on my localhost; no remote task execution is needed.
I would also like to wait for all threads to complete before I move on to a task that has to be serialised.
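For reference, this is roughly the flow I am after, sketched here with async tasks rather than threads; the commands, timings, and task names are only placeholders, not a definitive implementation:

```yaml
# A sketch only: Ansible has no per-task threads, so this uses async tasks
# with poll: 0 to start work in the background on localhost, then waits
# for all of it before the serialised step. Commands and timings are placeholders.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Start the first long-running command without waiting
      ansible.builtin.command: sleep 20
      async: 60          # maximum allowed runtime in seconds
      poll: 0            # fire and forget; do not block on this task
      register: job_1

    - name: Start a second command in parallel
      ansible.builtin.command: sleep 15
      async: 60
      poll: 0
      register: job_2

    - name: Wait here until every background job has finished
      ansible.builtin.async_status:
        jid: "{{ item.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 30
      delay: 2
      loop:
        - "{{ job_1 }}"
        - "{{ job_2 }}"

    - name: Serialised task that must only run after the parallel work is done
      ansible.builtin.debug:
        msg: "All background jobs completed"
```

The async/poll: 0 combination starts each command without blocking, and the async_status loop acts as the barrier before the serialised task.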
Can I choose threads vs. processes when it comes to parallel task execution?
If it is possible to spawn threads from Ansible, are they equivalent to Python green threads, pthreads, or something else?
The only current process model is forking. There has been some work done to add a threaded process model, but there are some large hurdles to overcome.
In practice, threading is not necessarily more performant, and in many cases it has proven less performant, as it causes more CPU contention on a single core that is already resource constrained.
Thank you for the prompt reply… Just out of curiosity: is the threading work that is underway based on Python threads, pthreads, or some other threading mechanism? Since you mentioned that the threading model is not likely to be performant, is the reason Python’s GIL?
Yes, it would utilize the threading library in Python. The GIL is a primary cause of the CPU restrictions. Our main process, which orchestrates all of the task executions, is already heavily CPU bound, so adding additional threads to the same core can cause a decrease in performance. Assuming we create a process model plugin type, other process models are possible, such as using asyncio, concurrent.futures, gevent, etc. But I don’t expect this work to be complete any time soon.
So for now, consider forking to be the only process model for the near future.
I have another question about concurrency support in Ansible.
Is there any way to limit the number of processes that can be spawned on a given host?
My requirement is not to execute the commands/scripts remotely; in my case, the whole play needs to be executed on localhost only.
I tried a simple test and noticed that as many as 6 processes were spawned to execute ‘sleep 20’ asynchronously.
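For context, a minimal play along those lines might look like the following; this is an illustrative sketch, not the exact test from this post:

```yaml
# Illustrative sketch (assumption, not the exact test described above):
# a single async 'sleep 20' on localhost, used to observe how many
# OS processes Ansible creates for it.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Run sleep 20 in the background
      ansible.builtin.command: sleep 20
      async: 30   # let the background job run for up to 30 seconds
      poll: 0     # fire and forget; check later with async_status if needed
```

Running it (for example with a hypothetical `ansible-playbook test.yml --forks=1`) and watching the process tree typically shows the ansible-playbook controller, a worker fork, the async wrapper that supervises the background job, and the sleep command itself (plus any intermediate shell), so the total process count is higher than the forks value; forks only caps how many worker processes the controller runs in parallel.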
Thank you, Matt!
In the above example I explicitly passed --forks=1, but 2 worker processes (PIDs 69484 and 69520) were still spawned. Does that mean a minimum of 2 workers will always be spawned and we can’t limit that to one? I understand that there is no control to limit the total number of processes that will be spawned by the workers.