Multi threading support in Ansible

I am new to Ansible, and I would like to explore how to run tasks in parallel by spawning a thread for each task instead of a process. My requirement is to run the playbook on localhost only; no remote task execution is needed.
I also would like to wait for all threads to complete before I move on to a task that has to be serialised.

Can I choose between threads and processes for parallel task execution?
If it is possible to spawn threads from Ansible, are they equivalent to Python green threads, pthreads, or something else?

Thank you in advance!

The only current process model is forking. There has been some work done to add a threaded process model, but there are some large hurdles to overcome.

In practice, threading is not necessarily more performant, and in many cases it was less performant, as it causes more CPU contention on a single core that is already resource constrained.

Thank you for the prompt reply. Just out of curiosity: is the threading work that is underway based on Python threads, pthreads, or some other threading mechanism? You mentioned that the threading model is not necessarily more performant; is Python's GIL the reason?

Yes, it would utilize the threading library in Python. The GIL is a primary cause of the CPU restrictions. Our main process that orchestrates all of the task executions is already heavily CPU bound, so adding additional threads to the same core can cause a decrease in performance. Assuming we create a process model plugin type, other process models are possible, such as using asyncio, concurrent.futures, gevent, etc. But I don't expect this work to be complete any time soon.

So for now, consider forking the only process model for the near future.
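The GIL point can be seen outside Ansible with a small, self-contained sketch. This is not Ansible code: busy_sum and timed_map are hypothetical names made up for illustration, and the job sizes are arbitrary. On a standard CPython build, the thread pool usually shows little or no speedup for this kind of pure-Python CPU-bound work, while the process pool can spread it across cores; actual timings depend on the machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def busy_sum(n):
    """CPU-bound work: a pure-Python loop that holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_map(executor_cls, jobs, workers=4):
    """Run busy_sum over jobs with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(busy_sum, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [2_000_000] * 4  # arbitrary size, chosen only to make the work measurable
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, seconds = timed_map(cls, jobs)
        print(f"{cls.__name__}: {seconds:.2f}s")
```

Running the script prints one timing per executor, so the two models can be compared on your own hardware.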

Thank you, Matt, for the detailed and quick reply. The support from the community is much appreciated.

@Matt,

I have another question about concurrency support in Ansible.
Is there any way to limit the number of processes that can be spawned on a given host?
I do not need to execute commands or scripts remotely. In my case, the whole play needs to be executed on localhost only.
I tried a simple test playbook and noticed that as many as 6 processes are spawned to execute 'sleep 20' asynchronously.

Please kindly reply. Thank you in advance.

Command: ansible-playbook test_playbook.yml --forks=1

Processes:

root 69484 34309 9 04:50 pts/10 00:00:00 /usr/bin/python2 /usr/bin/ansible-playbook test_playbook.yml --forks=1
root 69509 1 0 04:50 ? 00:00:00 /usr/bin/python2 /root/.ansible/tmp/ansible-tmp-1591876209.82-38354017880191/async_wrapper.py 198806654079 50 /root/.ansible/tmp/ansible-tmp-1591876209.82-38354017880191/command.py _
root 69510 69509 0 04:50 ? 00:00:00 /usr/bin/python2 /root/.ansible/tmp/ansible-tmp-1591876209.82-38354017880191/async_wrapper.py 198806654079 50 /root/.ansible/tmp/ansible-tmp-1591876209.82-38354017880191/command.py _
root 69511 69510 0 04:50 ? 00:00:00 /usr/bin/python2 /root/.ansible/tmp/ansible-tmp-1591876209.82-38354017880191/command.py
root 69512 69511 1 04:50 ? 00:00:00 /usr/bin/python2 /tmp/ansible_f9ckPD/ansible_module_command.py
root 69520 69484 3 04:50 pts/10 00:00:00 /usr/bin/python2 /usr/bin/ansible-playbook test_playbook.yml --forks=1

Code:

[root@oracle-siha file_copy_test]# cat test_playbook.yml
- name: Testing processes
  gather_facts: no
  hosts: localhost
  tasks:
    - name: run sleep command
      async: 50
      poll: 0
      command: sleep 20
      register: res
    - name: wait for the completion
      async_status:
        jid: "{{ res.ansible_job_id }}"
      register: output
      until: output.finished
      delay: 5
      retries: 10

There are a number of steps involved here.

  1. The primary playbook process spawns a worker
  2. The worker executes the async_wrapper for the command module
  3. The async_wrapper forks to daemonize
  4. The async_wrapper executes the transferred module
  5. The actual module is contained within what we call AnsiballZ, which is a compressed archive; it extracts and executes the actual Python code
  6. The actual module executes

--forks only limits how many workers can be launched by the primary playbook process, not how many processes are spawned as a result of each worker.

Thank you Matt!
In the above example I explicitly passed --forks=1, but 2 worker processes (PIDs 69484 and 69520) were still spawned. Does that mean a minimum of 2 workers will always be spawned, and we can't limit that to one? I understand that there is no control to limit the total number of processes spawned by the workers.

You have 1 worker process. One ansible-playbook process is the control process, the other is the worker.
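The control/worker split can be read straight off the PID and PPID columns of the ps listing earlier in the thread. The following is a minimal sketch, not part of Ansible: the rows are copied from that listing with the commands shortened to their script names, and lineage and classify_playbook_processes are hypothetical helper names. The async_wrapper process with parent PID 1 is consistent with the daemonizing fork described in step 3.

```python
# (pid, ppid, command) rows copied from the ps listing in the thread;
# command paths are shortened to the script name for readability.
PS_ROWS = [
    (69484, 34309, "ansible-playbook"),
    (69509, 1, "async_wrapper.py"),
    (69510, 69509, "async_wrapper.py"),
    (69511, 69510, "command.py"),
    (69512, 69511, "ansible_module_command.py"),
    (69520, 69484, "ansible-playbook"),
]


def lineage(pid, rows):
    """Return [pid, parent, grandparent, ...] using only PIDs present in rows."""
    parents = {p: pp for p, pp, _ in rows}
    chain = [pid]
    # Stop once the next parent is not one of the listed processes.
    while chain[-1] in parents and parents[chain[-1]] in parents:
        chain.append(parents[chain[-1]])
    return chain


def classify_playbook_processes(rows):
    """Label an ansible-playbook process 'worker' if its parent is also one."""
    playbook_pids = {p for p, _, cmd in rows if cmd == "ansible-playbook"}
    return {
        p: "worker" if pp in playbook_pids else "control"
        for p, pp, cmd in rows
        if cmd == "ansible-playbook"
    }


if __name__ == "__main__":
    print(classify_playbook_processes(PS_ROWS))
    print(lineage(69512, PS_ROWS))
```

With these rows, 69484 is labelled the control process and 69520 the single worker, which matches --forks=1; the remaining four processes form the async chain that runs the module.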