Thread safety with the Ansible Python API

We are currently using Ansible 1.5.4 and would like to know whether Ansible is thread safe when used through its Python bindings (http://docs.ansible.com/developing_api.html). In our environment we have a Python application with a thread pool, and we have integrated Ansible’s Python bindings into it. We have observed that when we execute with multiple threads and an error occurs in a playbook, the execution hangs.

The execution flow is the following:

  1. Generate SSH keys and copy them to the managed hosts
  2. Execute the Ansible run

When the thread pool size is 1 there are no failures. However, when it is greater than 1 and an error occurs in a playbook, the execution stalls for one of the clusters and never completes, while the other clusters usually proceed without error.

While debugging this, I validated that the inventory and playbook objects are correctly populated. In addition, to rule out a problem with the hosts themselves, I took 3 clusters: on each run one of them would fail at random while the others passed. It was not failing on a specific set of hosts; in run x cluster 1 might hang, and in run y cluster 1 might pass. Retaining the Ansible scripts on the hosts shows that setup can run without error on any host. Below I am simulating an error in the single playbook task; if no error occurs, all 3 clusters succeed.

[THREAD 1]
PLAY [all] ********************************************************************
GATHERING FACTS ***************************************************************

<— HANGS HERE FOR CLUSTER A —>

[THREAD 2]
PLAY [all] ********************************************************************
GATHERING FACTS ***************************************************************
ok: [x.x.x.211]
ok: [x.x.x.212]
ok: [x.x.x.210]
TASK: [common/pre | Update hostname -] ***********************
ok: [x.x.x.212] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’
ok: [x.x.x.210] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’
ok: [x.x.x.211] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’

<— SUCCEEDS FOR CLUSTER B THREAD 2 EXITS—>

[THREAD 3]
PLAY [all] ********************************************************************
GATHERING FACTS ***************************************************************
ok: [x.x.x.215]
ok: [x.x.x.216]
ok: [x.x.x.217]
TASK: [common/pre | Update hostname -] ***********************
ok: [x.x.x.216] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’
ok: [x.x.x.215] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’
ok: [x.x.x.217] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’

<— SUCCEEDS FOR CLUSTER C THREAD 3 EXITS—>

Other Important Info:

  1. Sudo = no
  2. Transport = paramiko
  3. Strict host key checking is False

We can reproduce this error every time. As stated above, if we set the thread count to 1 (no concurrency), we can orchestrate all clusters whether they have errors or not. I also believe I have ruled out issues in our own code: on inspection, the inventory and playbook objects are correct. To keep things simple, I am testing with a playbook that has only one task, which sets the hostname.

So I am trying to ascertain whether Ansible, when executed through the Python API, is thread safe.

Thank You

Hi Ryan,

Sounds like you are spawning Ansible via multiple threads?

Ansible as a program actually uses forks underneath via the multiprocessing library. However, it also uses flock at various points, and it’s known that you don’t get working flocks inside threads, but do with forks. So I’m not sure what you are seeing – and it might totally NOT be related – but I’m also not really surprised.
I do suspect your “hang” issue is quite different, though, and your “host_names” issue might be totally different too. Without seeing the playbooks there is no way to tell what you have going on.
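The point about locks working across forks but not across threads comes down to who owns a lock. One kind of lock where this is easy to see is the POSIX record lock (`fcntl.lockf` in Python), which belongs to the whole process: a second process is refused, but another thread of the same process is simply granted it. The sketch below assumes a POSIX system and is only an illustration; it does not claim that this is the locking call Ansible itself makes, and the helper names are hypothetical. (BSD-style `fcntl.flock` locks are tied to the open file description instead, so how they behave across threads depends on whether the threads share a descriptor.)

```python
import fcntl
import os
import tempfile
import threading


def try_lock(path):
    """Try to take a non-blocking exclusive POSIX record lock on `path`.

    Returns True if the lock was granted, False if another owner holds it.
    Caveat: closing the file (as the `with` block does) releases *every*
    lockf lock this process holds on the file, not just this one.
    """
    with open(path, "a") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        return True


def lock_free_in_child(path):
    """Ask a forked child process whether it can take the lock."""
    pid = os.fork()
    if pid == 0:  # child
        code = 1
        try:
            code = 0 if try_lock(path) else 1
        finally:
            os._exit(code)  # never fall back into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def lock_free_in_thread(path):
    """Ask another thread of this same process whether it can take the lock."""
    result = []
    worker = threading.Thread(target=lambda: result.append(try_lock(path)))
    worker.start()
    worker.join()
    return result[0]


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    holder = open(path, "a")
    try:
        fcntl.lockf(holder, fcntl.LOCK_EX)  # the main thread holds the lock
        # Check the child first: the thread check's close() drops the lock.
        in_child = lock_free_in_child(path)
        in_thread = lock_free_in_thread(path)
    finally:
        holder.close()
        os.remove(path)
    return in_child, in_thread


if __name__ == "__main__":
    print(demo())  # (False, True): excluded across processes, not threads
```

In other words, a lock like this gives mutual exclusion between Ansible-style forked workers, but gives none between threads of one application that all call into the same library.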

I can’t say that what you are doing is supported (i.e. it’s not something we’d spend time on). However, if you use Ansible Tower, you get a very nice REST API, already written, for spawning parallel jobs and monitoring them to completion.

http://www.ansible.com/tower

This is absolutely the best solution for this sort of thing, and it takes your CPU capacity into account when deciding how many parallel jobs to run at once.
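Without Tower, a rough approximation of the same idea is to give each cluster its own worker process rather than a thread, with the pool sized to the CPU count. The sketch below assumes a POSIX host (it requests the "fork" start method explicitly), and `run_cluster` and `run_all` are hypothetical names: `run_cluster` stands in for the SSH key setup and playbook run, which are not shown. The overall timeout means a cluster that hangs is reported and its worker killed instead of stalling the whole batch; it does not fix whatever causes the hang.

```python
import multiprocessing
import os
import time


def run_cluster(cluster):
    """Hypothetical per-cluster job (a placeholder, not part of Ansible).

    In a real application this would generate and copy the SSH keys and then
    run the playbook for `cluster`, entirely inside this worker process.
    """
    return "ok"


def run_all(clusters, job=run_cluster, timeout=600.0, workers=None):
    """Run `job(cluster)` for every cluster, each in a worker process.

    Returns {cluster: result}. Clusters still running when the overall
    `timeout` (seconds) expires are reported as "timed out"; clusters whose
    job raised are reported as "failed: ...".
    """
    ctx = multiprocessing.get_context("fork")  # POSIX only
    pool = ctx.Pool(processes=workers or os.cpu_count() or 1)
    deadline = time.monotonic() + timeout
    results = {}
    try:
        pending = {c: pool.apply_async(job, (c,)) for c in clusters}
        for cluster, async_result in pending.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                results[cluster] = async_result.get(timeout=remaining)
            except multiprocessing.TimeoutError:
                results[cluster] = "timed out"
            except Exception as exc:  # the job raised inside its worker
                results[cluster] = f"failed: {exc!r}"
    finally:
        pool.terminate()  # kill any worker still stuck in a run
        pool.join()
    return results


if __name__ == "__main__":
    print(run_all(["cluster-a", "cluster-b", "cluster-c"]))
```

Because each run lives in its own process, a stuck run can be terminated from outside, which is not possible with a thread.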

–Michael

Hi Michael,

Thank you for the reply. This now explains why I am seeing this behaviour. With respect to the playbooks, my methodology for testing this issue was to rule out any problems with the playbook first: I created a single playbook with one task (setting the hostname). The error messages from the playbook execution are expected; I am forcing an error because that is when the concurrency issue (the hang) appears. When playbook execution succeeds I never see the issue; it only occurs when an error occurs inside the playbook.

We will continue to take a stab at this. Thank you again for your reply and help; the clarification is very much appreciated.

Ryan