We are currently using Ansible 1.5.4 and would like to know whether Ansible is thread safe when used through its Python API (http://docs.ansible.com/developing_api.html). In our environment we have a Python application with a thread pool, into which we have integrated Ansible's Python API. We have observed that, when running with multiple threads, execution hangs if an error occurs in a playbook.
The execution flow is the following:
- Generate ssh keys and copy them to managed hosts
- Execute ansible run
When the thread pool size is 1 there are zero failures. However, when it is greater than 1 and an error occurs in a playbook, execution stalls for one of the clusters and never completes, while the other clusters usually proceed without error.
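For anyone who wants to picture the setup, the flow above looks roughly like the sketch below. This is not our actual code: `run_cluster` is a hypothetical stand-in for the key generation and the Ansible run (a sleep simulates the work), and the timeout is an assumption added only to flag a cluster that never returns.

```python
import time
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool


def run_cluster(name, delay):
    """Hypothetical stand-in for: generate SSH keys, copy them, run the playbook.

    The sleep simulates the work; a long delay simulates the hang.
    """
    time.sleep(delay)
    return name


def run_all(clusters, workers, timeout):
    """Run one job per cluster on a thread pool and report which ones finish.

    clusters: list of (name, simulated_delay_seconds) pairs.
    Returns a dict mapping cluster name to "completed" or "stalled".
    """
    pool = ThreadPool(processes=workers)
    pending = [(name, pool.apply_async(run_cluster, (name, delay)))
               for name, delay in clusters]
    results = {}
    for name, job in pending:
        try:
            job.get(timeout=timeout)      # wait at most `timeout` seconds
            results[name] = "completed"
        except TimeoutError:
            results[name] = "stalled"     # the job did not return in time
    pool.close()                          # accept no further jobs
    return results


if __name__ == "__main__":
    # Cluster A is given a long delay to mimic the stall we see.
    print(run_all([("A", 1.0), ("B", 0.01), ("C", 0.01)], workers=3, timeout=0.3))
```

With `workers=1` the jobs simply run one after another, which matches the configuration where we see no failures.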
While debugging this, I validated that the inventory and playbook objects are correctly populated. In addition, to determine whether this was an issue with the hosts themselves, I took 3 clusters and found that one would fail at random while the others passed; it was not failing on a specific set of hosts. In run x cluster 1 may hang; in run y cluster 1 may pass. Retaining the Ansible scripts on the hosts shows that setup can run without error on any host. Below I am simulating an error on the single playbook task; if no error occurs, all 3 clusters succeed.
[THREAD 1]
PLAY [all] ********************************************************************
GATHERING FACTS ***************************************************************
<— HANGS HERE FOR CLUSTER A —>
[THREAD 2]
PLAY [all] ********************************************************************
GATHERING FACTS ***************************************************************
ok: [x.x.x.211]
ok: [x.x.x.212]
ok: [x.x.x.210]
TASK: [common/pre | Update hostname -] ***********************
ok: [x.x.x.212] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’
ok: [x.x.x.210] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’
ok: [x.x.x.211] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’
<— SUCCEEDS FOR CLUSTER B THREAD 2 EXITS—>
[THREAD 3]
PLAY [all] ********************************************************************
GATHERING FACTS ***************************************************************
ok: [x.x.x.215]
ok: [x.x.x.216]
ok: [x.x.x.217]
TASK: [common/pre | Update hostname -] ***********************
ok: [x.x.x.216] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’
ok: [x.x.x.215] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’
ok: [x.x.x.217] => One or more undefined variables: ‘dict object’ has no attribute ‘host_names’
<— SUCCEEDS FOR CLUSTER C THREAD 3 EXITS—>
Other Important Info:
- Sudo = no
- Transport = paramiko
- Strict host key checking is False
We can reproduce this error every time. As stated above, if we set the thread count to 1 (no threads) we can orchestrate all clusters, whether they have errors or not. I also believe I have ruled out any issues in our code: based on inspection, the inventory and playbook objects are correct. To keep things simple, I am testing with a playbook that has only one task, which sets the hostname.
So I am trying to ascertain whether Ansible, as executed through the Python API, is thread safe.
Thank You