Hello,
Apologies in advance for the long mail. I’ve been testing Ansible 2.0 and ran into performance issues, especially with large inventories. My test playbook takes 10 seconds to complete on Ansible 1.9. On Ansible 2.0, I killed the run after 22 minutes, by which point it had only gotten through 10 hosts. At that pace, it would have taken about 3.5 hours to complete. (The playbook was targeting 70 hosts in a 2,000-host inventory.) After making some code changes I got the run down to 2.5 minutes, but I still can’t get to Ansible 1.9 performance levels.
Test Case #1:
- name: performance_repro
  hosts: repro
  gather_facts: False
  tasks:
    - name: get uptime
      shell: uptime
Test Case #2:
- Same as test case #1, but move the “shell: uptime” task to a separate file and include it
The inventory file contains 20 groups and 2,000 hosts. The “repro” group contains 70 hosts.
Command:
ansible-playbook playbooks/openstack_ops/repro.yml -i inventory/openstack_ops -f 50 --limit repro
I traced this down to a few issues:
- Vars caching is commented out (see vars/__init__.py#L318). Is this intentional? Uncommenting this line reduced the test case run time from 3+ hours to 2 minutes, 26 seconds. Without caching, the get_vars() call can take 200ms+ in my test case.
- “Include” tasks get sent to the worker pool, where they are handled by task_executor. In test case #2, this costs the penalty of an extra task in the run. I short-circuited this in the linear execution strategy by returning a TaskResult for “includes” directly in the run loop, similar to what is done for “meta” tasks. This turned a 2 minute, 43 second run into a 2 minute, 24 second run, and it would greatly improve run times on plays with many hosts or includes. It seems to work, but I’m not sure whether it has other implications.
- Worker pool serialization: after applying the vars caching and task include changes, a basic playbook run is still 15x slower than in Ansible 1.9. Running Ansible under cProfile showed that most of the time is spent serializing data structures to send to the worker processes. Much more data is sent to the workers over queues than in 1.9, where workers were forked with the data they needed. I’m hoping someone on this list can chime in on whether this is a known issue and whether there are any ideas for reducing the serialization overhead.
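To make the first and third points concrete, here is a rough, self-contained Python sketch of the cost pattern I am describing. It is not Ansible code: build_vars(), CachedVars, serialized_bytes() and make_inventory() are hypothetical stand-ins I made up for illustration, and the inventory shape is only loosely modelled on my test case. It shows a per-host vars lookup whose result grows with the inventory, a simple per-host cache in front of it, and the size of the pickled payload that would have to cross a queue for every host and task.

```python
import pickle
import time


def build_vars(inventory, host):
    """Hypothetical stand-in for a per-host vars lookup (NOT Ansible's get_vars())."""
    merged = {}
    for group_vars in inventory["groups"].values():
        merged.update(group_vars)
    merged["inventory_hostname"] = host
    # Assumption: the result references every host, so it grows with inventory size.
    merged["all_hosts"] = list(inventory["hosts"])
    return merged


class CachedVars:
    """Memoize build_vars() per host, as a rough analogue of a vars cache."""

    def __init__(self, inventory):
        self.inventory = inventory
        self.cache = {}
        self.misses = 0  # number of times the expensive lookup actually ran

    def get(self, host):
        if host not in self.cache:
            self.misses += 1
            self.cache[host] = build_vars(self.inventory, host)
        return self.cache[host]


def serialized_bytes(task_vars):
    """Size of the pickled payload that would be pushed onto a worker queue."""
    return len(pickle.dumps(task_vars, protocol=pickle.HIGHEST_PROTOCOL))


def make_inventory(num_hosts=2000, num_groups=20):
    """Build a toy inventory: host names plus one variable per group."""
    hosts = ["host%04d" % i for i in range(num_hosts)]
    groups = {"group%02d" % g: {"var_%02d" % g: g} for g in range(num_groups)}
    return {"hosts": hosts, "groups": groups}


if __name__ == "__main__":
    inventory = make_inventory()
    targets = inventory["hosts"][:70]  # mirrors the 70-host "repro" group
    cached = CachedVars(inventory)

    start = time.perf_counter()
    total = 0
    for _task in range(2):  # e.g. an include plus the included task
        for host in targets:
            total += serialized_bytes(cached.get(host))
    elapsed = time.perf_counter() - start

    print("lookups computed:", cached.misses)
    print("bytes pickled:", total)
    print("seconds:", round(elapsed, 3))
```

In this toy version the cache removes the repeated lookups, but the pickling cost is still paid once per host per task, which is the part I don’t know how to avoid. A real cache would also need invalidating whenever variables change during a play, which this sketch ignores.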
Our real playbooks rely heavily on “includes” to take advantage of roles and code re-use, and they usually target hundreds of hosts, sometimes thousands. The new overhead slows playbooks down enough to make them unusable for large-scale application deployments. We are really excited about using Ansible 2.0 features like the OpenStack modules and execution strategies. Thanks for any help you can provide in getting us there.
-Christian