Salvete all,
We have been clinging to 1.9.x ever since 2.0 came out, but we have been planning to move to 2.1 once it is released.
We had already figured out that Ansible 2.x takes over 20 seconds at startup to parse our dynamic JSON inventory (7 MB of text, almost 9000 groups, excluding '_meta' hostvars), versus 5 seconds on 1.9. (Please see my PR #13957 on that.)
Now we've started running stable-2.1 with --check on some of our playbooks, with the incompatible changes cleaned up (backslash escapes, etc.).
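To separate raw JSON parsing from whatever Ansible does with the result, a small standalone check can help. The following is only a sketch: the function name `summarize_inventory` and the tiny `SAMPLE` inventory are made up for illustration, and it assumes the usual dynamic inventory layout (groups at the top level, hostvars under '_meta').

```python
import json
import time


def summarize_inventory(text):
    """Parse dynamic-inventory JSON text and count groups, hosts and hostvars."""
    start = time.perf_counter()
    data = json.loads(text)
    parse_seconds = time.perf_counter() - start

    # Every top-level key except '_meta' is treated as a group.
    groups = {name: body for name, body in data.items() if name != "_meta"}

    hosts = set()
    for body in groups.values():
        # A group is either a dict with a "hosts" list or a bare list of hosts.
        members = body.get("hosts", []) if isinstance(body, dict) else body
        hosts.update(members)

    hostvars = data.get("_meta", {}).get("hostvars", {})
    return {
        "groups": len(groups),
        "hosts": len(hosts),
        "hostvars": len(hostvars),
        "parse_seconds": parse_seconds,
    }


# Hypothetical miniature inventory, standing in for the real 7 MB output.
SAMPLE = json.dumps({
    "web": {"hosts": ["a.example.org", "b.example.org"], "vars": {"port": 80}},
    "db": ["c.example.org"],
    "_meta": {"hostvars": {"a.example.org": {"rack": 1}}},
})

print(summarize_inventory(SAMPLE))
```

Feeding the real inventory script's output into the same function would show how much of the startup time is plain `json.loads`. If that part turns out to be small, the remainder is presumably spent building Ansible's own inventory objects.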
Ansible branch stable-2.1 takes more than 5 times longer to complete than 1.9.4!
The test playbook rolls stuff out on about 150 hosts out of a couple thousand in the inventory. There are a lot of tasks on each host, and it is already visually apparent that something is very different compared to before. Even skips tick by at a snail's pace (up to 0.5s per skip). The main ansible-playbook process holds 500MB of resident memory.
On Ansible 1.9.4 this completes in 23 minutes; on 2.1 it takes 2 hours, 9 minutes.
This is ansible.cfg:
[defaults]
forks = 55
force_color = 1
roles_path = roles
hostfile = inventory
#library = library
retry_files_enabled = False
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=61s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=yes -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey,keyboard-interactive
pipelining=True
Since people were suggesting it, I tried fact caching with redis and the free strategy. Neither changed the runtime significantly (as expected, since the time is not spent gathering facts; it's spent forking the main process or something):
ansible 2.1.0.0 (stable-2.1 be28443943)
15125.40user 6640.76system 2:08:29elapsed 282%CPU (0avgtext+0avgdata 3262800maxresident)k
304inputs+1046376outputs (5major+613591630minor)pagefaults 0swaps
ansible 2.1.0.0 (stable-2.1 be28443943) – fact_caching = redis
14469.46user 6497.10system 2:04:38elapsed 280%CPU (0avgtext+0avgdata 3052820maxresident)k
728inputs+1028496outputs (18major+606686383minor)pagefaults 0swaps
ansible 2.1.0.0 (stable-2.1 be28443943) – strategy = free
16394.78user 8026.57system 2:19:29elapsed 291%CPU (0avgtext+0avgdata 1231688maxresident)k
80inputs+1084288outputs (0major+626770671minor)pagefaults 0swaps
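The numbers above can be put into proportion with a few lines of Python. This is only a sketch for reading the `time` output: the run labels and helper names are mine, and the 23-minute baseline is the 1.9.4 figure quoted earlier.

```python
import re

# GNU time summary lines copied from the three stable-2.1 runs above.
RUNS = {
    "default": "15125.40user 6640.76system 2:08:29elapsed 282%CPU",
    "fact_caching = redis": "14469.46user 6497.10system 2:04:38elapsed 280%CPU",
    "strategy = free": "16394.78user 8026.57system 2:19:29elapsed 291%CPU",
}

PATTERN = re.compile(r"([\d.]+)user ([\d.]+)system ([\d:.]+)elapsed")

BASELINE_1_9_4_S = 23 * 60  # 23 minutes reported for 1.9.4


def to_seconds(clock):
    """Convert h:mm:ss or m:ss into seconds."""
    total = 0.0
    for part in clock.split(":"):
        total = total * 60 + float(part)
    return total


def summarize(line):
    """Derive wall time, CPU utilisation and system-time share from one line."""
    user, system, elapsed = PATTERN.search(line).groups()
    user, system = float(user), float(system)
    wall = to_seconds(elapsed)
    return {
        "wall_s": wall,
        "cpu_pct": 100 * (user + system) / wall,
        "system_share_pct": 100 * system / (user + system),
    }


for name, line in RUNS.items():
    s = summarize(line)
    print(
        f"{name}: {s['wall_s'] / BASELINE_1_9_4_S:.1f}x the 1.9.4 wall time, "
        f"{s['cpu_pct']:.0f}% CPU, "
        f"{s['system_share_pct']:.0f}% of CPU time in system"
    )
```

Two things stand out when looking at it this way: with forks = 55 the runs only keep about three cores busy on average, and roughly a third of the CPU time is system time. Together with the 600 million minor page faults, that would be consistent with a lot of time going into process creation, though this is a guess and not a measurement.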
We really want to run 2.x, but at this point we are considering maintaining 1.9.x for ourselves with backported modules. I wanted to bring this up on the mailing list first, to see if there is anyone on the dev team who has a grip on what is going on with 2.x at all. What is happening that throttles playbook execution so much? Is this going to improve anytime soon? If I wanted to explore this further, what would be the appropriate tools to profile/benchmark/analyze the problem with our workload? Apparently this has not come up before; otherwise, I assume, 2.0 would not have been released like this. So I wonder what is different in our setup that slows Ansible 2.x to a crawl. I suspect it is again our large inventory, but I'd like some help to figure it out.
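On the profiling question, one possible starting point is Python's built-in cProfile module. The sketch below only shows the mechanics: `profile_call` and `stand_in_workload` are hypothetical names, and the workload is a placeholder, not Ansible code.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def stand_in_workload(n):
    # Hypothetical placeholder for the code path under investigation.
    return sum(len(str(i)) for i in range(n))


result, report = profile_call(stand_in_workload, 100_000)
print(report)
```

The same module can be run from the command line as `python -m cProfile -o FILE SCRIPT` to profile a whole script. One caveat, as far as I understand it: cProfile only observes the process it runs in, so time spent in forked worker processes would not show up in the main process's report.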
cheers,
Tobias