Developers: Optimizing external inventory scripts

A few folks have pointed out that the inventory script mechanism calls “--host” once for every host, that these calls can’t be parallelized, and that this is slow for large numbers of hosts, especially when the inventory script doesn’t do its own caching. Previously this made generating a static inventory file the sensible choice, especially when you have thousands of hosts.

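Good news! I’ve added something for this: if the “--list” output of an inventory script includes a top-level “_meta” element containing “hostvars”, Ansible uses those variables and skips the per-host “--host” calls.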

Details in the following commits:

https://github.com/ansible/ansible/commit/bcaa983c2f3ab684dca6c2c2c8d1997742260761
https://github.com/ansible/ansible/commit/8955ac1eda048eeda7e4126ee0cc26c2a4b06147

If you’ve written an external inventory script for Ansible (they live in the “plugins/inventory” directory of the checkout), you may wish to update your inventory script to support the above, as it will offer some nice speedups.

This is not required, however. If you don’t return any “hostvars” in the “_meta” element, or there is no “_meta” element, everything will continue to work just as before.
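For anyone updating a script, here is a minimal sketch of what the “--list” output with “_meta” could look like. The host names, groups, and variables are made up for illustration, and the HOSTS dictionary stands in for whatever your real inventory source is.

```python
#!/usr/bin/env python
import argparse
import json

# Hypothetical data standing in for a real inventory source (CMDB, API, ...).
HOSTS = {
    "web1.example.com": {"group": "webservers", "vars": {"http_port": 80}},
    "db1.example.com": {"group": "dbservers", "vars": {"db_port": 5432}},
}


def build_inventory(hosts):
    """Build the --list document, including per-host variables under _meta."""
    inventory = {"_meta": {"hostvars": {}}}
    for name, info in hosts.items():
        inventory.setdefault(info["group"], {"hosts": []})["hosts"].append(name)
        inventory["_meta"]["hostvars"][name] = info["vars"]
    return inventory


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    # Unknown arguments are ignored rather than treated as errors.
    args, _unknown = parser.parse_known_args(argv)
    if args.host:
        # Still answer --host for callers that ask for a single host.
        print(json.dumps(HOSTS.get(args.host, {}).get("vars", {})))
    else:
        print(json.dumps(build_inventory(HOSTS)))


if __name__ == "__main__":
    main()
```

Since all host variables arrive in the one “--list” call, the cost of the script no longer grows with one extra process launch per host.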

Rock! Thanks! I'm going to test this out in our environment soon.

-jlk

Thank you, Michael! It’s very very much appreciated.

I’m considering using Ansible in an environment with potentially 20,000+ hosts utilizing a centralized CMDB, and I’m concerned about how Ansible manages inventory. From what I read, even with the optimization you’ve made with this change, my external inventory script would still have to fetch the entire hosts/groups list on each invocation before whittling it down with patterns specified on the command-line.

What are the recommended strategies for managing inventories of this size? Local caching of the inventory content by the inventory script? We have a dynamic environment with hosts moving in and out of groups at any given time, so local caching could be very problematic. I had hoped I could pass in group names to the inventory script for more targeted fetching of CMDB content. That doesn’t appear to be possible.

Suggestions?

Thanks,

Dan

The recommendation in this case, as in many cases where APIs are slow, is to let the inventory script cache to disk and know when to refresh that cache. The EC2 script, included in plugins/inventory, does the same.
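As a rough sketch of that kind of disk cache (this is not the EC2 script’s actual code; the function name, the fetch callback, and the five-minute default are all assumptions for illustration):

```python
import json
import os
import tempfile
import time


def load_inventory(fetch, cache_path, max_age_seconds=300):
    """Return the cached inventory if it is fresh, else call fetch() and re-cache.

    fetch is a hypothetical callable that queries the slow backend and
    returns the inventory as a JSON-serializable dict.
    """
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age_seconds:
            with open(cache_path) as handle:
                return json.load(handle)
    except (OSError, ValueError):
        pass  # missing, unreadable, or corrupt cache: fall through and refetch

    inventory = fetch()
    # Write to a temporary file in the same directory, then rename it into
    # place, so a concurrent reader never sees a half-written cache file.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(inventory, handle)
    os.replace(tmp_path, cache_path)
    return inventory
```

The write-then-rename step only protects against partially written files. It does not stop two simultaneous runs from both refetching, and it does nothing about staleness inside the max-age window, which is the trade-off raised above for fast-changing environments.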

However, if your CMDB is in a database, I wouldn’t think it would be that slow. (Have you perhaps considered memcache or a NoSQL store for the exact document you wish to return to Ansible?)

You are of course welcome to also carve things up into smaller inventories if you would feel more comfortable with that.

I have been thinking that carving things up into smaller inventory scripts is called for if I want to be able to pull down smaller chunks. Prior to the _meta[“hostvars”] update I was caching things locally because of the individual --host calls, but the caching caused some potential race conditions that I didn’t like, and I’ll be glad to be rid of it.

I’ve been tossing around the idea, along similar lines, of having my inventory script detect what it’s been invoked as and provide different inventories accordingly:

ansible-playbook -i /etc/ansible/lab-hosts
ansible-playbook -i /etc/ansible/prod-hosts

where
hosts.py
lab-hosts → hosts.py
prod-hosts → hosts.py
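A minimal sketch of how hosts.py might tell which name it was invoked under, assuming lab-hosts and prod-hosts are symlinks to it. The ENVIRONMENTS mapping and the function names are hypothetical:

```python
import os
import sys

# Hypothetical mapping from the name the script was invoked as to the
# slice of inventory it should serve.
ENVIRONMENTS = {"lab-hosts": "lab", "prod-hosts": "prod"}


def environment_for(invoked_as):
    """Map an invocation path such as /etc/ansible/lab-hosts to an environment."""
    name = os.path.basename(invoked_as)
    try:
        return ENVIRONMENTS[name]
    except KeyError:
        raise SystemExit("unknown inventory name: %s" % name)


def current_environment():
    # sys.argv[0] holds the path the script was started with, so a symlink
    # name is seen here rather than the hosts.py it points to.
    return environment_for(sys.argv[0])
```

Running hosts.py directly exits with an error in this sketch, since “hosts.py” is not in the mapping. A default environment would be an equally reasonable choice.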

Regards,
Eugene

Yep, interestingly enough, our AWX inventory script (which we use behind the scenes in AWX to interface with Ansible) works a lot like that.

We pass in the database inventory ID via the environment and the system returns a subset of the overall inventory.
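The same idea, driven by the environment instead of the script name, might look roughly like this. It is only an illustration, not the AWX script: the INVENTORY_ID variable name and the INVENTORIES data are assumptions.

```python
import os

# Hypothetical inventories keyed by ID; a real script would query a
# database for the matching subset instead.
INVENTORIES = {
    "1": {"lab": {"hosts": ["lab1.example.com"]}, "_meta": {"hostvars": {}}},
    "2": {"prod": {"hosts": ["prod1.example.com"]}, "_meta": {"hostvars": {}}},
}


def select_inventory(environ=os.environ):
    """Return the inventory subset named by the INVENTORY_ID variable."""
    inventory_id = environ.get("INVENTORY_ID")  # assumed variable name
    if inventory_id not in INVENTORIES:
        raise SystemExit(
            "set INVENTORY_ID to one of: %s" % ", ".join(sorted(INVENTORIES))
        )
    return INVENTORIES[inventory_id]
```

With something like this in place, a single script can be invoked as “INVENTORY_ID=2 ansible-playbook -i inventory.py ...” and only the selected subset is ever fetched.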