More efficient inventory lookups

Trond_Hindenes · December 14, 2017, 1:16pm

We’re using a custom inventory script that calls a webservice, which in turn calls the AWS EC2 API to list instances.

The reason we do this is that we store auto-generated, unique instance credentials; the webservice retrieves these, decrypts them using AWS KMS and presents them to Ansible.

From testing, it looks like Ansible will always grab every host in the inventory, even if ansible-playbook is started with the “--limit” parameter. So for example:

I run ‘ansible-playbook site.yml --limit server10’. My (dynamic) inventory consists of servers server01 - server99.
Ansible’s behavior in this case seems to be:

1. Call the inventory script with the ‘--list’ parameter to get all hosts/groups.
2. For each host, call the inventory script with the ‘--host <hostname>’ param to get host metadata.

I would argue that the second step is inefficient, because after step 1, Ansible already has enough info to filter out the “unneeded” hosts, so it should only have to perform a single ‘--host’ lookup in order to grab the hostvars of ‘server10’.
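For readers unfamiliar with the protocol, the two calls described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual script: the `servers` group name, the `HOSTS` list and the placeholder secret are assumptions, and a real script would call the webservice where the comments indicate.

```python
#!/usr/bin/env python
"""Sketch of a dynamic inventory script using the --list / --host protocol."""
import argparse
import json
import sys

# Hypothetical stand-in for the webservice response: server01 .. server99.
HOSTS = ["server%02d" % n for n in range(1, 100)]


def list_inventory():
    # Answer to --list: groups and hosts only. There is no "_meta" key,
    # so Ansible follows up with one --host call per host.
    return {"servers": {"hosts": HOSTS}}


def host_vars(hostname):
    # Answer to --host <hostname>: variables for a single host.
    # In the setup described above, this is where the KMS cost would be paid.
    return {"ansible_host": hostname, "ansible_password": "<decrypted secret>"}


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args(argv)
    result = host_vars(args.host) if args.host else list_inventory()
    json.dump(result, sys.stdout)


if __name__ == "__main__":
    main()
```

With 99 hosts and no `_meta` in the `--list` output, this shape leads to 99 separate `--host` invocations, regardless of `--limit`.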

As we use KMS to decrypt credentials, every API call to KMS is an expensive operation, which means that Ansible starts quite slowly. Before I start making something myself to “pre-populate” an inventory file and just hand that over to Ansible, I’d like to know if there are any plans to improve this behavior.

We do not consider caching an alternative - we want credentials safely tucked away, so storing them randomly in an “open” cache file is not an option for us. We also use Kubernetes pods to trigger ansible jobs, so each ansible “job container” only lives a few minutes - nothing gets persisted after it’s deleted when the job has run.

Kai_Stian_Olstad · December 14, 2017, 2:57pm

They might be needed in the playbook, and at this stage Ansible doesn't know that.
The playbook might use variables from other hosts.

system · December 14, 2017, 6:29pm

please read https://github.com/ansible/ansible/issues/33840#issuecomment-351508201

or http://docs.ansible.com/ansible/latest/dev_guide/developing_inventory.html#tuning-the-external-inventory-script

Trond_Hindenes · December 14, 2017, 9:09pm

Thanks Brian,
I’m aware of the current functionality with regard to the “_meta” object vs list/host.
My question is simply whether it’s possible to not populate a full hostvars dict for every node in the inventory in situations where the majority of the nodes are filtered out using ‘--limit’.

It’s not a huge problem for us to write some custom pre-generator that runs before ansible-playbook is invoked, but imho it’s definitely cleaner to avoid it. From the comments so far I’m guessing the functionality I’m after isn’t present in Ansible right now.
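A pre-generator of the kind mentioned here might look roughly like the sketch below. It is only an illustration under assumptions: `build_inventory`, the `servers` group and the `fetch_vars` callable (standing in for the expensive webservice/KMS lookup) are hypothetical names, not anything from the original setup.

```python
import json


def build_inventory(all_hosts, limit, fetch_vars):
    """Return --list style data with hostvars only for hosts in `limit`.

    fetch_vars is a hypothetical callable doing the expensive per-host lookup.
    """
    wanted = [h for h in all_hosts if h in limit]
    return {
        "servers": {"hosts": list(all_hosts)},
        # A "_meta" section tells Ansible not to make per-host --host calls.
        "_meta": {"hostvars": {h: fetch_vars(h) for h in wanted}},
    }


if __name__ == "__main__":
    hosts = ["server%02d" % n for n in range(1, 100)]
    inventory = build_inventory(hosts, {"server10"}, lambda h: {"ansible_host": h})
    print(json.dumps(inventory["_meta"]))
```

The result follows the JSON shape an inventory script prints for `--list`, so a thin wrapper script could emit it. The trade-off is the one raised earlier in the thread: variables of the filtered-out hosts are simply absent, so a play that reads them through hostvars would not find them.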

system · December 14, 2017, 9:38pm

No. `--limit` does not affect hostvars[anyhost][itsvar]: you need to
run the play before you know which variables are consumed, so we
cannot preemptively avoid querying those.

You CAN return an empty `_meta` (this disables the --host calls) and then
add a vars plugin that queries that info; vars plugins run 'on demand',
instead of doing this all in inventory up front.
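The inventory half of this suggestion could be sketched as below. This is an assumption-laden illustration: the `servers` group name and host list are made up, and "empty `_meta`" is interpreted here as `_meta` with an empty `hostvars` dict, which is the form the inventory developer guide describes. The vars plugin that would fetch credentials on demand is not shown.

```python
import json


def list_inventory(hosts):
    """Build --list output whose "_meta" carries no host variables."""
    return {
        "servers": {"hosts": list(hosts)},
        # The presence of "_meta" with "hostvars" lets Ansible skip --host
        # calls; leaving it empty means no secrets are fetched at this stage.
        "_meta": {"hostvars": {}},
    }


if __name__ == "__main__":
    print(json.dumps(list_inventory(["server%02d" % n for n in range(1, 100)])))
```

With this output the inventory script runs once, and any per-host secrets would have to come from elsewhere, such as the vars plugin mentioned above.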

Trond_Hindenes · December 15, 2017, 10:27pm

Gotcha, that makes sense. Thanks!
