More efficient inventory lookups

We’re using a custom inventory script that calls a web service, which in turn calls the AWS EC2 API to list instances.

We do this because we store auto-generated, unique instance credentials; the web service retrieves them, decrypts them using AWS KMS, and presents them to Ansible.

From testing, it looks like Ansible always fetches every host in the inventory, even if ansible-playbook is started with the ‘--limit’ parameter. For example:

I run ‘ansible-playbook site.yml --limit server10’. My (dynamic) inventory consists of servers server01 through server99.
Ansible’s behavior in this case seems to be:

  1. Call the inventory script with the ‘--list’ parameter to get all hosts/groups.
  2. For each host, call the inventory script with the ‘--host’ parameter to get host metadata.
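For reference, the two calls above can be sketched as a tiny inventory script. This is only an illustration, not our actual script: the `HOSTS` table and the `ansible_user` value are made up, and `host_vars` stands in for the expensive per-host credential lookup. Because the ‘--list’ output here contains no `_meta` key, Ansible falls back to one ‘--host’ call per host.

```python
#!/usr/bin/env python3
"""Minimal dynamic inventory sketch: answers --list and --host <hostname>."""
import argparse
import json
import sys

# Hypothetical data; a real script would query the web service instead.
HOSTS = {"server%02d" % i: {"ansible_user": "admin"} for i in range(1, 100)}


def list_inventory():
    # Groups only, no "_meta" key, so Ansible asks for each host separately.
    return {"all": {"hosts": sorted(HOSTS)}}


def host_vars(hostname):
    # Stand-in for the expensive per-host lookup (e.g. one decrypt call each).
    return HOSTS.get(hostname, {})


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args(argv)
    if args.host:
        result = host_vars(args.host)
    else:
        result = list_inventory()
    return json.dumps(result)


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

With 99 hosts in the inventory, this layout means 99 separate ‘--host’ invocations after the single ‘--list’ call.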

I would argue that the second step is inefficient: after step 1, Ansible already has enough information to filter out the “unneeded” hosts, so it should only have to perform a single ‘--host’ lookup to grab the hostvars of ‘server10’.

As we use KMS to decrypt credentials, every API call to KMS is an expensive operation, which means that Ansible starts quite slowly. Before I start building something myself to “pre-populate” an inventory file and just hand that over to Ansible, I’d like to know if there are any plans to improve this behavior.

We do not consider caching an alternative. We want credentials safely tucked away, so storing them in an “open” cache file is not an option for us. We also use Kubernetes pods to trigger Ansible jobs, so each Ansible “job container” lives only a few minutes, and nothing persists once the container is deleted after the job has run.

The other hosts' variables might be needed in the playbook, and at this stage Ansible doesn't know that.
The playbook might use variables from other hosts.

Please read https://github.com/ansible/ansible/issues/33840#issuecomment-351508201

or http://docs.ansible.com/ansible/latest/dev_guide/developing_inventory.html#tuning-the-external-inventory-script

Thanks Brian,
I’m aware of the current functionality with regard to the “_meta” object vs list/host.
My question is simply whether it’s possible to avoid populating a full hostvars dict for every node in the inventory when the majority of the nodes are filtered out using ‘--limit’.

It’s not a huge problem for us to write some custom pre-generator that runs before ansible-playbook is invoked, but IMHO it’s definitely cleaner to avoid it. From the comments so far, I’m guessing the functionality I’m after isn’t present in Ansible right now.

No. `--limit` does not affect hostvars[anyhost][itsvar]: the play has
to run before we know which variables are consumed, so we cannot
preemptively avoid querying those.

You CAN return an empty `_meta` (which disables the --host calls) and
then add a vars plugin that queries that info. Vars plugins are
'on demand', so the lookups don't all happen in inventory up front.
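As a rough sketch of the inventory half of that suggestion: the `--list` output carries the groups plus a `_meta` key whose `hostvars` is an empty dict, which is what tells Ansible not to issue per-host `--host` calls. The function name and the host names below are made up for illustration, and the vars plugin that would supply the credentials on demand is not shown.

```python
import json


def list_inventory(hostnames):
    """Build --list output that suppresses the per-host --host calls."""
    return {
        "all": {"hosts": list(hostnames)},
        # An empty hostvars dict under _meta is enough to disable --host calls.
        "_meta": {"hostvars": {}},
    }


if __name__ == "__main__":
    # Hypothetical host names matching the server01..server99 example.
    names = ["server%02d" % i for i in range(1, 100)]
    print(json.dumps(list_inventory(names)))
```

With this output the inventory script itself never touches credentials; whether a vars plugin then fetches them only for the hosts in play depends on how that plugin is written.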

Gotcha, that makes sense. Thanks!