Pass --limit to inventory to improve speed and remove unnecessary variables

Hello!

Our current situation:
- We have a Python script that builds the inventory with a bunch of variables for each group. We have noticed that the bigger the JSON the inventory script generates, the slower Ansible runs (about 60 seconds at full size).
- We usually use --limit and run Ansible against just one server, but the inventory script still generates the JSON for all of the hosts (and groups).

We would like to pass the --limit argument to the inventory script so that we only generate the inventory for the hosts we are running Ansible against. I would like to know what you think of this idea, and if you like it I can work on the PR.

cheers

I would like to point out the other use case: having all of the basic inventory variables available allows you to access them through the hostvars variable, even when running against only specific hosts with --limit. Always passing --limit to the inventory script would make this impossible.

Another advantage of using limit in this way is that you don’t have to decrypt variables for groups you aren’t deploying to when using vault.

This could potentially cause issues with delegate_to, though, as the delegated host might not be in the initial limit.

The biggest problem is that the limit spec is “complicated”. Just reusing Ansible’s limit functionality inside your inventory script isn’t likely to speed things up, so the API or whatever your script queries for data would also have to support it.

I know this shouldn’t really be Ansible’s concern to decide, but for anything other than the simplest case, I can’t imagine it being very useful unless you do some heavy lifting yourself.

We actually have a wrapper application that intercepts the limit and passes it to the inventory script by way of an environment variable, which pretty much suits our needs for now. Although it is an extra step, you can implement this behavior today with env vars, at the cost of potentially duplicating input.
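To illustrate the env var approach, here is a minimal sketch of what the inventory script side could look like. Everything in it is an assumption for illustration: the variable name INVENTORY_LIMIT, the sample inventory, and the simplification that the limit is just a comma-separated list of plain host names (real limit patterns are more complicated than that).

```python
import json
import os

# Hypothetical full inventory, in the JSON shape a script prints for --list.
FULL_INVENTORY = {
    "web": {"hosts": ["web1", "web2"], "vars": {"http_port": 80}},
    "db": {"hosts": ["db1"]},
    "_meta": {"hostvars": {"web1": {"rack": "a"}, "web2": {"rack": "b"}, "db1": {"rack": "c"}}},
}


def limit_inventory(inventory, limit):
    """Keep only the hosts named in a comma-separated limit string."""
    wanted = {name.strip() for name in (limit or "").split(",") if name.strip()}
    if not wanted:
        return inventory  # no limit given: behave exactly as before
    limited = {"_meta": {"hostvars": {}}}
    for group, data in inventory.items():
        if group == "_meta":
            continue
        hosts = [h for h in data.get("hosts", []) if h in wanted]
        if hosts:
            # Copy the group (including its vars) with the reduced host list.
            limited[group] = dict(data, hosts=hosts)
    all_hostvars = inventory.get("_meta", {}).get("hostvars", {})
    for host, host_vars in all_hostvars.items():
        if host in wanted:
            limited["_meta"]["hostvars"][host] = host_vars
    return limited


if __name__ == "__main__":
    # INVENTORY_LIMIT is an assumed name that the wrapper would export.
    print(json.dumps(limit_inventory(FULL_INVENTORY, os.environ.get("INVENTORY_LIMIT"))))
```

Note that this only trims the output. To actually save time, the limit would have to be pushed down into whatever query builds the full inventory in the first place.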

I think there should be an option for dynamic inventory scripts, too.

At the moment, if a script doesn’t provide the vars for all hosts in the “_meta” section of its “--list” output, the script gets called with “--host” for every host.
http://docs.ansible.com/ansible/developing_inventory.html#tuning-the-external-inventory-script
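For reference, a rough sketch of a script that fills in “_meta” so that the per-host “--host” calls are not needed. The group names, host names and variables are made up for the example; a real script would query a CMDB or some other API instead.

```python
import argparse
import json

# Hypothetical data standing in for a CMDB or API lookup.
GROUPS = {"ansible": ["ansible"], "loc_a": ["host1", "host2", "host3"]}
HOST_VARS = {"ansible": {"ansible_ssh_host": "localhost"}}


def list_output(groups, host_vars):
    """Build the --list document, including _meta.hostvars for every host."""
    output = {name: {"hosts": list(hosts)} for name, hosts in groups.items()}
    all_hosts = [host for hosts in groups.values() for host in hosts]
    output["_meta"] = {"hostvars": {host: host_vars.get(host, {}) for host in all_hosts}}
    return output


def host_output(host, host_vars):
    """Answer --host; only needed for callers that do not read _meta."""
    return host_vars.get(host, {})


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args(argv)
    if args.host:
        return json.dumps(host_output(args.host, HOST_VARS))
    return json.dumps(list_output(GROUPS, HOST_VARS))


if __name__ == "__main__":
    print(main())
```

With this shape the script is run once, but it still has to collect the vars for every host, which is exactly the cost the original post wants to avoid.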

Maybe there could be an option to pass the “--limit” to the inventory collection, too (--limit-inventory). Or maybe an inventory limit could be implemented (--limit-inventory )?

esco

Hi

We would like to pass the --limit argument to the inventory script so we
can only generate the inventory for the hosts we are running ansible
against. I would like to know what you think of this idea and if you
like it I can work on the PR

This would break backwards compatibility, because the hostvars variable can
also be used in templates (which is damn powerful)

e.g.

{% for host in groups['all'] %}
   {{ hostvars[host]['your_variable'] }}
{% endfor %}

or in tasks

- debug: msg="{{ item }} having var {{ hostvars[item]['your_variable_name'] }}"
  with_items: "{{ groups['all'] }}"

Hello René,

I think everyone is aware of this, and making this “inventory limit” the default isn’t an option.

But if you have a huge inventory generated from a CMDB with a lot of locations, this would really make sense. For example, if you want to configure some hosts in DMZ 1 at location A, you wouldn’t need all hostvars loaded from all locations and zones…

Maybe another option would be to load hostvars dynamically when needed.

esco

Given the number of cases where this could go awry, I believe the best way is to let each plugin implement support for this kind of limiting, much as Matt Martz has described.

Lazy loading of variables is already supported, isn’t it? The plugin should just not provide the _meta item and handle being run with the --host argument. Maybe plugins should support doing that based on a config option or env variable.

hostvars is already lazy; the issue is that it relies on the inventory
hosts, which are not lazy.

Hello Brian,

Since when (maybe in 2.0)? With 1.9.3 this doesn’t seem to be lazy…

Short example:

#cat hosts
host[1:3]

[ansible]
ansible ansible_ssh_host=localhost

#cat test.sh

`
#!/bin/bash

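case "$1" in
  --list)
    # No _meta section, so Ansible falls back to calling --host for every host
    echo '{}'
    ;;
  --host)
    # Quote the JSON so the shell does not strip the double quotes
    echo '{"foo": "bar"}'
    sleep 1  # simulate a slow per-host lookup
    ;;
esac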
`

#time ansible -m debug ansible

`
ansible | success >> {
    "msg": "Hello world!"
}

real 0m4.265s
user 0m0.190s
sys 0m0.078s

`

esco

@esco, I'm not sure what you are trying to show here. As I said, inventory
is not lazy, but hostvars is; hostvars is populated partially from
inventory, but inventory is not its only source.

In the example I only run Ansible to “debug” hosts matching the expression “ansible”. But it runs the script “test.sh” for every host (host1 to host3 and ansible; note the sleep 1 in the script when it is called with “--host”, and the run time of 4 seconds). So if you have a script that returns vars per host (with --host), then every time Ansible runs, this script gets called for every host in the inventory, sequentially.

But I think it would be nice if Ansible called the inventory script with “--host” only when the hostvars are needed (and maybe in parallel). That’s what I would call “lazy” in this case.

esco

Again, you are confusing inventory with hostvars (which includes
inventory and more). There is also a workaround for this in inventory
scripts: use the _meta facility to populate all host variables at once,
and Ansible will avoid calling the script over and over.

Hello Brian,

Yes, I am aware of the “workaround” (see some posts earlier…). And I thought hostvars were the vars from the inventory, so yes, I am a bit confused by this. I thought that only the facts for a host (that is the terminology in the documentation, but in fact they are “vars”, too) are loaded lazily. Or what other vars do you mean by “more”? With ansible -m debug -a "var=hostvars" ansible I don’t get any others.

So lazy loading of the inventory hostvars (in the sense of “vars of a host in the inventory”), or some other option for limiting, isn’t a good idea?

esco

hostvars is a class that contains all vars associated with the host;
if you don't add any vars outside of inventory, you'll only have the
inventory vars there. hostvars itself is lazily loaded, inventory is
not. When you query hostvars['myhost'] it will see if it is populated;
if not, it will ask the inventory object (not the script or file), which
has already loaded and flattened all the vars for all hosts, for the vars
of that host, to combine with the other sources (play, facts, special
vars, etc).
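A toy model of that lookup, purely to illustrate the distinction being made here. This is not Ansible's actual class: the name LazyHostVars, the constructor arguments and the simple "later source wins" merge are all assumptions for the sketch.

```python
class LazyHostVars:
    """Toy stand-in for the behaviour described above; not Ansible's real class."""

    def __init__(self, inventory_vars, extra_sources):
        self._inventory_vars = inventory_vars  # already loaded for ALL hosts
        self._extra_sources = extra_sources    # e.g. play vars, facts
        self._cache = {}
        self.computed = []                     # records which hosts were merged

    def __getitem__(self, host):
        # Merge a host's vars only the first time someone asks for them.
        if host not in self._cache:
            merged = dict(self._inventory_vars[host])
            for source in self._extra_sources:
                merged.update(source.get(host, {}))
            self._cache[host] = merged
            self.computed.append(host)
        return self._cache[host]


inventory_vars = {"host1": {"foo": "bar"}, "host2": {"foo": "baz"}}
facts = {"host1": {"os": "linux"}}
hostvars = LazyHostVars(inventory_vars, [facts])
print(hostvars["host1"])   # merged on first access
print(hostvars.computed)   # host2 has not been merged yet
```

The point of the sketch is that the merge is lazy, but inventory_vars for every host must already exist before the first lookup, and that is the part the inventory script pays for.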

As mentioned above, limiting inventory can make a lot of data
unavailable that users and plays expect to be there. It should not be
done without strict control of what runs and knowing that scope well. I
would not make this a default option, and would hesitate to make it
generic, as most inventory scripts do not support it. There are some
workarounds mentioned above that could be made to work, but I have yet
to see a PR that does this cleanly.

Thank you all for your replies, these are interesting discussions.

One solution we came up with that would be backwards compatible is to pass a boolean flag (for now --pass-limit-to-inventory) when running ansible-playbook, which would run the inventory script with the --limit passed as an argument. So:

Case 1: my plugin would fail if --limit is passed
Solution: don’t use --pass-limit-to-inventory

Case 2: my plugin expects --limit
Solution: use --pass-limit-to-inventory when running ansible-playbook

what do you think?
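For case 2, the script side might look roughly like the sketch below. It assumes the limit would arrive as a single optional --limit string and, as a simplification, treats it as comma-separated plain host names; the exact form in which the limit gets passed is not spelled out in this thread, and the helper names are made up.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical dynamic inventory script")
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    # Assumption: the limit arrives as one optional string argument.
    parser.add_argument("--limit", default=None)
    return parser.parse_args(argv)


def select_hosts(all_hosts, limit):
    """Return the hosts to generate inventory for; all of them if no limit."""
    if limit is None:
        return list(all_hosts)
    wanted = set(limit.split(","))
    return [host for host in all_hosts if host in wanted]


args = parse_args(["--list", "--limit", "host1,host3"])
print(select_hosts(["host1", "host2", "host3"], args.limit))
```

A script written without the --limit argument would exit with an argparse error when it receives one, which is case 1 and the reason the flag has to be opt-in.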

Hello again,

PR done, https://github.com/ansible/ansible/pull/12451, please check it out