Scoping dynamic inventory best practices

Hello,

So I recently opened a PR against ansible with the intention of passing the ‘host-pattern’ command line argument to my dynamic inventory script in order to scope responses to just the set of hosts I intended to work on. I was able to modify ansible-playbook to do the same by passing its ‘args’ variable to my inventory script, which then builds its JSON response by inspecting the YAML files and scoping the response to what appears under the ‘- hosts’ key of my playbooks.

The reason I want this functionality is that I use an inventory system that returns data on demand. We have tens of thousands of hosts across a variety of environments and datacenters, and getting every hostname and cluster back with each call to ansible is not scalable. The suggestion in my PR was to use local caching, which is a good suggestion: it would certainly speed up responses and limit API calls to my inventory system. The problem is consistency, and that it adds another layer of cache to a system that is already cached.

Here’s an example of what I’m talking about:

/etc/ansible/hosts --list %prod.xyz-service

{
    "%prod.xyz-service": [
        "host1",
        "host2",
        "host3"
    ],
    "_meta": {
        "hostvars": {
            "host1": {},
            "host2": {},
            "host3": {}
        }
    }
}

which allows me to call ansible like:

ansible -r %prod.xyz-service -m ping
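
For readers who want to picture the script side of this, here is a minimal sketch of an inventory script that scopes its answer to the patterns it is given after --list. The extra positional argument only exists with the fork described above (stock ansible calls the script with --list alone), and FAKE_BACKEND and lookup_hosts are hypothetical stand-ins for a real inventory backend.

```python
import json
import sys

# Hypothetical stand-in for the real inventory backend: maps a cluster
# pattern to the hosts it contains.
FAKE_BACKEND = {
    "%prod.xyz-service": ["host1", "host2", "host3"],
    "%prod.foo-service": ["foo-host1", "foo-host2", "foo-host3"],
}


def lookup_hosts(pattern):
    """Return the hosts for one pattern (an empty list if it is unknown)."""
    return FAKE_BACKEND.get(pattern, [])


def build_inventory(patterns):
    """Build the JSON-serialisable inventory for just the requested patterns."""
    inventory = {"_meta": {"hostvars": {}}}
    for pattern in patterns:
        hosts = lookup_hosts(pattern)
        inventory[pattern] = hosts
        for host in hosts:
            inventory["_meta"]["hostvars"][host] = {}
    return inventory


def main(argv):
    """argv is expected to look like ["--list", "%prod.xyz-service"]."""
    if not argv or argv[0] != "--list":
        return json.dumps({})
    return json.dumps(build_inventory(argv[1:]), indent=4)


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

The point of the design is that the backend is only queried for the patterns that were asked for, so nothing has to be cached locally.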

likewise, if I have playbooks, it works similarly:

cat ~/test-playbook.yaml

- hosts: '%prod.xyz-service'
  sudo: True
  tasks:
    - name: install keyczar
      yum: name=python-keyczar state=latest

/etc/ansible/hosts --list /home/loren/test-playbook.yaml /home/loren/test-playbook2.yaml

{
    "%prod.xyz-service": [
        "host1",
        "host2",
        "host3"
    ],
    "%prod.foo-service": [
        "foo-host1",
        "foo-host2",
        "foo-host3"
    ]
}

ansible-playbook -r /home/loren/test-playbook.yaml /home/loren/test-playbook2.yaml
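
As a rough illustration of the playbook case, the sketch below collects the values of the 'hosts' keys from playbook text so they can be used as patterns. It is a simplification that scans lines with a regular expression instead of using a real YAML parser, so it will miss multi-line values and trailing comments; the function names are hypothetical and this is not the code from the fork.

```python
import re

# Matches lines such as "- hosts: '%prod.xyz-service'" or "  hosts: web".
HOSTS_LINE = re.compile(r"^\s*-?\s*hosts:\s*(.+?)\s*$")


def patterns_from_playbook_text(text):
    """Return the distinct 'hosts' values found in the text, in order."""
    patterns = []
    for line in text.splitlines():
        match = HOSTS_LINE.match(line)
        if match:
            value = match.group(1).strip("'\"")
            if value not in patterns:
                patterns.append(value)
    return patterns


def patterns_from_playbooks(paths):
    """Read each playbook file and merge the patterns found across all of them."""
    patterns = []
    for path in paths:
        with open(path) as handle:
            for pattern in patterns_from_playbook_text(handle.read()):
                if pattern not in patterns:
                    patterns.append(pattern)
    return patterns
```

The resulting list is what the inventory script would then look up, one query per pattern.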

So my question is – is anyone using some method of scoping their inventory scripts with success? I have a fork with the changes described above (note the ‘-r’), but I get a real “icky” feeling when using tools with hacks, and I’d love to use vanilla ansible without hacking around my problem. One suggestion in my PR was to use environment variables, which is perfectly reasonable, except that I would then have to issue 2 commands every time I wanted to invoke ansible, and it would make the host-pattern argument itself redundant. The only other option, as far as I can tell, is to use local caching in my inventory script and make one large call to my inventory backend upon first invocation. This is less than ideal because I (and the other 100+ members of my org) would then have to maintain a local cache of an already cached inventory system that is consumed by 100s of other tools. That simply shifts the hackery from ansible to the inventory system instead of having the two work in harmony.

Thanks!

“So I recently opened a PR against ansible with the intention of passing the ‘host-pattern’ command line argument to my dynamic inventory script in order to scope responses to just the set of hosts I intended to work on”

In the comments to the PR, I had replied that people usually handle this by setting environment variables, like CUSTOMER_ID=37 which just returns the inventory for customer 37.

"So my question is – is anyone using some method of scoping their inventory scripts with success? "

Lots of people. I’m not just making that up.

He’s really not making that up. Many of us do have large inventories, and we use launch-time environment variables to pare down the number of hosts returned by our plugin. In our particular case, we can limit our inventory return by data center, as we rarely ever attempt to address hosts across all data centers at once. Reducing the return to just one data center is a drastic improvement in execution time.
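
A minimal sketch of that environment-variable approach, assuming a hypothetical variable named INVENTORY_DATACENTER and a hard-coded ALL_HOSTS table standing in for the real backend (neither comes from an actual plugin):

```python
import os

# Hypothetical data; a real plugin would query its inventory backend here.
ALL_HOSTS = {
    "dc-east": ["east-host1", "east-host2"],
    "dc-west": ["west-host1", "west-host2"],
}


def scoped_inventory(environ=None):
    """Return the inventory, limited to one data center if the variable is set."""
    if environ is None:
        environ = os.environ
    wanted = environ.get("INVENTORY_DATACENTER")  # hypothetical variable name
    groups = {
        name: hosts
        for name, hosts in ALL_HOSTS.items()
        if wanted is None or name == wanted
    }
    hostvars = {host: {} for hosts in groups.values() for host in hosts}
    inventory = dict(groups)
    inventory["_meta"] = {"hostvars": hostvars}
    return inventory
```

In a POSIX shell the variable can be set for a single invocation by prefixing the command, for example: INVENTORY_DATACENTER=dc-west ansible dc-west -m ping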

-jlk

Yeah, of course! I didn’t think anybody was making anything up, hah. As the subject line states, I’m merely looking for best practices here :)
The inventory system that I use (outside of ansible) is by nature scoped to individual clusters. If I were to run ansible with an environment or datacenter env var, it would still be far too much data and would take far too many queries to the inventory backend to return the results in an easily consumed format. Every action we take is typically scoped to the format “{environment + datacenter}.{service}.{cluster # of said service}”, and the query format for the inventory system uses that same convention (something like %west.nginx.1). Querying all of %west would return thousands of hosts in the datacenter, with 3-4 levels of duplication due to colocated services.

I think my best course of action to use ansible without any modification would be to write a wrapper around the ansible and ansible-playbook commands to do basically exactly what my fork does (send the ‘pattern’ var as an argument to my dynamic inventory script). Perfectly reasonable course of action, imo.
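
One way such a wrapper could look, as a sketch only: since stock ansible invokes the inventory script with nothing but --list, this version hands the pattern over through an environment variable that the script would read from os.environ. The variable name INVENTORY_PATTERN and the assumption that the host pattern is always the first argument are both hypothetical.

```python
import os
import subprocess

PATTERN_ENV_VAR = "INVENTORY_PATTERN"  # hypothetical variable name


def build_invocation(argv, environ):
    """Return (command, env) for running ansible with the pattern exported.

    Assumption: the host pattern is the first argument, as in
    `ansible %west.nginx.1 -m ping`.
    """
    if not argv or argv[0].startswith("-"):
        raise ValueError("expected the host pattern as the first argument")
    env = dict(environ)
    env[PATTERN_ENV_VAR] = argv[0]
    return ["ansible"] + list(argv), env


def run(argv):
    """Run ansible with the pattern exported and return its exit status."""
    command, env = build_invocation(argv, os.environ)
    return subprocess.call(command, env=env)
```

Calling run(["%west.nginx.1", "-m", "ping"]) would then behave like a single ansible command while still scoping the inventory lookup.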

Thanks for the input all!

This is what I ended up doing. So far it’s working super well; I just wanted to add it to the thread in case others stumble upon it with the same issue:
https://github.com/sixninetynine/ansible-wrapper/blob/master/ansible-wrapper.py