Using inventory variables from external inventory script to specify which other variables to use (like hiera)

Our current configuration management system stores hostnames and attributes for each of our systems, including things like “project” and “site”. I’d like to use these attributes when determining which values to provision a system with using ansible, just like Puppet does with hiera.

Right now I have an external inventory script working pretty well. Here’s an example output:

{
    "generic": ["swrgen1.example.com", "swrgen2.example.com"],
    "_meta": {
        "hostvars": {
            "swrgen1.example.com": {
                "site": "california"
            },
            "swrgen2.example.com": {
                "site": "newyork"
            }
        }
    }
}
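For anyone following along, a script that emits output of that shape could look roughly like the sketch below. The HOSTS dict is a hypothetical stand-in for a real query against the configuration management system, and build_inventory is a name made up for illustration. The only parts that come from Ansible itself are the --list / --host calling convention and the _meta.hostvars key.

```python
import json
import sys

# Hypothetical stand-in for a lookup against the configuration management system.
HOSTS = {
    "swrgen1.example.com": {"group": "generic", "site": "california"},
    "swrgen2.example.com": {"group": "generic", "site": "newyork"},
}


def build_inventory(hosts):
    """Return a dict in the JSON shape Ansible expects from --list."""
    inventory = {"_meta": {"hostvars": {}}}
    for name, attrs in sorted(hosts.items()):
        # Group name -> list of hostnames.
        inventory.setdefault(attrs["group"], []).append(name)
        # Per-host variables go under _meta.hostvars.
        inventory["_meta"]["hostvars"][name] = {"site": attrs["site"]}
    return inventory


def main(argv):
    # Ansible calls the script with --list. Because _meta.hostvars is
    # populated, an empty dict is enough for --host <name>.
    if len(argv) > 1 and argv[1] == "--host":
        print(json.dumps({}))
    else:
        print(json.dumps(build_inventory(HOSTS), indent=2))


if __name__ == "__main__":
    main(sys.argv)
```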

What I’d like is for Ansible to override default variables when a site- or project-based file exists that contains the variable. Right now, I have the following setup working:

example_playbook.yml

I played around with 'flat file' variables and a dynamic inventory a while back;
group variables seemed to override the inventory level ones.

YMMV, check the precedence levels for vars for the version of Ansible
you're running.

Yeah, I’m currently playing around with exactly these things. I guess I’m looking for some best-practice guidance for how to handle this situation. I see a couple of ways people are handling these kinds of things…but none seem to be very elegant.

I’m trying to talk my team into using Ansible over Puppet because it, initially, seemed a lot easier to get started with. Now that I’m digging deep into this, though, Puppet is much easier when it comes to managing hierarchical data (via hiera).
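One option for getting hiera-style layering is to do the merge inside the inventory script, so that each host's hostvars already carry the resolved values. The sketch below is only an illustration of that idea. DEFAULTS, SITE_VARS, PROJECT_VARS, the variable names in them, and layered_vars are all hypothetical; in practice the layers would presumably be loaded from per-site and per-project files.

```python
# Hypothetical layers; a real script would load these from files.
DEFAULTS = {"ntp_server": "ntp.example.com", "log_level": "info"}
SITE_VARS = {"california": {"ntp_server": "ntp.ca.example.com"}}
PROJECT_VARS = {"billing": {"log_level": "debug"}}


def layered_vars(attrs):
    """Resolve variables for one host from its "site" and "project" attributes.

    Layers are applied from least to most specific, so later layers win,
    much like reading a hiera hierarchy from the bottom up.
    """
    merged = dict(DEFAULTS)
    merged.update(SITE_VARS.get(attrs.get("site"), {}))
    merged.update(PROJECT_VARS.get(attrs.get("project"), {}))
    # Keep the raw attributes so plays can still see "site" and "project".
    merged.update(attrs)
    return merged
```

The result of layered_vars(...) for each host could then be placed under _meta.hostvars in the script's --list output. A host whose site or project has no entry simply falls through to the defaults.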

I don't think there's a One True Way, it really boils down to the team
you're working with and what makes sense for them given the mix of
environments they're managing.

Some people use one inventory for ALL their servers; if you need cross-site
plays, that might be necessary. If you're lucky enough to not need that...

Personally I'm a huge fan of one inventory per site. I think I posted
it earlier, but the gist is this layout:

.
├── california
│ ├── group_vars
│ │ ├── all
│ │ ├── group1
│ │ └── group2
│ └── hosts
├── newyork
│ ├── group_vars
│ │ ├── all
│ │ ├── group1
│ │ └── group2
│ └── hosts
└── site.yml

(other ad hoc playbooks etc. up at this level too)

Each playbook run just needs a '-i environment_name' and you're set.

It works great if you run isolated environments (prod / dev / staging)
that you want to keep broadly consistent, but that are likely to differ
in number of servers, types of users with SSH access,
levels of encryption, etc.

When it comes to setting vars, my general rule of thumb is to use
as few 'tiers' of variables as possible; otherwise it gets messy and it's
hard to find where they're set. There are about a dozen places to set vars
in Ansible, and you don't need to use them all. Generally, I go with:

1. environment-wide globals in $inventory/group_vars/all
    (I used to set them inline in static inventories, but group_vars
    supports YAML syntax and it plays nicer with dynamic inventories)

2. roles have 'sane' defaults in $rolename/defaults/main.yml.
    Logfile paths, service account names, etc.
    If no sane default exists, that's where it is documented
    (via a commented out example var, with a suggestion where to set this).
    This is essentially the 'README' for the role.

3. $inventory/group_vars/groupname - relevant to the roles those
    groups are performing. Easy to 'cat' to see how e.g. preprod's
    mongodb is set up.

4. host_vars/hostname is a Last Resort, usually on brownfield
    sites to track inconsistencies.
    "ls */host_vars/*" returns a handy list of servers we probably
    need to rebuild.
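To make the interaction of those four tiers concrete, here is a small model of which value wins. It assumes the ordering documented for Ansible 2.x (role defaults lowest, then group_vars/all, then group_vars/groupname, then host_vars), so check it against the version you run. The variable names and values are invented for illustration, and effective_vars is a hypothetical helper, not anything Ansible provides.

```python
from collections import ChainMap

# Illustrative values only.
role_defaults = {"mongodb_port": 27017, "mongodb_logpath": "/var/log/mongodb"}
group_vars_all = {"ntp_server": "ntp.example.com"}
group_vars_group = {"mongodb_port": 27018}
host_vars = {"mongodb_logpath": "/data/legacy/log"}


def effective_vars(*tiers_high_to_low):
    """Flatten the tiers into one dict; earlier arguments take priority."""
    return dict(ChainMap(*tiers_high_to_low))


# Highest-priority tier first, role defaults last.
resolved = effective_vars(host_vars, group_vars_group, group_vars_all, role_defaults)
```

With fewer tiers in play, there are fewer places to look when a value in resolved is not what you expected.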

In theory, this approach should mean diffing two inventory directories
gives a good idea of what's different between them.
I've built pretty complex stacks with this approach from a single
site.yml and it can work really well.
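A plain "diff -r california newyork" does the job, but if you want the result in a script, something along these lines should work. This is a sketch using the standard library's filecmp.dircmp; diff_inventories is a hypothetical helper name, and the directory names in the usage note are the ones from the layout above.

```python
import filecmp
import os


def diff_inventories(left, right):
    """Return relative paths that differ or exist on only one side."""
    report = {"only_left": [], "only_right": [], "changed": []}

    def walk(node, prefix):
        report["only_left"] += [os.path.join(prefix, n) for n in node.left_only]
        report["only_right"] += [os.path.join(prefix, n) for n in node.right_only]
        # dircmp compares files shallowly (by os.stat signature first),
        # falling back to file contents when the signatures differ.
        report["changed"] += [os.path.join(prefix, n) for n in node.diff_files]
        for name, sub in node.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right), "")
    return report
```

Calling diff_inventories("california", "newyork") from the top-level directory would list which group_vars files differ between the two sites and which files or directories, such as a stray host_vars, exist in only one of them.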