Adding vars_files to include

Hello list,

I’d like to add the ability to specify vars_files statements on “include”, so we could use vars_files that depend on $item (with the understanding that, since includes are “unrolled” when the task list is built, I would not get access to host facts in the filename).
This doesn’t look too hard if I tweak _update_vars_files_for_host to tell it which dictionary to update (an optional additional argument to the function, defaulting to self.vars, with task_vars passed instead when called from the task).
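To make the idea concrete, here is a rough sketch of what the proposed syntax might look like; this is not implemented, and the file layout and variable names are hypothetical:

- include: tasks/deploy_app.yml
  vars_files:
    - vars/${item}.yml
  with_items: $app_list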

This is to do something like my second idea in this post: https://groups.google.com/forum/#!topic/ansible-project/Ic3IIW3vBXs

Do you think that’s a good idea? Is it something we could merge (once written, and post-1.1 I guess)?

This could actually probably be made to work for all tasks and use host facts in the filenames, but that would require more in-depth changes (i.e. adding a vars_files member to tasks that would be resolved at run time, though that could turn the host cache into a mess if people use it badly).

I still am having trouble understanding this.

If you include a playbook file it can have its own vars_files inside
the playbook.
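For example, a minimal included playbook carrying its own vars_files might look like this (file, group and package names are illustrative):

# webservers.yml -- an included playbook with its own vars_files
- hosts: webservers
  vars_files:
    - vars/common.yml
  tasks:
    - action: yum pkg=httpd state=installed

# and in the top-level playbook:
- include: webservers.yml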

Ultimately I tend to shy away from concepts I find difficult to
explain or talk about -- and this seems to be one of those things --
and instead seek the best possible ways to express them.

It always helps to go back to use cases and talk about those things
first, rather than talking about syntax.

I think a better approach is to find the ideal syntax to handle a
problem/question -- or the way the software should work -- and then
make it work that way.

(It remains true that if you have things that need loops inside task
includes, you need to loop over those variables inside the task
includes)

So, the question is really this...

what is the use case you are trying to achieve, in terms of what you
are installing and setting up?

The use case is what I described in here: https://groups.google.com/forum/#!topic/ansible-project/Ic3IIW3vBXs

Basically, I need to have “clusters” of “applications”, several of which can be deployed on the same host.
The applications have settings which can be set at the cluster level, but overridden at the application level (kind of like we have with group_vars/host_vars).

When the play executes on a host, I need it to handle each application linked to that host, with each application getting its settings either from its cluster defaults or its application-level overrides.
In my application playbook, I want to use simple variables like username, not “clusterid.applicationid.username if it exists, otherwise clusterid.username if it exists, otherwise default username”.

So, I can’t define them in host_vars, as application2 settings would override application1 settings.
I can’t use merged nested hashes in group_vars either, because “application level” settings would not be at the same depth as “cluster level” settings.
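To make the layering concrete, the data I have in mind looks roughly like this (file names and variables are purely illustrative):

# cluster1/cluster.yml -- cluster-level default
username: cluster1_user

# cluster1/app1.yml -- per-application override
username: app1_user

# cluster1/app2.yml -- no override, so app2 should fall back to cluster1_user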

Right now, vars_files variables are host-scoped, not task-scoped, unless I missed something, so that doesn’t solve my issue.
What I was proposing was to make at least a subset of its functionality available as task-scoped when using includes (like we have with “vars” currently).

You’re wanting to use the same variable, just have it return different values based on the scope?

Yes, I want to use the same variable, which will be different for application1 and application2 on the same host, and can be specified at either the application or the cluster level.

I am not sure about variable overriding; I am assuming that if you redefine a variable, it overrides its previous definition. If that is the case, would something like this work for you:

vars_files:
   - ../blah/cluster.vars.yaml
   - ../blah/${inventory_hostname}.vars.yaml

We use something similar to have variables defined per environment.

Yes, group_vars/, host_vars/ and so on all exist here, as do using
variables in vars_files. Those all work.

The only complexity is that when using an include + with_items, you
can't use a variable that is group- or host-scoped, because task
includes create task objects, which are run (with different variables)
for each host in the group. The with_items here doesn't set variables,
it runs the include multiple times. Since this happens at a higher
level, it can only be interpreted one way -- different hosts in the
same play cannot get different numbers of tasks.

However, if you can codify your hosts' purposes by group, having
different plays that each do include + with_items is reasonable.
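In playbook terms, that would look roughly like one play per group, each doing its own include + with_items (group names, file names and variables are illustrative):

- hosts: cluster1
  tasks:
    - include: tasks/deploy_app.yml
      with_items: $cluster1_apps

- hosts: cluster2
  tasks:
    - include: tasks/deploy_app.yml
      with_items: $cluster2_apps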

This is why I want to talk more about the use case without talking
about ansible syntax at all.

Yes, the complexity here is that I’m working from an “application group” (cluster) point of view; I don’t really care which host they’re deployed on until the deployment has to happen.
But I need to translate this to the “host-centric” point of view for it to work in Ansible (or Chef or Puppet; they’re all made to do stuff on a host, so it makes sense).

This is why I’m inclined to work with include + with_items, looping on the different clusters, then at another level on the different application instances for the corresponding cluster.

The difficulty is that my variables (username, to keep the same example) are neither host-specific nor group-specific; they are application-specific, with a default in the cluster configuration.

When I’m running the playbook to deploy the application, what I really care about is not the host, it’s which application I am deploying, and in which cluster.
It’s for that part that I thought task-scoped (actually item-scoped) vars_files would be useful, as I could just reference files corresponding to my cluster / application instance.
We already have this for “vars” in 1.1; the only difference is when people expect host/group variables or facts to be available (but this is still useful, it just needs a box in the include documentation that says “Since this is pre-processed when building the task list, you cannot use any host- or group-specific variables for this”).
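For comparison, the item-scoped “vars” behaviour that 1.1 already has on includes looks roughly like this; the proposal is essentially the same mechanism for vars_files (names are illustrative):

- include: tasks/deploy_app.yml
  vars:
    appid: $item
  with_items: $app_list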

Sidenote: I know I would need a “this application instance is deployed on this host” mapping for this, and that each host would get a task list with all instances and a check “am I on the right host?”. This is OK for me to start with, and I could probably add some pre-processing later on to reduce the task list.

Ansible isn't host centric, it does things to groups of hosts.

Putting a host in more than one group if it performs more than one
role is the best possible way to assign multiple behaviors to that host.

Key here is "the goal of playbooks is to map groups of hosts to the
roles they perform".

A good playbook should be pretty minimal, and mostly rely on task includes.

If you do this, you shouldn't really need the parameterization; your
host would just get talked at by multiple plays.

Yes, but the “group/role” paradigm applies to a group of hosts (which is why I say it’s host-centric).

To simplify my example even more, let’s say we have a “cluster1” group which will encompass “app1” and “app2” (two instances of the same kind of application that we want to deploy with the same playbook), both deployed on the same host.
At some point, when we deploy each instance of “app”, the playbook has to get variables that are specific to that instance or its parent cluster. The variables are neither host-specific nor group-of-hosts-specific.

  • loop through all clusters (include + with_items), giving all subtasks the cluster-specific variables
  • nested loop through all apps for each cluster (another include + with_items), giving all subtasks the app-specific variables
  • on each host, if this is the host specified for this instance of app, play a playbook with a merge of cluster and app variables (sketched below, after this list)
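Spelled out as playbook structure, the idea would be roughly the following; file names, variable names, and where the per-cluster/per-app lists come from are all hypothetical, and only_if is just one way to express the host check:

# top-level play
- hosts: all
  tasks:
    - include: tasks/per_cluster.yml
      vars:
        clusterid: $item
      with_items: $clusters

# tasks/per_cluster.yml -- loop over the app instances of this cluster
- include: tasks/deploy_app.yml
  vars:
    appid: $item
  with_items: $apps_in_cluster

# tasks/deploy_app.yml -- only act if this host is the one mapped to this app instance
- action: template src=app.conf.j2 dest=/etc/${appid}.conf
  only_if: "'$app_host' == '$inventory_hostname'"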

For a huge number of app instances, this would be very inefficient, since on every host we’d have tasks for every cluster/app combination, each with a conditional on the host (which could probably be improved by pre-filtering the task list for each host when the conditionals can be resolved “upstream”, but that’s another story). But with task/include-scoped vars_files, that would have been very easy to configure.

The only way I found to resolve this so far was to unroll everything: for each element of my cluster+app matrix, I do an include of the application playbook with clusterid, appid and host parameters.
This is also pretty inefficient (the tasks are not grouped by host at all, and each play opens a new connection) and not really maintainable: I pretty much have to generate my top-level playbook and all its includes+parameters from external files describing my cluster/app structures.
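Concretely, the generated top-level playbook ends up looking something like this, with one include per element of the matrix (names are illustrative):

- include: app_playbook.yml clusterid=cluster1 appid=app1 app_host=host1
- include: app_playbook.yml clusterid=cluster1 appid=app2 app_host=host1
- include: app_playbook.yml clusterid=cluster2 appid=app1 app_host=host2
# ... and so on for every cluster/app combination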

I guess the meta-issues here are:

  • do we want to have more complex nested loops (i.e. with variables that can come from “external” files, and that are scoped to the specific loop)
  • do we want to have optimizations based on resolving conditionals master-side when possible (i.e. for a play, the list of tasks can be different for every host)

IIRC this is already fine due to the way with_items flattens:

vars:
  packages:
    - $other_package_list
    - $yet_other_package_list

- action: yum pkg=$item state=installed
  with_items: $packages

with_nested: is also available, but is different.

"each play opens a new connection"

you should also be using -c ssh with ControlPersist.
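For reference, ControlPersist is configured on the OpenSSH side, for example in ~/.ssh/config (the values below are just a reasonable starting point):

Host *
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m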