Sets arithmetic, variable precedence

Hello list,

how would you approach the following problem, which I have
simplified for the purpose of this list?

Let's assume you want Ansible to configure NTP clients on your
hosts.

Let's assume furthermore that your infrastructure consists of
a heterogeneous set of Debian and Fedora nodes, some running as
VServers, others as KVM instances. VServers cannot set the time and
date, so they cannot be NTP clients.

Finally, imagine that your hosts are situated either in the office,
or in data centres in Munich and Zurich, and you want to parametrise
the NTP server to use by location. Since there is no NTP server near
the office, you just want to use the distribution defaults
(debian|fedora).pool.ntp.org.

Therefore, the following group vars get defined for the
distribution groups:

  debiannodes::ntp_server = debian.pool.ntp.org
  fedoranodes::ntp_server = fedora.pool.ntp.org

I would like the location-based variables to take precedence
over the distro-based ones. This means, however, that I have to make
the locations subgroups of the distro groups, which requires me
to create four new groups and to duplicate the information between
them (I am assuming multiple inheritance is not possible):

  debiannodes@zurich::ntp_server = ch.pool.ntp.org
  debiannodes@munich::ntp_server = de.pool.ntp.org
  fedoranodes@zurich::ntp_server = ch.pool.ntp.org
  fedoranodes@munich::ntp_server = de.pool.ntp.org

Is there any other way to approach this? I would really prefer not
to have to maintain redundancy in the inventory or the playbooks.
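
To make the desired precedence concrete, here is a minimal Python
sketch (not Ansible code). It assumes standalone location groups named
"zurich", "munich" and "office", which are hypothetical names, and
merges group variables in order, so that later groups override earlier
ones:

```python
# Group variables; the distro values come from the example above, the
# standalone location group names are assumptions for illustration.
GROUP_VARS = {
    "debiannodes": {"ntp_server": "debian.pool.ntp.org"},
    "fedoranodes": {"ntp_server": "fedora.pool.ntp.org"},
    "zurich": {"ntp_server": "ch.pool.ntp.org"},
    "munich": {"ntp_server": "de.pool.ntp.org"},
    "office": {},  # no override: fall back to the distro default
}


def effective_vars(groups_in_precedence_order):
    """Merge group variables; groups listed later take precedence."""
    merged = {}
    for group in groups_in_precedence_order:
        merged.update(GROUP_VARS[group])
    return merged


print(effective_vars(["debiannodes", "zurich"]))
print(effective_vars(["fedoranodes", "office"]))
```

With such an ordering, two distro groups plus three location groups
would suffice, instead of one group per distro/location combination.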

Also, am I right in assuming that the best way to handle VServers is
to create yet another group for all VServers, and then to use
'only_if' in the NTP-client playbook, e.g.

Hi,

Why don’t you introduce new groups, e.g.:

  • hosts (for all hardware nodes)
  • vservers (for all VServer containers)
  • kvms (for all KVM instances)

?

And then you can have plays that target specific groups and install things just for those groups (like NTP for hosts and kvms).
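
As a rough illustration of the set arithmetic involved, here is a small Python sketch with made-up host names. In Ansible itself, host patterns such as `hosts:kvms` (union) or `all:!vservers` (exclusion) should express the same selection.

```python
# Hypothetical inventory groups; the host names are invented for illustration.
hosts = {"metal1", "metal2"}   # hardware nodes
vservers = {"vs1", "vs2"}      # VServer containers
kvms = {"kvm1", "kvm2"}        # KVM instances

all_nodes = hosts | vservers | kvms

# NTP targets as the union of the groups that can set their own clock ...
ntp_targets = hosts | kvms

# ... which, with these three groups, equals everything except the VServers.
assert ntp_targets == all_nodes - vservers

print(sorted(ntp_targets))
```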

Greetings,
gw

also sprach GW <gw.2013@tnode.com> [2013.06.12.1345 +0200]:

  - hosts (for all hardware nodes)
  - vservers (for all VServer containers)
  - kvms (for all KVM instances)

And then you can have plays that target specific groups and install
things just for those groups (like NTP for hosts and kvms).

Sure, but if you spin this further, it'll result in a massive number
of groups and become unmaintainable, I think.

Following up to my own post, I would like to talk about "reclass",
which is an "external node classifier" that I wrote many years ago
for Puppet,¹ have since ported to Salt,² and am now also considering
for Ansible, because I think it would address some of the issues
I am facing with Ansible.

The purpose of this message is to find those who think similarly to
me, and who may have already figured out Ansible-native ways to do
what I want/need to do. Quite frankly, I'd love to use pure Ansible,
rather than porting "reclass" over.

Here are a number of paradigms/aspects of Ansible. If any of these
ring a bell in relation to problems you are having with Ansible,
then I think you should read on and learn about "reclass":

  - The inventory is a list of groups of hosts, and hosts may appear
    in more than one group. The inventory is not a group of hosts
    with associated groups.

  - Groups can be nested, but a group can only ever have one parent
    (no multiple inheritance);

  - Variable precedence depends on group inheritance: host variables
    override those variables set in subgroups, which override those
    variables set in parent groups;

  - If a host belongs to two groups that are not in the same
    inheritance graph, and both groups define different values for
    the same variable, it is undefined which value the variable will
    have for the given host;

  - A playbook hard-codes the group of hosts to which it will be
    applied. While the set may be reduced using --limit, it is not
    possible to apply a playbook to a host not in the target set
    (without changing the set);

Let's illustrate the way reclass would work (if I ported it to
Ansible), using the NTP client example.

Let's assume:

  1. we have Debian and Fedora hosts, some of which run as VServers
     (which don't allow setting date/time, so NTP makes no sense).
     The distribution defaults for NTP servers are
     (debian|fedora).pool.ntp.org and we like that;

  2. our nodes are either in the office basement, or in datacentres
     in Munich or Zurich. At the office, we are fine with the
     distribution defaults, except the host "red" should use
     "example.org". In the datacentres, we would like to use
     (de|ch).pool.ntp.org instead. Yes, this is maybe a bit
     contrived, but it'll do as an example.

Reclass works based on the concepts of nodes and roles, but since
"role" is already taken within Ansible, let's use "class" instead.
Conceptually, classes are similar to Ansible host groups, but
I think it's better to keep them separate for now.

Fundamentally, nodes and classes use identical data structures with
three fields — classes, playbooks, and variables. Their difference
is only in the way they are processed by "reclass".

A node usually only defines a set of classes, and optionally
variables (host_vars), but may also define playbooks directly, e.g.
for quick tests or single-host exceptions. A node is identified by
its FQDN (inventory_hostname).

A class defines other classes from which it inherits, playbooks that
apply to this class, and variables (group_vars). A class may also
just define variables to override other classes, as we shall see
shortly.

Therefore, reclass deals with tree graphs rooted at a node (a host),
while all other vertices in that graph are classes. The further away
a class vertex is from the root node, the more generic the class is
said to be, e.g.

                   node
                 /  |  \ `--- backuppc.client
    debian@wheezy   |   `--- postfix.satellite
         /          |
    debian       munich
     /  \
  unix   molly-guard
  /
ntpclient
For each node, reclass performs a depth-first walk of all the
classes (in order of their definition), and their parent classes.
When it reaches a leaf (a class), it reads the list of playbooks and
stores the variables defined. As it ascends back up the tree, it
merges the set of playbooks and the set of variables.

Put differently: the last class to set a variable takes precedence
over earlier classes, and the node has the final say. In the above
graph, if all classes defined $foo, then the final value of $foo
would be whatever backuppc.client set, unless overridden by the node
itself. If both the 'debian' and the 'munich' class defined an NTP
server, the server from the 'munich' class would win.

Let's introduce six hosts for our example.

  black   Debian  Munich
  yellow  Debian  Munich  VServer
  blue    Debian  Zurich
  white   Fedora  Zurich
  red     Debian  Office  ntp_server: example.org
  green   Fedora  Office

Here's what the reclass data structures would look like to achieve
our NTP client goal (not as complex as the graph above). First, the
classes:

  ---                                  ---
  name: unixnode                       name: vservers
  playbooks:                           playbooks:
  - ntp_client                         - ~ntp_client
  ---                                  ---
  name: debiannode                     name: fedoranode
  classes:                             classes:
  - unixnode                           - unixnode
  playbooks:                           playbooks:
  - apt                                - yum
  variables:                           variables:
  - ntp_server: debian.pool.ntp.org    - ntp_server: fedora.pool.ntp.org
  ---                                  ---
  name: hosted@munich                  name: hosted@zurich
  playbooks:                           playbooks:
  - motd@servus                        - motd@gruezi    # whatever …
  variables:                           variables:
  - ntp_server: de.pool.ntp.org        - ntp_server: ch.pool.ntp.org

And now the nodes:

  ---                                  ---
  name: black                          name: yellow
  classes:                             classes:
  - debiannode                         - debiannode
  - hosted@munich                      - hosted@munich
                                       - vservers
  ---                                  ---
  name: blue                           name: white
  classes:                             classes:
  - debiannode                         - fedoranode
  - hosted@zurich                      - hosted@zurich
  ---                                  ---
  name: red                            name: green
  classes:                             classes:
  - debiannode                         - fedoranode
  variables:
  - ntp_server: example.org

Since the 'hosted@' classes appear later in the classes list for
each node, their variables take precedence, and so the NTP servers
are set according to location, unless there are no 'hosted@'
classes, when the values from the distro-classes are used. In the
case of "red", the host-specific variable definition trumps
everything else.

And since the 'vservers' class negates the ntp_client playbook, it
causes that playbook to be removed from the list of playbooks
for the host "yellow", because the class is listed after 'debiannode',
which pulls in 'unixnode'.
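
For anyone who wants to play with the example, here is a Python sketch
that resolves the six hosts from the class and node definitions above.
It is a guess at the semantics, not reclass itself: in particular, it
assumes that a "~" entry removes a playbook collected earlier in the
walk, and the function names are invented for this illustration.

```python
CLASSES = {
    "unixnode": {"playbooks": ["ntp_client"]},
    "vservers": {"playbooks": ["~ntp_client"]},
    "debiannode": {"classes": ["unixnode"], "playbooks": ["apt"],
                   "variables": {"ntp_server": "debian.pool.ntp.org"}},
    "fedoranode": {"classes": ["unixnode"], "playbooks": ["yum"],
                   "variables": {"ntp_server": "fedora.pool.ntp.org"}},
    "hosted@munich": {"playbooks": ["motd@servus"],
                      "variables": {"ntp_server": "de.pool.ntp.org"}},
    "hosted@zurich": {"playbooks": ["motd@gruezi"],
                      "variables": {"ntp_server": "ch.pool.ntp.org"}},
}

NODES = {
    "black": {"classes": ["debiannode", "hosted@munich"]},
    "yellow": {"classes": ["debiannode", "hosted@munich", "vservers"]},
    "blue": {"classes": ["debiannode", "hosted@zurich"]},
    "white": {"classes": ["fedoranode", "hosted@zurich"]},
    "red": {"classes": ["debiannode"],
            "variables": {"ntp_server": "example.org"}},
    "green": {"classes": ["fedoranode"]},
}


def collect(entity):
    """Depth-first walk returning raw playbook entries and merged variables."""
    entries, variables = [], {}
    for name in entity.get("classes", []):
        parent_entries, parent_vars = collect(CLASSES[name])
        entries.extend(parent_entries)
        variables.update(parent_vars)
    entries.extend(entity.get("playbooks", []))
    variables.update(entity.get("variables", {}))
    return entries, variables


def resolve(host):
    """Apply '~' negations in walk order (assumed semantics) for one host."""
    entries, variables = collect(NODES[host])
    playbooks = []
    for entry in entries:
        if entry.startswith("~"):
            playbooks = [p for p in playbooks if p != entry[1:]]
        elif entry not in playbooks:
            playbooks.append(entry)
    return playbooks, variables


for host in NODES:
    playbooks, variables = resolve(host)
    print(host, playbooks, variables["ntp_server"])
```

Negations are kept as raw entries until the walk is complete, because
applying them inside the 'vservers' class alone would have nothing to
remove.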

* * *

At this point, I would be really curious to hear from people who
think alike, and how they do it.

Or if you think that I am completely down the wrong track, tell me
why.

As I said before, I have better things to do than to port reclass,
but I will port it to Ansible if I cannot find another way to
achieve the paradigm I am striving for.

If I were to port this to Ansible, then it would be an external
inventory script. It would put hosts into groups that correspond to
playbooks, e.g.

  "ntp_client_hosts" : ['blue','red','green','white','black']
  "apt_hosts"        : ['blue','red','black','yellow']
  "yum_hosts"        : ['green','white']

Finally, I would create a playbook for each of these groups,
targeting that group, and also a site playbook that combines them
all.
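
A rough Python sketch of what such an inventory script might emit
follows. The per-host playbook lists are hard-coded here as a stand-in
for the output of the class resolution (motd playbooks omitted for
brevity), and the "<playbook>_hosts" naming is just an assumed
convention. Ansible's external inventory scripts print a JSON object
mapping group names to host lists when called with --list; argument
handling is left out of this sketch.

```python
import json

# Hypothetical, already-resolved playbook lists per host.
HOST_PLAYBOOKS = {
    "black": ["ntp_client", "apt"],
    "yellow": ["apt"],
    "blue": ["ntp_client", "apt"],
    "white": ["ntp_client", "yum"],
    "red": ["ntp_client", "apt"],
    "green": ["ntp_client", "yum"],
}


def build_inventory(host_playbooks):
    """Invert host -> playbooks into '<playbook>_hosts' -> sorted host list."""
    groups = {}
    for host in sorted(host_playbooks):
        for playbook in host_playbooks[host]:
            groups.setdefault(playbook + "_hosts", []).append(host)
    return groups


# What the script would print in response to --list.
print(json.dumps(build_inventory(HOST_PLAYBOOKS), indent=2, sort_keys=True))
```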

Thoughts?

-martin

Footnotes:
(¹) http://projects.puppetlabs.com/projects/hiera/ seems like it
    grew out of the ideas behind reclass, but I had left Puppet
    before Hiera was started, so I don't know for sure.
(²) https://github.com/madduck/salt-reclass/blob/master/README

I use a play ‘dynamic_groups.yml’ which creates VM-based, package-manager-based and location-based (from standard hostnames) groups. Yes, you get tons of groups, but you don’t need to maintain the lists. I have an empty [newyork] group that this play populates via group_by, and then I set the NTP server in group_vars/newyork.

This is a bit messy, and I want to change to a dynamic_groups_inventory.py, which should make it much cleaner (dynamic_groups always shows changed).

also sprach Brian Coca <briancoca@gmail.com> [2013.06.12.1749 +0200]:

I use a play 'dynamic_groups.yml' which creates vm based, pkg manager based
and location based (from standard hostnames) groups, yes you get tons of
groups but you don't need to maintain the lists. I have empty [newyork]
group that this populates via group_by, then set the ntp in
group_vars/newyork.

Would you share that with us? It sounds interesting and I'd love to
look at an example.

Dear list,

I found some time today to port reclass to Ansible. Actually,
I rewrote it, but you shouldn't care.

There's an elaborate README, so head on over to
https://github.com/madduck/reclass if you are interested.
Integration with Ansible is a breeze, and problems with scope and
variable precedence are a thing of the past. ;)

Comments, ideas, patches, flames, and cookies welcome,

Cool. Thanks for sharing.

http://pastebin.com/4QTMn5ey

working on moving this to an inventory script still.

also sprach Brian Coca <briancoca@gmail.com> [2013.06.16.1750 +0200]:

http://pastebin.com/4QTMn5ey

working on moving this to an inventory script still.

Try reclass! https://github.com/madduck/reclass

Thanks, I saw your previous announcement; it was already in my queue to test.

This weekend was the first time I had some free time, and I have a long list to catch up on.