publish-subscribe in ansible

Hello,

I’m wondering what might be a good Ansible way to achieve the kind of publish-subscribe relationship Puppet is relatively good at (typically done either by using storeconfigs or, as I did in a home-rolled pushing scheme, virtual resources).

For example: a backup system might be subscribed to by various other systems which need to be backed up. A firewall might be subscribed to by applications which need open ports. A Nagios server may be subscribed to by systems needing monitoring. And so on. The publishing system needs to be configured at some phase after all the subscribers have had the opportunity to make themselves known, and to then be able to access the list of subscriptions and the (arbitrary collections of) information they contain. And it needs to be potentially run on an entirely different machine to the subscribing systems.

Reading around, there’s the “register variable” feature, but I can’t really find much information in the docs about the properties of such variables, other than the trivial example in which command output gets stored in ${mod_contents.stdout}. A second attempt to register this variable would presumably overwrite earlier attempts? And there’s no way I know of to ensure that all registrations happen before the system which acts on them is invoked.

There’s also the “notify” directive. However, that seems to be purely a notification with no content (like, what to monitor/back-up/open). All you can do is ask for some predefined action to be triggered.

Presumably this is a common thing to want to do? Comments, clues and (ideally) examples of prior art solicited…

Thanks

N

Hello,

> I'm wondering what might be a good Ansible way to achieve the kind of
> publish-subscribe relationship Puppet is relatively good at (typically done
> either by using storeconfigs or, as I did in a home-rolled pushing scheme,
> virtual resources).

I wouldn't say Puppet does event-based publish-subscribe so much as
periodic check-ins.
Nothing wrong with that, though.

Many people will trigger it from some external system, as you say,
'hand rolled pushing' and so on.

ansible-pull is essentially a masterless Puppet+git equivalent for
Ansible, but it doesn't enable much of the behavior you really want below.

> For example: a backup system might be subscribed to by various other systems
> which need to be backed up. A firewall might be subscribed to by
> applications which need open ports. A Nagios server may be subscribed to by
> systems needing monitoring. And so on. The publishing system needs to be
> configured at some phase after all the subscribers have had the opportunity
> to make themselves known, and to then be able to access the list of
> subscriptions and the (arbitrary collections of) information they contain.
> And it needs to be potentially run on an entirely different machine to the
> subscribing systems.

This is unnecessary in Ansible. Orchestration and multi-tier
management are the big power features -- in fact, this multi-node use
case being /hard/ in other tools is one of the major reasons Ansible
exists.

What you do is write plays, and talk to groups of machines in order.

Assume you have a playbook that describes a rollout or configuration
where (arbitrarily) your database servers need to know what
webservers/appservers can talk to them.

Write two plays. First talk to your webservers in the first play,
then your database servers.

facts/variables from your webservers are accessible via the
hostvars variable, i.e.

{{ hostvars["otherhost"]["some_fact_or_variable"] }}

That's actually not 100% true: variables are always available, but
facts are only available from hosts you've already talked to in the
playbook.

As such, Ansible eliminates the need for anything like storeconfigs
and difficult synchronization issues about how to wait for X if it
doesn't have all the info it needs about how to define something else.
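A minimal sketch of the two-play pattern might look like this (host and fact names are illustrative; facts like ansible_eth0 come from the setup module that runs at the start of each play):

```yaml
# Two plays in one playbook: talk to the webservers first, then the
# dbservers. The second play can read facts gathered from the first.
- hosts: webservers
  tasks:
    - name: configure the webservers
      action: command /bin/true

- hosts: dbservers
  tasks:
    - name: write an allow list that references webserver facts
      action: template src=templates/allow.j2 dest=/etc/allow.conf
      # inside allow.j2, something like:
      #   {{ hostvars["web1.example.com"]["ansible_eth0"]["ipv4"]["address"] }}
```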

> Reading around, there's the "register variable" feature, but I can't really
> find much information in the docs about the properties of such variables,
> other than the trivial example in which command output gets stored in
> ${mod_contents.stdout}. A second attempt to register this variable would
> presumably overwrite earlier attempts? And there's no way I know of to
> ensure that all registrations happen before the system which acts on them
> is invoked.

> There's also the "notify" directive. However, that seems to be purely a
> notification with no content (like, what to monitor/back-up/open). All you
> can do is ask for some predefined action to be triggered.

notify in Ansible is exactly like notify/subscribe in Puppet: it
kicks off handler tasks.
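For instance, a quick handler sketch (file paths and names are illustrative):

```yaml
# A task that reports "changed" notifies a handler by name; the handler
# runs once, at the end of the play, no matter how many tasks notified it.
- hosts: webservers
  tasks:
    - name: install apache config
      action: template src=templates/httpd.conf.j2 dest=/etc/httpd/conf/httpd.conf
      notify:
        - restart apache
  handlers:
    - name: restart apache
      action: service name=httpd state=restarted
```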

Hopefully this answers the question.

To clarify, what you want is a playbook that contains multiple plays,
like this. The role of individual plays in a playbook is to map
tasks to groups of systems. So your multi-tier rollout where you need
to talk to webservers before dbservers looks about like this:

- hosts: webservers
  tasks:
    - ...

- hosts: dbservers
  tasks:
    - ...

Hi,

> To clarify, what you want is a playbook that contains multiple plays,
> like this. The role of individual plays in a playbook is to map
> tasks to groups of systems. So your multi-tier rollout where you need
> to talk to webservers before dbservers looks about like this:
>
> - hosts: webservers
>   tasks:
>     - ...
>
> - hosts: dbservers
>   tasks:
>     - ...

Thanks, I think I can see a way to do it like you describe, but it seems fragile. Perhaps you have a better way than I'm thinking of. Might you or anyone else have a working example they could show?

My preference would be to try and define a yaml schema which grouped related bits of information together - so that adding a new system is a matter of adding a single yaml block. Whereas, if I am guessing what you mean correctly, I would need to add elements in different places all across my schema.

Here's a specific stripped-down example: granting ssh access to a backup server for various and arbitrary system users on different machines. Access requires allowing the client's IP through the firewall server, and adding its public key to the backup account's authorized_keys file (meaning, there may be more than one component and more than one host which needs to be orchestrated).

What I imagine I'd need to do in Ansible currently is to describe the backup ssh account in one place, including a list of allowed public keys. Something like this:

- hosts: backupservers
  vars:
    backups:
      pubkeys:
        - <webserver X key ...>
        - <dbserver Y key ...>
        - <reposerver Z key ...>

  tasks:
    - name: enable ssh access for backup clients
      action: authorized_key user=backup key=${item}
      with_items: ${backups.pubkeys}

On the firewall server I'd need a list of IP addresses which should be allowed:

- hosts: firewallservers
  vars:
    firewall:
      allowed_hosts:
        - <webserver X addr>
        - <dbserver Y addr>
        - <reposerver Z addr>

  tasks:
    - name: configure firewall
      action: template src=templates/iptables.j2 dest=/etc/sysconfig/iptables
      # ...implicitly uses allowed_hosts variable within template
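The templates/iptables.j2 referenced above might look something like this (a sketch; the rule format is illustrative):

```jinja
# One ACCEPT rule per allowed host, expanded once on the firewall server.
{% for host in firewall.allowed_hosts %}
-A INPUT -s {{ host }} -p tcp --dport 22 -j ACCEPT
{% endfor %}
```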

Elsewhere, I'd describe the systems which need access to this account to perform their rsync backups:

- hosts: webservers
  # ... remember to add an entry to pubkeys list and allowed_hosts list
  tasks:
    # ...

- hosts: dbservers
  # ... remember to add an entry to pubkeys list and allowed_hosts list
  tasks:
    # ...

- hosts: reposervers
  # ... remember to add an entry to pubkeys list and allowed_hosts list
  tasks:
    # ...

In each case, each new system that wants to access backup space requires an item added to an appropriate list in the first two plays. New systems need alterations to three disconnected places in the schema, probably more once you factor in other things like monitoring. With this distributed arrangement, it seems it'd be easy to forget, when one system is retired, that it also needs to be removed from all the other places it is mentioned.

Is that actually what you had in mind?

I'm not sure how one might allow grouping the information associated with webservers/dbservers/reposervers together. Thinking out loud, perhaps one could invoke an xpath-style query of the yaml schema to get items to iterate over (in which case the pubkey and allowed host info could be put in webservers.yml, dbservers.yml, etc, and still found)?

- hosts: backupservers
  vars:
    pubkeys: query("//backups/pubkey")

Or, perhaps have the webservers / dbservers/ etc. plays' variables somehow append or insert items into the relevant lists in backup-server play's variables?

- hosts: webservers
  vars:
    backups:
      pubkeys:
        - <webserver X key ...>
  tasks:
    # ...

- hosts: dbservers
  vars:
    backups:
      pubkeys:
        - <dbserver Y key ...>
  tasks:
    # ...

In either case, the point is that if a new (class of) server is added, the pubkeys and host addresses are listed in the new play for this server, not in the existing plays. And it'd be handy if you could add more complicated data structures, as well as just single values.

(This is just off the top of my head, I don't think either is currently possible in Ansible - I just want to illustrate the kind of thing I mean.)

Obviously whatever scheme is used, it'd be better to avoid ones in which typos or incomplete refactoring of variable names results in things silently breaking. I'm thinking specifically of the xpath-style query: it would only take a typo to have an empty list returned. The latter scheme might be safer, if you couldn't append data just anywhere, but only to sites marked as receivers for this kind of addition.

Cheers,

Nick

Nick wrote:

> Hi,
>
>> To clarify, what you want is a playbook that contains multiple plays,
>> like this. The role of individual plays in a playbook is to map
>> tasks to groups of systems. So your multi-tier rollout where you need
>> to talk to webservers before dbservers looks about like this:
>>
>> - hosts: webservers
>>   tasks:
>>     - ...
>>
>> - hosts: dbservers
>>   tasks:
>>     - ...
>
> Thanks, I think I can see a way to do it like you describe, but it seems
> fragile. Perhaps you have a better way than I'm thinking of. Might you
> or anyone else have a working example they could show?
>
> My preference would be to try and define a yaml schema which grouped
> related bits of information together - so that adding a new system is a
> matter of adding a single yaml block. Whereas, if I am guessing what
> you mean correctly, I would need to add elements in different places all
> across my schema.

It sounds to me like you just need to look at putting data in host_vars,
and using ${groups.<group>} and ${hostvars.<host>...} to access it, as well
as delegate_to. Your (snipped) design below is hideous at best and seems to
attempt to explicitly not use any of ansible's features.
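A sketch of that approach (assuming a "backupclients" inventory group, a "backup_pubkey" variable set in each client's host_vars file, and the newer {{ }} lookup syntax):

```yaml
# The backup-server play reaches across to each client's host_vars via
# groups and hostvars; no per-client list is duplicated in this play.
- hosts: backupservers
  tasks:
    - name: enable ssh access for each backup client
      action: authorized_key user=backup key="{{ hostvars[item]['backup_pubkey'] }}"
      with_items: "{{ groups['backupclients'] }}"
```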

Daniel

Thanks for answering. Actually, that is the point of the examples: they're not
very nice, and I want to see how other people would do it better (no examples
forthcoming, unfortunately). I also left out features like host_vars which
didn't seem to help the explanation as I understood them.

The "delegate_to" keyword is new to me - I'm pretty certain I have read about
this but must have simply not understood its use and passed it over. Even
re-reading it, it's not very apparent how it is used. For instance, the
keyword's parameter isn't described and I am left to infer what it can be from
the example. Can I use any hostname or do they need to be ones from the
inventory? Can I use groups? And what are "outage windows", anyway?

Now that I know what to search for, I can find better descriptions, such as
Michael DeHaan's original announcement here:

    https://groups.google.com/forum/?fromgroups=#!topic/ansible-project/ZrSuwsqv1m8

And Jan Piet Mens' blog post here:

    http://jpmens.net/2012/09/01/delegation-of-tasks-with-ansible/

But neither explicitly answers my questions, unfortunately.

So, anyway back to the main point: I can see how delegate_to would help with the
example of adding public keys to an authorized_keys file, since that can be
accomplished by invoking a task on the backup host once per key. Fine.
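For the record, a sketch of that (the host name "backuphost" and variable "my_backup_pubkey" are illustrative; the latter could be set per host in host_vars):

```yaml
# Each client play runs the authorized_key task, but delegated to the
# backup host, so the key lands in the backup account over there.
- hosts: webservers
  tasks:
    - name: authorize this host's key on the backup server
      action: authorized_key user=backup key=${my_backup_pubkey}
      delegate_to: backuphost
```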

But it doesn't help if I want to add *values* to be iterated over in a template
expanded *once* on another host, as in the example of the firewall config - not
unless perhaps we resort to some Puppet-style contortions and expand a
template per delegated action, and then include or otherwise stitch these
together into something usable. (I confess I dislike that approach; it's clumsy
to use and doesn't work in every case.)

Iterating over another host's host_vars variables as you suggest only works if
they're all in a well defined place known in advance. So you cannot exploit
this in the same way that delegate_to can invoke a task at arbitrary places.
Plus it's easy to break when refactoring: a mis-named or mis-placed variable on
one or more hosts would be silently ignored.

I suppose what I'm groping for is something akin to delegate_to, but rather than
physically executing some task on the target host, values are added to a list in
the other host's model (and absence of the list would be an error). This is not
necessarily a task; additions would ideally be computed prior to any task
invocations.

Is that feasible?

Cheers,

N

This series is all a bit long for me to follow, and seems to make lots
of claims that things are "not possible", when I know they are, so
I'll answer just one specific point that should correct those
assumptions:

"Iterating over another host's host_vars variables as you suggest only works if
they're all in a well defined place known in advance."

This is what facts are for. Speak to hosts in previous plays, and
their facts will be available in later plays, including places
like hostvars.
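So, sketching the backup-key example with registered facts (group names, paths, and the {{ }} lookup style are illustrative):

```yaml
# Play 1: each client registers its public key as a per-host fact.
- hosts: backupclients
  tasks:
    - name: read my public key
      action: command cat /root/.ssh/id_rsa.pub
      register: my_pubkey

# Play 2: the backup server iterates over the group, reading each
# registered fact back out of hostvars.
- hosts: backupservers
  tasks:
    - name: authorize every client key
      action: authorized_key user=backup key="{{ hostvars[item]['my_pubkey']['stdout'] }}"
      with_items: "{{ groups['backupclients'] }}"
```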