subset/limit option in play definition (for applying a play to an intersection of host groups)

First and foremost our inventory script creates *a lot of* groups based on CMDB information, including

Then we have:

  - appl (application code, points to a team in the company)
  - environment (dev, test, qa, prod)
  - securityclass (dmz, fta)
  - location (dc1, dc2)
  - hardwaretype (vmware, kvm, blade, standalone)
  - status (to-be-provisioned, provisioned, accepted, production, maintenance)

And we have some combined groups:

  - location-environment
  - securityclass-environment

This is mostly based on the need to have variables specific to any of these combinations, or on our specific need to limit runs on the command line.

What you can do then is something like:

   hosts: webservers:!redhat:!fedora

This means all the webservers except the redhat and fedora servers; in set theory, that is the complement. You can make unions too:

   hosts: debian:fedora

But what is missing is an intersection option, like (made-up syntax):

   hosts: webservers#debian

My preference is to implement union, intersection and complement from set theory and create a syntax for priority rules, etc...

E.g.:

hosts: webservers#(debian:london)

hosts: (webservers#debian):london

?
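
To see why the grouping matters, here is a small sketch using Python sets. The host names and group memberships are invented for illustration, and `#` is the made-up intersection operator from above; Python's own `&` and `|` stand in for intersection and union.

```python
# Hypothetical groups: every host name here is made up for the example.
webservers = {"web1", "web2", "web3"}
debian = {"web1", "db1"}
london = {"web2", "db1", "db2"}

# webservers#(debian:london): webservers that are in debian OR in london
grouped_right = webservers & (debian | london)

# (webservers#debian):london: debian webservers, plus every london host
grouped_left = (webservers & debian) | london

print(sorted(grouped_right))
print(sorted(grouped_left))
```

The two expressions select different hosts from the same inventory, which is why some priority rule (or explicit parentheses) would be needed.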

Speaking personally, I’d like to see as much consistency and predictability as possible.

This might be crazy or stupid, so please be gentle when telling me off, but I’d like it if you could do everything from the command line and a play file. Not sure how you would represent all the levels of a yaml file on a command line, but I keep getting confused by things that can be set in one place but not another. E.g. --connection works on the command line but not in a playbook, and hosts: works in a playbook but you use --limit on the command line.

Maybe something like

ansible-playbook --hosts webservers --connection ssh --tasks ----name wheeee ----command "echo this is nuts" an_almost_empty_playbook.yaml

It wouldn’t be all that useful in itself, but it would give us a consistency that would be really powerful (and I suspect easier to code in the long run).

Okay, you may now throw tomatoes at me.

-Dylan

This syntax runs counter to my sensibilities and will not happen.

You can generally do most of these things through feeding variables in
through --extra-vars.

This prevents bloating the command line options.

I'm generally NOT in favor of supporting this as it discourages reuse
and recording what you want to do, and doesn't make any sense when
describing multi-tier operations.

Don't dismiss based on syntax only, being able to do the above is very powerful and avoids having to make groups just to be able to set a limit. The fact we don't have the set theory symbols makes it harder to come up with an acceptable syntax that people understand out of the box.

The current syntax Ansible is using today:

     ':' means union
     ':!' means intersection (with complement)
     '!' means complement

whereas:

  - there are no priority rules (left-to-right only ?)
  - union+complement symbols means intersection of complement
    (contrary to what one would expect)
  - one cannot do everything that's useful

There are other symbols that could be used:

     '&' meaning union
     '|' meaning intersection
     '^' meaning complement

This would avoid the old convention (so we could have backward compatibility) and it might look more readable:

     hosts: (webservers|production|dmz) -> all webservers in production and in dmz
     hosts: (webservers&proxyservers)|^production -> all webservers and proxyservers that are not in production
     hosts: webservers|(debian&ubuntu) -> all webservers running debian and ubuntu
     hosts: webservers|^dbserver -> all webservers not running a database
     hosts: (blade&standalone)|(rhel&fedora) -> all physical boxes running RHEL and Fedora

Would that be more acceptable ?

This would rock, I currently use extravars to set host for most of my playbooks, this is much more elegant.

Brian Coca

I vote in favor, if anyone’s vote counts.
However, I’d like to note that ‘&’ is usually used to express ‘and’, which is closer to intersection (present in both operands), while ‘|’ is usually used to express ‘or’, which is closer to union (present in either operand).

So, I propose a modified version:

‘&’ intersection (present in both)
‘|’ union (present in either)
‘^’ complement (not present)
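
As a point of comparison, Python's built-in set operators already follow the '&' = intersection, '|' = union convention. The sketch below uses invented host names. Note that Python's `^` is symmetric difference rather than complement, so the complement is written here as a difference from the full (hypothetical) inventory.

```python
# Hypothetical inventory; all host names are made up.
all_hosts = {"web1", "web2", "db1", "proxy1"}
webservers = {"web1", "web2"}
production = {"web1", "db1"}

in_both = webservers & production         # intersection: present in both
in_either = webservers | production       # union: present in either
not_production = all_hosts - production   # complement relative to the inventory

# "all webservers that are not in production"
web_not_prod = webservers & not_production
print(sorted(web_not_prod))
```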

I also second the call for making the feature available both in playbooks and the command line. Perhaps by accepting this syntax in the --limit argument. This leaves variables. Would there be a way to set variables based on such constructs?

Note also that Dag’s proposal involves another operator: grouping using ‘( )’ to control precedence.

That Would be Real Powerful, and would solve a LOT of problems trivially.

While I agree that set theory operations sound powerful, we are
required to add things in ways that don't break existing usage, and
don't seem redundant or different. Thus if --limit has to continue
to work the way it works now, and so does hosts, I don't want another
syntax that feels completely different in ways where you have to be
able to read both. That's a mess.

When I created Ansible, I desired it to not be a programming language,
and to be maximally auditable. As such, I don't feel that having
stuff like:

hosts: (webservers|dbservers)&production

is particularly readable or something I want to encourage, especially
when existing systems of host specs are intentionally simpler.

The way this is best handled, in my opinion, is maintaining a separate
inventory file for your environments, such that inventory for
production and inventory for stage/development is kept separate, and
then it's not a function of --limit at all.

I'd prefer if we had this conversation first in terms of concrete real
world use cases, and discussed how they could be modelled, rather than
first saying "here's a language feature I want" and adding it.

Maybe I misunderstand how things work but this seems to assume you're
primarily using static inventory files. I'm trying to use ec2.py right now
as my inventory script and I want to be able to do the following:

tag_Group_webservers AND tag_environment_production
tag_Group_webservers AND NOT tag_environment_production
tag_Group_webservers AND tag_environment_production AND tag_variant_test

There's some examples where I feel it's impractical to generate dozens of
inventory files to pick and choose from. Maybe I've misunderstood exactly
how things work today but this seems difficult to do as things stand. With
auto scaling groups it's not very practical to do anything but real time
inventory discovery.

I just wanted to get some real use cases in to make sure I understand what
we're talking about here.

So if you're using external sources, you could have scripts that keep
your environments separate by using separate config files -- you
wouldn't have to follow exactly what the included EC2 example does.

That all being said my point is the syntax we have established for
hosts and limit needs to continue to work, and this means not
introducing additional newness into it without a way that you are
using the newness, and I'd like to first understand a use case where
the newness is required, so we design appropriately.

This could mean a new "hosts_set:" directive incompatible with the
latter, but I don't like if it doesn't also answer ways to use it with
the existing CLI options.

This might mean "set(...)" as some way of designating the newness,
etc, but I'd like to craft ideas first most around use cases that
*can't* be solved today -- far before we fit an implementation to it
-- and only then, if we need it.

Maybe I misunderstand how things work but this seems to assume you're
primarily using static inventory files. I'm trying to use ec2.py right now
as my inventory script and I want to be able to do the following:

tag_Group_webservers AND tag_environment_production

hosts: webservers
limit: production

tag_Group_webservers AND NOT tag_environment_production

hosts: webservers
limit: !production

tag_Group_webservers AND tag_environment_production AND tag_variant_test

I think this is the sticky one.

Currently "hosts" says, "be in one of these groups, and explicitly NOT
in any negated groups"

limit says "in addition to what hosts says, it must also be matched by
something in the limit"

It seems in this case, the proper way to do it in existing ansible
would be to have a production_test flag that selected those systems.
Maybe.
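
The hosts/limit semantics described above can be sketched with Python sets. This is only an illustration of the description in this message, not Ansible's implementation: the function names, the inventory, and the choice to treat a pattern made up solely of negations as "all hosts except these" are assumptions.

```python
def match(inventory, pattern):
    """Union of the named groups, minus any groups negated with '!'."""
    all_hosts = set().union(*inventory.values())
    included, excluded = set(), set()
    positive_seen = False
    for term in pattern.split(":"):
        if term.startswith("!"):
            excluded |= inventory.get(term[1:], set())
        else:
            positive_seen = True
            included |= inventory.get(term, set())
    if not positive_seen:
        # Assumption: a pattern with only negations starts from every host.
        included = all_hosts
    return included - excluded


def select(inventory, hosts, limit=None):
    """Hosts matched by `hosts` that are also matched by `limit`, if given."""
    selected = match(inventory, hosts)
    if limit is not None:
        selected &= match(inventory, limit)
    return selected


# Hypothetical inventory; group and host names are made up.
inventory = {
    "webservers": {"web1", "web2", "web3"},
    "production": {"web1", "db1"},
    "test": {"web1", "web2"},
}

prod_web = select(inventory, "webservers", "production")
nonprod_web = select(inventory, "webservers", "!production")
# A single limit of "production:test" is a union, so it cannot express
# "production AND test"; this is the sticky case.
sticky = select(inventory, "webservers", "production:test")
print(sorted(prod_web), sorted(nonprod_web), sorted(sticky))
```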

If one wanted to go in the direction of adding intersection to the pattern syntax, without making it too complicated, an alternative that retains the current operators and (basic) semantics could be as follows:

‘:’ = ‘add hosts in the following group to the current set’ (union, as current)
‘:!’ = ‘remove hosts in the following group from the current set’ (complement, as current)
‘:!!’ = ‘remove hosts not in the following group from the current set’ (intersection, new)

These patterns could have left-to-right precedence and be read as a simple list of “set building instructions” (i.e. not a complex set-theoretic equation). E.g.

webservers:!!debian

Can be read as:

  1. take all hosts in webservers
  2. remove hosts not in debian

I’m not sure that every possible combination can be built this way (although maybe they can?), but it certainly adds a few more options and is simpler than having a fully blown expression language.
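
A minimal sketch of this left-to-right reading, assuming the proposed (not existing) ':!!' operator and an invented inventory; the function name `build` is hypothetical:

```python
def build(inventory, pattern):
    """Apply each ':'-separated term to the running set, left to right."""
    current = set()
    for term in pattern.split(":"):
        if term.startswith("!!"):
            # proposed: keep only hosts that are also in this group
            current &= inventory.get(term[2:], set())
        elif term.startswith("!"):
            # remove hosts that are in this group
            current -= inventory.get(term[1:], set())
        else:
            # add hosts that are in this group
            current |= inventory.get(term, set())
    return current


# Hypothetical inventory; all names are made up.
inventory = {
    "webservers": {"web1", "web2"},
    "debian": {"web1", "db1"},
    "london": {"web2", "db2"},
}

debian_web = build(inventory, "webservers:!!debian")
# Reordering the terms covers at least some groupings without parentheses:
left_grouped = build(inventory, "webservers:!!debian:london")   # (webservers#debian):london
right_grouped = build(inventory, "debian:london:!!webservers")  # webservers#(debian:london)
print(sorted(debian_web), sorted(left_grouped), sorted(right_grouped))
```

Whether every combination can be reached by reordering is left open here, as in the text above.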

Whether that pattern looks like it should do that is a different question, I guess!

To add my two-cents to this one too… This could work:

hosts: webservers
limit:
  - production
  - test

I.e. in play objects limit params can be lists, which would be combined using intersection rules.

From the implementation side this is not a lot different to having to combine a CLI --limit and single play object limit param.
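
As a rough sketch of that idea, assuming a play-level `limit` list (which is only a proposal here) and an invented inventory, the list entries could be folded together by intersection:

```python
from functools import reduce


def apply_limits(selected, limits, inventory):
    """Intersect the selected hosts with every group named in `limits`."""
    return reduce(
        lambda acc, name: acc & inventory.get(name, set()),
        limits,
        set(selected),
    )


# Hypothetical inventory; all names are made up.
inventory = {
    "webservers": {"web1", "web2", "web3"},
    "production": {"web1", "db1"},
    "test": {"web1", "web2"},
}

limited = apply_limits(inventory["webservers"], ["production", "test"], inventory)
print(sorted(limited))
```

A CLI --limit could be handled the same way, as one more entry in the list.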

Michael DeHaan wrote:

So if you're using external sources, you could have scripts that keep
your environments separate by using separate config files -- you
wouldn't have to follow exactly what the included EC2 example does.

That all being said my point is the syntax we have established for
hosts and limit needs to continue to work, and this means not
introducing additional newness into it without a way that you are
using the newness, and I'd like to first understand a use case where
the newness is required, so we design appropriately.

This could mean a new "hosts_set:" directive incompatible with the
latter, but I don't like if it doesn't also answer ways to use it with
the existing CLI options.

This might mean "set(...)" as some way of designating the newness,
etc, but I'd like to craft ideas first most around use cases that
*can't* be solved today -- far before we fit an implementation to it
-- and only then, if we need it.

So when we had talked about this before, we had discussed simply extending
the hosts: (and ansible target) to accept &group to limit to perform the
intersection. E.g. webservers:!debian:&datacenter1 to limit it to webservers
not running Debian in datacenter1. This would be rather easy to implement,
looks like the hosts declaration already does, and allows expressing most of
the common scenarios.
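
One possible reading of this pattern syntax, sketched with Python sets. This is not the code from any Ansible commit; the function name, the inventory, and the choice to apply all '&' terms before all '!' terms regardless of their position are assumptions made for illustration.

```python
def resolve(inventory, pattern):
    """Union plain groups, then intersect with '&' groups, then drop '!' groups."""
    result, must_have, must_not = set(), [], []
    for term in pattern.split(":"):
        if term.startswith("&"):
            must_have.append(inventory.get(term[1:], set()))
        elif term.startswith("!"):
            must_not.append(inventory.get(term[1:], set()))
        else:
            result |= inventory.get(term, set())
    for group in must_have:
        result &= group
    for group in must_not:
        result -= group
    return result


# Hypothetical inventory; all names are made up.
inventory = {
    "webservers": {"web1", "web2", "web3"},
    "debian": {"web1"},
    "datacenter1": {"web1", "web2", "db1"},
}

targets = resolve(inventory, "webservers:!debian:&datacenter1")
print(sorted(targets))
```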

Adding a limit on the play seems odd and doesn't quite fit with everything
else.

Daniel

Daniel Hokka Zakrisson wrote:

So when we had talked about this before, we had discussed simply extending
the hosts: (and ansible target) to accept &group to limit to perform the
intersection. E.g. webservers:!debian:&datacenter1 to limit it to webservers
not running Debian in datacenter1. This would be rather easy to implement,
looks like the hosts declaration already does, and allows expressing most of
the common scenarios.

(https://github.com/dhozac/ansible/commit/246472951cb8221aeaf5fc8b5b717909ccb3575a
implements this.)

Daniel

So when we had talked about this before, we had discussed simply extending
the hosts: (and ansible target) to accept &group to limit to perform the
intersection. E.g. webservers:!debian:&datacenter1 to limit it to webservers
not running Debian in datacenter1. This would be rather easy to implement,
looks like the hosts declaration already does, and allows expressing most of
the common scenarios.

I like this syntax very much.

Great job at simplifying requirements.

--Michael

I figured you might, it's your original suggestion :wink:

Daniel

Was it?

You should not tell me these things, as I forget them quickly.

--Michael