include statement behavior, also example of an involved playbook collection

I've been doing some experimenting to figure out the exact nature of
the include syntax. Here's something that surprised me (because I was
wrong about what I assumed it was going to do):

- include: tasks/java-app.yaml
  tags:
    - update
    - rollbounce

I assumed that syntax would associate the update and rollbounce tags
with the include statement, the same way it does with action type
tasks. I guess the syntax is slightly different for include
statements. The correct syntax is:

    include: tasks/java-app.yaml tags=update,rollbounce

When I saw the "include: $file $tags" syntax initially my hopes were
that something like the following would work:

- include: tasks/java-app.yaml tags=stop
  tags: [update,rollbounce]

- include: tasks/puppet.yaml tags=run
  tags: update

- include: tasks/java-app.yaml tags=start
  tags: [update,rollbounce]

$ ansible-playbook services/production/java-app.yaml --tags=rollbounce

Expected (wishful) behavior:

The includes tagged with rollbounce are selected for evaluation.
Those are the first and last in this case. When the include is
evaluated, "tasks/java-app.yaml tags=stop" would include the
java-app.yaml file and select all tasks tagged 'stop' for evaluation.
Likewise, "tasks/java-app.yaml tags=start" would run the necessary
start-up commands.

$ ansible-playbook services/production/java-app.yaml --tags=update

Expected (wishful) behavior:

All three includes are selected for evaluation. When each is evaluated,
only the tags it specifies (or those set globally) will be selected.

$ ansible-playbook services/production/java-app.yaml

Expected (wishful) behavior:

Same as --tags=update. Because no tags are selected in the 'top level'
all three include statements are selected for evaluation.
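
To make the three runs above concrete, here is a rough Python sketch of the
top-level selection I was hoping for. It is not how Ansible behaves today;
the INCLUDES list and both function names are made up for illustration.

```python
# Hypothetical data: "tags" drives the outer selection of the include,
# "select" is the tags=... spec that would filter tasks inside the file.
INCLUDES = [
    {"file": "tasks/java-app.yaml", "select": {"stop"}, "tags": {"update", "rollbounce"}},
    {"file": "tasks/puppet.yaml", "select": {"run"}, "tags": {"update"}},
    {"file": "tasks/java-app.yaml", "select": {"start"}, "tags": {"update", "rollbounce"}},
]


def select_includes(includes, cli_tags):
    """Return the includes chosen by --tags; no tags means all of them."""
    if not cli_tags:
        return list(includes)
    return [inc for inc in includes if inc["tags"] & set(cli_tags)]


def describe(includes):
    """Render each include as 'file tags=a,b' for easy comparison."""
    return ["%s tags=%s" % (inc["file"], ",".join(sorted(inc["select"])))
            for inc in includes]


print(describe(select_includes(INCLUDES, ["rollbounce"])))  # first and last
print(describe(select_includes(INCLUDES, ["update"])))      # all three
print(describe(select_includes(INCLUDES, [])))              # all three
```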

Why would I want this behavior? It lets me group my tasks in a way
that lets me very succinctly express my goals in a play. I have a lot
of clusters with very similar release processes, so this would mean
much greater code reuse potential. And it would work without having to
break the tasks files into dozens more little files. All the tasks
related to similar actions can be stored together, and subsets from
each yaml file are selected to be run when the file is imported.

I've assembled a playbook collection that has the *theoretical*
required functionality to handle releases on my primary JBoss web
application clusters. It was designed with supporting releases in
multiple data centers (not at the same time) in mind. The playbooks in
the java directory have two run modes: tags=update and
tags=rollbounce. No tags means an update run is executed.

With a little tweaking and maybe a wrapper script for setting the
hosts parameter, I could consolidate the three into one single
playbook and manage my qa servers in the phx1 datacenter and the
stage/prod servers in my phx2 datacenter from the same playbook.

The only problem right now is how I originally misinterpreted the
behavior of the include statement (and hopefully nothing else!).

I'll probably try to write a patch which provides the described
functionality in some way (if there are no objections). I'm really
keen on how it works out for organizing files and the expressive power
it gives me in playbooks. It's like a much simpler version of the
nasty hack that I'd be required to write if I was trying to implement
all of this with 'only_if'. Though I'm not sure include statements
currently support only_if anyway.

Here's the playbook collection if anybody would like to take a look at it:

http://tbielawa.fedorapeople.org/ansible-playbooks/

I probably should re-explain parameterized includes a bit -- and
explain why the following:

    include: tasks/java-app.yml tags=foo

Doesn't do what you think it does.

When you pass things to include in the following form:

include: tasks/java-app.yaml arbitrary_parameter=foo

What you are doing is making the template parameter
${arbitrary_parameter} usable within that particular file. It allows
you to template out the included file before it is processed.
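
As a minimal sketch of that idea (not the actual Ansible code; parse_include
is a hypothetical helper), the include line can be thought of as a path
followed by key=value pairs, where "tags" is just another variable name:

```python
import shlex


def parse_include(line):
    """Split 'path key=value ...' into the path and a dict of parameters."""
    tokens = shlex.split(line)
    path, params = tokens[0], {}
    for token in tokens[1:]:
        key, _, value = token.partition("=")
        params[key] = value
    return path, params


print(parse_include("tasks/java-app.yaml arbitrary_parameter=foo"))
# "tags" gets no special treatment here, it is only a template variable
print(parse_include("tasks/java-app.yml tags=foo"))
```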

The analogy here of parameterized includes is a simplification of what
puppet calls a define -- which was really a great concept, just
semantically a bit tricky to use, and they had the mess of "classes
versus types versus defines vs parameterized classes"... and it was
kind of its own whole object-oriented programming ecosystem with
declarativeness all mixed in at the same time. This is something I
very much wanted to allow in Ansible, but I wanted it simplified, so a
discussion didn't evolve into an OO-programming-like conversation.

The use case for parameterized includes is like this:

Imagine I have a wordpress.yml file that installs an instance of
wordpress to create a blog on a server. The idea is I can reference
this same include file multiple times for different users to set up
multiple wordpress instances:

include: tasks/wordpress.yml user=bob
include: tasks/wordpress.yml user=timmy
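
Roughly, each include renders the same file with a different value for
$user. The sketch below uses Python's string.Template to stand in for the
templating step; the task contents and the setup-blog command are invented
for illustration and are not from a real wordpress.yml.

```python
from string import Template

# Hypothetical contents of tasks/wordpress.yml, already parsed into Python data.
WORDPRESS_TASKS = [
    {"name": "set up blog for $user",
     "action": "command /usr/local/bin/setup-blog --owner ${user}"},
]


def render(tasks, params):
    """Substitute $name / ${name} placeholders in every string value."""
    return [
        {key: Template(value).safe_substitute(params) for key, value in task.items()}
        for task in tasks
    ]


# One rendering per include line: user=bob, then user=timmy.
for user in ("bob", "timmy"):
    print(render(WORDPRESS_TASKS, {"user": user}))
```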

So that is what the existing parameterized include stuff does.
However, it doesn't technically do tags. Here's a gross hack:

- include: tasks/foo.yml tags=asdf

===== inside: foo.yml ===

- action: blarg
  tags: $tags

And that might actually work. Alas, it doesn't support lists.

So this might work too:

tags: {{ '$tags'.split(",") }}

but that's ugly as sin and I think we can do better. Which gets me
back to your original question...

When I saw the "include: $file $tags" syntax initially my hopes were
that something like the following would work:

- include: tasks/java-app.yaml tags=stop
  tags: [update,rollbounce]

Alas, it does not; however, I think that would be a reasonably
easy/small patch to add and I'd love to have it.

Totally agree that would be nice to have.

If you include a play, it should probably do the same.

(Note that the included thing may already have tags, so it should only
supplement those tags, not override them.)
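
A small sketch of what "supplement, not override" could look like, assuming
tasks are plain dicts and that tags may be a single string or a list.
supplement_tags and the blarg2 action are hypothetical names.

```python
def supplement_tags(tasks, include_tags):
    """Return copies of tasks with include_tags added to their existing tags."""
    result = []
    for task in tasks:
        existing = task.get("tags", [])
        if isinstance(existing, str):
            existing = [existing]  # a single string is allowed, as with actions
        merged = list(existing) + [t for t in include_tags if t not in existing]
        result.append(dict(task, tags=merged))
    return result


tasks = [{"action": "blarg", "tags": "stop"}, {"action": "blarg2"}]
print(supplement_tags(tasks, ["update", "rollbounce"]))
```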

--Michael

Thanks a whole BUNCH for clearing that up.

I understand now how it's directly analogous to a puppet define. Very cool.

Awesome, I'm glad you're open to this!

Also, on syntax:

The way I showed this in the original post vs the current Ansible
syntax would break existing behavior. But it would also make it
consistent with the 'action' syntax.

To summarize the changes (please excuse the verbosity, I'm writing
this with future documentation in mind):

1. Using 'tags=foo,bar' in the include line would no longer apply tags
TO the include statement. Instead, that would work much as a filter()
function works on lists in python. Only tasks that match the given tag
expression would be selected.

2. And for applying tags TO an include statement you would specify
them as they are in the 'action' type. These tags would augment any
existing tags supplied in the included document.

Syntax: optional values are shown in []'s, '...' indicates 'one or
more of', and $parameter order is not significant:

- include: afile.yaml [$tag_select_spec] [$arg_spec ...]

And tags are specified as they are for action: a single string or a
YAML list of tags.

- include: foo.yaml puppet=labs r=path tags=ibm,motorola,redhat
  tags:
    - companies

Have some thoughts on how you'd prefer I approach implementing this?
I'm thinking of implementing it as a dumb preprocessor somewhere
around the part of the stack where parse_yaml_from_file() is invoked,
probably (maybe) after setup happens and facts have been gathered.

When an include statement is reached, the $tag_select_spec is parsed,
the included file is internalized into a data structure, variable
substitution and injection happen, and then the list of tasks is
filtered for items matching the $tag_select_spec. Finally, the
filtered data structure is substituted in place of the original include
statement.
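
The steps above could be sketched in Python like this. It is only an
illustration of the proposal, not existing Ansible behavior: FILES stands in
for parsing the included YAML file, and expand_includes, svc, and the task
contents are all hypothetical.

```python
import shlex
from string import Template

# Hypothetical stand-in for reading and parsing the included YAML file.
FILES = {
    "tasks/java-app.yaml": [
        {"action": "service name=$svc state=stopped", "tags": ["stop"]},
        {"action": "service name=$svc state=started", "tags": ["start"]},
    ],
}


def expand_includes(tasks, files=FILES):
    """Replace each include entry with the filtered tasks from the included file."""
    expanded = []
    for task in tasks:
        if "include" not in task:
            expanded.append(task)
            continue
        tokens = shlex.split(task["include"])
        # Assumes every token after the path has the form key=value.
        params = dict(tok.split("=", 1) for tok in tokens[1:])
        # 1. Parse the tag-select spec (absent means "select everything").
        select = set(params.pop("tags", "").split(",")) - {""}
        for inner in files[tokens[0]]:
            # 2. Variable substitution on the included task.
            action = Template(inner["action"]).safe_substitute(params)
            # 3. Keep only tasks matching the spec; they land where the
            #    include statement used to be.
            if not select or select & set(inner.get("tags", [])):
                expanded.append(dict(inner, action=action))
    return expanded


playbook = [{"include": "tasks/java-app.yaml svc=jboss tags=stop"}]
print(expand_includes(playbook))
```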

The obvious shortcoming of this method is how it only discusses
'include:'s in tasks lists and doesn't take includes at the top-level
into account.

Here's a commented example showing how this would work out:
http://pastebin.com/dc4kWrrZ

1. Using 'tags=foo,bar' in the include line would no longer apply tags
TO the include statement. Instead, that would work much as a filter()
function works on lists in python. Only tasks that match the given tag
expression would be selected.

Confused. It doesn't actually apply tags to the include statement now
either. It just sets a variable.

I do not support the idea of filtering based on tags here, seems non-intuitive.

2. And for applying tags TO an include statement you would specify
them as they are in the 'action' type. These tags would augment any
existing tags supplied in the included document.

Do not understand.

Syntax: optional values are shown in []'s, '...' indicates 'one or
more of', and $parameter order is not significant:

- include: afile.yaml [$tag_select_spec] [$arg_spec ...]

And tags are specified as they are for action: a single string or a
YAML list of tags.

- include: foo.yaml puppet=labs r=path tags=ibm,motorola,redhat
  tags:
    - companies

Double usage of tags here seems very awkward.

Have some thoughts on how you'd prefer I approach implementing this?
I'm thinking of implementing it as a dumb preprocessor somewhere
around the part of the stack where parse_yaml_from_file() is invoked,
probably (maybe) after setup happens and facts have been gathered.

I'm not sure I agree on the feature yet.

What I was saying was OK is this: if you specify "tags=abc,def,ghi",
then those tags on the include are automatically applied to the
included tasks.

Thus nothing more than:

include: tasks/wordpress.yml user=timmy tags=timmy_blog

(Alternatively, don't allow that and make tags a separate element --
but don't do both)

Result = While wordpress.yml does not mention timmy, I could run
everything in it with --tags timmy_blog

This is what I would expect it would do if I tried it.
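
In rough Python terms, and only as a sketch of the behavior described (the
function names and the sample tasks are hypothetical), that would be:

```python
def apply_include_tags(tasks, include_tags):
    """Add the include's tags to every included task, keeping existing ones."""
    return [dict(t, tags=sorted(set(t.get("tags", [])) | set(include_tags)))
            for t in tasks]


def select_by_tags(tasks, cli_tags):
    """Mimic --tags: keep tasks sharing at least one tag with cli_tags."""
    wanted = set(cli_tags)
    return [t for t in tasks if wanted & set(t.get("tags", []))]


# Hypothetical tasks from wordpress.yml; none of them mentions timmy.
wordpress = [{"action": "install"}, {"action": "configure", "tags": ["config"]}]

# include: tasks/wordpress.yml user=timmy tags=timmy_blog
tagged = apply_include_tags(wordpress, ["timmy_blog"])

# ansible-playbook ... --tags timmy_blog would then pick up every task.
print(select_by_tags(tagged, ["timmy_blog"]))
```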

I don't think the above is super-critical, but it would be useful.
I'm not convinced we need the filtering logic, because if you wanted
to include just a few tasks, you could break the tasks into multiple
files and just include the ones you need.

--Michael

I think I might have been approaching this in the wrong way. I'm going
to see what kind of results I can get with smarter tagging at the task
end of things.