RFC: with_nested_dependents plugin (and the future of nested loops)

I wrote a loop (== lookup) plugin to solve a problem I had, and felt it
might be worth explaining the motivation in some detail. My examples here
are AWS-based, but please note that neither the problem nor the solution
I describe is in any way AWS-specific.

Suppose I start with a list of regions:

    regions: ['us-east-1', 'us-west-2', 'eu-west-1']

With the ec2 module, I can do this (don't worry about how the instance
ids for a single region are retrieved for now):

    - ec2:
        state: absent
        instance_ids: …list of instances in this region…
        region: "{{ item }}"
      with_items: regions

But with ec2_vpc or ec2_key (or many other modules), I can't pass in a
list of ids to remove, only one at a time. So I want to do this:

    foreach (regions) {
        vpcs = vpcs_in_region(«this region»)
        foreach (vpcs) {
            remove(«this vpc» in «this region»)
        }
    }

But there's no way to write a nested loop of this kind in Ansible, where
evaluating the sub-list requires the value of the current item of the
outer list(s). Ideally, I would like to be able to do this:

    - block:
        - …get list of vpcs in current region…
        - ec2_vpc:
            state: absent
            vpc_id: "{{ vpc }}"
            region: "{{ region }}"
          with_items: vpcs
      with_items: regions

This has two problems: (1) the loop variable is always named "item", so
there's a conflict; (2) blocks can't currently take with_items, and the
PlayIterator would need significant changes to allow nesting using that
construct. But a solution to those problems would be the ideal: easier
to explain and use than with_nested/with_subelements etc., and more
general. I wish that's how it had worked from the start.

My first attempt to solve this involved a custom "with_dict_of_arrays"
lookup plugin. If I could build a dict like this:

    { 'us-east-1': [x,y,z],
      'us-west-2': [p,q,r],
      'eu-west-1': [a,b,c] }

where a…z are vpc_ids, then I could do this:

    - ec2_vpc:
        state: absent
        region: "{{ item.key }}"
        vpc_id: "{{ item.value }}"
      with_dict_of_arrays: vpcs_by_region

We would iterate over (us-east-1,x), (us-east-1,y), …, (eu-west-1,c) in
this case, and it would work fine.
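
As a rough illustration of that iteration order (a Python sketch only, not
the source of the with_dict_of_arrays plugin; the function name
dict_of_arrays is made up here), flattening a dict of lists into key/value
items could look like this:

```python
def dict_of_arrays(d):
    """Return one {'key': k, 'value': v} item per element of each list in d."""
    items = []
    for key, values in d.items():
        for value in values:
            items.append({"key": key, "value": value})
    return items


# Placeholder vpc ids, matching the x..z / p..r / a..c example above.
vpcs_by_region = {
    "us-east-1": ["x", "y", "z"],
    "us-west-2": ["p", "q", "r"],
    "eu-west-1": ["a", "b", "c"],
}
pairs = dict_of_arrays(vpcs_by_region)
```

Each entry in pairs then carries the region as "key" and a single vpc id as
"value", which is the shape the task above reads through item.key and
item.value.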

But building the required dict is a bit horrid. I have to find the
intersection of all hosts in the cluster (via a unique tag) and the
hosts in each region I'm interested in, and then extract the vpc_id
from each one in turn. That looks like this:

    - set_fact:
        vpcs_by_region: "{{ vpcs_by_region|default({})|combine({item: groups[cluster_tag]|intersect(groups[item])|map('lookup', hostvars, 'ec2_vpc_id')|unique|list}) }}"
      with_items: regions

groups[cluster_tag] is all my hosts. groups[item] is all the hosts in
the "item" region. The intersection is the hosts in the cluster and the
region. Then I look up hostvars[h].ec2_vpc_id for each host, via another
custom plugin (the details of which aren't important here).
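
For readers who find the one-line filter chain hard to parse, the same steps
can be sketched in plain Python. This is only an illustration: the function
name build_vpcs_by_region and the sample inventory data are invented, and
the custom 'lookup' filter is assumed to behave like
hostvars[h]['ec2_vpc_id'], as described above.

```python
def build_vpcs_by_region(regions, groups, hostvars, cluster_tag):
    """Map each region to the unique vpc ids of cluster hosts in that region."""
    vpcs_by_region = {}
    for region in regions:
        # Hosts that are both in the cluster and in this region (intersect).
        hosts = [h for h in groups[cluster_tag] if h in groups[region]]
        # Unique vpc ids, kept in first-seen order.
        vpc_ids = []
        for host in hosts:
            vpc_id = hostvars[host]["ec2_vpc_id"]
            if vpc_id not in vpc_ids:
                vpc_ids.append(vpc_id)
        vpcs_by_region[region] = vpc_ids
    return vpcs_by_region


# Made-up inventory data, for illustration only.
groups = {
    "tag_cluster_demo": ["h1", "h2", "h3"],
    "us-east-1": ["h1", "h2", "other"],
    "us-west-2": ["h3"],
}
hostvars = {
    "h1": {"ec2_vpc_id": "vpc-1"},
    "h2": {"ec2_vpc_id": "vpc-1"},
    "h3": {"ec2_vpc_id": "vpc-2"},
}
result = build_vpcs_by_region(
    ["us-east-1", "us-west-2"], groups, hostvars, "tag_cluster_demo"
)
```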

Note that all this is coming from the inventory (i.e. ec2.py), so I
can't statically define a dict-of-lists or list-of-lists beforehand. I
have to build it up somehow (and set_fact/combine in a loop is the only
way I could find to do that).

With the new nested_dependents lookup plugin, the above becomes a little
easier to follow:

    - ec2_vpc:
        state: absent
        region: "{{ item.0 }}"
        vpc_id: "{{ item.1 }}"
      with_nested_dependents:
        - regions
        - groups[cluster_tag]|intersect(groups[item.0])|map('lookup', hostvars, 'ec2_vpc_id')|unique|list

The basic idea is that:

    - foo: …
      with_nested_dependents:
        - expr_a
        - expr_b
        - expr_c

translates to this:

    item = []
    list = evaluate(expr_a, item)
    for l in list:
        item.append(l)
        nextlist = evaluate(expr_b, item)
        for l2 in nextlist:
            item.append(l2)
            nextnextlist = evaluate(expr_c, item)
            for l3 in nextnextlist:
                …run task foo…

In other words, when evaluating expr_b, you can use item.0 to refer to
the current element of expr_a; when evaluating expr_c, item.1 refers to
the current element of expr_b, and so on. Each inner expression is
re-evaluated every time an outer loop variable changes.
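
The same idea can be written as a short recursive function. This is a
sketch under assumptions, not the attached plugin: evaluate is a
hypothetical callable standing in for templating an expression against the
current item, and the toy expressions below are plain Python functions
rather than Jinja2 strings.

```python
def nested_dependents(exprs, evaluate, item=()):
    """Return every tuple the nested loops would visit.

    exprs    -- list of expressions, outermost loop first
    evaluate -- hypothetical callable(expr, item) -> list
    item     -- tuple of values chosen by the enclosing loops so far
    """
    if not exprs:
        return [item]
    results = []
    for value in evaluate(exprs[0], item):
        # Build a new tuple instead of appending in place, so values from
        # one iteration never leak into the next.
        results.extend(nested_dependents(exprs[1:], evaluate, item + (value,)))
    return results


# Toy stand-in for templating: each "expression" is a function of item.
vpcs = {"us-east-1": ["x", "y"], "us-west-2": ["p"]}
exprs = [
    lambda item: ["us-east-1", "us-west-2"],  # regions
    lambda item: vpcs[item[0]],               # depends on item.0
]
combos = nested_dependents(exprs, lambda expr, item: expr(item))
# combos == [("us-east-1", "x"), ("us-east-1", "y"), ("us-west-2", "p")]
```

Passing a fresh tuple down each level is one way to avoid accumulating
values across iterations; other designs (push and pop on a shared list)
would work as well.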

Here's another (real) example from one of my playbooks, which would be
quite painful to express in some other way:

    - name: Remove VPC subnets in each region
      ec2_vpc_subnet:
        state: absent
        region: "{{ item.0 }}"
        vpc_id: "{{ item.1 }}"
        cidr: "{{ item.2 }}"
      with_nested_dependents:
        - regions
        - groups[cluster_tag]|intersect(groups[item.0])|map('lookup', hostvars, 'ec2_vpc_id')|unique|list
        - instances|selectattr('region', 'equalto', item.0)|map(attribute='subnet')|unique|list
      tags: ec2_vpcs

The plugin itself is pretty simple, just straightforward recursion. I've
attached the source here for anyone who is interested.

Comments and suggestions welcome.

-- Abhijit

(attachments)

nested_dependents.py (3 KB)

Ideally, I would like to be able to do this:

    - block:
        - …get list of vpcs in current region…
        - ec2_vpc:
            state: absent
            vpc_id: "{{ vpc }}"
            region: "{{ region }}"
          with_items: vpcs
      with_items: regions

This has two problems: (1) the loop variable is always named "item", so
there's a conflict; (2) blocks can't currently take with_items, and the
PlayIterator would need significant changes to allow nesting using that
construct. But a solution to those problems would be the ideal: easier
to explain and use than with_nested/with_subelements etc., and more
general. I wish that's how it had worked from the start.

Note that (1) is a problem that many people have complained about over
the years, and I believe the maintainers have said that they would like
to fix it someday.

I think it would be best to leave the existing with_ loop plugins alone,
and instead suggest the use of a new language construct like this:

    - block:
        - ec2_vpc: state=absent vpc_id="{{ v }}" region="{{ r }}"
          foreach: v in vpcs
      foreach: r in regions

Maybe it should be called "loop" instead of "foreach"; maybe we want to
use «['v', vpcs]» instead of «v in vpcs». Whatever. Those are relatively
minor details. What's important is that:

(a) the looping construct looks entirely different from with_x,
(b) that it has a way to specify the name of the loop variable, and
(c) that it integrates with the nesting already provided by blocks.

Since with_x loops can only apply to tasks, doing it this way creates no
additional backwards-compatibility problems, and existing loops can work
unmodified.

In terms of implementation, (a) and (b) are relatively simple. The real
changes would be in PlayIterator/executor to support the looping.

Thoughts?

    item = []
    list = evaluate(expr_a, item)
    for l in list:
        item.append(l)

(Aside: this pseudocode is a bit broken, in that it keeps appending
stuff to item forever. The real code doesn't do that.)

-- Abhijit

Heh, I wrote something similar a while ago: http://grokbase.com/t/gg/ansible-devel/156s9cqqth/with-recursive-a-lookup-plugin-that-chains-other-lookups.

Why should loops over blocks be different than loops over include? They’re the same thing basically…

I don't think they should be different.

But the naïve approach to allowing with_items on a block would simply
push the loop clause down into each task in the block, i.e.

    - block:
        - a
        - b
        - c
      with_items: [1,2]

…would result in a being executed twice, followed by b being executed
twice, and so on, rather than "a,b,c" being executed twice. (I didn't
try very hard to implement this yet; I'm taking @jimi-c's word for it
that it would require extensive PlayIterator changes following an IRC
discussion a couple of weeks ago.)

Instead, I *want* this to work in the same way that a loop over an
include would work: execute all tasks for each element in the list. But
I definitely want to be able to do this without creating another include
file for each level of nesting. In both cases, the 'item' naming problem
would need to be solved.
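
To make the difference in ordering concrete, here is a small Python sketch.
It only models the order in which (task, item) pairs would run; it says
nothing about how PlayIterator is or would be implemented, and both function
names are invented for the illustration.

```python
def pushed_down(tasks, items):
    """Loop clause copied onto each task: every task loops on its own."""
    return [(task, item) for task in tasks for item in items]


def per_iteration(tasks, items):
    """Loop over the block as a whole: all tasks run for each item."""
    return [(task, item) for item in items for task in tasks]


tasks, items = ["a", "b", "c"], [1, 2]
naive = pushed_down(tasks, items)
wanted = per_iteration(tasks, items)
# naive:  a,a,b,b,c,c  (each task twice in a row)
# wanted: a,b,c,a,b,c  (the whole block twice)
```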

If this is easier to accomplish than I had thought, I will be thrilled.

-- Abhijit