Absurd run_once behavior, skipping entirely if first node fails a when test

When running a task with run_once, if the first node is skipped, the entire task is skipped, rather than running on the first host that is not skipped.

This behavior is not what is intuitively expected, it is not mentioned in the docs, and it is almost certainly not what most people want it to do. There are discussions of this in multiple GitHub issues, the most detailed of which is at https://github.com/ansible/ansible/issues/19966, but there are also at least https://github.com/ansible/ansible/issues/11496, https://github.com/ansible/ansible/issues/13226, and https://github.com/ansible/ansible/issues/23594.

I was told by @bcoca to take this here, rather than discuss it in the existing GitHub issues.

Below is an untested, simple example of a scenario that would skip the run_once task, when it should (according to the docs, and common sense) run on either host2 or host3.

Inventory

```
[all]
host1
host2
host3
```

Playbook

```yaml
- name: Test Play
  hosts: all
  tasks:
    - include: outer-task.yml
```

outer-task.yml

```yaml
- name: Outer task
  include: inner-task.yml
  when: inventory_hostname != 'host1'
```

inner-task.yml

```yaml
- name: Inner task
  command: do_something
  run_once: True
```

This issue is exacerbated by the fact that the inner task may have no idea why the first host is skipped (i.e., we're including a reusable task that may get run many times in different ways). In those cases, there is no way to work around the issue with a simple `when: inventory_hostname == something`, since we don't know what to check against.

In https://github.com/ansible/ansible/issues/19966, @bcoca proposes a scenario where one would rely on the existing behavior, but in my opinion, and that of the other commenters on that ticket, that use case is incredibly bad practice, as it relies on the specific order of the inventory file. I can think of no sane reason to want the current behavior. If users are doing crazy things like this, they should just stay on the old broken versions forever, as any update is likely to break their fragile, buggy code.

> When running a task with run_once, if the first node is skipped, the entire task is skipped, rather than running on the first host that is not skipped.

It may confuse some people, but it’s both the documented behaviour and the least surprising way to do things. Conditionals should not affect the number of times a run_once task is evaluated even if they result in the task being skipped.

https://docs.ansible.com/ansible/latest/playbooks_delegation.html#run-once says:

> When "run_once" is not used with "delegate_to" it will execute on the first host, as defined by inventory, in the group(s) of hosts targeted by the play - e.g. webservers[0] if the play targeted "hosts: webservers".


You’re mixing different ways of limiting where a task runs, with predictable results (the task is assigned to one host, and the conditional results in it being skipped). If you don’t care which host it runs on, use run_once without a conditional. If you want to run it on a specific host, use delegate_to with run_once or a conditional without run_once.

I think you're confused about what the issue is. Whether I use delegate_to or not is irrelevant. I don't care which host it runs on, and if I did, I would use delegate_to. Even if I use delegate_to, the task will still be skipped, since Ansible evaluates whether to run the task at all based on the first host. I'm sorry I didn't include a delegate_to in my example, which led to this confusion.

http://docs.ansible.com/ansible/latest/playbooks_delegation.html#run-once makes no mention of the fact that even with delegate_to it decides whether to run at all based on the first host. The mention of delegate_to actually makes this more confusing: delegate_to should control where the execution happens (something irrelevant to this issue), not whether it runs at all. That part is at least consistent, since delegate_to does not control whether to run it.

The issue is that run_once is not actually running once. It is "run only if the first node in the play says to run it", not "run one time if it should run for any host in the play". The latter is the intuitive behavior. You talk of predictable results, but behavior that changes based on the order of hosts in your inventory file (the current behavior) is not predictable.
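To make this concrete, here is a toy Python model of the semantics being described (an illustrative sketch, not Ansible's actual implementation; the `when` callable stands in for whatever conditional ends up applied to the task, including one inherited from an include):

```python
def run_task(when, play_hosts, run_once=False):
    """Toy model of how run_once interacts with a conditional.

    With run_once, only the first host's `when` result decides whether
    the task runs at all, and every host then shares that outcome.
    Without run_once, each host evaluates its own conditional.
    """
    if run_once:
        outcome = "ok" if when(play_hosts[0]) else "skipped"
        return {h: outcome for h in play_hosts}
    return {h: ("ok" if when(h) else "skipped") for h in play_hosts}

hosts = ["host1", "host2", "host3"]
not_host1 = lambda h: h != "host1"

# Without run_once: host1 is skipped, host2 and host3 run.
per_host = run_task(not_host1, hosts)
# With run_once: host1, the first host, fails the `when`, so the task
# runs nowhere, even though host2 and host3 both satisfy the conditional.
once = run_task(not_host1, hosts, run_once=True)
```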

Please note that in my example, the when clause is NOT on the task with run_once. If we make reusable code, we may be including that piece in many places, with or without the when clause.

The behavior is documented via the information provided above. run_once in its current form is designed to be consistent and predictable about which host is picked to execute the task.

> When "run_once" is not used with "delegate_to" it will execute on the first host, as defined by inventory, in the group(s) of hosts targeted by the play

If the first host is failed, it is removed from the play list, and run_once will therefore be skipped.

Using delegate_to allows you to define what you believe is consistent or predictable. If you don’t care what host it executes on, using delegate_to can be made to do what you want:

```yaml
- command: whoami
  run_once: true
  delegate_to: "{{ ansible_play_hosts|first }}"
```

ansible_play_hosts is updated as hosts fail.

So if it started as:

```json
"ansible_play_hosts": [
    "host0",
    "host1",
    "host2",
    "host3",
    "host4"
]
```

and host0 failed, that delegate_to above will utilize host1. Instead of first, something like random could be used too.
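For example, swapping `first` for Ansible's built-in `random` filter (a sketch of that suggestion):

```yaml
- command: whoami
  run_once: true
  delegate_to: "{{ ansible_play_hosts | random }}"
```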

If you wish to add further, constructive, clarification to the docs, and potentially examples such as the one I provide above, feel free to submit a documentation pull request.

The issue has nothing to do with delegate_to. The issue has to do with whether it decides to run at all, which delegate_to has no effect on. I don’t care which host it runs on, and if I did, I could use delegate_to as you have noted. The delegate_to directive works properly.

It is totally fine for it to execute on the first host in the inventory, as long as it runs when that host is skipped for the included task file. I apologize if I'm not being clear about what the issue is.

Here’s the same example, updated with a delegate_to, since everyone seems to think that matters.

Inventory

```
[all]
host1
host2
host3
```

Playbook

```yaml
- name: Test Play
  hosts: all
  tasks:
    - include: outer-task.yml
```

outer-task.yml

```yaml
- name: Outer task
  include: inner-task.yml
  when: inventory_hostname != 'host1'
```

inner-task.yml

```yaml
- name: Inner task
  command: do_something
  run_once: True
  delegate_to: 'host2'
```

In this example, it should run on host2, but it does not, since host1 skips the entire inner-task.yml. This is the problem.

In my original example, I didn’t care if it ran on host1, as long as it ran, but it doesn’t run at all.

You mean the task should run on the first node available? If host1 is unavailable, the task should run on host2, but if host2 is also unavailable it should run on host3, and so on? I think this is a valid concern. Even delegate_to could be set to a host which may be unavailable. The idea here is that the task must run; we don't care on which host, but it must run anyway. Maybe an option?

I fully understand what you are saying. However the difference here is that you have a misunderstanding about the feature. You have an idea in your head, that doesn’t match the implementation.

The way run_once works is that it defaults to executing on the first host in the list of hosts in the play, as defined by inventory. If that host has failed, the task is then skipped. Using delegate_to offers you a way to avoid your specific scenario, as it permits you to change which host Ansible targets. Take special care to re-read what I wrote, instead of ignoring it. I recommend using ansible_play_hosts in delegate_to to ensure it always targets an available host. But that may not meet every person's requirements. You will have to implement a delegate_to on that task that properly reflects which host to operate on if the "first" host is not available.

Unfortunately, your expectation doesn’t align with the implementation and our definition of what is expected here.

I'm telling you how to do what you want, within the context of how run_once actually works. We have no intention of changing how run_once works. You'll have to operate within the confines of how run_once actually operates.

That’d be a perfectly fine solution, yes. I honestly don’t even care if it always chooses to run on host1, as long as it doesn’t only use host1 to determine if it should run at all.

@Matt Martz I have no problem with how delegate_to works. I don't care if it executes on host1. What you wrote does not actually change whether the task gets run at all, only where it would run if host1 had failed a prior task. It isn't that Ansible tries to run it and fails, but that it doesn't even try to run it. host1 is still in the list of play_hosts, even if it is skipped, so it is still used to determine whether we should run. I have actually run the code you wrote, and it does not solve this issue.

If you’re worried about breaking some obscure code that relies on skipping the task entirely based on the order of the hosts in the inventory, that’s fine, but the community needs a way to reliably decide to run exactly one time. It’s totally fine for it to be a new directive “actually_run_once”.

Where it executes doesn't matter, but that it executes at all does. It is trivial to use the properly working delegate_to clause to control where the task is actually run, but it has no effect on whether Ansible tries to run it in the first place.

My interpretation of your code is that you are trying to supply a host to execute the task on in the case that host1 has failed out of the execution due to a prior task failure. In my example, host1 is reachable, working properly, and has not failed any tasks. It is simply skipped due to a when clause that is not attached to the task with run_once. It would be perfectly acceptable and in line with the documentation for the task to execute on host1, but instead the entire task is skipped.

The problem is that the decision to run the task is tied to the first host in the play, not that the execution defaults to the first host in the play.

Here's some actual execution output for my second example, and for the one from @Matt Martz.

Mine

```
(.env) [exabeam@ip-10-10-2-162 test]$ ll
total 16
-rw-rw-r-- 1 exabeam exabeam  79 Mar 19 21:57 inner-task.yml
-rw-rw-r-- 1 exabeam exabeam 206 Mar 19 21:59 inventory
-rw-rw-r-- 1 exabeam exabeam  83 Mar 19 21:57 outer-task.yml
-rw-rw-r-- 1 exabeam exabeam  70 Mar 19 21:56 play.yml
(.env) [exabeam@ip-10-10-2-162 test]$ cat inventory
[all]
host1 ansible_host=10.10.2.162
host2 ansible_host=10.10.2.173
host3 ansible_host=10.10.2.206

[all:vars]
ansible_port=22
ansible_ssh_private_key_file=/home/exabeam/devkey.pem
ansible_ssh_user=exabeam
(.env) [exabeam@ip-10-10-2-162 test]$ cat play.yml
- name: Test Play
  hosts: all
  tasks:
    - include: outer-task.yml
(.env) [exabeam@ip-10-10-2-162 test]$ cat outer-task.yml
- name: Outer task
  include: inner-task.yml
  when: inventory_hostname != 'host1'
(.env) [exabeam@ip-10-10-2-162 test]$ cat inner-task.yml
- name: Inner task
  command: hostname
  run_once: True
  delegate_to: 'host2'
(.env) [exabeam@ip-10-10-2-162 test]$ ansible-playbook -i inventory play.yml

PLAY [Test Play] ***************************************************************

TASK [setup] *******************************************************************
ok: [host1]
ok: [host3]
ok: [host2]

TASK [Inner task] **************************************************************
skipping: [host1]

PLAY RECAP *********************************************************************
host1 : ok=1 changed=0 unreachable=0 failed=0
host2 : ok=1 changed=0 unreachable=0 failed=0
host3 : ok=1 changed=0 unreachable=0 failed=0
```

@Matt Martz's (the only difference is the delegate_to line):

```
(.env) [exabeam@ip-10-10-2-162 test]$ cat inner-task.yml
- name: Inner task
  command: hostname
  run_once: True
  delegate_to: "{{ ansible_play_hosts|first }}"
(.env) [exabeam@ip-10-10-2-162 test]$ ansible-playbook -i inventory play.yml

PLAY [Test Play] ***************************************************************

TASK [setup] *******************************************************************
ok: [host1]
ok: [host3]
ok: [host2]

TASK [Inner task] **************************************************************
skipping: [host1]

PLAY RECAP *********************************************************************
host1 : ok=1 changed=0 unreachable=0 failed=0
host2 : ok=1 changed=0 unreachable=0 failed=0
host3 : ok=1 changed=0 unreachable=0 failed=0
```

In both cases, the task in inner-task.yml is skipped, since host1 does not match the when clause in outer-task.yml. The delegate_to makes no difference. If I left that out, it would behave the same. The issue is not where it runs, but that it doesn’t run at all. It would be perfectly acceptable for it to execute on host1, which is in line with the docs, but it doesn’t run at all.

My use case involved roles. I had something like

```yaml
- hosts: web:app:db
  roles:
    - role: myrole
      when: color == "blue"
```

In the role, there was a task that ran on localhost (via delegate_to), but only once (via run_once) for the whole batch of hosts.

Everything worked fine, except that if the first host in inventory happened not to be blue, the run_once caused the localhost task to be skipped. The order of the hosts in inventory was completely arbitrary – these were EC2 instances at AWS.

The eventual workaround was to add the when to every single task in the role except the run_once one, which made both the playbook and the role less readable.

I don’t have any hope that the Ansible team will ever address this; for whatever reason, this use case is relatively common among people who aren’t on the Ansible team, and impossible to explain to the Ansible team in a way that anyone finds convincing. We haven’t yet found a blocker that we couldn’t work around in one ugly-ass way or another.


Disregarding the implementation details, run_once: true is effectively the same as adding when: inventory_hostname == ansible_play_batch.0 to the task. As I said before, if that's not what you want, you should instead write a conditional that expresses your actual intent.
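To illustrate that equivalence (a sketch; `do_something` is a placeholder command, and this sets aside the detail that run_once also copies the single result to every host):

```yaml
# These two tasks decide whether to run in the same way:
- name: Using run_once
  command: do_something
  run_once: true

- name: Using the conditional it effectively expands to
  command: do_something
  when: inventory_hostname == ansible_play_batch.0
```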

Here’s one approach:

```yaml
- group_by:
    key: color_{{ color | default('octarine') }}

- name: Run on localhost once
  delegate_to: localhost
  debug:
    msg: His pills, his hands, his jeans
  when: inventory_hostname == groups.color_blue.0
```

Or you might opt for something like this, which is overly clever and requires a very recent version of Jinja:

```yaml
- name: Run on localhost once
  delegate_to: localhost
  debug:
    msg: Suddenly I was a lilac sky
  vars:
    first_host: "{{ ansible_play_hosts | map('extract', hostvars) | selectattr('color', 'defined') | selectattr('color', 'equalto', 'blue') | first }}"
  when: inventory_hostname == first_host.inventory_hostname
```

Or you might restructure the playbook so it only runs the role on blue hosts and doesn’t need a separate conditional, and use run_once on the task. The best approach depends on personal taste and other decisions made in writing the playbook and setting up your Ansible environment.

Yep, your suggestions there are the kind of things I had in mind with the phrase “ugly-ass workaround”. :^) (They’re task-level, and can’t be applied to the inclusion of the role in the playbook; they require baking logic about the way you manage colors into the role; etc.)

> run_once: true is effectively the same as adding when: inventory_hostname == ansible_play_batch.0 to the task.

This is a very clear and concise way to put it, and highlights exactly how run_once works, and why it doesn’t mean “run this task once”, but “run this task on the first host in the list of hosts in the play, not the first host that you’re actually running tasks on”. It’s not a guarantee that a task will run once, it’s an alias for a common when pattern.

I’m juggling too many other things to want to put in a documentation PR right now, but if anyone else does, I think this would be useful to clarify. In particular, where the docs say

> When "run_once" is not used with "delegate_to" it will execute on the first host, as defined by inventory, in the group(s) of hosts targeted by the play - e.g. webservers[0] if the play targeted "hosts: webservers".
>
> This approach is similar to applying a conditional to a task

I think it'd be clearer if this said something about how it always executes in the context of the first host, as defined by inventory, in the group(s) of hosts targeted by the play – the delegate_to part doesn't change that, it just changes which host actually runs the task. It should also say that this isn't just "similar" to applying a conditional to a task, it's identical to applying a conditional to a task, and in particular that this condition is logical-AND-ed with any other conditions on the task, so that if the task has conditions that cause it to get skipped on the first host in inventory, this condition will cause it to get skipped on all the other hosts as well.

A couple of clarifications; these are important when you hit the corner case in which it matters:

- It's not 'run on the first host in play/inventory'; it's 'run on the first host that reaches the task', which means that hosts that fail and/or are removed in previous tasks are not considered. Normally (in the absence of failure) this does mean the first host in play/inventory; changing to other strategies can affect this.

- It is 'mostly' equivalent to `when: inventory_hostname == ansible_play_batch.0`, but there is one major difference: other hosts are not 'skipped', they are all given the same status/return from the single execution.

The feature should really be named 'only_first_host_tries_to_run_and_applies_status_to_rest'. To make it work as 'run_first_host_that_matches_when' would make the part of applying the status to all hosts a lot more difficult to do sanely ... do we set 'skipped' for the ones we skipped? Do we set the same status for all hosts?

At this point I don't see us modifying the feature (maybe clarifying the docs?), but I'm open to creating a new set of keywords that allows for the range of behaviors not already available via conditional construction.

Thanks everyone, I think the root of the issue is finally clear.

I’d love to have a ‘run_first_host_that_matches_when’ keyword, though I understand there might be technical issues getting in the way. I think the most intuitive way to set the status would be to set it for all hosts that match the when, but that’s a bit of an arbitrary thing.

The current set of keywords makes it very difficult to make reusable code. If we want to include roles or tasks conditionally, we simply can’t use run_once safely. Even the “ugly-ass workarounds” mentioned above are not possible in some instances, since the conditions might be completely different depending on where it gets included.

Another alternative would be to expose the list of hosts skipped by outer when clauses in another variable that can be used in inner when clauses. Something like “ansible_not_already_skipped_hosts”. I know the logic is to skip the task, not the host, but when using includes, it’s effectively the same. If we haven’t done any includes with when clauses, it’d be effectively the same as ansible_play_hosts.

If we had a variable like that, I could put something like this in the inner-task.yml:

```yaml
- name: Inner task
  command: hostname
  when: inventory_hostname == ansible_not_already_skipped_hosts[0]
```

If I needed to save the result to all hosts, I could register the result and then follow up with a set_fact using the var from that one host.
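That follow-up might look something like this sketch (hypothetical: `ansible_not_already_skipped_hosts` is the variable proposed above and does not exist in Ansible):

```yaml
- name: Inner task
  command: hostname
  register: inner_result
  when: inventory_hostname == ansible_not_already_skipped_hosts[0]

- name: Copy the result to every host
  set_fact:
    inner_result: "{{ hostvars[ansible_not_already_skipped_hosts[0]].inner_result }}"
```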

This isn’t as clean as having a ‘run_first_host_that_matches_when’ keyword, but still prevents the “ugly-ass workarounds” from being needed on EVERY included task.

Thank you everyone who chimed in on this discussion, it’s really helped me understand how run_once works.

As for workarounds - Alex, I think you were on the right track with the include, but it needs to be dynamic. I think this does what you want:

inventory:

```
machine-a
machine-b
```

playbook.yml:

```yaml
- hosts: all
  gather_facts: no
  tasks:
    - name: first task
      ping:

    - name: include for second task
      include_tasks: task.yml
      when: inventory_hostname == 'machine-b'
```

task.yml:

```yaml
- name: second task
  ping:
  run_once: true
```

Because this include is dynamic instead of static, the first host - “machine-a” - doesn’t encounter the “run_once” task at all, so it runs when the second host reaches it. (The reason this doesn’t happen with your example in the first message on this thread is because the import is static: the include task runs for all machines, and the condition applies to the tasks it includes, so it’s equivalent to having the “when” written on the run_once task.)
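In other words, with a static include the conditional is inherited by each included task, so the original example behaves roughly like this inlined form (illustrative expansion, not literal Ansible output):

```yaml
# - include: task.yml
#   when: inventory_hostname == 'machine-b'
# ...with a static include, behaves roughly like:
- name: second task
  ping:
  run_once: true
  when: inventory_hostname == 'machine-b'  # inherited from the include
```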

This example is trivial, but in a larger setup this could be used to run a task only once, on a host matching certain conditions, in a way that is far less fragile than depending on inventory order. Hopefully this is helpful to anyone trying to combine run_once and conditions.

I updated the docs in an effort to clarify this: https://github.com/ansible/ansible/pull/37754. Any suggestions that help avoid more confusion on this subject are welcome.

Thank you Brian for updating the docs. That makes it much more clear.

And thank you James for the workaround!!! The change to dynamic include_tasks rather than the older static include statement seems to work great.

Note for people on older releases, you have to be running at least Ansible 2.4 to have include_tasks.

Further note for people on older releases (2.0 or newer, I believe, but < 2.4):

You can achieve the same thing as include_tasks with include and static: no, like so:

```yaml
- name: include for second task
  include: task.yml
  static: no
  when: inventory_hostname == 'machine-b'
```

Hope that helps!