Hi.
I’m working on some orchestration where I need to run a task across sets of N remote nodes. If that task fails on any one of the remote nodes, the orchestration needs to halt (or be handled somehow). In my test, I cause one node to fail and I expected the entire ansible run to bomb out, but that’s not what happened. The failed node is reported, but the playbook continues on.
How can I make ansible exit upon the failure of any one of these nodes?
Or, how can I have some kind of handler to pause the run before continuing? (I’ve not yet looked into handlers)
Playbook, plays, tasks, and output are shown below. One question about the output: for the node that failed, the task “debug: var=output” is absent. That task only fires for the successful node. Should I expect that task to also fire for the failed node? I was surprised by that.
Thanks!
kallen
`
$ cat testplaybook.yml
Hi
> Hi.
> I'm working on some orchestration where I need to run a task across
> sets of N remote nodes. If that task fails on any one of the remote
> nodes, the orchestration needs to halt (or be handled somehow). In my
> test, I cause one node to fail and I expected the entire ansible run
> to bomb out, but that's not what happened. The failed node is
> reported, but the playbook continues on.
That is by design.
> How can I make ansible exit upon the failure of any one of these
> nodes?
http://docs.ansible.com/ansible/playbooks_delegation.html#maximum-failure-percentage
You can set max_fail_percentage: 0
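As a rough sketch of where that setting goes: max_fail_percentage is a play-level keyword. The host group "webapp" and the command path below are placeholders, not taken from the original playbook.

```yaml
---
# Sketch only: "webapp" and the command path are hypothetical placeholders.
- hosts: webapp
  # Abort the play once more than 0% of the hosts have failed,
  # i.e. on the first failed host.
  max_fail_percentage: 0
  tasks:
    - name: run the task that must succeed on every node
      command: /usr/local/bin/some_task
      register: output

    - debug: var=output
```

As far as I understand the docs linked above, the percentage has to be exceeded rather than equaled, which is why 0 stops the run on the first failure.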
> Or, how can I have some kind of handler to pause the run before
> continuing? (I've not yet looked into handlers)
Don't think so ....
> Playbook, plays, tasks, and output are shown below. One question about
> the output: for the node that failed, the task "debug: var=output" is
> absent. That task only fires for the successful node. Should I expect
> that task to also fire for the failed node? I was surprised by that.
No - once a node fails (without "ignore_errors: True"), it is no longer
part of the remainder of the play, so no further tasks will be executed
on the failed node.
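For illustration, a minimal sketch of the ignore_errors case. The host group and the deliberately failing command are made up for the example; only "register: output" and "debug: var=output" mirror the tasks mentioned in the question.

```yaml
---
# Sketch only: "webapp" and the failing command are hypothetical.
- hosts: webapp
  tasks:
    - name: task that may fail on some nodes
      command: /bin/false
      register: output
      # The failure is still reported, but the host stays in the play.
      ignore_errors: true

    # Because the failure above is ignored, this also runs on the failed node.
    - debug: var=output
```

Without the ignore_errors line, the failed host is dropped from the play after the first task and the debug task runs only on the hosts that succeeded.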
Hope this helps
Ah! Fantastic. Thank you. I put in max_fail_percentage, and the thing I wanted to happen happened.
I do wonder about how to more elegantly handle one of the nodes failing, with a handler. Like something simple to start: “prompt: pause here, go fix that node if you can. If you can’t, ctrl-c now.” Perhaps I should add “ignore_errors: true” and experiment?
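One possible starting point for that experiment, as a sketch only: let the failure be ignored, print the result, then stop at a prompt using the pause module. The host group, command path, and prompt wording are placeholders, and I have not run this against 1.8.2.

```yaml
---
# Sketch only: "webapp", the command path, and the prompt text are hypothetical.
- hosts: webapp
  tasks:
    - name: task that may fail on some nodes
      command: /usr/local/bin/some_task
      register: output
      # Keep failed hosts in the play so the run reaches the pause below.
      ignore_errors: true

    - debug: var=output

    # Wait for the operator before going any further.
    - pause: prompt="Check the output above and fix any failed node, then press Enter (or abort with Ctrl-C)"
```

This version pauses on every run, whether or not anything failed. Making the pause conditional on the registered result is the obvious next step, but how a "when:" condition interacts with pause across several hosts is something I would test before relying on it.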
It’s strange … I do have another task that runs per webapp node and executes a local check script. It’s a Ruby program that exits non-zero on an error condition. When any node fails that check, the ansible run comes to a screeching halt. That play contains no max_fail_percentage and no “ignore_errors: true”.
We use ansible 1.8.2.
I’ll move forward with your advice. And, FWIW … this bombs out the entire run when any node fails:
`