Given the following inventory:
`
[common:children]
groupA
groupB
[groupA]
hostA
[groupB]
hostB
`
And the following playbook (site.yml):
`
(attachments)
0001-playbook-should-continue-to-the-next-play-even-if-th.patch (1.25 KB)
This is intended behavior.
Ansible removes a failed host from the pool and will stop a deployment if no members of a group were successful.
This threshold can actually be made stricter with max_fail_percentage adjustments, which is not exclusively restricted to rolling updates and the serial keyword.
http://docs.ansible.com/playbooks_delegation.html#maximum-failure-percentage
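For reference, the documented pattern looks roughly like this (a sketch based on that page; the group name and task are placeholders, not taken from the playbook attached to this issue):
`
# Rolling update: process 5 hosts at a time, and abort the play if more
# than 30% of a batch fails. Host group and task are illustrative only.
- hosts: webservers
  serial: 5
  max_fail_percentage: 30
  tasks:
    - name: run the upgrade step
      command: /usr/local/bin/upgrade
`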
If the plays in the second part live in their own included playbook, you can just run that second playbook directly and skip the offending portion.
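In other words, something along these lines (the file names here are hypothetical):
`
# site.yml built from included playbooks (file names are made up for
# illustration). The second playbook can also be run on its own with
# "ansible-playbook second-part.yml", skipping the plays that failed.
- include: first-part.yml
- include: second-part.yml
`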
I also have a use case where I would like the playbook to continue even though I have plays that act on a single host, and that host may have failed earlier. I tried setting max_fail_percentage to 100%, which I expected could never be exceeded so the run would continue, but the playbook still fails out. Is that expected behavior, or is there no way to continue on in the playbook?
So let’s make sure I’m understanding the use case – you are using a rolling update and you want it to continue on and update as many hosts as possible, and keep going even if an entire previous “batch” fails?
This seems a bit dangerous so wanting to understand the “why”, which may help answer the “how”.
In the future this is probably a good question for the ansible-project list, as it is about usage rather than about developing code for Ansible.
Thanks!
I am actually not using a rolling update. I currently use a single large playbook (with lots of roles and includes) to install our entire environment. The system includes 5 different server types and some different clients as well. For us, if one of the components fails a part of the install (like setting up the Apache servers) it doesn't affect the installation of the rest of the system; we can go back and fix that one component after all the other components install.

We cannot always do this, and all the components are installed together because there is sometimes a level of coordination between them. For instance, you cannot set up Apache until you enroll the machine with the PKI system, which you cannot do until the PKI system has been installed. However, there are multiple Apache servers, and if one of them fails to be configured correctly, no later component in the playbook would fail because of it. Obviously the system will not work correctly until the failure is fixed, but right now the one host failing keeps all the other hosts from finishing.

I was just looking to let all the others finish and fix the one failure separately. I have included a snippet of our playbook showing where I want to put max_fail_percentage: 100. I do not want to ignore the failures, I just would like to move on in the playbook (without the host that failed).
`
  roles:
    - { role: common, tags: [ install ] }
    - auditd
    - ssh

- hosts: common:&windows
  sudo: yes
  roles:
`
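To make the intended placement explicit: max_fail_percentage is a play-level keyword, so the setting described above would presumably sit alongside hosts and sudo, something like this (a hypothetical sketch, not part of the quoted playbook):
`
- hosts: common:&windows
  sudo: yes
  max_fail_percentage: 100   # the setting the poster would like to add
  roles:
    - some_role              # placeholder; the real role list is elided above
`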
I’ve run into the same thing and am curious if there is a known better way to do it.
In my case, I’ve separated my playbooks and site.yml includes them all.
The problem is that if all hosts from play1 fail, then play2 never executes.
cat stuff-pass.yml
`
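The contents of stuff-pass.yml aren't shown above, but the two-play situation being described would look roughly like this (an assumed sketch, not the poster's actual file):
`
# Two plays in one playbook. If every host in the first play fails,
# the whole run stops and the second play never executes, even though
# its hosts are unaffected.
- hosts: groupA
  tasks:
    - name: a task that fails on every host in groupA
      command: /bin/false

- hosts: groupB
  tasks:
    - name: a task that never gets the chance to run
      command: /bin/true
`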
Never mind my last post, I didn’t see this one when I replied.
I think the confusion is that ‘stop deployment’ apparently means ‘stop deployment for all groups’. The documentation is ambiguous and my interpretation was always that it meant ‘stop deployment for that group’.
I’m going to assume that there is no mechanism to change this behavior.