workaround for serial: 1 failures stopping the entire playbook?

vampyreapocalypse · April 2, 2019, 3:41pm

Hello,

This has probably been addressed 1000 times before, but I can’t seem to find an answer (if this is even possible). When running a play within a playbook with serial: 1, I want a task failure to be fatal for the node it happens on, but not for the remaining nodes that have not run yet. Ansible should skip the rest of the play for just that one node and move on to the next node.

I have a scenario where I want to perform OS patching on a large-ish group of servers in a Hadoop cluster with no downtime to the cluster itself. So I am using serial: 1 when performing the patching tasks for each node: put it in maintenance mode, take it out of the cluster, patch, reboot, re-join the cluster, and do some basic health checks.

However if any one of these tasks fails in serial: 1 mode, Ansible considers the entire play failed and will not run against any remaining nodes. Since this is a large cluster (50 nodes), a failure on a single node isn’t a showstopper and shouldn’t stop the rest of the nodes from performing their OS patching.

I’d like to know if there is a way around Ansible stopping an entire play for all nodes if a single node fails when running in serial: 1. From what I’ve found on Google, there doesn’t seem to be a way to do this short of setting serial: 2(+), but I thought I’d ask.

system · April 3, 2019, 7:49pm

There are several ways. The simplest might be putting the whole thing
in a 'block' with a 'rescue' that always succeeds, so the play will go
on to the next host.
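
A minimal sketch of that block/rescue layout, for illustration only. The group name `hadoop_workers`, the script paths, and the task names are hypothetical placeholders, not anything from the original poster's playbook:

```yaml
# Sketch only: group name, script paths and task names are hypothetical.
- hosts: hadoop_workers
  serial: 1
  tasks:
    - name: Patch one node, tolerating a failure on that node
      block:
        - name: Take the node out of the cluster
          command: /usr/local/bin/decommission_node.sh

        - name: Apply OS patches
          command: /usr/local/bin/patch_node.sh

        - name: Reboot the node
          reboot:

        - name: Rejoin the cluster and run health checks
          command: /usr/local/bin/rejoin_and_check.sh
      rescue:
        # This task succeeds, so the host ends the block as "rescued"
        # rather than "failed" and the play can continue with the next batch.
        - name: Report which task failed
          debug:
            msg: "Patching failed on {{ inventory_hostname }} in task '{{ ansible_failed_task.name }}'"
```

Two caveats apply to this sketch. Every per-node task has to live inside the block, because tasks placed after the block still run on a rescued host. Also, as far as I know, an unreachable host does not trigger 'rescue', so a node that never comes back from the reboot may still stop the play.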

vampyreapocalypse · April 10, 2019, 5:07pm

Brian,

Thanks for the reply on this. I will definitely test this out in my plays.

Andrew

rjwagner.dba · October 22, 2019, 2:37pm

Hey Andrew - were you able to get anywhere with this? I tried adding a block/rescue without any luck. I’ve been searching all morning for a way to make Ansible move on to the next host in a serial strategy even if one task on one host fails. I’m thinking it’s not possible.

Rob

Kai_Stian_Olstad · October 22, 2019, 3:32pm

It is possible.
