Exit a role without failure (stop processing the role on a condition) without ending the play or stopping execution of other roles in the stack / ensure all roles in a stack run even if one ends with failure

Hi,

I'm trying what I believe to be a common use case where a playbook has a set of roles to configure a system.

It looks something like this:

```yaml
- name: Do X, Y and Z
  hosts: all
  roles:
    - role: roleA
      var1: value1
      var2: value2
    - role: roleB
      var1: value1
      var2: value2
    - role: roleC
      var1: value1
      var2: value2
```

I found two issues with this setup.

1 - I have checks in each role that end_host when there is no need to run the rest of the role. I understand that with a correctly written (idempotent) playbook this is not necessary, but I found it very useful: it saves a lot of execution time, especially when the play is big and the inventory even bigger, and it also helps when you create timestamped audit logs that should only be written if a change happened. I also considered breaking the tasks into more files so they can be included on condition, but depending on the role that can get a bit messy and hard to follow for other contributors.
I've been using end_host, but it ends the play, not the role. Is there a way to end the role and not the play, and most of all not stop processing of the other roles in the list?

2 - If a role fails due to an unhandled error, fail that role but, again, not the play, and most of all do not stop processing of the other roles in the list.

I've looked into ignore_errors, but again that works at the play level and not the role level. If roleA fails, ignore_errors will allow the next play to be executed, but not roles B and C.

So far the only solutions I can think of are:

  • Have a play for each role - not the end of the world, just a lot of unnecessary code
    - in AWX, have a playbook/job template for each role and string them together using a workflow - again a lot of unnecessary code and potentially a lot of job templates.

Is there a way to deal with the two issues mentioned above? What is the standard approach to dealing with roles failing, or exiting roles on a condition, when they are executed in a stack/list?

Thanks!

You can do a sanity check at the start of each role; if the end result is already good, you can skip everything in between by wrapping the tasks in blocks and using a when condition on the blocks. It adds some "unnecessary" code, but it also gives you the ability to "chunk" the role up and add an overall timestamp on each chunk.
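A minimal sketch of that idea; `desired_state_ok` is a hypothetical variable that an earlier sanity-check task would set, not something from this thread:

```yaml
# Chunk of role work, gated by a sanity check and timestamped.
# desired_state_ok is a hypothetical fact set by an earlier check task.
- name: Configuration chunk, skipped when the end state is already good
  when: not (desired_state_ok | default(false))
  block:
    - name: Timestamp this chunk for the audit log
      ansible.builtin.set_fact:
        chunk_start: "{{ now(utc=true).isoformat() }}"

    - name: Actual work for this chunk
      ansible.builtin.debug:
        msg: "work happens here"
```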

Then set a failed_when condition on the items you can safely ignore. Again, it's a bit outside the idempotent method, as you shouldn't be failing, but in some cases that's a generally acceptable way to fail forward.

Generally speaking, I prefer to sanity check for the end result on anything that could "fail" to apply or perform correctly, then use the sanity-check variable in a when clause on the actual work.

i.e.: check for an application version, use failed_when so the check doesn't fail the host, register the output, then use the result in the next task to install the application… 1 extra task, 1 extra line on the actual work task.
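For illustration, a minimal sketch of that pattern; the application name and version are made up:

```yaml
# Sanity check: never fails the host, just records what is installed.
- name: Check installed application version
  ansible.builtin.command: myapp --version   # hypothetical binary
  register: myapp_version
  changed_when: false
  failed_when: false

# Actual work: one extra "when" line driven by the check above.
- name: Install application when missing or outdated
  ansible.builtin.package:
    name: myapp
    state: present
  when: myapp_version.rc != 0 or '1.2.3' not in myapp_version.stdout
```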

Thank you for your suggestion.

I'm afraid it does not help, though. Blocking this way in a large role makes it quite complex and adds a lot of code, making it harder to read and maintain.

Failing, same as end_host, will prevent other roles in the stack from being executed, which is really my main issue.

I do also have checks, which are really the source of this issue. Consider the following: if condition A is true, exit; else continue checking condition B (if true, exit; else continue), and so on. Now you have 8 or 10 of those. It's much cleaner to end the role when a condition is met (or not met, depending on what you are trying to achieve) than to have a block for each of those conditions.
A simple example might be roleA supporting only RHEL while roleB supports only Ubuntu. Let's say we are installing something that is only available on one of them (a more realistic case would involve distro versions, but let's use distros to keep it simple).

Playbook:

```yaml
- name: Install A and B
  hosts: linux_servers
  roles:
    - roleA # Installs software A that only supports RHEL
    - roleB # Installs software B that only supports Ubuntu
```

Both roles have a condition at the top of the main tasks file that checks whether this is a supported OS and exits if it is not.
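A minimal sketch of such a guard at the top of roleA's tasks/main.yml (assuming facts have been gathered); note that end_host ends the remainder of the play for that host, not just the role, which is exactly the problem:

```yaml
# roleA/tasks/main.yml -- guard against unsupported OS.
# end_host stops the whole PLAY for this host, not just this role.
- name: End host if this is not RHEL
  ansible.builtin.meta: end_host
  when: ansible_facts['distribution'] != 'RedHat'
```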

In that case, if we execute roleA on a system that is not RHEL, we will never get to execute roleB on the same server. This means Ubuntu servers will never be managed.

Again, I know this can be done with include_tasks for the specific distro without needing to end the role, but this is a very simple example to illustrate that it often makes more sense to end the role when a condition is (or is not) met than to try to work around it.
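A minimal sketch of that include_tasks approach; the per-distro file names are hypothetical:

```yaml
# Dispatch to a distro-specific task file, e.g. redhat.yml or ubuntu.yml.
# On a distro with no matching file, this include simply fails,
# which is the failure mode described below.
- name: Include distro-specific tasks
  ansible.builtin.include_tasks: "{{ ansible_facts['distribution'] | lower }}.yml"
```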

Even in this simple example, if we had include_tasks for specific distros and suddenly ran on a distro we had not included in our logic, the role would fail. This again results in skipping all other roles in the stack (end_host if not RHEL, where we don't care what distro it actually is, versus include_tasks for RHEL or Ubuntu, where we need one file for each distro we might run the role on, and there are many 🙂).

Hopefully this makes sense 🙂

Lukasz

Have you considered breaking each role out into its own playbook, then combining the playbooks into a single workflow template? You can transfer all the data you need between playbooks with the set_stats module, and you can set each node to run whether the previous one succeeds or fails.
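For illustration, a minimal sketch of such a hand-off; the variable names are made up:

```yaml
# Last task of one playbook in the workflow: publish data that the
# next workflow node can read as ordinary variables.
- name: Pass results to the next job template in the workflow
  ansible.builtin.set_stats:
    data:
      baseline_rolea_done: true
    per_host: false
```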

Yeah, I'd do this as a workflow job and either let smart inventory handle the whole mess or set them all to run regardless of failure state; that's the only way I can see to make something that's easy to maintain and that works.

Otherwise you are going to have to set the variables and do the when checks. Frankly, that isn't as cumbersome as it sounds, because you can reuse the variables, and if you take a holistic approach it isn't a bad option…

Regardless, you are going to be adding/maintaining a larger stack of code.

The question is really HOW you want to be maintaining it for the future.

If you are installing packages, use package instead of rpm or apt. If you are checking for things based on OS version, you have to sanity check it. If your environment is that diverse, you have to go with what makes sense in the long haul.
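For example, a minimal sketch using the generic package module; the package name is just an example:

```yaml
# Works on both RHEL and Ubuntu families without distro branching,
# as long as the package name is the same on both.
- name: Install package regardless of distro family
  ansible.builtin.package:
    name: htop
    state: present
```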

Thank you for all the comments and advice.

I did consider using workflows, and in general that's not the worst idea; however, it requires every role to be a job template, which again might not be the end of the world but is also not always necessary. My biggest problem with this approach is the mess that workflows and slices produce when it comes to logging. In my example I have 11 roles to execute as part of a baseline, and this will only grow with time; each job is executed in 4 slices due to the size of the inventory. That means going through the logs to find information about a particular host across all jobs (of which there will be 44) is not going to be fun. 🙂

Still, I agree all your suggestions are viable options, and probably the best that can be done if there really is no way to just end a role rather than the play. That seems like pretty handy functionality to have. Another option would be a way to change the default behaviour of processing a list of roles, so that hosts for which a role failed/ended are not removed from the inventory (marked as hosts with errors) used by the next role. It would be even nicer to be able to run meta: clear_host_errors between stacked roles.

Once again thanks for replies.

Lukasz

In case anyone finds this helpful, I ended up doing the below. From initial testing it seems to do what I need, but further testing is required.

ignore_errors at the play level allows using end_host inside the roles, and the rescue block clears all failed hosts, making sure the next play targets the full inventory.

```yaml
- name: Play1
  hosts: all
  gather_facts: false
  ignore_errors: true
  tasks:
    - name: Stuff1
      block:
        - name: Include stuff1
          ansible.builtin.include_role:
            name: stuff
      rescue:
        # Reset failed hosts so the following play targets all of them.
        - name: Clear errors
          ansible.builtin.meta: clear_host_errors

- name: Play2
  hosts: all
  gather_facts: false
  ignore_errors: true
  tasks:
    - name: Stuff2
      block:
        - name: Include stuff2
          ansible.builtin.include_role:
            name: stuff2
          vars:
            var1: 'XXX'
      rescue:
        - name: Clear errors
          ansible.builtin.meta: clear_host_errors
```

Lukasz

Well found. Thank you for sharing back. This is a thing of beauty.

Kevin