I’ve hit an unexpected error handling failure. This looks like a reasonably severe bug to me, but I wanted to get a second set of eyes on it before opening a bug report. Consider the following reproducer:
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Create test_role
      file:
        path: roles/test_role/tasks
        state: directory

    - block:
        - name: Include a role that doesn't exist
          ansible.builtin.include_role:
            name: missing_role
      rescue:
        - name: Display status message
          ansible.builtin.debug:
            msg: In rescue block for test 1

    - block:
        - name: Include a task file that doesn't exist
          ansible.builtin.include_role:
            name: test_role
            tasks_from: missing_task_list
      rescue:
        - name: Display status message
          ansible.builtin.debug:
            msg: In rescue block for test 2
The first test, in which we attempt to include a role that doesn’t exist, fails as expected, and the failure is handled by the rescue block:
[ERROR]: the role 'missing_role' was not found in /home/lars/tmp/ansible-bug/roles:/home/lars/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles:/home/lars/tmp/ansible-bug
Origin: /home/lars/tmp/ansible-bug/playbook.yaml:12:19

10         - name: Include a role that doesn't exist
11           ansible.builtin.include_role:
12             name: missing_role
                     ^ column 19

localhost | FAILED! => {
    "changed": false,
    "reason": "the role 'missing_role' was not found in /home/lars/tmp/ansible-bug/roles:/home/lars/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles:/home/lars/tmp/ansible-bug"
}

localhost | SUCCESS => {
    "msg": "In rescue block for test 1"
}
However, the second test, in which the named role exists but the specified task list in the role is missing, bypasses the rescue block and causes the playbook to exit immediately:
localhost | SUCCESS => {
    "changed": false,
    "include_args": {
        "name": "test_role",
        "tasks_from": "missing_task_list"
    }
}
[ERROR]: Could not find specified file in role: tasks/missing_task_list
There doesn’t appear to be any way to properly handle this error. I wasn’t able to find a bug that looked like this after searching through the open issues.
If you explicitly state a tasks file that should be used, that tasks file must exist. It can, however, exist and be empty. This is an intended failure, not a bug.
This seems like a specious answer: the behavior is clearly different from that for roles, to which the same logic presumably applies.
We’re hitting this because we’re using roles with a predefined set of task files as an “api” to interact with some other services. Different drivers can be plugged in as long as they include the appropriate task files. If someone writes a driver that doesn’t implement the expected file, we would like to be able to recover from the error and clean up properly; the fact that Ansible simply aborts here without respecting the rescue block is a real problem, since it means we’re unable to clean up resources that were created earlier in the playbook.
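One possible workaround, sketched here under stated assumptions, is to check for the driver’s task file on the controller before calling `include_role`, and to raise an ordinary task failure with `ansible.builtin.fail` when it is missing, since a failure raised that way is handled by `rescue`. The `driver`, `action`, and `driver_task_file` variables are hypothetical, and the path check assumes driver roles live in a `roles/` directory next to the playbook and use the `.yml` extension, so it does not mirror Ansible’s full role search path or its handling of other extensions.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical names: the driver role and the "api" entry point to call.
    driver: test_role
    action: missing_task_list
    # Assumption: roles live in ./roles next to the playbook and task files
    # use the .yml extension. Ansible's real role search path is wider.
    driver_task_file: "{{ playbook_dir }}/roles/{{ driver }}/tasks/{{ action }}.yml"
  tasks:
    - block:
        # Jinja tests such as `file` are evaluated on the controller,
        # which is also where include_role loads the role's task files.
        - name: Fail (rescuably) if the driver does not implement the entry point
          ansible.builtin.fail:
            msg: "Driver {{ driver }} does not provide tasks/{{ action }}.yml"
          when: driver_task_file is not file

        - name: Call the driver's entry point
          ansible.builtin.include_role:
            name: "{{ driver }}"
            tasks_from: "{{ action }}"
      rescue:
        # Placeholder for the real cleanup of resources created earlier.
        - name: Clean up after the failed driver call
          ansible.builtin.debug:
            msg: "Cleaning up after failed call to {{ driver }}/{{ action }}"
```

With the values from the reproducer, the `fail` task should trigger and the rescue section should run. The trade-off is that the check duplicates part of Ansible’s own lookup logic, so it can disagree with `include_role` if roles are installed somewhere other than the assumed directory.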
A role does not actually require task files, and as a result, there is no single thing that makes it identifiable as a role.
It could have only vars (possibly not even in a main.yml), or only handlers. The overhead of all the additional file stats needed to determine “could this possibly be a role” has always been deemed too expensive. As a result, an existing role directory with nothing inside is actually a valid role. It just loads, because a role, as described above, requires nothing to make it a role. This is, I suppose, just a historical artifact at this point.
But if you explicitly say, use this task file, then you’ve indicated that there is something specific that needs to be loaded, and therefore that explicit configuration is required.
If the role is completely missing, you do get an error: at a minimum, the role name must exist as a directory in the path that roles are loaded from.
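Building on the point that an explicitly named task file must exist but may be empty, a driver author could ship a stub for each entry point the driver does not support. In the sketch below, with a hypothetical file and message, the stub fails explicitly rather than being left empty: a failure raised by a task inside an included role is handled by the caller’s `rescue` section, whereas an empty file would silently do nothing.

```yaml
# roles/test_role/tasks/missing_task_list.yml
# Hypothetical stub for an entry point this driver does not implement.
# An empty file would also load without error; failing explicitly instead
# lets the calling block's rescue section run its cleanup.
- name: Report unsupported entry point
  ansible.builtin.fail:
    msg: "test_role does not implement missing_task_list"
```

This only helps when every driver follows the convention; it does nothing for a driver that omits the file entirely, which is the case raised in this thread.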
You answered @felixfontein’s literal question, but that question slightly misses the mark. The answer boils down to “it’s an artifact of implementation.” We kind of knew that already.
A broader, but I think fair, interpretation of the question would be: what’s the design goal for rescuable errors? We’ve got an interesting pair of cases here which together look very much like inconsistent behavior to the user. It feels like they should behave the same way.
This novel technique is not one I can honestly get behind. The only surprising part is that the first case is rescuable at all. Do undefined things, expect undefined results. I wouldn’t suggest spending much effort “fixing” either of these cases beyond maybe making the first case explicitly unrescuable. Sometimes “Don’t do that!” is a reasonable response; this could well be one of those cases.
My point, though, to the extent that I have one, is that knowing the inner workings of software so well can sometimes blind us to the genuine confusion of users. Understanding the deep-rooted reasons for certain bugs doesn’t make them not bugs.