Really can't do this in Ansible?

Daniel_Mocsari · July 17, 2025, 6:27pm

Hello!

I’m seeking a way to write a playbook that fails securely if any issue arises on any host. I know these parameters:

max_fail_percentage: 0
any_errors_fatal: true

The challenge is ensuring logging and fact-saving still occur after a failure. Despite trying multiple solutions and checking documentation, it seems Ansible can’t achieve this. Is that true? The code shows my intent, but block/rescue/always ignore the parameters. I need a solution that stops for any error (critical for us) while allowing logging and fact-saving.

- block:
       - ansible.builtin.import_role:
          name: [random name here]
      rescue:
        - name: MAIN_PRINT_FAILED_TASK
          local_action: 
            module: ansible.builtin.fail
            args:
              msg: "[LOG] : [{{ lookup('pipe', 'date +%Y-%m-%dT%H:%M.%S') }}] : [FAILURE] : FAILED_TASK: {{ l_playbook_baseName }}/{{ ansible_failed_task.name }}. Exiting ..."
          run_once: True
          tags:
            - always
      always:
        - name: MAIN_SAVE_FACTS
          ansible.builtin.include_tasks:
            file: common_tasks/handle_facts.yml
            apply:
              tags:
                - always
        - name: "MAIN_END_PLAY"
          local_action: 
            module: ansible.builtin.debug
            args:          
              msg: "[LOG] : [{{ lookup('pipe', 'date +%Y-%m-%dT%H:%M.%S') }}] :  [ INFO ] End execution of {{ l_playbook_baseName }}"
          run_once: True 
          tags:
            - always

bcoca · July 17, 2025, 6:47pm

this is ‘possible’, several ways, one is the approach you took with block/rescue/always, another is to use a callback to save data. But since you don’t show the output nor explain what parameters are ignored, i cannot even guess what your issues are. But looking at your example (and assuming the wrong indentation is a paste issue), I can point out a few things:

fail/debug and other actions don’t use a connection, do NOT require local_action as they always execute on the controller. Check their ‘attributes’ in documentation.
There is no need to use args: with most actions, just use the options directly.
I’m not sure what you are trying to accomplish with the fail task in the rescue block, it seems what you really want is to log the failure, this is better done via a callback. Some like mail already record failures and send to a specified address, the same can be done to disk/system log/etc.

Daniel_Mocsari · July 17, 2025, 7:27pm

Thank you for the hints. I’m absolutely far for being an Ansible expert.
We have several roles, like the example what is imported and we cannot afford to fail any of our command on one host and keep the playbook run on the remaining ones. Therefore I’m looking for a solution which fail the entire playbook as soon as any error occurs, but saves the facts and push the error messages ( as you can see from my code ).

I’ve only read some pieces of information, that callbacks can be the solution, but I’m an Exadata architect, I have no skills on python. Can u tell me more about this possibility?

Also about " his is better done via a callback. Some like mail already record failures and send to a specified address, the same can be done to disk/system log/etc."

I share the output. I’m testing if a given RHEL repo is enabled or not ( this is just for testing purposes ). I set “enabled=0” on one of our hosts to see, if I can fail the entire playbook when an error occurs only on one host. As you can see from the output, the role keeps going to the next module ( and then finishes, because that’s the end of the test role ).

Here comes the content of the imported role:

- name: Ksplice_actions
  become: yes
  shell: dnf repoinfo ol8_x86_64_ksplice -v | grep -i ^Repo-status | grep enabled
  register: var_repo_status
  failed_when: var_repo_status.rc != 0

- name: Results_of_Ksplice_action
  debug: msg="{{ var_repo_status.stdout_lines }}"

- name: Get_Ksplice_update_version
  shell: dnf --showduplicates list uptrack-updates-$(uname -r) | sort --reverse | head -1 | awk '{print $1}'
  register: var_ksplice_version
    
- name: Show_Ksplice_version
  debug: msg="Ksplice version is {{ var_ksplice_version.stdout }}"

Output:
Results_of_Ksplice_action showing only 1 enabled ->expected, that is OK
Get_Ksplice_update_version → That shouldn’t run, by this time Ansible should have failed, because on one host, there is no enbaled repo.

TASK [OCI_POST_PATCH_VMCLUSTER_OS_DOWNTIME : Ksplice_actions] ****************************************************************************************************************************************************
Thursday 17 July 2025  21:19:58 +0200 (0:00:00.101)       0:00:00.252 *********
fatal: [44.144.212.92]: FAILED! => {"changed": true, "cmd": "dnf repoinfo ol8_x86_64_ksplice -v | grep -i ^Repo-status | grep enabled", "delta": "0:00:03.269648", "end": "2025-07-17 21:20:01.987667", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2025-07-17 21:19:58.718019", "stderr": "Last metadata expiration check: 1:29:04 ago on Thu 17 Jul 2025 07:50:55 PM CEST.", "stderr_lines": ["Last metadata expiration check: 1:29:04 ago on Thu 17 Jul 2025 07:50:55 PM CEST."], "stdout": "", "stdout_lines": []}
changed: [44.144.212.35]

TASK [OCI_POST_PATCH_VMCLUSTER_OS_DOWNTIME : Results_of_Ksplice_action] ******************************************************************************************************************************************
Thursday 17 July 2025  21:20:03 +0200 (0:00:05.009)       0:00:05.261 *********
ok: [44.144.212.35] => {
    "msg": [
        "Repo-status        : enabled"
    ]
}

TASK [OCI_POST_PATCH_VMCLUSTER_OS_DOWNTIME : Get_Ksplice_update_version] *****************************************************************************************************************************************
Thursday 17 July 2025  21:20:03 +0200 (0:00:00.089)       0:00:05.351 *********
changed: [44.144.212.35]

TASK [OCI_POST_PATCH_VMCLUSTER_OS_DOWNTIME : Show_Ksplice_version] ***********************************************************************************************************************************************
Thursday 17 July 2025  21:20:08 +0200 (0:00:05.122)       0:00:10.473 *********
ok: [44.144.212.35] => {
    "msg": "Ksplice version is uptrack-updates-5.4.17-2136.330.7.5.el8uek.x86_64.noarch"
}

TASK [MAIN_PRINT_FAILED_TASK] ************************************************************************************************************************************************************************************
Thursday 17 July 2025  21:20:08 +0200 (0:00:00.239)       0:00:10.713 *********
fatal: [44.144.212.92 -> localhost]: FAILED! => {"changed": false, "msg": "[LOG] : [2025-07-17T21:20.08] : [FAILURE] : FAILED_TASK: OCI_POST_PATCH_VMCLUSTER_OS_DOWNTIME.yaml/Ksplice_actions. Exiting ..."}

TASK [MAIN_SAVE_FACTS] *******************************************************************************************************************************************************************************************
Thursday 17 July 2025  21:20:08 +0200 (0:00:00.153)       0:00:10.866 *********
included: /home/k8ra9vk/patch_exadata_ansible/common_tasks/handle_facts.yml for 44.144.212.92

bcoca · July 17, 2025, 7:51pm

But i see nothing that would force all hosts to fail, but guessing you have something else on the play objects that contain the role, only guessing, but I think you want to use serial: 1 or throttle: 1 and/or meta: end_play. Also look into AAP/awx as that has it’s own logging/handling of errors and play data, I think that suits your use case.

Daniel_Mocsari · July 17, 2025, 8:11pm

You are right, I tried at least 4-5 playbooks to force the fail but didn’t work. I pasted the original playbook, so you can see what is the intention. I will paste another playbook tomorrow, so u can see the modofication, which didn’t work.

Played with serial and meta end play, didn’t work.

Daniel_Mocsari · July 18, 2025, 8:55am

Here is another snippet from one of my test. This one also didn’t work, the playbook kept running.
Commented out some other ideas, those one didn’t work either. ( don’t bother with impending, copy paste doesn’t really work everytime )

 - name: Import_role
      block:
        - name: Import critical role
          ansible.builtin.import_role:
            name: OCI_POST_PATCH_VMCLUSTER_OS_DOWNTIME
          #ignore_errors: true
          #register: critical_result

        #- name: Check if role failed
        #  ansible.builtin.set_fact:
        #    critical_failed: true
        #  when: critical_result.failed | default(false)
      rescue:
        - name: Set fail flag manually
          ansible.builtin.set_fact:
            critical_failed: true
         # when: critical_result.failed | default(false)
     
    - name: Fail_fast
      ansible.builtin.fail:
        msg: "Critical role failed, exiting playbook"
      when: critical_failed | default(false)
      run_once: true
      delegate_to: localhost
    
    #- name: End play if critical role failed 
    #  ansible.builtin.meta: end_play
    #  when: critical_failed
      

    - name: Save facts and print end message
      block:
        - name: Save facts
          ansible.builtin.include_tasks: common_tasks/handle_facts.yml

        - name: MAIN_PRINT_FAILED_TASK
          ansible.builtin.debug:
            msg: "[LOG] : [{{ lookup('pipe', 'date +%Y-%m-%dT%H:%M.%S') }}] : [ INFO ] End execution of  {{ l_playbook_baseName }}"

        - name: Print end message
          ansible.builtin.debug:
            msg: "[LOG] : [{{ lookup('pipe', 'date +%Y-%m-%dT%H:%M.%S') }}] : [FAILURE] : FAILED_TASK: {{ l_playbook_baseName }}/{{ ansible_failed_task.name }}. Exiting ..."
          #delegate_to: localhost

Daniel_Mocsari · July 18, 2025, 8:58am

The only solution I have is this one. This is working well, braking the run everytime a problem occurs on any host, but I’d like to avoid to use it everytime in all my playbooks. I’m looking for a nicer and cleaner code. (pls see commented section with iteration and loop)


- name: Ksplice_actions
  become: yes
#  block:
#    - name: Check_Ksplice_Repo
  shell: dnf repoinfo ol8_x86_64_ksplice -v | grep -i ^Repo-status | grep enabled
  register: var_repo_status
  failed_when: var_repo_status.rc != 0
#  changed_when: false
#     below 3 parameters are used to stop the entire playbook execution if the repo is not enabled
      #delegate_to: "{{ item }}"
      #run_once: true
      #loop: "{{ ansible_play_hosts }}"

shertel · July 21, 2025, 7:42pm

That would work with the free strategy, since the rescue section runs ASAP instead of in lockstep with other hosts.

It won’t work with the linear strategy, unless you set any_errors_fatal: True, since the rescue section runs in lockstep. This seems to do what you need:

- hosts: all
  gather_facts: no
  any_errors_fatal: True
  tasks:
    - block:
      - import_role:
          name: role_fails_sometimes
      always:
      - name: Do some logging regardless of failure
        run_once: True
        delegate_to: localhost
        debug: msg="TODO"

Keep in mind that rescue will clear host errors, which is why I used always instead in the example above.

You can also check for failed/unreachable hosts more directly, for example when: ansible_play_hosts != ansible_play_hosts_all.

Daniel_Mocsari · July 23, 2025, 6:53am

Thanks for your help!
I’ve solved the issue using a callback, but I’ll give your solution a try as well

system · August 22, 2025, 6:54am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
ansible didn't stop execution upon a task failure Ansible Project	5	35	December 3, 2015
I am missing something about playbook orchestration features? Ansible Project	5	7	September 18, 2012
hard failure or 'checkpoint' type task Ansible Project	2	13	April 8, 2013
Forcing a stop on failure Ansible Project	2	23	October 22, 2015
Exit Playbook When Any Play Fails Ansible Developer	2	43	August 11, 2016

Really can't do this in Ansible?

Related topics