Error handling using block/rescue

Hello guys!

I have the following role created:

  - name: Recover
    block:

      - name: Starts the first node of the cluster in bootstrap mode
        shell: /etc/init.d/mysql bootstrap-pxc
        when: inventory_hostname == groups.CD5525[0]
        become: yes
        register: return_out

      - name: Start the rest of nodes
        systemd:
          state: started
          name: mysql
        when: inventory_hostname != groups.CD5525[0]
        become: yes
        register: screen_out

      - name: Stop the bootstrapped node to restart it in normal state
        shell: service mysql bootstrap-stop
        when: inventory_hostname == groups.CD5525[0]
        become: yes
        register: screen_out

      - name: Start the first node in normal status
        systemd:
          state: started
          name: mysql
        when: inventory_hostname == groups.CD5525[0]
        become: yes
        register: screen_out

    rescue:

      - name: Print when errors
        debug:
          msg: "Found an error, can not continue ! {{ansible_failed_task}}"

    any_errors_fatal: true

I have three nodes.

I need to handle a failure on the first node, but the rescue section is not executed until I receive failures from all the nodes.

I went through the documentation and various posts but could not work out how this behaves with more than one node; perhaps I’m doing something wrong.

Your comments are welcome, thank you.

This is the output:

TASK [CD-5525 : Starts the first node of the cluster in bootstrap mode] ************************************************************
skipping: [xxx.xx.xxx.xx]
skipping: [xxx.xx.xxx.xx]
fatal: [xxx.xx.xxx.xx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": true, "cmd": "/etc/init.d/mysql bootstrap-pxc", "delta": "0:00:10.059969", "end": "2022-04-08 04:45:07.890197", "msg": "non-zero return code", "rc": 1, "start": "2022-04-08 04:44:57.830228", "stderr": "", "stderr_lines": [], "stdout": " * Bootstrapping Percona XtraDB Cluster database server mysqld\n * The server quit without updating PID file (/var/run/mysqld/mysqld.pid).\n …fail!", "stdout_lines": [" * Bootstrapping Percona XtraDB Cluster database server mysqld", " * The server quit without updating PID file (/var/run/mysqld/mysqld.pid).", " …fail!"]}

TASK [CD-5525 : Start the rest of nodes] *******************************************************************************************
fatal: [xxx.xx.xxx.xx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": false, "msg": "Unable to start service mysql: Job for mysql.service failed because the control process exited with error code.\nSee \"systemctl status mysql.service\" and \"journalctl -xe\" for details.\n"}
fatal: [xxx.xx.xxx.xx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": false, "msg": "Unable to start service mysql: Job for mysql.service failed because the control process exited with error code.\nSee \"systemctl status mysql.service\" and \"journalctl -xe\" for details.\n"}

TASK [CD-5525 : Print when errors] *************************************************************************************************
ok: [xxx.xx.xxx.xx] => {

Hello again.

I’m still trying to understand this issue …
I played with the example that comes with the Ansible documentation:

  - name: Attempt and graceful roll back demo
    block:

      - debug:
          msg: 'I execute normally'

      - name: i force a failure
        command: /bin/false
        when: inventory_hostname == groups.CD5525[0]  # <<<--- I added this line to force the failure only in the first node

      - debug:
          msg: 'I never execute, due to the above task failing, :-('

    rescue:

      - debug:
          msg: 'I caught an error'

      - name: i force a failure in middle of recovery! >:-)
        command: /bin/false

      - debug:
          msg: 'I also never execute :-('

    always:

      - debug:
          msg: "This always executes"

And this is the output:

ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK
BECOME password:

PLAY [Start MySql cluster databases] ***********************************************************************************************

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxx] => {
    "msg": "I execute normally"
}
ok: [xxx] => {
    "msg": "I execute normally"
}
ok: [xxx] => {
    "msg": "I execute normally"
}

TASK [CD-5525 : i force a failure] *************************************************************************************************
skipping: [xxx]
skipping: [xxx]
fatal: [xxxx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": true, "cmd": ["/bin/false"], "delta": "0:00:00.001826", "end": "2022-04-18 04:43:40.809305", "msg": "non-zero return code", "rc": 1, "start": "2022-04-18 04:43:40.807479", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxxx] => {
    "msg": "I never execute, due to the above task failing, :-("
}
ok: [xxx] => {
    "msg": "I never execute, due to the above task failing, :-("
}

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxx] => {
    "msg": "I caught an error"
}

TASK [CD-5525 : i force a failure in middle of recovery! >:-)] *********************************************************************
fatal: [xxx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": true, "cmd": ["/bin/false"], "delta": "0:00:00.001778", "end": "2022-04-18 04:43:44.077290", "msg": "non-zero return code", "rc": 1, "start": "2022-04-18 04:43:44.075512", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxx] => {
    "msg": "This always executes"
}
ok: [xxx] => {
    "msg": "This always executes"
}
ok: [xxx] => {
    "msg": "This always executes"
}

PLAY RECAP *************************************************************************************************************************
xxx : ok=3 changed=0 unreachable=0 failed=1 skipped=0 rescued=1 ignored=0
xxx : ok=3 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
xxx : ok=3 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

My understanding is that the rescue section should run as soon as the first error occurs, but the example above shows that this does not happen.
Could anyone explain where the issue is, please?

Appreciate your help.
Javier

[…] but the example above shows this does not happen.
What, exactly, in the posted output leads you to that conclusion?
What did you expect instead?
It appears to me to be exactly the expected output - in spite of your having changed all three host names to “xxx”. (Which, frankly, didn’t make understanding your output any easier; maybe use “xxx”, “yyy”, and “zzz” next time? And, name your tasks.)

Hello.

Thanks for your reply.
I changed the IPs as suggested; yes, it is easier to read.

What I see in the execution above is that, after the ‘i force a failure’ task, the next debug task is still executed. I understood that the rescue section should run immediately after the failure, before any other task.
Perhaps I misunderstood this part.

ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK
BECOME password:

PLAY [Start MySql cluster databases] ***********************************************************************************************

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxx] => {
    "msg": "I execute normally"
}
ok: [yyy] => {
    "msg": "I execute normally"
}
ok: [zzz] => {
    "msg": "I execute normally"
}

TASK [CD-5525 : i force a failure] *************************************************************************************************
skipping: [yyy]
skipping: [zzz]
fatal: [xxx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": true, "cmd": ["/bin/false"], "delta": "0:00:00.002018", "end": "2022-04-18 06:41:39.602367", "msg": "non-zero return code", "rc": 1, "start": "2022-04-18 06:41:39.600349", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [yyy] => {
    "msg": "I never execute, due to the above task failing, :-("
}
ok: [zzz] => {
    "msg": "I never execute, due to the above task failing, :-("
}

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxx] => {
    "msg": "I caught an error"
}

TASK [CD-5525 : i force a failure in middle of recovery! >:-)] *********************************************************************
fatal: [xxx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": true, "cmd": ["/bin/false"], "delta": "0:00:00.002136", "end": "2022-04-18 06:41:43.100661", "msg": "non-zero return code", "rc": 1, "start": "2022-04-18 06:41:43.098525", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

PLAY RECAP *************************************************************************************************************************
xxx : ok=2 changed=0 unreachable=0 failed=1 skipped=0 rescued=1 ignored=0
zzz : ok=2 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
yyy : ok=2 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

In my example, if the first task (Starts the first node …) fails, should it continue with the second task (Start the rest of nodes) instead of going to the rescue block?

    block:

      - name: Starts the first node of the cluster in bootstrap mode
        shell: /etc/init.d/mysql bootstrap-pxc
        when: inventory_hostname == groups.CD5525[0]
        become: yes
        register: return_out

      - name: Start the rest of nodes
        systemd:
          state: started
          name: mysql
        when: inventory_hostname != groups.CD5525[0]
        become: yes
        register: screen_out

Thank you
Javier

You may be reading it as you would a single-threaded program, but it is closer to a program with one thread per host. Your command line:

$ ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK

runs (apparently) on three hosts: xxx, yyy, and zzz. When it’s all over, the tasks that get executed would be exactly the same as if you had run these three commands:

$ ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK --limit=xxx

$ ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK --limit=yyy

$ ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK --limit=zzz

The order of each task on each node would be different of course, but the end result is the same.
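
To see the per-host behaviour for yourself, a minimal sketch along these lines may help. It assumes a play targeting the CD5525 group (the `hosts:` value and the task names are mine, not from your role) and relies on `ansible_failed_task`, which Ansible makes available inside rescue:

```yaml
# Sketch only: the play layout and task names are assumptions for illustration.
- name: Show that rescue runs per host
  hosts: CD5525
  gather_facts: false
  tasks:
    - name: Per-host error handling demo
      block:
        - name: Force a failure on the first host only
          ansible.builtin.command: /bin/false
          when: inventory_hostname == groups.CD5525[0]

        # Hosts that did not fail simply carry on inside the block.
        - name: Runs on every host that has not failed
          ansible.builtin.debug:
            msg: "{{ inventory_hostname }} is still in the block"

      rescue:
        # Only the host whose block task failed ever gets here.
        - name: Runs only on the host that failed
          ansible.builtin.debug:
            msg: "{{ inventory_hostname }} failed in '{{ ansible_failed_task.name }}'"
```

With named tasks that print `inventory_hostname`, it should be easier to see that the two healthy hosts continue through the block while only the failed host enters rescue.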

If the first task in your example (“Starts the first node of the cluster in bootstrap mode”) failed on one node, it would not affect the running of any tasks on other nodes. This is true regardless of these tasks being in a block. In fact, depending on the strategy you’re running under (see https://docs.ansible.com/ansible/latest/user_guide/playbooks_strategies.html) the other nodes could complete the entire block before the first node even starts.
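
If what you want is for the other nodes to stop as soon as the bootstrap fails on the first node, one possible approach is to make them check the first node's registered result explicitly. The following is only a sketch under some assumptions: the default linear strategy (so the bootstrap task has finished on the first node before the others evaluate the check), and the names `bootstrap_result` and the extra fail task are hypothetical additions of mine:

```yaml
# Sketch only: bootstrap_result and the fail task are illustrative assumptions.
- name: Recover
  block:
    - name: Starts the first node of the cluster in bootstrap mode
      ansible.builtin.shell: /etc/init.d/mysql bootstrap-pxc
      when: inventory_hostname == groups.CD5525[0]
      become: true
      register: bootstrap_result

    # The other nodes look up the first node's result through hostvars and
    # fail themselves, which sends them into their own rescue section.
    - name: Fail the remaining nodes when the bootstrap did not succeed
      ansible.builtin.fail:
        msg: "Bootstrap failed on {{ groups.CD5525[0] }}, not starting this node"
      when:
        - inventory_hostname != groups.CD5525[0]
        - hostvars[groups.CD5525[0]].bootstrap_result is failed

    - name: Start the rest of nodes
      ansible.builtin.systemd:
        state: started
        name: mysql
      when: inventory_hostname != groups.CD5525[0]
      become: true

  rescue:
    - name: Print when errors
      ansible.builtin.debug:
        msg: "Found an error on {{ inventory_hostname }} in '{{ ansible_failed_task.name }}'"
```

The point of the design is that a failure never crosses from one host to another on its own; each host has to look at the other host's result and decide to fail. Under a strategy such as free, the first node's result may not exist yet when the others check it, so this sketch would need rethinking there.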

Hello.

It seems clear now.

Appreciate your reply.

Regards
Javier