Error handling using block/rescue

Fco_Javier_Lopez · April 8, 2022, 9:49am

Hello guys!

I have the following role created:

- name: Recover
  block:
    - name: Starts the first node of the cluster in bootstrap mode
      shell: /etc/init.d/mysql bootstrap-pxc
      when: inventory_hostname == groups.CD5525[0]
      become: yes
      register: return_out

    - name: Start the rest of nodes
      systemd:
        state: started
        name: mysql
      when: inventory_hostname != groups.CD5525[0]
      become: yes
      register: screen_out

    - name: Stop the bootstrapped node to restart it in normal state
      shell: service mysql bootstrap-stop
      when: inventory_hostname == groups.CD5525[0]
      become: yes
      register: screen_out

    - name: Start the first node in normal status
      systemd:
        state: started
        name: mysql
      when: inventory_hostname == groups.CD5525[0]
      become: yes
      register: screen_out

  rescue:
    - name: Print when errors
      debug:
        msg: "Found an error, can not continue ! {{ansible_failed_task}}"

  any_errors_fatal: true

I have three nodes.

I need to handle a failure on the first node, but the rescue section is not executed until failures have been reported from all the nodes.

I went through the documentation and several posts but could not work out how this behaves with more than one node; perhaps I'm doing something wrong.

Your comments are welcome, thank you.

This is the output:

TASK [CD-5525 : Starts the first node of the cluster in bootstrap mode] ************************************************************
skipping: [xxx.xx.xxx.xx]
skipping: [xxx.xx.xxx.xx]
fatal: [xxx.xx.xxx.xx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": true, "cmd": "/etc/init.d/mysql bootstrap-pxc", "delta": "0:00:10.059969", "end": "2022-04-08 04:45:07.890197", "msg": "non-zero return code", "rc": 1, "start": "2022-04-08 04:44:57.830228", "stderr": "", "stderr_lines": [], "stdout": " * Bootstrapping Percona XtraDB Cluster database server mysqld\n * The server quit without updating PID file (/var/run/mysqld/mysqld.pid).\n …fail!", "stdout_lines": [" * Bootstrapping Percona XtraDB Cluster database server mysqld", " * The server quit without updating PID file (/var/run/mysqld/mysqld.pid).", " …fail!"]}

TASK [CD-5525 : Start the rest of nodes] *******************************************************************************************
fatal: [xxx.xx.xxx.xx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": false, "msg": "Unable to start service mysql: Job for mysql.service failed because the control process exited with error code.\nSee \"systemctl status mysql.service\" and \"journalctl -xe\" for details.\n"}
fatal: [xxx.xx.xxx.xx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": false, "msg": "Unable to start service mysql: Job for mysql.service failed because the control process exited with error code.\nSee \"systemctl status mysql.service\" and \"journalctl -xe\" for details.\n"}

TASK [CD-5525 : Print when errors] *************************************************************************************************
ok: [xxx.xx.xxx.xx] => {
…
…

Fco_Javier_Lopez · April 18, 2022, 9:01am

Hello again.

I’m still trying to understand this issue …
I played with the example that comes with the Ansible documentation:

- name: Attempt and graceful roll back demo
  block:
    - debug:
        msg: 'I execute normally'
    - name: i force a failure
      command: /bin/false
      when: inventory_hostname == groups.CD5525[0]  # <<<--- I added this line to force the failure only in the first node
    - debug:
        msg: 'I never execute, due to the above task failing, :-('
  rescue:
    - debug:
        msg: 'I caught an error'
    - name: i force a failure in middle of recovery! >:-)
      command: /bin/false
    - debug:
        msg: 'I also never execute :-('
  always:
    - debug:
        msg: "This always executes"

And this is the output:

ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK
BECOME password:

PLAY [Start MySql cluster databases] ***********************************************************************************************

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxx] => {
    "msg": "I execute normally"
}
ok: [xxx] => {
    "msg": "I execute normally"
}
ok: [xxx] => {
    "msg": "I execute normally"
}

TASK [CD-5525 : i force a failure] *************************************************************************************************
skipping: [xxx]
skipping: [xxx]
fatal: [xxxx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": true, "cmd": ["/bin/false"], "delta": "0:00:00.001826", "end": "2022-04-18 04:43:40.809305", "msg": "non-zero return code", "rc": 1, "start": "2022-04-18 04:43:40.807479", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxxx] => {
    "msg": "I never execute, due to the above task failing, :-("
}
ok: [xxx] => {
    "msg": "I never execute, due to the above task failing, :-("
}

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxx] => {
    "msg": "I caught an error"
}

TASK [CD-5525 : i force a failure in middle of recovery! >:-)] *********************************************************************
fatal: [xxx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": true, "cmd": ["/bin/false"], "delta": "0:00:00.001778", "end": "2022-04-18 04:43:44.077290", "msg": "non-zero return code", "rc": 1, "start": "2022-04-18 04:43:44.075512", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxx] => {
    "msg": "This always executes"
}
ok: [xxx] => {
    "msg": "This always executes"
}
ok: [xxx] => {
    "msg": "This always executes"
}

PLAY RECAP *************************************************************************************************************************
xxx : ok=3 changed=0 unreachable=0 failed=1 skipped=0 rescued=1 ignored=0
xxx : ok=3 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
xxx : ok=3 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

My understanding is that as soon as the first error occurs, the rescue section should be executed, but the example above shows this does not happen.
Could anyone please explain where the issue is?

Appreciate your help.
Javier

utoddl · April 18, 2022, 10:14am

[…] but the example above shows this does not happen.
What, exactly, in the posted output leads you to that conclusion?
What did you expect instead?
It appears to me to be exactly the expected output - in spite of your having changed all three host names to “xxx”. (Which, frankly, didn’t make understanding your output any easier; maybe use “xxx”, “yyy”, and “zzz” next time? And, name your tasks.)

Fco_Javier_Lopez · April 18, 2022, 10:58am

Hello.

Thanks for your reply.
I changed the IPs as suggested. Yes, it is easier to read.

What I see from the execution above is that, after the 'i force a failure' task, the next debug task is also executed. I understood that the rescue section should run immediately after the failure, before any other task.
Perhaps I misunderstood this part.

ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK
BECOME password:

PLAY [Start MySql cluster databases] ***********************************************************************************************

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxx] => {
    "msg": "I execute normally"
}
ok: [yyy] => {
    "msg": "I execute normally"
}
ok: [zzz] => {
    "msg": "I execute normally"
}

TASK [CD-5525 : i force a failure] *************************************************************************************************
skipping: [yyy]
skipping: [zzz]
fatal: [xxx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": true, "cmd": ["/bin/false"], "delta": "0:00:00.002018", "end": "2022-04-18 06:41:39.602367", "msg": "non-zero return code", "rc": 1, "start": "2022-04-18 06:41:39.600349", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [yyy] => {
    "msg": "I never execute, due to the above task failing, :-("
}
ok: [zzz] => {
    "msg": "I never execute, due to the above task failing, :-("
}

TASK [CD-5525 : debug] *************************************************************************************************************
ok: [xxx] => {
    "msg": "I caught an error"
}

TASK [CD-5525 : i force a failure in middle of recovery! >:-)] *********************************************************************
fatal: [xxx]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": true, "cmd": ["/bin/false"], "delta": "0:00:00.002136", "end": "2022-04-18 06:41:43.100661", "msg": "non-zero return code", "rc": 1, "start": "2022-04-18 06:41:43.098525", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

PLAY RECAP *************************************************************************************************************************
xxx : ok=2 changed=0 unreachable=0 failed=1 skipped=0 rescued=1 ignored=0
zzz : ok=2 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
yyy : ok=2 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

In my example, if the first task (Starts the first node …) fails, should it continue with the second task (Start the rest of nodes) instead of going to the rescue block?

block:
  - name: Starts the first node of the cluster in bootstrap mode
    shell: /etc/init.d/mysql bootstrap-pxc
    when: inventory_hostname == groups.CD5525[0]
    become: yes
    register: return_out

  - name: Start the rest of nodes
    systemd:
      state: started
      name: mysql
    when: inventory_hostname != groups.CD5525[0]
    become: yes
    register: screen_out

Thank you
Javier

utoddl · April 18, 2022, 2:10pm

You may be reading it as you would a single-threaded program, but it is closer to a program with a thread per host. Your command line:

$ ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK

runs (apparently) on three hosts: xxx, yyy, and zzz. When it's all over, the tasks that get executed would be exactly the same as if you had run these three commands:

$ ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK --limit=xxx

$ ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK --limit=yyy

$ ansible-playbook -i ./environments/CD-5525/hosts.yml main_CD-5525.yml -bK --limit=zzz

The order of each task on each node would be different of course, but the end result is the same.
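
To make the per-host view concrete, here is the demo block from earlier in the thread again, with comments added for each host. The comments are only a reading of the posted output for xxx, yyy and zzz (they are not produced by Ansible), and the display order assumes the default linear strategy.

```yaml
# Annotated copy of the demo block. Comments describe which host does what,
# based on the output posted in this thread.
- name: Attempt and graceful roll back demo
  block:
    # xxx, yyy, zzz: all three run this task.
    - debug:
        msg: 'I execute normally'

    # xxx: fails here and leaves the block for its rescue section.
    # yyy, zzz: skipped by the "when" condition, so nothing fails for them.
    - name: i force a failure
      command: /bin/false
      when: inventory_hostname == groups.CD5525[0]

    # yyy, zzz: they never failed, so they simply continue with the block.
    # xxx: never runs this task.
    - debug:
        msg: 'I never execute, due to the above task failing, :-('
  rescue:
    # xxx only: rescue is entered per host, by the host that failed.
    - debug:
        msg: 'I caught an error'
    # xxx only: fails again, so xxx ends with failed=1 and rescued=1.
    - name: i force a failure in middle of recovery! >:-)
      command: /bin/false
    # Nobody: xxx stopped at the task above, yyy and zzz never entered rescue.
    - debug:
        msg: 'I also never execute :-('
  always:
    # Every host that entered the block runs this, failed or not.
    - debug:
        msg: "This always executes"
```

Read one host at a time, xxx does go straight from the failed task to rescue. The debug task printed in between belongs to yyy and zzz.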

If the first task in your example (“Starts the first node of the cluster in bootstrap mode”) failed on one node, it would not affect the running of any tasks on other nodes. This is true regardless of these tasks being in a block. In fact, depending on the strategy you’re running under (see https://docs.ansible.com/ansible/latest/user_guide/playbooks_strategies.html) the other nodes could complete the entire block before the first node even starts.
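
If the goal is for the other nodes not to start at all when the bootstrap fails, one possible approach is to split the work into two plays, since a play whose hosts have all failed normally ends the run. This is a minimal sketch and not tested against this cluster: the play layout, the play names and the extra `fail` task are assumptions made for illustration, and `CD5525[0]` / `CD5525[1:]` rely on Ansible's group subscript host patterns. It also uses `ansible_failed_task.name` instead of the whole task object in the message.

```yaml
# Sketch only. Play names, the "fail" task and the split into two plays are
# assumptions; the module calls are taken from the original role.
- name: Bootstrap the first node
  hosts: "CD5525[0]"        # first host of the group only
  become: yes
  tasks:
    - name: Recover
      block:
        - name: Starts the first node of the cluster in bootstrap mode
          shell: /etc/init.d/mysql bootstrap-pxc
      rescue:
        - name: Print when errors
          debug:
            msg: "Found an error, can not continue ! {{ ansible_failed_task.name }}"
        # A rescued host counts as recovered, so fail it again on purpose.
        - name: Fail the host again so the run stops here
          fail:
            msg: "Bootstrap failed on the first node"

- name: Start the rest of nodes
  hosts: "CD5525[1:]"       # every host of the group except the first
  become: yes
  tasks:
    - name: Start the rest of nodes
      systemd:
        state: started
        name: mysql
```

With this layout the second play is only reached when the first one finished without a failed host, so the per-host independence described above no longer matters for the bootstrap step.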

Fco_Javier_Lopez · April 18, 2022, 3:05pm

Hello.

It seems clear now.

Appreciate your reply.

Regards
Javier
