Restart service, check that the port is ready to accept connections, then move to the next host

I need to restart a service and then check that it is ready to accept connections, because it takes time to come up. Only once we are sure it is listening on its port should the play move to the next host. Otherwise it must not move on, because we can only afford to have one service down at a time.

Is there a shorthand or Ansible-native way to handle this using an Ansible module?

code:

- name: Restart zookeeper followers
  throttle: 1
  any_errors_fatal: true
  shell: |
    systemctl restart {{ zookeeper_service_name }}
    timeout 22 sh -c 'until nc localhost {{ zookeeper_server_port }}; do sleep 1; done'
  when: not zkmode.stdout_lines is search('leader')

I’d suggest reading up on rolling updates using serial:

https://docs.ansible.com/ansible/latest/playbook_guide/guide_rolling_upgrade.html#the-rolling-upgrade
https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-batch-size-with-serial

You can use wait_for or wait_for_connection to ensure service availability before continuing:

https://docs.ansible.com/ansible/latest/collections/ansible/builtin/wait_for_module.html
https://docs.ansible.com/ansible/latest/collections/ansible/builtin/wait_for_connection_module.html
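
As a rough sketch of that suggestion, the restart and the readiness check could be two builtin tasks. The variable names are taken from the question; the 30-second timeout is an assumption to tune. Note that in a normal play each task still runs across all hosts before the next task starts, so this pair relies on serial: 1 at the play level to get the restart-then-check order on each host.

```yaml
- name: Restart zookeeper
  ansible.builtin.service:
    name: "{{ zookeeper_service_name }}"
    state: restarted

- name: Wait until the client port accepts connections
  ansible.builtin.wait_for:
    host: localhost          # checked from the managed host itself
    port: "{{ zookeeper_server_port }}"
    state: started
    timeout: 30              # assumption: adjust to the service's start-up time
```

If the port does not open within the timeout, wait_for fails the task for that host.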

Hello Will,

I have used throttle, so that part is sorted. But I don’t think wait_for works here. For example:
task 1: restart <- by the end of this task, all hosts have already been restarted one by one
task 2: wait_for <- this will fail if the port does not come up, but that is no use because the restarts have already been triggered

We just want a single task that restarts the service, checks it, and aborts the play if the check fails. That’s it. We now get that result, but only by using the shell module.

I don’t entirely understand your approach, constraints or end-to-end requirements here, but trying to read between the lines…

  1. You have a cluster of zookeeper nodes (presumably 2n+1 so 3, 5 or more nodes)

  2. You want to do a rolling restart of these nodes 1 at a time, wait for the node to come back up, check it’s functioning, and if that doesn’t work, fail the run

  3. With your existing approach you can limit the restart of a service using throttle at the task level, but then don’t know how to handle failure in a subsequent task

  4. You don’t think wait_for will work because you only throttle on the restart task

(Essentially you want your condition “has the service restarted successfully” to be in the task itself.)

Again some thoughts that might help you work through this…

  1. Any reason you couldn’t just use serial at a playbook level? If so, what is that?

  2. If you must throttle rather than serial, consider using it in a block along with a failed_when

  3. Try and avoid using shell and use builtin constructs like service, it’ll save you longer term pain

Read through the links I posted earlier and explain what might stop you using the documented approach.

This post from Vladimir on Superuser might be useful too: https://superuser.com/questions/1664197/ansible-keyword-throttle (loads of other 2n+1 rolling update/restart examples out there too: https://stackoverflow.com/questions/62378317/ansible-rolling-restart-multi-cluster-environment)

Edit: s/along with a failed_when/along with wait_for/

Let me try with block and serial and get back to you

Hello Will,

I tried to do it with block and serial, but it does not work; it says a block can’t have serial.

tasks:
  - name: block check
    block:
      - name: run this shell
        shell: 'systemctl restart "{{ zookeeper_service_name }}"'
      - name: debug
        debug:
          msg: "running my task"
      - name: now run this task
        shell: timeout -k 3 1m sh -c 'until nc -zv localhost {{ hostvars[inventory_hostname].zk_port }}; do sleep 1; done'
    when:
      - not zkmode is search('leader')
    serial: 1

I think you’ve misunderstood what I suggested. (Or I’ve explained it poorly.)

If you use serial, you wouldn’t need a block necessarily as you’d be executing over the inventory hosts one-at-a-time.

If you insist on sticking with throttle, try it with a block in order to group your service restart and service availability check.

I strongly suggest taking the time to read the rolling update example that’s already documented, understand it, and then think about how to apply it to what you’re trying to achieve.

OK, my requirement is exactly the same as this one:

https://stackoverflow.com/questions/64048208/run-ansible-tasks-on-hosts-one-by-one

A list of tasks needs to be run, in order, on a single host at a time.

That’s correct; serial is not a task or block keyword. It’s a play-level keyword, set alongside hosts.

- name: One host at a time
  hosts: ducks_in_a_row
  serial: 1
  max_fail_percentage: 0
  tasks:
    - task1
    - task2
    - task3

Read up on serial and max_fail_percentage. Blocks don’t come into it.
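
To make the placeholder tasks concrete, here is a minimal sketch of what such a play might look like for the restart-and-check case. The group name zookeeper and the 60-second timeout are assumptions; the variable names come from the original question. The follower-only when condition from the question is left out for brevity and could be added to both tasks.

```yaml
- name: Rolling restart of zookeeper, one host at a time
  hosts: zookeeper            # assumption: your inventory group name
  serial: 1                   # finish every task on one host before the next
  max_fail_percentage: 0      # stop the play if any host in a batch fails
  tasks:
    - name: Restart zookeeper
      ansible.builtin.service:
        name: "{{ zookeeper_service_name }}"
        state: restarted

    - name: Wait for the client port before moving to the next host
      ansible.builtin.wait_for:
        port: "{{ zookeeper_server_port }}"
        timeout: 60           # assumption: tune to the service's start-up time
```

Because the batch size is one, a failed wait_for on any host ends the play before the next host is restarted.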

Hello Todd,

I tried serial and it works, but my problem is that serial only works at the playbook level. When I put import_playbook inside zookeeper.yaml, which is pulled in with include_tasks, it fails with an error saying you can’t import a playbook inside a task.
How do I do it, then?

Let me explain how I am running this. I have created a role, prometheus, which you can find in my personal public repo below. The role has the usual main.yml, which includes the task files, and I have created Restartandcheck.yml, which I am unable to use because of the import_playbook error I get if I reference it from the zookeeper.yml file.

https://github.com/sameergithub5/prometheusrole/tree/main/prometheus
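
For context, import_playbook is only valid at the top level of a playbook, so it cannot be called from a task file. One possible way around this, sketched under the assumption that Restartandcheck.yml is rewritten as a plain list of tasks (no hosts: or serial: lines) in the role’s tasks/ directory, is to keep serial on a dedicated play and pull the tasks in with include_role and tasks_from. The group name zookeeper is a placeholder.

```yaml
- name: Rolling restart, one host at a time
  hosts: zookeeper                  # assumption: inventory group name
  serial: 1
  max_fail_percentage: 0
  tasks:
    - name: Run only the restart-and-check tasks from the role
      ansible.builtin.include_role:
        name: prometheus
        tasks_from: Restartandcheck.yml   # assumes a task file, not a playbook
```

This keeps the one-host-at-a-time behaviour in the play while the restart logic stays inside the role.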

Hello Sameer,
My two cents, after a quick look at your repo:
I would suggest refactoring your repo to use roles.
You have three different playbooks referenced in main.yml, which do more or less the same job.
Create a role, ‘enable prometheus’, that is dynamic enough to make decisions based on input variables (zookeeper, Kafka, …).
Add one tiny role to restart the services (if needed).
Outcome: a single playbook, one prometheus role, one service management (restart) role, DRY (don’t repeat yourself) code, and reusability.

On Thursday, 9 November 2023 at 17:29:28 UTC+1, Sameer Modak wrote:

Thanks a lot Zdenek.

I get it now. I have taken your comments on board and converted this to something closer to what you described.

https://github.com/sameergithub5/prometheusrole/tree/main/node_exporter_and_prometheus_jmx_exporter

Can you please check whether there is room for improvement?

Yeah, it’s getting better :)
Have a look at the diff from my fork for an idea of what could work:
https://github.com/sameergithub5/prometheusrole/pull/1

It is still pretty raw, but it contains the idea.

On Sunday, 19 November 2023 at 8:54:59 UTC+1, Sameer Modak wrote:

Zdenek-
Quick question on your pull request; I’m possibly missing the obvious. I see you use loop_control to set the outer loop variable on the roles. My understanding is that the roles would be a different namespace for the loops, and so would not interfere with the {{ item }} of the controlling loop. Was this for clarity, or am I missing a namespace conflict?

Hi Evan,
The loop_control part already came from Sameer; I just kept it, as I didn’t want to add another level of complexity.
But in general I use loop_control pretty often, especially in deeper structures and to improve readability, e.g.:

  • input vars structure

Regarding the interference topic, this is what looping over the role without loop_var could look like. We can all see it’s going to go KABOOM :)

  • main.yml
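
For readers without the pull request open, here is a hypothetical sketch of the pattern under discussion: looping over a role with an explicit loop_var, so that tasks inside the role remain free to use item in their own loops. The role name exporter and the list values are illustrative assumptions, not the contents of the pull request.

```yaml
- name: Apply the role once per exporter
  hosts: all
  tasks:
    - name: Include the role for each exporter
      ansible.builtin.include_role:
        name: exporter                 # hypothetical role name
      loop:
        - node_exporter
        - jmx_exporter
      loop_control:
        loop_var: exporter_name        # outer variable; inner loops keep item
```

Without loop_var, an inner loop in the role would reuse item, and Ansible warns that the loop variable is already in use.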