I want to restart a service and then check that it is ready to accept connections, because it takes time to come up. Only once we are sure it is listening on its port should the play move to the next host; otherwise it should stop, because we can only afford to have one service down at a time.
Is there a shorthand or Ansible-native way to handle this using an Ansible module?
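- name: Restart zookeeper followers
  throttle: 1
  any_errors_fatal: true
  shell: |
    systemctl restart {{ zookeeper_service_name }}
    # Poll for up to 22 seconds until the port accepts connections (-z: probe only, send no data)
    timeout 22 sh -c 'until nc -z localhost {{ zookeeper_server_port }}; do sleep 1; done'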
I have used throttle, so that part is sorted. But I don't think wait_for works here. For example:
task 1, restart: by the end of this task it has already restarted all hosts one by one.
task 2, wait_for: this will fail if the port does not come up, but that is no use because the restarts have already been triggered.
We just want a single task that restarts the service, checks it, and aborts the play if the check fails. We now get that result, but only by using the shell module.
I don’t entirely understand your approach, constraints or end-to-end requirements here, but trying to read between the lines…
You have a cluster of zookeeper nodes (presumably 2n+1 so 3, 5 or more nodes)
You want to do a rolling restart of these nodes 1 at a time, wait for the node to come back up, check it’s functioning, and if that doesn’t work, fail the run
With your existing approach you can limit the restart of a service using throttle at the task level, but then don’t know how to handle failure in a subsequent task
You don’t think wait_for will work because you only throttle on the restart task
(Essentially you want your condition “has the service restarted successfully” to be in the task itself.)
Again some thoughts that might help you work through this…
Any reason you couldn’t just use serial at a playbook level? If so, what is that?
If you must throttle rather than serial, consider using it in a block along with a failed_when
Try and avoid using shell and use builtin constructs like service, it’ll save you longer term pain
Read through the links I posted earlier and explain what might stop you using the documented approach.
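To make the serial suggestion concrete, here is a minimal sketch of a play that handles one host per batch. The inventory group name `zookeeper` and the use of `become` are assumptions; the variable names and the 22-second timeout are taken from the task posted above. It is not a tested configuration for this cluster.

```yaml
# Hypothetical playbook: rolling restart, one host per batch
- name: Rolling restart of zookeeper nodes
  hosts: zookeeper            # assumed inventory group name
  serial: 1                   # finish all tasks on one host before starting the next
  any_errors_fatal: true      # abort the whole play on the first failure
  become: true                # assumed: restarting the service needs privileges
  tasks:
    - name: Restart zookeeper
      ansible.builtin.systemd:
        name: "{{ zookeeper_service_name }}"
        state: restarted

    - name: Wait for the client port to accept connections
      ansible.builtin.wait_for:
        host: localhost
        port: "{{ zookeeper_server_port }}"
        timeout: 22
```

Because serial applies to the whole play, the wait_for check runs on a host before the next host is restarted, which is the ordering a task-level throttle alone does not give you.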
I think you’ve misunderstood what I suggested. (Or I’ve explained it poorly.)
If you use serial, you wouldn’t need a block necessarily as you’d be executing over the inventory hosts one-at-a-time.
If you insist on sticking with throttle, try it with a block in order to group your service restart and service availability check.
I strongly recommend taking the time to read the rolling update example that’s already documented, understanding it, and then thinking about how to apply that to what you’re trying to achieve.
I tried serial and it works, but my problem is that serial only works at the playbook level. So when I write import_playbook inside zookeeper.yaml, which is pulled in through include_task, it fails saying you can't import a playbook inside a task.
Now, how do I do it then?
OK, so let me explain how I am running this. I have created a role, prometheus, which you can find in my personal public repo below. The role has its usual main.yml, which includes tasks, and I have created Restartandcheck.yml, which I am unable to use because of the import_playbook error if I put it in the zookeeper.yml file.
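For context, import_playbook is a play-level statement, so it can only appear at the top level of a playbook, not inside a role's task file. A rough sketch of where it could sit instead is below; the file name `site.yml` and the host group `zookeeper` are assumptions, while `prometheus` and `Restartandcheck.yml` are the names mentioned above.

```yaml
# site.yml (hypothetical top-level playbook)
- name: Apply the prometheus role
  hosts: zookeeper            # assumed inventory group name
  roles:
    - prometheus

# import_playbook sits alongside plays, never inside tasks
- name: Rolling restart and check
  ansible.builtin.import_playbook: Restartandcheck.yml
```

With this layout, Restartandcheck.yml is itself a playbook and can carry its own `serial: 1` setting.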
Hello Sameer,
My two cents here, as I had a quick look at your repo.
I would suggest to refactor your repo to use roles.
You have three different playbooks referenced in main.yml, which are doing more or less the same job.
Create a role ‘enable prometheus’ which is dynamic enough to make decisions based on input variables (zookeeper, Kafka, …)
And one tiny role to restart the services (if needed).
Outcome: a single playbook, one prometheus role, one service management (restart) role, DRY (don't repeat yourself) code, and everything re-usable.
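As a rough illustration of that layout, a single playbook could call both roles and pass the target service in as a variable. All role names, variable names, and the host group below are hypothetical and are not taken from the repo; only `zookeeper_service_name` and `zookeeper_server_port` come from earlier in the thread.

```yaml
# playbook.yml (hypothetical file name)
- name: Enable Prometheus monitoring and restart services one host at a time
  hosts: zookeeper                       # assumed inventory group name
  serial: 1
  roles:
    - role: enable_prometheus            # hypothetical role name
      vars:
        prometheus_target: zookeeper     # hypothetical variable selecting the service type
    - role: service_restart              # hypothetical role name
      vars:
        service_restart_name: "{{ zookeeper_service_name }}"
        service_restart_port: "{{ zookeeper_server_port }}"
```

The same playbook could then be reused for Kafka by changing the host group and the variables rather than copying the tasks.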
On Thursday, 9 November 2023 at 17:29:28 UTC+1, Sameer Modak wrote:
Quick question on your pull request, possibly missing the obvious. I see you use loop_control to set the outer loop variable on the roles. My understanding is that the roles would be a different namespace for the loops, and so would not interfere with the {{ item }} for the control loop. Was this for clarity, or am I missing something with a namespace conflict?
Hi Evan,
The loop_control part already came from Sameer; I just kept it as I didn't want to bring in another level of complexity.
But in general, I use loop_control pretty often, especially in some deeper structures and to enforce readability, e.g.: