Patching cluster servers

Hello everyone!

I am using Ansible to patch Linux and Windows servers. It is working well to patch servers that are not part of a cluster. But I can’t make it work to patch clusters.

These are my requirements:

  • Start patching all servers that are not part of any cluster simultaneously.
  • On every cluster, start patching N servers simultaneously, then patch the remaining servers of each cluster once its first N servers have finished patching successfully.

I’ve tried to accomplish this by grouping hosts by cluster. I used a group named ‘single’ for all servers that are not part of any cluster. I then used multiple ansible.builtin.import_playbook statements in a main playbook. One import handles the ‘single’ servers using the free strategy. I also added one extra import for each cluster; for those, the strategy is linear with serial set to N and then 100%, so that N servers are patched simultaneously first and, once those have been patched successfully, the remaining servers are patched simultaneously.
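For reference, each per-cluster play I describe looks roughly like the sketch below. The group name cluster_a, the role name patching and the batch size of 2 are placeholders rather than my real names, and any_errors_fatal is one way (an assumption on my part, not the only option) to stop the play when a server in a batch fails.

```yaml
# Sketch only: "cluster_a", "patching" and the batch size 2 are placeholders.
- name: Patch one cluster in two batches
  hosts: cluster_a
  strategy: linear
  serial:
    - 2        # first batch: N servers
    - "100%"   # second batch: all remaining servers in the cluster
  any_errors_fatal: true   # a failure in a batch stops the play for this cluster
  roles:
    - patching
```

With this layout the second batch only starts if every server in the first batch finished without a fatal error.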

The problem is that I can’t set up ansible.builtin.import_playbook to run asynchronously. It waits for the ‘single’ servers to finish completely before it starts running the cluster imports.

Is there a way to accomplish this with Ansible?

Does anyone have any other solution to patch servers on multiple clusters using Ansible?

Thank you!

So let’s imagine there are three groups based on your description:

  • no_cluster
  • A_cluster
  • B_cluster

So if you want no_cluster and A_cluster to run at the same time, you would just do this:

- name: wait for all nodes to have SSH reachability
  hosts: "no_cluster:A_cluster"
  tasks:
    # placeholder task; replace with the real tasks
    - name: blahblahblah
      ansible.builtin.debug:
        msg: "blahblahblah"

Then, immediately below this play, you just have a second play:

- name: wait for all nodes to have SSH reachability
  hosts: "B_cluster"
  tasks:
    # placeholder task; replace with the real tasks
    - name: blahblahblah
      ansible.builtin.debug:
        msg: "blahblahblah"

By default, Ansible runs each play against all of that play’s hosts in parallel, and the plays themselves run one after another, so this “should” get you the behavior you want in the simplest “Ansible” way I can think of.

There are also ways to control the strategy in Ansible; see “Controlling playbook execution: strategies and more” in the Ansible documentation.

but I don’t think you need to do this if I understand your use-case. You can also use async… but I highly recommend against this.

Finally… I have been trying to solve your problem with command-line Ansible, but creating parallel automation jobs is something that automation controller (the product that is part of Ansible Automation Platform) or the upstream project AWX can do really well. You would basically use a workflow here.

Thank you for your message Sean.

Adding two groups to the same play would make them all use the same strategy, right?

By doing that, I would not be able to use the free strategy for no_cluster and the linear strategy for A_cluster.

I have a role with tasks to make checks, create snapshots, do the patching and post logs.

I think that to achieve what I need I should start three calls to the same role simultaneously:

  • no_cluster with free strategy
  • A_cluster with linear strategy considering only hosts of this group
  • B_cluster with linear strategy considering only hosts of this group

This way, patching of all groups would start simultaneously, but no_cluster would patch all of its servers at the same time, while A_cluster and B_cluster would each patch N servers simultaneously and then the remaining servers once the first set is done.

I don’t think you need to set a strategy at all: by default, Ansible runs each task on all hosts in the play in parallel. I probably shouldn’t have linked this, because in your specific situation it is not needed.

I also think you misunderstood my A_cluster and B_cluster. I didn’t intend to show two different clusters, but part A and part B of the same cluster. Does that make sense?

If you have 3 groups of servers, you have “servers that don’t belong to any cluster”, so they can be patched simultaneously with “half” of the servers that are in the cluster. This is what I “think” you want. You would do what I showed above, just understand that A_cluster is just part A of your cluster. I think names are hard…

Hello Sean,

Yes, this is what I am trying to do:

If you have 3 groups of servers, you have “servers that don’t belong to any cluster”, so they can be patched simultaneously with “half” of the servers that are in the cluster. This is what I “think” you want. You would do what I showed above, just understand that A_cluster is just part A of your cluster. I think names are hard…

The only thing is that there are multiple clusters to be patched, and I need the installation on each cluster to stop if there is any error on the servers in the first batch of that cluster.

To achieve that, I think it would be easier to have one linear-strategy execution for each cluster. I would like to do this with a single Ansible run, but I don’t think it is possible to have what I described: 1 free execution and N linear executions, all starting at the same time.

When using consecutive plays like you suggested, the servers in the second batch would need to know all the servers that are part of the same cluster and check the patching results of those servers, so they can throw an exception if there were errors on any of them.
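To illustrate what I mean, here is a rough sketch of such a check. The group names cluster1_A and cluster1_B and the fact name patch_ok are hypothetical, and the sketch assumes the first-batch play records patch_ok with set_fact on each server that patched successfully (facts set that way stay available to later plays in the same run).

```yaml
# Sketch only: "cluster1_A", "cluster1_B" and "patch_ok" are hypothetical names.
- name: Patch the second batch of cluster 1
  hosts: cluster1_B
  tasks:
    - name: Stop unless every first-batch server reported success
      ansible.builtin.assert:
        that:
          - hostvars[item].patch_ok | default(false)
        fail_msg: "First-batch server {{ item }} did not patch cleanly"
      loop: "{{ groups['cluster1_A'] }}"

    # the real patching tasks for the second batch would follow here
```

This would have to be repeated for every cluster, which is why I would prefer one linear execution per cluster.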

So imagine you have 3 clusters with 4 servers each (just for illustration purposes) and 2 servers that belong to no cluster. Look at the diagram below:

The groups defined above are in the INI format, but you can use YAML as well if you prefer.
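As a rough sketch of the YAML form, an inventory for this scenario (3 clusters of 4 servers plus 2 standalone servers) could look something like the following. Every host name here is made up for illustration, and the split of each cluster into two halves is an assumption; only the group names nocluster, A_group and B_group match the plays in this thread.

```yaml
# Hypothetical YAML inventory; every host name below is made up for illustration.
all:
  children:
    nocluster:
      hosts:
        standalone1:
        standalone2:
    A_group:   # first half of each cluster
      hosts:
        cluster1-node[1:2]:
        cluster2-node[1:2]:
        cluster3-node[1:2]:
    B_group:   # second half of each cluster
      hosts:
        cluster1-node[3:4]:
        cluster2-node[3:4]:
        cluster3-node[3:4]:
```

The [1:2] and [3:4] suffixes are Ansible host ranges, so each line expands to two hosts.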

To run all the A groups AND the no-cluster servers simultaneously in parallel, you would simply have one Ansible Playbook play that targets all those groups, THEN a second play that handles B_group:

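- name: run automation for nocluster and all A_group servers
  hosts: "nocluster:A_group"
  tasks:
    # placeholder task; replace with the real tasks
    - name: blahblahblah
      ansible.builtin.debug:
        msg: "blahblahblah"

- name: run automation for all B_group servers
  hosts: "B_group"
  tasks:
    # placeholder task; replace with the real tasks
    - name: blahblahblah
      ansible.builtin.debug:
        msg: "blahblahblah"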

Given the play stanzas above, everything in A_group would finish before B_group started.

Does that make sense and help?

Yes, that makes sense, and it almost works.

Only problem is it doesn’t address errors patching cluster servers.

Considering your example, let’s say Cluster 1 is a DB server cluster. We would like to check the patching result on each server before moving on to the second set of servers, because if installing new patches breaks something, we still have working servers and can restore backups of the broken servers to restore the whole cluster.

Having two separate plays makes the servers in them unrelated, right?

Correct. If an individual server fails, the automation will stop for that host alone. Ansible would need more information to know how hosts are inter-related.

You could check whether a given group or server ran successfully and then run something else, but this would require multiple extra tasks in your Ansible Playbooks, which can be quite hard to understand and quickly become “unAnsibley”, which is to say unwieldy. If you need this sort of complexity, I would highly recommend considering the community project AWX (or our product Ansible Automation Platform) and using the workflow feature. This allows you to quickly drag and drop a playbook and decide whether the next “job” runs based on how the previous job ended; see chapter 23, “Workflows”, in the Automation Controller User Guide v4.4.

Look at the screenshot from the documentation:


Each of these rectangles is the equivalent of an Ansible Playbook. A red line means “run this job only if the previous job failed”, a green line means “run this job only if the previous job passed” and a blue line means “always run this job”.

This way you can drag and drop very simple playbooks, without too much complexity in each one, and then make judgement calls that may get increasingly complex: if the “database cluster group A” job fails, run this job; if it passes, run that job.

Does that make sense? Otherwise your playbook is going to get really jumbly really quick with all the corner cases of what server needs to be patched before some other server.

If you can only use command-line Ansible, and it’s just a handful of servers that have odd requirements, you can use set_fact or register the result of a task and then have a subsequent task run only if that previous one passed.
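A minimal sketch of that register / set_fact pattern is below. The group name db_servers and the fact name patch_ok are hypothetical, and the dnf task assumes RHEL-like hosts; swap in whatever your patching role actually does.

```yaml
# Sketch only: "db_servers" and "patch_ok" are hypothetical names.
- name: Patch and record the result
  hosts: db_servers
  become: true
  tasks:
    - name: Apply updates (assumes a dnf-based distribution)
      ansible.builtin.dnf:
        name: "*"
        state: latest
      register: patch_result
      ignore_errors: true   # keep going so the outcome can be recorded

    - name: Record whether patching passed
      ansible.builtin.set_fact:
        patch_ok: "{{ patch_result is succeeded }}"

    - name: Run a follow-up step only if patching passed
      ansible.builtin.debug:
        msg: "Patching succeeded on {{ inventory_hostname }}"
      when: patch_ok
```

Because patch_ok is a host fact, later plays in the same run can also read it through hostvars to decide whether to continue.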

Hi @horner! It looks like the post might be solved - could you check to see if the response by @IPvSean worked for you?

If so, it would be super helpful if you could click the ✓ on their post to accept the solution - it recognises the input of others, helps our volunteers find new issues to answer, and keeps the forum nice and tidy.

Thanks!
(this is a template reply, do feel free to reply if I’ve misunderstood the situation!)
