We currently have three datacenters, and all three are basically the same with some very minor differences. Up until now, I've created a playbook and a hosts file for each datacenter. This was primarily because the ethernet interfaces differed slightly and we didn't want to have to remember to pass easily forgotten variables on the command line.
Now we're finally migrating our systems so that all datacenters will be exactly the same, which means we can use the same playbook. To keep it simple, let's say each datacenter has one load balancer and two web servers.
# DC1 hosts file
[webservers]
1.1.1.20
1.1.1.30

[lbservers]
1.1.1.10

# DC2 hosts file
[webservers]
1.1.2.20
1.1.2.30

[lbservers]
1.1.2.10

# DC3 hosts file
[webservers]
1.1.3.20
1.1.3.30

[lbservers]
1.1.3.10
Typically, in our previous setup where each datacenter had its own hosts file and playbook, we’d do the following to deploy all the installation tasks:
Since our datacenters will basically be the same, and the playbook can now be the same, I understand that I could use just one playbook and swap out the hosts file to isolate deployments to a single datacenter. The reason I'm writing is that I'd also like to be able to, for example, deploy our website to all [webservers] regardless of which datacenter they're in, and the deploy has to perform a few tasks on the respective load balancer when it does (taking the server out and adding it back).
So, what is the recommended way to have a multi-datacenter hosts file so that we can work with a single datacenter, or all of them ideally using the same hosts file and the same playbook?
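Something like the following combined inventory is what I have in mind; the per-datacenter group names are just ones I made up:

```ini
# Single combined inventory (group names are made up)
[dc1_webservers]
1.1.1.20
1.1.1.30

[dc1_lbservers]
1.1.1.10

[dc2_webservers]
1.1.2.20
1.1.2.30

[dc2_lbservers]
1.1.2.10

[dc3_webservers]
1.1.3.20
1.1.3.30

[dc3_lbservers]
1.1.3.10

[webservers:children]
dc1_webservers
dc2_webservers
dc3_webservers

[lbservers:children]
dc1_lbservers
dc2_lbservers
dc3_lbservers
```

With this layout I could presumably use `--limit dc1_webservers` to target one datacenter, or the parent `webservers` group to target all of them.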
Thank you guys in advance for any advice you can provide.
If I didn't limit, would it run the tasks against every lbserver for each webserver, or would it properly know that DC1's webservers should only interact with DC1's load balancer?
I am trying to make it so that I could run a playbook for a specific datacenter, or I could run it for all at the same time (like a code deploy). The sticking point seems to be that the lbserver in dc1 should only know about the webservers in dc1.
I think I understand what you mean. It would help to see the playbook(s). Perhaps you need an inventory variable for the load balancers that defines which group they are in charge of? I don't think Ansible is capable of automatically inferring it on its own. You might be able to write some logic around it in your playbook/template as well, but declaring it might be faster and easier to understand.
They're pretty complicated, and we have several depending on the scenario. The most common one we run, and the simplest, is a code deploy, and it looks like this:
# gather facts from monitoring nodes for iptables rules
- hosts: webservers
  serial: 1
  accelerate: true
These are the tasks to run before applying updates:
  pre_tasks:
    - name: disable the staging server in haproxy
      shell: echo "disable server web_staging/{{ ansible_hostname }}" | socat stdio /var/lib/haproxy/stats
      delegate_to: "{{ item }}"
      with_items: groups.lbservers

    - name: disable the production server in haproxy
      shell: echo "disable server web_production/{{ ansible_hostname }}" | socat stdio /var/lib/haproxy/stats
      delegate_to: "{{ item }}"
      with_items: groups.lbservers
  roles:
    - deploy_web
These tasks run after the roles:
  post_tasks:
    - name: Enable the staging server in haproxy
      shell: echo "enable server web_staging/{{ ansible_hostname }}" | socat stdio /var/lib/haproxy/stats
      delegate_to: "{{ item }}"
      with_items: groups.lbservers

    - name: Enable the production server in haproxy
      shell: echo "enable server web_production/{{ ansible_hostname }}" | socat stdio /var/lib/haproxy/stats
      delegate_to: "{{ item }}"
      with_items: groups.lbservers
In short, it removes the web server from HAProxy, does a code update (the deploy_web role is basically a git pull), then adds the server back to HAProxy.
I have not considered an inventory variable. I’ll look into it to see if it can be done in a way that makes sense to me.
I’m starting to think the cleanest way might just be three hosts files, and then just running the playbook three times.
So, based on what I was saying before: make a group_var for the webservers that tells them which lbserver group owns them. Maybe call it {{ lb_server_group }}.
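A minimal sketch of that idea, assuming per-datacenter groups with names like dc1_webservers and dc1_lbservers (both the group and file names here are hypothetical):

```yaml
# group_vars/dc1_webservers.yml (hypothetical file and group names)
lb_server_group: dc1_lbservers
```

The pre_tasks and post_tasks could then delegate only to the owning datacenter's load balancers instead of all of them:

```yaml
- name: disable the production server in haproxy
  shell: echo "disable server web_production/{{ ansible_hostname }}" | socat stdio /var/lib/haproxy/stats
  delegate_to: "{{ item }}"
  with_items: "{{ groups[lb_server_group] }}"
```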
I'll play with doing it that way, and I'll probably also try the multiple-hosts-files approach. Whichever ends up working best and is least likely to allow errors is what I'll go with.
We ended up going with multiple hosts files, one per datacenter, each containing only that datacenter's information. The added benefit of going this route is that we can deploy to a single datacenter, verify that all is well, and then roll out to the others (useful when upgrading or installing new packages). To get around the annoyance for certain playbooks that I know can always be run in all datacenters, like most of our code deploys, we have alias commands in the bash profile on our Ansible server that run the ansible command against every datacenter.
I wish there was a better way, but I just couldn’t find one.
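A sketch of what such a bash-profile helper might look like; the function name, inventory filenames, and playbook name are assumptions, not the real ones:

```shell
# Hypothetical helper for ~/.bash_profile; filenames are assumptions.
# Runs the same playbook once per datacenter, passing through any
# extra arguments (tags, limits, etc.).
deploy_all() {
    local dc
    for dc in dc1 dc2 dc3; do
        ansible-playbook -i "hosts_${dc}" deploy.yml "$@"
    done
}
```

Something like `deploy_all --tags code` would then run the same deploy against each datacenter in turn.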
Surprising! There is a better way to do code deploys with Ansible. The tasks should look like:
1) Remove the web server from the LB, using the haproxy module (wait until connections drain)
2) Silence monitoring for the web health check, using the nagios/zabbix (or any other monitoring) module
3) Deploy the code
4) Do a manual web health check, or use the nagios socket
5) Add the server back into the LB
6) Un-silence the web health check in monitoring
If step 4 fails, revert to the previous version of the deployment and continue with steps 5 and 6.
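Steps 1 and 5 could be sketched with the haproxy module roughly like this; the backend name and socket path are assumptions carried over from the earlier playbook:

```yaml
pre_tasks:
  - name: drain and disable the web server in haproxy
    haproxy:
      state: disabled
      host: "{{ inventory_hostname }}"
      backend: web_production
      socket: /var/lib/haproxy/stats
      wait: yes
    delegate_to: "{{ item }}"
    with_items: "{{ groups.lbservers }}"

post_tasks:
  - name: add the web server back into haproxy
    haproxy:
      state: enabled
      host: "{{ inventory_hostname }}"
      backend: web_production
      socket: /var/lib/haproxy/stats
      wait: yes
    delegate_to: "{{ item }}"
    with_items: "{{ groups.lbservers }}"
```

The `wait: yes` flag is what gets you the connection draining, instead of cutting active sessions off the way the raw socat commands do.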
We had a multi-datacenter, multi-environment setup with 5 datacenters and 3 environments. For that, we used group_vars to keep all our datacenter-specific folders, with each environment as a YAML var file; simply passing --extra-vars "dc=cool env=prod" solved the issue. These env var files contain all our info for a given combination, for example the prod environment running in the "cool" datacenter.
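A sketch of how that selection might be wired up; the directory layout, variable names, and role name are assumptions:

```yaml
# Run with: ansible-playbook site.yml --extra-vars "dc=cool env=prod"
- hosts: webservers
  vars_files:
    - "group_vars/{{ dc }}/{{ env }}.yml"
  roles:
    - deploy_web
```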