Now that we have a small team working on Ansible at our company, the following situation has come up on multiple occasions: a change gets merged to master and run on all the servers, and then someone with an out-of-date local branch runs the outdated role/play on some server, overwriting the changes pushed to master. As someone who has never worked in a multi-person Ansible operation, I am interested to know what sort of workflow is used to prevent this. Of course we could spin up test servers for every little change we need to make on every feature branch, and never run anything on a server in use by our team, but that is not particularly efficient and seems very tedious. We have also used Jenkins/Tower to run certain important jobs at regular intervals to enforce master, but those can only be run so frequently. Thanks in advance for any help!
Here is an example sequence of the situation:
A haproxy basic auth password is set up for a server, and the password is stored in Ansible.
A change to that password is pushed to master and run on the haproxy server.
A developer working locally spins up a test server and needs to add an entry to the haproxy server to make it publicly addressable. Because the developer's branch is out of date, the same run overwrites the new password with the old one.
This is a coordination issue, something Ansible itself does not deal with.
This is also one of the reasons Ansible Tower (or AWX) exists: they centralize the automation and provide RBAC, auditing, workflows, and the other features a large shop needs to coordinate and organize its automation.
Of course there are other products, free and proprietary, that can do the same thing. Ansible is, in the end, a command line tool, which makes it easy for other schedulers and job managers to use it.
We have some tasks, set with tags: always, that check the status of the git clone that the playbooks are part of. If the clone is behind origin, an assert task fails with a message explaining the failure.
In some cases we may want to bypass this, so we have an extra var that, when set, skips the check.
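For anyone who wants to try the same idea outside of a playbook, here is a rough Python sketch of that check, for example as a pre-flight step in a wrapper script around ansible-playbook. This is not the tasks described above, just the same logic under some assumptions: the function names commits_behind and check_up_to_date are made up for illustration, the remote is assumed to be called origin with master as the enforced branch, and skip_check stands in for the extra var.

```python
import subprocess


def commits_behind(repo_dir, upstream="origin/master"):
    """Return how many commits on `upstream` are missing from the local HEAD.

    Assumes the remote is named "origin"; adjust `upstream` for other setups.
    """
    # Refresh remote-tracking refs so the comparison is not made against stale data.
    subprocess.run(["git", "fetch", "--quiet", "origin"], cwd=repo_dir, check=True)
    # HEAD..upstream selects commits reachable from upstream but not from HEAD.
    result = subprocess.run(
        ["git", "rev-list", "--count", f"HEAD..{upstream}"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def check_up_to_date(behind, skip_check=False):
    """Raise RuntimeError if the checkout is behind, unless the check is bypassed."""
    if skip_check:
        # Hypothetical bypass switch, playing the role of the extra var.
        return
    if behind > 0:
        raise RuntimeError(
            f"Local checkout is {behind} commit(s) behind upstream; "
            "pull the latest changes before running, or bypass the check explicitly."
        )
```

A wrapper would call check_up_to_date(commits_behind(".")) before handing off to ansible-playbook. Note that this only catches a checkout that is missing upstream commits; a branch that has diverged still needs a human to decide what should be run.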
We don’t use any external roles; they are all “vendored” along with our playbooks, so that single check that the git repo is up to date solves our needs.
We had some problems in the past where we updated a config required for a new app release, someone ran a deploy on that same app with an old config, and it caused issues, which is what brought us to these checks.
I have also seen the git module itself used for this, by doing a self clone with update: no and comparing the before and after SHA, but I have not attempted this personally.