Linux Updates

Hey all,

I’m looking at putting in Ansible and want to line up some use cases. The one thing I really want to do is get Linux updates going via Ansible. We have a few servers that need to have things killed/run before and after reboot (which, for some reason, won’t work via rc). So basically I’m wondering if it’s possible to have an Ansible playbook that will:

  1. Execute script (which kills off processes)
  2. Run yum update
  3. Reboot
  4. Execute script (start processes)

It would be great to see any examples; I can’t seem to find anything like it on the web… I’m really looking forward to getting Ansible working; the things I’ve tried so far look really positive.
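
Something along these lines is what I’m picturing (completely untested, and the script paths and host group are just placeholders):

```yaml
---
- hosts: appservers
  serial: 1
  sudo: yes
  tasks:
    - name: stop the application processes
      script: files/stop_apps.sh          # placeholder path

    - name: apply all available updates
      yum: name=* state=latest

    - name: reboot the server
      shell: sleep 2 && shutdown -r now "Ansible updates applied"
      async: 1
      poll: 0
      ignore_errors: true

    - name: wait for the server to come back up
      local_action: wait_for host={{ inventory_hostname }} port=22 delay=30 timeout=600
      sudo: no

    - name: start the application processes
      script: files/start_apps.sh         # placeholder path
```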

Cheers,
Allan

If these are init scripts, you can probably skip steps 1 and 4.

Is there any danger the yum upgrades might change config files?

I'm guessing you want to do this across N servers in serial, not parallel. Read over:

http://docs.ansible.com/guide_rolling_upgrade.html

and work through the links until that makes sense; your use case doesn't
sound as complicated as that one.
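
The serial part is just a keyword on the play, something like this (untested, group name made up):

```yaml
---
- hosts: webservers
  serial: 1          # one host at a time; raise this once you trust the process
  sudo: yes
  tasks:
    - name: apply all pending updates
      yum: name=* state=latest
```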

Unfortunately some of the scripts can’t be run via init, so they have to be started manually. Though theoretically we could make a separate bash script to kick-start those applications…

What I do need to do, though, is make sure that the processes get started and are running as expected (e.g. check that a network port is listening).
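
I’m guessing a task along these lines would cover that (the port here is just a stand-in for whatever the application actually listens on):

```yaml
- name: check that the application is listening again
  local_action: wait_for host={{ inventory_hostname }} port=8443 delay=10 timeout=300
  sudo: no
```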

The updates would be full system updates, so there is a possibility of things like /etc/localtime being updated, which in some environments we’d need to revert from BST back to GMT.

I’ll look through the rolling upgrade guide and see if I can get the yum updates working. I did notice that when I use sudo in a playbook it sits there and waits on the server, presumably asking for the authentication password. Perhaps I need to go back and RTFM a bit more on that one.

The following playbook is what I use at a few customers; in one case we patch about 2700 servers each month. Before we were able to do this on a monthly basis, we had quite a few things to clean up and standardize.

The very first time, we used smaller batches so that we could make sure that all init scripts were present, and so we could communicate with the various (internal) customers about problems. Once all systems are aligned to the same baseline, things become a lot easier: the set of updates is very tangible, and we do batches of 50 systems and execute multiple runs in parallel.

In summary, we do the following (a stripped-down sketch follows the list):

  - Check if redhat-lsb is installed
  - Clean up stale repository metadata (optional, we needed to remove leftover Satellite channel data)
  - Check free space in /var/yum/cache and /usr (optional; it prevents failures that require logging in to find out what's going on)
  - Update all packages using yum
  - Propose to reboot the systems that have had updates
  - Check if the system comes back correctly (we also plan to check the uptime; a pull request is in the queue)
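
A stripped-down sketch of the core of it (the group name, batch size and free-space threshold below are illustrative, not our actual playbook):

```yaml
---
- hosts: linux-servers
  serial: 50
  sudo: yes
  tasks:
    - name: make sure redhat-lsb is installed
      yum: name=redhat-lsb state=present

    - name: check free space in /var (fails the host early, ~1 GB threshold is just an example)
      shell: test "$(df -P /var | awk 'NR==2 {print $4}')" -gt 1048576
      changed_when: false

    - name: update all packages
      yum: name=* state=latest
      register: yum_result

    - name: reboot the systems that actually received updates
      shell: sleep 2 && shutdown -r now "Monthly patching"
      async: 1
      poll: 0
      ignore_errors: true
      when: yum_result.changed

    - name: check that the system comes back
      local_action: wait_for host={{ inventory_hostname }} port=22 delay=30 timeout=900
      sudo: no
      when: yum_result.changed
```

The real playbook also proposes the reboot rather than forcing it, and does the stale repository-metadata cleanup, but the above is the general shape.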

All our systems are connected to the same frozen channels in a Satellite, which makes it a lot easier to manage. Every month we start by updating the frozen channel with the latest updates; we then test the process and the updates on about 150 internal systems (some of these are crucial infrastructure, so they get the security updates earlier).

The next day we have a meeting with Change Management, Security Governance and Linux Operations, and we go through the list of updates (we have a custom tool that compiles the list of updates and shows the distribution of each update over our 2700 Linux servers). Based on this list and the discussion, we decide whether patching is useful and rebooting is necessary.

We then spread the patching of all systems over 4 days (2 non-prod days the first week and 2 prod days the second week), in about 12 different timeframes. This ensures that systems in a complex setup are not patched/rebooted at the same time, and in case of issues it reduces the impact and gives us sufficient time to troubleshoot and resolve. Each "wave" takes about 20 minutes, so in essence we patch 2700 servers in roughly 5 hours.

It is essential that all services are properly scripted using init scripts, that clean shutdowns work well, and that everything is started correctly again. In the case of MySQL, for example, that may mean tuning the timeout of the init script, etc.

It is also essential to get your customers involved in the process and give them control over which systems are part of which wave, whether they control the reboots themselves, etc. The key is not to allow any exceptions, but to look for solutions together. We had very little opposition, and once we had proven the mechanism worked, only small changes were made in later iterations.

We plan to integrate our firmware-patching playbook into this one as well, twice a year. But that coincides with minor OS updates, and in that case patching takes longer than 20 minutes anyway.

Thanks for the great post, Dag! It seems like a similar use case to my situation and will hopefully help me get my playbook complete. So far so good, though :)

I have written, and have been using for a while now, a custom module to check for specific processes in the process list and, if needed, kill them:

https://github.com/ginsys/ansible-plugins/blob/devel/library/check_process