Configuration Drift

Good Afternoon,
I was wondering what you all are doing to manage configuration drift. Are you having Ansible fix the drift, are you having it notify you of the drift, or are you doing something else? At work, we are preparing to start having some conversations about what we want to do, and I thought hearing what you all do might be helpful in our journey.

Thanks for your time!!

—john

We have an Ansible role that applies the CIS Distro Independent Linux 2 baseline when we launch new machines. We also have an Ansible Tower workflow for regularly scheduled patching. At the end of the patching workflow we again run the CIS baseline role to ensure we are maintaining compliance with our secure configuration baseline.

stop machine → snapshot → start machine → patch → reboot → test → snapshot → secure config → reboot

If patching fails, we revert to the starting snapshot.
If secure config fails, we revert to the post-test snapshot.

All of this is done via Ansible Automation Platform.
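
For illustration, here is a rough sketch of the patch step collapsed into a single play, using block/rescue for the revert. In AAP these are really separate job templates chained in a workflow, and the vm_snapshot role and its snapshot_name/snapshot_state vars are placeholders for whatever your hypervisor's snapshot modules provide:

- name: Patch with snapshot revert on failure (sketch)
  hosts: all
  serial: 1
  tasks:
    - name: Take a pre-patch snapshot
      ansible.builtin.include_role:
        name: vm_snapshot        # hypothetical role wrapping your hypervisor's snapshot modules
      vars:
        snapshot_name: pre-patch

    - name: Patch and reboot, reverting on failure
      block:
        - name: Apply all pending updates
          ansible.builtin.package:
            name: "*"
            state: latest

        - name: Reboot and wait for the host to come back
          ansible.builtin.reboot:

      rescue:
        - name: Revert to the pre-patch snapshot
          ansible.builtin.include_role:
            name: vm_snapshot    # hypothetical, same placeholder role as above
          vars:
            snapshot_name: pre-patch
            snapshot_state: revert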

Walter

That is very interesting, and helpful. Thanks…

For drift control I don’t find Ansible the best tool when compared to something like Puppet in this role. However, if drift control is important, that is where Tower/AWX or Satellite (if purely RHEL based) start to shine. You can set up scheduled application of playbooks to ensure the configurations are always current and up to date. I used Satellite and Ansible to maintain STIG and FISMA MED security configs across multiple federal sites, with a scheduled nightly push of core configs to all systems.
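
If you are not on Tower/AWX or Satellite yet, the same nightly push can be approximated with a plain cron entry on the control node; everything below (group name, service account, paths, playbook name) is a placeholder:

- name: Schedule the nightly core-config push (sketch)
  hosts: control_node            # hypothetical group for the machine that runs Ansible
  tasks:
    - name: Cron entry for the nightly enforcement run
      ansible.builtin.cron:
        name: nightly-core-configs
        hour: "2"
        minute: "30"
        user: ansible            # hypothetical service account
        job: "ansible-playbook -i /etc/ansible/hosts /opt/playbooks/core-configs.yml >> /var/log/ansible/nightly.log 2>&1"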

Yeah, we are going to do it through AAP/AWX. Beyond the method (reporting vs. clobbering), my interest is in the intervals people are using, which you answered. We are currently migrating away from Puppet in favor of Ansible, and in the process we are reviewing the decisions that were made when Puppet was installed to see whether they are still valid. Many have been changed, because the technology has changed. Thanks for the info, it’s really helpful.

I think the idea was carried over when we migrated from Puppet to Ansible, but all our middleware projects include a “daily” playbook. Ideally they don’t do anything unless something has drifted, although a few feed into reporting. This is separate from our patch-n-reboot process, which is more a systems level thing. We try to keep a separation between OS config and middleware configs, but the OS group’s playbook, which follows the mono-repo pattern, also runs on each host daily.
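
Roughly speaking, a daily playbook like that is just idempotent tasks along these lines (the paths and service name are made up); when nothing has drifted the run reports zero changes, and a drifted file gets rewritten with a handler restart:

- name: Daily middleware config enforcement (sketch)
  hosts: middleware
  tasks:
    - name: Enforce the app config from the template in git
      ansible.builtin.template:
        src: app.conf.j2         # hypothetical template
        dest: /etc/myapp/app.conf
        owner: root
        group: root
        mode: "0644"
      notify: Restart myapp

  handlers:
    - name: Restart myapp
      ansible.builtin.service:
        name: myapp              # hypothetical service
        state: restarted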

Having Ansible just push out changes as soon as they’re merged to main (or whatever) is pleasantly CI/CD-like, and means that you can solicit pull requests from interested parties who may not have privs to run Ansible, without having to push out their changes by hand once the PRs are merged. But it makes it harder to make temporary changes outside of Ansible, which maybe you want a hard-and-fast rule against, but which is often pretty useful for testing things, especially on sandbox-type hosts.

Having Ansible run some playbook(s) (out of cron, or the scheduler of your choice) periodically and report back on any drift is another way around the “how do you test things” problem, but it means that you have to then take some additional action to push out desirable changes. But it can be simpler than coming up with a way to say “temporarily don’t let Ansible smite my changes on this host, I’m testing stuff”.
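
One way to get the report-only behaviour is to run the whole play in check mode with diffs turned on, so the scheduled run only shows what it would have changed; the baseline_config role name here is hypothetical:

- name: Report drift without correcting it (sketch)
  hosts: all
  check_mode: true               # nothing is changed; tasks report what they would have done
  diff: true                     # include before/after diffs in the output
  roles:
    - baseline_config            # hypothetical role holding the desired state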

Right, this is why I liked Puppet for drift-control-critical things, and it is something I also transferred to Ansible. To avoid hard drift correction, I find you need at least a daily config reset. On developer-facing systems, I have found that running as often as every hour, or even every 30 minutes, is important to catch changes and to provide predictable “soft” intervals for doing things that require temporary deviation on systems. The longer between forced true-ups, the harsher the drift reset becomes, and the harder it is to diagnose what caused the drift, or the failure when correcting the drift causes a break…

For drift control I've found most CM systems to be lacking. I've always used something like tripwire/aide to detect file changes and correlate that with the proper configuration updates. Puppet and other 'resident' systems seem good for this, but they run every N minutes doing a lot of work to verify things, instead of using something like inotify to trigger an immediate response from a passive kernel hook (via the fam daemon or something as simple as incron). This ends up being a LOT more efficient and avoids a lot of useless processing.
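
As a sketch of that approach (the watched path and the remediation script are hypothetical), you can even have Ansible drop the incron watch that triggers the targeted fix-up:

- name: Trigger remediation from a kernel-level file watch (sketch)
  hosts: all
  become: true
  tasks:
    - name: Install incron
      ansible.builtin.package:
        name: incron
        state: present

    - name: Drop an incron watch on sshd_config
      ansible.builtin.copy:
        dest: /etc/incron.d/sshd-config-watch
        mode: "0644"
        content: |
          /etc/ssh/sshd_config IN_MODIFY,IN_ATTRIB /usr/local/sbin/remediate-sshd.sh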

And yes, there is an inotify role for Ansible, a la https://github.com/gantsign/ansible-role-inotify

Or you can set an attribute

chattr +i myconf.conf

or do it via https://docs.ansible.com/ansible/latest/collections/ansible/builtin/file_module.html
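
For example, a minimal sketch using the file module's attributes parameter (the path is hypothetical):

- name: Pin a config with the immutable attribute (sketch)
  hosts: all
  become: true
  tasks:
    - name: Set +i on the config file
      ansible.builtin.file:
        path: /etc/myapp/myconf.conf   # hypothetical path
        attributes: "+i"               # same effect as chattr +i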

Oh, thanks, that is really helpful. In parallel, we have been kicking the tires on Insights for our RHEL servers too. We have a few Ubuntu servers, so obviously that won’t work for them, but I do like the idea of using a Tripwire/AIDE-type tool and then passing it off to Ansible if needed. You have definitely given me some things to ponder.

–John

Ansible CAN do many things; that does not mean it should. See this old presentation where I show an example of 'file alteration monitoring': https://www.slideshare.net/bcoca/ansible-tips-tricks, slide 15 (the cowsay kitty mentions aide/osiris/tripwire).

Good point. I should add that all our Linux servers also have aide installed and configured for performing file integrity checking.
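
As a minimal sketch, the install-and-initialize part looks something like the play below; the package name and database path are the RHEL-family defaults, and Debian-family systems typically use aideinit and a different database path:

- name: Install and initialize AIDE (sketch)
  hosts: all
  become: true
  tasks:
    - name: Install the aide package
      ansible.builtin.package:
        name: aide
        state: present

    - name: Initialize the AIDE database if one does not exist yet
      ansible.builtin.command: aide --init
      args:
        creates: /var/lib/aide/aide.db.new.gz   # RHEL-family path; adjust for your distro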

Walter