Managing Network Config Drift with Ansible Part 1
It’s crucial to manage network configuration drift to ensure your network remains secure, compliant, and reliable. This three-part blog series explores network configuration drift through a combination of video demos, code snippets, and supporting Git repositories.
Why does all of this matter?
Configuration drift is what happens when network devices (like routers, switches, and firewalls) deviate from their intended, standardized baseline. This is typically caused by manual, ad-hoc adjustments made during troubleshooting or routine operations, and it introduces significant risks:
- Security Vulnerabilities: An inconsistent network is inherently vulnerable. A firewall rule left disabled after a test, an outdated access list, or an exposed SNMP community string can all create exploitable entry points for attackers.
- Compliance Violations: For regulated industries like finance (PCI-DSS) and healthcare (HIPAA), configuration drift can push devices out of compliance. This can result in failed audits, steep financial penalties, and lasting reputational harm.
- Network Instability and Outages: Mismatched settings for routing protocols, QoS policies, or VLANs can lead to erratic behavior, packet loss, and outages that are extremely difficult to diagnose. When a network’s reality doesn’t match its design, troubleshooting becomes incredibly inefficient.
- Failed Automation: Automation tools depend on a predictable environment. When configurations have drifted, scripts built for a standardized setup are prone to failure, undermining automation initiatives and hindering operational efficiency.
Single Source of Truth
A network single source of truth (SSoT) is critically important for configuration drift management because it establishes the single, authoritative definition of how the network should be. Without this official baseline, you’re merely guessing about the network’s correct state, making it impossible to effectively identify, manage, or fix configuration drift.
How does it work?
Automation tools like Ansible are powerless without a clear directive. The SSoT provides the necessary data to find and fix drift automatically.
- Detection: An automation tool compares the live configuration of a device against its ’intended’ configuration in the SSoT. Any difference is immediately flagged as drift.
- Remediation: Once drift is detected, the same tool can use the SSoT to generate the correct configuration and automatically push it to the device, bringing it back into compliance.
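The detect-then-remediate loop above can be sketched as a short playbook. This is a minimal illustration, not the workflow used later in this post: it assumes an inventory group named leaf_switches, an intended/<hostname>.cfg file per device, and a working network_cli connection. Note that running cisco.ios.ios_config with src in check mode only reports intended lines that are missing from the device; it does not flag extra lines that were added by hand.

```yaml
---
- name: Sketch of drift detection followed by remediation
  hosts: leaf_switches              # assumed inventory group
  gather_facts: false
  tasks:
    - name: Detect drift without changing the device
      cisco.ios.ios_config:
        src: "intended/{{ inventory_hostname }}.cfg"   # assumed path to the SSoT-rendered file
      check_mode: true              # report what would be pushed, push nothing
      register: drift

    - name: Show the intended lines that are missing from the device
      ansible.builtin.debug:
        var: drift.commands
      when: drift is changed

    - name: Remediate only the devices that reported drift
      cisco.ios.ios_config:
        src: "intended/{{ inventory_hostname }}.cfg"
        save_when: modified         # copy running-config to startup-config if they differ
      when: drift is changed
```

Separating the read-only check from the push makes it easy to put an approval step between the two, which is the shape the workflow in this post follows.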
Examples of Network SSoT
The choice of SSoT can range from simple files to sophisticated database systems, each with different levels of reliability and functionality. These can be broadly categorized by their data structure.
Network Single Source of Truth
1. Basic or Unstructured Data Sources
- Configuration Backups: A repository of raw configuration files, captured directly from devices, can act as a simple baseline.
- Spreadsheets: A shared spreadsheet can track IP addresses, VLANs, and interface descriptions. While accessible, spreadsheets are notoriously fragile, lack robust version control, and become unmanageable as the network grows.
2. Structured Data and Template-Based Sources
Structured data is organized in a predictable format (like YAML or JSON) that machines can easily read, validate, and query.
- Git Repositories with YAML/JSON: This is the de facto standard for network automation. In Ansible, group_vars and host_vars YAML files store data like IP addresses, BGP ASNs, and NTP server lists. These variables are then combined with one of the following:
  - Jinja2 templates to render the complete, final configuration for each device.
  - Network Resource Modules to build scoped configurations for individual resources such as IP addresses, BGP ASNs, and NTP servers.
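As a rough illustration of how structured variables become device configuration, the sketch below keeps everything in one playbook that runs against localhost. The variable names (ntp_servers, bgp_asn), their values, and the output path are hypothetical; in a real repository the variables would live in group_vars and host_vars files, and the Jinja2 would usually sit in a separate template file.

```yaml
---
- name: Render a config fragment from structured variables
  hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical data; normally stored in group_vars/ and host_vars/
    ntp_servers:
      - 192.0.2.10
      - 192.0.2.11
    bgp_asn: 65001
  tasks:
    - name: Render NTP and BGP lines with inline Jinja2
      ansible.builtin.copy:
        dest: /tmp/example-fragment.cfg    # hypothetical output path
        mode: "0644"
        content: |
          {% for server in ntp_servers %}
          ntp server {{ server }}
          {% endfor %}
          router bgp {{ bgp_asn }}
```

Because the data and the rendering logic are kept apart, changing an NTP server means editing one list in Git; the configuration text is regenerated, not hand-edited.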
3. Purpose-Built SSoT Platforms
These platforms provide a dedicated database, API, and user interface designed specifically to be the SSoT for network infrastructure.
- NetBox / Nautobot: These are popular open-source platforms that serve as the central database for all network assets, including sites, racks, devices, interfaces, IP addresses, and VLANs.
- CMDB (Configuration Management Database): In larger enterprises, a CMDB (like ServiceNow) may act as the overarching SSoT. These systems track all IT assets and their relationships, though they are often more general-purpose and less network-specific than tools like NetBox.
How to implement Network Configuration Drift Management in your organization
This section explores the first of three separate demos that show how to implement Network Configuration Drift Management with Ansible automation:
- Intended config file (unstructured) compared to network device configurations using Ansible config modules
- Git repository host_vars (structured) compared to network device configurations using Ansible resource modules
- NetBox config context (structured) compared to network device configurations using Event-Driven Ansible and Ansible resource modules
Drift Management Workflow with Intended Config Files
In this demo, the intended config file (unstructured) is compared to network device configurations using Ansible config modules.
The following Ansible Automation workflow covers three stages: SSOT, Drift, and Remediation. Please note that this blog covers only the key takeaways, not the AAP workflow in its entirety.
Workflow
Key takeaways:
SSOT: The network single source of truth is established by defining variables in both group_vars and host_vars. These variables are passed through a Jinja2 template to render a complete Cisco switch CLI configuration file. Please see the Git repo link below for more details on how we created the intended configuration file.
*Intended Configuration*
Drift: Diffs are checked using the cisco.ios.ios_config module (explained below) to compare the intended configuration to the Cisco switch’s startup configuration.
Remediation: SCP is used to either merge or overwrite the Cisco switch’s running and startup configurations based on policy (see demo below).
Git Repo:
The example code for this section (playbooks, templates, config files) is located here.
This section of the blog also references the Turning Intent into Impact session slides from the recent Ansible Automates 2025.
Demo replay (this demo is based on the aforementioned Git repo)
In this demo, a network engineer makes a configuration mistake on the switches over an out-of-band SSH session. Ansible config drift detection helps us catch the mistake and resolve the issue using Ansible modules and an automation controller workflow.
File Tree from the repo:
Drift Playbook
Here’s a snippet of what the diff.yml (drift) check task would look like in your playbook:
---
- name: Playbook to compare the DIFF between the Intended Config and the Startup config
  hosts: leaf_switches
  gather_facts: false
  tasks:
    # Clone the repo that holds the intended configs into the execution environment (EE)
    - name: Retrieve a repository from a distant location and make it available to the local EE
      ansible.scm.git_retrieve:
        origin:
          url: "http://gitea:{{password}}@aap:3000/gitea/{{repo}}"
        parent_directory: /tmp/
        branch:
          name: main
          duplicate_detection: no
      register: repository
      delegate_to: localhost
      run_once: true
    # Feed the intended file in as running_config so it is diffed against startup-config
    - name: |
        Diff against cisco ios configuration for leaf_switches
        Scroll to the bottom of DIFF for switch name
      cisco.ios.ios_config:
        diff_against: startup
        running_config: "{{ lookup('file', 'intended/' ~ inventory_hostname ~ '.cfg') }}"
      register: output
<truncated>
The playbook tasks above compare the device’s configuration against our intended “golden” configuration. Note the following parameters:
- diff_against: startup: This specifies which configuration on the device to check. We’ve chosen the startup configuration over the running configuration.
- running_config: This parameter supplies the intended configuration that the device is compared against. The file is the one we previously generated using a Jinja2 template in the configurator.yml playbook, and it represents our single source of truth.
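One detail worth calling out: the module documentation describes diff_against in terms of Ansible's diff mode (the --diff option), so the comparison is only expected to appear when diff mode is on. The sketch below sets the task-level diff keyword as one assumed way to enable it without the command-line flag, and adds a hypothetical follow-up task that flags drifted switches. The inventory group and file layout mirror the playbook above.

```yaml
---
- name: Sketch of flagging drifted switches after the diff task
  hosts: leaf_switches
  gather_facts: false
  tasks:
    - name: Compare the intended config against startup-config
      cisco.ios.ios_config:
        diff_against: startup
        running_config: "{{ lookup('file', 'intended/' ~ inventory_hostname ~ '.cfg') }}"
      diff: true                  # assumption: enable diff mode for this task only
      register: output

    - name: Flag hosts whose startup-config has drifted (hypothetical follow-up)
      ansible.builtin.debug:
        msg: "Drift detected on {{ inventory_hostname }}"
      when: output is changed
```

Since no lines or src are supplied, this task is not expected to push configuration; a changed result here simply means the two configurations differ.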
Drift Playbook Output
This snippet shows the output from the diff.yml playbook. Lines beginning with - represent the “before” state (the switch’s current startup configuration), while lines beginning with + show the “after” state (the intended configuration). In this case, the new logging settings in the after block originate from our group_vars.
TASK [Diff against cisco ios configuration for leaf_switches
Scroll to the bottom of DIFF for switch name] ***
--- before
+++ after
@@ -8,6 +8,10 @@
exit-address-family
address-family ipv6
exit-address-family
+logging userinfo
+logging buffered 12000 notifications
+logging console critical
+logging monitor warnings
no aaa new-model
switch 1 provision c9kv-uadp-8p
ip routing
@@ -15,7 +19,8 @@
no ip domain lookup
ip domain name example.com
login on-success log
-vtp version 1
+vtp mode transparent
+vtp version 2
crypto pki trustpoint TP-self-signed-3305966887
enrollment selfsigned
subject-name cn=IOS-Self-Signed-Certificate-3305966887
@@ -38,6 +43,12 @@
username admin privilege 15 password 0
red…
changed: [clab-cat-leaf1]
Policy
Below is a snippet of vars/push.yaml, where we assign devices in our inventory to one of two policies: merge the intended configuration into the device configuration, or overwrite the device configuration with the intended configuration (in other words, overwrite the before state with the after). This file was used as part of the approval node stage in the workflow.
Drift Workflow
# Update Devices to Overwrite or Merge configurations
---
_limit_overwrite:
- clab-cat-leaf1
_limit_merge:
- clab-cat-leaf2
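A policy file like this is easy to get wrong by listing the same switch twice. As a hedged sketch (this check is not part of the original workflow), a small pre-flight play could verify that the two lists do not overlap before any remediation runs. It assumes the vars/push.yaml layout shown above.

```yaml
---
- name: Sanity-check the push policy before remediation (hypothetical pre-flight)
  hosts: localhost
  gather_facts: false
  vars_files:
    - vars/push.yaml
  tasks:
    - name: Ensure no switch is listed for both overwrite and merge
      ansible.builtin.assert:
        that:
          - _limit_overwrite | intersect(_limit_merge) | length == 0
        fail_msg: "A device appears in both _limit_overwrite and _limit_merge"
```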
Remediate
While this playbook shows one method (scp), it’s worth noting the other alternatives for this task:
- The ansible.netcommon.net_put module could be used to directly transfer a configuration file.
- The network.restore validated content role offers another structured approach for restoring configurations.
---
- name: Playbook to overwrite the intended config to the startup configuration
  hosts: leaf_switches
  gather_facts: false
  vars_files:
    - vars/push.yaml
  tasks:
    # Overwrite policy: copy the intended file over startup-config from the control node
    - name: Copy file over to startup-config
      ansible.builtin.command: "scp -o StrictHostKeyChecking=no intended/{{ inventory_hostname }}.cfg {{ ansible_user }}@{{ inventory_hostname }}:nvram:startup-config"
      delegate_to: localhost
      when: inventory_hostname in _limit_overwrite
      no_log: true
    # Overwrite policy: roll the running config back to the new startup-config
    - name: Replace running config with startup-config
      cisco.ios.ios_command:
        commands:
          - command: 'configure replace nvram:startup-config force'
      when: inventory_hostname in _limit_overwrite
    # Merge policy: apply the intended lines on top of the running config, then save
    - name: Merge File to Running Config
      cisco.ios.ios_config:
        src: intended/{{ inventory_hostname }}.cfg
        save_when: always
      when: inventory_hostname in _limit_merge
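For comparison, the ansible.netcommon.net_put alternative mentioned earlier might look like the sketch below. It only covers the file transfer, the destination path is hypothetical, and it assumes a network_cli connection with SCP enabled on the switch; it has not been validated against the demo environment.

```yaml
---
- name: Sketch of transferring the intended config with net_put
  hosts: leaf_switches
  gather_facts: false
  tasks:
    - name: Copy the intended config file to the switch
      ansible.netcommon.net_put:
        src: "intended/{{ inventory_hostname }}.cfg"
        dest: "flash:/intended.cfg"   # hypothetical destination on the device
        protocol: scp
```

Unlike the ansible.builtin.command approach, this runs over the existing Ansible network connection, so no separate scp invocation or host key option is needed on the control node.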
Bringing it all together
Part 1 of this blog series demonstrates a complete, practical workflow for managing network configuration drift using Ansible. It establishes that to combat the security and stability risks of drift, a Single Source of Truth (SSoT) is essential. In the detailed demo, this SSoT was created by using Jinja2 templates to render complete, “golden” configuration files for each network device. The core of the solution is an Ansible workflow that first uses the cisco.ios.ios_config module to perform a diff, comparing this intended configuration file against the device’s startup configuration to detect any drift. Once drift is identified, the workflow provides a clear, policy-driven remediation path: modules like ansible.builtin.command or cisco.ios.ios_config either merge or completely overwrite the device’s configuration, bringing it back into alignment with the source of truth.
See you next week for Part 2: Manage Config Drift with Network Resource Modules!