Managing Network Config Drift with Ansible Part 1

Managing network configuration drift is crucial to keeping your network secure, compliant, and reliable. This three-part blog explores network configuration drift through a combination of video demos, code snippets, and supporting Git repositories.

Why does all of this matter?

Configuration drift is what happens when network devices (like routers, switches, and firewalls) deviate from their intended, standardized baseline. This is typically caused by manual, ad-hoc adjustments made during troubleshooting or routine operations, and it introduces significant risks:

  • Security Vulnerabilities: An inconsistent network is inherently vulnerable. A firewall rule left disabled after a test, an outdated access list, or an exposed SNMP community string can all create exploitable entry points for attackers.
  • Compliance Violations: For regulated industries like finance (PCI-DSS) and healthcare (HIPAA), configuration drift can push devices out of compliance. This can result in failed audits, steep financial penalties, and lasting reputational harm.
  • Network Instability and Outages: Mismatched settings for routing protocols, QoS policies, or VLANs can lead to erratic behavior, packet loss, and outages that are extremely difficult to diagnose. When a network’s reality doesn’t match its design, troubleshooting becomes incredibly inefficient.
  • Failed Automation: Automation tools depend on a predictable environment. When configurations have drifted, scripts built for a standardized setup are prone to failure, undermining automation initiatives and hindering operational efficiency.

Single Source of Truth

A network single source of truth (SSoT) is critically important for configuration drift management because it establishes the single, authoritative definition of how the network should be. Without this official baseline, you’re merely guessing about the network’s correct state, making it impossible to effectively identify, manage, or fix configuration drift.

How does it work?

Automation tools like Ansible are powerless without a clear directive. The SSoT provides the necessary data to find and fix drift automatically.

  • Detection: An automation tool compares the live configuration of a device against its ’intended’ configuration in the SSoT. Any difference is immediately flagged as drift.
  • Remediation: Once drift is detected, the same tool can use the SSoT to generate the correct configuration and automatically push it to the device, bringing it back into compliance.
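
As a minimal sketch of the detection step, assuming each device’s intended configuration is stored in a local file, the cisco.ios.ios_config module can report differences without touching the device. The leaf_switches group and the intended/ path are borrowed from the demo later in this post; the registered variable name drift is purely illustrative.

```yaml
---
- name: Sketch of drift detection against an intended configuration file
  hosts: leaf_switches
  gather_facts: false

  tasks:
    # No lines or src are supplied, so nothing is pushed to the device;
    # check_mode is set as an extra safeguard and diff enables diff output.
    - name: Compare the running-config with the intended configuration
      cisco.ios.ios_config:
        diff_against: intended
        intended_config: "{{ lookup('file', 'intended/' ~ inventory_hostname ~ '.cfg') }}"
      check_mode: true
      diff: true
      register: drift   # illustrative name
```

This variant compares against the running configuration. The demo below uses a slightly different approach that compares against the startup configuration instead.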

Examples of Network SSoT

The choice of SSoT can range from simple files to sophisticated database systems, each with different levels of reliability and functionality. These can be broadly categorized by their data structure.


Network Single Source of Truth

1. Basic or Unstructured Data Sources

  • Configuration Backups: A repository of raw configuration files, captured directly from devices, can act as a simple baseline.
  • Spreadsheets: A shared spreadsheet can track IP addresses, VLANs, and interface descriptions. While accessible, spreadsheets are notoriously fragile, lack robust version control, and become unmanageable as the network grows.

2. Structured Data and Template-Based Sources

Structured data is organized in a predictable format (like YAML or JSON) that machines can easily read, validate, and query.

  • Git Repositories with YAML/JSON: This is the de facto standard for network automation. In Ansible, group_vars and host_vars YAML files store data like IP addresses, BGP ASNs, and NTP server lists. These variables are then combined with one of the following:
      • Jinja2 templates, to render the complete, final configuration for each device.
      • Network resource modules, to build scoped configurations that map to data such as IP addresses, BGP ASNs, and NTP servers.
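
To make the idea concrete, here is a hedged sketch of what such variable files might look like. The file names, key names, and values are all hypothetical and are not taken from the demo repository. The logging values simply mirror the logging lines that appear in the diff output later in this post.

```yaml
---
# group_vars/leaf_switches.yml (hypothetical): data shared by every leaf switch
ntp_servers:
  - 192.0.2.10
  - 192.0.2.11
logging:
  buffered_size: 12000
  buffered_level: notifications
  console_level: critical
  monitor_level: warnings

---
# host_vars/clab-cat-leaf1.yml (hypothetical): data unique to one device
bgp_asn: 65001
loopback0_ip: 10.0.0.1/32
```

Keeping shared settings in group_vars and device-specific settings in host_vars means a change to, say, the NTP server list is made once and picked up by every device in the group.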

3. Purpose-Built SSoT Platforms

These platforms provide a dedicated database, API, and user interface designed specifically to be the SSoT for network infrastructure.

  • NetBox / Nautobot: These are popular open-source platforms that serve as the central database for all network assets, including sites, racks, devices, interfaces, IP addresses, and VLANs.
  • CMDB (Configuration Management Database): In larger enterprises, a CMDB (like ServiceNow) may act as the overarching SSoT. These systems track all IT assets and their relationships, though they are often more general-purpose and less network-specific than tools like NetBox.

How to implement Network Configuration Drift Management in your organization

This series includes three separate demos that show how to implement network configuration drift management with Ansible automation. This section explores the first:

  1. Intended config file (unstructured) compared to network device configurations using Ansible config modules
  2. Git repository host_vars (structured) compared to network device configurations using Ansible resource modules
  3. NetBox config context (structured) compared to network device configurations using Event-Driven Ansible and Ansible resource modules

Drift Management Workflow with Intended Config Files

In this demo, the intended config file (unstructured) is compared to network device configurations using Ansible config modules.

The following Ansible automation workflow covers three stages: SSOT, drift, and remediation. Please note that this blog covers only the key takeaways, not the AAP workflow in its entirety.


Workflow

Key takeaways:

SSOT: The network single source of truth is established by defining variables in both group_vars and host_vars. These variables are passed through a Jinja2 template to render a complete Cisco switch CLI configuration file. Please see the Git repo link below for more details on how we created the intended configuration file.

                         *Intended Configuration*

Drift: Diffs are checked using the cisco.ios.ios_config module (explained below), which compares the intended configuration to the Cisco switch’s startup configuration.

Remediation: Based on policy, the intended configuration is either merged into the Cisco switch’s running configuration with cisco.ios.ios_config or copied over the startup configuration with SCP (see demo below).
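
The SSOT step described above, rendering one intended file per device from variables, could be sketched roughly as follows. This is an assumption-laden illustration rather than the playbook from the repo: the template path is hypothetical, and only the intended/ output path matches the snippets later in this post.

```yaml
---
- name: Sketch of rendering intended configurations from group_vars and host_vars
  hosts: leaf_switches
  gather_facts: false

  tasks:
    # Runs on the control node; one output file is written per inventory host.
    - name: Render the intended configuration file for each switch
      ansible.builtin.template:
        src: templates/ios_intended.cfg.j2   # hypothetical template name
        dest: "intended/{{ inventory_hostname }}.cfg"
        mode: "0644"
      delegate_to: localhost
```

Because the rendered files are plain text, they can be committed to Git and reviewed like any other change before they are used for drift checks.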

Git Repo:
The example code for this section (playbooks, templates, config files) is located here.
This section of the blog also references the “Turning Intent into Impact” session slides from the recent Ansible Automates 2025.

Demo replay (this demo is based on the aforementioned Git repo)
In this demo, a network engineer makes a configuration mistake on the switches over out-of-band SSH. Ansible drift detection helps us detect the mistake and resolve the issue using Ansible modules and an automation controller workflow.

File Tree from the repo:

Drift Playbook

Here’s a snippet of what the diff.yml (drift) check task would look like in your playbook:

---
- name: Playbook to compare the DIFF between the Intended Config and the Startup config
  hosts: leaf_switches
  gather_facts: false

  tasks:

  # Clone the repo that holds the intended configs into the execution environment (EE)
  - name: Retrieve a repository from a distant location and make it available to the local EE
    ansible.scm.git_retrieve:
      origin:
        url: "http://gitea:{{password}}@aap:3000/gitea/{{repo}}"
      parent_directory: /tmp/
      branch:
        name: main
        duplicate_detection: false
    register: repository
    delegate_to: localhost
    run_once: true

  # Supplying the intended file as running_config makes the module diff it
  # against the startup-config retrieved from the device
  - name: |
      Diff against cisco ios configuration for leaf_switches
      Scroll to the bottom of DIFF for switch name
    cisco.ios.ios_config:
      diff_against: startup
      running_config: "{{ lookup('file', 'intended/' ~ inventory_hostname ~ '.cfg') }}"
    register: output

<truncated>

The playbook tasks above compare the device’s configuration against our intended “golden” configuration. Note the following parameters:

  • diff_against: startup: This specifies which configuration on the device to check. We’ve chosen the startup configuration rather than the running configuration.
  • running_config: This parameter supplies the intended configuration. The file we are comparing against is the one we previously generated using a Jinja2 template in the configurator.yml playbook. This file represents our single source of truth.
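
The registered result can also drive later steps. As a minimal sketch, assuming the diff task registered its result as output (as in the snippet above) and that the play runs with the --diff option so the module generates a diff, a follow-up task could flag drifted hosts. The task itself is illustrative and not part of the demo repo.

```yaml
  # Illustrative follow-up task: runs only for hosts where the diff task reported a change
  - name: Report hosts whose startup-config differs from the intended config
    ansible.builtin.debug:
      msg: "Configuration drift detected on {{ inventory_hostname }}"
    when: output is changed
```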

Drift Playbook Output

This snippet shows the output from the diff.yml playbook. Lines beginning with - represent the “before” state (the switch’s current startup configuration), while lines beginning with + show the “after” state (the intended configuration). In this case, the new logging settings in the after block originated from our group_vars.


TASK [Diff against cisco ios configuration for leaf_switches
Scroll to the bottom of DIFF for switch name] ***


--- before
+++ after

@@ -8,6 +8,10 @@
exit-address-family
address-family ipv6
exit-address-family

+logging userinfo
+logging buffered 12000 notifications
+logging console critical
+logging monitor warnings

no aaa new-model
switch 1 provision c9kv-uadp-8p
ip routing
@@ -15,7 +19,8 @@
no ip domain lookup
ip domain name example.com
login on-success log
-vtp version 1
+vtp mode transparent
+vtp version 2
crypto pki trustpoint TP-self-signed-3305966887
enrollment selfsigned
subject-name cn=IOS-Self-Signed-Certificate-3305966887
@@ -38,6 +43,12 @@
username admin privilege 15 password 0
red…
changed: [clab-cat-leaf1]

Policy

Below is a snippet of vars/push.yaml, which controls how each device in our inventory is remediated: either the intended configuration is merged into the device’s existing configuration, or the device configuration is overwritten with the intended configuration (that is, the before state is replaced with the after state). This file was used as part of the approval node stage in the workflow.

Drift Workflow

---
# Update Devices to Overwrite or Merge configurations
_limit_overwrite:
  - clab-cat-leaf1

_limit_merge:
  - clab-cat-leaf2

Remediate

While this playbook shows one method (scp), it’s worth noting the other alternatives for this task:

  • The ansible.netcommon.net_put module could be used to directly transfer a configuration file.
  • The network.restore validated content role offers another structured approach for restoring configurations.
---
- name: Playbook to overwrite the intended config to the startup configuration
  hosts: leaf_switches
  gather_facts: false

  vars_files:
    - vars/push.yaml

  tasks:

  # Overwrite path, step 1: copy the intended file over the startup-config from the control node
  - name: Copy file over to startup-config
    ansible.builtin.command: "scp -o StrictHostKeyChecking=no intended/{{ inventory_hostname }}.cfg {{ ansible_user }}@{{ inventory_hostname }}:nvram:startup-config"
    delegate_to: localhost
    when: inventory_hostname in _limit_overwrite
    no_log: true

  # Overwrite path, step 2: replace the running config with the updated startup-config
  - name: Replace running config with startup-config
    cisco.ios.ios_command:
      commands:
        - command: 'configure replace nvram:startup-config force'
    when: inventory_hostname in _limit_overwrite

  # Merge path: apply the intended file on top of the running config, then save it
  - name: Merge File to Running Config
    cisco.ios.ios_config:
      src: intended/{{ inventory_hostname }}.cfg
      save_when: always
    when: inventory_hostname in _limit_merge
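
For comparison, the net_put alternative mentioned above could be sketched as follows. This is an untested illustration under several assumptions: the switch accepts SCP file transfers, the flash:/intended.cfg destination is a hypothetical path, and the task names are made up. Unlike the scp task above, this sketch does not write to the startup configuration, so a separate save step would still be needed.

```yaml
  # Illustrative alternative to the scp command task
  - name: Copy the intended config to the switch with net_put
    ansible.netcommon.net_put:
      src: "intended/{{ inventory_hostname }}.cfg"
      dest: "flash:/intended.cfg"   # hypothetical destination on the device
      protocol: scp
    when: inventory_hostname in _limit_overwrite

  - name: Replace the running config with the copied file
    cisco.ios.ios_command:
      commands:
        - command: 'configure replace flash:/intended.cfg force'
    when: inventory_hostname in _limit_overwrite
```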

Bringing it all together

Part 1 of this blog series demonstrates a complete, practical workflow for managing network configuration drift using Ansible. It establishes that a Single Source of Truth (SSoT) is essential to combat the security and stability risks of drift. In the detailed demo, this SSoT was created by using Jinja2 templates to render complete, “golden” configuration files for each network device. The core of the solution is an Ansible workflow that first uses the cisco.ios.ios_config module to perform a diff, comparing this intended configuration file against the device’s startup configuration to detect any drift. Once drift is identified, the workflow provides a clear, policy-driven remediation path: it uses modules like ansible.builtin.command or cisco.ios.ios_config to either merge or completely overwrite the device’s configuration, bringing it back into alignment with the source of truth.

See you next week for Part 2: Manage Config Drift with Network Resource Modules!
