Managing Network Config Drift with Ansible (Part 2)
In Part 1 of this series, we established the risks of network configuration drift and the critical role a Single Source of Truth (SSoT) plays in maintaining network integrity. We demonstrated a foundational workflow where Ansible used a rendered configuration file as the SSoT, performed a diff to detect unauthorized changes, and then used a remediation playbook to bring the device back into compliance.
Now, in Part 2, we will build upon that foundation and explore a more robust and scalable approach to drift management using structured data and Ansible’s network resource modules.
Manage Config Drift with Network Resource Modules
This scenario compares structured host_vars stored in a Git repository against live network device configurations, using Ansible network resource modules.
The following Ansible Automation Platform (AAP) workflow walks through three stages: SSoT, Drift, and Remediation. Please note that we cover only the key takeaways, not the AAP workflow in its entirety.
*Workflow*
Key Takeaways:
SSoT: The network single source of truth for this demo is established by defining host_vars variables in YAML files. In this implementation, the various resource modules render scoped configurations from the structured data.
Drift: Drift is checked against the “before” state, which is read from the Cisco router. The network resource module runs with the state “replaced” in check mode and compares the device’s current configuration with the intended configuration.
Remediation: The same resource module can be scheduled or re-run later in run mode (check mode disabled) to actually apply the needed changes. This is depicted by the Config Push job_template in the above workflow.
Git Repo:
The example code for this section (playbooks, roles, config files) is located here.
File Tree from the repo:
Demo replay
This demo highlights Ansible’s ability to automatically correct network configuration drift. All code for this demonstration is available in the linked Git repository.
The Demo Scenario
A network engineer makes a manual configuration mistake on a router using an out-of-band SSH connection. This causes the device’s live configuration to drift from the approved baseline.
Key Technologies Used
This solution’s effectiveness comes from two key Ansible features:
- Network Resource Modules: We chose these modules because they are stateful, meaning they enforce a desired configuration. They also have built-in diff capabilities, which are perfect for identifying configuration drift.
- ansible.utils.fact_diff: To enhance the process, we use this module to provide a clearer and more optimized report of the configuration differences, making drift management more efficient.
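To make the reporting step concrete, here is a minimal sketch of ansible.utils.fact_diff run against two small dictionaries on localhost. The variable names device_state and intended_state and their values are made up for illustration and do not come from the demo repository; the sketch assumes the ansible.utils collection is installed.

```yaml
---
# Illustrative only: the data below is invented for this sketch.
- name: Sketch of fact_diff on two small dictionaries
  hosts: localhost
  gather_facts: false
  vars:
    # Stands in for the "before" data read from a device
    device_state:
      facility: local5
      trap: warnings
    # Stands in for the intended data kept in host_vars
    intended_state:
      facility: local5
      trap: errors
  tasks:
    - name: Report differences between the two data sets
      ansible.utils.fact_diff:
        before: "{{ device_state }}"
        after: "{{ intended_state }}"
```

Because both inputs are structured data rather than raw text, the report is limited to the keys that actually differ, which is the property the drift roles in this post rely on.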
Drift Playbook
Below is an excerpt from our drift check playbook, diff.yml. The playbook loops over a list of resources and runs one role per resource. In this example we focus on the role that validates the logging_global configuration using Ansible’s stateful resource modules.
---
- name: Drift Check for Router settings
  hosts: drift_routers
  gather_facts: false
  vars:
    resource:
      - logging_global
      - ntp_global
      - snmp_server
      - interfaces
  tasks:
    - name: Run Drift Roles
      ansible.builtin.include_role:
        name: "roles/{{ role_item }}"
      loop: "{{ resource }}"
      loop_control:
        loop_var: role_item
Roles
The workflow of the following role is broken into two primary tasks: one to check for configuration drift and another to report on it. The behavior of this workflow is controlled by variables passed from the Ansible Automation Platform (AAP) Job Template.
The first task leverages the ansible.netcommon.network_resource meta-module, which was chosen specifically to make this role vendor-agnostic. It dynamically selects the correct platform-specific module by using the os_name parameter, a variable derived from our AAP inventory and group_vars. In this demo, we target the global logging settings by setting the name parameter to logging_global. The state is set to “replaced” to ensure the device’s configuration is brought into exact alignment with our definition.
Initially, this task is executed in Check Mode (check_mode: true), a variable passed from the AAP Job Template. Running in this mode allows Ansible to calculate a diff report showing any configuration discrepancies without actually applying changes to the device.
The second task is conditional and serves to display the diff that was generated. It compares the "before" state (the device's current configuration) with the "after" state (the intended configuration defined in our host variables). This reporting task only runs when the diffs variable is set to true, which is also controlled as an extra variable in the AAP Job Template.
---
- name: Check logging configuration
  ansible.netcommon.network_resource:
    config: "{{ logging_global }}"
    os_name: "{{ hostvars[inventory_hostname]['ansible_network_os'] }}"
    name: logging_global
    state: "{{ state }}"
  register: logging_diff
  check_mode: "{{ check_mode }}"

- name: Report logging configuration drift
  ansible.utils.fact_diff:
    before: "{{ logging_diff.before }}"
    after: "{{ hostvars[inventory_hostname]['logging_global'] }}"
  diff: "{{ diffs }}"
  delegate_to: localhost
  # The bool filter also handles a string value such as "true" from extra vars
  when: diffs | bool
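The tasks above reference three variables: state, check_mode, and diffs. As a rough sketch, the extra variables supplied for an audit run might look like the following. The values mirror the behavior described in this section, but the exact contents of the Job Template in the demo are an assumption here.

```yaml
# Hypothetical extra variables for an audit (drift check) run
state: replaced    # enforce exact alignment with the intended config
check_mode: true   # calculate the changes without applying them
diffs: true        # run the fact_diff reporting task
```

Keeping these as variables rather than hard-coding them in the role is what lets the same role serve both the audit and the later remediation run.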
Host_vars
Here’s a snippet of the actual variable that is read by the logging_global role and its task. In the role above, the task’s config parameter maps to the following dictionary, “logging_global”.
logging_global:
  buffered:
    severity: notifications
    size: 12000
  console:
    severity: critical
  facility: local5
  hosts:
    - host: 1.1.1.2
    - host: 1.1.1.3
  monitor:
    severity: warnings
  snmp_trap:
    - alerts
    - critical
    - emergencies
    - errors
    - warnings
  trap: errors
  userinfo: true
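For readers who want to see what the meta-module resolves to, the sketch below shows a roughly equivalent task written directly against the Cisco IOS resource module. It assumes the device’s ansible_network_os is cisco.ios.ios and that the cisco.ios collection is installed; the demo itself uses the vendor-agnostic ansible.netcommon.network_resource call instead.

```yaml
# Platform-specific sketch; assumes ansible_network_os is cisco.ios.ios
- name: Check logging configuration (Cisco IOS only)
  cisco.ios.ios_logging_global:
    config: "{{ logging_global }}"   # the host_vars dictionary shown above
    state: replaced
  register: logging_diff
  check_mode: true                   # audit only, nothing is pushed
```

The trade-off is portability: this form is easier to read for a single platform, while the meta-module lets one role cover several network operating systems.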
Drift Playbook Output
This snippet from the snmp_server role highlights a configuration drift. It shows what happened after a network engineer manually added an unauthorized SNMP community string via SSH. The before section reflects this incorrect configuration as it exists on the live router, which conflicts with the correct state defined in the snmp_server.yaml host_vars file in the Git repo.
*DIFF*
Remediate
Remediation simply means running the same playbook, diff.yml, again with check_mode: false so that the needed configuration replacement is applied in run mode. The output of the resource module task includes the actual commands needed to remediate the configuration difference on the router. In the example below, the logging host 10.10.40.31 must be removed from the router to reconcile it with the intended configuration.
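If you want the generated commands to stand out in the job output, one option is a small debug task placed after the resource module task. This is a sketch rather than part of the demo repository; it assumes the result was registered as logging_diff, as in the role shown earlier, and that the module returned a commands key.

```yaml
# Optional helper task (not part of the demo repository)
- name: Show the commands generated by the resource module
  ansible.builtin.debug:
    var: logging_diff.commands
  # Skip the task if the module returned no commands key
  when: logging_diff.commands is defined
```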
Remediation
Bringing it all together
This post builds upon the foundational concepts from Part 1 to demonstrate a more robust and scalable method for managing network configuration drift. The key evolution is the shift from using unstructured configuration files to a more flexible Single Source of Truth (SSoT) based on structured data (YAML host_vars) stored in a Git repository.
The power of this approach comes from leveraging Ansible’s idempotent network resource modules. These modules allow for a simple yet powerful two-step workflow:
- Audit: Run the playbook with check_mode: true. The resource module compares the intended state from the structured SSoT against the live device and generates a precise diff and the exact commands needed for remediation.
- Remediate: Run the exact same playbook with check_mode: false. The resource module intelligently applies only the necessary changes to bring the device into compliance.
This modular, stateful method provides a more maintainable and reliable way to enforce configuration standards, moving from managing entire text blocks to managing specific network resources.
See you next week for Part 3: Manage Config Drift with Netbox and Ansible!