Optimizing 'gather_facts' response time over networking devices

With:

  • ansible 3.2.0 (pip3)
  • ansible-base 2.10.7 (pip3)

The goal is twofold:

  • gather the facts only from the local cache when the cache is not empty and its timeout <fact_caching_timeout> has not expired

  • contact the remote network device only when the cache is empty or the timeout has expired

I ran some tests with the following settings in /etc/ansible/ansible.cfg:

fact_caching = redis
fact_caching_timeout = 3600
fact_caching_connection = localhost:6379:0:<redis_password>

Making sure that redis is running with requirepass <redis_password> in /etc/redis/redis.conf:

$ sudo systemctl status redis
● redis-server.service - Advanced key-value store
Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-04-09 12:09:54 CEST; 7h ago
Docs: http://redis.io/documentation,
man:redis-server(1)
Main PID: 1610 (redis-server)
Status: “Ready to accept connections”
Tasks: 5 (limit: 18973)
Memory: 5.7M
CGroup: /system.slice/redis-server.service
└─1610 /usr/bin/redis-server 127.0.0.1:6379

Apr 09 12:09:54 host systemd[1]: Starting Advanced key-value store…
Apr 09 12:09:54 host systemd[1]: Started Advanced key-value store.

Running the simple play below over an IOS device, I observed:

  • “Gathering Facts” took several seconds, indicating that the remote device was contacted

  • “Pinging remote device to create/update facts cache” was almost instantaneous

The play:

- name: Gathering remote device facts preferably from the cache
  hosts: all
  vars:
    ansible_connection: network_cli
  gather_facts: yes
  strategy: debug
  tasks:
    - name: Pinging remote device to create/update facts cache
      ping:

leads to:

ansible-playbook 2.10.7
config file = /etc/ansible/ansible.cfg
ansible python module location = /usr/local/lib/python3.9/dist-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.9.4 (default, Apr 4 2021, 19:38:44) [GCC 10.2.1 20210401]
Using /etc/ansible/ansible.cfg as config file
Parsed lab/hosts inventory source with ini plugin
redirecting (type: cache) ansible.builtin.redis to community.general.redis
Redis connection: Redis<ConnectionPool<Connection<host=localhost,port=6379,db=0>>>
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
Skipping callback ‘default’, as we already have a stdout callback.
Skipping callback ‘minimal’, as we already have a stdout callback.
Skipping callback ‘oneline’, as we already have a stdout callback.

I reran the playbook and need to correct an earlier statement: with ‘gather_facts: no’, the facts are read from the cache as expected.
However, I confirm that with ‘gather_facts: yes’, the facts are always gathered from the remote host, regardless of the cache timeout value.

I also tried the same playbook over a compute node (Ubuntu server), without ‘ansible_connection: network_cli’ of course, and got the same result: the facts are gathered from the remote device at every run with ‘gather_facts: yes’.

It seems that I misunderstood what gather_facts really does: the primary goal of this thread does not appear to be implemented by Ansible and would have to be implemented manually by the user.
I suppose it also means that when the cache timeout expires, all the ‘ansible_facts’ data simply disappear from the cache. If the user does nothing to gather them again from the remote device, they are no longer accessible.
If the timeout expiration is supposed to trigger some background fact gathering from the remote device, it must happen during a playbook run; otherwise the facts are lost.

  • If I run the first playbook with ‘gather_facts: yes’, then run it again with ‘gather_facts: no’ before the cache timeout expires, the ‘ansible_facts’ remain accessible.
  • However, if I run the first playbook with ‘gather_facts: yes’, wait for the cache timeout to expire and then run it with ‘gather_facts: no’, the ‘ansible_facts’ are no longer accessible. Nothing is triggered.

Is the last behavior expected?
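
A minimal sketch of the second run used for this check, assuming the same inventory and that ansible_net_version was cached by the earlier run. The play name and the 'not in cache' placeholder are made up for illustration:

```yaml
- name: Check whether cached facts are still available
  hosts: all
  vars:
    ansible_connection: network_cli
  gather_facts: no
  tasks:
    # Prints the cached value, or the placeholder once the cache entry is gone.
    - name: Show a cached fact or a placeholder
      ansible.builtin.debug:
        msg: "{{ ansible_net_version | default('not in cache') }}"
```

Run before the timeout expires, this should print the cached version; run after it, the placeholder.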

https://docs.ansible.com/ansible/latest/reference_appendices/config.html#default-gathering
^ set to smart and you can ignore `gather_facts` except for those
plays in which you want to force it.

@Brian Coca

I have already tried both setting gathering = smart in ansible.cfg and exporting ANSIBLE_GATHERING=smart.
In both cases, nothing happens during the second play run when the fact_caching_timeout has expired after the first one.

Anyway, even if it worked, there would be a major drawback: its unpredictability.

For instance, let’s assume:

  • fact_caching_timeout expires after the first play run
  • one of the ansible_facts is used at the beginning of the second run to perform a group_by with ansible_net_version for instance

Even if the smart feature works and kicks in after the second run begins, there is a high probability that the playbook will fail due to ansible_net_version being undefined, depending on when exactly it does kick in and how long it takes to retrieve the data from the remote device.

On top of that, there is no way to run a module to gather facts when that variable is undefined, because there is no equivalent of the setup module in the networking ecosystem.
IIUC, when gather_facts: yes is used, it launches the correct platform-dependent facts module based on the value of the predefined ansible_network_os. And there is no single umbrella module that takes care of that logic.
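
A sketch of what a manual, per-platform approach could look like for IOS only. It assumes the cisco.ios collection is installed and that ansible_network_os is set to ios or cisco.ios.ios in the inventory; the "ios_" group prefix is made up, and every other platform would need its own facts task:

```yaml
- name: Gather IOS facts only when they are missing from the cache
  hosts: all
  vars:
    ansible_connection: network_cli
  gather_facts: no
  tasks:
    # Assumption: facts cached by an earlier run are already loaded here.
    - name: Contact the device only if ansible_net_version is undefined
      cisco.ios.ios_facts:
        gather_subset: min
      when:
        - ansible_network_os | default('') in ['ios', 'cisco.ios.ios']
        - ansible_net_version is not defined

    - name: Group hosts by OS version once the fact is available
      ansible.builtin.group_by:
        key: "ios_{{ ansible_net_version }}"
      when: ansible_net_version is defined
```

Because the facts task runs before group_by within the same play, the fact is either cached or freshly gathered by the time it is used.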

Hence my initial wrong belief that gather_facts: yes would first check if the fact_caching_timeout is about to expire (for instance halfway through the timeout) before deciding whether to gather facts from the remote device or not. With that type of logic, we could count on the fact that after that call, the ansible_facts would be accessible for sure.

As a summary:

  • the smart feature does not work over networking devices

  • even if it did, it would be:

    • unpredictable
    • unusable in some use cases where a gather_facts should be avoided unless absolutely necessary

Workaround:

  • gathering = explicit

  • run a background playbook with gather_facts: yes against all remote devices, at an interval of half the fact_caching_timeout
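
The background refresh in this workaround could be as small as the sketch below. The file name is hypothetical, and scheduling it (for example from cron, every 1800 seconds for a 3600-second timeout) is left to the user:

```yaml
# refresh_facts.yml (hypothetical file name), meant to be run periodically
# at an interval shorter than fact_caching_timeout.
- name: Refresh the facts cache for all remote devices
  hosts: all
  vars:
    ansible_connection: network_cli
  gather_facts: yes   # explicit request, so it applies with gathering = explicit
```

All other playbooks would then use gather_facts: no and rely on the cache.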

The timeout only affects facts when they are first fetched; once in memory
they should not expire, or you would lose facts mid-run and become
inconsistent. If you want to force a retry, you can check the datetime facts
yourself and make fact gathering conditional on that in the second play.
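
A rough sketch of that suggestion for hosts where the setup module applies (such as the Ubuntu server mentioned earlier). It assumes the cached facts include date_time, that the controller and host clocks are roughly in sync, and that the controller has a `date` command; max_fact_age is a made-up variable. Network facts modules may not return a date_time fact, so a different marker would be needed there:

```yaml
- name: Re-gather facts only when the cached copy is missing or old
  hosts: all
  gather_facts: no
  vars:
    max_fact_age: 1800  # seconds; hypothetical threshold
  tasks:
    # Compares the controller's current epoch time with the epoch recorded
    # in the cached facts, and runs setup only when the gap is too large.
    - name: Gather facts if none are cached or they are older than max_fact_age
      ansible.builtin.setup:
      when: >-
        ansible_facts.date_time is not defined or
        (lookup('pipe', 'date +%s') | int) -
        (ansible_facts.date_time.epoch | int) > (max_fact_age | int)
```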