Optimizing 'gather_facts' response time over networking devices

actionmystique · April 9, 2021, 5:31pm

With:

ansible 3.2.0 (pip3)
ansible-base 2.10.7 (pip3)

The goal is twofold:

- gather the facts only from the local cache when the cache is not empty and its timeout (fact_caching_timeout) has not expired
- contact the remote network device only when the cache is empty or the timeout has expired

I made some tests with the following settings in /etc/ansible/ansible.cfg:

fact_caching = redis
fact_caching_timeout = 3600
fact_caching_connection = localhost:6379:0:<redis_password>

Redis is running, with requirepass <redis_password> set in /etc/redis/redis.conf:

$ sudo systemctl status redis
● redis-server.service - Advanced key-value store
Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-04-09 12:09:54 CEST; 7h ago
Docs: http://redis.io/documentation,
man:redis-server(1)
Main PID: 1610 (redis-server)
Status: “Ready to accept connections”
Tasks: 5 (limit: 18973)
Memory: 5.7M
CGroup: /system.slice/redis-server.service
└─1610 /usr/bin/redis-server 127.0.0.1:6379

Apr 09 12:09:54 host systemd[1]: Starting Advanced key-value store…
Apr 09 12:09:54 host systemd[1]: Started Advanced key-value store.

Running the following simple play over an IOS device:

- name: Gathering remote device facts preferably from the cache
  hosts: all
  vars:
    ansible_connection: network_cli
  gather_facts: yes
  strategy: debug
  tasks:
    - name: Pinging remote device to create/update facts cache
      ping:

shows that:

- “Gathering Facts” took several seconds, indicating that the remote device had been contacted
- “Pinging remote device to create/update facts cache” was almost instantaneous

The run leads to:

ansible-playbook 2.10.7
config file = /etc/ansible/ansible.cfg
ansible python module location = /usr/local/lib/python3.9/dist-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 3.9.4 (default, Apr 4 2021, 19:38:44) [GCC 10.2.1 20210401]
Using /etc/ansible/ansible.cfg as config file
Parsed lab/hosts inventory source with ini plugin
redirecting (type: cache) ansible.builtin.redis to community.general.redis
Redis connection: Redis<ConnectionPool<Connection<host=localhost,port=6379,db=0>>>
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
Skipping callback ‘default’, as we already have a stdout callback.
Skipping callback ‘minimal’, as we already have a stdout callback.
Skipping callback ‘oneline’, as we already have a stdout callback.

actionmystique · April 10, 2021, 5:39am

I retried the playbook and need to correct a previous assertion: with ‘gather_facts: no’, the facts are read from the cache as expected.
However, I confirm that with ‘gather_facts: yes’, the facts are always gathered from the remote host, regardless of the cache timeout value.

I also tried the same playbook on a compute node (an Ubuntu server), without ‘ansible_connection: network_cli’ of course, and got the same result: the facts are gathered from the remote host at each run with ‘gather_facts: yes’.

It seems that I misunderstood the real meaning of gather_facts: the primary goal of this thread does not seem to be implemented by Ansible and would have to be implemented manually by the user somehow.
I suppose this also means that when the cache timeout expires, all the ‘ansible_facts’ data simply disappear from the cache. If the user does nothing to gather them again from the remote device, they are no longer accessible.
If the timeout expiration is supposed to trigger a background fact gathering from the remote device, it must happen during a playbook run; otherwise it is lost.

If I run the first playbook with ‘gather_facts: yes’ and then run it again with ‘gather_facts: no’ before the cache timeout expires, the ‘ansible_facts’ remain accessible.
However, if I run the first playbook with ‘gather_facts: yes’, wait for the cache timeout to expire and then run it with ‘gather_facts: no’, the ‘ansible_facts’ are no longer accessible. Nothing is triggered.

Is the last behavior expected?
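Whatever the intended semantics, a play that relies only on the cache can at least detect an expired entry and fail early with a clear message, instead of failing later on an undefined variable. A minimal sketch, assuming the cached facts came from the IOS facts module (hence ansible_net_version) and that the play runs with gather_facts: no so that only the cache is consulted:

```yaml
# Sketch: gather_facts: no means nothing is fetched from the device;
# ansible_net_version can only come from the fact cache here.
- name: Use cached facts only, and fail early if they have expired
  hosts: all
  gather_facts: no
  vars:
    ansible_connection: network_cli
  tasks:
    - name: Check that the cached facts are still present
      ansible.builtin.assert:
        that:
          - ansible_net_version is defined
        fail_msg: >-
          No cached facts for {{ inventory_hostname }}; the cache entry has
          probably expired, so run a play with gather_facts: yes first.

    - name: Show the cached software version
      ansible.builtin.debug:
        var: ansible_net_version
```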

system · April 12, 2021, 4:36pm

https://docs.ansible.com/ansible/latest/reference_appendices/config.html#default-gathering
^ Set it to smart, and you can ignore `gather_facts` except in those
plays in which you want to force it.
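A minimal sketch of that suggestion, assuming gathering = smart is set in the [defaults] section of ansible.cfg (or ANSIBLE_GATHERING=smart is exported) alongside the Redis settings from the first post. The play leaves out the gather_facts keyword entirely, so the gathering setting decides whether a host is contacted:

```yaml
# Assumed ansible.cfg excerpt (shown as a comment, not part of the play):
#   [defaults]
#   gathering = smart
#   fact_caching = redis
#   fact_caching_timeout = 3600
# No gather_facts keyword: with 'smart', hosts whose facts are already
# in the cache should not be contacted again.
- name: Rely on the gathering setting instead of gather_facts
  hosts: all
  vars:
    ansible_connection: network_cli
  tasks:
    - name: Show a fact that may come from the cache
      ansible.builtin.debug:
        var: ansible_net_version
```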

actionmystique · April 13, 2021, 12:57pm

@Brian Coca

I have already tried setting gathering = smart in ansible.cfg, as well as exporting ANSIBLE_GATHERING=smart.
Nothing happens during the second run of the play once the fact_caching_timeout has expired after the first one.

Anyway, even if it worked, there would be a major drawback: its unpredictability.

For instance, let’s assume that:

- fact_caching_timeout expires after the first play run
- one of the ansible_facts, for instance ansible_net_version, is used at the beginning of the second run to perform a group_by

Even if the smart feature worked and kicked in after the second run began, the playbook would very likely fail because ansible_net_version is undefined, depending on exactly when it kicks in and how long it takes to retrieve the data from the remote device.

On top of that, there is no way to run a module to gather facts when that variable is undefined, because there is no setup equivalent in the networking ecosystem.
IIUC, when gather_facts: yes is used, it launches the correct platform-dependent facts module based on the value of the predefined ansible_network_os, and there is no single umbrella module that takes care of that logic.
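For reference, ansible.builtin.gather_facts is itself an action that can also be called as an ordinary task, and it should pick the platform facts module the same way the implicit “Gathering Facts” step does. A possible sketch for the group_by scenario above, assuming ansible_network_os is set in the inventory (for example cisco.ios.ios); the version_ group prefix is a hypothetical naming choice:

```yaml
# Sketch: assumes ansible_network_os is set in the inventory.
- name: Group devices by software version, gathering facts only if needed
  hosts: all
  gather_facts: no
  vars:
    ansible_connection: network_cli
  tasks:
    # Contacts the device only when the version is not already cached.
    - name: Gather facts when ansible_net_version is missing
      ansible.builtin.gather_facts:
      when: ansible_net_version is not defined

    # Ansible treats only letters, digits and underscores as valid in group
    # names, so dots and parentheses in the version string are replaced.
    - name: Group hosts by software version
      ansible.builtin.group_by:
        key: "version_{{ ansible_net_version | regex_replace('[^A-Za-z0-9_]', '_') }}"
```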

Hence my initial, wrong belief that gather_facts: yes would first check whether the fact_caching_timeout was about to expire (for instance, halfway through the timeout) before deciding whether to gather facts from the remote device. With that kind of logic, we could count on the ansible_facts being accessible for sure after that call.

As a summary:

- the smart feature does not work over networking devices
- even if it did, it would be:
  - unpredictable
  - unusable in some use cases where gather_facts should be avoided unless absolutely necessary

Workaround:

- gathering = explicit
- run a background playbook at intervals of half the fact_caching_timeout, for all remote devices

actionmystique · April 13, 2021, 12:59pm

… with gather_facts: yes. (end of last sentence)
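The workaround from the summary above could look like the following sketch: a dedicated playbook whose only job is to refresh the cache, scheduled at half the fact_caching_timeout (every 30 minutes for the 3600 s used here). The file name refresh_facts.yml, the paths and the cron schedule are hypothetical:

```yaml
# refresh_facts.yml (hypothetical name). With gathering = explicit, facts
# are only gathered in plays that set gather_facts: yes, like this one.
# Example crontab entry on the control node (hypothetical paths):
#   */30 * * * * ansible-playbook -i /path/to/inventory /path/to/refresh_facts.yml
- name: Refresh the fact cache for all remote devices
  hosts: all
  gather_facts: yes
  vars:
    ansible_connection: network_cli
  tasks: []
```

All other playbooks can then run with gather_facts: no and read the facts from the cache only.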

system · April 13, 2021, 4:43pm

The timeout only affects facts when they are first fetched; once in memory they should not expire, or you would lose facts mid-run and become inconsistent. If you want to force a retry, you can check the datetime facts yourself and make fact gathering conditional on that in the 2nd play.
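A minimal sketch of that suggestion. Network facts modules such as ios_facts are not expected to return the date_time facts that setup collects on Linux hosts, so this sketch stores its own timestamp in the fact cache through set_fact with cacheable: yes. The names facts_refreshed_at and facts_max_age are hypothetical, and now() is the Jinja2 global available since Ansible 2.8:

```yaml
# Sketch: facts_refreshed_at and facts_max_age are hypothetical names.
# gather_facts: no, so whatever is still in the cache is loaded as usual;
# facts are re-gathered only when the stored timestamp is missing
# (cache expired or never filled) or older than facts_max_age.
- name: Re-gather facts only when the cached copy is missing or stale
  hosts: all
  gather_facts: no
  vars:
    ansible_connection: network_cli
    facts_max_age: 1800  # seconds, e.g. half of fact_caching_timeout = 3600
  tasks:
    - name: Gather facts if the cached timestamp is missing or too old
      ansible.builtin.gather_facts:
      when: >-
        facts_refreshed_at is not defined or
        (now().timestamp() | int) - (facts_refreshed_at | int) > facts_max_age
      register: facts_refresh

    # cacheable: yes writes the timestamp into the fact cache with the other facts.
    - name: Record the refresh time in the fact cache
      ansible.builtin.set_fact:
        facts_refreshed_at: "{{ now().timestamp() | int }}"
        cacheable: yes
      when: facts_refresh is not skipped
```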
