When running the following simple play over an IOS device:
- “Gathering Facts” took several seconds, indicating that the remote device had been contacted;
- “Pinging remote device to create/update facts cache” was almost instantaneous.
    - name: Gathering remote device facts preferably from the cache
      hosts: all
      vars:
        ansible_connection: network_cli
      gather_facts: yes
      strategy: debug
      tasks:
        - name: Pinging remote device to create/update facts cache
          ping:
This leads to the following output:
    ansible-playbook 2.10.7
      config file = /etc/ansible/ansible.cfg
      ansible python module location = /usr/local/lib/python3.9/dist-packages/ansible
      executable location = /usr/local/bin/ansible-playbook
      python version = 3.9.4 (default, Apr 4 2021, 19:38:44) [GCC 10.2.1 20210401]
    Using /etc/ansible/ansible.cfg as config file
    Parsed lab/hosts inventory source with ini plugin
    redirecting (type: cache) ansible.builtin.redis to community.general.redis
    Redis connection: Redis<ConnectionPool<Connection<host=localhost,port=6379,db=0>>>
    redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
    redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
    Skipping callback 'default', as we already have a stdout callback.
    Skipping callback 'minimal', as we already have a stdout callback.
    Skipping callback 'oneline', as we already have a stdout callback.
I retried the playbook and need to correct a previous assertion: with ‘gather_facts: no’, the facts are read from the cache as expected.
However, I can confirm that with ‘gather_facts: yes’, the facts are always gathered from the remote host, regardless of the cache timeout value.
I also tried the same playbook against a compute node (an Ubuntu server, without ‘ansible_connection: network_cli’ of course) and got the same result: with ‘gather_facts: yes’, the facts are gathered from the remote host on every run.
It seems that I misunderstood the real meaning of gather_facts: what this thread is primarily after does not seem to be implemented by Ansible, and would have to be implemented manually by the user.
I suppose this also means that when the cache timeout expires, all the ‘ansible_facts’ data simply disappear from the cache. If the user does nothing to gather them again from the remote device, they are no longer accessible.
If the timeout expiration were supposed to trigger some background fact gathering from the remote device, it would have to happen during a playbook run; otherwise it is lost.
If I run the first playbook with ‘gather_facts: yes’, then run it again with ‘gather_facts: no’ before the cache timeout expires, the ‘ansible_facts’ remain accessible.
However, if I run the first playbook with ‘gather_facts: yes’, wait for the cache timeout to expire, and then run it with ‘gather_facts: no’, the ‘ansible_facts’ are no longer accessible. Nothing is triggered.
I have already tried setting gathering = smart in ansible.cfg, as well as exporting ANSIBLE_GATHERING=smart.
Nothing happens during the second run when fact_caching_timeout has expired since the first one.
Anyway, even if it worked, there would be a major drawback: unpredictability.
For instance, let’s assume that:
- fact_caching_timeout expires after the first run;
- one of the ansible_facts, for instance ansible_net_version, is used at the beginning of the second run to perform a group_by.
Even if the smart feature worked and kicked in after the second run began, there is a high probability that the playbook would fail because ansible_net_version is undefined, depending on exactly when it kicks in and how long it takes to retrieve the data from the remote device.
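A minimal sketch of how that scenario could at least fail deterministically, assuming the second run uses ‘gather_facts: no’ and relies only on the fact cache: an explicit assert stops the play with a clear message when ansible_net_version is no longer cached, instead of letting group_by fail on an undefined variable. The group naming scheme is a hypothetical example.

```yaml
- name: Group devices by OS version using cached facts only
  hosts: all
  gather_facts: no
  vars:
    ansible_connection: network_cli
  tasks:
    # Fail early, with an explicit message, if the cache no longer holds the fact
    - name: Make sure ansible_net_version was loaded from the fact cache
      assert:
        that:
          - ansible_net_version is defined
        fail_msg: "ansible_net_version is not cached (fact_caching_timeout expired?)"

    # Hypothetical group name derived from the version string, with
    # characters that are not valid in group names replaced by "_"
    - name: Group devices by OS version
      group_by:
        key: "version_{{ ansible_net_version | regex_replace('[^A-Za-z0-9_]', '_') }}"
```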
On top of that, there is no generic module that can be run to gather facts when that variable is undefined, because there is no equivalent of setup in the networking ecosystem.
IIUC, gather_facts: yes launches the correct platform-dependent fact-gathering module based on the value of the predefined ansible_network_os, and there is no single umbrella module that takes care of that logic.
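Since that per-platform dispatch has to be done by hand in a task, a minimal sketch for IOS only could look like the following. It assumes the cisco.ios collection is installed and that ansible_network_os is set to either ios or cisco.ios.ios; other platforms would each need their own *_facts task with a matching condition. The device is contacted only when ansible_net_version is missing from the cache.

```yaml
- name: Gather facts from the device only when they are missing from the cache
  hosts: all
  gather_facts: no
  vars:
    ansible_connection: network_cli
  tasks:
    # Manual dispatch: one *_facts task per supported ansible_network_os value
    - name: Gather IOS facts when ansible_net_version is not cached
      cisco.ios.ios_facts:
        gather_subset: min
      when:
        - ansible_network_os in ['ios', 'cisco.ios.ios']
        - ansible_net_version is not defined

    # Prints the value whether it came from the cache or from the device
    - name: Show the OS version
      debug:
        var: ansible_net_version
```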
Hence my initial, wrong belief that gather_facts: yes would first check whether fact_caching_timeout was about to expire (for instance, halfway through the timeout) before deciding whether or not to gather facts from the remote device. With that kind of logic, we could count on the ansible_facts being accessible for sure after that call.
As a summary:
- the smart feature does not work over networking devices;
- even if it did, it would be:
  - unpredictable;
  - unusable in some use cases where gather_facts should be avoided unless absolutely necessary.

Workaround:
- set gathering = explicit;
- run a background playbook for all remote devices every half fact_caching_timeout.
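The background playbook from the workaround could be sketched as a dedicated refresh play. The file name, the schedule and the 86400 s timeout are assumptions: with fact_caching_timeout = 86400, it would be run every 12 hours, for example from cron. With gathering = explicit, facts are gathered only in plays that set gather_facts: yes explicitly, so this play sets it to repopulate the cache, while the regular playbooks keep gather_facts: no and read from the cache.

```yaml
# refresh_facts.yml (hypothetical name)
# Assumed to run about every fact_caching_timeout / 2 to repopulate the cache
- name: Refresh the fact cache for all remote devices
  hosts: all
  gather_facts: yes
  vars:
    ansible_connection: network_cli
  tasks:
    - name: Confirm that the version fact is present after gathering
      debug:
        var: ansible_net_version
```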
The timeout only affects facts when they are first fetched. Once they are in memory, they should not expire, or you would lose facts mid-run and become inconsistent. If you want to force a retry, you can check the datetime facts yourself and make fact gathering conditional on that in the second play.
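The age check suggested here might look like the sketch below (IOS only). Because network fact modules do not necessarily return date/time facts, it stores its own timestamp as a cacheable fact instead; facts_stamp, run_epoch, facts_stale, facts_max_age and the 43200 s value (half of an assumed 86400 s fact_caching_timeout) are all hypothetical. The controller clock is read with the pipe lookup and date +%s, which assumes a Unix-like controller. If the whole cache has expired, facts_stamp is undefined as well, so the facts are gathered again.

```yaml
- name: Re-gather device facts only when the cached copy is missing or too old
  hosts: all
  gather_facts: no
  vars:
    ansible_connection: network_cli
    # Hypothetical: half of an assumed fact_caching_timeout of 86400 s
    facts_max_age: 43200
  tasks:
    # Controller time in seconds since the epoch, read once per run
    - name: Read the current time on the controller
      set_fact:
        run_epoch: "{{ lookup('pipe', 'date +%s') | int }}"

    # "or" short-circuits, so facts_stamp is only converted when it is defined
    - name: Decide whether the cached facts are stale
      set_fact:
        facts_stale: "{{ facts_stamp is not defined or ((run_epoch | int) - (facts_stamp | int)) > facts_max_age }}"

    - name: Gather IOS facts when stale
      cisco.ios.ios_facts:
        gather_subset: min
      when: facts_stale | bool

    # cacheable: yes stores the timestamp in the fact cache with the other facts
    - name: Record when the facts were gathered
      set_fact:
        facts_stamp: "{{ run_epoch }}"
        cacheable: yes
      when: facts_stale | bool
```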