I don't understand gather_subset

I want to get tailor-made Ansible facts from some of my hosts and figured that I could use ansible.builtin.setup with gather_subset, but the results are not what I expected.

Consider the following test playbook which should produce nearly nothing:

- name: Fix Something
  hosts: mail
  become: true
  gather_facts: false
  tasks:
    - name: A
      become: true
      ansible.builtin.setup:
        gather_subset:
          - '!all'
          - '!min'
          - '!date_time'
    - name: B
      ansible.builtin.debug:
        msg: "{{ ansible_date_time }}"

It gets the whole ansible_date_time variable and I wonder why.

The result when I run this playbook is:

ansible-playbook ./fix_pb.yml 

PLAY [Fix Something] *************************************************************************************************

TASK [A] *************************************************************************************************************
ok: [mail]

TASK [B] *************************************************************************************************************
ok: [mail] => {}

MSG:

{'date': '2024-06-19', 'day': '19', 'epoch': '1718795981', 'epoch_int': '1718795981', 'hour': '11', 'iso8601': '2024-06-19T11:19:41Z', 'iso8601_basic': '20240619T111941008002', 'iso8601_basic_short': '20240619T111941', 'iso8601_micro': '2024-06-19T11:19:41.008002Z', 'minute': '19', 'month': '06', 'second': '41', 'time': '11:19:41', 'tz': 'UTC', 'tz_dst': 'UTC', 'tz_offset': '+0000', 'weekday': 'Wednesday', 'weekday_number': '3', 'weeknumber': '25', 'year': '2024'}

PLAY RECAP ***********************************************************************************************************
mail                       : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

The last lines of the fact file show that ansible knows what I want:

...
    "gather_subset": [
        "!all",
        "!min",
        "!date_time"
    ],
...

What is going on here?

Any pointers welcome!

Thank you very much

Norbert

I don’t understand it either, @Norbert. When I run the same code (against localhost), task “B” throws an undefined variable error.

What does ansible-config dump --only-changed show?

The what now? Do you have fact caching enabled? If so, you may be seeing a previously cached date_time fact.

That’s an excellent point/question.

Does the time shown reflect the current time, or the time of a prior run on that host?

It’s because of become: true usage somehow.

If everything is executed as the default user (same context), I get:

The error was: ‘ansible_date_time’ is undefined

- name: Fix Something
  hosts: localhost
  gather_facts: false
  tasks:
    - name: A
      ansible.builtin.setup:
        gather_subset:
          - '!all'
          - '!min'
          - '!date_time'
    - name: B
      ansible.builtin.debug:
        msg: "{{ ansible_date_time }}"

Furthermore, !all already excludes everything. There is no need to also exclude !date_time and !min.

You would think, but you’d be wrong. According to the ansible.builtin.setup module documentation (Gathers facts about remote hosts, Ansible Community Documentation):

If !all is specified then only the min subset is collected. To avoid collecting even the min subset, specify !all,!min.
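
To see what a given gather_subset really returns, independent of anything already in the fact cache, one option is to register the result of the setup task and look at that instead of at the host's facts. The following is only a sketch: the network subset and the variable name setup_result are illustrative choices, not something from the original playbook.

```yaml
- name: Inspect what a tailored gather_subset returns
  hosts: mail
  gather_facts: false
  tasks:
    - name: Gather only network facts, skipping even the min subset
      ansible.builtin.setup:
        gather_subset:
          - '!all'
          - '!min'
          - network        # illustrative subset, pick the ones you need
      register: setup_result   # hypothetical variable name

    - name: List the fact keys returned by this call only
      ansible.builtin.debug:
        # The registered result holds just what this task gathered,
        # so facts restored from the cache do not show up here.
        msg: "{{ setup_result.ansible_facts.keys() | list | sort }}"
```

If ansible_date_time is missing from this list but still resolves in a later task, it most likely came from somewhere other than this setup call, such as the fact cache.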

BTW, I used your code with become: true, so the blanket statement is not right. Evidently the cached facts you’re seeing are only readable by root. (?) Maybe that’s why you don’t see them w/o become: true.

Interesting. Now (several days later) I see an undefined variable error.

Here is my changed configuration (ansible-config dump --only-changed):

CACHE_PLUGIN(/home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg) = jsonfile
CACHE_PLUGIN_CONNECTION(/home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg) = facts
CACHE_PLUGIN_PREFIX(/home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg) = ansible_facts.
CONFIG_FILE() = /home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg
DEFAULT_GATHERING(/home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg) = smart
DEFAULT_HOST_LIST(/home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg) = ['/home/norbert/Sourc>
DEFAULT_MANAGED_STR(/home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg) = Ansible managed: te>
DEFAULT_PRIVATE_KEY_FILE(/home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg) = /home/norbert/>
DEFAULT_REMOTE_USER(/home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg) = ansible
DEFAULT_STDOUT_CALLBACK(/home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg) = debug
DEFAULT_VAULT_PASSWORD_FILE(/home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg) = /home/norbe>
RETRY_FILES_ENABLED(/home/norbert/Sourcecode/Klamann_IT/branches/Neuaufbau/ansible/ansible.cfg) = False

UPDATE 2
When I use gather_facts=true, ansible_date_time is filled. But I think it shouldn’t be, because it is excluded.

I have only used gather_subset with the network automation modules, and there it works fine. Here is an example of how to use it: toolkit/roles/facts/tasks/ios.yml at master · network-automation/toolkit · GitHub

It gathers or does NOT gather certain fact keys from network devices.

Okay, let’s assume gather_subset is broken with respect to ansible_date_time in such a way that it gets set/updated when it shouldn’t be, and let’s also ignore whether that’s actually the case. (I still think it’s a fact caching issue, but, whatever.)

Instead, let’s shift focus to what you want to accomplish, and how we can achieve that in the face of strange ansible_date_time behavior.

You have fact caching enabled, so it’s normal for facts gathered in previous runs to persist even if you don’t freshly gather those specific facts.

The play keyword gather_facts: true adds another execution of the setup module that occurs before the rest of the play, and the arguments for your explicit call to the module are not relevant to how that execution behaves. You can use module_defaults to affect its execution, but because you have caching you’ll still have the issue where any other play that runs against this host can result in facts other than the ones you explicitly request in this play being available.
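
As a rough sketch of the module_defaults approach mentioned above, assuming the play-level gathering picks up defaults set for ansible.builtin.setup, the play could look like this. The network subset is only an illustrative choice.

```yaml
- name: Restrict what play-level fact gathering collects
  hosts: mail
  gather_facts: true
  module_defaults:
    ansible.builtin.setup:
      gather_subset:
        - '!all'
        - '!min'
        - network        # illustrative subset, not from the original post
  tasks:
    - name: Show the subset recorded by the most recent gathering
      ansible.builtin.debug:
        msg: "{{ ansible_facts.gather_subset | default('not gathered in this run') }}"
```

This only limits what is collected in this play. With fact caching enabled, facts gathered by other plays or earlier runs against the same host can still be present.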

Since you are using fact caching, you will want to be aware that “gathered” facts are not necessarily updated when the fact is already cached. There is a configurable fact expiry period, but if you want to make absolutely sure that a fact isn’t stale, you can use meta: clear_facts as a task to clean the cache (per host, not the entire cache). Because of this, ansible_date_time is not a particularly reliable fact whenever caching is enabled.

What you may want to do instead, if you need the current date_time, is to use the {{ now() }} function, which can be formatted with args.

Otherwise, you’re fighting with your fact caching which is using the default timeout of 86400 seconds, or 24 hours.
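
A minimal sketch of both suggestions, clearing the cached facts for the targeted hosts and reading the time from now() instead of from ansible_date_time, might look like this. The format string is just an example, and note that now() is evaluated on the control node, not on the managed host.

```yaml
- name: Avoid relying on a cached ansible_date_time
  hosts: mail
  gather_facts: false
  tasks:
    - name: Drop the cached facts for the hosts in this play
      ansible.builtin.meta: clear_facts

    - name: Print the current UTC time as seen by the control node
      ansible.builtin.debug:
        # fmt takes a strftime-style pattern; this one is only an example.
        msg: "{{ now(utc=true, fmt='%Y-%m-%dT%H:%M:%SZ') }}"
```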

The claim that gathered facts are not necessarily updated when they are already cached is not correct, at least for the builtin plugins. It would theoretically be possible for other plugins to ignore updates, but that would break the API expectations and usually be a bad idea.

The cache affects whether facts that you have not gathered in the current run are available, and (in some configurations) whether play-level gathering happens. Gathered facts are always updated.

- hosts: localhost
  gather_facts: false
  tasks:
    - name: Uses the cached value because facts were not gathered
      debug:
        msg: "{{ ansible_facts.date_time.time }}"

    - gather_facts:

    - name: Has the freshly gathered value
      debug:
        msg: "{{ ansible_facts.date_time.time }}"

    - gather_facts:
        gather_subset: "!all,!date_time"

    - name: Still the fresher value
      debug:
        msg: "{{ ansible_facts.date_time.time }}"

    - gather_facts:
        gather_subset: date_time

    - name: Refreshed again
      debug:
        msg: "{{ ansible_facts.date_time.time }}"

PLAY [localhost] ***************************************************************

TASK [Uses the cached value because facts were not gathered] *******************
ok: [localhost] =>
    msg: '14:51:14'

TASK [gather_facts] ************************************************************
ok: [localhost]

TASK [Has the freshly gathered value] ******************************************
ok: [localhost] =>
    msg: '14:53:41'

TASK [gather_facts] ************************************************************
ok: [localhost]

TASK [Still the fresher value] *************************************************
ok: [localhost] =>
    msg: '14:53:41'

TASK [gather_facts] ************************************************************
ok: [localhost]

TASK [Refreshed again] *********************************************************
ok: [localhost] =>
    msg: '14:53:43'

I stand corrected. I thought the point of cached facts wasn’t just to have them available for subsequent runs with gather_facts: false, but also to improve the performance of gather_facts itself by skipping cached facts that have not expired.

Well, that’s where we get into the question of configurations and what is meant by gather_facts. DEFAULT_GATHERING has three possible values.

- name: Play with gather_facts=true
  hosts: localhost
  gather_facts: true
  tasks:
    - debug:
        msg: DEFAULT_GATHERING={{ lookup('config', 'DEFAULT_GATHERING') }}

- name: Play without gather_facts
  hosts: localhost

With this playbook the default setting (implicit) gathers facts twice:

PLAY [Play with gather_facts=true] *********************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [debug] *******************************************************************
ok: [localhost] =>
    msg: DEFAULT_GATHERING=implicit

PLAY [Play without gather_facts] ***********************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

explicit only gathers facts once:

PLAY [Play with gather_facts=true] *********************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [debug] *******************************************************************
ok: [localhost] =>
    msg: DEFAULT_GATHERING=explicit

PLAY [Play without gather_facts] ***********************************************

and smart gathers facts zero times because of the cache:

PLAY [Play with gather_facts=true] *********************************************

TASK [debug] *******************************************************************
ok: [localhost] =>
    msg: DEFAULT_GATHERING=smart

PLAY [Play without gather_facts] ***********************************************

However, this configuration (and the cache state) only matters for the play-level setting. The gather_facts task does not have the same logic; it’s just another task.
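
The playbook behind the next output was not posted. A sketch that would be expected to give output of that shape with DEFAULT_GATHERING=smart and a populated cache, reconstructed from the task names in the output and therefore an assumption, is:

```yaml
- name: Play with gather_facts=true
  hosts: localhost
  gather_facts: true
  tasks:
    # With smart gathering and a populated cache, this is the cached value.
    - ansible.builtin.debug:
        msg: "{{ ansible_facts.date_time.time }}"

    # An explicit task always runs, whatever DEFAULT_GATHERING is set to.
    - name: gather_facts task
      ansible.builtin.gather_facts:

    - ansible.builtin.debug:
        msg: "DEFAULT_GATHERING={{ lookup('config', 'DEFAULT_GATHERING') }}"

    # After the explicit task, this is the freshly gathered value.
    - ansible.builtin.debug:
        msg: "{{ ansible_facts.date_time.time }}"
```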

PLAY [Play with gather_facts=true] *********************************************

TASK [debug] *******************************************************************
ok: [localhost] =>
    msg: '15:37:17'

TASK [gather_facts task] *******************************************************
ok: [localhost]

TASK [debug] *******************************************************************
ok: [localhost] =>
    msg: DEFAULT_GATHERING=smart

TASK [debug] *******************************************************************
ok: [localhost] =>
    msg: '15:38:07'

I marked one post as the solution, but they all helped me get a better grip on the problem. Thanks to all!