When should modules use/return `ansible_facts`?

Modules can return regular return values as well as ansible_facts, which automatically appear in the global namespace (unless you set INJECT_FACTS_AS_VARS=false - though doing that will likely break a lot of playbooks and roles). Most modules returning ansible_facts have the suffix _facts in the module name, though there are exceptions (for example ansible.builtin.hostname, which updates several hostname-related facts).
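To make the difference concrete, here is a rough sketch of the two result shapes and of what happens to them on the controller side. This is a deliberately simplified toy model, not Ansible's actual implementation: the functions `facts_style_result`, `info_style_result`, and `apply_result` and the fact name `example_hostname` are made up for illustration.

```python
def facts_style_result(hostname):
    """Result shape of a module that publishes facts (hypothetical example)."""
    return {
        "changed": False,
        "ansible_facts": {"example_hostname": hostname},
    }


def info_style_result(hostname):
    """Result shape of a module that only has regular return values."""
    return {"changed": False, "hostname": hostname}


def apply_result(host_vars, result, register=None):
    """Toy model of the controller: facts are merged into the host's
    variables automatically, regular return values are only kept if the
    task registers them under a name."""
    updated = dict(host_vars)
    updated.update(result.get("ansible_facts", {}))
    if register is not None:
        updated[register] = result
    return updated


after_facts = apply_result({}, facts_style_result("web1"))
after_info = apply_result({}, info_style_result("web1"))
after_registered = apply_result({}, info_style_result("web1"), register="host_info")
```

In this model `after_facts` contains `example_hostname` without the playbook author asking for it, `after_info` stays empty, and `after_registered` only contains the data under the name the author chose.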

Right now the only guidance we give on whether a module should return ansible_facts is here: Developing modules - Ansible Community Documentation

Only use ansible_facts for information that is specific to the host machine, for example network interfaces and their configuration, which operating system and which programs are installed.

I want to start a discussion to elaborate on that. I would even suggest limiting the use of ansible_facts further, and basically discouraging it apart from some very explicit exceptions.

The reason I want to start this discussion now is that there's a new PR in community.general for adding a systemd_facts module, which I personally think should be an _info module since the returned values aren't "proper" facts: first, you can limit the set of systemd units you want information on, and second, you can specify additional properties to query. Therefore the facts the module returns depend not only on the system (and on the time of querying), but also on the module settings.
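A small sketch of why option-dependent facts are awkward. Everything here is hypothetical: the unit data, the option names `unit_patterns` and `properties`, and the fact key `systemd_units` are invented for illustration and are not taken from the PR.

```python
from fnmatch import fnmatch

# Pretend state of one host (made-up data).
HOST_UNITS = {
    "sshd.service": {"ActiveState": "active", "SubState": "running"},
    "cron.service": {"ActiveState": "active", "SubState": "running"},
    "tmp.mount": {"ActiveState": "active", "SubState": "mounted"},
}


def gather_units(unit_patterns=("*",), properties=("ActiveState",)):
    """Return only the units and properties selected by the options."""
    selected = {}
    for name, props in HOST_UNITS.items():
        if any(fnmatch(name, pattern) for pattern in unit_patterns):
            selected[name] = {key: props[key] for key in properties if key in props}
    return selected


host_facts = {}

# First task: all services, default properties.
host_facts["systemd_units"] = gather_units(unit_patterns=["*.service"])
first_run = host_facts["systemd_units"]

# Later task (maybe in another role): different options, same fact key.
host_facts["systemd_units"] = gather_units(
    unit_patterns=["sshd.*"], properties=["SubState"]
)
second_run = host_facts["systemd_units"]
```

The host did not change between the two calls, but `cron.service` and the `ActiveState` property are gone from the fact after the second call. Anything that relied on the first result now silently sees different data, which is the problem with putting such values under a shared fact name.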

Obviously all facts modules somehow depend on the time of invocation - if you list the disks in a system, someone might have pulled one out or put a new one in by the time your role/playbook uses the fact. (Plug and play, yay!) The same is true for mount facts (ansible.builtin.mount_facts) - a task in between might have mounted, remounted, or unmounted something - and package facts (ansible.builtin.package_facts) - a task in between might have installed, upgraded, or removed a package. Also, the facts output of modules like ansible.builtin.mount_facts depends on module options.

I'm still not sure what to think about facts like installed packages, mounts, and services, but generally I would say that any module where facts retrieval can be configured (so that the values of the facts depend on these options) should very likely not be a facts module, but an _info module. community.general has its share of examples:

(I think these modules should be fixed, but it's better to wait until data tagging is there before we start deprecating something, since then we can finally properly deprecate return values, including facts :) )

The PR also falls into this category from my point of view: depending on the module options you get different output (the facts may or may not contain certain units, and some properties might be there or not).

WDYT?

I think that means basically no modules are allowed to return facts, because ansible.builtin.setup also gives you different output depending on the module options.

As far as I can see, ansible.builtin.setup only lets you choose which facts are returned, and has no options that change the values of returned facts based on module parameters.

ansible.builtin.setup is the filter for gather_facts. It returns either the full set of facts or only those that match a specified query. As a result, the returned facts may vary depending on the selection.

On the other hand, ansible.builtin.mount_facts adjusts its output based on the specified source or allows filtering of the results. Additionally, the collected facts may differ depending on the OS distribution.

The issue is not about which data is collected or included in the results but rather whether these facts should be added to ansible_facts.

For example, in the module from my PR, it's not feasible to collect every single property of every systemd unit because properties vary based on the unit type, and some properties may be missed or duplicated (some units have more than 200 properties). This makes it impossible to create a module that gathers all data while allowing filtering through a single option.

The only viable approach is to give users the ability to specify which properties they want to collect and for which units.

Should this data be added to ansible_facts, or should it be managed using the register function?

I have considered all of @felixfontein's comments and suggestions on my PR and in this post.
After conducting some tests and discussing with colleagues, I realized that returning values in ansible_facts was not the right choice, at least not in my case.

Modules should return values in ansible_facts only when those values are immutable or at least partially stable.

Returning values that might change during the playbook execution can lead to unintended effects and make value management more difficult rather than easier.

For this reason, I decided to close my PR to avoid causing issues for both the repository and the PR itself. Instead, I have opened a new one under the name systemd_info.

However, I believe this concept should be more explicitly stated in the documentation, or at the very least, there should be a clear guideline to follow.

Additionally, existing modules should be updated accordingly, though I understand that would be a long-term effort.

It doesn't, but it does support optionally filtering unwanted mount types/devices. It's based off the logic in setup for gathering mounts, which by comparison is inconsistent across OSes and has hardcoded limitations instead of configuration options.

There are a lot of default connection facts modules that support configuration options, some of which impact the values returned (see Ansible Configuration Settings - Ansible Community Documentation). The gather_facts action plugin allows configuration for those modules and any configured FACTS_MODULES via module_defaults.

After 2.19 (if "fixes and updates to inventory intro by bcoca · Pull Request #2416 · ansible/ansible-documentation · GitHub" is merged, aka "data tagging"), we plan to deprecate the 'inject' functionality, so all facts will end up living under ansible_facts (with exceptions, like ansible_local).

While module authors can choose to do whatever they want, the distinction was intended to be that _facts are things inherent to the target machine, mostly ones that rarely change (hence we allow caching this info), and _info is information external to the target itself (for example from an AWS account) but somewhat related.

I also plan in the future to remove/break up setup into many modules, since gather_facts already allows using multiple different modules (in series or in parallel) to tailor the 'gathered facts' to each context, rather than having a huge and very costly unitary (setup) module that most people don't need.

So IMHO (core has not arrived at a consensus on this) I would allow modules to return facts or not as the author sees fit. The categories for _facts and _info as I describe them above are already not followed in some cases, but enforcing that or not is another discussion.

In general I would keep _facts for things that are not prone to change, expensive to query and benefit a lot from caching.
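As a toy illustration of that rule of thumb, here is a minimal time-based cache. It is not Ansible's fact cache plugin; the class `FactCache`, the gatherer `gather_os`, and the fact `os_family` are all made up for this sketch, and time is passed in as a plain number to keep it deterministic.

```python
class FactCache:
    """Tiny per-host cache with a timeout (illustrative only)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._entries = {}  # host -> (stored_at, facts)

    def get(self, host, now, gather):
        entry = self._entries.get(host)
        if entry is not None and now - entry[0] < self.timeout:
            return entry[1]  # still fresh: skip the expensive query
        facts = gather()
        self._entries[host] = (now, facts)
        return facts


calls = []


def gather_os():
    """Stand-in for an expensive query whose result rarely changes."""
    calls.append("os")
    return {"os_family": "Debian"}


cache = FactCache(timeout=3600)
cache.get("web1", now=0, gather=gather_os)
cache.get("web1", now=60, gather=gather_os)    # served from the cache
cache.get("web1", now=4000, gather=gather_os)  # expired, gathered again
```

For something like the OS family the cached answer is almost always still correct and saves a query. For data that changes within a play, the second call would happily return a stale value, so caching buys little.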

service_facts already exists and will stay as _facts, since listing existing services and their state should be mostly 'stable' information in most cases. Yes, it can change during a playbook, but that is true of most things. Services in general are expected to be a stable list with a stable status. There are contexts where this is not true, but I believe that is the minority right now.

I'm not sure how deep you are going with systemd, but I can see it blurring the line easily since it spans so many more things than 'services' at this point. If you just handle services and time/triggered units, this should be fine as _facts (IMHO), but it can easily be extended to the point that caching this information would be meaningless.

The systemd_info module was created based on the idea of service_facts.

service_facts is specifically dedicated to services because it was designed with that focus in mind. However, in my opinion, it is not capable of keeping up with the full capabilities of systemd.

The data collection mechanism for systemd units in service_facts is overly simplified, which makes the results unreliable.

The current service_facts module returns fairly standardized values by determining property states and mapping them to basic statuses like stopped, running, and a few others.
For example, the state values do not accurately reflect the actual SubState property. As a result, it cannot properly report states like failed or exited, since it only relies on the state return value.
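To show the kind of information loss I mean, here is a sketch with a deliberately simplified mapping. Note that `coarse_state` is a hypothetical stand-in and not the actual logic of service_facts, and the unit names are invented. Only the ActiveState and SubState values are real systemd terms.

```python
def coarse_state(active_state):
    """Hypothetical two-value mapping, for illustration only."""
    return "running" if active_state == "active" else "stopped"


# Made-up units with systemd-style ActiveState / SubState values.
units = [
    {"name": "app.service", "ActiveState": "active", "SubState": "running"},
    {"name": "oneshot.service", "ActiveState": "active", "SubState": "exited"},
    {"name": "broken.service", "ActiveState": "failed", "SubState": "failed"},
    {"name": "idle.service", "ActiveState": "inactive", "SubState": "dead"},
]

# name -> (coarse state, real SubState)
report = {
    unit["name"]: (coarse_state(unit["ActiveState"]), unit["SubState"])
    for unit in units
}
```

With this mapping, a oneshot unit that has exited looks the same as a running daemon, and a failed unit looks the same as one that was simply never started. Returning the raw properties avoids having to pick such a mapping at all.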

In my opinion, systemd units are more complex and deserve a dedicated module.

The systemd_info module was created to allow users to collect unit properties in a more precise and efficient way. Currently, it supports .service, .target, .socket, and .mount units.

For now, this module will remain an _info module and will return its values for use with register.

I don't think this should be enforced. Modules can do whatever they want. But we can have better / more strict guidelines. Folks don't have to adhere to them in general, but we can enforce them in collections like community.general. (Which is mainly what I'm interested in - having more strict guidelines that everyone can use, and that are used for community.general and some other collections.)

I think that is something that would make a good guideline :)
