fact caching refreshing?

Hi,

I’ve been making good use of smart fact gathering, as we sometimes have 30-40 plays grouped together and gathering facts for each one is rather time consuming. We’ve been combining this with fact caching, as we have some custom facts that are useful to preserve between runs (I’ll cover them in a separate email, as it might be worth putting together a better POC to share on why we’re doing that), so timeout is

In turn, though, we found that we’d really like to ensure the setup facts are refreshed at the start of each overall run.

For now I’ve just added an explicit call to the ‘setup’ module in each of the plays that might be run directly, with a registered variable and a when condition to skip running it again if it has already been done during this execution run.
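
A minimal sketch of that kind of workaround, for illustration only. The play names, the host pattern and the variable name `setup_refreshed` are hypothetical, and it assumes that registered variables last for the rest of the playbook run without being written to the persistent fact cache:

```yaml
# Sketch only: play names, host patterns and the variable name
# setup_refreshed are hypothetical.
- name: First play
  hosts: all
  gather_facts: no
  tasks:
    # setup_refreshed is undefined at the start of a run, so this executes
    - name: Refresh setup facts once per playbook run
      setup:
      register: setup_refreshed
      when: setup_refreshed is not defined

- name: Second play
  hosts: all
  gather_facts: no
  tasks:
    # setup_refreshed is already defined for the host, so this is skipped
    - name: Refresh setup facts once per playbook run
      setup:
      register: setup_refreshed
      when: setup_refreshed is not defined
```

Facts returned by the explicit setup task should be merged over the cached ones, so custom facts that setup does not return are left in place. The cost is that every play that might run first has to carry the same guard task.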

If we didn’t have the custom facts that are useful to preserve, the memory cache would have solved this, so I’m unsure whether this is really a corner case or whether we’re just working at the current limits and things could be improved to cover this as well.

So I was thinking that, at the start of a series of combined plays, the ‘smart’ behaviour with a fact store such as jsonfile or redis might be a tad more handy if fact gathering had a ‘refresh’ mode as well?

That way we can always be certain that the run starts with the latest facts from the system, but still benefit from the smart behaviour in the middle where we combine multiple plays, rather than needing to be overly explicit with each play to skip gathering and remembering to switch the settings around when someone moves code around to add a different play at the start (people forget…).

With the general refactoring for V2, it seems like this would be trivial, and I will probably have a PR for it shortly, but I thought I’d start getting used to discussing things on the mailing list anyway, as it’s probably easier for questions to be asked and answered here and then referenced in the PR.

(btw, it looks like a two line change to add support for this)

in 2.1 you’ll have:

meta: clear_facts
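
As a rough illustration, `meta` is used as a task, so this would presumably sit in a play's task list along these lines (the host pattern is just a placeholder):

```yaml
- hosts: all
  tasks:
    # Clears the gathered facts for the hosts targeted by this play
    - meta: clear_facts
```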

From the changelog, it sounds like that would clear out all facts, including any custom ones, when we just want those gathered by the setup module to be wiped or refreshed.

We don’t keep track of the origin of what goes into ansible_facts.

Which is not what I was suggesting with the 'refresh' idea.

I'm somewhat confused: were you suggesting 'meta: clear_facts' as an
alternative, or as the place where the 'refresh' idea should be added?

Rather than trying to keep track of where facts come from, I was suggesting
that if 'refresh' is set, whatever is normally gathered gets re-gathered,
which would refresh that set of facts without discarding any of the
additional facts that were previously returned by other modules and saved
to the fact cache.

The 'clear_facts' would be useful in some cases for us as well, just
slightly different, as it would wipe out everything all the time which
isn't desirable.

https://github.com/electrofelix/ansible/commit/ed7fef10da00e36835e89f4954b8132d28b6e183
looks like it would be sufficient, I just haven't tested it properly yet.

You just seem to have reimplemented smart gathering. If a host has been scanned, it should not be scanned again unless you ask for it explicitly.

Sure, if using the memory based fact cache:
refresh == smart

However when using redis, jsonfile, etc:
refresh will regather for the first play even if the cache is not expired, without discarding other custom facts added outside of the setup module.

Flushing the cache will wipe out those additional custom facts that were saved previously and are not generated by the setup module.

Right now we have to make sure setup is explicitly called as the first task in each list of plays (we have 30+ plays run as one playbook).

I’ve just realised I should have been emailing ansible-devel, not ansible-project, with this :(

Sorry, will try to use the correct group in future.