Random undefined variables (facts) since ansible 2.0 upgrade

Since upgrading to Ansible 2.0, some playbook runs succeed on most hosts, but 1-3 hosts fail because a variable that is an Ansible fact is undefined. Running the playbook a second time works.
Example:

fatal: [wdv-sitefinity1]: FAILED! => {"failed": true, "msg": "ERROR! The conditional check 'ansible_os_family == 'Windows'' failed. The error was: ERROR! error while evaluating conditional (ansible_os_family == 'Windows'): ERROR! 'ansible_os_family' is undefined\n\nThe error appears to have been in '/home/icansible/ansible/roles/newrelic/tasks/main.yml': line 11, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: create temp directory\n ^ here\n"}

This is still occurring during our full site.yml run, but so far it has been impossible to replicate in a small example. It only occurs on our full site.yml run with no limits; any attempt to limit the run to certain hosts or tags results in no errors.

I had something like it, though not on Windows. At the time, the problem was that I had turned off gather_facts and then tried to use a few facts later on. It may not be the same issue, but it may be worth checking…

Alex
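
For reference, a minimal sketch of the check suggested above: a play that requests fact gathering explicitly and prints the fact before anything depends on it. The host pattern and the debug task are placeholders, not taken from the original site.yml.

```yaml
# Hypothetical play; the host pattern and task are illustrative only.
- hosts: all
  gather_facts: true   # request fact gathering for this play
  tasks:
    - name: show the fact that later conditionals rely on
      debug:
        var: ansible_os_family
```

Note that with gathering set to smart in ansible.cfg, Ansible may still skip hosts it considers already gathered, so this only rules out the case where gathering was switched off.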

In our case, the facts are being gathered and used in other plays included in site.yml, but in a subsequent play a fact that was previously defined is suddenly undefined on some random hosts, though not all. It's happening on both Windows and Linux hosts.
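
One possible guard while the cause is unknown, sketched under the assumption that re-running the setup module on the affected hosts is acceptable: gather facts again only where the fact is missing. This is not confirmed as a fix anywhere in this thread, and the play itself is hypothetical; only the newrelic role name comes from the error message above.

```yaml
# Hypothetical guard; not a confirmed fix for the problem described here.
- hosts: all
  pre_tasks:
    - name: re-gather facts if ansible_os_family is missing
      setup:
      when: ansible_os_family is not defined
  roles:
    - newrelic
```

Because pre_tasks run before roles, the conditional inside the role would see the freshly gathered fact on any host where it had gone missing.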

Do you have fact gathering set to smart in your ansible.cfg?

We do, and we have been looking at that. The facts for all hosts are gathered or refreshed during the first included playbook in site.yml, and the cache is not supposed to expire for 3 hours, while our entire run takes less than 2 hours. The cache files are present on disk, and so on.

We launched a site.yml run with fact caching commented out of ansible.cfg and are waiting to see how that run goes.

Turning off fact caching completely did not improve the situation.

Spoke too soon: turning off fact caching fixed, or at least worked around, the problem. I opened a bug report:
https://github.com/ansible/ansible/issues/14456