Fact caching (work in progress) infrastructure now on development branch

I merged Brian Coca’s branch and tweaked it a little.

As a result, fact caching can now (in abstract) be configured by a “cache_plugin” option in ansible.cfg.

The default setting is “bypass”, which is not actually a plugin; it yields the in-memory caching that Ansible uses now.

An in-memory caching implementation that goes through the plugin interface (for test purposes) is called “memory”. It can be selected by setting ANSIBLE_CACHE_PLUGIN=memory as an environment variable, but this is silly and not something that would be used in production. From basic testing with the unit tests, the “memory” version is roughly 25-50% slower than the “bypass” version, which uses the native “dict” class directly. I don’t think this is terribly important, however.

The next step is to develop a very basic pickle or BSD or sqlite or other implementation, as well as some larger database variants.

I suspect we may overhaul the architecture a little as this was a quick stab at tweaking what Brian did earlier.

What I have done is make the cache class that wraps the plugin subclass dict, without requiring the plugin itself to subclass dict; the plugin only has to define some serializer methods.

In some cases, it may make sense to move more things into the plugin, which we can do as we work on things.
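As a rough illustration of the arrangement described above, here is a minimal sketch of a dict subclass that forwards item access to a plugin object. The names FactCache and MemoryPlugin, the plugin method names (get, set, delete, contains, keys), and the use of JSON as the serializer are all assumptions made for this example; they are not taken from the actual code in the devel branch.

```python
import json


class MemoryPlugin(object):
    """Hypothetical plugin: it does not subclass dict, it only stores
    and retrieves serialized values."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Raises KeyError if the host has no cached facts.
        return json.loads(self._store[key])

    def set(self, key, value):
        self._store[key] = json.dumps(value)

    def delete(self, key):
        del self._store[key]

    def contains(self, key):
        return key in self._store

    def keys(self):
        return list(self._store)


class FactCache(dict):
    """Hypothetical dict subclass that forwards item access to a plugin."""

    def __init__(self, plugin):
        super(FactCache, self).__init__()
        self._plugin = plugin

    def __getitem__(self, key):
        return self._plugin.get(key)

    def __setitem__(self, key, value):
        self._plugin.set(key, value)

    def __delitem__(self, key):
        self._plugin.delete(key)

    def __contains__(self, key):
        return self._plugin.contains(key)

    def keys(self):
        return self._plugin.keys()


cache = FactCache(MemoryPlugin())
cache["web01"] = {"ansible_os_family": "RedHat"}
```

One thing to watch with this approach: built-in dict methods such as get() and update() do not route through the overridden __getitem__ and __setitem__, so a real implementation would have to override those as well.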

https://github.com/ansible/ansible/blob/devel/lib/ansible/cache/__init__.py

(notice some slight hacks to increase OO efficiency a bit)

https://github.com/ansible/ansible/blob/devel/lib/ansible/cache/memory.py

(the no-op memory implementation)

For those interested in the “schema”, such as it is: the keys are always the names of the host records, and the values are always dictionaries containing the facts for those hosts, so there’s really very little to it.

It would translate very very easily to storage in things like Riak, Redis, MongoDB, etc.
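To make the shape of the data concrete, here is a small sketch of such a mapping and of how it could be flattened into key/value records for an external store. The host names and fact values are made up, and the to_records and from_records helpers and the choice of JSON are assumptions for illustration only.

```python
import json

# Keys are host names, values are dictionaries of facts for that host.
# Hosts and values here are invented for the example.
facts_by_host = {
    "web01.example.com": {"ansible_distribution": "CentOS",
                          "ansible_processor_count": 2},
    "db01.example.com": {"ansible_distribution": "Debian",
                         "ansible_processor_count": 4},
}


def to_records(cache):
    """Flatten the cache into (key, serialized value) pairs, the form a
    simple key-value store would accept."""
    return [(host, json.dumps(facts, sort_keys=True))
            for host, facts in sorted(cache.items())]


def from_records(records):
    """Rebuild the host -> facts mapping from stored pairs."""
    return dict((host, json.loads(blob)) for host, blob in records)


records = to_records(facts_by_host)
restored = from_records(records)
```

Because each record is just a string key with a serialized blob, the same pairs could be written to any store that offers get and set by key.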

Additional plugins and tuning are quite welcome.

Hi all,

Quick update: I’ve found out how to make ‘memory’ more performant (just as fast as ‘bypass’), so ‘bypass’ is no longer a thing and plugins are now always used.

Some fact caching plugins may want their own in-memory caches (for example, when the plugin is used mostly for retrieval and the datastore is slowish); we’ll ferret that out as more are developed. If so, those caches can be implemented inside the plugin.
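A plugin-local cache of that kind might look roughly like the sketch below, where the plugin keeps a plain dict in front of a slow backing store and only goes to the store on a miss. SlowStore and CachingPlugin are hypothetical names, and SlowStore is just a stand-in that counts reads.

```python
class SlowStore(object):
    """Stand-in for a slow datastore; counts how often it is read."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value


class CachingPlugin(object):
    """Hypothetical plugin that keeps its own in-memory copy of
    anything it has already fetched or written."""

    def __init__(self, store):
        self._store = store
        self._local = {}

    def get(self, key):
        # Only hit the slow store when the local copy is missing.
        if key not in self._local:
            self._local[key] = self._store.get(key)
        return self._local[key]

    def set(self, key, value):
        # Write through so the store and the local copy stay in step.
        self._local[key] = value
        self._store.set(key, value)


store = SlowStore({"web01": {"ansible_os_family": "RedHat"}})
plugin = CachingPlugin(store)
first = plugin.get("web01")
second = plugin.get("web01")
```

In this sketch the second lookup never reaches the store, which is the effect you would want when the datastore is the slow part.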

The next step is to make one for an actual datastore :)

This work, specifically commit 3ba0ea064df78387f0898a680c6936bd14a20d9c, appears to have broken one of my task combos that gets used in a lot of my playbooks.

In one task include, a directory listing is done and the data is stored in a register variable. This is later passed through to another task include so that it can optionally be used in a task within. The data isn't getting passed through; the variable is being detected as undefined.

In fact, if I simplify matters and do a task with a register: followed immediately by a debug that tries to print that register variable, I get an error.

Bisecting on one of my patch branches from yesterday actually shows 3ca9b1123ba29f550e0307ef1222c15b7edf163f to be a source of the issue. It could be that commit 3ba0ea turns on 3ca9b somehow.

-jlk

I am actually working on a fix for this currently. A pull request is coming shortly.

The pull request has been submitted at https://github.com/ansible/ansible/pull/5534

Unfortunately, this doesn't resolve the test case I have (as chatted about on IRC).

It seems to not fix the ‘memory’ plugin, but did fix the plugin I was attempting to implement. I’m still looking.

The above commit looks wrong to me as it no longer calls the plugin’s __getitem__.

I’ll investigate, but register was working fine earlier in my tests.

I’ve reverted all of this for now; it looks like we’ll want to step back and refactor some things before tackling this.

So my original push was trying to preserve the existing (per run) dict as much as possible, but from yesterday’s updates I gather that the project goals have expanded: remove the current dict-based cache and make the default an in-memory cache plugin, while still keeping dict access semantics. I’ve got a couple of thoughts on how to generalize the plugin better for this case, and I’ll probably be able to get to some actual code on Friday or over the weekend.

For persistent caches, the set_fact module now introduces some ‘interesting’ dilemmas:

  • do we persist them across runs?

  • do we force them to always use the memory cache object?

My initial draft used all enabled plugins in order, which is inefficient but good for testing several at a time. My intention was to have either just one passed in through configuration or at most a short list.

Considering the move of the in-memory cache into a plugin, I think we need to always have cache.memory enabled and fall back to a second (or further) cache plugin when memory itself fails to produce a result (chaining plugins?).

  • Is there a case where we don’t want the memory cache and always want to hit the backing cache?

I’ll keep musing over this a bit more; direction and thoughts are appreciated.
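For discussion purposes, the chaining idea could be sketched as follows: try each cache in order, return the first hit, and copy the value into the faster caches that missed. ChainedCache is a hypothetical name, and plain dicts stand in for the memory and persistent plugins; none of this is existing Ansible code.

```python
class ChainedCache(object):
    """Hypothetical chain of caches, ordered fastest first."""

    def __init__(self, caches):
        self._caches = list(caches)

    def get(self, key):
        for index, cache in enumerate(self._caches):
            try:
                value = cache[key]
            except KeyError:
                continue
            # Back-fill the caches that missed so the next lookup
            # is answered earlier in the chain.
            for earlier in self._caches[:index]:
                earlier[key] = value
            return value
        raise KeyError(key)

    def set(self, key, value):
        # Write to every cache in the chain.
        for cache in self._caches:
            cache[key] = value


# Plain dicts stand in for the memory and persistent plugins.
memory = {}
persistent = {"db01": {"ansible_os_family": "Debian"}}
chain = ChainedCache([memory, persistent])
hit = chain.get("db01")
```

Whether set should write to every cache in the chain is exactly the set_fact question raised above; this sketch simply writes to all of them.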

Just wanted to post this here, since I didn’t see any mention of it until browsing the docs themselves (http://docs.ansible.com/playbooks_variables.html#fact-caching) and reading through https://github.com/ansible/ansible/pull/8203. It is now possible to use fact caching by adding the following to ansible.cfg:

[defaults]
fact_caching = redis
# timeout is in seconds
fact_caching_timeout = 86400

And installing redis (example for RHEL):

yum install redis
service redis start
pip install redis

As noted in the docs, this is new in 1.8, and in ‘beta’ state. But it’s there!
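For anyone wondering what a timeout of 86400 seconds (24 hours) means in practice, here is a generic sketch of time-based expiry. It only illustrates the idea; it is not how the redis plugin is implemented, and ExpiringCache and its injectable clock are assumptions for the example.

```python
import time


class ExpiringCache(object):
    """Hypothetical cache that forgets entries older than `timeout`
    seconds. The clock is injectable so expiry can be demonstrated
    without waiting."""

    def __init__(self, timeout, clock=time.time):
        self._timeout = timeout
        self._clock = clock
        self._entries = {}  # host -> (stored_at, facts)

    def set(self, host, facts):
        self._entries[host] = (self._clock(), facts)

    def get(self, host):
        stored_at, facts = self._entries[host]
        if self._clock() - stored_at > self._timeout:
            # Entry is stale: drop it and report a miss.
            del self._entries[host]
            raise KeyError(host)
        return facts


now = [0.0]  # fake clock value, advanced by hand below
cache = ExpiringCache(timeout=86400, clock=lambda: now[0])
cache.set("web01", {"ansible_os_family": "RedHat"})
fresh = cache.get("web01")
now[0] = 86401.0  # just over 24 hours later
```

After the fake clock passes the timeout, a lookup for web01 raises KeyError, at which point facts would have to be gathered again.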