Fact Caching Revisited

Greetings,

I know it has been attempted before and is still slated for the future, but I recently needed fact caching in my personal use of Ansible. Building on the work that was already done, I fixed the bugs that were present and completed a handful of working caching backends: redis, memcached, and a simple file backend. I have been using them in my environments for a couple of weeks now (mostly redis, but testing the others as well), and haven’t had any issues. I am still extremely new to Ansible, and basically only have enough knowledge of the internals to implement the aforementioned functionality. That said, I figured I’d re-open discussion on this topic here before submitting a pull request. I’ve included a link below to a feature branch diff against the devel branch for review. Things of note:

  1. Only SETUP_CACHE is leveraging caching backends. VARS_CACHE is untouched as I’m not quite sure I understand the use-case behind caching play variables between playbook executions.
  2. Caching backends have a base class they should extend to ensure the API is properly implemented. All the heavy lifting is done by each caching backend.
  3. Given the existing usage of SETUP_CACHE (e.g. dictionary-based access), caching backends must be able to return the keys that are being held in cache. There are various ways of doing that, which can be seen in the diff. Redis is perhaps the most interesting and optimal since it allows usage of sorted sets.
  4. All unit tests pass, and the issues noted with the sample playbooks in the previous threads are not present. I haven’t had time to run the integration tests yet, but I’ll work on that as well.

Hopefully I’m not encroaching on any plans of major refactoring for fact caching, since I know it’s been in the pipeline for a while. I don’t have any strong opinions on the matter, but I figured that I would make what I’ve done available in the event it might be useful.
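
To make points 2 and 3 concrete, here is a minimal sketch of the shape described above: a base class that fixes the API, and a backend that does the actual work and can list the keys it holds. The names BaseFactCache and JsonFileFactCache, and all method names, are hypothetical and chosen for illustration; the classes in the linked diff may well look different. It also assumes host names are safe to use as file names.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class BaseFactCache(ABC):
    """Hypothetical base class: the API every backend has to implement."""

    @abstractmethod
    def get(self, host):
        """Return the cached facts for host, or raise KeyError."""

    @abstractmethod
    def set(self, host, facts):
        """Store facts (a dict) for host."""

    @abstractmethod
    def keys(self):
        """Return the names of all hosts currently held in cache."""

    @abstractmethod
    def delete(self, host):
        """Remove host from the cache, or raise KeyError."""

    # Dictionary-style access, built only on the methods above.
    def __getitem__(self, host):
        return self.get(host)

    def __setitem__(self, host, facts):
        self.set(host, facts)

    def __contains__(self, host):
        return host in self.keys()


class JsonFileFactCache(BaseFactCache):
    """Hypothetical file backend: one JSON file per host in a directory."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, host):
        return os.path.join(self.directory, host + ".json")

    def get(self, host):
        try:
            with open(self._path(host)) as f:
                return json.load(f)
        except FileNotFoundError:
            raise KeyError(host)

    def set(self, host, facts):
        with open(self._path(host), "w") as f:
            json.dump(facts, f)

    def keys(self):
        # The directory listing doubles as the key index.
        names = os.listdir(self.directory)
        return sorted(n[: -len(".json")] for n in names if n.endswith(".json"))

    def delete(self, host):
        try:
            os.remove(self._path(host))
        except FileNotFoundError:
            raise KeyError(host)


if __name__ == "__main__":
    cache = JsonFileFactCache(tempfile.mkdtemp())
    cache["web1"] = {"ansible_os_family": "Debian"}
    print(cache.keys(), cache["web1"])
```

A backend such as redis would implement the same four methods but keep its own index of keys (for example in a sorted set) instead of listing a directory.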

Diff for Fact Caching Feature Branch: https://github.com/joshdrake/ansible/compare/feature/fact_caching?expand=1&w=1

While I appreciate the interest, fact caching will need to have very rigid design requirements so we are unlikely to take a pull request on it at this time.

Ultimately I see this happening as a combination of a callback plugin to intercept facts, and a vars plugin to provide them.

And it will need to be optimized for database usage.

To elaborate: in theory, the variables could be returned via a specialized inventory plugin rather than a vars plugin. That is more efficient for small numbers of hosts. But there are dangers: a large number of accesses, if not done lazily, could bog down the system immensely and make it intractable for large numbers of hosts.
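
As a rough illustration of what "lazily done" could mean here, the sketch below wraps a lookup function in a read-only mapping that only hits the backend the first time a given host's facts are actually read. LazyHostFacts and the fetch callable are hypothetical names, not part of Ansible; this is one possible approach under those assumptions, not a description of how Ansible does it.

```python
from collections.abc import Mapping


class LazyHostFacts(Mapping):
    """Hypothetical read-only view that fetches a host's facts on first access."""

    def __init__(self, hosts, fetch):
        self._hosts = dict.fromkeys(hosts)  # known host names; no backend access yet
        self._fetch = fetch                 # callable(host) -> dict, e.g. a cache lookup
        self._loaded = {}                   # facts already fetched during this run

    def __getitem__(self, host):
        if host not in self._hosts:
            raise KeyError(host)
        if host not in self._loaded:
            # First access for this host: go to the backend exactly once.
            self._loaded[host] = self._fetch(host)
        return self._loaded[host]

    def __contains__(self, host):
        # Overridden so that membership tests never trigger a fetch.
        return host in self._hosts

    def __iter__(self):
        return iter(self._hosts)

    def __len__(self):
        return len(self._hosts)


if __name__ == "__main__":
    def fetch(host):
        print("fetching", host)
        return {"inventory_hostname": host}

    facts = LazyHostFacts(["web1", "web2", "db1"], fetch)
    facts["web2"]  # triggers one fetch
    facts["web2"]  # served from memory
```

With this shape, a play that only ever touches a few hosts' facts pays for a few lookups, regardless of how many hosts are in the inventory.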

I’m worried about that.

If we have a playbook run against 10,000 systems, with 50 tasks in that playbook and -f 200, how does Redis hold up?
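
For a sense of scale, here is a back-of-the-envelope estimate for that scenario. It rests on assumptions that are not established anywhere in this thread: a worst case of one cache read per host per task (no lazy access), one cache write per host when facts are gathered, and at most one backend client per fork. The function name is hypothetical.

```python
def estimate_cache_load(hosts, tasks, forks):
    """Rough worst-case numbers; every rule of thumb here is an assumption."""
    return {
        # Assumption: non-lazy access, one read per host per task.
        "reads": hosts * tasks,
        # Assumption: facts are written once per host, when gathered.
        "writes": hosts,
        # Assumption: each fork holds at most one backend connection.
        "max_concurrent_clients": forks,
        # Number of batches of `forks` hosts needed per task (ceiling division).
        "batches_per_task": -(-hosts // forks),
    }


if __name__ == "__main__":
    print(estimate_cache_load(hosts=10_000, tasks=50, forks=200))
```

Under those assumptions the run performs 500,000 reads and 10,000 writes in total, spread across 50 batches per task.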

Anyway, more of a topic for ansible-devel really.

Let me reverse my earlier logic here - with something like Redis, this is probably 100% fine.

We need to put this in queue, so please send us a PR.

We can test it out to see how it does. Databases may be hard, but Redis is not.

Let me know and we can put this through its paces.

I think with the initial implementation there may be cache invalidation logic that needs an overhaul, so we’ll have to just be really careful about it.

Ergh, meant to submit this to ansible-devel, sorry about that. Anyway, I definitely agree that some potential backends (e.g. a SQL database) might not be suited for the type of workload here, but that redis should perform well in any conceivable use-case. I’ve submitted pull request #8203 (https://github.com/ansible/ansible/pull/8203) for continuing the discussion.

Thanks, this should be pretty easy to test out and benchmark.

I’m classifying this as P2 so we can get it some attention earlier.