New 1.2 feature, retrying only hosts with failures

It is sometimes nice to be able to run actions on hosts that previously encountered failures.

As of 1.2 (this just landed), you get a message like this when a run has failures:

PLAY RECAP ********************************************************************
to rerun against failed hosts, use -i /etc/ansible/.foo.yml.retry

a.example.com : ok=2 changed=0 unreachable=0 failed=0

b.example.com : ok=1 changed=0 unreachable=0 failed=1

Want to target just those hosts when re-running the same playbook or doing something else?

You can.

ansible all -a "/sbin/reboot" -i /etc/ansible/.foo.yml.retry

The filename used for the retry file is predictable: it’s always derived from the name of the playbook and is put in your inventory directory (so group_vars and host_vars still work as expected).
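To illustrate the naming rule, here is a minimal Python sketch. It is not the actual Ansible code: the helper name `retry_file_path` is hypothetical, and the pattern (a dot, the playbook's basename, then `.retry`, placed in the inventory directory) is inferred only from the example message earlier in this post.

```python
import os.path


def retry_file_path(playbook, inventory_dir):
    """Guess where the retry file for a playbook would be written.

    Hypothetical helper; the naming pattern is an assumption based on
    the sample path /etc/ansible/.foo.yml.retry shown in the recap.
    """
    # Hidden file named after the playbook's basename, with ".retry" appended.
    name = "." + os.path.basename(playbook) + ".retry"
    # Stored alongside the inventory so group_vars and host_vars resolve.
    return os.path.join(inventory_dir, name)


print(retry_file_path("playbooks/foo.yml", "/etc/ansible"))
```

On a POSIX system this should print the same path as in the recap message, which is what makes the file easy to find from a script.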

Note: you may ask why it didn’t just pass “--limit” instead of using “-i”. Well, it could, but what if --limit was already set? Also, the number of hosts in the limit could get very large, and I didn’t like the idea of having to pass in 500 hosts via --limit, since that command would look rather ugly. Using “-i” also means you can edit the inventory if you really want to.

Minor caveat: you must have permissions to write to your inventory directory for this feature to be used.

I hope this allows for some very easy re-runs of playbook content on failed hosts, as well as some new use cases of other varieties.

There may be some minor kinks in this (maybe around child groups or something) but it seems pretty reasonable. Let me know if you encounter any problems, or have questions.

–Michael

Very cool, but needing write perms to the inventory directory is kinda gross & wouldn’t work with many setups (I suspect). Are there any technical barriers to moving that to something like /var/tmp/ansible/?

I thought of a newer implementation idea over breakfast:

--limit @limitfile

where we let --limit get loaded from a file, and we just write the list of failed hosts to that file.

I’ll make it work like that.
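A rough Python sketch of what writing and reading such a limit file could look like. The one-host-per-line layout and the names `write_limit_file` and `read_limit_file` are assumptions for illustration only, not the final implementation.

```python
import os
import tempfile


def write_limit_file(path, failed_hosts):
    """Write failed host names to path, one per line (assumed format)."""
    with open(path, "w") as handle:
        # De-duplicate and sort so the file is stable between runs.
        for host in sorted(set(failed_hosts)):
            handle.write(host + "\n")


def read_limit_file(path):
    """Read host names back from the file, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Demo in a scratch directory rather than the inventory directory.
demo_path = os.path.join(tempfile.mkdtemp(), "foo.yml.retry")
write_limit_file(demo_path, ["b.example.com"])
print(read_limit_file(demo_path))
```

Because the file only holds host names, it can live anywhere writable, and the path would simply be handed over as --limit @limitfile.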

Wow, thanks!

Hmm… I think this idea might fit better with ansible-commander,
but what about having a way to re-run with this inventory after the playbook is done?

Sorry for the late response.

I’m not sure automatic retries are wanted often enough for ansible or acom to build that in. However, you could key off the return code from ansible if you wanted to do this.
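For example, a small wrapper could key off the return code along these lines. This is only a sketch: the function name `run_with_one_retry` is hypothetical, and it assumes a non-zero exit code means at least one host failed.

```python
import subprocess


def run_with_one_retry(first_cmd, retry_cmd, run=subprocess.call):
    """Run first_cmd; if it exits non-zero, run retry_cmd once.

    Returns the exit code of the last command that was run. The `run`
    parameter exists only so the function can be tested without ansible.
    """
    code = run(first_cmd)
    if code != 0:
        # Assumption: non-zero means some hosts failed, so re-run
        # against only those hosts.
        code = run(retry_cmd)
    return code


# Possible usage (not executed here), reusing the retry inventory
# path from the recap message earlier in this thread:
# run_with_one_retry(
#     ["ansible-playbook", "foo.yml"],
#     ["ansible-playbook", "foo.yml", "-i", "/etc/ansible/.foo.yml.retry"],
# )
```

Keeping the retry in a wrapper like this leaves the decision about whether and how often to retry with the caller.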

I think it would be better to support retries on individual tasks though if that is what you were going for?

Maybe I don’t understand the use case fully.

I agree with this. Going by the standard, /etc is for configuration files only. Better to put it in /tmp or /var/tmp.