It is sometimes nice to be able to run actions on hosts that previously encountered failures.
As of 1.2 (this just landed), you get a message like this when a playbook run has failures:
PLAY RECAP ********************************************************************
to rerun against failed hosts, use -i /etc/ansible/.foo.yml.retry
a.example.com : ok=2 changed=0 unreachable=0 failed=0
b.example.com : ok=1 changed=0 unreachable=0 failed=1
…
Want to target just those hosts when re-running the same playbook or doing something else?
You can.
ansible all -a "/sbin/reboot" -i /etc/ansible/.foo.yml.retry
The filename used for the retry file is predictable: it's always derived from the name of the playbook and is put in your inventory directory (so group_vars and host_vars still work as expected).
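As a rough illustration of that naming rule, here is a minimal Python sketch that guesses the retry file path for a given playbook. The function name guess_retry_path is hypothetical, and the pattern (a leading dot, the playbook file name, then ".retry") is only inferred from the single /etc/ansible/.foo.yml.retry example above, so treat it as an assumption rather than the exact logic Ansible uses.

```python
import os


def guess_retry_path(playbook_path, inventory_dir):
    """Guess where the retry file for a playbook would be written.

    Assumption: the name is "." + playbook file name + ".retry",
    inferred only from the /etc/ansible/.foo.yml.retry example.
    """
    # Only the file name of the playbook is used, not its directory.
    playbook_name = os.path.basename(playbook_path)
    # The retry file sits next to the inventory, as a hidden file.
    return os.path.join(inventory_dir, "." + playbook_name + ".retry")


if __name__ == "__main__":
    print(guess_retry_path("playbooks/foo.yml", "/etc/ansible"))
```

Something like this can be handy in a wrapper script that wants to check whether a retry file exists before deciding what to pass to -i.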
Note: you may ask why it doesn't just pass "--limit" instead of using "-i". It could, but what if --limit was already set? Also, the number of hosts in the limit could get very large, and I didn't like the idea of having to pass in 500 hosts via --limit, since that command would look rather ugly. Using "-i" also means you can edit the inventory if you really, really want to.
Minor caveat: you must have permission to write to your inventory directory for this feature to work.
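If you want to check that caveat ahead of time, a small Python sketch like the following can tell you whether the current user is able to create files in the inventory directory. The helper name can_write_retry_file is hypothetical, and /etc/ansible is used only because it appears in the example output above; this is a pre-flight check you could run yourself, not something Ansible does for you.

```python
import os


def can_write_retry_file(inventory_dir):
    """Return True if the current user can create files in inventory_dir."""
    # Creating a file needs both write and search (execute) permission
    # on the directory itself.
    return os.path.isdir(inventory_dir) and os.access(
        inventory_dir, os.W_OK | os.X_OK
    )


if __name__ == "__main__":
    # /etc/ansible is taken from the recap example; adjust to your setup.
    print(can_write_retry_file("/etc/ansible"))
```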
I hope this allows for some very easy re-runs of playbook content on failed hosts, as well as some new use cases of other varieties.
There may be some minor kinks in this (maybe around child groups or something) but it seems pretty reasonable. Let me know if you encounter any problems, or have questions.
–Michael