iterate over "failed" hosts?

I’ve got an interesting case: a playbook is failing on certain hosts, either because of SSH connection issues or because some required things are not installed.

I realize that I should really just write a playbook that makes all my machines “compliant” with the main playbook and then re-run it. However, I am curious whether there is a way to do a “group_by” or something similar to pick up all the “failed” hosts, group them by cause, and then run some actions against them, whether that is adding a record to a DB noting the failure or attempting to resolve the underlying problem.

BTW I did read http://docs.ansible.com/developing_api.html and http://jpmens.net/2012/12/13/obtaining-remote-data-with-ansible-s-api/ that provide fine examples of doing it via API code. I’m more curious whether I can sneak something in through the playbook…

When a playbook run fails on some hosts, Ansible can write a retry file that lists just the failed hosts.

You can pass this file to --limit on the next run to target only those specific hosts.
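
As a rough sketch of that workflow: the names site.yml and site.retry below are hypothetical, and I am assuming retry files are switched on (as far as I know, newer Ansible releases leave them off by default) and that the file is written next to the playbook. Adjust the path if your setup saves it elsewhere.

```shell
# Hypothetical names: site.yml is the playbook, site.retry is the retry
# file Ansible writes for it (assumed here to land next to the playbook).
PLAYBOOK="site.yml"
RETRY_FILE="site.retry"

# Make sure retry files are written at all. The same switch can be set
# as retry_files_enabled = True under [defaults] in ansible.cfg.
export ANSIBLE_RETRY_FILES_ENABLED=1

# First run: hosts that fail or are unreachable end up in the retry file.
ansible-playbook "$PLAYBOOK"

# Second run: the @ prefix makes --limit read the host list from a file,
# so only the previously failed hosts are targeted.
ansible-playbook "$PLAYBOOK" --limit "@$RETRY_FILE"
```

Note that the retry file is only a list of host names, one per line. It does not record why each host failed, so grouping by cause would have to come from somewhere else.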

I was thinking of a more “inline” kind of action, one that runs while I still have all the facts/vars loaded for the entire fleet (in case I need them). Kind of like “branch off to fix things that I know how to fix…” But I guess this moves further and further into “use Ansible via the API” territory?