I’ve got an interesting case: my playbook is failing on certain hosts, either because of SSH connection issues or because some required software is not installed.
I realize that I should really just write a separate playbook that brings all my machines into “compliance” with what the main playbook expects, and then re-run it. However, I am curious whether there is a way to use “group_by” or something similar to pick up all the “failed” hosts, group them by cause, and then run some actions against them: for example, adding a record to a DB noting the failure, or attempting to resolve the underlying problem.
I was thinking of a more “inline” kind of action, performed while I still have all the facts/vars loaded for the entire fleet (in case I need them). Kind of like “branch off to fix the things that I know how to fix…” But I guess this heads more and more into “use Ansible via the API” territory?
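
To make the idea concrete, here is a rough sketch of the kind of flow I am picturing, not something I claim is the right way to do it. It assumes that ignore_unreachable keeps an unreachable host in the play and that group_by can then sort hosts into dynamic groups. The rsync requirement, the failure_cause fact, the cause_* group names, and the /tmp/failed_hosts.log path are all placeholders I made up for illustration; the log file stands in for the DB record.

```yaml
- name: Classify hosts by failure cause
  hosts: all
  gather_facts: false
  tasks:
    - name: Try to reach the host without dropping it from the play
      ansible.builtin.ping:
      register: probe
      ignore_unreachable: true

    - name: Check for a required binary (rsync is a placeholder)
      ansible.builtin.command: which rsync
      register: tool_check
      changed_when: false
      failed_when: false
      when: not (probe.unreachable | default(false))

    - name: Derive a cause label for each host (failure_cause is a made-up name)
      ansible.builtin.set_fact:
        failure_cause: >-
          {{ 'unreachable' if (probe.unreachable | default(false))
             else ('missing_tool' if (tool_check.rc | default(0)) != 0
                   else 'none') }}

    - name: Put hosts into dynamic groups such as cause_unreachable
      ansible.builtin.group_by:
        key: "cause_{{ failure_cause }}"

- name: Fix what I know how to fix
  hosts: cause_missing_tool
  become: true
  tasks:
    - name: Install the missing package
      ansible.builtin.package:
        name: rsync
        state: present

- name: Record hosts that could not be reached
  hosts: cause_unreachable
  gather_facts: false
  tasks:
    - name: Log the failure on the controller (stand-in for a DB insert)
      ansible.builtin.lineinfile:
        path: /tmp/failed_hosts.log
        line: "{{ inventory_hostname }} unreachable"
        create: true
      delegate_to: localhost
      throttle: 1  # write one line at a time to avoid concurrent edits
```

Because all three plays live in one playbook run, the registered results and facts from the first play are still available in the later ones, which is the “inline” behaviour I am after. This only covers the two causes I listed; hosts that fail for any other reason would still need separate handling.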