I understand that the title may not be clear so let me explain.
Say I have a workflow with 3 nodes.
First Node - conducts some basic health checks on a server
Second Node - Installs a piece of software to the host if the First Node was successful
Third Node - Runs some tests on the host and reports success or failure.
The idea being I am breaking my playbooks into logical reusable nodes.
If I have one server, this makes sense and works fine.
However, say I have 5 hosts in my inventory and the dependencies install correctly on 4 of them. I want the workflow to continue for those 4 hosts.
We might not have captured your intent exactly. Someone talked about re-running a single node until it succeeds (I think that might make more sense to define in the playbook).
The use case that you’re describing is a coherent feature request. There’s nothing different design-wise from the retry-on-failed feature except that the new limit is passed to a different job template. One challenge is that a lot of “guard rails” would need to be in place, because this assumes that all the nodes are operating on the same inventory, and all leaves in that entire branch would have to be constrained to that inventory as well.
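To make the idea concrete, here is a rough sketch of how the limit for the next node could be derived from the hosts that succeeded in the previous one, with the same-inventory guard rail included. This is not how the workflow engine is implemented; `NodeResult` and `next_limit` are hypothetical names made up for illustration, and the only thing taken from Ansible itself is that a limit can be a comma-separated list of hosts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NodeResult:
    """Hypothetical outcome of one workflow node (not a real API object)."""
    inventory: str            # name of the inventory the node ran against
    host_ok: Dict[str, bool]  # host name -> True if the node succeeded there


def next_limit(result: NodeResult, next_inventory: str) -> str:
    """Build the host limit for the next node from the hosts that succeeded."""
    # Guard rail: a carried-over limit only makes sense on the same inventory.
    if next_inventory != result.inventory:
        raise ValueError("next node must use the same inventory")

    succeeded = sorted(host for host, ok in result.host_ok.items() if ok)
    if not succeeded:
        raise ValueError("no hosts succeeded; nothing to continue with")

    # Ansible limit patterns accept a comma-separated list of hosts.
    return ",".join(succeeded)
```

With 5 hosts where one install failed, the returned string names only the 4 successful hosts, and that string is what would be handed to the second or third node's job template as its limit.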