In our environment, we try to tolerate unreachable systems in our inventory; at our scale, hosts constantly go MIA. In our deployments, what we really care about is errors on the hosts we CAN reach. To that end, we put in a PR to make ansible(-playbook) exit with different return codes for hosts that had errors versus hosts that were merely unreachable.
Things were great until today, when another engineer put in a change that uses the raw module (starting from the assumption that the remote system does NOT have python, so raw was used to install it). The change worked fine in a smaller environment where the entire inventory was reachable, but when we hit a larger set with some unreachable hosts, those hosts came back as failed=1 rather than unreachable=1. This made ansible exit with an error, which stopped further jobs in the build flow.
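To make the distinction concrete, here is a rough sketch of the decision our build flow effectively depends on, written as a small script that reads the per-host counters from a PLAY RECAP. This is not how ansible computes its exit code internally; the function names (parse_recap, classify) and the sample recap text are made up for illustration.

```python
import re

# Matches recap lines such as:
#   web1 : ok=2 changed=1 unreachable=0 failed=0
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(text):
    """Return {host: {counter_name: int}} for every recap-style line in text."""
    results = {}
    for line in text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if not match:
            continue  # skip banners and anything that is not a per-host line
        pairs = (item.split("=") for item in match.group("counters").split())
        results[match.group("host")] = {name: int(value) for name, value in pairs}
    return results


def classify(results):
    """Real failures take priority; unreachable-only runs are reported separately."""
    if any(c.get("failed", 0) > 0 for c in results.values()):
        return "failed"
    if any(c.get("unreachable", 0) > 0 for c in results.values()):
        return "unreachable"
    return "ok"


# Hypothetical recap where the down host is correctly counted as unreachable.
expected = """
web1 : ok=2 changed=1 unreachable=0 failed=0
web2 : ok=0 changed=0 unreachable=1 failed=0
"""

# Hypothetical recap of the behavior we saw with raw: the same down host
# is counted as failed instead.
with_raw = """
web1 : ok=2 changed=1 unreachable=0 failed=0
web2 : ok=0 changed=0 unreachable=0 failed=1
"""

print(classify(parse_recap(expected)))
print(classify(parse_recap(with_raw)))
```

With the first recap, a wrapper like this can let the build flow continue; with the second, it has no way to tell the down host apart from a real error, which is the situation we hit.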
Short term, we've backed out the use of raw. Long term, I wanted to bring this up here to see whether there is value in having raw distinguish between unreachable hosts and genuine failures, or whether this is a known issue with no feasible solution.
Thoughts?
-jlk