My scenario is that I have hundreds of locations on various networks that I want to run some plays against. In almost every run, 2-3% of these sites cannot be reached due to network-related issues. I don’t want to create exceptions/errors in AWX for sites that just happen to be offline while we are running the play; they’ll get picked up again next time.
The behaviour I would like is that when a host is deemed unreachable:
- any future tasks in the play against that host will not be run (no point, as they will all fail because the host is unreachable);
- the playbook will continue to run all tasks against the other hosts that are reachable;
- the playbook run will report success (return code = 0).
I was thinking of using the ignore_unreachable flag, which seems to do most of what I want. However, based on the documentation and the comments on an issue I just created, it seems that the ignore_unreachable flag is not aligned with my goals; quite the opposite, in fact.
Is there another flag/option that might give me something closer to what I am looking for?
What you are asking for would be the 'default' way Ansible operates: it
removes 'unreachable' hosts from the rest of the play and then
continues with the rest of the hosts.
- any future tasks in the play against that host will not be run (no point, as they will all fail because the host is unreachable).
this is the default
- the playbook will continue to run all tasks against the other hosts that are reachable
also the default
- the playbook run will report success (return code = 0)
this is not the default, but you can add `meta: clear_host_errors` as
your last task, though this might be too big of a
hammer.
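To make the default behaviour above concrete, here is a toy Python model of it. It is not Ansible's strategy code; the function name `run_play` and passing the set of unreachable hosts in up front are hypothetical simplifications. A host that fails to connect is dropped from every later task, the other hosts keep going, and the run still ends unsuccessfully, which is the third point.

```python
def run_play(hosts, tasks, unreachable):
    """Toy model of Ansible's default handling of unreachable hosts.

    hosts: host names in the play; tasks: task names, run in order;
    unreachable: hosts whose connection attempts fail (assumed known up front).
    Returns (log, rc), where log is a list of (task, host, status) tuples.
    """
    active = list(hosts)   # hosts still taking part in the play
    dropped = set()        # hosts removed after going unreachable
    log = []
    for task in tasks:
        for host in list(active):  # iterate over a copy so we can remove hosts
            if host in unreachable:
                log.append((task, host, "UNREACHABLE"))
                active.remove(host)  # no later task runs against this host
                dropped.add(host)
            else:
                log.append((task, host, "ok"))
    # Any unreachable host makes the run unsuccessful by default.
    # (1 is illustrative; ansible-playbook uses its own non-zero codes.)
    rc = 1 if dropped else 0
    return log, rc
```

For example, `run_play(["host_online", "host_unreachable"], ["Gathering Facts", "success1"], {"host_unreachable"})` runs `success1` only on `host_online` and returns a code of 1.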
There are some cases in which the above is not true. For example,
when using serial, if all hosts in a 'serial batch' fail (unreachable
counts as a failure), the whole play fails. You also have max_fail_percentage
to control how many failures you tolerate.
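The serial rule and the max_fail_percentage threshold can be written out as a small check. This is a sketch, not Ansible's implementation: the helper names are hypothetical, and the strict "exceeded, not equalled" comparison follows the Ansible documentation for max_fail_percentage. Counting unreachable hosts toward the percentage as well is an assumption here; Brian only states that they count for the whole-batch case.

```python
def serial_batches(hosts, serial):
    """Split the play's hosts into consecutive batches of at most `serial` hosts."""
    return [hosts[i:i + serial] for i in range(0, len(hosts), serial)]


def batch_aborts_play(batch_size, failed, max_fail_percentage=None):
    """Return True if `failed` bad hosts in one batch would stop the whole play.

    failed: hosts in the batch that failed or were unreachable (counting
    unreachable hosts toward max_fail_percentage is an assumption here).
    """
    if batch_size > 0 and failed == batch_size:
        return True  # every host in the batch failed, so the play fails
    if max_fail_percentage is not None:
        # The percentage must be exceeded, not merely equalled; integer
        # arithmetic avoids float rounding at the boundary.
        return failed * 100 > max_fail_percentage * batch_size
    return False
```

So with `serial: 4`, `max_fail_percentage: 49` stops the play once 2 of the 4 hosts in a batch fail, while 50 would let it continue.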
Thanks for the reply, Brian. Yes, I think clear_host_errors will be too big of a hammer; I just want to ignore unreachable hosts.
I’ll have to figure something else out then; I have a couple of other scenarios in mind anyway. I’m also going to look at
something like https://github.com/openstack/ara for better logging and reporting on playbook runs, so I can more easily find the failed hosts and rerun them.
Ultimately, I’m just looking for a clean way to monitor and re-run failed executions, and to easily distinguish connection failures from actual failures, where you know that running it again once the connection is up will work. I don’t find that AWX gives me a good enough view into this, so I’m looking for a better overall strategy.
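One way to separate connection failures from real task failures is to read the per-host recap counters rather than only the exit code. The sketch below assumes the playbook was run with the `json` stdout callback (for example `ANSIBLE_STDOUT_CALLBACK=json ansible-playbook ...`), whose output has a top-level `stats` object mapping each host to counters such as `failures` and `unreachable`. The helper names `classify_hosts`, `classify_output` and `retry_limit` are hypothetical.

```python
import json


def classify_hosts(stats):
    """Split hosts into (failed, unreachable) lists using json-callback recap counters."""
    failed, unreachable = [], []
    for host, counters in sorted(stats.items()):
        if counters.get("failures", 0) > 0:
            failed.append(host)        # a task really failed: needs investigation
        elif counters.get("unreachable", 0) > 0:
            unreachable.append(host)   # connection problem only: safe to rerun later
    return failed, unreachable


def classify_output(json_text):
    """Parse the whole json-callback output and classify the hosts in its `stats`."""
    return classify_hosts(json.loads(json_text)["stats"])


def retry_limit(hosts):
    """Join host names into a comma-separated value for `ansible-playbook --limit`."""
    return ",".join(hosts)
```

The unreachable list could then be rerun on its own, with `--limit` set to the string from `retry_limit`, once the sites are back online.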
Hi Brian, I was wondering if there is anything else you can suggest. I want to report a successful Ansible run when the only failures were unreachable hosts, so that it returns success to AWX.
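A wrapper around ansible-playbook is one possible local workaround for that. The sketch below is an assumption-laden illustration, not an Ansible feature: it forces the `json` stdout callback through the `ANSIBLE_STDOUT_CALLBACK` environment variable, assumes stdout then holds a single JSON document with a `stats` object, and turns a non-zero exit into 0 when no host recorded a task failure. `run_playbook` and `only_unreachable_failed` are hypothetical names.

```python
import json
import os
import subprocess


def only_unreachable_failed(stats):
    """True when no host in the json-callback `stats` has a real task failure."""
    return all(counters.get("failures", 0) == 0 for counters in stats.values())


def run_playbook(args):
    """Run ansible-playbook and return 0 if the only problems were unreachable hosts.

    args: everything after `ansible-playbook`, e.g. ["-i", "inventory/", "site.yml"].
    Assumes the json stdout callback writes a single JSON document to stdout.
    """
    env = dict(os.environ, ANSIBLE_STDOUT_CALLBACK="json")
    proc = subprocess.run(["ansible-playbook", *args], env=env,
                          stdout=subprocess.PIPE, text=True)
    try:
        stats = json.loads(proc.stdout)["stats"]
    except (ValueError, KeyError, TypeError):
        # No usable recap (e.g. a syntax error): keep Ansible's own exit code.
        return proc.returncode
    if proc.returncode != 0 and only_unreachable_failed(stats):
        return 0  # no host recorded a task failure, so treat the run as a success
    return proc.returncode
```

A cron job or CI step could call `sys.exit(run_playbook(sys.argv[1:]))`. This only helps where you control how ansible-playbook is launched, since AWX runs it itself, and it also reports success when every host was unreachable, which may or may not be what you want.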
Is there a callback plugin or some local customization that I can write in the meantime and then contribute back to Ansible core?
Or perhaps some sort of preprocessor that runs through the inventory and removes unreachable/down hosts?
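A pre-flight filter along these lines could implement the preprocessor idea outside Ansible: probe each host's SSH port in parallel and pass only the responsive hosts to `--limit`. This is a sketch under assumptions: the `{name: (address, port)}` mapping stands in for however you would pull addresses out of your inventory, the function names are hypothetical, and a successful TCP connect does not guarantee that SSH login will work.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def port_open(address, port=22, timeout=3.0):
    """Return True if a TCP connection to address:port succeeds within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, no route, ...
        return False


def reachable_hosts(targets, timeout=3.0, workers=32):
    """Probe {name: (address, port)} in parallel and return the names that answered.

    Only checks that the port accepts connections; SSH itself can still fail later.
    """
    names = sorted(targets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda n: port_open(*targets[n], timeout=timeout), names)
        return [name for name, ok in zip(names, results) if ok]
```

For example, `",".join(reachable_hosts(targets))` could be passed as the `--limit` value. Hosts can still drop between the probe and the play, so this narrows the window rather than closing it.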
I am interested in two things:
- Any type of quick workaround. I tried your suggestion regarding `meta: clear_host_errors`, but it didn’t work; I posted the output below. Even if it did work, I would be concerned that it would clear errors other than unreachable hosts.
- The correct long-term solution (an enhancement to Ansible) to this problem. I think this would be useful; if you agree and can provide some guidance on a solution you would support, I can work on a PR to add it to core, provided it’s not too complex.
```
$ ansible-playbook -i inventories.unreachable/ unreachable2.yml ; echo "return code from run is: $?"
PLAY [unreachable test] ********************************
TASK [Gathering Facts] *******************************
ok: [host_online]
fatal: [host_unreachable]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 10.20.3.21 port 22: Connection timed out\r\n", "unreachable": true}
TASK [success1] **************************
fatal: [host_unreachable]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 10.20.3.21 port 22: Connection timed out\r\n", "unreachable": true}
to retry, use: --limit @/homenfs/aedwards/unreachable2.retry
```
I’m thinking about going down this road. If anyone has better ideas, please share. It seems as though it will work fine; I’m just not sure if there is a better, simpler or more appropriate solution.