Dealing with inconsistencies after failed runs

Hello all,

I have occasionally run into situations where an incomplete run
leaves a system in an inconsistent state. This has mostly happened in
cases where handlers are in use.

As an example, let's say you have a task populating /etc/aliases for
Postfix. In order for Postfix to be able to use the data, /etc/aliases
has to be compiled using the newaliases command. This is a natural fit
for a single Ansible task + handler (this is a very simple case, I've
had some more complex cases too).
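
For concreteness, the task + handler pair described above might look roughly like the sketch below. The host group, the source file name and the handler name are placeholders made up for illustration, not taken from a real setup.

```yaml
# Minimal sketch; "mailservers", "aliases" and the handler name are hypothetical.
- hosts: mailservers
  tasks:
    - name: Populate /etc/aliases
      copy:
        src: aliases
        dest: /etc/aliases
        owner: root
        group: root
        mode: "0644"
      # The handler is only queued here; by default it runs at the end of the play.
      notify: Rebuild aliases database

  handlers:
    - name: Rebuild aliases database
      # Compiles /etc/aliases into /etc/aliases.db
      command: newaliases
```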

The problem stems from the fact that an Ansible playbook run could fail
before the handler fires, leaving the /etc/aliases file updated
but not /etc/aliases.db (created via the newaliases command).

I am aware of the "force_handlers = True" option, however that won't
help in cases where failure is, say, due to SSH connection dropping
etc.
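
For reference, the same behaviour can also be requested per play with the force_handlers keyword instead of the ansible.cfg setting. A minimal sketch, with a hypothetical host group, file name and handler name; as noted, it only covers task failures, not a host that has become unreachable.

```yaml
# Sketch only; "mailservers", "aliases" and the handler name are hypothetical.
- hosts: mailservers
  # Run already-notified handlers even if a later task in this play fails.
  force_handlers: true
  tasks:
    - name: Populate /etc/aliases
      copy:
        src: aliases
        dest: /etc/aliases
      notify: Rebuild aliases database

  handlers:
    - name: Rebuild aliases database
      command: newaliases
```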

From what I can tell, this scenario possibly happens exclusively when
using handlers, since those are rather stateless (I'm sure you could
also trigger such behaviour with poorly designed command/shell
combination of tasks).

How do folks usually handle such situations?

Best regards

I think you want special handlers that run directly after the task, not at the end of all tasks. But I don’t know of such handlers in Ansible.
You can use “register: var1” and “when: var1.changed” in the next task.
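
A minimal sketch of that register/when approach, assuming the same aliases example; the source file name is a placeholder and var1 is just the variable name from the suggestion above.

```yaml
# Sketch only; the source file "aliases" is hypothetical.
- name: Populate /etc/aliases
  copy:
    src: aliases
    dest: /etc/aliases
  # Store the task result so the next task can inspect it.
  register: var1

- name: Rebuild aliases database
  command: newaliases
  # Runs right after the copy, but only when the copy changed the file.
  when: var1.changed
```

The rebuild now happens immediately after the copy instead of at the end of the play, but a run that dies between the two tasks would still leave the .db file stale, and the next run would skip the rebuild because the copy no longer reports a change.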

Would a

  - meta: flush_handlers

task, immediately after the copy task (or whatever populates /etc/aliases,
and does the notify), do what you want in this case? It still wouldn't
help if the SSH connection dies immediately after the copy task, but
there's not a whole lot you *can* do in that case. If it's important
enough, maybe come up with a single shell task that populates the file and
runs the handler, so the whole thing will happen even if SSH dies after
the task, but that seems ugly unless it's essential that a particular
task+handler combo be as close to atomic as possible.
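
To make the two options concrete, here is a rough sketch of both, shown as alternatives (you would use one or the other). The source file name, the handler name, the staging path /etc/aliases.new and the registered variable are all made up for illustration, and option 2 assumes cmp is available on the target.

```yaml
# Option 1: flush handlers right after the notifying task.
# Assumes a handler named "Rebuild aliases database" (hypothetical) exists.
- name: Populate /etc/aliases
  copy:
    src: aliases
    dest: /etc/aliases
  notify: Rebuild aliases database

- meta: flush_handlers

# Option 2 (alternative to option 1): stage the file, then install it and
# rebuild the database in a single remote command.
- name: Stage the new aliases file next to the live one
  copy:
    src: aliases
    dest: /etc/aliases.new    # hypothetical staging path

- name: Install staged file and rebuild the database in one step
  shell: |
    set -e
    # Only act when the staged file differs from the live one.
    if ! cmp -s /etc/aliases.new /etc/aliases; then
      cp /etc/aliases.new /etc/aliases
      newaliases
      echo updated
    fi
  register: aliases_install
  # Report a change only when the script actually replaced the file.
  changed_when: "'updated' in aliases_install.stdout"
```

Option 2 keeps the copy and the rebuild inside one task, so there is no gap between tasks in which the run can stop, at the cost of reimplementing in shell what the handler did.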

                                      -Josh (jbs@care.com)

Hm... Well, that would make the window of opportunity smaller, so it is
possibly one way to go about it. I wonder how people out there handle
this when using a lot of VMs (my use-case will be under 10).

Another thing that popped into my mind just now is to have a set of
tagged tasks within a role that duplicate any handler/task that may need
to be re-run to fix possible inconsistencies, and that would not execute
unless the tag is provided. That way, in case I detect a failed run, I
could rerun those tasks to fix things up. Any opinions on this? I just
wonder if I could abuse tags in this way, though (i.e. having something
like a "when: tag defined" syntax).
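
One possible shape for this, as a sketch only: it assumes Ansible 2.5 or later, where the special "never" tag keeps a task from running unless one of its other tags is requested explicitly. The "repair" tag name is made up for illustration.

```yaml
# Sketch only; the "repair" tag is a hypothetical name.
- name: Rebuild aliases database (repair after a failed run)
  command: newaliases
  tags:
    - never     # skipped on normal runs
    - repair    # runs only when this tag is requested explicitly
```

A normal run would skip the task, while something like "ansible-playbook site.yml --tags repair" (site.yml being a placeholder playbook name) would run only the repair tasks.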

Best regards