Ansible best practices: idempotence

I am new to Ansible, and am trying to understand its best practices. My team uses Ansible for system setup and product deployment.

I am reading that idempotence is the foundational virtue, where playbook idempotence is defined as not making any state changes on consecutive runs.
I am not sure I understand why.

The scenarios I am looking at:

  • Initial deployment of a system: before the playbook starts, nothing is installed or running. After the playbook completes, everything is installed and running.
  • Update of a system: before the playbook starts, everything is installed and running. After the playbook completes, everything is installed and running. Some changes may have been applied.

A simple way to implement both scenarios in a single playbook:

  • State of the system

    • not running
    • uninstalled
  • State of the system

    • installed
    • running

The above implementation would require state changes on consecutive invocations.

For initial deployment, the first part changes nothing.

For a system update, both parts run.
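
The two-part approach above could be sketched as the following playbook. This is an illustration only: the name "myapp" is hypothetical, and using the generic package and service modules is an assumption about how the system is installed and run.

```yaml
# Sketch only: "myapp" is a hypothetical package and service name.
- name: Tear down, then rebuild (changes state on every update run)
  hosts: all
  become: true
  tasks:
    - name: Part 1 - make sure the service is not running
      ansible.builtin.service:
        name: myapp
        state: stopped
      # On a fresh host the service does not exist yet, so tolerate that.
      failed_when: false

    - name: Part 1 - make sure the package is uninstalled
      ansible.builtin.package:
        name: myapp
        state: absent

    - name: Part 2 - install the current version
      ansible.builtin.package:
        name: myapp
        state: present

    - name: Part 2 - start the service
      ansible.builtin.service:
        name: myapp
        state: started
```

On a host that is already deployed, every run of this sketch reports changes, because the package is removed and reinstalled and the service is stopped and started again.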

To implement these scenarios in a playbook that makes no state changes on consecutive runs, the playbook would need to be aware of the internal dependencies between things that can be in the states {installed, uninstalled} and things that can be in the states {running, not running}:

  • Install A
    • Notify Runnable-depends-on-A
  • Install B
    • Notify Runnable-depends-on-B
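
The notify-based layout above might look roughly like this in a playbook. The package and service names are hypothetical; only the notify/handlers mechanism itself is standard Ansible.

```yaml
# Sketch only: package-a, package-b, service-x and service-y are hypothetical.
- name: Install components and restart only what depends on them
  hosts: all
  become: true
  tasks:
    - name: Install A
      ansible.builtin.package:
        name: package-a
        state: latest
      notify: Restart runnable that depends on A

    - name: Install B
      ansible.builtin.package:
        name: package-b
        state: latest
      notify: Restart runnable that depends on B

  handlers:
    # A handler runs only if a task that notifies it reported "changed".
    - name: Restart runnable that depends on A
      ansible.builtin.service:
        name: service-x
        state: restarted

    - name: Restart runnable that depends on B
      ansible.builtin.service:
        name: service-y
        state: restarted
```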

This approach makes the Ansible playbook tightly coupled to the system's dependencies. Should these dependencies change, the playbook becomes yet another place that needs to be updated. Depending on the system implementation specifics, having dependencies specified incorrectly in the Ansible playbook may cause a hard-to-reproduce bug.

Appreciate any thoughts on why avoiding state changes on consecutive playbook runs is preferred over simplicity and robustness in implementation.

> I am new to Ansible, and am trying to understand its best practices. My
> team uses Ansible for system setup and product deployment.
>
> I am reading that idempotence is the foundational virtue, where playbook
> idempotence is defined as not doing any state changes on consecutive runs.
> I am not sure I understand why.

For me, the reason is simple - you want to be able to run ansible multiple
times on a given machine and be confident that the state it is left in
afterwards is the same every time.

Suppose you have a machine which does not have a web server installed on it,
and you have an ansible playbook which installs and configures the web server.

You want to be able to run that playbook once, and end up with a working web
server.

You want to be able to run that same playbook again (with no changes) and have
the same working web server (with no changes) on the target machine.

This is partly because playbooks are often nested, so that you might have one
for a web server and another for a mail server; you modify the one for the
mail server but then run both of them; you want the modification you made to
the mail server playbook to take effect but you want the web server on the
target machine to stay just the way it was after the first run of the playbook.
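
A minimal sketch of a web server playbook with this property is shown below. It assumes a host where both the package and the service are called "nginx" and an inventory group named "webservers"; both names are illustrative.

```yaml
# Sketch only: "nginx" and the "webservers" group are assumed names.
- name: Web server
  hosts: webservers
  become: true
  tasks:
    - name: Web server package is installed
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Web server is running and enabled at boot
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```

Both tasks describe a desired state rather than an action, so the first run installs and starts the server and a second run with no changes should report nothing as changed.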

> Appreciate any thoughts on why avoiding state changes on consecutive
> playbook runs is preferred over simplicity and robustness in
> implementation.

Is that a quote from somewhere? Is there anywhere in either the ansible
documentation or any tutorials / online guidelines you have read which says
that "avoiding state changes on consecutive playbook runs is preferred over
simplicity and robustness in implementation"?

Idempotence (avoiding state changes on consecutive runs) is essential.

Simplicity, and robustness in implementation, are highly desirable.

Is there a conflict between the two?

Antony.

Antony,
thank you for your response.

Would you mind addressing the scenarios and implementation options of
installing/updating the system?
I think there is a conflict between the virtue of avoiding state
changes on consecutive runs, and simplicity in implementation.

Thank you for the example of targeting multiple hosts with a single
playbook, with different results for each host.
A playbook that doesn't do state changes scales better when running
against multiple hosts.

> Would you mind addressing the scenarios and implementation options of
> installing/updating the system?

If you install, you want ansible to make changes (obviously, I think).

If you install a second time (with the same playbook) you want no further
changes to be made, because what needed to be installed has already been
installed.

If you update, you want the system to be updated (again, pretty obvious).

If you update a second time (with the same playbook) you want no further
changes to be made, because the system has already been updated, nothing
further needs updating; the system should be left as it is.

> I think there is a conflict between the virtue of avoiding state
> changes on consecutive runs, and simplicity in implementation.

What do you think that conflict is?

What would you like to see, or what example can you give, as a "simple
implementation" which does not leave the target system in the same state after
every run?

Antony.

If I may chime in.

Ansible is a tool for configuration management, which has already been aptly explained, but I will add some detail. Its second use case is orchestration.

If you program, say with a script, you tell it what to do. You have to control the logic from start to finish, step by step and in order.

When you use Ansible you don’t have to program. You just say "I want this thing to look like this" and it will do it. But Ansible can also put things in order, in other words orchestration. This is powerful because it lets people express what could otherwise be complex code as something concise.

> If you install a second time (with the same playbook) you want no further
> changes to be made, because what needed to be installed has already been
> installed.
>
> If you update, you want the system to be updated (again, pretty obvious).

It is not trivial to distinguish between 'install a second time' and 'update'.
When the playbook starts, it is not known whether there are any
changes, and which specific changes have been made.

> If you update a second time (with the same playbook) you want no further
> changes to be made, because the system has already been updated, nothing
> further needs updating; the system should be left as it is.

Installing for the first time and updating-in-place may not be identical.
Consider a system that involves a web server. On initial install the
server is started at the end, after all the apps are installed.
On subsequent update the server is already running. Should the web
server be reloaded? If apps are updated, then yes, if not, then not
necessarily.

Encoding the dependency from each app update to a web server reload
expresses the complexity of the system in Ansible.
Not encoding the dependency for each update, and reloading the web
server in all cases, breaks the 'no state changes if nothing got
updated' requirement at least some of the time.

> What would you like to see, or what example can you give, as a "simple
> implementation" which does not leave the target system in the same state after
> every run?

I think leaving the target system in the same state after every run is
essential.
The question is about whether state changes within the run are allowed.

Right, it is becoming clearer where you are going with this. Ansible has a few modes for a run. If you want to see what would be applied without changing anything, you can use check mode. When you want to apply the changes, at a time that suits you, use a normal run (which is the default).

>> If you install a second time (with the same playbook) you want no further
>> changes to be made, because what needed to be installed has already been
>> installed.
>>
>> If you update, you want the system to be updated (again, pretty obvious).
>
> It is not trivial to distinguish between 'install a second time' and
> 'update'. When the playbook starts, it is not known whether there are any
> changes, and which specific changes have been made.

No, but it is obvious what the desired outcome is.

If the machine is already in that state, you do not want anything to change.

If the machine is not in that state, you want ansible to bring it into that
state.

>> If you update a second time (with the same playbook) you want no further
>> changes to be made, because the system has already been updated, nothing
>> further needs updating; the system should be left as it is.
>
> Installing for the first time and updating-in-place may not be identical.
> Consider a system that involves a web server. On initial install the
> server is started at the end, after all the apps are installed.
> On subsequent update the server is already running. Should the web
> server be reloaded? If apps are updated, then yes, if not, then not
> necessarily.

Reloading a web server is (in my opinion) not a state change.

Reconfiguring it would be.

> Encoding the dependency from each app update to a web server reload
> expresses the complexity of the system in Ansible.
> Not encoding the dependency for each update, and reloading the web
> server in all cases, breaks the 'no state changes if nothing got
> updated' requirement at least some of the time.

What's your definition of a state change?

>> What would you like to see, or what example can you give, as a "simple
>> implementation" which does not leave the target system in the same state
>> after every run?
>
> I think leaving the target system in the same state after every run is
> essential.

Agreed.

> The question is about whether state changes within the run are allowed.

Let's talk about what a "state change" is then :)

...although not with me further for now - it's now midnight in my personal
timezone...

Antony.

Wei-Yen,

yes, this is exactly what I am struggling with.

Defining a state is not simple.
At the base level, Ansible provides the service module, which recognizes
the started, stopped, restarted, and reloaded states.
When a service is deployed as part of a larger system, the state space
is different:
- started at version X
- started at version Y
- stopped
- restarted at version X
- restarted at version Y
- etc

The versions are managed by other tasks in the playbook that are
performed in a specified sequence.
Depending on whether, and at what point in the playbook execution, this
task runs, the service may end up at either version X or version Y.

service:
   name: myservice   # placeholder service name
   state: restarted  # restarts the service on every run

Driving X to Y change is powerful, but seems far from concise.

Why do you have started twice? I don’t think I am following properly. Can you please be more specific?

In my playbooks I have never made it that complicated.

If I change a state, I create a handler to restart the service. (A handler is an action that runs after a task reports a change.)

Desired state is defined as { system is installed, services are running }

On initial deployment, the initial state is that nothing is installed
and nothing is loaded.
So the playbook installs the system and starts the services. All good.

On the second run, the initial state is that some version of the system is
installed, and services are running.
So the playbook checks for changes and installs updates if needed. Should
the playbook restart the services? They are already running, but possibly
with old versions.

That’s what handlers are for. If the app has not changed, it will not be reloaded.

I can see where the confusion is.

The link I am referring to is here: "Handlers: running operations on change" (Ansible Documentation)

But for the TL;DR version I have taken snippets from it.
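
A minimal sketch of the handler pattern follows. It is written for this thread's scenario rather than copied from the documentation, and the file names, paths and the "myapp" service are hypothetical.

```yaml
# Sketch only: file names, paths and the "myapp" service are hypothetical.
- name: Deploy myapp
  hosts: all
  become: true
  tasks:
    - name: Configuration file is up to date
      ansible.builtin.template:
        src: myapp.conf.j2
        dest: /etc/myapp/myapp.conf
      notify: Restart myapp

    - name: Application artifact is up to date
      ansible.builtin.copy:
        src: myapp.jar
        dest: /opt/myapp/myapp.jar
      notify: Restart myapp

    - name: Service is running
      ansible.builtin.service:
        name: myapp
        state: started

  handlers:
    # Runs once at the end of the play, and only if at least one
    # notifying task reported "changed".
    - name: Restart myapp
      ansible.builtin.service:
        name: myapp
        state: restarted
```

If neither file changes, the handler is never triggered and the run reports no changes. If both change, the service is still restarted only once.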

Right.
The app consists of multiple artifacts, that are updated independently.
There are also several services that depend on multiple artifacts.

Consider:
service A and service B depend on artifact 1
service A depends on artifact 2
service C depends on artifact 1 and artifact 3

By restarting each service from a handler that is notified by a
per-artifact update task, this dependency structure is re-implemented
in Ansible.
That adds complexity and fragility.

The application should be able to change its dependency structure, for
example so that service C now depends on artifacts 1, 2 and 3, without
the Ansible playbook having to change. Or at least the playbook should
fail in an obvious way.

Instead, with handler implementation, service C will not be restarted
when artifact 2 is updated, and will continue running its older
version.
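
For reference, the dependency structure described above could be written with handler "listen" topics, as in the sketch below. This is one possible layout and not something proposed in the thread; artifact file names, paths and service names are hypothetical. It does not remove the coupling; it only gathers the dependency map in the handlers section.

```yaml
# Sketch only: artifact files, paths and service names are hypothetical.
- name: Update artifacts and restart dependent services
  hosts: all
  become: true
  tasks:
    - name: Artifact 1 is up to date
      ansible.builtin.copy:
        src: artifact1.tar.gz
        dest: /opt/app/artifact1.tar.gz
      notify: artifact1 changed

    - name: Artifact 2 is up to date
      ansible.builtin.copy:
        src: artifact2.tar.gz
        dest: /opt/app/artifact2.tar.gz
      notify: artifact2 changed

    - name: Artifact 3 is up to date
      ansible.builtin.copy:
        src: artifact3.tar.gz
        dest: /opt/app/artifact3.tar.gz
      notify: artifact3 changed

  handlers:
    # Each service lists the artifact topics it depends on.
    - name: Restart service A
      ansible.builtin.service:
        name: service-a
        state: restarted
      listen:
        - artifact1 changed
        - artifact2 changed

    - name: Restart service B
      ansible.builtin.service:
        name: service-b
        state: restarted
      listen: artifact1 changed

    - name: Restart service C
      ansible.builtin.service:
        name: service-c
        state: restarted
      listen:
        - artifact1 changed
        - artifact3 changed
```

In this layout, making service C depend on artifact 2 as well means adding one line to its listen list. If that line is forgotten, service C keeps running the older version, which is exactly the failure mode described above.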

You can create data-driven control flow. As a hint, see the role
https://github.com/vbotka/ansible-config-light

You can dynamically create handlers, e.g.
https://ansible-config-light.readthedocs.io/en/latest/guide-variables-handlers.html#handlers

and configure what handlers shall be notified, e.g.
https://ansible-config-light.readthedocs.io/en/latest/guide-variables-files-lineinfile.html#lineinfile

The role will collect the data and create the handlers. See
https://ansible-config-light.readthedocs.io/en/latest/guide.html#setup

So just for clarity, @Vladimir: it looks like your role dynamically generates handlers based on how the configuration is set?

Right. And the configuration of the files comprises the lists of the
handlers that shall be notified. See the structures of copy,
template, ... , ini_file.
https://ansible-config-light.readthedocs.io/en/latest/guide-variables-files.html#files