Implement a rollback feature when creating Cloud resources

jdlmrh · July 24, 2025, 12:05pm

I use Ansible to create and configure many cloud resources (VMs, VPCs, security groups, volumes…). Unlike Terraform, Ansible does not offer an easy way to delete the resources it has created. So I started writing a callback plugin that generates a playbook able to destroy the resources created by another playbook. The plugin is available on GitHub and on Galaxy (see links below). But I’m wondering: wouldn’t it be possible to include this “rollback” capability directly in the modules of collections like amazon.aws or google.cloud?

GitHub link : https://github.com/majeinfo/ansible_rollback_plugin
Galaxy link : https://galaxy.ansible.com/ui/repo/published/majeinfo/resource_cleaner/
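
The core idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of the plugin linked above: the `recorded` list, its keys (`name`, `module`, `args`) and the function `build_rollback_tasks` are hypothetical, and the module arguments are placeholders. It reverses the creation order and forces `state: absent` on every task.

```python
"""Sketch: turn a log of resource-creating tasks into a cleanup play."""
import json


def build_rollback_tasks(recorded_tasks):
    """Return cleanup tasks: reversed creation order, state forced to absent.

    `recorded_tasks` is a hypothetical list of dicts with the keys
    "name", "module" and "args", as a callback plugin might collect them.
    """
    rollback = []
    for task in reversed(recorded_tasks):
        args = dict(task["args"])  # copy so the recorded input is not mutated
        args["state"] = "absent"   # assumes the module deletes on state=absent
        rollback.append({"name": "rollback: " + task["name"],
                         task["module"]: args})
    return rollback


# Hypothetical record of what a playbook run created, in creation order.
recorded = [
    {"name": "create security group",
     "module": "amazon.aws.ec2_security_group",
     "args": {"name": "demo-sg", "description": "demo", "state": "present"}},
    {"name": "create instance",
     "module": "amazon.aws.ec2_instance",
     "args": {"name": "demo-vm", "security_group": "demo-sg",
              "state": "present"}},
]

play = [{"hosts": "localhost", "gather_facts": False,
         "tasks": build_rollback_tasks(recorded)}]
# JSON is used only to keep the sketch free of third-party dependencies.
print(json.dumps(play, indent=2))
```

Simply reversing the order works when each resource depends only on resources created before it. It also assumes every module accepts `state: absent`, which is not guaranteed.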

markuman · July 24, 2025, 1:23pm

The problem is, after you’ve successfully created a security group, the EC2 module that fails afterwards doesn’t know anything about the previous resources (the security group in this case) that need to be removed/rolled back.

I like the idea of the rollback plugin and I will try it out.
But I guess it’s only suitable in isolated projects (afaiu, that’s the way the terraform people like to work).
In a larger environment with shared resources, it can accidentally remove resources that might still be needed.

But I get your point, and I tried to find an audience for that a long time ago.

At work, I started building roles that do exactly that, e.g. deploy an EC2 instance and also remove it.
So what such a role does is:

collect requirements (VPC ID, Subnet IDs …because we’re stateless)
create security group
deploy instance
set DNS A record and PTR record

Deploying is done with --tags ec2.deploy.
With --tags ec2.destroy, the role reverses the order and injects the parameter value state: absent.

collect requirements
remove instance
remove security group
remove dns records

- name: deploy windows ec2 instance
  tags:
    - windows_ec2
    - windows_ec2.deploy
  block:
    - name: deploy windows ec2 instance
      with_items:
        - vpc_info.yml
        - securitygroup.yml
        - instance.yml
        - dns.yml
      include_role:
        name: windows_ec2
        tasks_from: "{{ item }}"
      vars:
        state: present

- name: destroy windows ec2 instance
  tags:
    - windows_ec2.destroy
    - never
  block:
    - name: destroy windows ec2 instance
      with_items:
        - vpc_info.yml
        - instance.yml
        - securitygroup.yml
        - dns.yml
      include_role:
        name: windows_ec2
        tasks_from: "{{ item }}"
      vars:
        state: absent

So an instance can be deployed with a small set of required variables, such as instance name and subnet name. Besides the required ones, there are optional ones, like instance type, additional EBS volumes, etc.

From my POV, we need high-quality micro roles that glue that functionality together.
But I’m not sure whether such “micro roles” should be developed and shipped e.g. with community.aws, or whether it needs a new engineering culture on the Ansible user side.

jdlmrh · July 25, 2025, 8:37am

Yes, you’re right: I want to mimic Terraform’s behaviour, where each project is autonomous and, as a consequence, created resources are not shared among multiple projects. This missing feature is annoying and is the reason why people learn Ansible AND Terraform (or OpenTofu…).

For example, I wrote Playbooks to create K8s clusters. Each cluster has its own resources and when I want to delete a cluster, I need to be sure all the resources are deleted.

I thought that automatically generating the “rollback/cleaning” playbooks would be an elegant solution (but certainly not a perfect one).

markuman · July 25, 2025, 3:34pm

So I’ve tested it

The problem is, after you’ve successfully created a security group, the EC2 module that fails afterwards doesn’t know anything about the previous resources (the security group in this case) that need to be removed/rolled back.

And that’s the issue: dependencies and the order in which resources are removed.

That’s why the order in my role is a little different in the destroy block.
In that simple case, you must first delete the EC2 instance, and only afterwards can you remove the security group.
Another option is to detach the security group from all its resources and delete the EC2 instance afterwards. But that makes the step of removing the security group more complex.

Another thing a callback plugin could do is create a state file (like Terraform/OpenTofu). You just need to collect the resource IDs and e.g. the (tag) names, to get a human-friendly name.
But the dependency/order issue exists here as well. It needs some kind of algorithm/dependency detector …
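
Once the dependencies are known, the ordering part is a plain topological sort. Here is a minimal sketch using Python’s standard library `graphlib` (Python 3.9+). The `state` mapping and its resource IDs are hypothetical placeholders, and the sketch assumes the hard part, detecting which resource depends on which, has already been done.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical state: resource ID -> IDs of the resources it depends on.
state = {
    "sg-demo": [],            # security group, no dependencies
    "i-demo": ["sg-demo"],    # instance uses the security group
    "eip-demo": ["i-demo"],   # elastic IP is attached to the instance
}


def destroy_order(depends_on):
    """Return resource IDs so that dependents are removed before dependencies."""
    # static_order() yields dependencies first (a valid creation order);
    # reversing it gives a valid removal order.
    creation_order = list(TopologicalSorter(depends_on).static_order())
    return creation_order[::-1]


print(destroy_order(state))  # ['eip-demo', 'i-demo', 'sg-demo']
```

If the recorded dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, which is at least a clear signal that the state needs manual attention.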

markuman · July 25, 2025, 3:36pm

What I find most annoying is that as soon as something doesn’t work properly, people simply switch to another tool or, even worse, develop a completely new tool instead of improving the existing one (Ansible in this case)…
