Idiomatic playbook for infra and config

I have a CI process that builds VM images in layers (much like container images are composed of layers).

Each layer is built by creating a snapshot of the parent VM image, creating a new VM from that snapshot, running a playbook against the new VM, and then shutting it down to serve as the base image for the next layer.

Right now, that looks something like:

---
- hosts: all
  tasks:
    # These stand in for the real provisioning steps: snapshot the parent
    # image and create the new VM from that snapshot, all done on the VM server.
    - name: step 1 of VM setup
      delegate_to: '{{ vm_server }}'
    - name: step 2 of VM setup
      delegate_to: '{{ vm_server }}'

- name: run template playbook
  import_playbook: site.yml

- hosts: all
  tasks:
    # Power the VM off so it can serve as the base image for the next layer.
    - name: shutdown new template VM
      delegate_to: '{{ vm_server }}'

That playbook is really gross, primarily because when I run it, I limit the run to the new VM, which doesn’t exist yet; making that work requires some fiddling with dynamic inventory to pretend the host exists before the play begins.

As I see it, I have two directions out of this hole.

First, I could just split this into two playbooks that do the infra setup and tear-down, and change the CI definition to call those and the site.yml individually. That seems idiomatic, but it also makes my CI script 3x larger.
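For illustration, that would turn the single CI step into three playbook runs, roughly like this (a sketch only, using GitLab-flavored YAML purely as an example; provision-vm.yml and finalize-vm.yml are hypothetical names for the split-out infra playbooks, and $NEW_VM stands in for however the new VM's name gets passed):

build-image:
  script:
    # Hypothetical split: infra setup, then the real configuration, then tear-down.
    - ansible-playbook provision-vm.yml -e vm_name=$NEW_VM
    - ansible-playbook -l $NEW_VM site.yml
    - ansible-playbook finalize-vm.yml -e vm_name=$NEW_VM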

Alternatively, I could change the “run template playbook” step into a command task and invoke ansible-playbook from within the outer playbook. That doesn’t require any changes to the CI definition, and it gets rid of my dynamic inventory problem, but then the inner run of ansible-playbook doesn’t inherit tags or check mode from the outer invocation.
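Concretely, that alternative would replace the import with something like the task below - just a sketch, assuming the new VM's name is passed in as an extra var I'm calling vm_name here:

- name: run template playbook against the new VM
  # Nested run: the child process does not see the outer run's --tags or
  # --check flags, which is exactly the drawback described above.
  ansible.builtin.command:
    cmd: ansible-playbook -l {{ vm_name }} site.yml
  delegate_to: localhost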

The first option seems obviously better, except that I end up spreading more complexity across two different git repos when I’d rather keep it in just one.

Does anyone have suggestions for a more idiomatic approach, or anything else I should be considering?

I am not sure I fully understand what you are asking, so forgive me if I am way off here…

Specifically to this comment - “when I run it, I limit the run to the new VM, which doesn’t exist yet; making that work requires some fiddling with dynamic inventory to pretend the host exists before the play begins” - there is ansible.builtin.meta, which lets you refresh the inventory. So this:

- name: Refresh inventory to ensure new instances exist in inventory
  ansible.builtin.meta: refresh_inventory

In theory I think you can use ansible.builtin.import_playbook to call a playbook that creates the VM, then run the meta command, then call a playbook to work on the newly created VM…
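Untested, but roughly what I have in mind is below - note that refresh_inventory has to run inside a play, so there is a small localhost play sandwiched between the two imports (the playbook names are just placeholders):

- name: create the new VM
  import_playbook: create-vm.yml

- hosts: localhost
  gather_facts: false
  tasks:
    - name: Refresh inventory so the new VM is visible to later plays
      ansible.builtin.meta: refresh_inventory

- name: configure the new VM
  import_playbook: configure-vm.yml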

I haven’t tested this - I use AWS, and I currently keep my “creation” playbook separate from my “configuration” playbook. But I have successfully used this to refresh inventory, since I need to do it before I can copy tags to volumes (meaning, I create the tags, but then can’t reference them later because they aren’t in the currently loaded inventory, hence the refresh).

Hope this makes sense and helps!

The VMs I’m creating are named <role>-<env>-<serial>, and my inventory is almost entirely dynamic, consisting of VMs queried at runtime.

What I meant by the quoted section is that if I want to create a new VM, I can’t easily do something like ansible-playbook -l storage-test-20240109 new-image.yml, because that host doesn’t exist when the playbook starts. To make it work, I have to maintain a second dynamic inventory script that tells Ansible about a fictional host when the playbook starts, then update the inventory once the VM really does exist, and only then move on to importing the main playbook.
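In effect, that second inventory source just has to produce the equivalent of this stub (sketched here as a static YAML inventory rather than the actual script) so that the -l pattern matches before the VM exists:

all:
  hosts:
    # Fictional entry so that -l storage-test-20240109 matches before the VM exists;
    # the real connection details come from the dynamic inventory after a refresh.
    storage-test-20240109: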

As far as I can tell, I also can’t limit the play to an empty group and later add the new VM to that group. The playbook won’t run unless the “limit” target has some hosts.

And I also can’t import the playbook and delegate it to the new VM.

So as far as I can tell, separating the playbooks is the best option, which pushes me toward running a collection of playbooks through a shell script, and at that point I feel like I’m adding more automation on top of Ansible’s playbook automation.

I don’t see a better option, but this feels inelegant.