AWX workflow design: provisioning EC2 + targeting downstream jobs with dynamic inventory / limits

I’m looking for advice on a clean AWX workflow design for the following expected flow.

  1. I need to create an EC2 instance (from scratch, create new EC2 from base OS AMI and then run the provisioning)
  2. Usually these new nodes have a curl callback request to an AWX job template in their user-data, so at boot time they ask AWX to configure them. This works well, but it seems impossible to integrate into a workflow. I would like it integrated into the workflow so the user knows within a few minutes whether the instance succeeded or not.

One workaround is to create the instance and add it (via add_host) to a hardcoded group (e.g. just_created, as per the docs) that the next play targets in its hosts: section, so it only configures that host. But add_host only makes sense within the same playbook run, and this would require me to duplicate the existing initial-config playbook, which itself imports multiple playbooks that all use hosts: all, cloning every one of them just to change hosts: all to hosts: just_created.
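For reference, the add_host workaround looks roughly like this as a single multi-play playbook. This is only a sketch: the AMI ID, instance name, and the final play's tasks are placeholders, and it assumes the amazon.aws collection is installed.

```yaml
# Sketch: create the instance and target it via an in-memory group,
# all within one playbook run (add_host does not persist across runs).
- name: Create EC2 instance
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Launch instance from a base AMI (amazon.aws collection)
      amazon.aws.ec2_instance:
        name: new-node                       # placeholder name
        image_id: ami-0123456789abcdef0      # placeholder AMI
        instance_type: t3.micro
        state: running
        wait: true
      register: ec2

    - name: Add the new instance to an in-memory group
      ansible.builtin.add_host:
        name: "{{ ec2.instances[0].private_ip_address }}"
        groups: just_created

- name: Configure only the new instance
  hosts: just_created
  tasks:
    - name: Placeholder for the real initial-config tasks
      ansible.builtin.ping:
```

The catch, as described above, is the second play: reusing the existing initial-config content here means cloning every imported playbook that hardcodes hosts: all.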

My idea was using the existing limit capability to:

  1. Start a workflow
  2. Workflow Node 1: Create the EC2 instance
    1. Wait for it to be ready
    2. Force a refresh of the dynamic inventory
    3. Set a new limit {{ ec2.instances[0].private_ip_address }}
  3. Workflow Node 2: Start initial-config with the full inventory, but with the limit changed earlier to just the new instance’s ip {{ ec2.instances[0].private_ip_address }}
    1. This would allow me to use the existing initial-config without any changes at all, as if it had been initiated by the callback, but inside the AWX workflow, giving the user the final result at the end.

I understand that the limit apparently cannot be changed dynamically during workflow execution, but I wonder what the recommended approach would be, while avoiding refactoring all the existing playbooks that have their own values for the hosts: parameter.

I also found while testing that I cannot start the workflow or job with a limit of just_created (the group would be non-existent or empty and error out), then have a first task that creates the EC2 instance (delegate_to: localhost), a second task that calls add_host (which cannot be delegated; it always runs on the controller), and finally the rest of the playbook using hosts: just_created to target only the new instance.

Could I have a just_created group with some kind of dummy entry, so I could run the EC2 instance creation with delegate_to: localhost, and then in the next step have add_host somehow replace that dummy entry with the new instance?

I’m going in circles, unable to find a solution that even works. I’m looking first for something that’s “AWX idiomatic”, and second, if possible, for something that avoids duplicating existing playbooks and isn’t difficult to understand.

I’m leaning toward the “looks impossible currently” mindset and accepting that the current setup — creating the instance in one playbook, with cloud-init user-data that curls the AWX callback to self-initiate provisioning — is the best workaround for now. But it leads to decoupled job runs that require more complex error checking; it’s not as simple as having a workflow node that runs on failure and just terminates (kills) the instance that failed to provision.

Any ideas?

I am not sure that I completely understand the problem.
You might try something like this not to copy all the playbooks.

$ cat playbook.yml
- name: Playbook
  hosts: "{{ hosts | default('all') }}"
  tasks:
   ...

$ ansible-playbook playbook.yml --extra-vars "hosts=just_created"

Here hosts might not be the best name for this variable.

The set_stats module is designed to pass data from one workflow node to another. I think that is exactly what you are trying to do.

Here’s an article showing how it works, sort of: CIQ | Passing Ansible Variables in Workflows Using set_stats
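A minimal sketch of the pattern, assuming the EC2 creation task registered its result as ec2 (the variable name ec2_instance_limit is just an example):

```yaml
# In the provisioning playbook, after the instance is created:
- name: Expose the new instance's IP to later workflow nodes
  ansible.builtin.set_stats:
    data:
      ec2_instance_limit: "{{ ec2.instances[0].private_ip_address }}"
    per_host: false   # make it a global artifact, not a per-host stat
```

AWX stores set_stats output as job artifacts and passes them as extra vars to downstream nodes in the same workflow.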

I think you can make this super simple: just create a job that has rights within your AWX to start other jobs. Then you can launch templates/workflows with different limits.
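A rough sketch of that approach using the awx.awx collection’s job_launch module (the template name initial-config and the registered ec2 variable are placeholders, and controller credentials are assumed to be supplied via an AWX credential or environment variables):

```yaml
# Sketch: a play that launches another job template with a dynamic limit.
- name: Chain to the configuration job
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Start initial-config limited to the new instance
      awx.awx.job_launch:
        name: initial-config                                  # placeholder template name
        limit: "{{ ec2.instances[0].private_ip_address }}"
        wait: true   # fail this task if the launched job fails
```

The trade-off is that the launching job needs AWX credentials with execute rights on the target template, and the chained job appears as a separate run rather than a workflow node.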

So my “create a VM” playbook will:

  1. Take input data from a survey and create a hostname
  2. Add the host, with data from the survey, into a primary data inventory
  3. Create the VM from the primary data inventory
  4. Sync the dynamic inventories
  5. Merge the dynamic + static inventories via a constructed inventory (via a workflow)

Then I can do ops on the running VM using that constructed inventory (first-boot configuration, first OS update to ensure latest packages, …).

Hi,

I forgot to update: thanks to @kks’s & @mikemorency’s hints, I managed to get something working without too many changes.

Workflow:

  1. Create EC2 (uses a static empty inventory, playbook uses hosts: localhost, no need for an inventory here)
    • After creating the instance uses set_stats to set a “fake” limit variable with the value that matches the inventory name the new instance will have, by default it’s the private ip address, so it would be: ec2_instance_limit: "{{ ec2.instances[0].private_ip_address }}"
  2. initial-config (uses self-refreshing ec2 dynamic inventory)
    • And, here’s the trick, the hosts is modified to be: hosts: "{{ (ec2_instance_limit | default('all')) }}"
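For reference, the only change needed in the initial-config entry playbook is its hosts: line; a minimal sketch (the task body is a placeholder):

```yaml
# The same playbook serves both the callback and the workflow.
- name: Initial configuration
  hosts: "{{ ec2_instance_limit | default('all') }}"
  tasks:
    - name: Placeholder for the existing initial-config tasks
      ansible.builtin.ping:
```

When launched via the provisioning callback, ec2_instance_limit is undefined and the play falls back to all (the callback itself already restricts execution to the calling host); inside the workflow, the set_stats artifact from node 1 narrows it to the new instance.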

This achieves the requirement of setting a different “limit” for downstream jobs — which cannot be done directly — via this combination of set_stats + hosts:. The main limiting factor is that the fake limit variable ec2_instance_limit must exactly match the inventory hostname format. That’s brittle: it will break if you change the inventory’s hostname format. But it’s an acceptable trade-off for us, since that’s not a frequent modification.

That hosts: format for the initial-config playbook allows the same job template to be used as a callback target as well as integrated in this workflow, minimizing duplication.

Thank you everyone for your help and suggestions, very appreciated :+1:
