AWX main site (controller and hop nodes) is up and running on premises.
Many remote execution nodes provide distributed job execution at remote sites.
Remote execution nodes connect to the controller via hop nodes, but the connection may be unstable at times.
In AWX, each remote site “X” is represented by one remote execution instance named “X”, one instance group named “X” containing that instance, and one inventory named “X” referencing that instance group (instance group fallback is disabled), so that a job using inventory “X” runs on exactly that remote site.
The inventories are populated using inventory sources, as the host/group structures of these remote sites resemble each other to a high degree (there are eight classes of host/group sets).
In principle, this works. However, because of the inventory’s reference to a specific instance group, the job that refreshes the inventory source always gets executed on the remote instance. Instead, I’d like to run the refresh jobs on the controller site on premises for the following reasons:
When populating the topology in AWX, the remote execution nodes only exist in AWX, as their install bundle has only just been prepared. The nodes are not yet available for refreshing the inventory source. The job gets stuck in the “Pending” state.
For future inventory source refreshes, the connection to the remote site isn’t guaranteed to be available. Again, the job will get stuck.
It doesn’t make much sense to run a job maintaining metadata on the remote site.
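The routing problem described above can be pictured with a small toy model. This is only a sketch of the behaviour as described in this post, not AWX's actual scheduling code; the names `Inventory`, `candidate_groups` and `route` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Inventory:
    name: str
    instance_groups: List[str] = field(default_factory=list)
    # Stands in for "instance group fallback is disabled" (hypothetical field name)
    prevent_fallback: bool = False


def candidate_groups(inventory: Inventory, fallback_groups: List[str]) -> List[str]:
    """Instance groups a job tied to this inventory may use, in order of preference."""
    groups = list(inventory.instance_groups)
    if not inventory.prevent_fallback:
        groups += [g for g in fallback_groups if g not in groups]
    # With nothing pinned on the inventory, fall through to the fallback groups
    return groups or list(fallback_groups)


def route(job_kind: str, inventory: Inventory,
          fallback_groups: List[str], online: Set[str]) -> str:
    """Pick the first candidate group that is online, else report 'pending'.

    job_kind is deliberately ignored: in this model an inventory sync is
    routed exactly like a regular job, which is the behaviour at issue.
    """
    for group in candidate_groups(inventory, fallback_groups):
        if group in online:
            return group
    return "pending"


site_x = Inventory("X", ["X"], prevent_fallback=True)
print(route("inventory_sync", site_x, ["main"], online={"main"}))  # pending
```

In this model, adding “main” as a secondary group to inventory “X” would unblock the sync, but a regular job would then also land on “main” whenever “X” is offline.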
What I have tried so far:
I could list the main site as a secondary instance group for each inventory. However, this might result in regular jobs meant to run on the remote site getting executed on the main node when the remote site is temporarily unavailable.
I could remove the reference to the instance group “X” in inventory “X” for the time the inventory source is refreshed. Again, jobs being queued at this moment will be misrouted to the main site.
I could remove the reference to the instance group “X” in inventory “X” permanently and create my regular jobs by referencing not only inventory “X” but also instance group “X”. This works, but I can no longer “run command” from the inventory directly, as the wizard for “run command” does not ask for the instance group to be used. I don’t want to lose the “run command” feature.
I could get rid of the inventory sources by just having eight “template” inventories populated by inventory sources (not referencing any instance group) and create the actual inventories representing the remote sites as a static copy of these template inventories.
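The fourth option could be scripted against the AWX REST API roughly as follows. This is a hedged sketch that only builds the requests and does not send anything. The paths and the `prevent_instance_group_fallback` field are written from memory of the v2 API and should be checked against the browsable API of your AWX version; the function names and the `Request` type are made up for illustration.

```python
from typing import Any, Dict, List, Tuple

# (HTTP method, path, JSON body); nothing here talks to a server
Request = Tuple[str, str, Dict[str, Any]]


def plan_copy(template_inventory_id: int, site: str) -> List[Request]:
    """Request that copies a template inventory under the site's name."""
    return [
        ("POST", f"/api/v2/inventories/{template_inventory_id}/copy/", {"name": site}),
    ]


def plan_pin(inventory_id: int, instance_group_id: int) -> List[Request]:
    """Requests that pin the copied inventory to the site's instance group.

    inventory_id is the id returned by the copy request, so this step
    can only be planned after the copy has been performed.
    """
    return [
        # Disable fallback so jobs are never rerouted to another group
        ("PATCH", f"/api/v2/inventories/{inventory_id}/",
         {"prevent_instance_group_fallback": True}),
        # Associate the site's instance group with the inventory
        ("POST", f"/api/v2/inventories/{inventory_id}/instance_groups/",
         {"id": instance_group_id}),
    ]


for method, path, body in plan_copy(12, "X") + plan_pin(57, 4):
    print(method, path, body)
```

The ids 12, 57 and 4 are placeholders. It is worth checking on a test instance what the copy actually carries over (hosts, groups, inventory sources) before relying on it as a static snapshot.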
At the moment, the last approach is the workaround I’m thinking of implementing. However, it’s still a workaround, so I’m wondering:
Is there any possibility to have the job refreshing the inventory source run on the main site despite the fixed reference from inventory “X” to instance group “X”?
Is there an entirely different approach to have regular jobs run on specific sites just by referencing the inventory, one that doesn’t have any of the flaws mentioned?
I think I’m following what you’re looking to do (bear with me, still working on the first coffee):
Inventory refreshes run from the main AWX cluster
Jobs that use those inventories run from an instance group
Take a look at constructed inventories. These are inventories based on other inventories that let you set options different from the original’s. We currently use these for deploying VMs to Azure, synced via the dynamic inventory plugin. The main inventory has the Azure plugin as the source, targeting the default instance group. We then have constructed inventories (based on the main) that allow us to specify another instance group that can connect to the new VMs at the OS level. The job templates then use the constructed inventories.
Thanks for your input! Just tried this (again) and I should have mentioned it before: the constructed inventory approach suffers from the very same problem as the one with inventory sources. The sync job will be assigned to the instance group of the inventory being created, not the one associated with the input inventory, even if I assign the main site instance group to the input inventory.
Interesting, I’ll have to do a test at lunch. Looking at our job history, I’m seeing the main inventory syncs happening from the default instance group (as configured) and the templates based on the associated constructed inventories running on their configured instance groups (different than default). We do use workflow templates and have some manual refreshes to control when hosts are added to or removed from the inventories during deployment. This may be playing a role.
I did another test with our sandbox setup:
Main Inventory - synced via Azure ARM, set up to use instance group 1
Constructed Inventory - synced from main inventory, set up to use instance group 2
Job Template - set up to use the constructed inventory, no instance group specified on the job template
Ran a sync on the main inventory, which executed on instance group 1. Ran the job template and saw that the constructed inventory sync executed on instance group 2 (same for the job template).
The caveat is that the main inventory sync has to be run manually, since the job template will only run the sync from main to constructed (at least to my knowledge). We get around this by running a manual sync of the main inventory as part of our workflow template.
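Outside of a workflow template, the same ordering (sync the main inventory first, then launch the job template) could be driven through the REST API. The following is a sketch under assumptions: the endpoint paths and the `id` and `status` response fields are recalled from the AWX v2 API and should be verified, and `sync_then_launch` and the injected `send` callable are hypothetical helpers, not part of any AWX client.

```python
import time
from typing import Any, Callable, Dict, Optional

# send(method, path, body) -> decoded JSON response; supplied by the caller,
# e.g. a thin wrapper around an authenticated HTTP session
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def sync_then_launch(send: Send, source_id: int, template_id: int,
                     poll_seconds: float = 5.0, max_polls: int = 60) -> Dict[str, Any]:
    """Sync the main inventory's source, wait for it, then launch the job template."""
    update = send("POST", f"/api/v2/inventory_sources/{source_id}/update/", None)
    # Assumption: the launch response carries the id of the new inventory update
    update_path = f"/api/v2/inventory_updates/{update['id']}/"
    for _ in range(max_polls):
        status = send("GET", update_path, None)["status"]
        if status == "successful":
            break
        if status in ("failed", "error", "canceled"):
            raise RuntimeError(f"inventory sync ended with status {status!r}")
        time.sleep(poll_seconds)
    else:
        # Loop ran out without seeing a final status
        raise TimeoutError("inventory sync did not finish in time")
    return send("POST", f"/api/v2/job_templates/{template_id}/launch/", None)
```

Passing `send` in keeps authentication and HTTP details out of the sketch and makes the ordering easy to try against a fake before pointing it at a real controller.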
I’ll try to follow your setup as closely as I can get here in our setup and give it a try tomorrow! Thanks for now, I’ll get back to you with my observations!
In my case, the main inventories are static inventories (the eight classes of host/group sets mentioned earlier), so no sync is needed here. On the constructed inventory, I wanted to have the sync job run on the main site (instance group 1 in your case) and templated jobs on the remote site (instance group 2).
So in my example as well as yours, both the sync jobs and the templated jobs on the constructed inventory run on instance group 2. Which leads to my problem: I can’t populate the inventory until the remote instance is connected and the connection is available at the time the sync is triggered.
What I understand from your description is that you don’t necessarily want to sync on trigger but on templated job run instead. And indeed, this might be a valid approach for me: if someone wants to run ad hoc tasks on hosts or groups in the AWX web UI using an inventory that is unpopulated at that time, they simply have to click sync first. On the other hand, I could check “sync on job run” so the inventories are populated each time a job runs. Of course the sync job will still run on the remote site, but the templated job will do so anyway, so a working connection is assumed in either case.