Ok, kindly give me some time.
This is our production instance, so I need to create a test setup and replicate the issue.
Alright, or just paste the result and blur any sensitive data.
Unpacking The Problem
The error is "can't connect to host" (I don't have the precise wording handy).
In effect, delegation doesn't fail; connecting to the remote server from localhost fails, and that connection is allowed only from the AWX master server.
It sounds like we are using delegation to then try to reach somewhere else, and that communication is only allowed from the AWX master server. I'm curious how this is defined. Is this currently the ingress to the AWX application (a hostname possibly resolved to an IP)?
Therefore the real problem appears to be that you need to provide the team managing access to this resource a stable IP address that they can expect traffic to originate (be sourced) from when Ansible tries to connect.
Some Basics
Just to level set, let's cover a few areas to ensure we are on the same page.
Implicit vs Explicit localhost
There are two ideas in play when we delegate to localhost. Ansible will provide a localhost inventory entry if one is not defined. This is the implicit localhost, and it will have ansible_connection set to local. The explicit localhost is when we've defined localhost in our inventory, and the control node will defer to the variables set for that host.
I see you have an inventory defined with localhost so it will be using explicit localhost.
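As a minimal sketch of what that looks like, an explicit localhost entry in a YAML inventory might be defined like this (the extra interpreter variable is a common convention, not something from your setup):

```yaml
# inventory.yml -- sketch of an explicit localhost entry.
# Because this entry exists, Ansible defers to these variables
# instead of creating its implicit localhost.
all:
  hosts:
    localhost:
      ansible_connection: local
      ansible_python_interpreter: "{{ ansible_playbook_python }}"
```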
How Are Jobs Run?
You mentioned you used to run AWX 15, which does map to the start of the introduction of execution environments, but if it was run using docker-compose, that would explain why you saw traffic egressing onto the network from a stable IP address (the server running AWX). In almost all cases that would have been a single node with networking set up to source NAT traffic as it egressed the server from the network Docker leveraged. You could have configured Docker to bridge to the LAN and given containerized processes the ability to speak directly to external hosts, but that is uncommon.
In AWX and Ansible Tower, prior to the introduction of execution environments, job isolation was done on the AWX and Ansible Tower nodes via process isolation. The implicit localhost had significant access to what was perceived as the host running AWX or Ansible Tower. This also meant traffic to managed nodes was seen egressing onto the network from the host running AWX or Ansible Tower.
In AWX and Ansible Controller today, with execution environments, job isolation is done by leveraging containers. When run on Kubernetes, that defaults to a container group. This is why you see a pod scheduled to run a copy of the defined execution environment. The implicit localhost in this context lands you inside the container, which in Kubernetes is a pod that may not even be on the same host as the AWX web pod. It will have a unique IP from the pod network, and the way traffic egresses depends on how you've configured Kubernetes. That could be the pod IP, or source NAT (masquerade) to the host IP address, or something fancier like an EgressGateway.
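You can see this for yourself with a small diagnostic task. This is a sketch, not from the thread; note that minimal execution environment images may lack some of these commands, so the fallbacks are defensive:

```yaml
# diagnostic.yml -- sketch: show where "localhost" actually lands in a job.
- hosts: all
  gather_facts: false
  tasks:
    - name: Show the identity seen by the delegated task
      # Inside an execution environment this prints the pod's name and a
      # pod-network IP, not the AWX server's identity.
      ansible.builtin.shell: "cat /etc/hostname; ip -brief addr 2>/dev/null || hostname -I"
      delegate_to: localhost
      register: where_am_i

    - name: Print the result
      ansible.builtin.debug:
        var: where_am_i.stdout_lines
```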
Solving the Problem
If you are running a single k3s node and your pod network is not directly routable, I'd expect traffic from pods to use source NAT (masquerade) to the only node IP facing (following routing) the managed node as it egresses the underlying host. If this is not true, read on.
In more complicated Kubernetes environments, consider the following two approaches to provide a stable source IP for traffic going to managed nodes.
Kubernetes Specific
Use an EgressGateway to tie the pod running a job to a specific IP or a small range of IP addresses. You can modify the pod specification to ensure any labels the EgressGateway is looking to match on are applied when a job is launched.
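As a hypothetical sketch, the custom pod spec of an AWX container group could carry a label for an egress policy to select on. The label key/value here is made up, and the exact matching mechanism depends on your CNI (some, like Calico, use annotations rather than labels):

```yaml
# Sketch: custom pod spec for an AWX container group; adjust to your cluster.
apiVersion: v1
kind: Pod
metadata:
  namespace: awx
  labels:
    # Hypothetical label an EgressGateway policy could match on.
    egress: awx-jobs
spec:
  serviceAccountName: default
  automountServiceAccountToken: false
  containers:
    - name: worker
      image: quay.io/ansible/awx-ee:latest
      args: ["ansible-runner", "worker", "--private-data-dir=/runner"]
```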
AWX Specific
Use external hop and execution nodes leveraging the mesh ingress feature of AWX. You could then create a traditional VM with a stable IP address to execute on, or even hop through if needed, so your job runs with a source IP address and in a part of the network that satisfies your security constraints.
That’s all true, but apparently all tasks that were not delegated to localhost are/were working fine and connecting to other hosts outside the cluster. I don’t think this problem has anything to do with network permissions in the cluster, as long as a pod can access its own loopback interface.
I’m not a Kubernetes expert, so I don’t know what would be configured in the security policies that would prevent a running process from accessing its own loopback interface.
@hugonz I edited the above to possibly explain myself better and offer some additional information.
Thanks, as I said, I’m not a k8s expert, so I could use the insight.
Thanks a lot folks for the inputs.
I created a simple scenario to showcase the issue when delegating to localhost compared to directly coding the AWX server name.
Though in my case localhost is explicit, it contains “ansible_connection=local”.
This should be equivalent to implicit, right?
Let’s say our AWX is running on server “A” & we want to connect to host “B”.
Playbook

- hosts: all
  gather_facts: no
  tasks:
    - name: Test
      shell: hostname
      register: test
      delegate_to: localhost
      tags: [always]

    - name: Output
      debug:
        var: test.stdout
      tags: [always]
This fails as below; though not a network issue, the error clearly shows it is running on the pod.
TASK [Test] ********************************************************************
fatal: [B → localhost]: FAILED! => {"changed": true, "cmd": "hostname", "delta": "0:00:00.004326", "end": "2025-03-03 09:14:46.668916", "msg": "non-zero return code", "rc": 127, "start": "2025-03-03 09:14:46.664590", "stderr": "/bin/sh: line 1: hostname: command not found", "stderr_lines": ["/bin/sh: line 1: hostname: command not found"], "stdout": "", "stdout_lines": []}
However, if I change the playbook delegation statement as below:
delegate_to: A
Then all goes well.
In a nutshell, we see issues when running commands delegated to localhost, as it’s a pod with a totally different environment. The hiccups can be anything: network access, as detailed in this thread originally, or simply a missing hostname command, as demonstrated now.
What’s the strategy to create universally acceptable playbooks that delegate to the AWX server?
If we hardcode the server name here, it’s not universal, as the playbooks won’t run on other AWX instances.
Phew, hope I was clear.
Thanks.
@vibhor_agarwalin , pretty clear, thanks for the updates
When you say “Though in my case, localhost is explicit, but it contains ‘ansible_connection=local’. This should be equivalent to implicit, right?”:
We agree that localhost in your case has been explicitly created within the hosts menu in AWX, right? If not, please create it with the ansible_connection=local parameter and let us know if the output is the same.
@vibhor_agarwalin what is the reason you want to delegate to the pod running awx-web? If you need a stable address for people to set as the source address of automation jobs see my previous post.
Notice that the command task is working; it’s just that the hostname command is not present within the execution container. You should try with a command that exists, like date -I.
The aim is to delegate to the AWX server, as that host alone has the proper network policies.
I read your post, but honestly couldn’t make out how to get a stable address.
This was just an example to show that “delegate_to: localhost” runs on the pod and not on the AWX server.
As mentioned above, I want the task to run on the AWX server by using a generic name, not by hardcoding the AWX server name.
Is there anything I can do to clarify? Delegating to localhost will cause the task to run on the pod in the container group. Is this a single-node k3s deployment? Is the pod network set up to be routable directly, or does it PAT out the node address following routing towards the target you want to manage?
By adding an AWX Mesh Ingress to your AWX deployment you provide a way to connect dedicated external execution nodes to your AWX environment. You can then use these dedicated execution nodes to run the jobs that need a specific source address(es).
By using an EgressGateway, you tell Kubernetes how to let traffic egress your cluster, giving you a way to set an address (or addresses) for traffic as it leaves Kubernetes.
I think I got what you want to do. So, if the only thing you’re looking for is a playbook that doesn’t have a server hardcoded in the delegation, would it work for you to specify it like so?
- name: Delegate this task
  ansible.builtin.shell: hostname
  delegate_to: "{{ groups['awx_hosts'] | random }}"
This way, you keep the list of AWX hosts (just the one in your case) in an inventory group awx_hosts, and the playbook is reusable in other AWX installations. You could even use a previous play to populate that group dynamically.
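For completeness, a minimal inventory carrying such a group might look like this sketch (the hostname is a placeholder, not from your environment):

```yaml
# inventory.yml -- sketch: an awx_hosts group the delegation can draw from.
all:
  children:
    awx_hosts:
      hosts:
        awx.example.com:   # placeholder; use your AWX server's name
```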
Yes
How do I figure that out? I believe you would have guessed my understanding of Kubernetes by now.
Sounds interesting, let me try reading this more.
Worth a try; please allow me some time to evaluate.
Thanks folks for the help & inputs.
For the time being, I added a variable naming the AWX server and delegate to that.
This way our playbooks are standard and portable.
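A sketch of that workaround (the variable name awx_server is an assumption; set it per instance in the inventory or as an extra var):

```yaml
# Sketch: keep the playbook portable by delegating to a per-instance variable.
- name: Run on the AWX server instead of the job pod
  ansible.builtin.shell: "date -I"
  delegate_to: "{{ awx_server }}"  # hypothetical variable, defined per AWX instance
```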
However, the ideal solution should be the mesh ingress suggested here.
I need to work more to get to that point.