Ok, kindly give me some time.
This is our production instance, so I need to create a test setup and replicate the issue.
Alright, or just paste the result and blur any sensitive data.
Unpacking The Problem
The error is "can't connect to host" (I don't have the precise wording handy).
In effect, delegation doesn't fail; connecting to the remote server from localhost fails, and that connection is allowed only from the AWX master server.
It sounds like we are using delegation to then try to reach somewhere else, and that communication is only allowed from the AWX master server. I'm curious how this is defined. Is this currently the ingress to the AWX application (a hostname possibly resolved to an IP)?
Therefore the real problem appears to be that you need to provide the team managing access to this resource a stable IP address that they can expect traffic to originate (be sourced) from when Ansible tries to connect.
Some Basics
Just to level set, let's cover a few areas to ensure we are on the same page.
Implicit vs Explicit localhost
There are two ideas in play when we delegate to localhost. Ansible will provide a localhost inventory entry if one is not defined. This is the implicit localhost, and it will have ansible_connection set to local. The explicit localhost is when we've defined localhost in our inventory, and the control node will defer to the variables set for that host.
I see you have an inventory defined with localhost so it will be using explicit localhost.
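As a minimal sketch of what that looks like, an explicit localhost entry in a YAML inventory might be defined like this (the extra interpreter variable is a common convention, not something from your setup):

```yaml
# inventory.yml -- sketch of an explicit localhost entry.
# Because this entry exists, Ansible defers to these variables
# instead of creating its implicit localhost.
all:
  hosts:
    localhost:
      ansible_connection: local
      ansible_python_interpreter: "{{ ansible_playbook_python }}"
```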
How Are Jobs Run?
You mentioned you used to run AWX 15, which does map to the start of the introduction of execution environments, but if it was run using docker-compose, that would explain why you saw traffic egressing onto the network from a stable IP address (the server running AWX). In almost all cases that would have been a single node with networking set up to source NAT traffic as it egressed the server from the network Docker leveraged. You could have configured Docker to bridge to the LAN and given containerized processes the ability to speak directly to external hosts, but that is uncommon.
In AWX and Ansible Tower, prior to the introduction of execution environments, job isolation was done on the AWX and Ansible Tower nodes via process isolation. The implicit localhost had significant access to what was perceived as the host running AWX or Ansible Tower. This also meant traffic to managed nodes was seen egressing onto the network from the host running AWX or Ansible Tower.
In AWX and Ansible Controller today, with execution environments, job isolation is done by leveraging containers. When run on Kubernetes, that defaults to a container group. This is why you see a pod scheduled to run a copy of the defined execution environment. The implicit localhost in this context lands you inside the container, which in Kubernetes is a pod that may not even be on the same host as the AWX web pod. It will have a unique IP from the pod network, and the way traffic egresses depends on how you've configured Kubernetes. That could be the pod IP, or source NAT (masquerade) to the host IP address, or something fancier like an EgressGateway.
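You can see this for yourself with a small diagnostic task. This is a sketch, not from the thread; note that minimal execution environment images may lack some of these commands, so the fallbacks are defensive:

```yaml
# diagnostic.yml -- sketch: show where "localhost" actually lands in a job.
- hosts: all
  gather_facts: false
  tasks:
    - name: Show the identity seen by the delegated task
      # Inside an execution environment this prints the pod's name and a
      # pod-network IP, not the AWX server's identity.
      ansible.builtin.shell: "cat /etc/hostname; ip -brief addr 2>/dev/null || hostname -I"
      delegate_to: localhost
      register: where_am_i

    - name: Print the result
      ansible.builtin.debug:
        var: where_am_i.stdout_lines
```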
Solving the Problem
If you are running a single k3s node and your pod network is not directly routable, I'd expect traffic from pods to use source NAT (masquerade) to the only node IP facing (following routing) the managed node as it egresses the underlying host. If this is not true, read on.
In more complicated Kubernetes environments, consider the following two approaches to provide a stable source IP for traffic going to managed nodes.
Kubernetes Specific
Use an EgressGateway to tie the pod running a job to a specific IP or a small range of IP addresses. You can modify the pod specification to ensure any labels the EgressGateway is looking to match on are applied when a job is launched.
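As a hypothetical sketch, the custom pod spec of an AWX container group could carry a label for an egress policy to select on. The label key/value here is made up, and the exact matching mechanism depends on your CNI (some, like Calico, use annotations rather than labels):

```yaml
# Sketch: custom pod spec for an AWX container group; adjust to your cluster.
apiVersion: v1
kind: Pod
metadata:
  namespace: awx
  labels:
    # Hypothetical label an EgressGateway policy could match on.
    egress: awx-jobs
spec:
  serviceAccountName: default
  automountServiceAccountToken: false
  containers:
    - name: worker
      image: quay.io/ansible/awx-ee:latest
      args: ["ansible-runner", "worker", "--private-data-dir=/runner"]
```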
AWX Specific
Use external hop and execution nodes leveraging the mesh ingress feature of AWX. You could then create a traditional VM with a stable IP address to execute on, or even hop through if needed, so your job runs with a source IP address and in a part of the network that satisfies your security constraints.
That’s all true, but apparently all tasks that were not delegated to localhost are/were working fine and connecting to other hosts outside the cluster. I don’t think this problem has anything to do with network permissions in the cluster, as long as a pod can access its own loopback interface.
I’m not a Kubernetes expert, so I don’t know what would be configured in the security policies that would prevent a running process from accessing its own loopback interface.
@hugonz I edited the above to possibly explain myself better and offer some additional information.
Thanks, as I said, I’m not a k8s expert, so I could use the insight.
Thanks a lot folks for the inputs.
I created a simple scenario to showcase the issue when delegating to localhost compared to directly coding the AWX server name.
Though in my case localhost is explicit, it contains “ansible_connection=local”.
This should be equivalent to implicit, right?
Let’s say our AWX is running on server “A” & we want to connect to host “B”.
Playbook

- hosts: all
  gather_facts: no
  tasks:
    - name: Test
      shell: hostname
      register: test
      delegate_to: localhost
      tags: [always]

    - name: Output
      debug:
        var: test.stdout
      tags: [always]
This fails as below; though not a network issue, the error clearly shows it is running on the pod.
TASK [Test] ********************************************************************
fatal: [B → localhost]: FAILED! => {"changed": true, "cmd": "hostname", "delta": "0:00:00.004326", "end": "2025-03-03 09:14:46.668916", "msg": "non-zero return code", "rc": 127, "start": "2025-03-03 09:14:46.664590", "stderr": "/bin/sh: line 1: hostname: command not found", "stderr_lines": ["/bin/sh: line 1: hostname: command not found"], "stdout": "", "stdout_lines": []}
However, if I change the playbook delegation statement as below:
delegate_to: A
Then all goes well.
In a nutshell, we see issues when running commands delegated to localhost, as it’s a pod with a totally different environment. The hiccups can be anything: network access, as detailed in this thread originally, or simply a missing hostname command, as demonstrated now.
What’s the strategy to create universally acceptable playbooks that delegate to the AWX server?
If we hardcode the server name here, it’s not universal, as the playbooks won’t run on other AWX instances.
Phew, hope I was clear.
Thanks.
@vibhor_agarwalin , pretty clear, thanks for the updates
When you say “Though in my case, localhost is explicit, but it contains ‘ansible_connection=local’. This should be equivalent to implicit, right?”:
We agree that localhost in your case has been explicitly created within the hosts menu in AWX, right? If not, please create it with the ansible_connection=local parameter and let us know if the output is the same.
@vibhor_agarwalin what is the reason you want to delegate to the pod running awx-web? If you need a stable address for people to set as the source address of automation jobs see my previous post.
Notice that the command task is working; it’s just that the hostname command is not present within the execution container. You should try with a command that exists, like date -I.
The aim is to delegate to the AWX server, as that host alone has the proper network policies.
I read your post, but honestly couldn’t make out how to get a stable address.
This was just an example to show that “delegate_to: localhost” runs on the pod and not on the AWX server.
As mentioned above, I want the task to run on the AWX server by using a generic name, not by hardcoding the AWX server name.
Is there anything I can do to clarify? Delegating to localhost will cause the task to run on the pod in the container group. Is this a single-node k3s deployment? Is the pod network set up to be routable directly, or does it PAT out the node address following routing towards the target you want to manage?
By adding an AWX Mesh Ingress to your AWX deployment you provide a way to connect dedicated external execution nodes to your AWX environment. You can then use these dedicated execution nodes to run the jobs that need a specific source address(es).
By using an EgressGateway, you tell Kubernetes how to let traffic egress your cluster, giving you a way to set an address (or addresses) for traffic as it leaves Kubernetes.
I think I got what you want to do. So, if the only thing you’re looking for is a playbook that doesn’t have a server hardcoded in the delegation, would it work for you to specify it like so?
- name: Delegate this task
  ansible.builtin.shell: hostname
  delegate_to: "{{ groups['awx_hosts'] | random }}"
This way, you keep the list of AWX hosts (just the one in your case) in an inventory group awx_hosts, and the playbook is reusable in other AWX installations. You could even use a previous play to populate that group dynamically.
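For completeness, a minimal inventory carrying such a group might look like this sketch (the hostname is a placeholder, not from your environment):

```yaml
# inventory.yml -- sketch: an awx_hosts group the delegation can draw from.
all:
  children:
    awx_hosts:
      hosts:
        awx.example.com:   # placeholder; use your AWX server's name
```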
Yes
How do I figure that out? I believe you would have guessed my understanding of Kubernetes by now.
Sounds interesting, let me try reading this more.
Worth a try; please allow me some time to evaluate.
Thanks folks for the help & inputs.
For the time being, I added a variable naming the AWX server and delegate to that.
This way our playbooks are standard and portable.
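A sketch of that workaround (the variable name awx_server is an assumption; set it per instance in the inventory or as an extra var):

```yaml
# Sketch: keep the playbook portable by delegating to a per-instance variable.
- name: Run on the AWX server instead of the job pod
  ansible.builtin.shell: "date -I"
  delegate_to: "{{ awx_server }}"  # hypothetical variable, defined per AWX instance
```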
However, the ideal solution should be the mesh ingress suggested here.
I need to work more to get to that point.