I’m facing a challenge with running Docker commands inside a custom execution environment in AWX (version 24.0.0), deployed on Amazon EKS via the awx-operator. Specifically, I’m trying to use the docker_container module in an Ansible role to deploy node_exporter as a container.
However, upon execution, I encounter the following error:
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"}
This suggests an issue with running Docker commands inside a Docker (or, in my case, possibly Podman) container, where the Docker client is available but the server is not. My initial investigation suggests this is related to the complexities of Docker-in-Docker (DinD) or Docker-alongside-Docker setups within AWX execution environments.
Given the nature of the issue, modifying the awx-operator deployment to bundle a Docker server seems impractical, if not impossible. I am therefore exploring alternative approaches or configurations that could resolve this. The goal is to run the docker_container module seamlessly within AWX's custom execution environment, so that Docker operations can be part of Ansible tasks.
Has anyone in the community faced a similar issue or achieved functionality akin to Docker-in-Docker or Docker alongside Docker within AWX’s execution environments? Any insights, guidance, or suggestions on configuring the execution environment or alternative solutions would be immensely appreciated.
The docker_container module needs to be able to talk to the Docker daemon. Whether it does that via a socket file (the standard way) or via TCP or TCP+TLS is for you to determine, because it depends on your environment.
I guess the first question you have to answer is: which Docker Daemon do you want to talk to? And how is it configured?
Also: are you running everything on localhost in the EE? Or are you connecting to a target machine with SSH? I think the common case is the second, but your text seems to indicate the first.
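To illustrate the two transports, here is a minimal sketch; the image matches the node_exporter you mentioned, but the remote address and certificate path are placeholders, not recommendations:

- name: Deploy node_exporter via the local Docker socket (the default)
  community.docker.docker_container:
    name: node_exporter
    image: quay.io/prometheus/node-exporter:latest
    state: started
    # docker_host defaults to unix://var/run/docker.sock

- name: The same task, talking to a remote daemon over TCP+TLS instead
  community.docker.docker_container:
    name: node_exporter
    image: quay.io/prometheus/node-exporter:latest
    state: started
    docker_host: tcp://203.0.113.10:2376   # placeholder address
    validate_certs: true
    ca_cert: /etc/docker/certs/ca.pem      # placeholder path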
Firstly, thank you for your insightful questions. To clarify the execution context and address your points:
Execution on Target Machines vs. Execution Environment (EE): Ideally, I wanted to execute Ansible directly on each target machine, bypassing the need for Docker operations inside the AWX EE. My initial attempts at such a configuration were unsuccessful. Is there a recommended approach or best practice for running Ansible directly on target machines, so as to sidestep the complexity of talking to Docker daemons from within the AWX EE?
Docker Daemon Connection: Since our infrastructure primarily revolves around Amazon EC2 instances, the most straightforward approach for us would be to interact with the Docker Daemon via the socket file on each EC2 instance. This setup was straightforward when we were launching everything directly on the instances using user data scripts that downloaded and executed Ansible playbooks as part of instance initialization. We are now migrating these operations to AWX for better manageability and scalability.
In this migration, our objective has been to replicate the direct execution model we had before, now within AWX, so that we keep control over the container deployment and management lifecycle on our EC2 instances. However, connecting to the Docker daemon on the EC2 instances from within the AWX EE poses a challenge. I assume the standard approach would be to expose the Docker socket to the AWX EE, possibly through volume mounting or a similar mechanism, though I'm uncertain how that aligns with best practices, especially regarding security and operational efficiency within the AWX framework.
Could you provide any advice or insights on how best to approach this scenario? Specifically, on managing direct Docker operations from within AWX tasks targeted at EC2 instances and whether there are more efficient methodologies or practices that we should consider to streamline this process?
Thank you again for your assistance and the valuable direction.
Hi, is there any reason to run the playbook against localhost?
In other words, why can't the playbook target the EC2 instances instead of localhost?
Correct me if my understanding is wrong, but if you want to run playbooks to provision new EC2 instances, check out the feature called "Provisioning Callback".
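For illustration, if those instances are in your AWX inventory, a play along these lines would run the module on each instance against its own local Docker socket over the SSH connection (the group name ec2_instances is hypothetical; use your actual inventory group):

- name: Deploy node_exporter on the EC2 instances themselves
  hosts: ec2_instances   # assumed group name
  become: true
  tasks:
    - name: Run node_exporter against the instance's local /var/run/docker.sock
      community.docker.docker_container:
        name: node_exporter
        image: quay.io/prometheus/node-exporter:latest
        state: started
        restart_policy: unless-stopped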
Thank you very much for your swift response and for suggesting the use of the Provisioning Callback feature in AWX.
I have attempted to follow your advice by targeting my playbook execution through the Provisioning Callback mechanism. However, I kept encountering HTTP 400 errors until I constructed a request somewhat like this:
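curl -X POST -H "Content-Type: application/json" \
  -d '{"host_config_key": "<host_config_key>"}' \
  https://<awx-fqdn>/api/v2/job_templates/<template_id>/callback/

(The FQDN, template ID, and host config key above are placeholders for our real values.)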
However, the response I received was: {"msg":"No matching host could be found!"}
Until now, our playbooks have targeted localhost because of their initial setup and configuration. In AWX, I have an inventory of EC2 instances filtered by the tag Type: "local-test", which includes several instances. My objective is for the playbook specified in the template to execute only on a newly launched instance (and not on the rest of the inventory). Is this achievable? And if so, why might I be receiving the message that no matching host could be found?
Any insights or further guidance you can offer would be immensely helpful as we work to streamline our deployment process.
@alvaroc20
I’m still not sure if Provisioning Callback is the best solution for you, but anyway, thanks for giving it a try.
Sorry if I'm introducing one complex issue to work around another, but I'm adding this comment in case you still want to try Provisioning Callback.
The error {"msg":"No matching host could be found!"} is very commonly encountered.
When AWX runs on Kubernetes, HTTP requests to the callback URL are forwarded multiple times within the Kubernetes cluster. As a result, AWX can misidentify the originator of the request, seeing an IP address inside the Kubernetes cluster rather than the correct IP address of the remote host.
To troubleshoot this:

1. Check the logs of the web pod to see whether the access source is identified correctly.
Invoke the following curl command, for example:
curl https://<FQDN>/api/v2/ping/
Then check the logs of the web pod with:
kubectl -n <namespace> logs deployment/<instancename>-web -c <instancename>-web
In the resulting log entry, the trailing IP address (192.168.0.221 in my environment) is the detected remote host. Ensure this is the correct IP address of your remote host.
If it is not, configure your proxy server or ingress controller (depending on your setup) to preserve the original access source, for example with an X-Forwarded-For or X-Real-IP header.
2. Configure AWX to use a specific header to determine the remote host's IP address.
In the Miscellaneous System settings page in AWX, add HTTP_X_FORWARDED_FOR or HTTP_X_REAL_IP to Remote Host Headers.
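AWX searches Remote Host Headers in order and uses the first value it finds, so if your proxy sets X-Forwarded-For, a value along these lines should work (an illustration, not a drop-in config):
["HTTP_X_FORWARDED_FOR", "REMOTE_ADDR", "REMOTE_HOST"]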
3. Configure your inventory to include the remote host.
Your inventory has to contain the host as one of the following:
- The IP address detected above
- The hostname obtained by a reverse DNS lookup of that IP address
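For instance, a minimal YAML inventory entry keyed by the detected IP (reusing the example address from above) would be:

all:
  hosts:
    192.168.0.221: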