Challenges with Docker Commands in an AWX Custom Execution Environment on Amazon EKS

Hi,

I’m facing a challenge with running Docker commands inside a custom execution environment in AWX (version 24.0.0), deployed on Amazon EKS via the awx-operator. Specifically, I’m trying to use the docker_container module in an Ansible role to deploy node_exporter as a container.

Here’s a brief overview of the task:

- name: install node_exporter as container
  community.docker.docker_container:
    name: node_exporter
    image: "{{ node_exporter_image }}"
    memory: "{{ node_exporter_memory }}"
    restart_policy: always
    security_opts:
      - apparmor=unconfined
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      - /run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket:ro
    command: "{{ node_exporter_command }}"
    ports:
      - "9100:9100"

However, upon execution, I encounter the following error:

fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"}

This suggests a problem with running Docker commands inside a Docker (or, in my case, possibly Podman) container where the Docker client is available but the Docker daemon (server) is not. My initial investigation suggests that this could be related to the complexities of Docker-in-Docker (DinD) or Docker alongside Docker setups within AWX execution environments.

Given the nature of the issue, changing the awx-operator configuration to include a Docker daemon seems impractical, if not impossible. I am therefore exploring alternative approaches or configurations that could resolve this. The goal is to run the docker_container module within AWX’s custom execution environment, so that Docker operations can be part of Ansible tasks.

Has anyone in the community faced a similar issue or achieved functionality akin to Docker-in-Docker or Docker alongside Docker within AWX’s execution environments? Any insights, guidance, or suggestions on configuring the execution environment or alternative solutions would be immensely appreciated.

https://github.com/ansible/awx/issues/15039

Thank you for your time and assistance.

Best regards.

The docker_container module needs to be able to talk to the Docker daemon. Whether it does that via a socket file (the standard way) or over TCP or TCP+TLS is for you to determine, because it depends on your environment. The module’s docker_host option (or the DOCKER_HOST environment variable) selects the address, for example unix:///var/run/docker.sock or tcp://<host>:<port>.

I guess the first question you have to answer is: which Docker daemon do you want to talk to? And how is it configured?

Also: are you running everything on localhost in the EE? Or are you connecting to a target machine with SSH? I think the common case is the second, but your text seems to indicate the first.
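To check from inside the EE whether any daemon answers at a given address, you can reproduce the step that fails in the error above: asking the daemon for its API version. The following Python sketch uses only the standard library and is not the module’s own code. The helper names are hypothetical, only plain unix:// and tcp:// addresses are handled (no TLS), and it sends GET /version, the Docker Engine API endpoint that reports ApiVersion.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection over a UNIX domain socket (e.g. /var/run/docker.sock)."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" is only used for the Host header; the socket path does the routing.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def open_docker_connection(docker_host, timeout=5.0):
    """Hypothetical helper: map a docker_host-style URL to an HTTP connection."""
    if docker_host.startswith("unix://"):
        return UnixHTTPConnection(docker_host[len("unix://"):], timeout=timeout)
    if docker_host.startswith("tcp://"):
        # Plain TCP only; a TLS-protected daemon would need HTTPSConnection and client certs.
        return http.client.HTTPConnection(docker_host[len("tcp://"):], timeout=timeout)
    raise ValueError(f"unsupported docker_host scheme: {docker_host!r}")


def docker_api_version(docker_host="unix:///var/run/docker.sock", timeout=5.0):
    """Return the daemon's ApiVersion, or None if no daemon answers at docker_host."""
    conn = open_docker_connection(docker_host, timeout)
    try:
        conn.request("GET", "/version")
        response = conn.getresponse()
        if response.status != 200:
            return None
        body = json.loads(response.read())
        return body.get("ApiVersion") if isinstance(body, dict) else None
    except (OSError, http.client.HTTPException, ValueError):
        # Missing socket, refused or reset connection, or a reply that is not Docker JSON.
        return None
    finally:
        conn.close()
```

In an EE pod without a Docker daemon, docker_api_version() should return None, which is consistent with the connection error in the original post. On an EC2 instance with Docker running (as root or as a user allowed to access the socket), it should return the daemon’s API version string.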


Hello @felixfontein,

Firstly, thank you for your insightful questions. To clarify the execution context and address your points:

  1. Execution on target machines vs. in the Execution Environment (EE): Ideally, I wanted to execute Ansible directly on each target machine, so that no Docker operations would have to happen inside the AWX EE. My initial attempts to set this up didn’t succeed. Is there a recommended approach or best practice for running Ansible directly on target machines that avoids the complexity of reaching Docker daemons from within the AWX EE?
  2. Docker daemon connection: Since our infrastructure is built mainly on Amazon EC2 instances, the simplest approach for us would be to talk to the Docker daemon through the socket file on each EC2 instance. This was easy when we launched everything directly on the instances with user data scripts that downloaded and ran Ansible playbooks during instance initialization. We are now migrating these operations to AWX for better manageability and scalability.

In this migration process, our objective has been to replicate the direct execution model we previously had, now within AWX, to maintain control over the container deployment and management lifecycle on our EC2 instances. However, connecting to the Docker Daemon on EC2 instances from within the AWX EE poses a challenge. I assume the standard approach would be to ensure the Docker socket is exposed and accessible to the AWX EE, possibly through volume mounting or a similar mechanism, though I’m uncertain how this aligns with best practices, especially in terms of security and operational efficiency within the AWX framework.

Could you provide any advice or insights on how best to approach this scenario? Specifically, on managing direct Docker operations from within AWX tasks targeted at EC2 instances and whether there are more efficient methodologies or practices that we should consider to streamline this process?

Thank you again for your assistance and the valuable direction.

Hi, is there any reason the playbook targets localhost?
In other words, why not target the playbook at the EC2 instances instead of localhost? Then the docker_container module would run on each instance and could use that instance’s local Docker daemon.

Correct me if my understanding is wrong, but if you want to run playbooks to provision new EC2 instances, check out the feature called “Provisioning Callback”.

You can keep the hosts in your inventory up to date dynamically with the bundled EC2 inventory plugin, so that Provisioning Callback works without manual intervention.

Hi @kurokobo

Thank you very much for your swift response and for suggesting the use of the Provisioning Callback feature in AWX.

I tried targeting my playbook execution with the Provisioning Callback mechanism, as you suggested. However, I kept getting HTTP 400 errors until I constructed a request like this:

#!/bin/bash
AWX_HOST="http://private-domain.lan"
CALLBACK_TOKEN="******"
JOB_TEMPLATE_ID="21"
BEARER_TOKEN="*******************"
CSRF_TOKEN="*************************"

curl --location "$AWX_HOST/api/v2/job_templates/$JOB_TEMPLATE_ID/callback/" \
     --header "Content-Type: application/json" \
     --header "Authorization: Bearer $BEARER_TOKEN" \
     --header "Cookie: csrftoken=$CSRF_TOKEN" \
     --data "{\"host_config_key\": \"$CALLBACK_TOKEN\"}"

However, the response I received was: {"msg":"No matching host could be found!"}

Until now, our playbooks have targeted localhost because of how they were originally set up. In AWX, I have an EC2 inventory filtered by the tag Type: “local-test”, which contains several instances. My goal is for the playbook in the template to run only on a newly launched instance, not on the rest of the inventory. Is this achievable? And if so, why might I be getting the message that no matching host could be found?

Any insights or further guidance you can offer would be immensely helpful as we work to streamline our deployment process.

Thank you once again for your assistance.

@alvaroc20
I’m still not sure if Provisioning Callback is the best solution for you, but anyway, thanks for giving it a try.

Sorry if I am introducing a new complex issue while trying to help you avoid another one, but I’m adding this comment in case you still want to try Provisioning Callback.

The error {"msg":"No matching host could be found!"} is a very common one.

When AWX runs on Kubernetes, requests to the callback URL are forwarded one or more times inside the cluster (for example by a load balancer, an ingress controller, and a Service) before they reach AWX. As a result, AWX sees an IP address inside the Kubernetes cluster as the source of the request instead of the real IP address of the remote host.

To troubleshoot this:

Check the web pod logs to see whether the request source is identified correctly

For example, run the following curl command from the remote host:

curl https://<FQDN>/api/v2/ping/

Then check the web pod logs with kubectl -n <namespace> logs deployment/<instancename>-web -c <instancename>-web:

10.42.0.1 - - [10/Apr/2024:12:42:17 +0000] "GET /api/v2/ping/ HTTP/1.1" 200 467 "-" "curl/7.76.1" "192.168.0.221"

In this log line, the trailing 192.168.0.221 is the detected remote host. Make sure this is the correct IP address of your remote host.

If it is not the correct address, configure your proxy server or ingress controller (depending on your setup) to preserve the original source address, for example in an X-Forwarded-For or X-Real-IP header.
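If you need to check many log lines, this comparison can be scripted. The Python sketch below assumes the web container’s access log has the same layout as the sample line shown above: the first field is the direct peer (the proxy or ingress) and the last double-quoted field is the detected remote host. The function names are hypothetical.

```python
import re

# Assumes the layout of the sample log line: first field = direct peer,
# last double-quoted field = remote host as detected by AWX.
LOG_PATTERN = re.compile(r'^(?P<peer>\S+) .* "(?P<remote>[^"]*)"$')


def detected_remote_host(log_line):
    """Return (direct_peer, detected_remote_host) for one access-log line, or None."""
    match = LOG_PATTERN.match(log_line.strip())
    if match is None:
        return None
    return match.group("peer"), match.group("remote")


def remote_host_is_correct(log_line, expected_ip):
    """True if the line shows expected_ip (e.g. the EC2 instance's IP) as the remote host."""
    parsed = detected_remote_host(log_line)
    return parsed is not None and parsed[1] == expected_ip
```

Fed the sample line, detected_remote_host returns ("10.42.0.1", "192.168.0.221"), so remote_host_is_correct(line, "192.168.0.221") is True only if that really is your instance’s address.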

Configure AWX to use a specific header to determine the remote host IP address

On the Miscellaneous System settings page in AWX, append HTTP_X_FORWARDED_FOR or HTTP_X_REAL_IP to Remote Host Headers.
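The effect of this setting can be illustrated with a simplified Python sketch of the idea: the configured headers are checked in order and the addresses they contain become candidates for matching. This is not AWX’s actual implementation. The meta dictionary stands in for a Django-style request.META (where an X-Forwarded-For header appears as HTTP_X_FORWARDED_FOR), and the addresses are hypothetical values taken from the example log line.

```python
def candidate_remote_hosts(meta, remote_host_headers):
    """Collect addresses from the configured headers, in order, without duplicates.

    Simplified sketch: comma-separated values (as in X-Forwarded-For) are split
    and every non-empty entry is kept as a candidate.
    """
    candidates = []
    for header in remote_host_headers:
        for value in meta.get(header, "").split(","):
            value = value.strip()
            if value and value not in candidates:
                candidates.append(value)
    return candidates


# Hypothetical request as seen by AWX behind an ingress controller.
meta = {
    "REMOTE_ADDR": "10.42.0.1",  # the proxy/ingress inside the cluster
    "HTTP_X_FORWARDED_FOR": "192.168.0.221",  # original client, added by the proxy
}

print(candidate_remote_hosts(meta, ["REMOTE_ADDR", "REMOTE_HOST"]))
# → ['10.42.0.1']
print(candidate_remote_hosts(meta, ["HTTP_X_FORWARDED_FOR", "REMOTE_ADDR", "REMOTE_HOST"]))
# → ['192.168.0.221', '10.42.0.1']
```

Without the forwarded-for header in the list, the only candidate is the cluster-internal address, which no inventory host will match.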

Configure your inventory to include the remote host

Your inventory must contain the host under one of the following names:

  • The IP address detected above
  • The hostname obtained by reverse DNS lookup of the IP address detected above
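The matching rule in the list above can be sketched as follows. This is an illustrative Python approximation, not AWX code: inventory_hosts stands for the host names in the job template’s inventory, remote_ips for the detected addresses, and reverse_lookup is injectable so the example can run without DNS (the default wraps socket.gethostbyaddr). The EC2-style host names in the usage are hypothetical.

```python
import socket


def reverse_dns(ip):
    """Return the PTR host name for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def matching_hosts(inventory_hosts, remote_ips, reverse_lookup=reverse_dns):
    """Return inventory host names equal to a detected IP or its reverse-DNS name."""
    names = set(remote_ips)
    for ip in remote_ips:
        hostname = reverse_lookup(ip)
        if hostname:
            names.add(hostname)
    return [host for host in inventory_hosts if host in names]


# Hypothetical usage with a fake PTR table instead of real DNS.
fake_ptr = {"192.168.0.221": "ip-192-168-0-221.ec2.internal"}.get
inventory = ["ip-192-168-0-221.ec2.internal", "ip-192-168-0-50.ec2.internal"]
print(matching_hosts(inventory, ["192.168.0.221"], fake_ptr))
# → ['ip-192-168-0-221.ec2.internal']
```

An empty result here corresponds to the "No matching host could be found!" response, which is what you get when only a cluster-internal address such as 10.42.0.1 is detected.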

One more thing I forgot to mention:

In my home lab tests, neither BEARER_TOKEN nor CSRF_TOKEN was required; CALLBACK_TOKEN alone was enough.
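Based on that, a callback request only needs the host config key. The following Python standard-library sketch builds the same POST as the curl script in the thread, minus the Authorization and Cookie headers. The private-domain.lan host, template ID 21, and key are the placeholders from that script, and the helper names are hypothetical. The request has to be sent from the instance that should be configured, because AWX matches the caller’s address against the inventory.

```python
import json
import urllib.request


def build_callback_request(awx_host, job_template_id, host_config_key):
    """Build the provisioning callback POST; only host_config_key is sent."""
    url = f"{awx_host.rstrip('/')}/api/v2/job_templates/{job_template_id}/callback/"
    body = json.dumps({"host_config_key": host_config_key}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_callback(awx_host, job_template_id, host_config_key, timeout=30):
    """Send the callback from this machine and return the HTTP status code."""
    request = build_callback_request(awx_host, job_template_id, host_config_key)
    # urlopen raises urllib.error.HTTPError for error responses, e.g. when
    # AWX answers {"msg": "No matching host could be found!"}.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

When a host matches, AWX launches the job template limited to that host, which is what keeps the run on the newly launched instance rather than the whole inventory.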
