AWX Mesh Ingress Loses Route

Title: AWX Task Pod Losing Connection to Mesh Ingress and Control Service

Description:

We are experiencing an issue with our AWX Execution Environment where the Ansible task pod is losing its routing to the mesh ingress. The issue manifests as a series of warnings in our logs, which I’ve included below:

WARNING 2024/03/10 16:18:27 Could not read in control service: read unix /var/run/receptor/receptor.sock->@: use of closed network connection
WARNING 2024/03/10 16:18:27 Could not close connection: close unix /var/run/receptor/receptor.sock->@: use of closed network connection
WARNING 2024/03/10 16:18:49 Timing out connection awx-mesh-ingress, idle for the past 21s
WARNING 2024/03/10 16:18:50 Could not copy to stdout file /tmp/receptor/awx-task-6f99fd97-hb4zx/BfYOr4LS/stdout: INTERNAL_ERROR (local): no connection to next hop
...

The first two warnings suggest an issue with the connection to the control service via the Unix socket at /var/run/receptor/receptor.sock. The connection appears to have been closed unexpectedly or prematurely.

The third warning indicates that the connection to awx-mesh-ingress has been idle for 21 seconds and thus has been timed out. This could be due to network issues, or the awx-mesh-ingress service might be down or not responding.

The fourth warning and those that follow report "no connection to next hop": the task pod could not copy job output to the stdout file because it no longer had a route to the next node in the mesh. This could be due to network issues, or the next hop might be down or not responding.
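To see how these warnings cluster over a longer log, a small script can count them by type. The following is only a minimal sketch: the category labels are made up for illustration, the line format is assumed to match the excerpt above, and the sample data is the four warnings quoted in this post.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
#   LEVEL YYYY/MM/DD HH:MM:SS message
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<msg>.*)$"
)

# Hypothetical labels, keyed by substrings that appear in the quoted warnings.
CATEGORIES = [
    ("use of closed network connection", "control-socket-closed"),
    ("Timing out connection", "peer-idle-timeout"),
    ("no connection to next hop", "no-route"),
]


def classify(line):
    """Return (timestamp, label) for a log line, or None if it does not parse."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    message = match.group("msg")
    for needle, label in CATEGORIES:
        if needle in message:
            return match.group("ts"), label
    return match.group("ts"), "other"


def summarize(lines):
    """Count parsed log lines per label; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        result = classify(line)
        if result is not None:
            counts[result[1]] += 1
    return counts


# The four warnings quoted in this post.
SAMPLE = [
    "WARNING 2024/03/10 16:18:27 Could not read in control service: read unix /var/run/receptor/receptor.sock->@: use of closed network connection",
    "WARNING 2024/03/10 16:18:27 Could not close connection: close unix /var/run/receptor/receptor.sock->@: use of closed network connection",
    "WARNING 2024/03/10 16:18:49 Timing out connection awx-mesh-ingress, idle for the past 21s",
    "WARNING 2024/03/10 16:18:50 Could not copy to stdout file /tmp/receptor/awx-task-6f99fd97-hb4zx/BfYOr4LS/stdout: INTERNAL_ERROR (local): no connection to next hop",
]

if __name__ == "__main__":
    for label, count in summarize(SAMPLE).items():
        print(f"{label}: {count}")
```

Run against a full log, the timestamps returned by classify() could also show whether the idle timeout always comes just before the "no connection to next hop" errors, which would point at the mesh ingress link rather than the local control socket.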

We have attempted to troubleshoot these issues by checking the status and logs of the control service and the awx-mesh-ingress service, verifying the network connectivity, and checking the network routes. However, the issue persists.

Any help or guidance on how to resolve this issue would be greatly appreciated.

@firdous_ahmed_reshi Welcome to the Forum.

In AWX 24.0.0 (ansible/awx on GitHub) and AWX Operator 2.13.0 (ansible/awx-operator on GitHub), both released yesterday, we made a few fixes in mesh networking. If possible, could you please update to AWX 24.0.0 and let us know whether the issue still happens? If it does, please provide a fresh copy of the error lines.