Hi all,
I am running AWX (24.3.1) in Kubernetes, managed by AWX Operator (2.16.1). awx-task is running with two replicas. We also have an additional instance group with an Execution Node (Receptor 1.4.7) outside of the Kubernetes cluster.
The idea was to run Kubespray (Kubernetes management using Ansible) to upgrade the Kubernetes cluster where AWX itself is deployed. The Kubespray job runs from the Execution Node, so it is outside the Kubernetes infrastructure and is not affected by the draining of Kubernetes nodes.
The problem occurs when Kubespray upgrades (and thus drains) the Kubernetes node hosting the awx-task pod replica that created the job on the Execution Node. When the job running the Kubespray Ansible run becomes orphaned, Receptor kills it within the next few seconds, without trying to migrate it to another existing awx-task pod replica.
Is this behavior normal, or am I missing some setting? Setting the environment variable RECEPTOR_KUBE_SUPPORT_RECONNECT=enabled does not seem to resolve this issue.