AWX crashes when launching 7 concurrent jobs

Hi,

I have AWX running on a VM with the following spec:

  • 4 vCPU (reported as AMD EPYC 7402P 24-Core Processor)
  • 16 GB of RAM
  • 28 GB virtual disk (18 GB used, 8.7 GB free)

OS - Ubuntu 22.04.1 LTS
AWX - About reports the version as 21.4.0

When a large number of concurrent jobs are started at the same time, AWX crashes with 404 or 502 errors in the browser. Sometimes it recovers and I can log in, but the jobs will have failed with “Task was marked as running but was not present in the job queue, so it has been marked as failed.” Other times it doesn’t respond and I have to reboot the server.

It feels like a resource issue, but I’m not sure where to look as K3s is not an area I have much knowledge in.

What is the likely cause?

It looks like the Redis container is having issues:

I was following the logs when I started the workflow template, and the connection to the container was lost:

1:M 04 Mar 2023 04:40:29.124 * Background saving terminated with success
1:signal-handler (1677905080) Received SIGTERM scheduling shutdown…
1:M 04 Mar 2023 04:44:40.786 # User requested shutdown…
1:M 04 Mar 2023 04:44:40.786 * Saving the final RDB snapshot before exiting.
1:M 04 Mar 2023 04:44:40.795 * DB saved on disk
1:M 04 Mar 2023 04:44:40.795 * Removing the unix socket file.
1:M 04 Mar 2023 04:44:40.795 # Redis is now ready to exit, bye bye…
rpc error: code = NotFound desc = an error occurred when try to find container "76b43903e9dabc1f72e0f70b07e07e34ebc18520bf5f23ebdd0535a1d19b8f3a": not found
root@server:~# kubectl -n awx logs pod/awx-788749fb7f-vc9w5 -f
Defaulted container "redis" out of: redis, awx-web, awx-task, awx-ee, init (init)
unable to retrieve container logs for containerd://8ae010256adb598c6f842f821c4a960809a9c2e8dae37edde6d73e0e68f94cbd
root@server:~# kubectl -n awx logs pod/awx-788749fb7f-vc9w5 -f
Defaulted container "redis" out of: redis, awx-web, awx-task, awx-ee, init (init)
unable to retrieve container logs for containerd://8ae010256adb598c6f842f821c4a960809a9c2e8dae37edde6d73e0e68f94cbd
root@server:~# kubectl -n awx logs pod/awx-788749fb7f-z558t -f
Defaulted container "redis" out of: redis, awx-web, awx-task, awx-ee, init (init)
1:C 04 Mar 2023 04:49:55.581 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 04 Mar 2023 04:49:55.581 # Redis version=7.0.9, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 04 Mar 2023 04:49:55.581 # Configuration loaded
1:M 04 Mar 2023 04:49:55.582 * monotonic clock: POSIX clock_gettime
1:M 04 Mar 2023 04:49:55.582 * Running mode=standalone, port=0.
1:M 04 Mar 2023 04:49:55.582 # Server initialized
1:M 04 Mar 2023 04:49:55.583 * The server is now ready to accept connections at /var/run/redis/redis.sock
1:signal-handler (1677905534) Received SIGTERM scheduling shutdown…
1:M 04 Mar 2023 04:52:14.604 # User requested shutdown…
1:M 04 Mar 2023 04:52:14.604 * Saving the final RDB snapshot before exiting.
1:M 04 Mar 2023 04:52:14.617 * DB saved on disk
1:M 04 Mar 2023 04:52:14.618 * Removing the unix socket file.
1:M 04 Mar 2023 04:52:14.619 # Redis is now ready to exit, bye bye…

root@server# kubectl -n awx get all
NAME                                                   READY   STATUS                   RESTARTS       AGE
pod/awx-788749fb7f-gvtv9                               0/4     ContainerStatusUnknown   45 (30d ago)   166d
pod/awx-788749fb7f-4l99n                               0/4     ContainerStatusUnknown   4              13d
pod/awx-788749fb7f-hmqdg                               0/4     ContainerStatusUnknown   2              3h44m
pod/awx-788749fb7f-f4xmj                               0/4     ContainerStatusUnknown   3              3h10m
pod/awx-788749fb7f-s8rr5                               0/4     ContainerStatusUnknown   3              103m
pod/awx-postgres-13-0                                  1/1     Running                  14 (26m ago)   166d
pod/awx-operator-controller-manager-7f89bd5797-lwjpx   2/2     Running                  23 (26m ago)   138d
pod/awx-788749fb7f-vc9w5                               0/4     ContainerStatusUnknown   5 (26m ago)    46m
pod/awx-788749fb7f-z558t                               0/4     ContainerStatusUnknown   4              12m
pod/awx-788749fb7f-qzzbg                               0/4     Pending                  0              4m46s

NAME                                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/awx-operator-controller-manager-metrics-service   ClusterIP   10.43.88.44     <none>        8443/TCP   203d
service/awx-postgres-13                                   ClusterIP   None            <none>        5432/TCP   203d
service/awx-service                                       ClusterIP   10.43.137.182   <none>        80/TCP     203d

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/awx-operator-controller-manager   1/1     1            1           203d
deployment.apps/awx                               0/1     1            0           203d

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/awx-5d7b85bc77                               0         0         0       203d
replicaset.apps/awx-operator-controller-manager-7f89bd5797   1         1         1       203d
replicaset.apps/awx-788749fb7f                               1         1         0       200d

NAME                               READY   AGE
statefulset.apps/awx-postgres-13   1/1     203d
root@server:~#

I saw a new instance start running, then shut down with "Received SIGTERM scheduling shutdown…" and become "ContainerStatusUnknown".

Why would the Redis container shut down without any visible errors?

What does kubectl describe on one of the ContainerStatusUnknown pods report?

Verify that the nodes these job pods were assigned to are healthy by running “kubectl get node”.

We suspect the underlying nodes are unhealthy (possibly memory issues), which is causing the pods to crash.

AWX Team

Thanks for getting back to me,

I ran kubectl -n awx delete deployment awx, which cleared them from the list.

I then started 2 new jobs, each of which copies files to 2 servers.
I lost access to AWX with a gateway error, and I am now getting “not found” on the jobs page.

ansible:~/awx-on-k3s/base# kubectl get node
NAME          STATUS   ROLES                  AGE    VERSION
gglvansible   Ready    control-plane,master   209d   v1.25.6+k3s1
ansible:~/awx-on-k3s/base#

root@ansible:~/awx-on-k3s/base# kubectl -n awx get all
NAME                                                   READY   STATUS                   RESTARTS       AGE
pod/awx-postgres-13-0                                  1/1     Running                  21 (85m ago)   171d
pod/awx-operator-controller-manager-68d6f576b4-7672r   2/2     Running                  0              79m
pod/automation-job-1157-x7rb4                          1/1     Running                  0              7m38s
pod/automation-job-1156-tvhjj                          1/1     Running                  0              7m40s
pod/awx-9668dcb98-nzg5q                                0/4     ContainerStatusUnknown   3              46m
pod/awx-9668dcb98-dh56c                                4/4     Running                  0              6m5s

NAME                                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/awx-operator-controller-manager-metrics-service   ClusterIP   10.43.88.44     <none>        8443/TCP   209d
service/awx-postgres-13                                   ClusterIP   None            <none>        5432/TCP   209d
service/awx-service                                       ClusterIP   10.43.137.182   <none>        80/TCP     209d

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/awx-operator-controller-manager   1/1     1            1           209d
deployment.apps/awx                               1/1     1            1           46m

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/awx-operator-controller-manager-68d6f576b4   1         1         1       79m
replicaset.apps/awx-operator-controller-manager-7f89bd5797   0         0         0       209d
replicaset.apps/awx-9668dcb98                                1         1         1       46m

NAME                               READY   AGE
statefulset.apps/awx-postgres-13   1/1     209d
root@ansible:~/awx-on-k3s/base#

root@ansible:~/awx-on-k3s/base# kubectl -n awx describe pod awx-9668dcb98-nzg5q
Name: awx-9668dcb98-nzg5q
Namespace: awx
Priority: 0
Service Account: awx
Node: ansible/10.20.7.10
Start Time: Fri, 10 Mar 2023 10:47:16 +1300
Labels: app.kubernetes.io/component=awx
app.kubernetes.io/managed-by=awx-operator
app.kubernetes.io/name=awx
app.kubernetes.io/operator-version=1.3.0
app.kubernetes.io/part-of=awx
app.kubernetes.io/version=21.13.0
pod-template-hash=9668dcb98
Annotations: checksum-configmaps-config: f561cc65d89b4e3678076eccafe63ac9
checksum-configmaps-pre_stop_scripts: 68b329da9893e34099c7d8ad5cb9c940
checksum-secret-bundle_cacert: 276fa68835904533a2a8b68b5a128047
checksum-secret-ldap_cacert: 276fa68835904533a2a8b68b5a128047
checksum-secret-receptor_ca: 4ee07b571170b38048a66949f955f0dc
checksum-secret-receptor_work_signing: 796f98b768de8340c4167ba74a0b0094
checksum-secret-route_tls: d41d8cd98f00b204e9800998ecf8427e
checksum-secret-secret_key: 37ec43cc1be555e4ba78f4425301865f
checksum-secrets-app_credentials: 1754fa7c60d3bf69b54d2ffcc10bca10
checksum-storage-persistent: 68b329da9893e34099c7d8ad5cb9c940
Status: Failed
Reason: Evicted
Message: The node was low on resource: ephemeral-storage. Container awx-ee was using 1008644Ki, which exceeds its request of 0. Container redis was using 36Ki, which exceeds its request of 0. Container awx-task was using 873704Ki, which exceeds its request of 0. Container awx-web was using 360Ki, which exceeds its request of 0.
IP: 10.42.0.12
IPs:
IP: 10.42.0.12
Controlled By: ReplicaSet/awx-9668dcb98
Init Containers:
init:
Container ID: containerd://bb5e6a3ea197a81cc1fd1446b8e63435a75bffe67affcd7f16268f951a46f41f
Image: quay.io/ansible/awx-ee:latest
Image ID: quay.io/ansible/awx-ee@sha256:58fecfd22a9b8e4d639107391e867d95cc587720dcdb3cc974b930552058fbb6
Port:
Host Port:
Command:
/bin/sh
-c
hostname=$MY_POD_NAME
receptor --cert-makereq bits=2048 commonname=$hostname dnsname=$hostname nodeid=$hostname outreq=/etc/receptor/tls/receptor.req outkey=/etc/receptor/tls/receptor.key
receptor --cert-signreq req=/etc/receptor/tls/receptor.req cacert=/etc/receptor/tls/ca/receptor-ca.crt cakey=/etc/receptor/tls/ca/receptor-ca.key outcert=/etc/receptor/tls/receptor.crt verify=yes
mkdir -p /etc/pki/ca-trust/extracted/{java,pem,openssl,edk2}
update-ca-trust

State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 10 Mar 2023 10:47:17 +1300
Finished: Fri, 10 Mar 2023 10:47:18 +1300
Ready: True
Restart Count: 0
Environment:
MY_POD_NAME: awx-9668dcb98-nzg5q (v1:metadata.name)
Mounts:
/etc/pki/ca-trust/extracted from ca-trust-extracted (rw)
/etc/pki/ca-trust/source/anchors/bundle-ca.crt from awx-bundle-cacert (ro,path=“bundle-ca.crt”)
/etc/receptor/tls/ from awx-receptor-tls (rw)
/etc/receptor/tls/ca/receptor-ca.crt from awx-receptor-ca (ro,path=“tls.crt”)
/etc/receptor/tls/ca/receptor-ca.key from awx-receptor-ca (ro,path=“tls.key”)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f96x7 (ro)
init-projects:
Container ID: containerd://3e7d78e9a26e26b0fe717e169c00db72fb8ae7350b0fd3721b68bd02551aae7e
Image: quay.io/centos/centos:stream9
Image ID: quay.io/centos/centos@sha256:3332c6692307ba0bdd916c8681a9a7184ca7630de3706aef3476d4ceb286531f
Port:
Host Port:
Command:
/bin/sh
-c
chmod 775 /var/lib/awx/projects
chgrp 1000 /var/lib/awx/projects

State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 10 Mar 2023 10:47:18 +1300
Finished: Fri, 10 Mar 2023 10:47:18 +1300
Ready: True
Restart Count: 0
Environment:
MY_POD_NAME: awx-9668dcb98-nzg5q (v1:metadata.name)
Mounts:
/var/lib/awx/projects from awx-projects (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f96x7 (ro)
Containers:
redis:
Container ID:
Image: docker.io/redis:7
Image ID:
Port:
Host Port:
Args:
redis-server
/etc/redis.conf
State: Terminated
Reason: ContainerStatusUnknown
Message: The container could not be located when the pod was terminated
Exit Code: 137
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Last State: Terminated
Reason: ContainerStatusUnknown
Message: The container could not be located when the pod was deleted. The container used to be Running
Exit Code: 137
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Ready: False
Restart Count: 1
Requests:
cpu: 50m
memory: 64Mi
Environment:
Mounts:
/data from awx-redis-data (rw)
/etc/redis.conf from awx-redis-config (ro,path=“redis.conf”)
/var/run/redis from awx-redis-socket (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f96x7 (ro)
awx-web:
Container ID:
Image: quay.io/ansible/awx:21.13.0
Image ID:
Port: 8052/TCP
Host Port: 0/TCP
Args:
/usr/bin/launch_awx.sh
State: Terminated
Reason: ContainerStatusUnknown
Message: The container could not be located when the pod was terminated
Exit Code: 137
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Last State: Terminated
Reason: ContainerStatusUnknown
Message: The container could not be located when the pod was deleted. The container used to be Running
Exit Code: 137
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Ready: False
Restart Count: 1
Environment:
MY_POD_NAMESPACE: awx (v1:metadata.namespace)
UWSGI_MOUNT_PATH: /
Mounts:
/etc/nginx/nginx.conf from awx-nginx-conf (ro,path=“nginx.conf”)
/etc/openldap/certs/ldap-ca.crt from awx-ldap-cacert (ro,path=“ldap-ca.crt”)
/etc/pki/ca-trust/extracted from ca-trust-extracted (rw)
/etc/pki/ca-trust/source/anchors/bundle-ca.crt from awx-bundle-cacert (ro,path=“bundle-ca.crt”)
/etc/receptor/signing/work-public-key.pem from awx-receptor-work-signing (ro,path=“work-public-key.pem”)
/etc/receptor/tls/ca/receptor-ca.crt from awx-receptor-ca (ro,path=“tls.crt”)
/etc/receptor/tls/ca/receptor-ca.key from awx-receptor-ca (ro,path=“tls.key”)
/etc/tower/SECRET_KEY from awx-secret-key (ro,path=“SECRET_KEY”)
/etc/tower/conf.d/credentials.py from awx-application-credentials (ro,path=“credentials.py”)
/etc/tower/conf.d/execution_environments.py from awx-application-credentials (ro,path=“execution_environments.py”)
/etc/tower/conf.d/ldap.py from awx-application-credentials (ro,path=“ldap.py”)
/etc/tower/settings.py from awx-settings (ro,path=“settings.py”)
/var/lib/awx/projects from awx-projects (rw)
/var/lib/awx/rsyslog from rsyslog-dir (rw)
/var/run/awx-rsyslog from rsyslog-socket (rw)
/var/run/redis from awx-redis-socket (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f96x7 (ro)
/var/run/supervisor from supervisor-socket (rw)
awx-task:
Container ID: containerd://7de76df9cdc0621fd1acf2a73f80a59fb3eb9a2007142a34a88c430afc06bce9
Image: quay.io/ansible/awx:21.13.0
Image ID: quay.io/ansible/awx@sha256:111c5acb675f2e156d99e6ddfbaeb0b482a2fe37e7a28e6d9ffcacb0141620c9
Port:
Host Port:
Args:
/usr/bin/launch_awx_task.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 10 Mar 2023 10:47:19 +1300
Finished: Fri, 10 Mar 2023 11:28:03 +1300
Ready: False
Restart Count: 0
Environment:
SUPERVISOR_WEB_CONFIG_PATH: /etc/supervisord.conf
AWX_SKIP_MIGRATIONS: 1
MY_POD_UID: (v1:metadata.uid)
MY_POD_IP: (v1:status.podIP)
MY_POD_NAMESPACE: awx (v1:metadata.namespace)
Mounts:
/etc/pki/ca-trust/extracted from ca-trust-extracted (rw)
/etc/pki/ca-trust/source/anchors/bundle-ca.crt from awx-bundle-cacert (ro,path=“bundle-ca.crt”)
/etc/receptor/ from awx-receptor-config (rw)
/etc/receptor/signing/work-private-key.pem from awx-receptor-work-signing (ro,path=“work-private-key.pem”)
/etc/tower/SECRET_KEY from awx-secret-key (ro,path=“SECRET_KEY”)
/etc/tower/conf.d/credentials.py from awx-application-credentials (ro,path=“credentials.py”)
/etc/tower/conf.d/execution_environments.py from awx-application-credentials (ro,path=“execution_environments.py”)
/etc/tower/conf.d/ldap.py from awx-application-credentials (ro,path=“ldap.py”)
/etc/tower/settings.py from awx-settings (ro,path=“settings.py”)
/var/lib/awx/projects from awx-projects (rw)
/var/lib/awx/rsyslog from rsyslog-dir (rw)
/var/run/awx-rsyslog from rsyslog-socket (rw)
/var/run/receptor from receptor-socket (rw)
/var/run/redis from awx-redis-socket (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f96x7 (ro)
/var/run/supervisor from supervisor-socket (rw)
awx-ee:
Container ID:
Image: quay.io/ansible/awx-ee:latest
Image ID:
Port:
Host Port:
Args:
/bin/sh
-c
if [ ! -f /etc/receptor/receptor.conf ]; then
cp /etc/receptor/receptor-default.conf /etc/receptor/receptor.conf
sed -i “s/HOSTNAME/$HOSTNAME/g” /etc/receptor/receptor.conf
fi
exec receptor --config /etc/receptor/receptor.conf

State: Terminated
Reason: ContainerStatusUnknown
Message: The container could not be located when the pod was terminated
Exit Code: 137
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Last State: Terminated
Reason: ContainerStatusUnknown
Message: The container could not be located when the pod was deleted. The container used to be Running
Exit Code: 137
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Ready: False
Restart Count: 1
Environment:
Mounts:
/etc/pki/ca-trust/extracted from ca-trust-extracted (rw)
/etc/pki/ca-trust/source/anchors/bundle-ca.crt from awx-bundle-cacert (ro,path=“bundle-ca.crt”)
/etc/receptor/ from awx-receptor-config (rw)
/etc/receptor/receptor-default.conf from awx-default-receptor-config (rw,path=“receptor.conf”)
/etc/receptor/signing/work-private-key.pem from awx-receptor-work-signing (ro,path=“work-private-key.pem”)
/etc/receptor/tls/ from awx-receptor-tls (rw)
/etc/receptor/tls/ca/receptor-ca.crt from awx-receptor-ca (ro,path=“tls.crt”)
/var/lib/awx/projects from awx-projects (rw)
/var/run/receptor from receptor-socket (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f96x7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
ca-trust-extracted:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
awx-bundle-cacert:
Type: Secret (a volume populated by a Secret)
SecretName: awx-custom-certs
Optional: false
awx-ldap-cacert:
Type: Secret (a volume populated by a Secret)
SecretName: awx-custom-certs
Optional: false
awx-application-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: awx-app-credentials
Optional: false
awx-receptor-tls:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
awx-receptor-ca:
Type: Secret (a volume populated by a Secret)
SecretName: awx-receptor-ca
Optional: false
awx-receptor-work-signing:
Type: Secret (a volume populated by a Secret)
SecretName: awx-receptor-work-signing
Optional: false
awx-secret-key:
Type: Secret (a volume populated by a Secret)
SecretName: awx-secret-key
Optional: false
awx-settings:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: awx-awx-configmap
Optional: false
awx-nginx-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: awx-awx-configmap
Optional: false
awx-redis-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: awx-awx-configmap
Optional: false
awx-redis-socket:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
awx-redis-data:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
supervisor-socket:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
rsyslog-socket:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
receptor-socket:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
rsyslog-dir:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
awx-receptor-config:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
awx-default-receptor-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: awx-awx-configmap
Optional: false
awx-projects:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: awx-projects-claim
ReadOnly: false
kube-api-access-f96x7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 53m default-scheduler Successfully assigned awx/awx-9668dcb98-nzg5q to ansible
Normal Pulled 53m kubelet Container image “quay.io/ansible/awx-ee:latest” already present on machine
Normal Created 53m kubelet Created container init
Normal Started 53m kubelet Started container init
Normal Pulled 52m kubelet Container image “quay.io/centos/centos:stream9” already present on machine
Normal Created 52m kubelet Created container init-projects
Normal Started 52m kubelet Started container init-projects
Normal Pulled 52m kubelet Container image “docker.io/redis:7” already present on machine
Normal Created 52m kubelet Created container redis
Normal Started 52m kubelet Started container redis
Normal Pulled 52m kubelet Container image “quay.io/ansible/awx:21.13.0” already present on machine
Normal Created 52m kubelet Created container awx-web
Normal Started 52m kubelet Started container awx-web
Normal Pulled 52m kubelet Container image “quay.io/ansible/awx:21.13.0” already present on machine
Normal Created 52m kubelet Created container awx-task
Normal Started 52m kubelet Started container awx-task
Normal Pulled 52m kubelet Container image “quay.io/ansible/awx-ee:latest” already present on machine
Normal Created 52m kubelet Created container awx-ee
Normal Started 52m kubelet Started container awx-ee
Warning Evicted 12m kubelet The node was low on resource: ephemeral-storage. Container awx-ee was using 1008644Ki, which exceeds its request of 0. Container redis was using 36Ki, which exceeds its request of 0. Container awx-task was using 873704Ki, which exceeds its request of 0. Container awx-web was using 360Ki, which exceeds its request of 0.
Normal Killing 12m kubelet Stopping container redis
Normal Killing 12m kubelet Stopping container awx-ee
Normal Killing 12m kubelet Stopping container awx-task
Normal Killing 12m kubelet Stopping container awx-web
Warning ExceededGracePeriod 12m kubelet Container runtime did not kill the pod within specified grace period.
root@ansible:~/awx-on-k3s/base#

The last time I tried this there were no events.

This looks like a possible issue:

Warning Evicted 12m kubelet The node was low on resource: ephemeral-storage. Container awx-ee was using 1008644Ki, which exceeds its request of 0. Container redis was using 36Ki, which exceeds its request of 0. Container awx-task was using 873704Ki, which exceeds its request of 0. Container awx-web was using 360Ki, which exceeds its request of 0.

I also see that there are no resources configured in awx.yaml.
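
To put the eviction message in perspective, it can help to total the per-container figures. The following is a minimal Python sketch, not part of AWX or kubectl. The function name parse_eviction_usage is made up for illustration, and it assumes the message keeps the "Container NAME was using NKi" wording seen in the event quoted above.

```python
import re

# Eviction message copied from the pod event quoted above.
MESSAGE = (
    "The node was low on resource: ephemeral-storage. "
    "Container awx-ee was using 1008644Ki, which exceeds its request of 0. "
    "Container redis was using 36Ki, which exceeds its request of 0. "
    "Container awx-task was using 873704Ki, which exceeds its request of 0. "
    "Container awx-web was using 360Ki, which exceeds its request of 0."
)


def parse_eviction_usage(message):
    """Return {container_name: usage_in_KiB} parsed from an eviction message."""
    pattern = re.compile(r"Container (\S+) was using (\d+)Ki")
    return {name: int(kib) for name, kib in pattern.findall(message)}


if __name__ == "__main__":
    usage = parse_eviction_usage(MESSAGE)
    # Largest consumers first.
    for name, kib in sorted(usage.items(), key=lambda item: -item[1]):
        print(f"{name:10s} {kib / 1024:8.1f} MiB")
    print(f"total      {sum(usage.values()) / 1024 / 1024:8.2f} GiB")
```

For this event the four containers add up to 1882744 KiB, a little under 2 GiB, nearly all of it from awx-ee and awx-task.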

Yes, the pods are being evicted because the node is low on resources. It might be worth experimenting with the requests values for the web, task, and ee containers:

https://github.com/ansible/awx-operator#containers-resource-requirements
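
As a rough illustration of what setting requests could look like, the Python sketch below prints a JSON merge patch for the AWX custom resource. The spec keys web_resource_requirements, task_resource_requirements, and ee_resource_requirements are the ones described in the README section linked above. The values are placeholders rather than recommendations, the helper name build_resource_patch is hypothetical, and whether an ephemeral-storage request suits a given setup is something to verify. Requests influence scheduling and eviction ordering; they do not free up disk on the node.

```python
import json


def build_resource_patch(cpu="500m", memory="1Gi", ephemeral_storage="2Gi"):
    """Build a merge patch that sets the same requests on web, task and ee.

    All values are illustrative placeholders; tune them for your node.
    """
    requirements = {
        "requests": {
            "cpu": cpu,
            "memory": memory,
            "ephemeral-storage": ephemeral_storage,
        }
    }
    return {
        "spec": {
            "web_resource_requirements": requirements,
            "task_resource_requirements": requirements,
            "ee_resource_requirements": requirements,
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_resource_patch(), indent=2))
```

The same keys can be written under spec in awx.yaml instead, which is probably the simpler route when the deployment is managed from that file.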

AWX Team

Thank you for that assistance.

I configured the resource limits, and after that I discovered that the system was running out of disk space, which was causing it to fall over.
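
Since disk space turned out to be the underlying problem, a small check like the one below could flag it earlier. This is a sketch under assumptions: the 10% default mirrors the upstream kubelet's documented hard-eviction default for nodefs.available, but a distribution such as K3s may configure a different value, and the function names and the path are illustrative. On a default K3s install the filesystem holding /var/lib/rancher/k3s is usually the one to watch, which is also an assumption to confirm on your host.

```python
import shutil


def free_percent(total_bytes, free_bytes):
    """Percentage of the filesystem that is still free."""
    return 100.0 * free_bytes / total_bytes


def is_low_on_disk(total_bytes, free_bytes, threshold_percent=10.0):
    """True when free space is below the threshold (an assumed value)."""
    return free_percent(total_bytes, free_bytes) < threshold_percent


if __name__ == "__main__":
    path = "/"  # illustrative; point this at the filesystem the kubelet uses
    usage = shutil.disk_usage(path)
    pct = free_percent(usage.total, usage.free)
    state = "LOW" if is_low_on_disk(usage.total, usage.free) else "ok"
    print(f"{path}: {pct:.1f}% free ({state})")
```

With the figures from the first post (28 GB disk, 8.7 GB free) this check would still pass, so the disk presumably filled further while the concurrent jobs were running.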