AWX crashes suddenly

I have AWX running on a VM with the following spec:

  • 8 vCPU
  • 16 GB of RAM

OS - Ubuntu 22.04.1 LTS
AWX - 23.5.0
K3s node - v1.28.4+k3s2

AWX crashes suddenly, sometimes even with a single job running, with a low-resource error. Could you please suggest how I can increase resources for all the pods?

Warning Evicted 9m3s kubelet The node was low on resource: ephemeral-storage. Threshold quantity: 536346631, available: 1062996Ki. Container redis was using 112Ki, request is 0, has larger consumption of ephemeral-storage. Container awx-task was using 180332Ki, request is 0, has larger consumption of ephemeral-storage. Container awx-ee was using 84764Ki, request is 0, has larger consumption of ephemeral-storage. Container awx-rsyslog was using 24Ki, request is 0, has larger consumption of ephemeral-storage

pod/awx-task-847b58df79-jt5v7                          0/4   ContainerStatusUnknown   12 (42m ago)   17h
pod/awx-operator-controller-manager-66d787886f-9s8gs   0/2   Error                    9 (42m ago)    17h
pod/awx-task-847b58df79-9jrnt                          0/4   ContainerStatusUnknown   3              26m
pod/automation-job-1816-mzcx7                          0/1   Error                    0              12m
pod/awx-task-847b58df79-tqzdq                          0/4   Init:1/2                 0              12m
pod/awx-operator-controller-manager-66d787886f-7jcck   0/2   Error                    1              26m

Can we do it without reinstalling?

Hi, increasing the free space available to /var/lib/rancher may help.

On a k3s node, ephemeral storage is consumed under this path.
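
To see how much room is left there, here is a minimal sketch: the helper below prints the free space on whichever filesystem holds a given path. The name free_kib is made up for this example, and /var/lib/rancher is simply the path mentioned above; adjust it if your k3s data lives elsewhere.

```shell
# free_kib is a hypothetical helper name, not part of k3s or AWX.
# It prints the free space, in KiB, on the filesystem that holds the given path.
free_kib() {
    # -P keeps each filesystem on one line; -k reports sizes in 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

For example, run free_kib /var/lib/rancher before and after extending the filesystem, or use df -h /var/lib/rancher for a human-readable view.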


Thanks, I have extended the filesystem and will monitor it.

This. I’ve had to increase my ephemeral storage a few times.

For live troubleshooting, use “kubectl get pods” to find your task pod name. Use the -n argument for that command if you configured AWX in a namespace.

Then use “kubectl describe pod <pod_name>” to get details about the pod, again adding -n if you used a namespace.
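
The two commands above can be wrapped up as follows. This is only a sketch: the namespace awx is an assumption (substitute your own), and list_pods and describe_pod are helper names invented here.

```shell
# Assumption: AWX lives in a namespace called "awx"; override with NS=<your-namespace>.
NS="${NS:-awx}"

# List the pods in that namespace so you can find the task pod name.
list_pods() {
    kubectl get pods -n "$NS"
}

# Show the details and recent events for one pod; pass the pod name as the first argument.
describe_pod() {
    kubectl describe pod "$1" -n "$NS"
}
```

Call list_pods first, then describe_pod with the name of the task pod it shows. The Events section at the end of the describe output is where eviction and scheduling messages appear.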

This is the error I see when I have ephemeral storage issues:
0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
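
That taint means the node has been marked as being under disk pressure. As a rough sketch, the helper below (disk_pressure_info is a name invented for this example) filters the node description down to the lines that show it.

```shell
# disk_pressure_info is a hypothetical helper name.
# It keeps only the lines of the node description that mention taints or disk pressure.
disk_pressure_info() {
    kubectl describe nodes | grep -E 'Taints:|DiskPressure'
}
```

If the output still lists the disk-pressure taint after you have freed or added space, give the kubelet a little time to re-evaluate the node and then check again.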