Increasing Replicas from 1 to 2 Fails

Brandon_Morris · December 9, 2022, 9:18pm

AWX version: 21.9.0
awx-operator version: 1.1.0

I have awx-operator deployed in a k3s multi-master cluster. I increased the number of replicas in the awx CRD from 1 to 2. The awx deployment and replicaset are scaled up correctly.

The new awx pod gets scheduled but fails to initialize.

Attached are the awx-manager logs. Those seem to indicate a problem with the “Apply deployment resources” TASK.

The kubectl events for the failed pod are in the same attachment.

My PVCs are also set to RWX.

(attachments)

troubleshooting_replica_Increase.txt (52.6 KB)

AWX_Project · December 14, 2022, 7:46pm

Hello,
It sounds like the project’s PVC may be the culprit. Project persistence is not required. When the pods come back up they will pull your projects anew from SCM. Perhaps try reconfiguring this to not require project persistence and see if that helps. Please let us know if it does.

This does sound like it may be a bug with the project’s persistence feature. Is there an issue associated with this report?

-AWX Team

Brandon_Morris · December 15, 2022, 2:55am

Thank you for the suggestion! I will give this a go in the morning and will report back.

Brandon_Morris · December 15, 2022, 7:00pm

Changing project persistence to false did the trick. I now have two instances deployed, and both report healthy in the AWX GUI.
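For anyone landing here with the same problem, the change above can be sketched as a merge patch against the AWX resource. This is a minimal sketch, not an official procedure: `replicas` and `projects_persistence` are the spec field names as I understand them from the awx-operator documentation, and the resource name `awx` and namespace `awx` are assumptions to replace with your own. The script only prints the command; it does not run it.

```python
import json
import shlex


def build_awx_patch(replicas: int, projects_persistence: bool) -> dict:
    """Build a JSON merge patch for the spec of an AWX custom resource."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return {
        "spec": {
            "replicas": replicas,
            "projects_persistence": projects_persistence,
        }
    }


def kubectl_patch_command(name: str, namespace: str, patch: dict) -> str:
    """Render the kubectl command that would apply the patch (not executed here)."""
    args = [
        "kubectl", "patch", "awx", name,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(patch),
    ]
    # Quote each argument so the JSON survives a copy/paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # "awx" / "awx" are assumed names; substitute your resource and namespace.
    patch = build_awx_patch(replicas=2, projects_persistence=False)
    print(kubectl_patch_command("awx", "awx", patch))
```

Editing the same two fields in the manifest you originally applied should have the same effect; the patch form is just easier to show in one line.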

The only thing I am seeing now is that the PVC for the Postgres pod is complaining about a failed mount:

Warning FailedMount 60s (x487 over 16h) kubelet MountVolume.MountDevice failed for volume “pvc-790c4558-4718-4a5f-93ac-df7fa8383988” : rpc error: code = FailedPrecondition desc = volume pvc-790c4558-4718-4a5f-93ac-df7fa8383988 requires shared access but is not marked for shared use

I don’t seem to be seeing any adverse effects from the above. The GUI is still accessible, and control plane jobs have run on both instances. Normal jobs executed in the default ContainerGroup are working just fine.

Do I just ignore this or is something just waiting to leap out and bite me?

Thank you for the help!

AWX_Project · December 16, 2022, 6:14pm

Yeah, you can probably ignore the warning. What does “kubectl describe pvc/pvc-790c4558-4718-4a5f-93ac-df7fa8383988” return? What is this PVC intended for?
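In case it helps with that question, here is a rough sketch for pulling the relevant fields out of `kubectl get pvc <name> -o json` output. The wording of the warning suggests the volume was requested with a shared access mode, so the access modes are the first thing worth checking. The sample object below (including the claim name and storage class) is hypothetical and is not the output from this thread; only the field paths (`spec.accessModes`, `spec.volumeName`, `spec.storageClassName`) are standard PVC fields.

```python
import json

# Access modes that ask the storage backend for shared (multi-node) access.
SHARED_MODES = {"ReadWriteMany", "ReadOnlyMany"}


def summarize_pvc(pvc: dict) -> dict:
    """Extract the fields relevant to a shared-access mount warning from a PVC object."""
    spec = pvc.get("spec", {})
    modes = spec.get("accessModes", [])
    return {
        "name": pvc.get("metadata", {}).get("name"),
        "volume": spec.get("volumeName"),
        "storage_class": spec.get("storageClassName"),
        "access_modes": modes,
        "requests_shared_access": any(m in SHARED_MODES for m in modes),
    }


# Hypothetical sample, NOT the PVC from this thread.
SAMPLE = json.loads("""
{
  "metadata": {"name": "example-postgres-claim"},
  "spec": {
    "accessModes": ["ReadWriteMany"],
    "storageClassName": "example-class",
    "volumeName": "pvc-00000000-0000-0000-0000-000000000000"
  }
}
""")

if __name__ == "__main__":
    print(json.dumps(summarize_pvc(SAMPLE), indent=2))
```

If the Postgres claim turns out to request a shared mode while the backing volume is not set up for sharing, that mismatch would be a plausible source of the warning, though that is a guess until the real output is checked.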

Brandon_Morris · December 19, 2022, 7:17pm

This is the postgres-13 PVC.

There are no events in the describe output for either the PV or the PVC.

PVC and PV output attached.

(attachments)

PVC_Troubleshooting.txt (2.01 KB)

AWX_Project · December 21, 2022, 7:58pm

Although benign, the warning might be something we wish to fix. Would you mind opening an issue in awx-operator for this? Thanks!

AWX Team
