Jobs launched from AWX are failing intermittently, and there isn’t much information in the logs either.
I’m hoping someone here has seen this before and found a fix for it.
Here is my AWX setup:
I deployed AWX 9.0 on a Kubernetes cluster with an external PostgreSQL database, scaled it to two replicas, and mapped each replica to its own instance group:
awx-0 (instance group 1) and awx-1 (instance group 2)
When I submit a job to either instance group, it fails intermittently. It looks like the job isn’t being scheduled in a timely manner, so it fails to execute.
Please see the error logs from the celery container.