AWX Job scheduling issue

Hello!

Would anyone have an idea what could cause the AWX task pods to take upwards of ~10 minutes to launch a job in Kubernetes (container groups)?

In my setup you can assume there are:

  • 4 web pods
  • 6 task pods (limits = 1 CPU, 8Gi)

E.g. logs from the task pod (task container):

2026-01-29 02:22:40,607 INFO     [-] awx.main.wsrelay Starting producer for 172.22.14.47-job_events-408215
2026-01-29 02:22:40,607 INFO     [-] awx.main.wsrelay Starting producer for 172.22.14.47-jobs-summary
2026-01-29 02:23:16,180 INFO     [-] awx.main.wsrelay Producer 172.22.14.47-control-limit_reached_1 has no subscribers, shutting down.
2026-01-29 02:23:16,181 INFO     [-] awx.main.wsrelay Producer 172.22.14.47-jobs-status_changed has no subscribers, shutting down.
2026-01-29 02:23:20,612 INFO     [-] awx.main.wsrelay Producer 172.22.14.47-jobs-summary has no subscribers, shutting down.
2026-01-29 02:23:20,612 INFO     [-] awx.main.wsrelay Producer 172.22.14.47-job_events-408215 has no subscribers, shutting down.
2026-01-29 02:23:20,779 INFO     [-] awx.main.wsrelay Starting producer for 172.22.14.47-jobs-status_changed
2026-01-29 02:23:20,779 INFO     [-] awx.main.wsrelay Starting producer for 172.22.14.47-project_update_events-408217
2026-01-29 02:23:20,779 INFO     [-] awx.main.wsrelay Starting producer for 172.22.14.47-jobs-summary
2026-01-29 02:23:56,813 INFO     [-] awx.main.wsrelay Starting producer for 172.22.12.133-project_update_events-408217
2026-01-29 02:23:56,813 INFO     [-] awx.main.wsrelay Starting producer for 172.22.12.133-jobs-summary
2026-01-29 02:24:00,783 INFO     [-] awx.main.wsrelay Producer 172.22.14.47-jobs-summary has no subscribers, shutting down.
2026-01-29 02:24:00,783 INFO     [-] awx.main.wsrelay Producer 172.22.14.47-project_update_events-408217 has no subscribers, shutting down.
2026-01-29 02:24:00,783 INFO     [-] awx.main.wsrelay Producer 172.22.14.47-jobs-status_changed has no subscribers, shutting down.
2026-01-29 02:24:23,277 INFO     [-] awx.main.wsrelay Starting producer for 172.22.14.47-jobs-status_changed
2026-01-29 02:24:23,277 INFO     [-] awx.main.wsrelay Starting producer for 172.22.14.47-control-limit_reached_1
2026-01-29 02:24:24,603 INFO     [-] awx.main.wsrelay Starting producer for 172.22.12.133-job_events-408215
2026-01-29 02:24:31,537 INFO     [-] awx.main.wsrelay Producer 172.22.10.1-jobs-summary has no subscribers, shutting down.
2026-01-29 02:24:31,537 INFO     [-] awx.main.wsrelay Producer 172.22.10.1-job_events-408215 has no subscribers, shutting down.
2026-01-29 02:24:54,607 INFO     [-] awx.main.wsrelay Producer 172.22.12.133-job_events-408215 has no subscribers, shutting down.

[Screenshot: UI view]

@Denney-tech any idea?

Hey @cnfrancis

Could you provide some more details about your infrastructure, please? It may help the community troubleshoot this with you.

  • What is the version of AWX?
  • What is the underlying Kubernetes distribution: K3s, OpenShift, etc.? Versions would be helpful too.
  • You mentioned container groups; are you referring to a custom instance group (container group) in AWX? Do you have a custom Kubernetes pod spec for the automation job pods, and have you assigned it to your Job Template in AWX?
  • Have you increased the logging level to DEBUG in AWX?
  • Do you see the automation job pod launch inside the awx namespace?
    kubectl -n awx get po
    kubectl -n awx describe po <AWX Automation Job POD name>
    
    If you can see the pod, do the Events in K8s give you any indication as to why the pod isn’t starting?
  • Do the logs for the AWX task pod show any errors?

Sorry I’m late to the party. @dbrennand has already asked for most/all the relevant information to help you troubleshoot further.

Without seeing any of that, I would guess that you have run out of available resources on your k8s cluster. The awx-task pods are deployed to the control plane as control nodes. These can only run system-level jobs like management cleanups, project and inventory syncs, etc.; they cannot run user-defined automation jobs from job_templates. You can deploy execution nodes for AWX, but those cannot live in the control plane either. Hybrid nodes are not supported in AWX; you would need AAP to deploy those. With this in mind, you probably don’t need 6 task pods, and you probably don’t need so many web pods unless you have a lot of users. I would recommend adjusting those down to 2 pods each.
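
If you deployed AWX with the awx-operator, the replica counts can be set on the AWX custom resource. This is just a minimal sketch, assuming the operator’s web_replicas/task_replicas spec fields and an AWX CR named awx in the awx namespace (adjust the names to your deployment):

    # AWX custom resource managed by the awx-operator
    apiVersion: awx.ansible.com/v1beta1
    kind: AWX
    metadata:
      name: awx              # your AWX CR name may differ
      namespace: awx
    spec:
      web_replicas: 2        # scales the awx-web deployment
      task_replicas: 2       # scales the awx-task deployment

After the CR is updated, the operator should reconcile the web and task deployments down to the new counts, freeing the resources the extra pods were holding.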

As for your container groups and the automation jobs, I suspect that the long startup times are due to AWX putting the job into the queue and waiting for enough resource availability to spin up a new pod. The default allocations are usually fine, so long as the cluster has plenty of resources, but if the cluster has any policy enforcement, you might need to adjust the container groups’ spec to include resource limits and requests.

This is the default instance group (container group type); you can check “Customize pod spec” here to specify resources under the containers sub-key of the spec. This snippet is from AAP 2.6, so your UI may look a bit different.
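
For reference, here is a rough sketch of what a customized pod spec with resources could look like. It follows the shape of the default container group pod spec; the image, namespace, and resource values below are example placeholders to adapt to your environment:

    apiVersion: v1
    kind: Pod
    metadata:
      namespace: awx
    spec:
      serviceAccountName: default
      automountServiceAccountToken: false
      containers:
        - image: quay.io/ansible/awx-ee:latest   # your EE image may differ
          name: worker
          args:
            - ansible-runner
            - worker
            - '--private-data-dir=/runner'
          resources:
            requests:          # example values; tune for your jobs
              cpu: 250m
              memory: 100Mi
            limits:            # example values; tune for your jobs
              cpu: '1'
              memory: 2Gi

With explicit requests/limits set here, the scheduler (and any admission policies) has a clear picture of what each automation job pod needs, instead of relying on cluster defaults.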

TL;DR
You might not have enough resources on your k8s cluster

  • Limit AWX to 2 web and 2 task pods (frees up resources allocated to the extra pods)
  • [Optional] - Add resource requests/limits to the container group specs

There could be something else happening, but we need more info to determine what exactly.