AWX jobs taking a long time to start

Hi,

AWX jobs are taking a long time to start. The job does launch, but it shows “Waiting for console output” for approx. 30 minutes before it actually starts executing. In the AWX deployed in our development environment, jobs were running fast and everything was fine.
Even though the same version is deployed, we are facing the above issue in our stage environment.
The version of AWX deployed is 23.5.0.

Any inputs on this would be really helpful.

Thanks in advance

Depending on your project settings and execution environment settings, it may take very long to download the Docker image and update your Git project, for instance.

Do you always check for updates and re-download your Docker image on launch?
Same question for your project repositories?
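
One rough way to confirm that, assuming a standard AWX-on-Kubernetes install (your-awx-namespace is just a placeholder), is to watch the Kubernetes events in the AWX namespace while a job is launching:

kubectl get events --namespace=your-awx-namespace --sort-by=.lastTimestamp

A long gap between the “Pulling image” and “Pulled” events for the job pod would point at slow execution environment downloads rather than AWX itself.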

Hi @motorbass,
Sorry for the delay in getting back.
I tried removing the “Update Revision on Launch” option, but that didn’t help. We then tried moving to NFS, which improved performance a bit, but there is still a waiting time of about 5 to 6 minutes until a job actually starts.

We have also upgraded AWX to version 24.4.0.

When you launch an AWX job, did you check how long your pods take to come online? Probably something is stuck or slow to deploy, and the result is that your output takes too long to appear.
A simple CLI command to show that:

kubectl get pods --namespace=your-awx-namespace

Then retrieve the name of the pod that was deployed for your job, and look at its logs with:

kubectl logs --tail=100 your-pod-name

You’ll get the last 100 log lines.
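
If the pod runs several containers (the awx-task pod does, for example), kubectl will ask you to pick one; a sketch, still with your-pod-name as a placeholder:

kubectl logs --tail=100 your-pod-name --all-containers

or add -c container-name to target a single container.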

Hello @motorbass
From when the job is started until 6 to 7 minutes later, when it actually starts giving output, the awx-task pod shows the logs below. So maybe it’s waiting to get the work unit id?

2024-07-23 11:43:59,067 INFO     [-] awx.main.wsrelay Producer 10.240.0.206-project_update_events-127 has no subscribers, shutting down.
2024-07-23 11:45:51,672 INFO     [-] awx.main.wsrelay Starting producer for 10.240.0.206-schedules-changed
2024-07-23 11:45:54,772 INFO     [-] awx.main.wsrelay Starting producer for 10.240.0.206-job_events-126
2024-07-23 11:46:01,675 INFO     [-] awx.main.wsrelay Producer 10.240.0.206-schedules-changed has no subscribers, shutting down.
2024-07-23 11:47:32,647 INFO     [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 preparing playbook {"type": "job", "task_id": 126, "state": "preparing_playbook", "work_unit_id": null, "task_name": "xxx"}
2024-07-23 11:47:32,757 INFO     [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 running playbook {"type": "job", "task_id": 126, "state": "running_playbook", "work_unit_id": null, "task_name": "xxx"}
2024-07-23 11:48:22,835 INFO     [-] awx.main.wsrelay Starting producer for 10.240.0.206-schedules-changed
2024-07-23 11:48:24,793 INFO     [-] awx.main.wsrelay Producer 10.240.0.206-job_events-126 has no subscribers, shutting down.
2024-07-23 11:48:26,456 INFO     [-] awx.main.wsrelay Starting producer for 10.240.0.206-job_events-126
2024-07-23 11:48:32,838 INFO     [-] awx.main.wsrelay Producer 10.240.0.206-schedules-changed has no subscribers, shutting down.

After 6 minutes, when the job actually started, the logs are:

2024-07-23 11:49:52,947 INFO     [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 work unit id received {"type": "job", "task_id": 126, "state": "work_unit_id_received", "work_unit_id": "LT8o06Un", "task_name": "xxx"}
2024-07-23 11:49:53,014 INFO     [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 work unit id assigned {"type": "job", "task_id": 126, "state": "work_unit_id_assigned", "work_unit_id": "LT8o06Un", "task_name": "xxx"}
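
While a job sits in that state, one thing I can check is whether receptor already knows about a work unit for it. A sketch, assuming the awx-ee sidecar container and the default receptor control socket path (both may be different in this deployment):

kubectl exec your-awx-task-pod-name -c awx-ee --namespace=your-awx-namespace -- receptorctl --socket /var/run/receptor/receptor.sock work list

If no work unit shows up until minutes after launch, the delay would be on the dispatcher/receptor side rather than in the playbook itself.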

Weird.
About pod logs, I was talking about the pod that is deployed when your job is launched, not the awx-task pod.
Are you using a “central” AWX or do you have a multi-node design?
Also, does your job need to download the Execution Environment image every time?
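
You can see that directly from the job pod’s events. A sketch, assuming the pod is named automation-job-<job id>-xxxxx as in a default operator install (replace with whatever kubectl get pods shows):

kubectl describe pod automation-job-126-xxxxx --namespace=your-awx-namespace

In the Events section at the bottom, “Pulling image …” means the execution environment was downloaded again, while “Container image … already present on machine” means it was reused from the node’s cache.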

Maybe this link may help you: Missing job output and log lines (Task was destroyed but it is pending!) · Issue #14158 · ansible/awx · GitHub

Hmm, there is one strange thing that I notice. With this version of AWX, I don’t see an automation job pod running when the job is started. Is this something specific to the version?

awx-migration-24.4.0-8s6pf                         0/1     Completed   0             56d
awx-operator-controller-manager-6867869bb-5mxr6    2/2     Running     5 (17d ago)   56d
awx-postgres-15-0                                  1/1     Running     0             56d
awx-task-6f667f7686-9f6gt                          4/4     Running     0             40d
awx-web-5dbb7c9bc5-98bvk                           3/3     Running     0             40d

Could you send a screenshot of Administration > Topology View?
It seems you have a multi-node design, don’t you?
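
Also, the automation job pods are normally short-lived and are cleaned up as soon as the job finishes, so a plain kubectl get pods can easily miss them. A sketch, same placeholder namespace as before; watch the namespace while you launch a job and an automation-job-* pod should appear and then disappear:

kubectl get pods --namespace=your-awx-namespace --watch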