Awx jobs are taking a long time to start. Although the job starts, it shows “Waiting for console output” for approx. 30 mins and then it starts executing it. In awx deployed on development environment, the jobs were running fast and everything was fine.
Even though the same version is deployed, facing the above issue on our stage environment.
The version of awx deployed is 23.5.0.
Hi @motorbass,
Sorry for the delay in getting back.
I tried removing the “Update revision on launch” and tried but that didn’t help. But we tried moving to NFS and that improved the performance a bit. But still there is a waiting time of about 5 to 6 minutes until a job actually starts.
when you launch any AWX jobs, did you check how long your pods are getting online ? probably something is stuck or long to deploy and the result is your output that takes too much time ?
Simple cli to show that :
kubectl get pods --namespace=your-awx-namespace
Then retrieve the pod’s name that have been deployed for your job. Then look logs with
Hello @motorbass
From and when the job is started till 6 to 7 minutes when it actually starts giving the output, the wax-task pod shows the below logs. So maybe it’s waiting to get the work id unit?
2024-07-23 11:43:59,067 INFO [-] awx.main.wsrelay Producer 10.240.0.206-project_update_events-127 has no subscribers, shutting down.
2024-07-23 11:45:51,672 INFO [-] awx.main.wsrelay Starting producer for 10.240.0.206-schedules-changed
2024-07-23 11:45:54,772 INFO [-] awx.main.wsrelay Starting producer for 10.240.0.206-job_events-126
2024-07-23 11:46:01,675 INFO [-] awx.main.wsrelay Producer 10.240.0.206-schedules-changed has no subscribers, shutting down.
2024-07-23 11:47:32,647 INFO [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 preparing playbook {"type": "job", "task_id": 126, "state": "preparing_playbook", "work_unit_id": null, "task_name": "xxx"}
2024-07-23 11:47:32,757 INFO [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 running playbook {"type": "job", "task_id": 126, "state": "running_playbook", "work_unit_id": null, "task_name": "xxx"}
2024-07-23 11:48:22,835 INFO [-] awx.main.wsrelay Starting producer for 10.240.0.206-schedules-changed
2024-07-23 11:48:24,793 INFO [-] awx.main.wsrelay Producer 10.240.0.206-job_events-126 has no subscribers, shutting down.
2024-07-23 11:48:26,456 INFO [-] awx.main.wsrelay Starting producer for 10.240.0.206-job_events-126
2024-07-23 11:48:32,838 INFO [-] awx.main.wsrelay Producer 10.240.0.206-schedules-changed has no subscribers, shutting down.
After 6 minutes when the job actually started the logs are
2024-07-23 11:49:52,947 INFO [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 work unit id received {"type": "job", "task_id": 126, "state": "work_unit_id_received", "work_unit_id": "LT8o06Un", "task_name": "xxx"}
2024-07-23 11:49:53,014 INFO [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 work unit id assigned {"type": "job", "task_id": 126, "state": "work_unit_id_assigned", "work_unit_id": "LT8o06Un", "task_name": "xxx"}
Weird.
About pods logs, I was talking about the pod that is deployed when you job is launched, not the awx-tasks pod.
Are you using a “central” AWX or do you have a multi node design ?
Else, does your job need to download everytime Execution Environment ?
Hmm there is one strange thing that I notice. With this version of awx, I don’t have an automation pod running when the job is started. Is this something specific with the version?
AWX is building up the private data dir. In this period of time it copies over project files, playbook variables, and inventory hosts file over to a temporary directory (located on /tmp). It then zips up this directory and submits it to receptor, and then receptor spins up the automation job pod and starts running the playbook.
I had a project dir with some installer files in it which bloated it out to 1.4 GB, so to zip this up and then unzip it into a /tmp dir was wasting several minutes, while AWX looks like it is sittng there doing nothing.
As soon as i shrunk project dir, jobs launch within 10 seconds or so.
Oh, and I’m pretty sure you DO have an automation pod - but it does not get created until the zip/transfer/unzip is done. And then it will run and exit probably pretty quicky, so it is easy to miss.