AWX jobs taking a long time to start

Hi,

AWX jobs are taking a long time to start. The job does start, but it shows “Waiting for console output” for approx. 30 minutes before it begins executing. In AWX deployed on our development environment, jobs were running fast and everything was fine.
Even though the same version is deployed, we are facing the above issue on our stage environment.
The version of AWX deployed is 23.5.0.

Any inputs on this would be really helpful.

Thanks in advance

Depending on your project settings and environment settings, it may take very long to download the Docker image and update your Git project, for instance.

Do you always check for updates and re-download your Docker image?
Same question for your project repositories?

Hi @motorbass,
Sorry for the delay in getting back.
I tried removing the “Update revision on launch” option, but that didn’t help. We then tried moving to NFS, which improved performance a bit, but there is still a wait of about 5 to 6 minutes before a job actually starts.

And we have upgraded to version 24.4.0.

When you launch an AWX job, did you check how long your pods take to come online? Probably something is stuck or slow to deploy, and the result is output that takes too much time.
A simple CLI command to show that:

kubectl get pods --namespace=your-awx-namespace

Then retrieve the name of the pod that was deployed for your job, and look at its logs with

kubectl logs --tail=100 your-pod-name

You’ll get the last 100 log lines.
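If the job pod disappears before you can read it, following the logs live can also help. This is just a sketch: the namespace and pod name below are placeholders, not values from this thread.

```shell
# Stream the job pod's output as it runs, instead of fetching the last
# lines after the fact; press Ctrl-C to stop following.
# "your-awx-namespace" and "your-pod-name" are placeholders.
kubectl logs --namespace=your-awx-namespace --follow your-pod-name
```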

Hello @motorbass
From when the job is started until 6 to 7 minutes later, when it actually starts producing output, the awx-task pod shows the logs below. So maybe it’s waiting to get the work unit id?

2024-07-23 11:43:59,067 INFO     [-] awx.main.wsrelay Producer 10.240.0.206-project_update_events-127 has no subscribers, shutting down.
2024-07-23 11:45:51,672 INFO     [-] awx.main.wsrelay Starting producer for 10.240.0.206-schedules-changed
2024-07-23 11:45:54,772 INFO     [-] awx.main.wsrelay Starting producer for 10.240.0.206-job_events-126
2024-07-23 11:46:01,675 INFO     [-] awx.main.wsrelay Producer 10.240.0.206-schedules-changed has no subscribers, shutting down.
2024-07-23 11:47:32,647 INFO     [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 preparing playbook {"type": "job", "task_id": 126, "state": "preparing_playbook", "work_unit_id": null, "task_name": "xxx"}
2024-07-23 11:47:32,757 INFO     [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 running playbook {"type": "job", "task_id": 126, "state": "running_playbook", "work_unit_id": null, "task_name": "xxx"}
2024-07-23 11:48:22,835 INFO     [-] awx.main.wsrelay Starting producer for 10.240.0.206-schedules-changed
2024-07-23 11:48:24,793 INFO     [-] awx.main.wsrelay Producer 10.240.0.206-job_events-126 has no subscribers, shutting down.
2024-07-23 11:48:26,456 INFO     [-] awx.main.wsrelay Starting producer for 10.240.0.206-job_events-126
2024-07-23 11:48:32,838 INFO     [-] awx.main.wsrelay Producer 10.240.0.206-schedules-changed has no subscribers, shutting down.

After 6 minutes, when the job actually started, the logs were:

2024-07-23 11:49:52,947 INFO     [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 work unit id received {"type": "job", "task_id": 126, "state": "work_unit_id_received", "work_unit_id": "LT8o06Un", "task_name": "xxx"}
2024-07-23 11:49:53,014 INFO     [cca6366412624da2aeb28ac39499755b] awx.analytics.job_lifecycle job-126 work unit id assigned {"type": "job", "task_id": 126, "state": "work_unit_id_assigned", "work_unit_id": "LT8o06Un", "task_name": "xxx"}

Weird.
About pod logs, I was talking about the pod that is deployed when your job is launched, not the awx-task pod.
Are you using a “central” AWX, or do you have a multi-node design?
Also, does your job need to download the Execution Environment every time?
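One way to check whether the launch time is going into image pulls is to look at the Kubernetes events for the job pod. A sketch, with placeholder names:

```shell
# The Events section at the bottom shows "Pulling image ..." and
# "Pulled image ..." lines with timestamps, so you can see how long
# the Execution Environment download took.
# "your-job-pod-name" and "your-awx-namespace" are placeholders.
kubectl describe pod your-job-pod-name --namespace=your-awx-namespace

# Or list recent events for the whole namespace in chronological order:
kubectl get events --namespace=your-awx-namespace \
  --sort-by=.metadata.creationTimestamp
```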

Maybe this link can help you: Missing job output and log lines (Task was destroyed but it is pending!) · Issue #14158 · ansible/awx · GitHub

Hmm, there is one strange thing that I notice: with this version of AWX, I don’t see an automation pod running when the job is started. Is this something specific to the version?

awx-migration-24.4.0-8s6pf                         0/1     Completed   0             56d
awx-operator-controller-manager-6867869bb-5mxr6    2/2     Running     5 (17d ago)   56d
awx-postgres-15-0                                  1/1     Running     0             56d
awx-task-6f667f7686-9f6gt                          4/4     Running     0             40d
awx-web-5dbb7c9bc5-98bvk                           3/3     Running     0             40d

Could you send a screenshot of Administration > Topology View?
It seems you have a multi-node design, don’t you?

This is the topology view

Hi Manu,
Almost certainly, you just have a large project dir; see here: https://groups.google.com/g/awx-project/c/EslXf4E9Z5M

to quote:

AWX is building up the private data dir. During this time it copies the project files, playbook variables, and inventory hosts file over to a temporary directory (located in /tmp). It then zips up this directory and submits it to receptor, and then receptor spins up the automation job pod and starts running the playbook.

I had a project dir with some installer files in it, which bloated it out to 1.4 GB, so zipping this up and then unzipping it into a /tmp dir wasted several minutes, while AWX looked like it was sitting there doing nothing.

As soon as I shrunk the project dir, jobs launched within 10 seconds or so.
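A quick way to check for this is to measure the project directories from inside the awx-task pod. This is a sketch under assumptions: the namespace, the `deploy/awx-task` target, and the `awx-task` container name are guesses based on a default AWX install, where projects live under `/var/lib/awx/projects`.

```shell
# Report the size of each project directory; anything in the hundreds
# of MB gets zipped and unzipped on every job launch.
# Namespace, deployment, and container names are placeholders/assumptions.
kubectl exec --namespace=your-awx-namespace deploy/awx-task -c awx-task -- \
  du -sh /var/lib/awx/projects/*
```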


Oh, and I’m pretty sure you DO have an automation pod, but it does not get created until the zip/transfer/unzip is done. And then it will run and exit, probably pretty quickly, so it is easy to miss.
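To catch that short-lived pod, you can watch the namespace while launching a job (the namespace is a placeholder):

```shell
# --watch streams pod add/update/delete events, so the transient
# automation-job pod shows up even if it only lives for a few seconds.
kubectl get pods --namespace=your-awx-namespace --watch
```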