I’ve found that sometimes, in long-ish job runs, say 10-20 minutes or longer, the log output in the “Output” tab of the job that’s currently running will stop updating. When I look at the pod logs for the automation-task pod, the job is still running and logging, but the AWX UI is not updating with the new logs. This makes it seem like the job is stuck.
Often, when the job completes, the log data will load into the textarea, but while it’s running it’s stuck.
AWX 22.3.0, running in Google Kubernetes Engine (GKE).
“Pod logs stop being pulled…” - notes the final AWX job state is failed; in my case, the job eventually succeeds and is marked as such; the issue is during the execution of the run.
“Missing job output…” - also seems to be a case where the logs are unrecoverable.
In my case, this only happens during execution; I can’t tell whether the job is actually still progressing without digging into the Kubernetes pod logs (using the Lens app). I should also note that it’s not every execution; maybe 1 in 10 or so, but often enough that it’s problematic.
For what it’s worth, I’ve now upgraded to 24.6.1 and this issue persists. In the UI, the job has seemed “hung” for about 10 minutes, but I can look at the pod logs of the “automation-job” pod and see that it’s still working. Would it make more sense to file an issue on GitHub for this?
This sounds more like the websocket stream is failing to be established or is disconnecting prematurely. If you open the developer console of your web browser, do you see warnings/errors about “wss://” connections? Those connections are what stream the output to your session.
Unfortunately, issues with websocket streams are a recurring problem without any clear solution.
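In the meantime, one way to check whether a job is still progressing without relying on the websocket (or on the pod logs) is to poll the REST API directly. Below is a rough sketch, not an official tool: it reads the job's `status` from `/api/v2/jobs/<id>/` and the event `count` from `/api/v2/jobs/<id>/job_events/`. The host, job ID, and token are placeholders, the function names are made up for illustration, and it assumes you have an OAuth2 token that AWX accepts as a Bearer token.

```python
import json
import time
import urllib.request

# Statuses that mean the job has not finished yet.
ACTIVE = ("new", "pending", "waiting", "running")


def fetch_json(url, token):
    """GET a URL with a Bearer token and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def job_progress(base_url, job_id, token, fetch=fetch_json):
    """Return (status, number of job events stored so far)."""
    base = base_url.rstrip("/")
    job = fetch(f"{base}/api/v2/jobs/{job_id}/", token)
    # page_size=1 keeps the response small; only the total count is used.
    events = fetch(f"{base}/api/v2/jobs/{job_id}/job_events/?page_size=1", token)
    return job["status"], events["count"]


def watch(base_url, job_id, token, interval=30, fetch=fetch_json, sleep=time.sleep):
    """Print status and event count until the job leaves an active state."""
    while True:
        status, count = job_progress(base_url, job_id, token, fetch)
        print(f"job {job_id}: status={status} events={count}")
        if status not in ACTIVE:
            return status
        sleep(interval)


# Example (placeholders, adjust to your environment):
# watch("https://awx.example.com", 1234, "YOUR_TOKEN")
```

If the event count keeps climbing while the Output tab is frozen, that would point at the websocket/UI side. If the count also stalls while the pod is still logging, the problem is more likely upstream of the browser, in how events are getting saved.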