Hello,
I am currently experiencing an issue with AWX where a job ends with the status “failed.” The job does not provide any error message to help diagnose the problem: the logs only go up to the last successful task, so I am unable to determine the cause of the failure.
If anyone has encountered a similar issue or has suggestions on how to troubleshoot this further, I would greatly appreciate your input.
Attaching the screenshot for your reference.
NB: we have deployed the application on an OCP cluster.
A few things you can do:
1- Download the logs from the download icon and see if you get the full report
2- Refresh the UI a few times
3- Assuming this is k8s, find the pod that was used to run your job (the naming convention is usually automation-job--) and check the logs for that pod
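For step 3, here is a rough Python sketch of one way to look at the job pods. It assumes `kubectl` (or an equivalent such as `oc`) is on your PATH and that the pods still exist when you look, which may not be the case once a job has finished. The function names and the namespace argument are hypothetical; only the `automation-job-` prefix comes from the suggestion above.

```python
import json
import subprocess

# Prefix mentioned above; the rest of the pod name varies per job.
JOB_POD_PREFIX = "automation-job-"


def fetch_pods(namespace: str) -> dict:
    """Return the parsed output of `kubectl get pods -o json` (needs cluster access)."""
    completed = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


def job_pod_terminations(pod_list: dict) -> dict:
    """Map each job pod name to the termination reasons of its containers."""
    result = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        if not name.startswith(JOB_POD_PREFIX):
            continue  # skip non-job pods (web, task, operator, ...)
        reasons = []
        for status in pod.get("status", {}).get("containerStatuses", []):
            # A container may record a termination in its current or previous state.
            for key in ("state", "lastState"):
                terminated = status.get(key, {}).get("terminated")
                if terminated:
                    reasons.append(terminated.get("reason", "Unknown"))
        result[name] = reasons
    return result
```

Calling `job_pod_terminations(fetch_pods("your-namespace"))` would list each job pod with any termination reason Kubernetes recorded (for example `OOMKilled`), which can point to a cause even when the AWX job output stops without an error.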
I tried setting the log level to 3 (debug), but the result is the same. I was able to see the detailed output of all tasks, but only up to the successful ones.
The output of the failed task itself is not shown.
Can you share what your tasks look like at the point of failure? It might help to know a little more context about what Ansible is trying to do.
From what it looks like, you’re running a sync operation on a pulp-rpm repository (are you using TheForeman/RedHatSatellite?), which may be a long-running task. It also looks like this is happening as an included task in a loop from the previous task (). Something weird could be happening with the loop vars, or perhaps the sync takes a long time to run and is causing the task to hang/timeout in a weird manner.
Have you confirmed that at least one of the pulp repo syncs actually occur, and do they finish successfully? Does pulp have timestamps to show how long the sync takes?
It's not a specific task that is failing, and AWX does not report any failure. The task syncs packages from the Red Hat CDN and runs every day at midnight. The day before, AWX showed output up to the 3rd sync task, whereas yesterday it showed output up to the 8th sync task.
Yes, the sync worked for multiple repos. The same loop was working some time back (almost a month ago). Packages are updated incrementally, so a single task won't take that long.
Ah, you have your answer then. Do you know what memory resources are defined for your instance groups/container groups?
You first need to check whether your memory request is sufficient, then increase your memory limit to support bursts (e.g. perhaps 2x your request). You can do so via a pod spec override in your container group/instance group.
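To illustrate the request/limit idea, here is a small Python sketch that builds a pod spec override with a memory limit set to a multiple of the request. The function name, namespace, image, and the 1024Mi figure are assumptions chosen for the example, not values from this thread. The container name `worker` and the `ansible-runner worker` arguments follow what I understand to be the default AWX container group pod spec, so compare them with the default shown in your own container group before using anything like this.

```python
import json


def build_pod_spec_override(
    request_mib: int,
    burst_factor: float = 2.0,
    namespace: str = "awx",  # assumption: use your own namespace
    image: str = "quay.io/ansible/awx-ee:latest",  # assumption: use your own EE image
) -> dict:
    """Build a pod spec override with memory limit = burst_factor x request."""
    limit_mib = int(request_mib * burst_factor)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"namespace": namespace},
        "spec": {
            "containers": [
                {
                    "name": "worker",
                    "image": image,
                    "args": ["ansible-runner", "worker", "--private-data-dir=/runner"],
                    "resources": {
                        # The request is what the scheduler reserves for the pod.
                        "requests": {"memory": f"{request_mib}Mi"},
                        # The limit is the ceiling before the container is OOM-killed.
                        "limits": {"memory": f"{limit_mib}Mi"},
                    },
                }
            ]
        },
    }


# Example value only: 1024Mi requested, 2048Mi limit.
override_json = json.dumps(build_pod_spec_override(1024), indent=2)
```

Since JSON is also valid YAML, the resulting `override_json` text should be usable in the pod spec override field of the container group, but treat the numbers as a starting point and size them from what your sync jobs actually consume.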