Hi Ansible Community,
I’m encountering an issue in my AWX/Ansible Tower environment and would appreciate any insights or guidance on how to resolve it.
Error Details
The following error appears in the logs:
ERROR 2025/02/06 15:12:00 Error locating unit: K5Sp0VXg
ERROR 2025/02/06 15:12:00 unknown work unit K5Sp0VXg
This error occurs when the system is unable to locate a specific work unit (K5Sp0VXg). It seems to be related to task management, but I’m unsure of the root cause.
Environment Details
- AWX Version: 22.7
- Deployment Method: AKS
- Database: PostgreSQL
- Logs: No additional errors in the task or web pod logs.
Steps Taken So Far
- Checked the status of AWX task pods – all are running without issues.
- Searched the database for the work unit K5Sp0VXg:
    SELECT * FROM main_unifiedjob WHERE uuid = 'K5Sp0VXg';
  The query returned no results, indicating the work unit is missing.
- Verified task synchronization – the task was submitted via the AWX API, but it seems it wasn’t recorded in the database.
- Restarted AWX task pods to clear any transient issues.
Questions
- I have two worker nodes, and this is happening on only one of them.
- What could cause a work unit to go missing in the database?
- Are there known issues with task synchronization in AWX/Ansible Tower?
- How can I prevent this issue from recurring?
- Is there a way to recover or recreate the missing work unit without disrupting the system?
Additional Context
- This issue occurs intermittently, and I’ve noticed similar errors for other work units (e.g., wMvEP6LC, 9tzJOrvg).
- The system is configured to automatically clean up completed tasks after 30 days.
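In case it helps with diagnosis, here is a rough sketch of how the affected work unit IDs could be pulled out of the logs and counted, to see how often each one shows up. It assumes the log lines look exactly like the two quoted under Error Details; the function name count_unknown_units and the sample list are only illustrative and are not part of AWX.

```python
import re
from collections import Counter

# Assumption: log lines follow the format quoted in this post, e.g.
# "ERROR 2025/02/06 15:12:00 unknown work unit K5Sp0VXg"
UNKNOWN_UNIT = re.compile(r"unknown work unit (\S+)")


def count_unknown_units(lines):
    """Count 'unknown work unit' errors per work unit ID."""
    counts = Counter()
    for line in lines:
        match = UNKNOWN_UNIT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative input: the two log lines quoted above.
sample = [
    "ERROR 2025/02/06 15:12:00 Error locating unit: K5Sp0VXg",
    "ERROR 2025/02/06 15:12:00 unknown work unit K5Sp0VXg",
]

for unit_id, hits in count_unknown_units(sample).items():
    print(unit_id, hits)
```

Running this separately on the logs from each worker node should show whether the errors really are confined to one node.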
Any help or suggestions would be greatly appreciated!
Thanks in advance,
Regards,
Manish Singh