Error Locating Unit: `K5Sp0VXg` - Unknown Work Unit in AWX/Ansible Tower

Hi Ansible Community,

I’m encountering an issue in my AWX/Ansible Tower environment and would appreciate any insights or guidance on how to resolve it.

Error Details

The following error appears in the logs:

ERROR 2025/02/06 15:12:00 Error locating unit: K5Sp0VXg
ERROR 2025/02/06 15:12:00 unknown work unit K5Sp0VXg

This error occurs when the system is unable to locate a specific work unit (K5Sp0VXg). It seems to be related to task management, but I’m unsure of the root cause.
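
To see which work unit IDs are affected and how often, a small script along these lines could tally them from saved log output. This is only a sketch: it assumes the log lines have exactly the shape shown in the excerpt above, and `collect_unknown_units` is a hypothetical helper name, not part of AWX or Receptor.

```python
import re
from collections import Counter

# Matches the two message shapes from the log excerpt:
#   ERROR <date> <time> Error locating unit: <id>
#   ERROR <date> <time> unknown work unit <id>
UNIT_PATTERN = re.compile(
    r"^ERROR (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?:Error locating unit: |unknown work unit )(\S+)$"
)


def collect_unknown_units(lines):
    """Return a Counter mapping work unit ID to the number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = UNIT_PATTERN.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


sample = [
    "ERROR 2025/02/06 15:12:00 Error locating unit: K5Sp0VXg",
    "ERROR 2025/02/06 15:12:00 unknown work unit K5Sp0VXg",
]

for unit_id, count in collect_unknown_units(sample).items():
    print(unit_id, count)
```

Running the same function over the logs from each worker node separately would show whether the errors really are confined to one node.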

Environment Details

  • AWX Version: 22.7
  • Deployment Method: AKS
  • Database: PostgreSQL
  • Logs: No additional errors in the task or web pod logs.

Steps Taken So Far

  1. Checked the status of AWX task pods – all are running without issues.
  2. Searched the database for the work unit K5Sp0VXg:
    SELECT * FROM main_unifiedjob WHERE uuid = 'K5Sp0VXg';
    
    The query returned no results, indicating the work unit is missing.
  3. Verified task synchronization – the task was submitted via the AWX API, but it seems it wasn’t recorded in the database.
  4. Restarted AWX task pods to clear any transient issues.

Questions

  1. I have two worker nodes, and this is happening on only one of them. What could explain that?
  2. What could cause a work unit to go missing in the database?
  3. Are there known issues with task synchronization in AWX/Ansible Tower?
  4. How can I prevent this issue from recurring?
  5. Is there a way to recover or recreate the missing work unit without disrupting the system?

Additional Context

  • This issue occurs intermittently, and I’ve noticed similar errors for other work units (e.g., wMvEP6LC, 9tzJOrvg).
  • The system is configured to automatically clean up completed tasks after 30 days.

Any help or suggestions would be greatly appreciated!

Thanks in advance,

Regards,
Manish Singh

Any input on the above request, please?