Error Locating Unit: `K5Sp0VXg` - Unknown Work Unit in AWX/Ansible Tower

Hi Ansible Community,

I’m encountering an issue in my AWX/Ansible Tower environment and would appreciate any insights or guidance on how to resolve it.

Error Details

The following error appears in the logs:

ERROR 2025/02/06 15:12:00 Error locating unit: K5Sp0VXg
ERROR 2025/02/06 15:12:00 unknown work unit K5Sp0VXg

This error occurs when the system is unable to locate a specific work unit (K5Sp0VXg). It seems to be related to task management, but I’m unsure of the root cause.
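
In case it helps with diagnosis, this is roughly how I list the work units receptor currently knows about on the control plane. The namespace, deployment, container name, and socket path below match my deployment and may need adjusting for others:

    # list the work units receptor is tracking (run in the awx-ee container of the task pod)
    kubectl -n awx exec deploy/awx-task -c awx-ee -- \
        receptorctl --socket /var/run/receptor/receptor.sock work list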

Environment Details

  • AWX Version: 22.7
  • Deployment Method: AKS (Azure Kubernetes Service)
  • Database: PostgreSQL
  • Logs: No additional errors in the task or web pod logs.

Steps Taken So Far

  1. Checked the status of AWX task pods – all are running without issues.
  2. Searched the database for the work unit K5Sp0VXg (see also the follow-up query after this list):
    SELECT * FROM main_unifiedjob WHERE uuid = 'K5Sp0VXg';
    
    The query returned no results, indicating the work unit is missing.
  3. Verified task synchronization – the task was submitted via the AWX API, but it seems it wasn’t recorded in the database.
  4. Restarted AWX task pods to clear any transient issues.
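
As a follow-up to step 2: since, as far as I understand, the receptor work unit ID is stored in the work_unit_id column rather than in uuid, I also plan to match against that column (please correct me if that assumption is wrong for 22.7):

    -- same check as step 2, but against the column that (I believe) holds the receptor work unit ID
    SELECT id, status, started, finished, work_unit_id
    FROM main_unifiedjob
    WHERE work_unit_id = 'K5Sp0VXg';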

Questions

  1. I have two worker nodes and this is happening on only one of them. Why would only one node be affected?
  2. What could cause a work unit to go missing in the database?
  3. Are there known issues with task synchronization in AWX/Ansible Tower?
  4. How can I prevent this issue from recurring?
  5. Is there a way to recover or recreate the missing work unit without disrupting the system?

Additional Context

  • This issue occurs intermittently, and I’ve noticed similar errors for other work units (e.g., wMvEP6LC, 9tzJOrvg).
  • The system is configured to automatically clean up completed tasks after 30 days (equivalent command shown below).
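
For completeness, if I understand the built-in cleanup correctly, the scheduled "Cleanup Job Details" system job is roughly equivalent to running the following management command (a sketch of our retention setting, not an exact reproduction of the schedule):

    # remove job records older than 30 days (what our scheduled cleanup effectively does)
    awx-manage cleanup_jobs --days=30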

Any help or suggestions would be greatly appreciated!

Thanks in advance,

Regards,
Manish Singh

Any input on the above request, please?

Looking for already-finished work units?

We see the same thing; there also seems to be a related issue: receptor#758.

Even though jobs run without error, receptor on the Execution Node strangely logs `Error locating unit` / `unknown work unit` for every work unit after the respective job has finished (see the example below).

What could be the reason for this?

Is it possible receptor tries a `receptorctl work release` after jobs finish and fails because the work unit directory has already been removed by something like `ansible-runner ... --delete` / `podman run --rm`?
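
For reference, releasing a work unit by hand looks like this; the unit ID is a placeholder, and this is how I would try to reproduce the suspected double cleanup, not a confirmed reproduction:

    # manually release (clean up) a finished work unit on the Execution Node
    receptorctl --socket /var/run/receptor/receptor.sock work release <work-unit-id>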

Our setup

  • AWX 24.6.1 (Openshift, PostgreSQL) + Execution Nodes
  • Same receptor version on awx-task instances and Execution Nodes
    • receptor on the ENs is installed from the GitHub release, hence the v prefix in v1.5.3 (which triggers the version-mismatch warning below):
    # receptorctl --socket /var/run/receptor/receptor.sock version
    Warning: receptorctl and receptor are different versions, they may not be compatible
    receptorctl  1.5.3
    receptor     v1.5.3
    
  • Custom EE image with the following ENTRYPOINT/CMD (inspect check shown after this list):
    ENTRYPOINT ["/opt/builder/bin/entrypoint", "dumb-init"]
    CMD ["ansible-runner", "worker", "--private-data-dir=/runner"]
    
    

awx-task instance: Log of job

2025-03-25 16:49:12,331 INFO     [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 pre run {"type": "job", "task_id": 221782, "state": "pre_run", "work_unit_id": null, "task_name": "awx-tools/sleep"}
2025-03-25 16:49:13,315 INFO     [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 preparing playbook {"type": "job", "task_id": 221782, "state": "preparing_playbook", "work_unit_id": null, "task_name": "awx-tools/sleep"}
2025-03-25 16:49:13,433 INFO     [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 running playbook {"type": "job", "task_id": 221782, "state": "running_playbook", "work_unit_id": null, "task_name": "awx-tools/sleep"}
2025-03-25 16:49:13,985 INFO     [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 work unit id received {"type": "job", "task_id": 221782, "state": "work_unit_id_received", "work_unit_id": "awxtask5bf78fc7b74s29djGTsAhAv", "task_name": "awx-tools/sleep"}
2025-03-25 16:49:14,074 INFO     [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 work unit id assigned {"type": "job", "task_id": 221782, "state": "work_unit_id_assigned", "work_unit_id": "awxtask5bf78fc7b74s29djGTsAhAv", "task_name": "awx-tools/sleep"}
2025-03-25 16:49:38,693 INFO     [769d7fc23c3d40bcbfb3174366dc93fa] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 221782
2025-03-25 16:49:38,744 INFO     [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 post run {"type": "job", "task_id": 221782, "state": "post_run", "work_unit_id": "awxtask5bf78fc7b74s29djGTsAhAv", "task_name": "awx-tools/sleep"}
2025-03-25 16:49:38,984 INFO     [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 finalize run {"type": "job", "task_id": 221782, "state": "finalize_run", "work_unit_id": "awxtask5bf78fc7b74s29djGTsAhAv", "task_name": "awx-tools/sleep"}
2025-03-25 16:49:39,467 INFO     [-] awx.analytics.job_lifecycle job-221782 stats wrapup finished {"type": "job", "task_id": 221782, "state": "stats_wrapup_finished", "work_unit_id": "awxtask5bf78fc7b74s29djGTsAhAv", "task_name": "awx-tools/sleep"}

Execution Node: work unit

On the Execution Node, work units show up as expected when listing them before/while they are being processed:

# receptorctl --socket /var/run/receptor/receptor.sock work list
Warning: receptorctl and receptor are different versions, they may not be compatible
{
    "awxtask5bf78fc7b74s29djGTsAhAv": {
        "Detail": "Running: PID 845686",
        "ExtraData": {
            "Params": "worker --private-data-dir=/opt/awx/awx_tmp/awx_221782_54ztx1uq --delete",
            "Pid": 845678
        },
        "State": 1,
        "StateName": "Running",
        "StdoutSize": 11094,
        "WorkType": "ansible-runner"
    }
}

Execution Node: receptor.log

On an Execution Node, every work unit seems to be logged as ‘unknown’ after it has been processed, like so:

# tail /var/log/receptor/receptor.log
ERROR 2025/03/25 17:48:38 : unknown work unit awxtask5bf78fc7b74s29djhpWRQXR
ERROR 2025/03/25 17:48:53 Error locating unit: awxtask5bf78fc7b74s29d3nz2ZCmX
ERROR 2025/03/25 17:48:53 : unknown work unit awxtask5bf78fc7b74s29d3nz2ZCmX
ERROR 2025/03/25 17:49:11 Error locating unit: awxtask5bf78fc7b74s29d3nz2ZCmX
ERROR 2025/03/25 17:49:11 : unknown work unit awxtask5bf78fc7b74s29d3nz2ZCmX
ERROR 2025/03/25 17:49:38 Error locating unit: awxtask5bf78fc7b74s29djGTsAhAv
ERROR 2025/03/25 17:49:38 : unknown work unit awxtask5bf78fc7b74s29djGTsAhAv

AWX /api/v2/jobs/221782/

  ...
  "failed": false,
  "started": "2025-03-25T16:49:12.043306Z",
  "finished": "2025-03-25T16:49:38.813620Z", # UTC == receptor log 17:49
  "elapsed": 26.77,
  "job_args": "[\"podman\", \"run\", \"--rm\", \"--tty\", \"--interactive\", \"--workdir\", \"/runner/project\", \"-v\", \"/opt/awx/awx_tmp/awx_221782_54ztx1uq/:/runner/:Z\", \"--env-file\", \"/opt/awx/awx_tmp/awx_221782_54ztx1uq/artifacts/221782/env.list\", \"--quiet\", \"--name\", \"ansible_runner_221782\", \"--user=root\", \"--log-level=info\", \"--mount=type=bind,src=/home/awx/mounts/10-awx-ssh.conf,dst=/etc/ssh/ssh_config.d/10-awx-ssh.conf,relabel=shared,ro=true\", \"--network=slirp4netns:enable_ipv6=true\", \"--userns=keep-id:uid=1001,gid=0\", \"--user=runner\", \"--cap-drop=ALL\", \"--pull=missing\", \"image-registry...\", \"ansible-playbook\", \"-u\", \"root\", \"--diff\", \"-l\", \"localhost\", \"-i\", \"/runner/inventory\", \"-e\", \"@/runner/env/extravars\", \"sleep.yml\"]",
    "job_cwd": "/runner/project",

@2and3makes23 @manish_singh

I was able to fix this.
My executor was running behind a firewall, and podman was not able to fetch the image from the quay.io registry.
Either launch your container using an image that is already available in your environment, or make sure your executor is able to reach the quay.io repos.
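
A quick way to confirm the registry is reachable from the executor is to pull the EE image manually; the image below is just an example, use whatever EE your jobs reference:

    # verify the executor can reach the registry and pull the EE image
    podman pull quay.io/ansible/awx-ee:latest
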
This issue can be closed.

This must be a different issue than the one you had, @golakiyaalice. What you describe (no access to the image) does not match the scenario described here (jobs are executed successfully using the respective images).