Looking for already finished work units?
We see the same thing, there also seems to be a related issue receptor#758.
Even though jobs run without error, receptor on Execution Node strangely logs every work unit with Error locating unit
and unknown
after the resp. job has finished. (See example below)
What could be the reason for this?
Is it possible receptor tries to receptorctl work release
after jobs finish and fails because of something like ansible-runner --delete
/podman run --rm
?
Our setup
- AWX 24.6.1 (Openshift, PostgreSQL) + Execution Nodes
- Same receptor version on awx-task instances and Execution Nodes
- receptor on ENs from GitHub release, hence
*v*1.5.3
):
# receptorctl --socket /var/run/receptor/receptor.sock version
Warning: receptorctl and receptor are different versions, they may not be compatible
receptorctl 1.5.3
receptor v1.5.3
- Custom EE image with
ENTRYPOINT ["/opt/builder/bin/entrypoint", "dumb-init"]
CMD ["ansible-runner", "worker", "--private-data-dir=/runner"]
awx-task instance: Log of job
2025-03-25 16:49:12,331 INFO [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 pre run {"type": "job", "task_id": 221782, "state": "pre_run", "work_unit_id": null, "task_name": "awx-tools/sleep"}
2025-03-25 16:49:13,315 INFO [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 preparing playbook {"type": "job", "task_id": 221782, "state": "preparing_playbook", "work_unit_id": null, "task_name": "awx-tools/sleep"}
2025-03-25 16:49:13,433 INFO [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 running playbook {"type": "job", "task_id": 221782, "state": "running_playbook", "work_unit_id": null, "task_name": "awx-tools/sleep"}
2025-03-25 16:49:13,985 INFO [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 work unit id received {"type": "job", "task_id": 221782, "state": "work_unit_id_received", "work_unit_id": "awxtask5bf78fc7b74s29djGTsAhAv", "task_name": "awx-tools/sleep"}
2025-03-25 16:49:14,074 INFO [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 work unit id assigned {"type": "job", "task_id": 221782, "state": "work_unit_id_assigned", "work_unit_id": "awxtask5bf78fc7b74s29djGTsAhAv", "task_name": "awx-tools/sleep"}
2025-03-25 16:49:38,693 INFO [769d7fc23c3d40bcbfb3174366dc93fa] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 221782
2025-03-25 16:49:38,744 INFO [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 post run {"type": "job", "task_id": 221782, "state": "post_run", "work_unit_id": "awxtask5bf78fc7b74s29djGTsAhAv", "task_name": "awx-tools/sleep"}
2025-03-25 16:49:38,984 INFO [769d7fc23c3d40bcbfb3174366dc93fa] awx.analytics.job_lifecycle job-221782 finalize run {"type": "job", "task_id": 221782, "state": "finalize_run", "work_unit_id": "awxtask5bf78fc7b74s29djGTsAhAv", "task_name": "awx-tools/sleep"}
2025-03-25 16:49:39,467 INFO [-] awx.analytics.job_lifecycle job-221782 stats wrapup finished {"type": "job", "task_id": 221782, "state": "stats_wrapup_finished", "work_unit_id": "awxtask5bf78fc7b74s29djGTsAhAv", "task_name": "awx-tools/sleep"}
Execution Node: work unit
On Execution Node when looking for work units before/while processing they show up as expected:
# receptorctl --socket /var/run/receptor/receptor.sock work list
Warning: receptorctl and receptor are different versions, they may not be compatible
{
"awxtask5bf78fc7b74s29djGTsAhAv": {
"Detail": "Running: PID 845686",
"ExtraData": {
"Params": "worker --private-data-dir=/opt/awx/awx_tmp/awx_221782_54ztx1uq --delete",
"Pid": 845678
},
"State": 1,
"StateName": "Running",
"StdoutSize": 11094,
"WorkType": "ansible-runner"
}
}
Execution Node: receptor.log
On a Execution Node every work unit seems to be logged as ‘unknown’ after it has been processed like so:
# tail /var/log/receptor/receptor.log
ERROR 2025/03/25 17:48:38 : unknown work unit awxtask5bf78fc7b74s29djhpWRQXR
ERROR 2025/03/25 17:48:53 Error locating unit: awxtask5bf78fc7b74s29d3nz2ZCmX
ERROR 2025/03/25 17:48:53 : unknown work unit awxtask5bf78fc7b74s29d3nz2ZCmX
ERROR 2025/03/25 17:49:11 Error locating unit: awxtask5bf78fc7b74s29d3nz2ZCmX
ERROR 2025/03/25 17:49:11 : unknown work unit awxtask5bf78fc7b74s29d3nz2ZCmX
ERROR 2025/03/25 17:49:38 Error locating unit: awxtask5bf78fc7b74s29djGTsAhAv
ERROR 2025/03/25 17:49:38 : unknown work unit awxtask5bf78fc7b74s29djGTsAhAv
AWX /api/v2/jobs/221782/
...
"failed": false,
"started": "2025-03-25T16:49:12.043306Z",
"finished": "2025-03-25T16:49:38.813620Z", # UTC == receptor log 17:49
"elapsed": 26.77,
"job_args": "[\"podman\", \"run\", \"--rm\", \"--tty\", \"--interactive\", \"--workdir\", \"/runner/project\", \"-v\", \"/opt/awx/awx_tmp/awx_221782_54ztx1uq/:/runner/:Z\", \"--env-file\", \"/opt/awx/awx_tmp/awx_221782_54ztx1uq/artifacts/221782/env.list\", \"--quiet\", \"--name\", \"ansible_runner_221782\", \"--user=root\", \"--log-level=info\", \"--mount=type=bind,src=/home/awx/mounts/10-awx-ssh.conf,dst=/etc/ssh/ssh_config.d/10-awx-ssh.conf,relabel=shared,ro=true\", \"--network=slirp4netns:enable_ipv6=true\", \"--userns=keep-id:uid=1001,gid=0\", \"--user=runner\", \"--cap-drop=ALL\", \"--pull=missing\", \"image-registry...\", \"ansible-playbook\", \"-u\", \"root\", \"--diff\", \"-l\", \"localhost\", \"-i\", \"/runner/inventory\", \"-e\", \"@/runner/env/extravars\", \"sleep.yml\"]",
"job_cwd": "/runner/project",