AWX slowness issue - long start time for jobs - AWX 24.6.1

I have recently installed AWX 24.6.1 on a RHEL 9 machine and synced my project's playbooks, but when I run a simple ping playbook it takes 2 minutes to complete the job. I have attached a screenshot. Please help me fix this.
I have gone through previous topics and applied the necessary changes, but the slowness issue remains. I have attached a reference image.

To be clearer:

Long start time for jobs

I’ve got an issue with jobs taking a “long” time to start: about 1 min 30 secs. I have also scaled up the resources on the node, but saw no improvement. I have also gone through the AWX forum and changed some settings, but still no luck.

I have pasted one playbook's output below; it took 2 min 29 secs to complete. Even a simple ping playbook takes 1 min 30 secs for a single host.

Please advise if you have any suggestions to fix this.

I have attached logs for your reference.

AWX slowness log

@vinothravindhran
Hi, I saw you’ve posted the same question on my repo today, but this forum is a good place to discuss this topic, so I’m writing here.

Roughly, the whole lifecycle of a job is as follows:

  • Start the job from Web UI (Web)
  • Dispatch the job (Task)
  • Request the Kubernetes API to create a new ephemeral automation job pod to run the playbook (Receptor)
  • Schedule the pod onto an available node (Kubernetes)
  • Pull the container image from the container registry if necessary (Kubernetes)
  • Transfer the project content from the task pod to the automation job pod via the Kubernetes API (Receptor and Kubernetes)
  • Run the playbook (Ansible)
  • Transfer the result from the automation job pod back to the task pod via the Kubernetes API (Receptor and Kubernetes)
  • Process the result logs (Task)

So you should investigate which step consumes the most time, for example with the commands sketched below:

  • Watch the logs of the task pod
  • Watch the result of kubectl -n awx get pod to see how soon the automation job pod is visible
  • Gather the events of the automation job pod by kubectl -n awx describe pod <automation job pod name>
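
Rough commands for those checks (plain kubectl invocations; adjust the namespace and pod names to your environment):

  # Follow the task container logs while the job is starting
  kubectl -n awx logs -f deployment/awx-task -c awx-task
  # Watch how soon the automation job pod appears and becomes Running
  kubectl -n awx get pod -w
  # Inspect the scheduling and image pull events of the job pod
  kubectl -n awx describe pod <automation job pod name>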

Have you configured the image pull policy for your EE? How many nodes does your Kubernetes cluster have?
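
One way to check which pull policy the job pods are actually created with (a sketch; it relies on the ansible-awx-job-id label shown in your describe output, and only works while a job pod still exists):

  kubectl -n awx get pod -l ansible-awx-job-id -o custom-columns=NAME:.metadata.name,PULL:.spec.containers[0].imagePullPolicy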

@kurokobo
Thank you very much for your continuous support.

Please find my inline responses below.
Please let me know if more information is needed to troubleshoot further.

Watch the logs of the task pod

kubectl -n awx describe pod

[root@g20-ansible-cxp ~]# kubectl -n awx describe pod automation-job-416-4242l
Name: automation-job-416-4242l
Namespace: awx
Priority: 0
Service Account: default
Node: g20-ansible-cxp/IP address
Start Time: Tue, 24 Sep 2024 18:37:06 -0700
Labels: ansible-awx=d6ede91a-7d86-4ba3-afca-49c4d07be3e6
ansible-awx-job-id=416
Annotations:
Status: Succeeded
IP: 10.42.0.46
IPs:
IP: 10.42.0.46
Containers:
worker:
Container ID: containerd://de93df6da8b96bbe9062afb95d1ad654329bb18ec542a374511a076bf26401f1
Image: Quay
Image ID: quay.io/ansible/awx-ee@sha256:89593d2a0268acacdbd1c075d8810ab50d961cf7769e36de19219bb6cb4efcc1
Port:
Host Port:
Args:
ansible-runner
worker
–private-data-dir=/runner
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 24 Sep 2024 18:37:07 -0700
Finished: Tue, 24 Sep 2024 18:37:24 -0700
Ready: False
Restart Count: 0
Requests:
cpu: 250m
memory: 100Mi
Environment:
ANSIBLE_RUNNER_KEEPALIVE_SECONDS: 30
Mounts:
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 86s default-scheduler Successfully assigned awx/automation-job-416-4242l to g20-ansible-cxp
Normal Pulled 86s kubelet Container image “Quay” already present on machine
Normal Created 86s kubelet Created container worker
Normal Started 86s kubelet Started container worker
[root@g20-ansible-cxp ~]#

Could you please gather new logs for the playbook with the simple ping test against 1 host?
Your previous logs contain only part of the full logs:

  • The logs for id 389 (ping_test): just a part of the whole process.
  • The logs for id 390 (AZUL JAVA install): I can see it takes 2.5 min, but I have no idea whether that is too long since I don’t know what the AZUL JAVA install playbook does.

Thank you for sharing the job lifecycle details.
I have attached both the ping job log and the AWX task log below.

It seems the job started around 2024-09-26 04:23:12 (state waiting) and completed at 2024-09-26 04:24:44,
about 1 min 30 secs in total.

Job logs:
[root@server ~]# kubectl get pods -n awx
NAME READY STATUS RESTARTS AGE
automation-job-419-2hglm 0/1 Completed 0 28s
awx-operator-controller-manager-745b55d94b-ctcfb 2/2 Running 4 (6d17h ago) 15d
awx-postgres-15-0 1/1 Running 6 (6d17h ago) 51d
awx-task-55d8c9dd9f-4l7w6 4/4 Running 0 3m11s
awx-web-59498d8589-98bph 3/3 Running 6 (6d17h ago) 28d
[root@server ~]# kubectl logs automation-job-419-2hglm -n awx
{“status”: “starting”, “runner_ident”: “419”, “command”: [“ssh-agent”, “sh”, “-c”, “trap ‘rm -f /runner/artifacts/419/ssh_key_data’ EXIT && ssh-add /runner/artifacts/419/ssh_key_data && rm -f /runner/artifacts/419/ssh_key_data && ansible-playbook -u ansible-cxp -l targer_server.com -i /runner/inventory/hosts -e @/runner/env/extravars ping.yml”], “env”: {“KUBERNETES_SERVICE_PORT_HTTPS”: “443”, “KUBERNETES_SERVICE_PORT”: “443”, “AWX_OPERATOR_CONTROLLER_MANAGER_METRICS_SERVICE_SERVICE_HOST”: “10.43.106.140”, “HOSTNAME”: “automation-job-419-2hglm”, “AWX_SERVICE_PORT_80_TCP”: “tcp://10.43.61.35:80”, “PWD”: “/runner”, “AWX_OPERATOR_CONTROLLER_MANAGER_METRICS_SERVICE_PORT_8443_TCP_PORT”: “8443”, “PIP_BREAK_SYSTEM_PACKAGES”: “1”, “AWX_OPERATOR_CONTROLLER_MANAGER_METRICS_SERVICE_PORT_8443_TCP_PROTO”: “tcp”, “AWX_SERVICE_SERVICE_PORT”: “80”, “HOME”: “/runner”, “KUBERNETES_PORT_443_TCP”: “tcp://10.43.0.1:443”, “ANSIBLE_RUNNER_KEEPALIVE_SECONDS”: “30”, “AWX_SERVICE_PORT_80_TCP_ADDR”: “10.43.61.35”, “AWX_SERVICE_PORT”: “tcp://10.43.61.35:80”, “AWX_OPERATOR_CONTROLLER_MANAGER_METRICS_SERVICE_PORT”: “tcp://10.43.106.140:8443”, “AWX_SERVICE_PORT_80_TCP_PORT”: “80”, “AWX_OPERATOR_CONTROLLER_MANAGER_METRICS_SERVICE_SERVICE_PORT_HTTPS”: “8443”, “AWX_OPERATOR_CONTROLLER_MANAGER_METRICS_SERVICE_SERVICE_PORT”: “8443”, “SHLVL”: “0”, “KUBERNETES_PORT_443_TCP_PROTO”: “tcp”, “KUBERNETES_PORT_443_TCP_ADDR”: “10.43.0.1”, “AWX_SERVICE_SERVICE_HOST”: “10.43.61.35”, “AWX_OPERATOR_CONTROLLER_MANAGER_METRICS_SERVICE_PORT_8443_TCP”: “tcp://10.43.106.140:8443”, “AWX_SERVICE_SERVICE_PORT_HTTP”: “80”, “KUBERNETES_SERVICE_HOST”: “10.43.0.1”, “KUBERNETES_PORT”: “tcp://10.43.0.1:443”, “KUBERNETES_PORT_443_TCP_PORT”: “443”, “PATH”: “/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin”, “AWX_OPERATOR_CONTROLLER_MANAGER_METRICS_SERVICE_PORT_8443_TCP_ADDR”: “10.43.106.140”, “AWX_SERVICE_PORT_80_TCP_PROTO”: “tcp”, “LC_CTYPE”: “C.UTF-8”, “ANSIBLE_BASE_ALLOW_CUSTOM_ROLES”: “True”, “ANSIBLE_BASE_ALLOW_CUSTOM_TEAM_ROLES”: “False”, “ANSIBLE_BASE_ALLOW_SINGLETON_ROLES_API”: “False”, “ANSIBLE_BASE_ALLOW_SINGLETON_TEAM_ROLES”: “False”, “ANSIBLE_BASE_ALLOW_SINGLETON_USER_ROLES”: “True”, “ANSIBLE_BASE_ALLOW_TEAM_ORG_ADMIN”: “False”, “ANSIBLE_BASE_ALLOW_TEAM_ORG_MEMBER”: “False”, “ANSIBLE_BASE_ALLOW_TEAM_ORG_PERMS”: “True”, “ANSIBLE_BASE_ALLOW_TEAM_PARENTS”: “True”, “ANSIBLE_BASE_ALL_REST_FILTERS”: “(‘ansible_base.rest_filters.rest_framework.type_filter_backend.TypeFilterBackend’, ‘ansible_base.rest_filters.rest_framework.field_lookup_backend.FieldLookupBackend’, ‘rest_framework.filters.SearchFilter’, ‘ansible_base.rest_filters.rest_framework.order_backend.OrderByBackend’)”, “ANSIBLE_BASE_BYPASS_ACTION_FLAGS”: “{}”, “ANSIBLE_BASE_BYPASS_SUPERUSER_FLAGS”: “[‘is_superuser’]”, “ANSIBLE_BASE_CACHE_PARENT_PERMISSIONS”: “True”, “ANSIBLE_BASE_CHECK_RELATED_PERMISSIONS”: “[‘use’, ‘change’, ‘view’]”, “ANSIBLE_BASE_CREATOR_DEFAULTS”: “[‘change’, ‘delete’, ‘execute’, ‘use’, ‘adhoc’, ‘approve’, ‘update’, ‘view’]”, “ANSIBLE_BASE_CUSTOM_VIEW_PARENT”: “awx.api.generics.APIView”, “ANSIBLE_BASE_EVALUATIONS_IGNORE_CONFLICTS”: “True”, “ANSIBLE_BASE_MANAGED_ROLE_REGISTRY”: “{}”, “ANSIBLE_BASE_ORGANIZATION_MODEL”: “main.Organization”, “ANSIBLE_BASE_PERMISSION_MODEL”: “main.Permission”, “ANSIBLE_BASE_RESOURCE_CONFIG_MODULE”: “awx.resource_api”, “ANSIBLE_BASE_REST_FILTERS_RESERVED_NAMES”: “(‘page’, ‘page_size’, ‘format’, ‘order’, ‘order_by’, ‘search’, ‘type’, ‘host_filter’, ‘count_disabled’, ‘no_truncate’, ‘limit’, ‘validate’)”, 
“ANSIBLE_BASE_ROLE_CREATOR_NAME”: “{cls.name} Creator”, “ANSIBLE_BASE_ROLE_PRECREATE”: “{}”, “ANSIBLE_BASE_ROLE_SYSTEM_ACTIVATED”: “True”, “ANSIBLE_BASE_TEAM_MODEL”: “main.Team”, “ANSIBLE_FACT_CACHE_TIMEOUT”: “0”, “ANSIBLE_FORCE_COLOR”: “True”, “ANSIBLE_HOST_KEY_CHECKING”: “False”, “ANSIBLE_INVENTORY_UNPARSED_FAILED”: “True”, “ANSIBLE_PARAMIKO_RECORD_HOST_KEYS”: “False”, “AWX_PRIVATE_DATA_DIR”: “/tmp/awx_419_ymu0347b”, “JOB_ID”: “419”, “INVENTORY_ID”: “3”, “PROJECT_REVISION”: “81e55f00bbb2ef6cfedf83a80248c4faa9a383a7”, “ANSIBLE_RETRY_FILES_ENABLED”: “False”, “MAX_EVENT_RES”: “700000”, “AWX_HOST”: “https://target_server.com/”, “ANSIBLE_SSH_CONTROL_PATH_DIR”: “/runner/cp”, “ANSIBLE_COLLECTIONS_PATHS”: “/runner/requirements_collections:~/.ansible/collections:/usr/share/ansible/collections”, “ANSIBLE_ROLES_PATH”: “/runner/requirements_roles:~/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles”, “ANSIBLE_COLLECTIONS_PATH”: “/runner/requirements_collections:~/.ansible/collections:/usr/share/ansible/collections”, “ANSIBLE_CALLBACK_PLUGINS”: “/usr/local/lib/python3.9/site-packages/ansible_runner/display_callback/callback”, “ANSIBLE_STDOUT_CALLBACK”: “awx_display”, “AWX_ISOLATED_DATA_DIR”: “/runner/artifacts/419”, “RUNNER_OMIT_EVENTS”: “False”, “RUNNER_ONLY_FAILED_EVENTS”: “False”}, “cwd”: “/runner/project”}
{“status”: “running”, “runner_ident”: “419”}
{“event”: “verbose”, “uuid”: “e5118139-ddb4-4c90-8570-67fc63d6373c”, “counter”: 1, “stdout”: “Identity added: /runner/artifacts/419/ssh_key_data (/runner/artifacts/419/ssh_key_data)”, “start_line”: 0, “end_line”: 1, “runner_ident”: “419”, “created”: “2024-09-26T04:24:41.939000+00:00”}
{“event”: “verbose”, “uuid”: “095754cc-ec62-45fb-b026-2eb33cb590f0”, “counter”: 2, “stdout”: “\u001b[1;35m[WARNING]: Invalid characters were found in group names but not replaced, use\u001b[0m”, “start_line”: 1, “end_line”: 2, “runner_ident”: “419”, “created”: “2024-09-26T04:24:42.370356+00:00”}
{“event”: “verbose”, “uuid”: “d3b20e84-39b7-45c1-aa8c-008a44e52627”, “counter”: 3, “stdout”: “\u001b[1;35m-vvvv to see details\u001b[0m”, “start_line”: 2, “end_line”: 3, “runner_ident”: “419”, “created”: “2024-09-26T04:24:42.370491+00:00”}
{“uuid”: “7a90bc3f-8acb-4a3d-83bc-c122ab938bc2”, “counter”: 4, “stdout”: “”, “start_line”: 3, “end_line”: 3, “runner_ident”: “419”, “event”: “playbook_on_start”, “job_id”: 419, “pid”: 22, “created”: “2024-09-26T04:24:42.586005+00:00”, “event_data”: {“playbook”: “ping.yml”, “playbook_uuid”: “7a90bc3f-8acb-4a3d-83bc-c122ab938bc2”, “uuid”: “7a90bc3f-8acb-4a3d-83bc-c122ab938bc2”}}
{“uuid”: “4a57d9cc-2a29-1d14-9c24-000000000138”, “counter”: 5, “stdout”: “\r\nPLAY [ping test] ***************************************************************”, “start_line”: 3, “end_line”: 5, “runner_ident”: “419”, “event”: “playbook_on_play_start”, “job_id”: 419, “pid”: 22, “created”: “2024-09-26T04:24:42.598605+00:00”, “parent_uuid”: “7a90bc3f-8acb-4a3d-83bc-c122ab938bc2”, “event_data”: {“playbook”: “ping.yml”, “playbook_uuid”: “7a90bc3f-8acb-4a3d-83bc-c122ab938bc2”, “play”: “ping test”, “play_uuid”: “4a57d9cc-2a29-1d14-9c24-000000000138”, “play_pattern”: “all”, “name”: “ping test”, “pattern”: “all”, “uuid”: “4a57d9cc-2a29-1d14-9c24-000000000138”}}
{“uuid”: “4a57d9cc-2a29-1d14-9c24-00000000013a”, “counter”: 6, “stdout”: “\r\nTASK [run ping module in servers] **********************************************”, “start_line”: 5, “end_line”: 7, “runner_ident”: “419”, “event”: “playbook_on_task_start”, “job_id”: 419, “pid”: 22, “created”: “2024-09-26T04:24:42.609071+00:00”, “parent_uuid”: “4a57d9cc-2a29-1d14-9c24-000000000138”, “event_data”: {“playbook”: “ping.yml”, “playbook_uuid”: “7a90bc3f-8acb-4a3d-83bc-c122ab938bc2”, “play”: “ping test”, “play_uuid”: “4a57d9cc-2a29-1d14-9c24-000000000138”, “play_pattern”: “all”, “task”: “run ping module in servers”, “task_uuid”: “4a57d9cc-2a29-1d14-9c24-00000000013a”, “task_action”: “ping”, “resolved_action”: “ansible.builtin.ping”, “task_args”: “”, “task_path”: “/runner/project/ping.yml:5”, “name”: “run ping module in servers”, “is_conditional”: false, “uuid”: “4a57d9cc-2a29-1d14-9c24-00000000013a”}}
{“uuid”: “2747f276-c416-464c-aea0-d70f84781d97”, “counter”: 7, “stdout”: “”, “start_line”: 7, “end_line”: 7, “runner_ident”: “419”, “event”: “runner_on_start”, “job_id”: 419, “pid”: 22, “created”: “2024-09-26T04:24:42.609831+00:00”, “parent_uuid”: “4a57d9cc-2a29-1d14-9c24-00000000013a”, “event_data”: {“playbook”: “ping.yml”, “playbook_uuid”: “7a90bc3f-8acb-4a3d-83bc-c122ab938bc2”, “play”: “ping test”, “play_uuid”: “4a57d9cc-2a29-1d14-9c24-000000000138”, “play_pattern”: “all”, “task”: “run ping module in servers”, “task_uuid”: “4a57d9cc-2a29-1d14-9c24-00000000013a”, “task_action”: “ping”, “resolved_action”: “ansible.builtin.ping”, “task_args”: “”, “task_path”: “/runner/project/ping.yml:5”, “host”: “target_server.com”, “uuid”: “2747f276-c416-464c-aea0-d70f84781d97”}}
{“uuid”: “56096ed1-75af-4632-8c89-d2a3e4c5305f”, “counter”: 8, “stdout”: “\u001b[0;32mok: [target_server.com]\u001b[0m”, “start_line”: 7, “end_line”: 8, “runner_ident”: “419”, “event”: “runner_on_ok”, “job_id”: 419, “pid”: 22, “created”: “2024-09-26T04:24:43.547383+00:00”, “parent_uuid”: “4a57d9cc-2a29-1d14-9c24-00000000013a”, “event_data”: {“playbook”: “ping.yml”, “playbook_uuid”: “7a90bc3f-8acb-4a3d-83bc-c122ab938bc2”, “play”: “ping test”, “play_uuid”: “4a57d9cc-2a29-1d14-9c24-000000000138”, “play_pattern”: “all”, “task”: “run ping module in servers”, “task_uuid”: “4a57d9cc-2a29-1d14-9c24-00000000013a”, “task_action”: “ping”, “resolved_action”: “ansible.builtin.ping”, “task_args”: “”, “task_path”: “/runner/project/ping.yml:5”, “host”: “target_server.com”, “remote_addr”: “target_server.com”, “res”: {“ping”: “pong”, “invocation”: {“module_args”: {“data”: “pong”}}, “ansible_facts”: {“discovered_interpreter_python”: “/usr/libexec/platform-python”}, “_ansible_no_log”: false, “changed”: false}, “start”: “2024-09-26T04:24:42.609769+00:00”, “end”: “2024-09-26T04:24:43.547214+00:00”, “duration”: 0.937445, “event_loop”: null, “uuid”: “56096ed1-75af-4632-8c89-d2a3e4c5305f”}}
{“uuid”: “aafc16e9-f752-4bbf-887c-186742ec4fdc”, “counter”: 9, “stdout”: "\r\nPLAY RECAP *********************************************************************\r\n\u001b[0;32mtarger_server.com\u001b[0m : \u001b[0;32mok=1 \u001b[0m changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0 ", “start_line”: 8, “end_line”: 12, “runner_ident”: “419”, “event”: “playbook_on_stats”, “job_id”: 419, “pid”: 22, “created”: “2024-09-26T04:24:43.556178+00:00”, “parent_uuid”: “7a90bc3f-8acb-4a3d-83bc-c122ab938bc2”, “event_data”: {“playbook”: “ping.yml”, “playbook_uuid”: “7a90bc3f-8acb-4a3d-83bc-c122ab938bc2”, “changed”: {}, “dark”: {}, “failures”: {}, “ignored”: {}, “ok”: {“target_server.com”: 1}, “processed”: {“target_server.com”: 1}, “rescued”: {}, “skipped”: {}, “artifact_data”: {}, “uuid”: “aafc16e9-f752-4bbf-887c-186742ec4fdc”}}
{“status”: “successful”, “runner_ident”: “419”}
{“zipfile”: 2009}
UEsDBBQAAAAIABQjOlldp5fXUAYAANkSAAAHAAAAY29tbWFuZL1YbXObOBD+K4w/NHczZ7DjtHXz6TCWHRoMDMhOfZ2ORsZyTIuBAk7iZnK//VYCHPDbpenN5YPRy75o99ldrfLY8KLViobzxqX0uZGmyya9ZWHW+ENqpEv+2/T4b5bQWDpLVlJzISnJOgxZotAk8xfUy1Llov1BAVbyjW3InGb0TEKfdCy9eSMJgfP5C3g49Uvlc1oapv4sYM04oJtZFH2TmuvtmvcQS81Auj1vNYNbb3Pn/6DzVdj+TmVv+V1mvtT0t1r88A7sjZKNsozSLJWaTPqz3GPhncIewPY7mqRS7Ie38mYVNL6AQ7x77rFGSRkn0VfmCbcBE+w8Nq7HPeSYCCOXuMiZ6BoituVgcoWx7XLei4sOpz9CV6FQbz4Ry0aOii2HaJaJHcswkENGqqkO+RdhR9ee2cvvleUKMe2WfNGR2613cvuixQXyDVMdIb5J11m0opkfhc2v0awJrm6eL2+DVam4dvZui2DN5myZF18qSi75XVvuvL3sCtH2Tb/il9ecPlcEpnNVW1d0C1/Yuk16DlKviTt1MRoRW9WuQY7waPs/0OdY2CoM3HXBLjzdwpm5IysmVyAV0gvhu35rye3LEmLT1XsGIs7YNOGM1wjZqqFPuE4woS+s67ROYELUft95BltAcoj8FHav8dyuvCLKLruV2D103LoPXxvh9YTq/mK+7Maae2VMDL7QOgHqXsQco6sDBNgfC6962taQfHVc/wxMe859xulAraqftjBrxwcnAv+oV+sl0FbxlUiydZooQeTRQElnfnhZmW+nzxtikE/h59eLQx3CSkE9GuU7sWFoRMNTW9QLTR7jQbNbzf6e6iKiGoZ1Q7Sxi60RgePlpQ0na3aaFCO1Qj+gQXqMwdXNoYGwZebkRLX1l7O8Rs0YfPMiU4RwyxmCp0e6+W8KttQjNIIoejE54D960Uls1UEmPklKHORiMtANjBxB+NtZ0YWQGU2ZnLA0Iws/yFiSFpOErth9lHyTs03Mij0g9r6xcC5jWBuIpV6+cvaH9HKJC58FcxJAO7SOtyIHfNEQaxWZe5y5QJfRxFvmJ/gp1VEyr5hh8VlvU+r7fc91vamtuhB6GtYhQAaGOhTee3w6RumOATYRSFviz2d+StJ1zJJ1Cqf9sseqqdoVKlAUqOuuC+qOAwrk2jVAaqgY9Xc5Pp+BGu4Tb0nDWzG689n9Ib3QoPAC00cDdWzggvuZbc4ClokRe2DeOh8Wwul8GXliEENTeZdvxdD1nlSYl4CJjm4Kc0Vrd/8g09iXoaFnie+lMiT6BATssaOJaoxVjoRL9KFpOYjXxoGhayeCPy+ZfZHX4LKh7mJnegRDyDrV1P8SKsjI6iNxra6oH0Kk3NIQGnTege7xPSOww2WzZOWn6SEeSEhr7Gi5CfqQM44NVPoDojZaJx4j4JhDrNtc5hOo6GAg75SL1I5pDiD/ktT/ISaLKIEGmo9EDmwHZLbh41RkFB/xfOdf/tAo0kiEU7QOMzL3UwqJJnIzjEgG3aRXgB74K1/Iv6OBLyJhP58ECmXclb39oxekMiEh5CghT5KWMAovncO8toMEOzoCoSAqem6etROeIkeDQ5TPOmSY0VWVbgBSigzF+ghZY7xttbYkVo6jYTmHNPHWA1rlaZ62cNscvAF0cwL5YDlTMjYhN1xAdKDqxuHDA4E60q8tQF+znP5Wxc5lB9e97QgXkL6KVdLXxQGVbBUrEGUEXlFks1q3OhfvZ5zho9UjulAIO3zh+VD5ct7kONZHBE5x0ETnMS/6rjZ7+3bRas1ms3O2eOct2HzR7dBu6/yi610sKP1AOzB/X3tEQBMz5XEMdzsy1V5h6/b4I/UTpDyviU4e2O9b/K+0rOzollkWp9Cz8Wd05WVdvKGVqkbXvSq7KcJbta1Diudx8ZYqqDXec2l5ueHUbpU0Yd/XfsJW8ChPiRcFATyrIc3Ty78VuTiFUl3OW74lTZhyYLvmFtHsbDvJQ/qSKGA1TfnCAR3FBsu8+tIpO/8fMzXoSnrwKCa2MR7q+dVVaZMDf6bEm2wZhR35g5L6GWvGcEtDQUtLyaQ4I5Qk/t8VAmwBv9iVclCDHvchdbdaizpLCt4ypnTXyu/UWroc+kcP5yhewdZIx3mg1tOv3DaNaZHKe1RPT/8AUEsDBBQAAAAIABUjOlmBPS1lDAAAAAoAAAAGAAAAc3RhdHVzKy5NTk4tLk4rzQEAUEsDBBQAAAAIABUjOlkh39v0AwAAAAEAAAACAAAAcmMzAABQSwMEFAAAAAAAFSM6WQAAAAAAAAAAAAAAAAsAAABqb2JfZXZlbnRzL1BLAQIUAxQAAAAIABQjOlldp5fXUAYAANkSAAAHAAAAAAAAAAAAAACAgQAAAABjb21tYW5kUEsBAhQDFAAAAAgAFSM6WYE9LWUMAAAACgAAAAYAAAAAAAAAAAAAAICBdQYAAHN0YXR1c1BLAQIUAxQAAAAIABUjOlkh39v0AwAAAAEAAAACAAAAAAAAAAAAAACAgaUGAAByY1BLAQIUAxQAAAAAABUjOlkAAAAAAAAAAAAAAAALAAAAAAAAAAAAEADAQcgGAABqb2JfZXZlbnRzL1BLBQYAAAAABAAEANIAAADxBgAAAAA={“eof”: true}
[root@server ~]

AWX task logs:
[root@server ~]# kubectl logs awx-task-55d8c9dd9f-4l7w6 -n awx
[wait-for-migrations] Waiting for database migrations…
[wait-for-migrations] Attempt 1
Instance Group already registered controlplane
Instance Group already registered default
Successfully registered instance None
(changed: True)
2024-09-26 04:22:50,311 INFO RPC interface ‘supervisor’ initialized
2024-09-26 04:22:50,311 INFO RPC interface ‘supervisor’ initialized
2024-09-26 04:22:50,311 CRIT Server ‘unix_http_server’ running without any HTTP authentication checking
2024-09-26 04:22:50,311 CRIT Server ‘unix_http_server’ running without any HTTP authentication checking
2024-09-26 04:22:50,311 INFO supervisord started with pid 7
2024-09-26 04:22:50,311 INFO supervisord started with pid 7
2024-09-26 04:22:51,313 INFO spawned: ‘superwatcher’ with pid 37
2024-09-26 04:22:51,313 INFO spawned: ‘superwatcher’ with pid 37
2024-09-26 04:22:51,315 INFO spawned: ‘dispatcher’ with pid 38
2024-09-26 04:22:51,315 INFO spawned: ‘dispatcher’ with pid 38
2024-09-26 04:22:51,317 INFO spawned: ‘callback-receiver’ with pid 39
2024-09-26 04:22:51,317 INFO spawned: ‘callback-receiver’ with pid 39
2024-09-26 04:22:51,318 INFO spawned: ‘wsrelay’ with pid 40
2024-09-26 04:22:51,318 INFO spawned: ‘wsrelay’ with pid 40
READY
2024-09-26 04:22:52,320 INFO success: superwatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-09-26 04:22:52,320 INFO success: superwatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-09-26 04:22:52,320 INFO success: wsrelay entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-09-26 04:22:52,320 INFO success: wsrelay entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-09-26 04:22:53,091 INFO [-] awx.main.commands.run_callback_receiver Callback receiver started with pid=39
2024-09-26 04:22:53,098 INFO [-] awx.main.dispatch Running worker dispatcher listening to queues [‘tower_broadcast_all’, ‘tower_settings_change’, ‘awx-task-55d8c9dd9f-4l7w6’]
2024-09-26 04:22:53,110 INFO [-] awx.main.wsrelay Active instance with hostname awx-task-55d8c9dd9f-4l7w6 is registered.
2024-09-26 04:22:53,111 INFO [-] awx.main.wsrelay Starting Websocket Relayer…
2024-09-26 04:22:53,177 WARNING [-] awx.main.tasks.system Rejoining the cluster as instance awx-task-55d8c9dd9f-4l7w6. Prior last_seen None
2024-09-26 04:22:53,187 INFO [-] awx.main.dispatch Dispatcher listener connection established
2024-09-26 04:22:53,187 WARNING [-] awx.main.dispatch.periodic Scheduler next run unexpectedly over 20 seconds in future: 25.91101598739624
2024-09-26 04:23:03,120 INFO [-] awx.main.wsrelay Adding {‘awx-web-59498d8589-98bph’} to websocket broadcast list
2024-09-26 04:23:03,125 INFO [-] awx.main.wsrelay Connection from awx-task-55d8c9dd9f-4l7w6 to 10.42.0.9 established.
2024-09-26 04:23:03,126 INFO [-] awx.main.wsrelay Starting producer for metrics
2024-09-26 04:23:12,672 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 waiting {“type”: “job”, “task_id”: 419, “state”: “waiting”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:23:12,789 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 pre run {“type”: “job”, “task_id”: 419, “state”: “pre_run”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:23:12,920 WARNING [-] awx.main.wsrelay Producer 10.42.0.9-control-limit_reached_1 not found.
2024-09-26 04:23:12,921 WARNING [-] awx.main.wsrelay Producer 10.42.0.9-jobs-status_changed not found.
2024-09-26 04:23:13,886 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 preparing playbook {“type”: “job”, “task_id”: 419, “state”: “preparing_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:23:13,934 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-control-limit_reached_1
2024-09-26 04:23:13,935 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-jobs-status_changed
2024-09-26 04:23:13,990 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 running playbook {“type”: “job”, “task_id”: 419, “state”: “running_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:23:16,473 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-job_events-419
2024-09-26 04:23:16,474 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-jobs-summary
2024-09-26 04:23:21,479 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2024-09-26 04:23:21,479 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2024-09-26 04:23:21,479 INFO success: callback-receiver entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2024-09-26 04:23:21,479 INFO success: callback-receiver entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2024-09-26 04:23:56,124 WARNING [6c472bf82d8b4b69865c5834341ddef9] awx.main.tasks.system Heartbeat skew - interval=62.9260, expected=60
2024-09-26 04:23:56,137 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.main.tasks.system Host awx-task-55d8c9dd9f-krms6 Automatically Deprovisioned.
2024-09-26 04:24:27,200 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 work unit id received {“type”: “job”, “task_id”: 419, “state”: “work_unit_id_received”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:27,212 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 work unit id assigned {“type”: “job”, “task_id”: 419, “state”: “work_unit_id_assigned”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:43,995 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 419
2024-09-26 04:24:43,996 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 post run {“type”: “job”, “task_id”: 419, “state”: “post_run”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:44,096 INFO [-] awx.analytics.job_lifecycle job-419 stats wrapup finished {“type”: “job”, “task_id”: 419, “state”: “stats_wrapup_finished”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:44,272 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 finalize run {“type”: “job”, “task_id”: 419, “state”: “finalize_run”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:53,961 INFO [-] awx.main.wsrelay Producer 10.42.0.9-job_events-419 has no subscribers, shutting down.
2024-09-26 04:24:53,999 INFO [-] awx.main.wsrelay Producer 10.42.0.9-jobs-summary has no subscribers, shutting down.
[root@server ~]#


@vinothravindhran
Thanks for the update.

By filtering for job 419, we can see that the running_playbook state takes over 1 minute:

2024-09-26 04:23:12,672 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 waiting {“type”: “job”, “task_id”: 419, “state”: “waiting”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:23:12,789 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 pre run {“type”: “job”, “task_id”: 419, “state”: “pre_run”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:23:13,886 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 preparing playbook {“type”: “job”, “task_id”: 419, “state”: “preparing_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:23:13,990 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 running playbook {“type”: “job”, “task_id”: 419, “state”: “running_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:24:27,200 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 work unit id received {“type”: “job”, “task_id”: 419, “state”: “work_unit_id_received”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:27,212 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 work unit id assigned {“type”: “job”, “task_id”: 419, “state”: “work_unit_id_assigned”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:43,996 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 post run {“type”: “job”, “task_id”: 419, “state”: “post_run”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:44,272 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 finalize run {“type”: “job”, “task_id”: 419, “state”: “finalize_run”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}

In this state, AWX requests Receptor to invoke the work unit, and then Receptor requests the Kubernetes API to create the new pod.

There is a possibility that your Kubernetes cluster takes a long time to schedule the requested pod onto an available node.
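
If you want to check that, the Kubernetes events around the job pod are a quick way to see where the time goes (plain kubectl, nothing AWX-specific):

  # Events for the whole awx namespace, newest last
  kubectl -n awx get events --sort-by=.lastTimestamp
  # Or only the events of a specific automation job pod
  kubectl -n awx describe pod <automation job pod name>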

  • Could you share the logs from both the awx-task and awx-ee containers in the awx-task pod for the simple ping playbook against 1 host?
    • kubectl -n awx logs -f deployment/awx-task -c awx-task
    • kubectl -n awx logs -f deployment/awx-task -c awx-ee
  • Does your K3s host have sufficient CPU and RAM resources? If not, increase the number of cores and add RAM, then try again.
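
A few quick ways to check the host resources (a sketch; kubectl top relies on the metrics-server, which K3s ships by default):

  # CPU / memory usage as seen by Kubernetes
  kubectl top node
  # CPU count and memory from the host itself
  nproc
  free -h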

Thank you @kurokobo.

I have attached the EE and task container logs.

Does your K3s host have sufficient CPU and RAM resources? If not, increase # of cores and add RAM, then try again

Yes, I have doubled the CPU and memory and tried again, but the same slowness persists.
Current CPU and memory: 8 CPUs and 32 GB of memory.

[root@server~]# kubectl -n awx logs -f deployment/awx-task -c awx-ee
INFO 2024/09/26 04:22:43 Running control service control
INFO 2024/09/26 04:22:43 Initialization complete
INFO 2024/09/26 04:24:46 Detected Error: EOF for pod awx/automation-job-419-2hglm. Will retry 5 more times.
INFO 2024/09/26 04:24:46 Detected Error: EOF for pod awx/automation-job-419-2hglm. Will retry 4 more times.
INFO 2024/09/26 04:24:47 Detected Error: EOF for pod awx/automation-job-419-2hglm. Will retry 3 more times.
INFO 2024/09/26 04:24:47 Detected Error: EOF for pod awx/automation-job-419-2hglm. Will retry 2 more times.
INFO 2024/09/26 04:24:47 Detected Error: EOF for pod awx/automation-job-419-2hglm. Will retry 1 more times.
ERROR 2024/09/26 04:24:47 Error reading from pod awx/automation-job-419-2hglm: EOF
WARNING 2024/09/26 04:24:47 Could not read in control service: read unix /var/run/receptor/receptor.sock->@: use of closed network connection
WARNING 2024/09/26 04:24:47 Could not close connection: close unix /var/run/receptor/receptor.sock->@: use of closed network connection
INFO 2024/09/26 07:56:31 Detected Error: EOF for pod awx/automation-job-420-2psdz. Will retry 5 more times.
INFO 2024/09/26 07:56:31 Detected Error: EOF for pod awx/automation-job-420-2psdz. Will retry 4 more times.
INFO 2024/09/26 07:56:32 Detected Error: EOF for pod awx/automation-job-420-2psdz. Will retry 3 more times.
INFO 2024/09/26 07:56:32 Detected Error: EOF for pod awx/automation-job-420-2psdz. Will retry 2 more times.
INFO 2024/09/26 07:56:32 Detected Error: EOF for pod awx/automation-job-420-2psdz. Will retry 1 more times.
ERROR 2024/09/26 07:56:32 Error reading from pod awx/automation-job-420-2psdz: EOF
WARNING 2024/09/26 07:56:32 Could not read in control service: read unix /var/run/receptor/receptor.sock->@: use of closed network connection
WARNING 2024/09/26 07:56:32 Could not close connection: close unix /var/run/receptor/receptor.sock->@: use of closed network connection
INFO 2024/09/27 04:22:43 Known Connections:
INFO 2024/09/27 04:22:43 Routing Table:
INFO 2024/09/27 15:22:35 Detected Error: EOF for pod awx/automation-job-421-vvzff. Will retry 5 more times.
INFO 2024/09/27 15:22:35 Detected Error: EOF for pod awx/automation-job-421-vvzff. Will retry 4 more times.
INFO 2024/09/27 15:22:35 Detected Error: EOF for pod awx/automation-job-421-vvzff. Will retry 3 more times.
INFO 2024/09/27 15:22:36 Detected Error: EOF for pod awx/automation-job-421-vvzff. Will retry 2 more times.
INFO 2024/09/27 15:22:36 Detected Error: EOF for pod awx/automation-job-421-vvzff. Will retry 1 more times.
ERROR 2024/09/27 15:22:36 Error reading from pod awx/automation-job-421-vvzff: EOF
WARNING 2024/09/27 15:22:36 Could not read in control service: read unix /var/run/receptor/receptor.sock->@: use of closed network connection
WARNING 2024/09/27 15:22:36 Could not close connection: close unix /var/run/receptor/receptor.sock->@: use of closed network connection
^C
[root@server ~]#


[root@server ~]# kubectl -n awx logs -f deployment/awx-task -c awx-task
[wait-for-migrations] Waiting for database migrations…
[wait-for-migrations] Attempt 1
Instance Group already registered controlplane
Instance Group already registered default
Successfully registered instance None
(changed: True)
2024-09-26 04:22:50,311 INFO RPC interface ‘supervisor’ initialized
2024-09-26 04:22:50,311 INFO RPC interface ‘supervisor’ initialized
2024-09-26 04:22:50,311 CRIT Server ‘unix_http_server’ running without any HTTP authentication checking
2024-09-26 04:22:50,311 CRIT Server ‘unix_http_server’ running without any HTTP authentication checking
2024-09-26 04:22:50,311 INFO supervisord started with pid 7
2024-09-26 04:22:50,311 INFO supervisord started with pid 7
2024-09-26 04:22:51,313 INFO spawned: ‘superwatcher’ with pid 37
2024-09-26 04:22:51,313 INFO spawned: ‘superwatcher’ with pid 37
2024-09-26 04:22:51,315 INFO spawned: ‘dispatcher’ with pid 38
2024-09-26 04:22:51,315 INFO spawned: ‘dispatcher’ with pid 38
2024-09-26 04:22:51,317 INFO spawned: ‘callback-receiver’ with pid 39
2024-09-26 04:22:51,317 INFO spawned: ‘callback-receiver’ with pid 39
2024-09-26 04:22:51,318 INFO spawned: ‘wsrelay’ with pid 40
2024-09-26 04:22:51,318 INFO spawned: ‘wsrelay’ with pid 40
READY
2024-09-26 04:22:52,320 INFO success: superwatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-09-26 04:22:52,320 INFO success: superwatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-09-26 04:22:52,320 INFO success: wsrelay entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-09-26 04:22:52,320 INFO success: wsrelay entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-09-26 04:22:53,091 INFO [-] awx.main.commands.run_callback_receiver Callback receiver started with pid=39
2024-09-26 04:22:53,098 INFO [-] awx.main.dispatch Running worker dispatcher listening to queues [‘tower_broadcast_all’, ‘tower_settings_change’, ‘awx-task-55d8c9dd9f-4l7w6’]
2024-09-26 04:22:53,110 INFO [-] awx.main.wsrelay Active instance with hostname awx-task-55d8c9dd9f-4l7w6 is registered.
2024-09-26 04:22:53,111 INFO [-] awx.main.wsrelay Starting Websocket Relayer…
2024-09-26 04:22:53,177 WARNING [-] awx.main.tasks.system Rejoining the cluster as instance awx-task-55d8c9dd9f-4l7w6. Prior last_seen None
2024-09-26 04:22:53,187 INFO [-] awx.main.dispatch Dispatcher listener connection established
2024-09-26 04:22:53,187 WARNING [-] awx.main.dispatch.periodic Scheduler next run unexpectedly over 20 seconds in future: 25.91101598739624
2024-09-26 04:23:03,120 INFO [-] awx.main.wsrelay Adding {‘awx-web-59498d8589-98bph’} to websocket broadcast list
2024-09-26 04:23:03,125 INFO [-] awx.main.wsrelay Connection from awx-task-55d8c9dd9f-4l7w6 to 10.42.0.9 established.
2024-09-26 04:23:03,126 INFO [-] awx.main.wsrelay Starting producer for metrics
2024-09-26 04:23:12,672 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 waiting {“type”: “job”, “task_id”: 419, “state”: “waiting”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:23:12,789 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 pre run {“type”: “job”, “task_id”: 419, “state”: “pre_run”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:23:12,920 WARNING [-] awx.main.wsrelay Producer 10.42.0.9-control-limit_reached_1 not found.
2024-09-26 04:23:12,921 WARNING [-] awx.main.wsrelay Producer 10.42.0.9-jobs-status_changed not found.
2024-09-26 04:23:13,886 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 preparing playbook {“type”: “job”, “task_id”: 419, “state”: “preparing_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:23:13,934 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-control-limit_reached_1
2024-09-26 04:23:13,935 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-jobs-status_changed
2024-09-26 04:23:13,990 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 running playbook {“type”: “job”, “task_id”: 419, “state”: “running_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 04:23:16,473 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-job_events-419
2024-09-26 04:23:16,474 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-jobs-summary
2024-09-26 04:23:21,479 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2024-09-26 04:23:21,479 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2024-09-26 04:23:21,479 INFO success: callback-receiver entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2024-09-26 04:23:21,479 INFO success: callback-receiver entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2024-09-26 04:23:56,124 WARNING [6c472bf82d8b4b69865c5834341ddef9] awx.main.tasks.system Heartbeat skew - interval=62.9260, expected=60
2024-09-26 04:23:56,137 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.main.tasks.system Host awx-task-55d8c9dd9f-krms6 Automatically Deprovisioned.
2024-09-26 04:24:27,200 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 work unit id received {“type”: “job”, “task_id”: 419, “state”: “work_unit_id_received”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:27,212 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 work unit id assigned {“type”: “job”, “task_id”: 419, “state”: “work_unit_id_assigned”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:43,995 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 419
2024-09-26 04:24:43,996 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 post run {“type”: “job”, “task_id”: 419, “state”: “post_run”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:44,096 INFO [-] awx.analytics.job_lifecycle job-419 stats wrapup finished {“type”: “job”, “task_id”: 419, “state”: “stats_wrapup_finished”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:44,272 INFO [6c472bf82d8b4b69865c5834341ddef9] awx.analytics.job_lifecycle job-419 finalize run {“type”: “job”, “task_id”: 419, “state”: “finalize_run”, “work_unit_id”: “PEcwPhpM”, “task_name”: “Ping_test”}
2024-09-26 04:24:53,961 INFO [-] awx.main.wsrelay Producer 10.42.0.9-job_events-419 has no subscribers, shutting down.
2024-09-26 04:24:53,999 INFO [-] awx.main.wsrelay Producer 10.42.0.9-jobs-summary has no subscribers, shutting down.
2024-09-26 04:58:06,821 WARNING [-] awx.main.wsrelay Producer 10.42.0.9-control-limit_reached_1 not found.
2024-09-26 04:58:06,821 WARNING [-] awx.main.wsrelay Producer 10.42.0.9-jobs-status_changed not found.
2024-09-26 04:58:14,127 INFO [-] awx.main.wsrelay Producer 10.42.0.9-control-limit_reached_1 has no subscribers, shutting down.
2024-09-26 04:58:14,476 INFO [-] awx.main.wsrelay Producer 10.42.0.9-jobs-status_changed has no subscribers, shutting down.
2024-09-26 07:54:49,615 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-control-limit_reached_1
2024-09-26 07:54:49,615 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-jobs-status_changed
2024-09-26 07:54:59,980 INFO [f36c62303fa041869682851f5fc05f91] awx.analytics.job_lifecycle job-420 waiting {“type”: “job”, “task_id”: 420, “state”: “waiting”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 07:55:00,058 INFO [f36c62303fa041869682851f5fc05f91] awx.analytics.job_lifecycle job-420 pre run {“type”: “job”, “task_id”: 420, “state”: “pre_run”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 07:55:01,202 INFO [f36c62303fa041869682851f5fc05f91] awx.analytics.job_lifecycle job-420 preparing playbook {“type”: “job”, “task_id”: 420, “state”: “preparing_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 07:55:01,295 INFO [f36c62303fa041869682851f5fc05f91] awx.analytics.job_lifecycle job-420 running playbook {“type”: “job”, “task_id”: 420, “state”: “running_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-26 07:55:03,826 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-jobs-summary
2024-09-26 07:55:03,826 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-job_events-420
2024-09-26 07:56:14,216 INFO [f36c62303fa041869682851f5fc05f91] awx.analytics.job_lifecycle job-420 work unit id received {“type”: “job”, “task_id”: 420, “state”: “work_unit_id_received”, “work_unit_id”: “oDDot44k”, “task_name”: “Ping_test”}
2024-09-26 07:56:14,227 INFO [f36c62303fa041869682851f5fc05f91] awx.analytics.job_lifecycle job-420 work unit id assigned {“type”: “job”, “task_id”: 420, “state”: “work_unit_id_assigned”, “work_unit_id”: “oDDot44k”, “task_name”: “Ping_test”}
2024-09-26 07:56:30,762 INFO [f36c62303fa041869682851f5fc05f91] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 420
2024-09-26 07:56:30,764 INFO [f36c62303fa041869682851f5fc05f91] awx.analytics.job_lifecycle job-420 post run {“type”: “job”, “task_id”: 420, “state”: “post_run”, “work_unit_id”: “oDDot44k”, “task_name”: “Ping_test”}
2024-09-26 07:56:30,816 INFO [f36c62303fa041869682851f5fc05f91] awx.analytics.job_lifecycle job-420 finalize run {“type”: “job”, “task_id”: 420, “state”: “finalize_run”, “work_unit_id”: “oDDot44k”, “task_name”: “Ping_test”}
2024-09-26 07:56:30,869 INFO [-] awx.analytics.job_lifecycle job-420 stats wrapup finished {“type”: “job”, “task_id”: 420, “state”: “stats_wrapup_finished”, “work_unit_id”: “oDDot44k”, “task_name”: “Ping_test”}
2024-09-26 07:56:40,766 INFO [-] awx.main.wsrelay Producer 10.42.0.9-jobs-summary has no subscribers, shutting down.
2024-09-26 07:56:40,851 INFO [-] awx.main.wsrelay Producer 10.42.0.9-job_events-420 has no subscribers, shutting down.
2024-09-26 08:36:11,079 INFO [-] awx.main.wsrelay Producer 10.42.0.9-jobs-status_changed has no subscribers, shutting down.
2024-09-27 15:20:10,456 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-control-limit_reached_1
2024-09-27 15:20:10,457 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-jobs-status_changed
2024-09-27 15:21:03,493 INFO [82a6579395fb4a63a05542fadda79cff] awx.analytics.job_lifecycle job-421 waiting {“type”: “job”, “task_id”: 421, “state”: “waiting”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-27 15:21:03,561 INFO [82a6579395fb4a63a05542fadda79cff] awx.analytics.job_lifecycle job-421 pre run {“type”: “job”, “task_id”: 421, “state”: “pre_run”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-27 15:21:04,672 INFO [82a6579395fb4a63a05542fadda79cff] awx.analytics.job_lifecycle job-421 preparing playbook {“type”: “job”, “task_id”: 421, “state”: “preparing_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-27 15:21:04,726 INFO [82a6579395fb4a63a05542fadda79cff] awx.analytics.job_lifecycle job-421 running playbook {“type”: “job”, “task_id”: 421, “state”: “running_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-09-27 15:21:08,317 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-jobs-summary
2024-09-27 15:21:08,317 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-job_events-421
2024-09-27 15:22:17,408 INFO [82a6579395fb4a63a05542fadda79cff] awx.analytics.job_lifecycle job-421 work unit id received {“type”: “job”, “task_id”: 421, “state”: “work_unit_id_received”, “work_unit_id”: “D7fpATHD”, “task_name”: “Ping_test”}
2024-09-27 15:22:17,419 INFO [82a6579395fb4a63a05542fadda79cff] awx.analytics.job_lifecycle job-421 work unit id assigned {“type”: “job”, “task_id”: 421, “state”: “work_unit_id_assigned”, “work_unit_id”: “D7fpATHD”, “task_name”: “Ping_test”}
2024-09-27 15:22:34,709 INFO [82a6579395fb4a63a05542fadda79cff] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 421
2024-09-27 15:22:34,710 INFO [82a6579395fb4a63a05542fadda79cff] awx.analytics.job_lifecycle job-421 post run {“type”: “job”, “task_id”: 421, “state”: “post_run”, “work_unit_id”: “D7fpATHD”, “task_name”: “Ping_test”}
2024-09-27 15:22:34,816 INFO [82a6579395fb4a63a05542fadda79cff] awx.analytics.job_lifecycle job-421 finalize run {“type”: “job”, “task_id”: 421, “state”: “finalize_run”, “work_unit_id”: “D7fpATHD”, “task_name”: “Ping_test”}
2024-09-27 15:22:35,574 INFO [-] awx.analytics.job_lifecycle job-421 stats wrapup finished {“type”: “job”, “task_id”: 421, “state”: “stats_wrapup_finished”, “work_unit_id”: “D7fpATHD”, “task_name”: “Ping_test”}
2024-09-27 15:22:44,712 INFO [-] awx.main.wsrelay Producer 10.42.0.9-jobs-summary has no subscribers, shutting down.
2024-09-27 15:22:45,560 INFO [-] awx.main.wsrelay Producer 10.42.0.9-job_events-421 has no subscribers, shutting down.
^C
[root@server ~]#

@vinothravindhran
Hmm, no suspicious logs. Could you please share your K3s logs from journalctl -u k3s between 2024-09-26 04:22:00 and 2024-09-26 04:24:00?
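
For example (standard journalctl flags, nothing AWX-specific):

  journalctl -u k3s --since "2024-09-26 04:22:00" --until "2024-09-26 04:24:00" --no-pager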

Thank you very much @kurokobo.

I found the issue: the Git repo is causing it; its size is more than 4 GB. I switched to a small test repo, ran the job again, and it completed within 30 secs.

I believe we have finally fixed the issue by working through it collaboratively.

Log output
2024-10-01 01:57:24,753 INFO [c148bf5c538e444a8987801513312b4b] awx.analytics.job_lifecycle job-440 waiting {“type”: “job”, “task_id”: 440, “state”: “waiting”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-10-01 01:57:24,823 INFO [c148bf5c538e444a8987801513312b4b] awx.analytics.job_lifecycle job-440 pre run {“type”: “job”, “task_id”: 440, “state”: “pre_run”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-10-01 01:57:24,841 INFO [c148bf5c538e444a8987801513312b4b] awx.analytics.job_lifecycle job-440 preparing playbook {“type”: “job”, “task_id”: 440, “state”: “preparing_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-10-01 01:57:24,895 INFO [c148bf5c538e444a8987801513312b4b] awx.analytics.job_lifecycle job-440 running playbook {“type”: “job”, “task_id”: 440, “state”: “running_playbook”, “work_unit_id”: null, “task_name”: “Ping_test”}
2024-10-01 01:57:24,911 INFO [c148bf5c538e444a8987801513312b4b] awx.analytics.job_lifecycle job-440 work unit id received {“type”: “job”, “task_id”: 440, “state”: “work_unit_id_received”, “work_unit_id”: “iIAFRoKv”, “task_name”: “Ping_test”}
2024-10-01 01:57:24,919 INFO [c148bf5c538e444a8987801513312b4b] awx.analytics.job_lifecycle job-440 work unit id assigned {“type”: “job”, “task_id”: 440, “state”: “work_unit_id_assigned”, “work_unit_id”: “iIAFRoKv”, “task_name”: “Ping_test”}
2024-10-01 01:57:29,241 INFO [-] awx.analytics.job_lifecycle job-440 stats wrapup finished {“type”: “job”, “task_id”: 440, “state”: “stats_wrapup_finished”, “work_unit_id”: “iIAFRoKv”, “task_name”: “Ping_test”}
2024-10-01 01:57:29,428 INFO [c148bf5c538e444a8987801513312b4b] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 440
2024-10-01 01:57:29,429 INFO [c148bf5c538e444a8987801513312b4b] awx.analytics.job_lifecycle job-440 post run {“type”: “job”, “task_id”: 440, “state”: “post_run”, “work_unit_id”: “iIAFRoKv”, “task_name”: “Ping_test”}
2024-10-01 01:57:29,478 INFO [c148bf5c538e444a8987801513312b4b] awx.analytics.job_lifecycle job-440 finalize run {“type”: “job”, “task_id”: 440, “state”: “finalize_run”, “work_unit_id”: “iIAFRoKv”, “task_name”: “Ping_test”}
2024-10-01 01:57:29,862 INFO [-] awx.main.wsrelay Starting producer for 10.42.0.9-job_events-440
2024-10-01 01:57:50,500 INFO [-] awx.main.wsrelay Producer 10.42.0.9-job_events-440 has no subscribers, shutting down.

Hi @kurokobo

When I checked internally with my team, it turned out that the Git repo size will keep growing gradually.
Is there any way to optimize the Git repo reading/sync time for the project, so that the project is buffered/cached and only delta changes get pulled? Kindly advise on this.

Wow, this is one of the typical causes of this kind of issue. I thought you had already considered this point, so I didn’t mention it, but anyway, good catch :stuck_out_tongue:

Maybe not; there was a PR to specify depth: 1 for project updates, but it was rejected: [#13711] git depth set to 1 by dariodsa · Pull Request #13717 · ansible/awx · GitHub

Even if the depth could be set to 1, the process of transferring the entire project directory from the awx-task pod to the automation job pod would still occur, so it probably wouldn’t be very effective.
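
If you want to see how big that transfer actually is, you can check the size of the synced project inside the task container (a sketch; /var/lib/awx/projects is the AWX default projects path and may differ if you have customized it):

  kubectl -n awx exec deployment/awx-task -c awx-task -- du -sh /var/lib/awx/projects/*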


It may be worth considering why your git repository keeps growing and perhaps looking for ways around it. I was recently working on a project where an ansible-vault encrypted file kept changing and needed to be re-encrypted and committed to the repository; this made the repo balloon very quickly, since the encryption changes the entire file, which was fairly large to start with. So I just moved the file to a persistent volume that would be attached to the job template.
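
If you want to find out what is actually taking up the space, plain git plumbing works well (a sketch; run it from a clone of the repository):

  # Overall size of the object store
  git count-objects -vH
  # Largest blobs anywhere in the history, biggest first
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | sort -k3 -n -r | head -20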

Outside of your issue with Ansible, it also makes for an unpleasant developer experience if people have to download huge histories every time, and you are presumably consuming a lot of either on-prem (could be expensive) or cloud (super expensive) storage for a repository that should normally just contain code.


I agree that, rather than focusing on the AWX side, the correct approach is to improve how the repositories are used and designed.
