Automation node start up time tracking

What can delay automation node start-up time?

I have 2 instances of AWX up and running. When I run the same play in each instance the one completes the job in 7 seconds and the other in 14. In both instances, from the moment the first task of the play is run it’s only 2 seconds to completion. That means the first instance is spinning up the automation node in 5 seconds and the other in 12. I see seconds being taken by the spinup process on the slow instance and not on the fast instance. The 2 instances are installed on the same OpenShift cluster, both are 21.10.2, and both installed with the same scripts.

Slow:

023-07-11 18:15:32,679 INFO [.] awx.analytics.job_lifecycle job-8532 waiting
2023-07-11 18:15:32,849 INFO [.] awx.analytics.job_lifecycle job-8532 pre run
2023-07-11 18:15:34,762 INFO [.] awx.analytics.job_lifecycle job-8532 preparing playbook
2023-07-11 18:15:34,880 INFO [.] awx.analytics.job_lifecycle job-8532 running playbook
2023-07-11 18:15:38,016 INFO [.] awx.analytics.job_lifecycle job-8532 work unit id received
2023-07-11 18:15:38,067 INFO [.] awx.analytics.job_lifecycle job-8532 work unit id assigned
2023-07-11 18:15:46,340 INFO [.] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 8532
2023-07-11 18:15:46,356 INFO [.] awx.analytics.job_lifecycle job-8532 post run
2023-07-11 18:15:46,483 INFO [.] awx.analytics.job_lifecycle job-8532 finalize run
2023-07-11 18:15:47,349 INFO [-] awx.analytics.job_lifecycle job-8532 stats wrapup finished

Fast:

2023-07-11 18:17:53,513 INFO [.] awx.analytics.job_lifecycle job-147932 pre run
2023-07-11 18:17:53,623 INFO [.] awx.analytics.job_lifecycle job-147932 preparing playbook
2023-07-11 18:17:53,783 INFO [.] awx.analytics.job_lifecycle job-147932 running playbook
2023-07-11 18:17:53,838 INFO [.] awx.analytics.job_lifecycle job-147932 work unit id received
2023-07-11 18:17:53,872 INFO [.] awx.analytics.job_lifecycle job-147932 work unit id assigned
2023-07-11 18:17:59,640 INFO [.] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 147932
2023-07-11 18:17:59,650 INFO [.] awx.analytics.job_lifecycle job-147932 post run
2023-07-11 18:17:59,827 INFO [.] awx.analytics.job_lifecycle job-147932 finalize run
2023-07-11 18:17:59,881 INFO [-] awx.analytics.job_lifecycle job-147932 stats wrapup finished

Here are two more runs from the slow instance with DEBUG turned on. The run log of the first play says it started the first task at 18:55:10 and finished at 18:55:12 - 2 seconds. This log says it took 4 seconds to get the work unit ID, then only started receiving the job event objects at 18:55:10. The second log more clearly says the same thing. There’s a 6 second gap between the work unit being assigned and the first job events being received.

What might be happening between the work unit being assigned and the work working?

2023-07-11 18:54:56,953 INFO [.] awx.analytics.job_lifecycle job-8544 waiting
2023-07-11 18:54:56,962 DEBUG [.] awx.main.dispatch task 78e925d0-f53f-4951-b8ef-5d2cd9267008 starting awx.main.tasks.jobs.RunJob(*[8544])
2023-07-11 18:54:57,188 INFO [.] awx.analytics.job_lifecycle job-8544 pre run
2023-07-11 18:54:59,379 INFO [.] awx.analytics.job_lifecycle job-8544 preparing playbook
2023-07-11 18:54:59,504 INFO [.] awx.analytics.job_lifecycle job-8544 running playbook
2023-07-11 18:55:03,112 INFO [.] awx.analytics.job_lifecycle job-8544 work unit id received
2023-07-11 18:55:03,161 INFO [.] awx.analytics.job_lifecycle job-8544 work unit id assigned
2023-07-11 18:55:05,232 DEBUG [17d7f9bfa43d4a49a9ad275a95e42aed] awx.main.dispatch task 1b61d67b-9ec0-4885-85a7-8fc8f77be8dc starting awx.main.scheduler.tasks.task_manager(*[])
2023-07-11 18:55:05,234 DEBUG [17d7f9bfa43d4a49a9ad275a95e42aed] awx.main.dispatch task 864fffc8-dc43-4220-9031-a670d46ab384 starting awx.main.scheduler.tasks.dependency_manager(*[])
2023-07-11 18:55:05,234 DEBUG [17d7f9bfa43d4a49a9ad275a95e42aed] awx.main.dispatch task 2125d5a0-3119-45a1-92a3-50cf9d35c56d starting awx.main.analytics.analytics_tasks.send_subsystem_metrics(*[])
2023-07-11 18:55:10,183 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 18:55:10,192 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 18:55:10,684 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 18:55:10,686 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 18:55:10,689 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 18:55:10,689 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 18:55:12,441 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 18:55:12,441 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 18:55:12,442 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 18:55:12,443 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 18:55:12,581 DEBUG [-] awx.main.models.inventory Going to update inventory computed fields, pk=2
2023-07-11 18:55:12,595 DEBUG [-] awx.main.models.inventory Finished updating inventory computed fields, pk=2, in 0.014 seconds
2023-07-11 18:55:12,615 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(4)
2023-07-11 18:55:12,686 INFO [-] awx.analytics.job_lifecycle job-8544 stats wrapup finished
2023-07-11 18:55:12,695 INFO [.] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 8544
2023-07-11 18:55:12,710 INFO [.] awx.analytics.job_lifecycle job-8544 post run
2023-07-11 18:55:12,863 INFO [.] awx.analytics.job_lifecycle job-8544 finalize run
2023-07-11 18:55:12,870 DEBUG [.] awx.main.dispatch task bbc5d993-7860-4477-a2c9-43162fda3f4b starting awx.main.tasks.system.update_inventory_computed_fields(*[2])
2023-07-11 18:55:12,899 DEBUG [.] awx.main.models.inventory Going to update inventory computed fields, pk=2
2023-07-11 18:55:12,921 DEBUG [.] awx.main.models.inventory Finished updating inventory computed fields, pk=2, in 0.022 seconds
2023-07-11 18:55:13,249 DEBUG [.] awx.main.dispatch task 78e925d0-f53f-4951-b8ef-5d2cd9267008 starting awx.main.tasks.system.handle_work_success(*[])
...
2023-07-11 19:02:07,916 INFO [.] awx.analytics.job_lifecycle job-8546 waiting
2023-07-11 19:02:07,926 DEBUG [.] awx.main.dispatch task 683f076b-74e7-4f3f-966d-68e05ee50c69 starting awx.main.tasks.jobs.RunJob(*[8546])
2023-07-11 19:02:08,104 INFO [.] awx.analytics.job_lifecycle job-8546 pre run
2023-07-11 19:02:09,969 INFO [.] awx.analytics.job_lifecycle job-8546 preparing playbook
2023-07-11 19:02:10,072 INFO [.] awx.analytics.job_lifecycle job-8546 running playbook
2023-07-11 19:02:13,338 INFO [.] awx.analytics.job_lifecycle job-8546 work unit id received
2023-07-11 19:02:13,381 INFO [.] awx.analytics.job_lifecycle job-8546 work unit id assigned
2023-07-11 19:02:19,646 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 19:02:19,651 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 19:02:19,897 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 19:02:20,147 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 19:02:20,147 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 19:02:20,149 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 19:02:21,402 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 19:02:21,652 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 19:02:21,653 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 19:02:21,655 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 19:02:21,655 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2023-07-11 19:02:21,793 DEBUG [-] awx.main.models.inventory Going to update inventory computed fields, pk=2
2023-07-11 19:02:21,808 DEBUG [-] awx.main.models.inventory Finished updating inventory computed fields, pk=2, in 0.014 seconds
2023-07-11 19:02:21,826 DEBUG [-] awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(3)
2023-07-11 19:02:21,888 INFO [-] awx.analytics.job_lifecycle job-8546 stats wrapup finished
2023-07-11 19:02:21,905 INFO [.] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 8546
2023-07-11 19:02:21,921 INFO [.] awx.analytics.job_lifecycle job-8546 post run
2023-07-11 19:02:22,087 INFO [.] awx.analytics.job_lifecycle job-8546 finalize run
2023-07-11 19:02:22,100 DEBUG [.] awx.main.dispatch task 36fb7d09-fc17-452d-a1eb-a336303777ed starting awx.main.tasks.system.update_inventory_computed_fields(*[2])
2023-07-11 19:02:22,120 DEBUG [.] awx.main.models.inventory Going to update inventory computed fields, pk=2
2023-07-11 19:02:22,140 DEBUG [.] awx.main.models.inventory Finished updating inventory computed fields, pk=2, in 0.020 seconds
2023-07-11 19:02:22,469 DEBUG [.] awx.main.dispatch task 683f076b-74e7-4f3f-966d-68e05ee50c69 starting awx.main.tasks.system.handle_work_success(*[])

OK. I have my answer. The difference is the github repo behind the playbook. The repo of the slower instance is more than 40x larger than that of the faster instance. Fascinating. Must learn more about repo control.

Kevin

It’s just size. Sigh. There must be something I can do. My settings for the project are:
“scm_branch”: “”, “scm_refspec”: “”, “scm_clean”: false, “scm_track_submodules”: false, “scm_delete_on_update”: false, “credential”: 4, “timeout”: 0, “scm_revision”: “439c070dc3f65439aac96baf6d1a0766b5ec584f”, “last_job_run”: “2023-07-11T20:31:06.164818Z”, “last_job_failed”: false, “next_job_run”: null, “status”: “successful”, “organization”: 2, “scm_update_on_launch”: false, “scm_update_cache_timeout”: 0, “allow_override”: false, “custom_virtualenv”: null, “default_environment”: null, “signature_validation_credential”: null, “last_update_failed”: false, “last_updated”: “2023-07-11T20:31:06.164818Z”

curious, how much larger (total bytes) is your project A vs project B?

these files will have to be zipped up (in memory) and streamed via Receptor to the automation pod. This probably accounts for the timing difference, and would be expected in this case.

AWX Team

Sorry. This one slipped off the radar. The repo of the slower instance has about 40x as many documents. Doing compression on it and blowing away old branches did not help. It seems to be a pure size thing.

Kevin