I’m using AWX 22 on a K3s server with 4 vCPUs and 8 GB of memory.
I have a playbook that connects to my vCenter and gathers facts for all VMs (180); a second task then loops over the 180 VMs and gets information about their snapshots.
This specific task is extremely slow: it takes about 14 hours to check the snapshot state of the 180 VMs.
My playbook:
- name: Gather all registered virtual machines to retrieve uuid
  vmware_vm_info:
    hostname: 'my_vcenter'
    validate_certs: false
    show_tag: true
  delegate_to: localhost
  register: vminfo
- name: Gather snapshot information about the virtual machine in the given vCenter based on machines uuid
  community.vmware.vmware_guest_snapshot_info:
    hostname: 'my_vcenter'
    validate_certs: no
    datacenter: 'my_vcenter_dc'
    uuid: "{{ item.uuid }}"
  delegate_to: localhost
  loop: "{{ vminfo.virtual_machines }}"
  register: snapshot_info
  changed_when: '"snapshots" in snapshot_info.guest_snapshots'
How can I speed up the process?
I tweaked the automation container to have 1 GB of RAM instead of the default 100 MB, but it made no difference.
Does the VMware collection have a way to pull data about all snapshots, instead of just one VM's? If so, it would probably be faster to pull data for all snapshots at once. Then you don’t have to make a separate API call from Ansible to VMware for each VM.
Your playbook seems to be cycling through all VMs rather than a filtered list anyway, so pulling data about every snapshot would return the same set.
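If the collection turns out not to offer a bulk query, one way to test the "single connection" idea outside Ansible is a small pyVmomi script. This is only a rough sketch, not something from the collection: it assumes pyVmomi is installed, the function names `flatten_snapshots` and `collect_all_snapshots` are made up for illustration, and the host and credentials are placeholders. It logs in once and walks every VM in that one session. Each property read is still a round trip to vCenter, so it is mainly a way to compare timings against the per-VM loop.

```python
import ssl


def flatten_snapshots(tree):
    """Walk a snapshot tree depth-first and return one dict per snapshot."""
    found = []
    for node in tree or []:
        found.append({"name": node.name, "created": node.createTime})
        found.extend(flatten_snapshots(node.childSnapshotList))
    return found


def collect_all_snapshots(host, user, password):
    """Open a single vCenter session and return {vm_name: [snapshot dicts]}."""
    # Imported here so flatten_snapshots can be used without pyVmomi installed.
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Mirrors validate_certs: false in the playbook (assumption: lab setup).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    si = SmartConnect(host=host, user=user, pwd=password, sslContext=context)
    try:
        content = si.RetrieveContent()
        # Recursive view over every VirtualMachine below the root folder.
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True
        )
        result = {}
        for vm in view.view:
            # vm.snapshot is None for VMs that have no snapshots.
            if vm.snapshot is not None:
                result[vm.name] = flatten_snapshots(vm.snapshot.rootSnapshotList)
        view.Destroy()
        return result
    finally:
        Disconnect(si)
```

Calling `collect_all_snapshots('my_vcenter', 'user', 'password')` from the K3s node and from inside the automation container should show whether the slowness follows the container or the per-item module calls.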
I updated my playbook to build a list with less data, like this:
> - name: Gather all registered virtual machines to retrieve uuid
>   vmware_vm_info:
>     hostname: 'xxxxx'
>     validate_certs: false
>     show_tag: true
>   delegate_to: localhost
>   register: vminfo
>
> - name: Build uuid + guest_name list
>   set_fact:
>     fact_vm: "{{ fact_vm | default([]) + [{'uuid': item.uuid, 'guest_name': item.guest_name}] }}"
>   loop: "{{ vminfo.virtual_machines }}"
>   when: item.uuid is defined
>
> - name: Ensure UUID list is unique
>   set_fact:
>     fact_vm: "{{ fact_vm | unique }}"
>
> ##### GET INFORMATIONS ABOUT SNAPSHOT TASK
> - name: Gather snapshot information about the virtual machine in the given vCenter based on machines uuid
>   community.vmware.vmware_guest_snapshot_info:
>     hostname: 'xxxx'
>     validate_certs: no
>     datacenter: 'xxxx'
>     uuid: "{{ item.uuid }}"
>   delegate_to: localhost
>   loop: "{{ fact_vm }}"
>   register: snapshot_info
>   changed_when: '"snapshots" in snapshot_info.guest_snapshots'
>   tags: check
It is very fast when run directly from my node, but very slow from AWX in the automation container.
I ran it from AWX with increased verbosity to debug, with this result:
TASK [Gather snapshot information about the virtual machine in the given vCenter based on machines uuid] ***
task path: /runner/project/specifics_playbooks/ESX_Manage_Snapshots.yml:30
Using module file /usr/share/ansible/collections/ansible_collections/community/vmware/plugins/modules/vmware_guest_snapshot_info.py
Pipelining is enabled.
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: 1000
<127.0.0.1> EXEC /bin/sh -c '/usr/bin/python3 && sleep 0'
ok: [localhost] => (item={'uuid': '42xxx87-xxx-eccb-d790-dxxxx0f939e', 'guest_name': 'XXXX'}) => {
"ansible_loop_var": "item",
"changed": false,
"guest_snapshots": {},
I’m still pretty green with AWX and AAP/Tower, so I’m not sure how to get statistics and performance metrics specific to the K3s cluster components, but there are a couple of things I would check:
From the CLI of your K3s node, how long does it take to resolve the hostname of the vCenter server? Use the ping command, and estimate the time between pressing Enter and seeing the PING my_vcenter (1.1.1.1) 56(84) bytes of data line.
I doubt Ansible would be doing a DNS lookup on each iteration of the loop, but it’s worth checking if your DNS resolution is slow for some reason.
In the output of the ping above, do you see any packet loss?
What’s the average ping response time?
What do the Memory, CPU, and HDD utilization statistics look like when executing the playbook?
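To put a number on the DNS question rather than eyeballing the ping prompt, something like the following could be run both on the K3s node and inside the automation container. It is a minimal sketch using only the Python standard library; the function name `time_dns_lookup` and the port 443 are my own choices for illustration. Note that repeated lookups may be answered from a local resolver cache, so the first duration is usually the most telling.

```python
import socket
import time


def time_dns_lookup(hostname, attempts=5):
    """Resolve hostname several times; return each lookup's duration in seconds."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Raises socket.gaierror if the name cannot be resolved at all.
        socket.getaddrinfo(hostname, 443)
        durations.append(time.perf_counter() - start)
    return durations


# Example usage ('my_vcenter' is the placeholder hostname from the playbook):
# for i, seconds in enumerate(time_dns_lookup("my_vcenter"), start=1):
#     print(f"lookup {i}: {seconds:.3f}s")
```

If the lookups take seconds inside the container but milliseconds on the node, the container's DNS configuration would be the first thing I'd look at.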