AWX performance with Loop

Hello,

I’m using AWX 22 on a K3s server with 4 vCPUs and 8 GB of memory.

I have a playbook that connects to my vCenter and gathers facts about all VMs (180). A second task then loops over the 180 VMs and gets information about their snapshots.

This specific task is extremely slow: it takes about 14 hours to check the snapshot state of the 180 VMs.

My playbook:


    - name: Gather all registered virtual machines to retrieve uuid
      vmware_vm_info:
        hostname: 'my_vcenter'
        validate_certs: false
        show_tag: true
      delegate_to: localhost
      register: vminfo


    - name: Gather snapshot information about the virtual machine in the given vCenter based on machines uuid 
      community.vmware.vmware_guest_snapshot_info:
        hostname: 'my_vcenter'
        validate_certs: no
        datacenter: 'my_vcenter_dc'
        uuid: "{{ item.uuid }}"
      delegate_to: localhost
      loop: "{{ vminfo.virtual_machines }}"
      register: snapshot_info
      changed_when: '"snapshots" in snapshot_info.guest_snapshots'

How can I speed up the process?

I increased the automation container's memory from the default 100 MB to 1 GB, but it made no difference.

Thank you

Does the VMware collection have a way to pull data about the snapshots of all VMs, instead of just one VM at a time? If so, it would probably be faster to pull the data for all snapshots at once. Then you don’t have to make multiple API calls from Ansible to VMware.

Your playbook seems to be cycling through all VMs and not a filtered list anyway. So, pulling all data about snapshots is no different.

I followed this Ansible documentation, but it does not seem possible to select all VMs:
https://docs.ansible.com/ansible/latest/collections/community/vmware/vmware_guest_snapshot_info_module.html#ansible-collections-community-vmware-vmware-guest-snapshot-info-module

The strange thing is that when I run this playbook directly from the command line, it’s very fast!

When you run from CLI, is that CLI on the AWX server or CLI from a different system?

A different VM, but on the same VLAN, with the same vCenter and the same specs.

Hello!

I updated my playbook to build a smaller list containing only the uuid and guest name, like this:

>     - name: Gather all registered virtual machines to retrieve uuid
>       vmware_vm_info:
>         hostname: 'xxxxx'
>         validate_certs: false
>         show_tag: true
>       delegate_to: localhost
>       register: vminfo
> 
>     - name: Build uuid + guest_name list
>       set_fact:
>         fact_vm: "{{ fact_vm | default([]) + [{'uuid': item.uuid, 'guest_name': item.guest_name}] }}"
>       loop: "{{ vminfo.virtual_machines }}"
>       when: item.uuid is defined
> 
>     - name: Ensure UUID list is unique
>       set_fact:
>         fact_vm: "{{ fact_vm | unique }}"
> 
> ##### GET INFORMATIONS ABOUT SNAPSHOT TASK
>     - name: Gather snapshot information about the virtual machine in the given vCenter based on machines uuid 
>       community.vmware.vmware_guest_snapshot_info:
>         hostname: 'xxxx'
>         validate_certs: no
>         datacenter: 'xxxx'
>         uuid: "{{ item.uuid }}"
>       delegate_to: localhost
>       loop: "{{ fact_vm }}"
>       register: snapshot_info
>       changed_when: '"snapshots" in snapshot_info.guest_snapshots'
>       tags: check

It is very fast when run directly from my node, but very slow from AWX in the automation container :frowning:

I ran it from AWX with increased verbosity to debug, with this result:

TASK [Gather snapshot information about the virtual machine in the given vCenter based on machines uuid] ***
task path: /runner/project/specifics_playbooks/ESX_Manage_Snapshots.yml:30
Using module file /usr/share/ansible/collections/ansible_collections/community/vmware/plugins/modules/vmware_guest_snapshot_info.py
Pipelining is enabled.
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: 1000
<127.0.0.1> EXEC /bin/sh -c '/usr/bin/python3 && sleep 0'
ok: [localhost] => (item={'uuid': '42xxx87-xxx-eccb-d790-dxxxx0f939e', 'guest_name': 'XXXX'}) => {
    "ansible_loop_var": "item",
    "changed": false,
    "guest_snapshots": {},

I’m still pretty green with AWX and AAP/Tower, so I’m not sure how to get statistics and performance metrics specific to the K3s cluster components, but here are a few things I would check:

  1. From the CLI of your K3s node, how long does it take for you to resolve the hostname of the vCenter server? Use the ping command, and estimate the amount of time between pressing enter and seeing the PING my_vcenter (1.1.1.1) 56(84) bytes of data line.
    I doubt Ansible would be doing a DNS lookup on each iteration of the loop, but it’s worth checking whether your DNS resolution is slow for some reason.
  2. In the output of the ping above, do you see packet loss?
  3. What’s the average ping response time?
  4. What do the memory, CPU, and disk utilization statistics look like while the playbook is executing?
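
Since an execution environment image may not ship ping (as turned out to be the case later in this thread), the name-resolution check in item 1 can also be done with the python3 interpreter that the job output above shows inside the container. This is only a minimal sketch: `time_resolution` is a hypothetical helper name, port 443 is an assumption (the usual HTTPS port), and `my_vcenter` is the placeholder hostname from the playbook.

```python
import socket
import time


def time_resolution(hostname, attempts=3, resolver=socket.getaddrinfo):
    """Return how many seconds each lookup of `hostname` took."""
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # Port 443 is an assumption; only the name lookup is being timed.
            resolver(hostname, 443)
        except OSError:
            # A failed lookup still shows how long the resolver waited.
            pass
        timings.append(time.perf_counter() - start)
    return timings
```

Running `time_resolution("my_vcenter")` from a python3 prompt inside the automation container, and again on the node where the playbook is fast, gives two lists of timings to compare. Lookups that take seconds rather than milliseconds would point at DNS.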

Hello, regarding AWX metrics: there is this Project Discussions thread with details that could help investigate where @Gokusan31's issue might be:

Cheers!

Hello Guys!

Thanks a lot for your suggestions!!

Dustin is right: I have a DNS resolution problem in my containers.

As a workaround, I put the IP address of my vCenter in my playbook instead of the DNS name, and the run now takes 4 minutes instead of 14 hours!!!

ping is not present in my EE image.
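
For scale, those two run times can be converted into time per loop item. This is a rough calculation, assuming the 180 VMs mentioned at the top of the thread and that nearly all of the run time was spent in the snapshot loop:

```python
VM_COUNT = 180  # number of VMs stated in the first post


def seconds_per_item(total_seconds, items=VM_COUNT):
    """Average wall-clock seconds spent on each loop item."""
    return total_seconds / items


slow = seconds_per_item(14 * 3600)  # run using the DNS name (14 hours)
fast = seconds_per_item(4 * 60)     # run using the IP address (4 minutes)

print(f"with DNS name:   {slow:.0f} s per VM")    # 280 s per VM
print(f"with IP address: {fast:.2f} s per VM")    # 1.33 s per VM
print(f"speed-up:        {slow / fast:.0f}x")     # 210x
```

Roughly 280 seconds per VM is far more than a single API call should need, which is consistent with each iteration waiting on slow name resolution rather than on vCenter itself.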

Thank you

Hurray!!! I solved my first problem on the forums. Wish they had a badge or something for it. hahaha

Hahaha, congrats! Actually, there is a badge… For that, though, @Gokusan31 should mark the solved check :white_check_mark: on @Dustin's post that provided the solution :wink: