vmware_guest Trying to deploy 2 VMs simultaneously. One will almost always fail

Hi,

We are experimenting with Ansible to deploy VMs and I am not sure what the problem is, but when I try to deploy 2 VMs at the same time, one fails 99% of the time while the other completes.

The one that has the problem is actually stuck in the customization phase, and then the playbook fails as Ansible tries to continue with the following plays, but of course those fail since the OS is stuck in la-la land. The OS is Windows 2016.

Strangely, I didn’t have this problem 3 weeks ago, but in the meantime we updated to Ansible 2.6.1. I downgraded to 2.5 but the problem stayed.

Also, if I try to deploy one VM, then the other, it works fine, so the problem really only appears when they are deployed at the same time.

The initial deployment from the template works, but the customization, where the VM is given its real IP and DNS and joins the domain, is where it fails.

And now, the real kicker: if I try 3 VMs at the same time, it goes through without problem and all 3 complete successfully…(!)

I’m baffled.

When it fails, in VMware there is always an alert on the problematic VM about a VM MAC conflict. But when I look at its MAC address, it is not a duplicate.

Any leads?

Here is the playbook:

- hosts: all
  gather_facts: false

  vars_prompt:
  - name: "notes"
    prompt: "VM notes"
    private: no
    default: "Deployed with ansible"

  roles:
  - deploy_vmware_guest
  - activate_winrm
  - config_lcm

Here is the main task.

# get date
- set_fact:
    creationdate: "{{ lookup('pipe', 'date \"+%Y/%m/%d %H:%M\"') }}"

- name: Create a VM from a template
  delegate_to: localhost
  vmware_guest:
    hostname: '{{ vsphere_host }}'
    username:
    password: "{{ admin_pass }}"
    validate_certs: no
    esxi_hostname:
    datacenter: SECURSANTE-TECNO
    folder: testvm
    name: '{{ inventory_hostname }}'
    annotation: "{{ notes }} - {{ creationdate }} - {{ inventory_hostname }}"
    state: poweredon
    template: TMPL-W2016-STD-18-02-07_L12018
    disk:
    - size_gb: 80
      type: thin
      datastore: '{{ datastore_1 }}'
    - size_gb: 20
      type: thin
      datastore: '{{ datastore_1 }}'
    - size_gb: 4
      type: thin
      datastore: '{{ datastore_1 }}'
    - size_gb: 4
      type: thin
      datastore: '{{ datastore_1 }}'
    - size_gb: 4
      type: thin
      datastore: '{{ datastore_1 }}'
    - size_gb: 4
      type: thin
      datastore: '{{ datastore_1 }}'
    hardware:
      memory_mb: 2048
      num_cpus: 2
    networks:
    - name: DVSTECA-531-S4-TECNOA-EXP-ACC
      ip: '{{ vm_ip }}'
      netmask: 255.255.255.0
      gateway: 172.24.131.254
      dns_servers:
      - 172.24.134.8
      - 172.24.134.9
      #mac: '{{ vm_mac }}'
    customization:
      dns_servers:
      - 172.24.134.8
      - 172.24.134.9
      dns_suffix: "{{ domain_t_suffix }}"
      domain: "{{ domain_t }}"
      password: "{{ admin_pass }}"
      joindomain: "{{ domain_t }}"
      domainadmin:
      domainadminpassword: "{{ admin_pass }}"
      hostname: '{{ inventory_hostname }}'
    wait_for_ip_address: yes
  vars:
    admin_pass:

- name: sleep for 300 seconds and continue with play  # to give enough time to join the domain since Ansible doesn't wait for that
  wait_for: timeout=300
  delegate_to: localhost
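A side note on that last sleep task: a fixed 300 seconds works, but polling a port on the new VM might be less brittle. This is only a sketch and assumes the guest already answers on the WinRM port (5986 for HTTPS, 5985 for plain HTTP), which depends on how the template is set up; it also only proves the port answers, not that the domain join has finished:

- name: wait for the guest to answer on WinRM instead of sleeping a fixed time
  wait_for:
    host: "{{ vm_ip }}"   # IP assigned during customization
    port: 5986            # assumption: WinRM over HTTPS; use 5985 for plain HTTP
    delay: 60             # give the guest a head start before polling
    timeout: 600
  delegate_to: localhost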

To anyone else who might encounter this problem, I found this thread: https://groups.google.com/forum/#!searchin/ansible-project/vm$20stuck|sort:date/ansible-project/H_0FhXkm2ns/-TvY5Fc1AgAJ

which mentions this article: http://www.hurryupandwait.io/blog/getting-readytroubleshooting-unattended-windows-installation

It helped me debug the problem, and it is not related to Ansible.

My best guess is that the MAC conflict happens at the worst possible time and prevents correct domain discovery. Maybe there is a way to include a small delay in the task to ensure the MAC address change happens first, but that would be tough.
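One idea I have not tested, just a sketch: the task above already has a commented-out mac parameter, so a unique, manually assigned MAC per host could be set from the inventory to rule out the conflict entirely. The value below is only a placeholder inside VMware's manually assignable range (00:50:56:00:00:00 to 00:50:56:3F:FF:FF), and the host_vars placement is just an example:

# host_vars/<inventory_hostname>.yml
vm_mac: "00:50:56:3f:00:01"

# and in the vmware_guest task, uncomment the mac line:
networks:
- name: DVSTECA-531-S4-TECNOA-EXP-ACC
  mac: '{{ vm_mac }}'
  ip: '{{ vm_ip }}'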

Thanks.

After reading what seemed like a bazillion support threads on VMware, I managed to make the sequence work with this workaround:

1- Make sure the NIC in the template is "disabled" (disconnected).
2- In the playbook, in your vmware_guest task, make sure to add start_connected: yes to the network entry under networks:

networks:
- start_connected: yes

The sequence should work fine after that. The slight delay to reconnect the NIC seems to be enough to make sure the guest OS does not start trying to join the domain at the same time as changes are being made to the networking part of the guest.
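For context, the networks section of the vmware_guest task above ends up looking roughly like this with the workaround in place (same values as my task, just with start_connected added):

networks:
- name: DVSTECA-531-S4-TECNOA-EXP-ACC
  start_connected: yes
  ip: '{{ vm_ip }}'
  netmask: 255.255.255.0
  gateway: 172.24.131.254
  dns_servers:
  - 172.24.134.8
  - 172.24.134.9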