Read timeout when using WinRM to apply Windows patches on Azure

Hello guys,

I’m currently using Ansible 2.8.5 to apply Windows patches over WinRM. Here is the relevant part of my code:

    - name: Instalar actualizaciones 1   # "Install updates 1"
      win_updates:
        category_names: "{{ windows_update_categories }}"
      register: updates
      vars:
        ansible_winrm_operation_timeout_sec: 120
        ansible_winrm_read_timeout_sec: 150
      failed_when:
        updates.failed_update_count is defined and
        updates.failed_update_count > 0 and
        1 == 2

I’m trying to patch Windows Server 2012 R2 as part of an image build with Packer. These are the WinRM variables I use to connect to the Windows VM:

    ansible_user: packer
    ansible_connection: winrm
    ansible_winrm_server_cert_validation: ignore
    ansible_port: 5986
    ansible_winrm_connection_timeout: 1800
    ansible_winrm_operation_timeout_sec: 1800
    ansible_winrm_read_timeout_sec: 1800
    ansible_winrm_transport: ntlm
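One thing worth checking in the variables above: if I read the Ansible 2.8 winrm connection plugin correctly, `ansible_winrm_connection_timeout` overrides both pywinrm timeouts whenever it is set, using its value as the operation timeout and its value plus one as the read timeout. That would match the `read timeout=1801` in the error, and it would mean the task-level 120/150 values never take effect. Also, as I understand WinRM's long-polling, the operation timeout only bounds each individual poll (the client simply polls again when it expires), so very large values mostly delay how long it takes to notice that the server has stopped answering. A minimal sketch, assuming the 120/150 pair from the task is what you actually want and that `group_vars/windows.yml` is a hypothetical file name:

```yaml
# Hypothetical group_vars/windows.yml: same connection settings as above,
# but without ansible_winrm_connection_timeout, so these two values apply.
ansible_user: packer
ansible_connection: winrm
ansible_port: 5986
ansible_winrm_transport: ntlm
ansible_winrm_server_cert_validation: ignore
# Upper bound for each WinRM operation (one poll); the client re-polls after it.
ansible_winrm_operation_timeout_sec: 120
# Client-side HTTP read timeout; pywinrm requires it to be strictly greater
# than the operation timeout.
ansible_winrm_read_timeout_sec: 150
```

If the read timeout still fires with these values, the VM really went silent for the whole window, which points back at the guest or the network rather than at the timeout settings.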

I added the “timeout” variables recently to deal with the following error, which I keep getting at this same task:

    azure-arm: TASK [canvia.os-update : Instalar actualizaciones 1] ***************************
    azure-arm: fatal: [13.92.97.72]: UNREACHABLE! => {"changed": false, "msg": "winrm connection error: HTTPSConnectionPool(host='13.92.97.72', port=5986): Read timed out. (read timeout=1801)", "unreachable": true}

I have no issues running the same playbook against Windows Server 2019 on Azure, or against any Windows Server version (2012, 2016, 2019) on AWS. The problem only occurs with Windows Server 2012 and 2016 on Azure.

Has anybody experienced anything similar before? I hope someone can give me some ideas.

Thanks in advance

I have had issues with single-core machines when Windows updates include a .NET version upgrade. What happens is that ngen recompiles all the .NET code it can find and ties up an entire core until it has finished, which leaves no CPU time for WinRM. Dual-core boxes aren't as badly affected, since ngen only ties up one core.
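If ngen is the culprit, one option is to drain its queue explicitly between update rounds, so the recompilation runs in a task you wait on instead of in the background while the next `win_updates` call is talking to WinRM. A minimal sketch, assuming .NET Framework 4.x in its default location and that these tasks sit after an update round and its reboot; the task names are illustrative. It won't help if the stall happens inside the first `win_updates` run itself:

```yaml
# Sketch: force ngen to compile all queued native images now.
- name: Compile queued .NET native images (64-bit framework)
  win_command: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\ngen.exe executeQueuedItems
  # Run as an async job on the VM and poll every 30 s, rather than holding
  # a single WinRM command open for the whole compilation (up to 1 hour).
  async: 3600
  poll: 30

- name: Compile queued .NET native images (32-bit framework)
  win_command: C:\Windows\Microsoft.NET\Framework\v4.0.30319\ngen.exe executeQueuedItems
  async: 3600
  poll: 30
```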

Even if it is not ngen, it might be worth finding out what the VMs are doing at the moment your connection fails.
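A possible way to capture that, assuming you can reach the VM with your own inventory from a second terminal while the update task is hanging, is a small diagnostic play like the one below. The play and task names are hypothetical, and `hosts: all` should be replaced with whatever pattern matches the VM. If this play also fails to connect, that is itself a useful data point:

```yaml
# Hypothetical diagnostic play, run separately while win_updates is stuck.
- hosts: all
  gather_facts: false
  tasks:
    - name: List the ten processes with the most accumulated CPU time
      win_shell: |
        Get-Process |
          Sort-Object -Property CPU -Descending |
          Select-Object -First 10 -Property Name, Id, CPU |
          Format-Table -AutoSize | Out-String -Width 200
      register: top_procs

    # On 2012 R2 the Windows Update agent still writes a plain-text log here.
    - name: Read the last lines of the Windows Update log
      win_shell: Get-Content -Path C:\Windows\WindowsUpdate.log -Tail 50
      register: wu_log

    - name: Show both results
      debug:
        msg: "{{ top_procs.stdout_lines + wu_log.stdout_lines }}"
```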

Hope this helps,

Jon

Thank you so much, buddy. I’m actually using Standard B2ms VMs (2 vCPUs, 8 GB RAM), but I’ll try a larger machine next.

What seemed strange is that CPU utilization was very low (almost 0%) for several minutes before the Ansible failure appeared. I got this from the Azure metrics for the VM.

I’ll share my results once I try again.

I give up: even with a DS3_v2 VM (4 cores, 14 GB memory), it fails with the same error. I believe this is unrelated to system resources and may be a Windows- and/or Azure-specific bug, because it doesn’t happen on AWS.