Issue on Ansible Playbook Run When Calling WinRM - ConnectionError ('Connection aborted.', OSError("(104, 'ECONNRESET')",))

We’re currently using Ansible 2.9.15 through Tower 3.8.0. We’re seeing an issue when targeting new builds of Windows Server 2016 through WinRM.

The issue happens at the start of the playbook, soon after a Join Domain + Reboot step. The tasks usually run in this order, although the exact point at which we see the problem varies:

- win_domain_membership

- win_reboot

- win_firewall:
    state: disabled
    profiles:
      - Domain
      - Private
      - Public

- win_regedit:
    path: HKLM:\SYSTEM\CurrentControlSet\services\TCPIP6\Parameters
    name: DisabledComponents
    data: 0xffffffff
    type: dword
    state: present

- win_command: netsh int tcp set global chimney=disabled

- win_command: netsh int tcp set global rss=disabled

- win_regedit:
    path: HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
    name: EnableTCPA
    data: 0
    type: DWord
    state: present

Quite often Ansible will get stuck on the last task with:

[WARNING]: ERROR DURING WINRM SEND INPUT - attempting to recover: ConnectionError ('Connection aborted.', OSError("(104, 'ECONNRESET')",))

There are quite a lot of changes going on on the server at this point, but I can’t see any obvious issues that line up with the time of the error in Ansible.

It never seems to recover from this, and I have to cancel the job. If I re-run the job, everything works fine. I can see that there’s a lot going on with the network stack at that time, which may be the cause, but I’d like to narrow it down so that we can either handle the issue or, ideally, get Ansible to recover or retry on its own. Are there any suggestions on what the issue may be, or on how best to proceed with debugging?

Many thanks.

Best regards,

Panos

Without any knowledge of Windows at all: a connection abort during or after a task that changes something with Tcpip\Parameters in its name sounds like sawing off the branch that you’re sitting on.
In that sense I’m surprised it doesn’t happen all the time.

Isn’t there a way to have such a change applied at next boot time?
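
One way to read that suggestion, as a sketch only: make the network stack changes before the reboot that the domain join needs anyway, so that win_reboot (which waits for the host to come back) absorbs any connection drop. This assumes the task order is otherwise free to change and that it is acceptable for the settings to apply only after the restart. It is untested against this setup, and the task names are made up for illustration.

```yaml
# Sketch only: tasks from the original post, reordered so that the
# network stack changes come before the single reboot.
- name: Disable IPv6 components (registry write)
  win_regedit:
    path: HKLM:\SYSTEM\CurrentControlSet\services\TCPIP6\Parameters
    name: DisabledComponents
    data: 0xffffffff
    type: dword
    state: present

- name: Set EnableTCPA to 0 (registry write)
  win_regedit:
    path: HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
    name: EnableTCPA
    data: 0
    type: DWord
    state: present

# ... the existing win_domain_membership task goes here, unchanged ...

- name: Reboot once and wait for the host to come back
  win_reboot:
```

The netsh commands are left out on purpose, since whether they can safely move ahead of the reboot is not clear from the thread.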

If I could narrow down exactly which change is causing the problem, I could wrap it up or run it separately. The problem is that it’s not easily reproducible, and the task it actually fails on (win_regedit) is pretty innocuous.

Getting Ansible to either fail or carry on would also be a start, but the issue is that it just sits there waiting for something.
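
Something along these lines is what I have in mind, as an untested sketch: run a suspect command with async and poll: 0 so the task does not hold the WinRM session open while the network stack changes, then use wait_for_connection to confirm the host is reachable before carrying on. The group name and all timeout values are assumptions, and whether the WinRM timeout variables actually bound the hang described above is something I have not verified.

```yaml
- hosts: new_windows_builds        # hypothetical group name
  vars:
    # WinRM connection plugin timeouts; the values are guesses. The read
    # timeout has to be larger than the operation timeout.
    ansible_winrm_operation_timeout_sec: 60
    ansible_winrm_read_timeout_sec: 70
  tasks:
    - name: Change the TCP setting without polling for the result
      win_command: netsh int tcp set global rss=disabled
      async: 60      # allow the background job up to 60 seconds
      poll: 0        # start the job and move on immediately

    - name: Wait until the host answers over WinRM again
      wait_for_connection:
        delay: 10      # give the network stack time to settle first
        timeout: 300   # fail the task if the host is not back by then
```

With poll: 0 the result of the netsh command is not checked, so a failure there would go unnoticed unless a later task verifies the setting.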

Best regards,

Panos