We’re currently using Ansible 2.9.15 through Tower 3.8.0. We’re seeing an issue when targeting new builds of Windows Server 2016 over WinRM.
The issue happens near the start of the playbook, soon after a domain join and reboot step. The tasks usually run in the order below, although the exact point at which we see the problem varies:
win_domain_membership
win_reboot
win_firewall:
  state: disabled
  profiles:
    - Domain
    - Private
    - Public
win_regedit:
  path: HKLM:\SYSTEM\CurrentControlSet\services\TCPIP6\Parameters
  name: DisabledComponents
  data: 0xffffffff
  type: dword
  state: present
win_command: netsh int tcp set global chimney=disabled
win_command: netsh int tcp set global rss=disabled
win_regedit:
  path: HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
  name: EnableTCPA
  data: 0
  type: DWord
  state: present
Quite often Ansible will get stuck on the last task with:
[WARNING]: ERROR DURING WINRM SEND INPUT - attempting to recover:
ConnectionError ('Connection aborted.', OSError("(104, 'ECONNRESET')",))
There are quite a lot of changes happening on the server at this point, but I can’t see any obvious issue on the server side that lines up with the time of the error in Ansible.
The job never seems to recover from this, and I have to cancel it. If I re-run the job, everything works fine. I can see that a lot is changing in the network stack at that time, which may be the cause, but I’d like to narrow it down so that we can either handle the issue or, ideally, get Ansible to recover or retry on its own. Are there any suggestions on what the issue may be, or on how best to proceed with debugging?
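For context, this is the kind of workaround I have been considering but have not tested. It is only a sketch: the task names and every numeric value are placeholders I made up, and I am assuming that running the netsh change asynchronously and then waiting for WinRM to answer again would avoid the hung session.

```yaml
# Untested sketch; all delay/timeout values are assumed placeholders.
- name: Reboot after the domain join and allow extra settle time
  win_reboot:
    post_reboot_delay: 60   # assumed value

- name: Disable RSS without waiting on the current WinRM session
  win_command: netsh int tcp set global rss=disabled
  async: 120   # allow the command up to 2 minutes (assumed value)
  poll: 0      # start the command and move on without polling for the result

- name: Wait until the host responds over WinRM again
  wait_for_connection:
    delay: 10      # assumed value
    timeout: 300   # assumed value
```

I am also unsure whether lowering `ansible_winrm_operation_timeout_sec` and `ansible_winrm_read_timeout_sec` for these hosts would make the task fail quickly instead of hanging, so any guidance on that would be welcome too.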
Many thanks.
Best regards,
Panos