Getting WinRM connection refused error randomly

Hi

I have a playbook that performs a few actions on a Windows machine. A few actions complete, and then the playbook exits with:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: WinRMTransportError: 500 WinRMTransport. [Errno 111] Connection refused

The machine OS is Windows Server 2012 R2 Standard edition.

The error occurs randomly; it is not tied to a particular action. Sometimes all actions go through fine. Is there a timeout value that needs to be increased?

Thanks
Deepa

Any help on this?

Thanks

Deepa

I have a couple of suggestions…

If you are using Kerberos, make sure your Ansible controller’s time is synchronized with your domain controllers, and check that nslookup and ping return the correct hostname and IP address. Kerberos depends on both an accurate clock and trustworthy DNS, so both are worth verifying.
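As a rough illustration of the DNS half of that check, here is a minimal Python sketch that resolves a hostname, reverses the resulting IP, and compares the short names. The function name check_dns_roundtrip and its injectable resolver parameters are my own invention for illustration; it does not check clock synchronization, and nslookup/ping remain the simpler tools for a one-off check.

```python
import socket


def check_dns_roundtrip(hostname, resolve=socket.gethostbyname,
                        reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, reverse that IP, and compare short names.

    resolve and reverse default to the standard socket lookups; they are
    parameters only so the logic can be exercised without real DNS.
    """
    ip = resolve(hostname)
    reverse_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, ips)

    def short(name):
        # Compare only the first label, case-insensitively.
        return name.split(".")[0].lower()

    return {
        "ip": ip,
        "reverse_name": reverse_name,
        "consistent": short(reverse_name) == short(hostname),
    }
```

Running it from the Ansible controller against the Windows host's name should give "consistent": True; a False result suggests forward and reverse DNS disagree about that machine.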

Otherwise I’d maybe try the following:

Use the Event Viewer on the destination Windows machine to see if anything there might explain the ‘Connection refused’ message.

Ensure you are using Ansible 2.0 or later, and pywinrm 0.11 (or later). Ansible 2.0 is much faster than 1.9.x when targeting Windows hosts, so it is less likely to hit timeouts.

Ensure your Ansible controller and Windows machine are ‘near’ to each other in terms of networking. It’s best to avoid lots of network hops between the Ansible controller and the machines you are controlling (general advice for Ansible, not specific to Windows).

Look for a firewall that could be chopping your connection.

I hope something from the above might help

Jon

Thanks for the inputs.

I checked the Event Viewer and found this entry: The WinRM service is not listening for WS-Management requests.

User Action
If you did not intentionally stop the service, use the following command to see the WinRM configuration:

winrm enumerate winrm/config/listener

Let me explain what I have done.

  1. We provisioned a machine, say hostname1, and ran ConfigureAnsibleForRemoting.ps1.
  2. Took a standard image of that machine.
  3. Provisioned a new machine, say hostname2, using that template.
  4. Now seeing this WinRM connection refused error randomly.
     As the Event Viewer suggested, I ran winrm enumerate winrm/config/listener.
     The listener on port 5986 shows its hostname as hostname1. Could that be the reason for these connection issues?
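For anyone wanting to automate that comparison across cloned machines, here is a hedged Python sketch that parses saved output of winrm enumerate winrm/config/listener and flags listeners whose Hostname differs from the machine's real name. The function names are hypothetical, and the parser assumes the usual "Key = Value" layout under a "Listener" heading; the exact output may differ on your system.

```python
def parse_listeners(text):
    """Split 'winrm enumerate' listener output into one dict per listener."""
    listeners = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == "Listener":
            # A new listener section starts here.
            current = {}
            listeners.append(current)
        elif current is not None and "=" in stripped:
            key, _, value = stripped.partition("=")
            current[key.strip()] = value.strip()
    return listeners


def stale_listeners(text, expected_hostname):
    """Return listeners that name a host other than expected_hostname.

    Listeners with no Hostname value are skipped.
    """
    expected = expected_hostname.lower()
    return [
        listener for listener in parse_listeners(text)
        if listener.get("Hostname") and listener["Hostname"].lower() != expected
    ]
```

Fed the output captured on hostname2, stale_listeners(output, "hostname2") would return the 5986 listener if it still carries hostname1 from the template.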

We configure Ansible remoting first and then take the image, because these machines are provisioned at runtime and we could not find a way to run the ConfigureAnsibleForRemoting PowerShell script at that point.

It could be. I’ve heard reports that cloning the box after enabling WSMan can cause issues if you don’t reconfigure things (see https://github.com/ansible/ansible/pull/15275), though I’ve not experienced this issue myself.

Regardless, it appears that the issue lies somewhere in the machine’s configuration, not on the Ansible side. Both the event logs and the occasional “Connection Refused” corroborate this: a refused connection is not a timeout, it’s the target machine actively saying “go away”.
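To see that distinction for yourself, a small Python sketch like the one below can classify a single TCP connection attempt to the WinRM port as open, refused, or timed out. The function name probe_winrm_port is made up for illustration, and the default port 5986 is taken from the listener mentioned earlier in the thread; it only tests the TCP layer, not WinRM authentication.

```python
import errno
import socket


def probe_winrm_port(host, port=5986, timeout=5.0):
    """Classify one TCP connection attempt to the given host and port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target answered and actively rejected the connection.
        return "refused"
    except socket.timeout:
        # Nothing answered within the timeout (dropped packets, firewall, etc.).
        return "timeout"
    except OSError as exc:
        # Any other failure, e.g. name resolution or unreachable network.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
```

Calling this in a loop against an affected machine while a playbook runs should show whether the listener intermittently goes away ("refused") or whether the network is dropping traffic ("timeout").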

I’m curious if it starts working smoothly if you use the version of the script in the pull request above on a machine that’s having trouble (run it with -ForceNewSSLCert true). Seems odd that it would work sporadically though.

If the updated script does fix the issue, you’d need to rig up a way to run it on first boot of the new machine (you didn’t specify your virtualization/cloning tech, but there are lots of ways to do that).

Thanks for the inputs. With the new script, connection failures have reduced significantly.