Unusual WinRM connection issue

I have a select few workgroup Windows Server 2012 R2 servers that give the following error:

<10.128.44.37> ESTABLISH WINRM CONNECTION FOR USER: ansible_user on PORT 5986 TO 10.128.44.37
server_101 | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: ('Connection aborted.', error(104, 'Connection reset by peer'))",
"unreachable": true
}

I am using NTLM with Ansible 2.1.0.0 and pywinrm [kerberos] 2RC4. I have verified that the port is open, recreated the listeners, and run curl against the server, which returns the expected 411 response.
Any ideas on further troubleshooting?

Hey Mike,

Unfortunately pywinrm currently has zero logging/diagnostic capabilities (something I’d like to correct for troubleshooting stuff like this). Meantime…

A couple of things to try:

  • Does it work with Basic auth and a local user on that same box?
  • Any chance you could run with Fiddler in the middle? Just run Fiddler on some Windows box, configure it to capture/decrypt HTTPS and to allow external connection, then on your Ansible controller, export HTTPS_PROXY=http://(ip-of-fiddler-box):8888/ and go watch the fun.

I’m mostly just curious where the connection reset is occurring, as there are numerous round-trips involved here (eg, is it NTLM auth failure, resource issue, or something else?).
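
Short of a full Fiddler trace, one way to narrow down where the reset happens is to separate the TCP connect from the TLS handshake on the control machine. The following is only a rough sketch: the function name probe_winrm_https is made up for illustration, and it turns off certificate validation because the connection in this thread already ignores it.

```python
import socket
import ssl


def probe_winrm_https(host, port=5986, timeout=5.0):
    """Return (stage, detail) for the first stage that fails.

    stage is "tcp" if the TCP connect fails, "tls" if the TLS handshake
    fails, and "ok" (with the negotiated TLS version) if both succeed.
    This helper is hypothetical and not part of pywinrm or Ansible.
    """
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp", repr(exc))

    # Validation is disabled for diagnosis only, mirroring the
    # "ignore certificate validation" setting mentioned in the thread.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return ("ok", tls.version())
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        raw.close()
        return ("tls", repr(exc))
```

Called as probe_winrm_https("10.128.44.37"), a "tcp" result would point at the network path, a "tls" result at the handshake, and "ok" would suggest the failure comes later, in the HTTP/NTLM exchange.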

Thanks,

-Matt

For testing locally I’m assuming you mean Test-WSMan -Authentication Basic -Credential ? I am currently connecting on 5986 with certificate validation ignored.
So in that case I would add the -UseSSL switch to Test-WSMan. Currently, running Test-WSMan -Authentication Basic -Credential gives:

Test-WSMAN : <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="2150858974" Machine="Server101"><f:Message>The WinRM client cannot process the request. Unencrypted traffic is currently disabled in the client configuration. Change the client configuration and try the request again. </f:Message></f:WSManFault>
At line:1 char:1

Normally I would say that means configuring AllowUnencrypted on the WinRM client; however, the other working systems do not have this configured.

Running Test-WSMAN -Authentication Negotiate -Credential “” -ComputerName localhost returns:

wsmid : http://schemas.dmtf.org/wbem/wsman/identity/1/wsmanidentity.xsd
ProtocolVersion : http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor : Microsoft Corporation
ProductVersion : OS: 6.3.9600 SP: 0.0 Stack: 3.0

I will try the Fiddler method shortly and return the results.

It seems a little odd, but with HTTPS_PROXY set to the Fiddler box, running win_ping against the problem server does not register any connection in Fiddler.

Sorry, by “local user” I just meant using a non-domain user via pywinrm/Ansible. But yeah, for Basic to work, you’d have to (temporarily) enable unencrypted auth with something like:

Set-Item WSMan:\localhost\Service\AllowUnencrypted $true

The HTTPS_PROXY not working seems odd; I use it dozens of times a day… Are you sure you’ve got it exported? The problem is almost certainly on the control-machine side, as it’d just hang if the envvar worked and Fiddler wasn’t configured properly.
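
A variable that is set in the shell but not exported is not inherited by child processes, so a quick check is to read it from a Python process started in the same shell. The sketch below is a minimal illustration; describe_https_proxy is a made-up helper name, and the address in the usage note is a placeholder, not the real Fiddler box.

```python
import os
from urllib.parse import urlsplit


def describe_https_proxy(environ=None):
    """Return the parsed HTTPS proxy setting visible to this process.

    Returns None when neither HTTPS_PROXY nor https_proxy is present,
    which is what an un-exported shell variable looks like from here.
    """
    env = os.environ if environ is None else environ
    value = env.get("HTTPS_PROXY") or env.get("https_proxy")
    if not value:
        return None
    parts = urlsplit(value)
    return {"scheme": parts.scheme, "host": parts.hostname, "port": parts.port}


if __name__ == "__main__":
    print(describe_https_proxy())
```

With HTTPS_PROXY=http://192.0.2.10:8888/ exported, this should report host 192.0.2.10 and port 8888; a result of None means the variable never reached the process.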

Actually I had to type

winrm set winrm/config/service '@{AllowUnencrypted="true"}'

before it would work for me.

You can also try running the PS script below on the hosts to ensure all the WinRM options needed for Ansible to connect have been taken care of.

https://github.com/ansible/ansible/blob/devel/examples/scripts/ConfigureRemotingForAnsible.ps1

I’m beginning to think this might be as a result of the problem servers being templated in VMWare perhaps?

Interesting.

This change was recently added so you can force the ConfigureRemotingForAnsible.ps1 to generate a new self-signed cert by running like this:

.\ConfigureRemotingForAnsible.ps1 -ForceNewSSLCert true

https://github.com/ansible/ansible/pull/15275

As it says in the PR: ‘This is necessary when a CN name changes and the self-signed cert is no longer valid and winRM is not allowing a connection because of winRM SSL validation errors.’

Hope this helps,

Jon

Thanks Jon, good to see it’s being well maintained. Unfortunately, I had already gone down the route of the self-signed cert via PowerShell.
I ran ConfigureRemotingForAnsible.ps1 just in case I had missed something. It seems like the same issue though:

<xx.xx.xx.xx> ESTABLISH WINRM CONNECTION FOR USER: ansible_user@DOMAIN on PORT 5986 TO xx.xx.xx.xx
Server.domain | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: ('Connection aborted.', error(104, 'Connection reset by peer'))",
"unreachable": true
}

Anything in the event logs? Since it seems to be a connection reset, I’d hope there might be a message on the Windows machine to say why.

If you are referring to cloning a Windows machine without proper sysprep usage then that’s very well possible. I remember seeing some WinRM blogs where people had problems due to duplicate SIDs … not 100% sure though.

Yes, I have seen the articles, but this was a properly sysprepped template. I have recreated the listeners and changed the self-signed cert, and it still yields the same result.

Jon, are there any particular logs I should focus on? The Windows Remote Management and Security logs don’t seem to show anything out of the ordinary.

Sorry, I don’t have a specific suggestion where to look. Sometimes I toss all the event logs and then poke things rather than filter for a specific event category.

One of my colleagues tells me there’s an rc6 for pywinrm 0.2 - might be worth trying that if you aren’t on it already.

Seriously, the best thing you could do is figure out why Fiddler isn’t working for you and get a trace… Knowing where it’s failing in the process can really narrow some things down.

I would troubleshoot the Windows side first. Are you able to PS-remote from a Windows node to the “problem” node using 5986 (SSL)?

09:12:58:4855 fiddler.network.https> HTTPS handshake to 10.128.44.38 (for #2) failed. System.ComponentModel.Win32Exception The client and server cannot communicate, because they do not possess a common algorithm

09:13:34:4067 fiddler.network.https> HTTPS handshake to 10.128.44.38 (for #3) failed. System.ComponentModel.Win32Exception The client and server cannot communicate, because they do not possess a common algorithm

09:17:40:7434 fiddler.network.https> HTTPS handshake to 10.128.44.38 (for #4) failed. System.ComponentModel.Win32Exception The client and server cannot communicate, because they do not possess a common algorithm

09:18:08:8209 fiddler.network.https> HTTPS handshake to 10.128.44.38 (for #5) failed. System.ComponentModel.Win32Exception The client and server cannot communicate, because they do not possess a common algorithm

09:21:23:7477 fiddler.network.https> HTTPS handshake to 10.128.44.38 (for #6) failed. System.ComponentModel.Win32Exception The client and server cannot communicate, because they do not possess a common algorithm

14:38:02:7271 fiddler.network.https> HTTPS handshake to 10.128.44.37 (for #2) failed. System.ComponentModel.Win32Exception The client and server cannot communicate, because they do not possess a common algorithm

I should probably add that, to be FIPS 140-2 compliant, the servers have the following:
Protocols: TLS 1.0, TLS 1.1, TLS 1.2
Ciphers Enabled: Triple DES 168, AES 128/128, AES 256/256
Hashes Enabled: SHA, SHA 256, SHA 384, SHA 512
Key Exchanges: PKCS, ECDH

SSL Cipher Suite Order changed:
TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P521,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA_P521,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA_P384,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA_P256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P521,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA_P521,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA_P384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA_P256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384_P521,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384_P384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256_P521,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256_P384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256_P256,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384_P521,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384_P384,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA_P521,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA_P384,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA_P256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256_P521,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256_P384,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256_P256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA_P521,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA_P384,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA_P256,TLS_DHE_DSS_WITH_AES_256_CBC_SHA256,TLS_DHE_DSS_WITH_AES_256_CBC_SHA,TLS_DHE_DSS_WITH_AES_128_CBC_SHA256,TLS_DHE_DSS_WITH_AES_128_CBC_SHA,TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_3DES_EDE_CBC_SHA
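
A list this long is hard to compare by eye against what a client offers. As a reading aid only, the sketch below groups a comma-separated suite order by key exchange/authentication family and cipher mode. The function name summarize_suites is invented for illustration, and the parsing assumes names shaped like the ones above (TLS_<family>_WITH_<cipher>, optionally followed by a curve suffix).

```python
def summarize_suites(order):
    """Map each key-exchange/auth family to the cipher modes listed for it.

    `order` is a comma-separated cipher suite string such as the
    "SSL Cipher Suite Order" value quoted in this thread.
    """
    families = {}
    for name in order.split(","):
        name = name.strip()
        if not name:
            continue
        family, _, rest = name.partition("_WITH_")
        if family.startswith("TLS_"):
            family = family[len("TLS_"):]  # e.g. ECDHE_RSA, DHE_DSS, RSA
        if "_GCM_" in rest:
            mode = "GCM"
        elif "_CBC_" in rest:
            mode = "CBC"
        else:
            mode = "other"
        families.setdefault(family, set()).add(mode)
    return families
```

Run against the order above, this makes it easier to see, for example, that the GCM suites appear only in the ECDHE_ECDSA family. Whether that has any bearing on the handshake failure is not established here.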

It might also help to add that all the servers it seems to be failing on are Windows Server 2012 R2 with IIS installed and a few sites with different SSL Certificates installed.

So courtesy of a few colleagues we have a solution. By specifying the FQDN in the inventory rather than the IP, and making sure the Ansible control machine could resolve the FQDN to the IP, the connection is now successful.
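
For anyone repeating this fix, the name-resolution half can be checked from the control machine before touching the inventory. This is a minimal sketch; resolve_ipv4 is a made-up helper, and server101.example.com in the usage note is a placeholder hostname, not one from this thread.

```python
import socket


def resolve_ipv4(name):
    """Return the sorted IPv4 addresses the local resolver gives for name.

    An empty list means the control machine cannot resolve the name.
    """
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})
```

If resolve_ipv4("server101.example.com") includes the address previously used in the inventory, the inventory entry can be switched to the FQDN.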