Intermittent Kerberos Issues and Unable to Fetch Files Greater than 500MB

Hi,
I’m having a couple of issues at the moment.
The main one is that running any playbook (or even a direct ansible command) against a domain-joined Windows server (2008 R2/2012/2012 R2) sometimes fails with the error below, yet works if run again straight away.

fatal: [SERVER.DOMAIN.NAME]: UNREACHABLE! => {"changed": false, "msg": "Kerberos auth failure: kinit: Cannot contact any KDC for realm 'SERVER.DOMAIN.NAME' while getting initial credentials", "unreachable": true}

I'm also getting the following error when trying to fetch a file over 500MB from a domain-joined Windows server:

Friday 17 November 2017 11:23:32 +0000 (0:00:00.382) 0:02:33.701 *******
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ansible/plugins/connection/winrm.py", line 513, in fetch_file
    result = self._winrm_exec(cmd_parts[0], cmd_parts[1:])
  File "/usr/lib/python2.7/site-packages/ansible/plugins/connection/winrm.py", line 296, in _winrm_exec
    self.protocol.cleanup_command(self.shell_id, command_id)
  File "/usr/lib/python2.7/site-packages/winrm/protocol.py", line 307, in cleanup_command
    res = self.send_message(xmltodict.unparse(req))
  File "/usr/lib/python2.7/site-packages/winrm/protocol.py", line 207, in send_message
    return self.transport.send_message(message)
  File "/usr/lib/python2.7/site-packages/winrm/transport.py", line 184, in send_message
    response = self.session.send(prepared_request, timeout=self.read_timeout_sec)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 625, in send
    r = dispatch_hook('response', hooks, r, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/hooks.py", line 31, in dispatch_hook
    hook_data = hook(hook_data, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests_kerberos/kerberos_.py", line 294, in handle_response
    r = self.handle_other(response)
  File "/usr/lib/python2.7/site-packages/requests_kerberos/kerberos_.py", line 217, in handle_other
    "{0}".format(response))
MutualAuthenticationError: Unable to authenticate <Response [200]>
fatal: [SERVER.DOMAIN.NAME]: FAILED! => {"failed": true, "msg": "failed to transfer file to \"/staging/500MB.zip\""}

msg: failed to transfer file to "/staging/500MB.zip"

Ansible Version:

ansible --version
ansible 2.3.2.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

Any suggestions?

Thanks,
Richie

I have not had that particular error, but I know from experience that Kerberos depends heavily on DNS working reliably and on clock synchronisation.

I suggest checking that your ansible controller can reliably resolve your domain controllers with nslookup. Something like the following (not tested):

# Look up both domain controllers every 3 seconds; stop with Ctrl-C.
while true
do
    nslookup domaincontroller1
    nslookup domaincontroller2
    sleep 3
done

This is just to see whether the names resolve every time. I would also experiment with pings and netstat -s to look for retransmissions or packet loss, in case part of your network is overloaded.
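If watching nslookup output by eye gets tedious, the same check could be counted automatically. The following is only a rough sketch, not something from the original setup: the function name count_resolution_failures is made up for illustration, and it goes through the system resolver (socket.getaddrinfo) rather than nslookup, so /etc/hosts entries are honoured and results may differ slightly from a pure DNS query.

```python
import socket
import time


def count_resolution_failures(hostnames, attempts=5, delay_sec=0.0):
    """Resolve each hostname `attempts` times and count the failures.

    Returns a dict mapping each hostname to the number of failed lookups.
    Hypothetical helper for illustration; works on Python 2.7 and 3.
    """
    failures = dict((name, 0) for name in hostnames)
    for _ in range(attempts):
        for name in hostnames:
            try:
                # Uses the system resolver, so /etc/hosts is consulted too.
                socket.getaddrinfo(name, None)
            except socket.gaierror:
                failures[name] += 1
        time.sleep(delay_sec)
    return failures
```

For example, count_resolution_failures(["domaincontroller1", "domaincontroller2"], attempts=100, delay_sec=3) would repeat the loop above 100 times. Any non-zero count points at intermittent name resolution, which would fit a kinit that only sometimes cannot find a KDC.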

It is also worth checking whether your domain controllers are busy with some kind of load (such as automatic maintenance, which loves to eat all the available CPU on single-core machines).

It is probably worth checking that the ansible controller and domain controller clocks are in sync, although if I recall correctly you get a different message when your tickets are outside their validity period.
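For a sense of how far out of sync the clocks can drift before Kerberos objects: both MIT Kerberos (the clockskew setting) and Active Directory default to a tolerance of 5 minutes. A minimal sketch of that comparison follows. The name clock_skew_ok is hypothetical, and how you read the domain controller's clock is left to you.

```python
from datetime import datetime, timedelta

# Default tolerance: MIT Kerberos "clockskew" is 300 seconds, and the
# Active Directory default policy is also 5 minutes.
MAX_SKEW = timedelta(seconds=300)


def clock_skew_ok(controller_time, dc_time, max_skew=MAX_SKEW):
    """Return True if the two clock readings differ by no more than max_skew.

    Hypothetical helper; both arguments are datetime objects in the same
    timezone (for example, both UTC).
    """
    return abs(controller_time - dc_time) <= max_skew
```

If the difference is well inside 5 minutes, clock drift is unlikely to be the cause and DNS or KDC reachability is the better lead.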

Sorry it's not a straight answer, but hopefully it gives you some ideas to investigate.

All the best,

Jon