(I’ve posted a bit about this before, but I want to revisit it because it’s frustrating as I try to optimize my playbooks.)
I have a playbook that builds servers from VMware templates using vmware_guest and joins them to the domain with that same module. Once the servers are built, I have an extremely long “wait_for_connection”:
- name: Wait until server becomes available to connect
  wait_for_connection:
    delay: 900 # Wait 15 minutes (900 seconds) before trying
    sleep: 30 # After the initial delay, try every 30 seconds
    timeout: 1200 # Maximum amount of time to wait (1200 seconds = 20 minutes)
After this wait, I start running tasks on the new hosts. Initially those tasks run fine, but one by one, at random, the servers start failing with Kerberos errors. While this is happening I can confirm that I’m able to log in to these servers using the same credentials, so authentication doesn’t seem to be failing outside of Ansible; it only fails within Ansible for some reason.
The longer I wait after building the servers, the less likely this issue is to occur. It just seems insane that I have to keep adding more wait time.
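For reference, the alternative to a growing fixed delay would look something like the sketch below: a short initial delay followed by polling, then an explicit win_ping before the real tasks start. The delay, sleep, and timeout values are placeholders I picked for illustration, not tested numbers. This only confirms that a connection can be made at that moment, so it would not by itself explain or prevent the Kerberos rejections that show up later in the run.

```yaml
# Sketch only: the numbers below are assumed placeholders, not tested values.
- name: Wait until server becomes available to connect
  wait_for_connection:
    delay: 60        # wait 1 minute before the first attempt
    sleep: 30        # then retry every 30 seconds
    timeout: 1200    # stop retrying after 1200 seconds (20 minutes)

# Explicit check that WinRM authentication works right before the real tasks
- name: Confirm the host answers over WinRM
  win_ping:
```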
Here’s me running the playbook against 4 servers. Each task runs against all four servers, but the lines highlighted in red (the fatal/UNREACHABLE ones) show the Kerberos failures and the eventual collapse of the whole run as the hosts drop out one after another:
TASK [Registry fix to enable solution for CVE-2017-8529 Part 1] ****************
Monday 08 June 2020 16:32:22 +0000 (0:00:09.368) 0:33:29.081 ***********
changed: [server4.fqdn] => {“changed”: true, “data_changed”: false, “data_type_changed”: false}
changed: [server1.fqdn] => {“changed”: true, “data_changed”: false, “data_type_changed”: false}
changed: [server3.fqdn] => {“changed”: true, “data_changed”: false, “data_type_changed”: false}
changed: [server2.fqdn] => {“changed”: true, “data_changed”: false, “data_type_changed”: false}
TASK [Registry fix to enable solution for CVE-2017-8529 Part 2] ****************
Monday 08 June 2020 16:32:25 +0000 (0:00:03.635) 0:33:32.717 ***********
changed: [server1.fqdn] => {“changed”: true, “data_changed”: false, “data_type_changed”: false}
changed: [server4.fqdn] => {“changed”: true, “data_changed”: false, “data_type_changed”: false}
changed: [server2.fqdn] => {“changed”: true, “data_changed”: false, “data_type_changed”: false}
changed: [server3.fqdn] => {“changed”: true, “data_changed”: false, “data_type_changed”: false}
TASK [Configure UAC] *************************************************************
Monday 08 June 2020 16:32:29 +0000 (0:00:03.388) 0:33:36.105 ***********
fatal: [server3.fqdn]: UNREACHABLE! => {“changed”: false, “msg”: “kerberos: the specified credentials were rejected by the server”, “unreachable”: true}
changed: [server1.fqdn] => {“changed”: true, “data_changed”: true, “data_type_changed”: false}
changed: [server2.fqdn] => {“changed”: true, “data_changed”: true, “data_type_changed”: false}
changed: [server4.fqdn] => {“changed”: true, “data_changed”: true, “data_type_changed”: false}
TASK [Initialize Disk 1] *******************************************************
Monday 08 June 2020 16:32:32 +0000 (0:00:03.335) 0:33:39.440 ***********
changed: [server4.fqdn] => {“changed”: true, “cmd”: “Initialize-Disk -Number 1”, “delta”: “0:00:04.105311”, “end”: “2020-06-08 04:32:39.137372”, “rc”: 0, “start”: “2020-06-08 04:32:35.032060”, “stderr”: “”, “stderr_lines”: [], “stdout”: “”, “stdout_lines”: []}
changed: [server1.fqdn] => {“changed”: true, “cmd”: “Initialize-Disk -Number 1”, “delta”: “0:00:03.903042”, “end”: “2020-06-08 04:32:39.527549”, “rc”: 0, “start”: “2020-06-08 04:32:35.624506”, “stderr”: “”, “stderr_lines”: [], “stdout”: “”, “stdout_lines”: []}
changed: [server2.fqdn] => {“changed”: true, “cmd”: “Initialize-Disk -Number 1”, “delta”: “0:00:05.007749”, “end”: “2020-06-08 04:32:40.903429”, “rc”: 0, “start”: “2020-06-08 04:32:35.895680”, “stderr”: “”, “stderr_lines”: [], “stdout”: “”, “stdout_lines”: []}
TASK [Wait 15 seconds for disk initilization] **********************************
Monday 08 June 2020 16:32:41 +0000 (0:00:08.457) 0:33:47.898 ***********
Pausing for 15 seconds
(ctrl+C then ‘C’ = continue early, ctrl+C then ‘A’ = abort)
ok: [server1.fqdn] => {“changed”: false, “delta”: 15, “echo”: true, “rc”: 0, “start”: “2020-06-08 16:32:41.126472”, “stderr”: “”, “stdout”: “Paused for 15.0 seconds”, “stop”: “2020-06-08 16:32:56.126843”, “user_input”: “”}
TASK [Partition Disk 1] ********************************************************
Monday 08 June 2020 16:32:56 +0000 (0:00:15.051) 0:34:02.949 ***********
changed: [server4.fqdn] => {“changed”: true}
changed: [server1.fqdn] => {“changed”: true}
changed: [server2.fqdn] => {“changed”: true}
TASK [Format Disk 1 as E drive] ************************************************
Monday 08 June 2020 16:33:03 +0000 (0:00:06.888) 0:34:09.838 ***********
changed: [server4.fqdn] => {“changed”: true}
changed: [server1.fqdn] => {“changed”: true}
changed: [server2.fqdn] => {“changed”: true}
TASK [Stage AV Setup Binaries to e:\admin\binaries] ******************
Monday 08 June 2020 16:33:39 +0000 (0:00:24.463) 0:34:46.237 ***********
changed: [server4.fqdn] => {“changed”: true, “dest”: “e:\admin\binaries\AVAgent\”, “operation”: “folder_copy”, “size”: 27713762, “src”: “\\reposerver\Applications\Production\AV”}
changed: [server1.fqdn] => {“changed”: true, “dest”: “e:\admin\binaries\AVAgent\”, “operation”: “folder_copy”, “size”: 27713762, “src”: “\\reposerver\Applications\Production\AV”}
changed: [server2.fqdn] => {“changed”: true, “dest”: “e:\admin\binaries\AVAgent\”, “operation”: “folder_copy”, “size”: 27713762, “src”: “\\reposerver\Applications\Production\AV”}
TASK [Stage SecScan Setup Binaries to e:\admin\binaries] ***********************
Monday 08 June 2020 16:33:42 +0000 (0:00:03.402) 0:34:49.639 ***********
changed: [server1.fqdn] => {“changed”: true, “dest”: “e:\admin\binaries\SecScan\64bit”, “operation”: “folder_copy”, “size”: 23530139, “src”: “\\reposerver\Applications\Production\SecScan”}
changed: [server4.fqdn] => {“changed”: true, “dest”: “e:\admin\binaries\SecScan\64bit”, “operation”: “folder_copy”, “size”: 23530139, “src”: “\\reposerver\Applications\Production\SecScan”}
changed: [server2.fqdn] => {“changed”: true, “dest”: “e:\admin\binaries\SecScan\64bit”, “operation”: “folder_copy”, “size”: 23530139, “src”: “\\reposerver\Applications\Production\SecScan”}
TASK [Stage LAPS Setup Binaries to e:\admin\binaries] *************************
Monday 08 June 2020 16:33:46 +0000 (0:00:03.674) 0:34:53.314 ***********
fatal: [server1.fqdn]: UNREACHABLE! => {“changed”: false, “msg”: “kerberos: the specified credentials were rejected by the server”, “unreachable”: true}
changed: [server2.fqdn] => {“changed”: true, “dest”: “e:\admin\binaries\LAPSAgent\x64”, “operation”: “folder_copy”, “size”: 1019904, “src”: “\\reposerver\Applications\Production\Microsoft\LAPS”}
changed: [server4.fqdn] => {“changed”: true, “dest”: “e:\admin\binaries\LAPSAgent\x64”, “operation”: “folder_copy”, “size”: 1019904, “src”: “\\reposerver\Applications\Production\Microsoft\LAPS”}
TASK [Ensure LAPS is installed] ************************************************
Monday 08 June 2020 16:33:49 +0000 (0:00:03.291) 0:34:56.606 ***********
changed: [server4.fqdn] => {“changed”: true, “rc”: 0, “reboot_required”: false}
changed: [server2.fqdn] => {“changed”: true, “rc”: 0, “reboot_required”: false}
TASK [Ensure Agent is installed] **********************************************
Monday 08 June 2020 16:33:54 +0000 (0:00:04.571) 0:35:01.177 ***********
fatal: [server2.fqdn]: UNREACHABLE! => {“changed”: false, “msg”: “kerberos: the specified credentials were rejected by the server”, “unreachable”: true}
changed: [server4.fqdn] => {“changed”: true, “rc”: 0, “reboot_required”: false}
TASK [Ensure Agent is installed] ************************************************
Monday 08 June 2020 16:34:03 +0000 (0:00:09.009) 0:35:10.187 ***********
changed: [server4.fqdn] => {“changed”: true, “rc”: 0, “reboot_required”: false}
TASK [Ensure AV is installed] ******************************************
Monday 08 June 2020 16:34:08 +0000 (0:00:04.973) 0:35:15.161 ***********
fatal: [server4.fqdn]: UNREACHABLE! => {“changed”: false, “msg”: “kerberos: the specified credentials were rejected by the server”, “unreachable”: true}
I’m a bit new to the Linux world. Is it possible this is a bug in something on the Linux node I run Ansible/Ansible Tower from? I initially thought it was an AD replication issue, but I can authenticate fine against these servers through normal Windows/Microsoft processes within minutes of them being added to the domain.
Thanks in advance for any advice!