Hi all, I need help with an AWX 19.2.1 Kerberos issue that is driving me crazy. My setup is K8s (one master and two worker nodes) on the 10.0.0.0/16 subnet, and my domain controller is on my internal network, 192.168.5.0/24, along with all my other Linux and Windows servers. AWX is set up to use the MetalLB load balancer and has an IP on the 192.168.5.0 subnet. I have no issues connecting to the web UI, and all my Linux tests and playbooks work fine against the Linux servers on my internal network. For a while now I have been trying to get Kerberos to work, but I keep getting the following error when I run win_ping against any of my Windows servers (all on the 192.168.5.0/24 subnet):
Kerberos auth failure for principal @ with pexpect: Cannot find KDC for realm "" while getting initial credentials
All the containers inside the AWX pod have krb5.conf set to use my domain (UPPERCASE), and they also have my internal DNS servers in resolv.conf. From the containers I have no problem pinging servers on my internal network (192.168.5.0/24), and even kinit @ works: I do get a Kerberos ticket. However, when I run win_ping from the web interface I get the error shown above.
The execution environment is version 0.4.0 (I also tried my own customized EE).
Other than using the MetalLB load balancer and bringing in a krb5.conf and a resolv.conf for DNS on my internal LAN, everything is pretty much standard.
Here is my krb5.conf file:
# To opt out of the system crypto-policies configuration of krb5, remove the
# symlink at /etc/krb5.conf.d/crypto-policies which will not be recreated.
I tried removing that line from the krb5.conf file, but I still get the same error when running win_ping.
One additional piece of information that probably doesn't make any difference: in my Kubernetes environment I use Cilium for the networking. Given that I do get a Kerberos ticket using kinit from the AWX EE container, and that everything else I run in the environment works as expected, I don't think the networking is the issue.
Not sure whether it helps but have a look at the following.
I had a similar issue in the past, but that had to do with the K8s cluster failing to resolve DNS names. The error also somewhat suggests this: AWX is unable to find the KDC for realm “DOMAIN”.
So I'd suggest checking that the K8s DNS is working and able to resolve FQDNs.
The second thing would be to check the krb5.conf file on the worker nodes where the AWX containers run. Try running kinit user@DOMAIN there and see whether it succeeds.
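If it helps with that second check, here is a rough Python sketch (my own, not anything AWX ships) that reads a krb5.conf and reports the default_realm and the kdc entries listed under [realms]. The realm EXAMPLE.COM and host dc01.example.com in the sample are made up, and the parser is deliberately simplistic: it ignores include/includedir directives and only understands plain realm blocks.

```python
import re
import sys
from pathlib import Path

# Made-up sample used when no file path is given on the command line.
SAMPLE = """
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = dc01.example.com
        admin_server = dc01.example.com
    }
"""


def parse_krb5_conf(text):
    """Return (default_realm, {realm: [kdc, ...]}) from krb5.conf text."""
    default_realm = None
    realms = {}
    section = None
    current_realm = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            section = header.group(1)
            current_realm = None
            continue
        if section == "libdefaults" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "default_realm":
                default_realm = value
        elif section == "realms":
            if line.endswith("{"):
                # Start of a realm block such as "EXAMPLE.COM = {".
                current_realm = line.split("=", 1)[0].strip()
                realms.setdefault(current_realm, [])
            elif line.startswith("}"):
                current_realm = None
            elif current_realm and "=" in line:
                key, value = (part.strip() for part in line.split("=", 1))
                if key == "kdc":
                    realms[current_realm].append(value)
    return default_realm, realms


def find_problems(default_realm, realms):
    """List things worth a second look. A missing realm block is only a
    problem when KDC discovery through DNS is not in use."""
    problems = []
    if not default_realm:
        problems.append("no default_realm in [libdefaults]")
    elif default_realm not in realms:
        problems.append(f"default realm {default_realm} has no block in [realms]")
    for realm, kdcs in realms.items():
        if not kdcs:
            problems.append(f"realm {realm} lists no kdc")
    return problems


if __name__ == "__main__":
    text = Path(sys.argv[1]).read_text() if len(sys.argv) > 1 else SAMPLE
    realm, realms = parse_krb5_conf(text)
    print("default_realm:", realm)
    print("realms:", realms)
    for problem in find_problems(realm, realms):
        print("note:", problem)
```

Run it with the path to the file as the only argument, on the worker node and inside the container, and compare what each one prints.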
My frustration is growing, having finally spent some time on this issue again. I have tried just about every suggestion, but I still cannot get this to work.
Since my initial post I have upgraded to AWX 19.5.0. I have played around with the krb5.conf settings based on many of the comments I received, but I still get the same error when I run win_ping (either through Run Command or a playbook) against my domain-joined Windows servers.
The error is still:
Kerberos auth failure for principal @ with pexpect: Cannot find KDC for realm "" while getting initial credentials
What does work is going into the EE pod and running kinit to get a Kerberos ticket with the same user I use in the GUI. If I then run “ansible <server.DOMAIN> -m win_ping -i ”, I get a successful result. But running the same thing through the web interface gives me the error above.
The success from the actual EE pod tells me DNS is working and no firewall is blocking anything.
Bottom line: I have no clue why this does not work using the web interface.
It would be greatly appreciated if anyone has suggestions or ideas on how to resolve this.
As I mentioned, running kinit and win_ping in the EE pod works like a charm, and when I look at my domain controller I see the events I expect in the event logs. However, when I run win_ping from the UI, nothing is logged in Event Viewer on the domain controller, not even errors.
My question is: what is the difference between the EE pod and the temp pod that AWX spins up on execution from the UI? Shouldn't the temp pod be the same as the EE pod with regard to krb5.conf, resolv.conf, installed Python modules, and so on?
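One way to answer that empirically is to fingerprint those files and modules in both places and compare the output. Below is a rough Python sketch under a few assumptions of mine: the two file paths are the usual defaults, and winrm, kerberos, and requests_kerberos are the import names of pywinrm, pykerberos, and requests-kerberos. Run it once by hand in the EE pod and once from a job (for example a playbook task that executes it on localhost, so it runs inside the temp pod).

```python
import hashlib
import importlib.util
import json
import sys
from pathlib import Path

# Assumed default locations; pass other paths on the command line if needed.
DEFAULT_PATHS = ["/etc/krb5.conf", "/etc/resolv.conf"]
# Assumed import names of the Python packages involved in WinRM + Kerberos.
DEFAULT_MODULES = ["winrm", "kerberos", "requests_kerberos"]


def fingerprint(paths):
    """Map each path to the SHA-256 of its content, or None if unreadable."""
    result = {}
    for name in paths:
        try:
            result[name] = hashlib.sha256(Path(name).read_bytes()).hexdigest()
        except OSError:
            result[name] = None  # missing, a directory, or not readable
    return result


def module_presence(names):
    """Map each top-level module name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def diff_fingerprints(a, b):
    """Return the sorted keys whose values differ between two snapshots."""
    return sorted(key for key in set(a) | set(b) if a.get(key) != b.get(key))


if __name__ == "__main__":
    snapshot = {
        "files": fingerprint(sys.argv[1:] or DEFAULT_PATHS),
        "modules": module_presence(DEFAULT_MODULES),
    }
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

If the hash of krb5.conf is None or different in the job output, the temp pod is not seeing the same file as the EE pod, which would fit the "Cannot find KDC" error.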
I really have no idea what to try next. My old AWX install (version 17, running on Docker) works just fine.
If anybody has any ideas on commands to run, config settings, and so on, let me know where and how to run them.
I am still having the same issue. From the EE pod I run kinit with my AD user, and then I can run a successful ad-hoc command from there against a domain-joined Windows server.
As soon as I use the web interface, I get the same error as always: it can't find the KDC.
And again, I have an installation of AWX 17 where it works like a charm.
The only thing I can think of is that DNS name resolution within the pod is somehow broken. I suggest trying ping and dig/nslookup inside your pod.
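If ping and dig are not installed in the image, a few lines of Python do a similar job. This is only a sketch: the hostnames you pass in are yours to supply, it only checks ordinary address lookups (not the _kerberos SRV records), and port 88 is used simply because it is the standard Kerberos port.

```python
import socket
import sys


def resolve_all(hosts, resolver=socket.getaddrinfo):
    """Resolve each host; return {host: [addresses]} or an error string.

    The resolver argument exists only so the function can be exercised
    without network access; by default it is socket.getaddrinfo.
    """
    results = {}
    for host in hosts:
        try:
            infos = resolver(host, 88, proto=socket.IPPROTO_TCP)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address for both IPv4 and IPv6.
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[host] = f"ERROR: {exc}"
    return results


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: pass one or more hostnames, e.g. your KDC's FQDN")
    else:
        for host, outcome in resolve_all(sys.argv[1:]).items():
            print(host, "->", outcome)
```

Pass the FQDN of the domain controller and of one Windows target, and run it both in the long-running pod and from a job so the two results can be compared.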
I am not sure. Given that I can successfully connect using Kerberos from the EE pod, why would the configuration of the temp pod spun up during execution be any different? Is there any way to configure AWX to use the EE pod that is already running instead of spinning up a temporary pod? If so, how can I do that?
I used the extra_volumes mount option in the AWX instance configuration.
The only difference with our EE pod is that it uses a custom Docker image instead of the default one (we had to do this to add a custom CA and LDAP utilities for our jobs), and the same image is used for the temp automation/EE pod as well.