Unable to Authenticate to Satellite Datacenters Using Kerberos

Hiya everyone!

Let me preface this by saying I’m traditionally a Linux admin, and am nearly clueless about everything Microsoft, including Active Directory, so please bear with me here.

The problem is I’m trying to execute Ansible WinRM commands against one of our datacenters, let’s call it dc1, from a central datacenter, let’s call it dc0. dc0 contains our primary Domain Controller (PROD.COMPANY.COM). dc1 contains a secondary Domain Controller (DC1.PROD.COMPANY.COM). My understanding is that when logging into a machine in dc1, DC1.PROD.COMPANY.COM forwards the login request to PROD.COMPANY.COM, and everything’s dandy. I have Kerberos set up on my executor box (located in dc0), and I can successfully run kinit user@PROD.COMPANY.COM and get a Kerberos ticket. klist shows everything I would expect. However, when I run a command against a machine in the dc1 datacenter, I get back an error message that says:

Cannot contact any KDC for realm ‘DC1.PROD.COMPANY.COM’

Do I need to have access to that secondary domain controller as well? All the auth is happening in dc0 against PROD.COMPANY.COM, so is that ticket not good enough? I am able to log in to boxes in dc0, so that leads me to believe this is the case, but the IT guys are hesitant to open up that access unless I can confirm that, and even then they would really rather I proxy this request somehow (I see squid will let me do that, but I’d really rather not go through all that if I don’t have to, although we have an existing NGINX proxy, so if I can use that, that would be a pretty big win).

Here’s my Kerberos setup:

[libdefaults]
default_realm = PROD.COMPANY.COM

[realms]
PROD.COMPANY.COM = {
kdc = prod.company.com
}

[domain_realm]
.prod.company.com = PROD.COMPANY.COM
prod.company.com = PROD.COMPANY.COM

And I’ve tried appending:

.dc1.prod.company.com = PROD.COMPANY.COM

to that “domain_realm” list, but to no avail.
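For reference, here is a rough Python sketch of how a [domain_realm] lookup picks a realm for a hostname. It is a simplified model of the MIT krb5 behaviour as I understand it (most specific entry first), not the library's actual code, and the function name realm_for_host is made up for illustration.

```python
def realm_for_host(hostname, domain_realm):
    """Return the realm mapped to hostname, or None if nothing matches.

    Simplified model (an assumption, not the real library code): try the
    full host name first, then each parent domain, checking the ".domain"
    form before the bare "domain" form.
    """
    candidate = hostname.lower()
    while candidate:
        if candidate in domain_realm:
            return domain_realm[candidate]
        if candidate.startswith("."):
            # ".dc1.prod.company.com" -> "dc1.prod.company.com"
            candidate = candidate[1:]
        else:
            # "dc1.prod.company.com" -> ".prod.company.com"
            dot = candidate.find(".")
            candidate = candidate[dot:] if dot != -1 else ""
    return None


# The [domain_realm] section from the config above, as a dict.
mapping = {
    ".prod.company.com": "PROD.COMPANY.COM",
    "prod.company.com": "PROD.COMPANY.COM",
}
print(realm_for_host("hostname.dc1.prod.company.com", mapping))
```

Under this model the .prod.company.com entry already covers hostname.dc1.prod.company.com, so adding a .dc1.prod.company.com line that points at the same realm would not change the result of the lookup. That would suggest the DC1 realm in the error comes from somewhere other than this table, possibly from the KDC itself.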

The command I’m trying to run is:

ansible -i /opt/company/our-inventory-script osfamily_windows --limit datacenter_dc1 -m win_ping -vvv

And here’s the output:

hostname.dc1.prod.company.com | FAILED => {
“failed”: true,
“msg”: “Error! kerberos: ((‘Unspecified GSS failure. Minor code may provide more information.’, 851968), ("Cannot contact any KDC for realm ‘DC1.PROD.COMPANY.COM’", -1765328228)), ssh: 500 WinRMTransport. [Errno 111] Connection refused”
}
hostname.dc0.prod.company.com | SUCCESS => {
“changed”: false,
“ping”: “pong”
}

Any thoughts? I really feel like I’m running up against something silly, and I just don’t have the Kerberos/AD experience to catch it. Alternatively, does Ansible support NTLM at all? All my research says “not now, but maybe soon”, but that would get me up and going.

Thanks in advance!

Interesting. As far as I can see, you have the following:

Root domain/forest name: PROD.COMPANY.COM.
DC1 domain name: DC1.PROD.COMPANY.COM

Reading the output above, there’s a mention of both PROD.COMPANY.COM and DC0.COMPANY.COM so it’s hard to create a mental map of your layout.

It would be helpful to know the actual DC names of your domains. In your krb file you just reference by domain name, but for testing it can be better to point to a named DC, which would be something like “SERVER1.DC1.PROD.COMPANY.COM” for your DC1 domain.

In order to help, I’d need some more info:
-Domain Controller names as mentioned above
-Whether the DC1 domain is a child domain in the same forest (which implies an implicit 2-way trust between those domains), or if they are unrelated to each other
-The domain membership details of the nodes you want to manage

Your first step would be to obtain a Kerb ticket of a user that is a local admin in the DC1 domain (assuming that the target servers are members of that domain). You do need “local access” to the DC you wish to obtain a kerb ticket against, tho in your situation you should be able to allow a “DC0” user to be a local admin in the “DC1” domain, that should allow you to proceed without access from the Ansible control node to the DC1 domain controller.

Again, hard to help troubleshoot this without all the details.

Sorry for the delay, I’ve been out on vacation all last week.

First off, Trond, your blog post on Kerberos was what enabled me to get as far as I did, so thanks! When it comes to Windows + Ansible, you and Jhawksworth are my heroes.

Here’s the hierarchy as I understand it, I’ll obfuscate way less this time. COMPANY is the only part that’s changed:

We have a primary domain controller that contains all of our users. That is:

PROD.COMPANY.MGT

We have 5 Datacenters, each datacenter contains a domain controller that is

DATACENTER_NAME.PROD.COMPANY.MGT

The Domain Controllers in the example would become:

DC1 → DC1.PROD.COMPANY.MGT
DC0 → TX1.PROD.COMPANY.MGT

And those maintain a trust with:

PROD.COMPANY.MGT

which contains our user data. This DC also happens to live in the TX1 datacenter. So in this example there are 3 domain controllers that should have a trust established between them. As for whether they are in the same forest, I’m 90% sure that is the case, but because of a misguided attempt at separation of duties, I’m not able to actually SEE the AD configuration, and can only go off what they tell me.

So a total list of all Domain Controllers in the example would be:

  • PROD.COMPANY.MGT

  • DC1.PROD.COMPANY.MGT

  • TX1.PROD.COMPANY.MGT

From what I’ve gathered from talking to my Dad over vacation (He’s a Windows Admin), this is kind of a (to be nice) weird setup.

What I want to do is open a Kerb ticket with PROD.COMPANY.MGT, then authenticate to a machine that is bound to DC1.PROD.COMPANY.MGT. That appears to fail, since Kerberos then complains that there is no realm configured for DC1.PROD.COMPANY.MGT.

Now, here’s where stuff gets really dumb. I don’t have network access to DC1.PROD.COMPANY.MGT – which is why I want to just auth to PROD.COMPANY.MGT. I suspect this is the problem.

The reason I suspect that is the problem is because I DO have network access to TX1.PROD.COMPANY.MGT from my executor box, and everything works peachy when trying to run against a machine bound to that domain.
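One way to check that suspicion before asking for firewall changes is a quick reachability test from the executor box. The Python sketch below only checks whether a TCP connection to the standard Kerberos port (88) can be opened. The helper name kdc_reachable is made up for illustration, and since Kerberos normally uses port 88 over both UDP and TCP, a passing TCP check is only a partial indicator.

```python
import socket


def kdc_reachable(host, port=88, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Any connection error (refused, timed out, name not resolved) is
    reported as False; this does not speak the Kerberos protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling kdc_reachable("tx1.prod.company.mgt") and kdc_reachable("dc1.prod.company.mgt") from the executor box (substituting the real, un-obfuscated names) should show whether the two domains really differ in reachability.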

I think this will all kind of work once I get network access opened up, but at my job “DevOps” and traditional Operations are two different teams under different organizations, so I’m looking to see if my logic here is sound before I have to go fight for access.

Thanks again for your help. If you need any other information, please let me know!

Hey,

I’m not sure I fully understand all the domains you have set up in your organisation, but…

You can definitely configure multiple realms in your /etc/krb5.conf and use kinit to acquire tickets for more than one domain at a time. I’ve done this enough to be confident in saying it works, provided you have network connectivity.
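As a rough illustration of what a multi-realm file might look like, here is a small Python sketch that renders krb5.conf text for several realms. The function name render_krb5_conf and the KDC host names in the example call are placeholders, not anything from a real setup, and the generated [domain_realm] entries simply assume each realm's DNS domain is the realm name in lower case.

```python
def render_krb5_conf(default_realm, realms):
    """Build krb5.conf text for several realms.

    realms maps a realm name to a list of KDC host names.
    Assumption: each realm's DNS domain is its name in lower case.
    """
    lines = [
        "[libdefaults]",
        f"    default_realm = {default_realm}",
        "",
        "[realms]",
    ]
    for realm, kdcs in realms.items():
        lines.append(f"    {realm} = {{")
        for kdc in kdcs:
            lines.append(f"        kdc = {kdc}")
        lines.append("    }")
    lines += ["", "[domain_realm]"]
    for realm in realms:
        domain = realm.lower()
        lines.append(f"    .{domain} = {realm}")
        lines.append(f"    {domain} = {realm}")
    return "\n".join(lines) + "\n"


# Placeholder KDC names; replace with the real domain controllers.
print(render_krb5_conf(
    "PROD.COMPANY.MGT",
    {
        "PROD.COMPANY.MGT": ["kdc1.prod.company.mgt"],
        "DC1.PROD.COMPANY.MGT": ["kdc1.dc1.prod.company.mgt"],
    },
))
```

Each realm listed this way still needs its KDC to be reachable from the control node, which is the network connectivity caveat above.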

The other thing I’ll add here is… you might want, or in the end need, to organise things differently, for two reasons.

Ansible definitely works best when it is close, in networking terms, to the machines you want to control. I have managed to set up machines in a remote data center using ansible, but it was slow and I was at the mercy of other network traffic as to whether things would time out. This is going back a little while, and Ansible 2.0 transfers files to windows much more quickly, so perhaps less of a problem now, but I wound up having an ansible controller in each remote datacenter and have no intention of changing things.

In the end I had to arrange things this way as the network team had to say no to my request to open up connections to certain data centers, even though they understood why I wanted to do so. Since my ansible configuration is available from source control, and source control is available to all data centers, once the ansible controllers were set up it was pretty straightforward to shift to connecting to a remote ansible for each data center.

To get a bit philosophical for a minute, ssh keys and kerberos tickets are different in their intents. An ssh key lets you in to a given machine, but the kerb ticket tries to say ‘I am this identity, let me do whatever I am allowed to on any machine that belongs to this domain’. So it can be a powerful thing and therefore those who administer domains have to be perhaps more wary about what’s allowed than in the ssh key world, where a key is (I think) only good for one host.

Hope this helps,

Jon

Jon,

As far as my current organization of things, yeah I’m starting to see that, which is a bummer. The intent was to place all the Ansible commands behind a button in Rundeck, so we could safely start handing out access to the less operations-focused people in the company, and centralizing it does make life easier. From previous experience I’ve been able to get away with this at a way larger scale for Linux machines, but this is the first time I’ve had Windows in the mix and it seems to be fighting this architecture. sigh oh well, it was cool, but perhaps not as practical as I wanted. I do already have control machines in each datacenter, so I guess I’ll have Rundeck SSH between them and kick off the command – that’ll let me do each datacenter in parallel so the extra complexity gives me a little bit of a win either way.
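Roughly what I have in mind for the fan-out, as a Python sketch rather than anything Rundeck-specific: SSH to each datacenter's control machine in parallel and collect the exit codes. The function names are made up for illustration, and the controller host names would be whatever the real control machines are called.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_command(controller, remote_command):
    """Return the argv list for running remote_command on controller via ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", controller, remote_command]


def run_on_controllers(controllers, remote_command, runner=subprocess.run):
    """Run remote_command on every controller in parallel.

    Returns a dict mapping controller name to the ssh exit code.
    runner is injectable so the function can be exercised without ssh.
    """
    def run_one(controller):
        result = runner(
            build_ssh_command(controller, remote_command),
            capture_output=True,
            text=True,
        )
        return controller, result.returncode

    with ThreadPoolExecutor(max_workers=len(controllers) or 1) as pool:
        return dict(pool.map(run_one, controllers))
```

The remote command would be the same ansible invocation as before, run locally on each control machine, for example "ansible -i /opt/company/our-inventory-script osfamily_windows -m win_ping".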

This comparison between SSH keys and Kerb tickets is actually incredibly helpful. I kind of assumed they were more or less the same thing, which is why the networking limitation seemed so painful.

Well, looks like the answer to my original question may just be “Stop wanting to do that.” Thankfully with the holiday slowdown I can take some time and try and re-organize it into something more distributed.

I agree with what Jon said.

Though, the multi-domain thing with krb5 is interesting and something that I haven’t actually tested. However, if this truly is a single-forest model (or you have two-way trusts between your domains) there should be nothing stopping you from using kinit to grab a kerb ticket from the “DC0” domain and using that to control your “satellite” nodes - of course given that the “root domain identity” which the kerb ticket represents actually has (admin) rights on that node.

I’ll try and do some testing on this and get back to you.

Either this is a very strange design or you might be mixing domain names with domain controller names in your writeup. Either way, it makes it a little hard to keep track of how stuff is organized.

Sorry for the difficulty. To be clear though, I’m not mixing up domain names/DNS entries with domain controllers. Each datacenter really does have its own domain controller. It was set up this way in a misguided attempt at datacenter encapsulation. The person who architected this all has 6-7 years of experience but all at the same company where he was forced to build everything out, so he’s never seen how other people set things up. This leaves me with lots of really weird setups that I have to tip-toe around. It’s an interesting, but tiring, challenge.