Bug? Windows - Ansible uses LDAP user not ansible_ssh_user!?!

Hi guys,

Our control machine is configured so that we can log in to it with our LDAP (Windows domain) users. From there we run Ansible playbooks.

Here is some of the configuration we use:

[windows:vars]
ansible_ssh_user=[DeployUser]@[OurDomain]
ansible_ssh_pass=password
ansible_connection=winrm

The [DeployUser] is not the same as the LDAP user we use to log in to the Ansible control machine.

Yet when running PowerShell modules on a Windows machine, we noticed that Ansible uses the LDAP user we logged in to the control machine with, not the user configured in the hosts file as ansible_ssh_user.

From what I understand, Ansible should use ansible_ssh_user for whatever it does on the Windows machine, but for us it uses the LDAP user. Why?

Has anyone encountered this issue? Please help!

Thanks in advance

Not hit this. I’m not sure what you mean by ‘LDAP (Windows) users’, but if you are logging in to your Ansible controller using a Windows domain user and password, then chances are you are using Kerberos, and Ansible is then attempting to use your Kerberos credentials to talk to your Windows machines.

You don’t mention which OS your Ansible controller runs on, but if you have krb5-workstation (yum package, or the apt-get equivalent) installed, you can run the command

klist

which will show any Kerberos credentials you have. I suspect Ansible is using these.

If I’m right, then I think your options are:

a/ use a local user on your Windows machines (set ansible_ssh_user=some_local_user, i.e. not a user@domain)

b/ log in to your Ansible controller as a domain user with suitable privileges for whatever you need to do on your Windows machines, and set ansible_ssh_user=domain_user_you_logged_in_as@DOMAIN

Hope the above helps

Jon

The LDAP user is a user in Active Directory.

“and ansible is then attempting to use your kerberos credentials to talk to your windows machines.” - but we configured ansible_ssh_user to a specific user, and it is not using that user but the user logged in to the control machine… why is that?

The control machine is: Linux version 2.6.32-504.16.2.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) )

I think this is because, when you logged in to the machine, a Kerberos ticket was cached for the user you logged in as, as part of the login process.

When Ansible runs, the winrm connection plugin determines that you want to connect via Kerberos. (There is a bit of guessing going on here; from memory, it assumes you want Kerberos based on ansible_ssh_user containing an @ and the Python kerberos library being available.)

The actual authentication is then handled by the kerberos library, and since you have a Kerberos ticket (as a result of logging in), I suspect it is using that.
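To make the guess concrete, here is a hedged sketch of the heuristic as described above (this is an illustration, not the plugin’s actual source): any user of the form user@REALM is treated as a Kerberos principal.

```sh
#!/bin/sh
# Illustrative sketch only: mimic the described transport guess, where a
# user containing '@' selects kerberos. All names below are examples.
guess_transport() {
  case "$1" in
    *@*) echo kerberos ;;
    *)   echo plaintext ;;
  esac
}

guess_transport 'DeployUser@EXAMPLE.COM'   # prints: kerberos
guess_transport 'localadmin'               # prints: plaintext
```

In the real plugin the choice also depends on the Python kerberos library being importable; the sketch skips that part.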

If you can, I suggest you install krb5-workstation, log in as whichever user, then run klist to see what tickets are cached for your user.

If you want to manually create a ticket for the other user, you can do that like this:

kinit user@FULLY.QUALIFIED.DOMAIN

(note: the domain name must be in upper case).

Does that clarify things at all?

Yes it does, thank you.

Does this not seem like a bug?

It is definitely surprising, but I’m not sure it’s necessarily a bug in Ansible.

Bear with me, I’ll try and explain why:

If you regard Kerberos tickets (which let you access your Windows domain machines) as analogous to SSH keys (which let you access your Linux machines), then Ansible is at least consistent: in both cases it is up to you to acquire the necessary thing (Kerberos ticket, SSH key), and that is not something Ansible manages (unless you make your playbooks do so).

In your case you have (inadvertently) acquired a Kerberos ticket by logging in to your controller. That ticket is potentially a powerful thing, as (unlike an SSH key) it may give you access to any host belonging to that domain. When Ansible attempts to connect, it finds you have a ticket for the domain and so it uses it.

This is pretty much exactly what you would expect if (let’s pretend for a minute) your Ansible controller were a Windows machine. If you’d logged in to host A as user1@DOMAIN and tried to pick up a file from a share on host B, then Windows would use the user1@DOMAIN credentials to see if you have permission to access the share on host B.

Thinking about your scenario in the same way: you are logging in to your controller as user1@DOMAIN but then asking Ansible to access your Windows domain hosts as user2@DOMAIN.

I’m not saying what you are doing is wrong but I think Ansible should be flexible enough to cope with other scenarios as well.

For instance I value the ability to connect to more than one domain from my ansible controller, so for me logging in to the controller as a user on a particular domain isn’t a goal.

I get round the need to acquire Kerberos tickets before I start connecting by using a custom Ansible callback plugin (findable on this list or on ansible-devel, I think, if you are interested). The callback plugin has its limitations, though (the main one being that it only works with ansible-playbook, not the ansible command), and it would probably be a lot simpler to wrap calls to ansible and ansible-playbook in a shell script that calls kinit before you start.
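A minimal sketch of that wrapper idea, assuming a hypothetical DeployUser@EXAMPLE.COM principal. DRY_RUN=1 (the default here) just prints the commands so the sketch runs anywhere; set DRY_RUN=0 to actually acquire a ticket and run the playbook.

```sh
#!/bin/sh
# Sketch of a kinit-then-ansible-playbook wrapper. The principal is an
# example; real use prompts for the deploy user's password (or you could
# use a keytab with kinit -k -t).
PRINCIPAL='DeployUser@EXAMPLE.COM'

run() {
  # With DRY_RUN=1 (the default), just echo the command instead of running it.
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

run kinit "$PRINCIPAL"
run ansible-playbook "$@"
```

In real use you would invoke the script as, e.g., `DRY_RUN=0 ./apb.sh site.yml`, so every playbook run starts with a fresh ticket for the deploy user.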

So yes, it is surprising, but when you think about the way the individual pieces work, it’s really just a consequence of putting together technologies that weren’t designed to work together.

I’m curious to know if you just put
ansible_ssh_user: @DOMAIN
or even
ansible_ssh_user: @
in your vars whether the behaviour would be any different from what you are getting now.

That could actually be quite a nice way of reflecting that it is using whatever tickets are available, rather than what is specified.

By all means raise a bug report - I think we should at the minimum document how it behaves in your scenario. I’m just not sure ansible should be forced to acquire the kerberos ticket - what do others think?

Jon

First of all, thanks for the detailed explanation and thoughts :slight_smile:

In our situation we want to allow users to log in to the control machine with their Windows credentials so they can run Ansible playbooks, yet we want the actual work of Ansible to be done under a specific user within Active Directory: the “deployment” user for that environment.

In any case, I just did another run with only ‘@’ in the ansible_ssh_user variable. It works the same.

The bottom line is that Ansible can be told to use Kerberos with the ‘@’ character, yet it assumes the user supplied is the user holding a Kerberos ticket. It does not validate this, nor does it allow using a different user from the one that has an available ticket.

BTW, how do I open a bug ticket?

Ok, so in your case running kinit deployment@YOURDOMAIN beforehand would be a workaround.

Just curious if you put

ansible_ssh_user: @SOMEOTHERDOMAIN

(where SOMEOTHERDOMAIN is not the domain you logged in as), does it still behave in the same way?

Also, I think others might be interested in setting things up like yours, where the Ansible controller is effectively participating in the domain. If you have any instructions on how to set that up for your platform, please share (I’m trying to add more detail around Windows setup at the moment).

Some instructions on how to raise bug reports are here: http://docs.ansible.com/ansible/community.html#i-d-like-to-report-a-bug

The winrm connection plugin is part of Ansible core; make sure you use the issue template as well.

All the best

Jon

I’m experiencing a similar conundrum. Unfortunately, kdestroy -A followed by kinit with my deploy user’s password still does not resolve the issue. It looks like I have to stick with the local-credential authentication approach until I can pass domain creds other than my own.

Mike,

I wonder if you can get round the issue by setting the KRB5CCNAME environment variable before you run kinit as your deployment user, so that the credentials you log in to the controller as and the credentials Ansible uses are kept separate.
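A sketch of that idea, with a hypothetical cache path and principal: the export gives this shell its own credential cache, separate from the one created by your domain login.

```sh
#!/bin/sh
# Point this shell at its own Kerberos credential cache so the deploy
# user's ticket doesn't clobber (or get clobbered by) the login ticket.
# The path and principal below are examples, not real values.
export KRB5CCNAME='FILE:/tmp/krb5cc_ansible_deploy'

# kinit DeployUser@EXAMPLE.COM   # acquire the deploy user's ticket here
# ansible-playbook site.yml      # inherits KRB5CCNAME from this shell
echo "$KRB5CCNAME"               # prints: FILE:/tmp/krb5cc_ansible_deploy
```

Anything started from this shell (including ansible-playbook) would then use the deploy user’s cache, while your login session keeps its own.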

Jon

Hi everyone.
I work with Amir, who is the OP here.
The issue is a bit more complicated than that:

We have a few Windows Server 2008 R2 machines on which we would like to use the winrm module.
We have similar machines, some of which work and some of which don’t. I compared the build of the machines, the build of PowerShell, and even the local security policy; the result is still the same.
We use Kerberos and winbind on the controller machine, and since the winrm module works for Windows 2012 and some of the 2008 R2 machines with the domain username, I am guessing the issue is not on the controller.

I thought it was because it uses the ticket of the LDAP user I logged in to the controller machine with, but I am a member of the Administrators group on the target machine and it still doesn’t work.
If I create a local username and put it in the Administrators group, winrm works.

Here is a machine that works:

WINRM RESULT <Response code 0, out “C:\Users\deploy_rn\A”, err “”>
PUT /tmp/tmpe8SQvn TO C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\win_ping
WINRM PUT /tmp/tmpe8SQvn to C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\win_ping.ps1 (offset=0 size=2035)
WINRM PUT /tmp/tmpe8SQvn to C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\win_ping.ps1 (offset=2035 size=2035)
WINRM PUT /tmp/tmpe8SQvn to C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\win_ping.ps1 (offset=4070 size=2035)
WINRM PUT /tmp/tmpe8SQvn to C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\win_ping.ps1 (offset=6105 size=602)
PUT /tmp/tmpsiY4YG TO C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\arguments
WINRM PUT /tmp/tmpsiY4YG to C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\arguments (offset=0 size=2)
EXEC PowerShell -NoProfile -NonInteractive -ExecutionPolicy Unrestricted -File C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\win_ping.ps1 C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\arguments; Remove-Item "C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762" -Force -Recurse;
WINRM EXEC ‘PowerShell’ [‘-NoProfile’, ‘-NonInteractive’, ‘-EncodedCommand’, ‘UABvAHcAZQByAFMAaABlAGwAbAAgAC0ATgBvAFAAcgBvAGYAaQBsAGUAIAAtAE4AbwBuAEkAbgB0AGUAcgBhAGMAdABpAHYAZQAgAC0ARQB4AGUAYwB1AHQAaQBvAG4AUABvAGwAaQBjAHkAIABVAG4AcgBlAHMAdAByAGkAYwB0AGUAZAAgAC0ARgBpAGwAZQAgAEMAOgBcAFUAcwBlAHIAcwBcAGQAZQBwAGwAbwB5AF8AcgBuAFwAQQBwAHAARABhAHQAYQBcAEwAbwBjAGEAbABcAFQAZQBtAHAAXABhAG4AcwBpAGIAbABlAC0AdABtAHAALQAxADQANAAxADAAMgAwADkAMgA2AC4AOAAtADEANwA4ADIANAA3ADcANQA3ADQANQA4ADcANgAyAFwAXAB3AGkAbgBfAHAAaQBuAGcALgBwAHMAMQAgAEMAOgBcAFUAcwBlAHIAcwBcAGQAZQBwAGwAbwB5AF8AcgBuAFwAQQBwAHAARABhAHQAYQBcAEwAbwBjAGEAbABcAFQAZQBtAHAAXABhAG4AcwBpAGIAbABlAC0AdABtAHAALQAxADQANAAxADAAMgAwADkAMgA2AC4AOAAtADEANwA4ADIANAA3ADcANQA3ADQANQA4ADcANgAyAFwAXABhAHIAZwB1AG0AZQBuAHQAcwA7ACAAUgBlAG0AbwB2AGUALQBJAHQAZQBtACAAIgBDADoAXABVAHMAZQByAHMAXABkAGUAcABsAG8AeQBfAHIAbgBcAEEAcABwAEQAYQB0AGEAXABMAG8AYwBhAGwAXABUAGUAbQBwAFwAYQBuAHMAaQBiAGwAZQAtAHQAbQBwAC0AMQA0ADQAMQAwADIAMAA5ADIANgAuADgALQAxADcAOAAyADQANwA3ADUANwA0ADUAOAA3ADYAMgBcACIAIAAtAEYAbwByAGMAZQAgAC0AUgBlAGMAdQByAHMAZQA7AA==’]
WINRM RESULT <Response code 0, out “{ “changed”: f”, err “”>
rnpl-qa1-bes01 | success >> {
“changed”: false,
“ping”: “pong”
}

Here is one that doesn’t work:

ESTABLISH WINRM CONNECTION FOR USER: on PORT 5986 TO rnpl-qa1-sts01
ESTABLISH WINRM CONNECTION FOR USER: on PORT 5986 TO rnpl-qa1-sts02
WINRM CONNECT: transport=kerberos endpoint=https://rnpl-qa1-sts01:5986/wsman
WINRM CONNECT: transport=kerberos endpoint=https://rnpl-qa1-sts02:5986/wsman
rnpl-qa1-sts01 | FAILED => the username/password specified for this server was incorrect
rnpl-qa1-sts02 | FAILED => the username/password specified for this server was incorrect

As soon as I remove the @DOMAIN from the hosts file and use a local username, winrm works.
I am probably missing something silly, but I can’t find it.
Thanks

When you say ‘it works’, can you do more than a win_ping?

Server 2008 R2 comes with WMF 3.0, which had a bug when first released. It’s worth at least checking that you have either upgraded to WMF 4.0 or installed the hotfix (see the blue box here: http://docs.ansible.com/ansible/intro_windows.html#windows-system-prep).

I’m not convinced this is the cause of the problem, as if I recall correctly the symptoms were different for me, but it’s worth ruling out.

Jon

Ok, some updates on this, but first some information:

Domain controller: 172.16.10.6
Ansible controller: 172.16.19.1
Server that works (STS03): 172.16.19.41

Server that DOESN’T work (STS01): 172.16.1.114

Now, if I try with a domain username to access STS03 (the one that works) from Ansible, all is good.
If I try with a domain username to access STS01 (the one that doesn’t work), I get “server not found in kerberos database” and “username is incorrect”.

Now, if I take the server that doesn’t work and move it to the same network (172.16.19.42), next to the server that works, everything works on both servers.

As soon as it is in another VLAN, the domain username no longer works (a local username on the machine works anywhere).

So I suspected it was maybe something on the DC (in the firewall I have ANY to ANY on all four servers: DC, Ansible, STS01 and STS03).

I ran Wireshark on the DC and ran against both servers.

When Ansible runs against the server INSIDE the network (STS03), I see this:
172.16.10.6 172.16.19.41 TCP 66 kerberos > 55200 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

172.16.10.6 172.16.19.41 TCP 54 kerberos > 55200 [RST, ACK] Seq=1441 Ack=1419 Win=0 Len=0

So it seems the DC is talking directly to the destination server.

BUT if I run the same winrm against the server in another VLAN, I see this:
172.16.10.6 172.16.12.71 KRB5 176 KRB Error: KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN

172.16.10.6 172.16.12.71 TCP 54 kerberos > 60772 [RST, ACK] Seq=111 Ack=1441 Win=0 Len=0

It seems that when the destination server is in another VLAN, the Kerberos check is done against the controller machine and not the destination server.

Could I be on to something?