Swapping credentials

Hi Ansible Community!

I have a playbook running against Windows servers. In the first play I connect as the local administrator, and in the second play I connect as a domain user. I’m not sure how to do this. I’m running from Ansible Tower, so the domain user is applied as the machine credential.

How do I tell the second play to use the domain account (machine credentials) after telling the first play to use the local admin account? Any help appreciated, I’m pretty new to Ansible.

- hosts: serverA.internal.domain
  vars:
    ansible_user: Administrator
    ansible_password: XXXXXXXXXXXX
  gather_facts: no
  connection: winrm
  port: 5985

  tasks:
    - debug:
        var: hostvars[inventory_hostname]
        verbosity: 1

- hosts: serverA.internal.domain
  vars:
    ansible_user: ??machine credential??
    ansible_password: XXXXXXXXXXXX
  gather_facts: no
  connection: winrm
  port: 5985

What you have there is one way, but by default WinRM only allows local administrators to connect to the host, so you need to either make sure the domain user is also a local admin or adjust the WinRM security to allow non-admins to connect.

Another option is to define the host twice in your inventory, like so:

```
[windows]
serverA_local ansible_host=serverA.internal.domain ansible_user=administrator ansible_password=pass
serverA_domain ansible_host=serverA.internal.domain ansible_user=DOMAIN\user ansible_password=pass

[windows:vars]
ansible_connection=winrm
ansible_port=5985
```

In your play you would set hosts: serverA_local to use the local admin inventory entry and hosts: serverA_domain to use the domain account entry.
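As a rough sketch of how the two plays might look with that inventory (the win_ping tasks are only placeholders I've assumed for illustration; swap in your real tasks):

```yaml
# Play 1: connects with the credentials on the serverA_local inventory entry
- hosts: serverA_local
  gather_facts: no
  tasks:
    - name: Runs as the local administrator
      win_ping:

# Play 2: same machine, but connects with the serverA_domain credentials
- hosts: serverA_domain
  gather_facts: no
  tasks:
    - name: Runs as the domain user
      win_ping:
```

Because the connection variables live on the inventory entries, neither play needs its own vars block for the username or password.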

Thanks

Jordan

Thanks Jordan, I think you kicked me in the right direction, but I’m still missing something. I’m following your guidance somewhat, but I’m adding the hosts from within the playbook instead of in the inventory:

- name: add new host staging_domain to inventory
  add_host:
    name: staging_domain
    ansible_host: serverA.internal.domain
    ansible_user: '{{ ansible_user }}'
    ansible_password: '{{ ansible_password }}'
    ansible_connection: winrm
    ansible_port: 5985

- name: add new host staging_localadmin to inventory
  add_host:
    name: staging_localadmin
    ansible_host: serverA.internal.domain
    ansible_user: Administrator
    ansible_password: '{{ randopass }}'
    ansible_connection: winrm
    ansible_port: 5985

The above works when I connect to staging_localadmin, but does NOT when I connect to staging_domain.

When connecting to staging_domain, I get:

plaintext: the specified credentials were rejected by the server

I’m running this from Tower, so the {{ ansible_user }} and {{ ansible_password }} I’m passing to staging_domain should be the machine credentials. I verified this with some debug statements.

Further troubleshooting makes this seem like it has something to do with time (GPO applying, maybe?).

I can run another job with the same connection to staging_domain and eventually it starts working.

I’m still trying to figure it out. I’ll post back here if I find anything.

I can’t tell what changes, but while ansible is trying to connect, it throws this error in the event log:

Log Name: System
Event ID: 10111
Level: Warning
Source: Microsoft-Windows-WinRM
Description:

User authentication using Basic Authentication scheme failed.

Unexpected error received from LogonUser 1326: %%1326

Plaintext means Basic auth over HTTP, which Windows rejects because it is not encrypted. Basic auth also does not work for domain accounts, but unfortunately it is the default, for backwards compatibility reasons, when the username specified is not in the UPN format.

If you are connecting with a domain account you can set ansible_winrm_transport: ntlm to get you going, but I highly recommend you get Kerberos auth working for domain accounts.
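For example, the NTLM setting could be added to the add_host task from the earlier post. This is only a sketch that reuses the names already shown in this thread; the one new line is the transport variable:

```yaml
- name: add new host staging_domain to inventory
  add_host:
    name: staging_domain
    ansible_host: serverA.internal.domain
    ansible_user: '{{ ansible_user }}'
    ansible_password: '{{ ansible_password }}'
    ansible_connection: winrm
    ansible_port: 5985
    # Use NTLM instead of falling back to Basic auth
    ansible_winrm_transport: ntlm
```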

Acknowledged. I’ve been trying to stick with Kerberos now, but STILL having issues…

The machine credentials I use are serviceaccount@ALLUPPERCASE.DOMAIN and right after vmware_guest builds the VM, I try to continue on but now I get:

kerberos: the specified credentials were rejected by the server, plaintext: the specified credentials were rejected by the server

However, I still see the same behavior… I get that error, and minutes later I can run the job again and get past it. I’m able to log on to the server with the service account right after vmware_guest finishes…

Pulling my hair out here, not sure what’s going on.

The fact that you were able to get a Kerberos ticket shows that your host is set up to get tickets correctly. Some things you should check:

  • The domain account is a local admin. Non-admins can technically connect through WinRM, but not by default. In any case Ansible is very limited in what it can do when connecting as a non-admin account, so it’s not something we usually document.

  • The time is synced between your Ansible controller and the Windows server.

  • Message encryption is actually being used. This should happen automatically, but some older libraries that Ansible uses may not have it available. To check, set ‘ansible_winrm_message_encryption: always’ to confirm that message encryption is available and works.

Also, you should set ‘ansible_winrm_transport: kerberos’ to stop the fallback to Basic auth. Unfortunately this fallback is another backwards compatibility behaviour that we can’t take away, even though it isn’t really optimal.
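Put together, the connection variables for the domain-account host might look something like the sketch below. Where you set them (inventory, group_vars, or add_host) is up to you; the port is the one already used in this thread:

```yaml
# Connection variables for the host entry that uses the domain account
ansible_connection: winrm
ansible_port: 5985
# Only attempt Kerberos, so a failure is not masked by the Basic auth fallback
ansible_winrm_transport: kerberos
# Fail outright if message encryption is not available
ansible_winrm_message_encryption: always
```

With the transport pinned to kerberos, the error should mention only Kerberos rather than both kerberos and plaintext.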

You can actually see Kerberos failing within the same play… It runs several tasks, then randomly hits one that fails with the Kerberos error.

This is what that play looks like in YAML:

tasks:

  - name: Ensure SMBv1 is disabled
    win_optional_feature:
      name: smb1protocol
      state: absent

  - name: Initialize Disk 1
    win_shell: Initialize-Disk -Number 1
    ignore_errors: yes

  - name: Wait 15 seconds for disk initialization
    pause:
      seconds: 15

  - name: Partition Disk 1
    win_partition:
      drive_letter: E
      partition_size: -1
      disk_number: 1
      state: present
    ignore_errors: yes  # Ignore errors because this module doesn't handle idempotency well

  - name: Format Disk 1 as E drive
    win_format:
      drive_letter: E
      file_system: NTFS
      new_label: DATA
    ignore_errors: yes  # Ignore errors because this module doesn't handle idempotency well

  - name: Ensure SMBv1 is disabled
    win_optional_feature:
      name: smb1protocol
      state: absent

Thanks again for the help on this.

I double-checked that the machine credential is a domain admin, and verified that time is in sync between the Ansible Tower host and the domain.

I’ll try setting ansible_winrm_transport: kerberos and ansible_winrm_message_encryption: always and see what happens.

First run looks the same:

Second Run (from failure) gets further (?!?!)

I’ve taken to just brute-force running the same playbook over and over again until the issue goes away. I still suspect GPO or replication or time… or something

However, one clue: when the Kerberos error happens, I see this generated in the log files:

Log Name: System
Source: Microsoft-Windows-WinRM
Event ID: 10154
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: hostname.internal.domain
Description:
The WinRM service failed to create the following SPNs: WSMAN/hostname.internal.domain; WSMAN/hostname.

Additional Data
The error received was 1355: %%1355.

User Action
The SPNs can be created by an administrator using setspn.exe utility.

If you have multiple DCs then replication could potentially be at fault here, but usually if a host is missing from the domain controller being queried, a different error is shown (service not found in the database).

Is the host you are connecting to sharing a hostname with an older host that it’s potentially replacing? If so, the SPN could be registered under the newer host on one DC but not yet replicated to another DC, which still thinks the hostname belongs to the other host. Each host has its own unique key, so when the server goes to check the credentials it is unable to decrypt the secret, because it’s using a different key than the one the DC thought it had (the older host’s), and thus concludes the credentials were bad.
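One way to look at what the directory currently has registered is to query the SPN named in the event log entry. The tasks below are a sketch under assumptions: mgmt01.internal.domain is a hypothetical domain-joined Windows host that Ansible can already reach, and the answer reflects only the DC that host happens to query, so it may differ between runs while replication is pending:

```yaml
- name: Query which account holds the WSMAN SPN
  win_command: setspn -Q WSMAN/hostname.internal.domain
  delegate_to: mgmt01.internal.domain  # hypothetical helper host, not from this thread
  register: spn_query
  changed_when: false   # read-only query
  failed_when: false    # do not stop the play if the SPN is not found

- name: Show the result
  debug:
    var: spn_query.stdout_lines
```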

I think you got it figured out Jordan.

I tried with an object that didn’t previously exist and it worked.

I’ve been manually deleting the old computer objects beforehand, but I don’t think I’ve been giving it enough time to replicate (our AD structure is messy/slow right now).

I’ll probably work a ‘delete computer object’ and ‘wait 5 minutes’ into my VM provisioning script (the one we’ve been working with here).
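Roughly what I have in mind is sketched below. I'm assuming the win_domain_computer module (part of community.windows in current releases) and a hypothetical helper host, mgmt01.internal.domain, that has the Active Directory PowerShell module installed. The computer name is a placeholder:

```yaml
- name: Remove the stale computer object before the VM is rebuilt
  win_domain_computer:
    name: hostname        # placeholder for the computer object being replaced
    state: absent
  delegate_to: mgmt01.internal.domain  # hypothetical host with the AD PowerShell module

- name: Wait 5 minutes for the deletion to replicate between DCs
  pause:
    minutes: 5
```

The five-minute wait is a guess based on how slow our replication is, not a measured value.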

Appreciate the help once again!