Initial provisioning/user setup via root remote_user, continued CM via other remote_user?

When I order a new server from a hosting provider that doesn't offer images like AMIs or user-created images, I generally get a minimal OS installation and a root user account.

Before I can start securely configuring the server from an admin account and deploying an app to it, the first thing I need to do is create the admin user account I'll use for the rest of the work, and then disable password-based login and root SSH access.
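For context, that bootstrap playbook boils down to something like the following (a minimal sketch; the user name, key path, and the Debian/Ubuntu-style details like the 'sudo' group and 'ssh' service name are placeholders for whatever your distro uses):

```yaml
---
# bootstrap.yml: run once against a fresh server, as root.
- hosts: all
  remote_user: root
  tasks:
    - name: Create the admin user.
      user: name=admin groups=sudo append=yes shell=/bin/bash

    - name: Authorize my SSH key for the admin user.
      authorized_key: user=admin key="{{ lookup('file', '~/.ssh/id_rsa.pub') }}"

    - name: Disable root login over SSH.
      lineinfile: dest=/etc/ssh/sshd_config regexp='^#?PermitRootLogin' line='PermitRootLogin no'
      notify: restart ssh

    - name: Disable password authentication.
      lineinfile: dest=/etc/ssh/sshd_config regexp='^#?PasswordAuthentication' line='PasswordAuthentication no'
      notify: restart ssh

  handlers:
    - name: restart ssh
      service: name=ssh state=restarted
```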

Currently, I have two separate playbooks to accomplish these two separate tasks (first, minimal server and security setup; second, configuring the server and deploying an app).

Are there any better ways of doing this? Basically, I’d like to have a way of saying “if this is a new server/my admin user can’t connect, first run this set of plays as the root user, then continue on as the normal remote_user”.

Using Digital Ocean or AWS makes this a bit easier, as I can use Packer and create an initial image that already has the minimal base configuration… but I manage a lot of hosts from a lot of providers, and usually don’t have a way to manage fresh images.

There is of course the "user" keyword (later renamed "remote_user") on a play to change users. You can have multiple plays in one playbook (and a playbook can include lists of plays from other files, or can just be a list of them), so you don't need to launch ansible-playbook multiple times.
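As a rough sketch of that structure (the group, file, and role names here are made up):

```yaml
---
# One playbook, two plays: the first connects as root and does the
# bootstrap work, the second connects as the admin user it just created.
- hosts: new_servers
  remote_user: root
  tasks:
    - include: tasks/bootstrap.yml

- hosts: new_servers
  remote_user: admin
  sudo: yes
  roles:
    - common
    - webserver
```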

(If using a cloud provider, what you really should do is look into the provisioning modules, and have a play that brings up new resources and then includes your list of configuration plays.)
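For example, something like the following sketch with the digital_ocean module. I'm going from memory on its parameters and return values, all of the IDs are placeholders, and it assumes API credentials are supplied elsewhere (module params or environment variables):

```yaml
---
# Play 1: provision from the control machine (no SSH to the target yet).
- hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    - name: Bring up a new droplet (all IDs below are placeholders).
      digital_ocean: >
        state=present command=droplet name=web1
        size_id=66 region_id=4 image_id=473123 ssh_key_ids=12345
      register: created

    - name: Add the new host to the in-memory inventory.
      add_host: hostname={{ created.droplet.ip_address }} groupname=new_servers

# Play 2 onward: the same configuration plays as before.
- hosts: new_servers
  remote_user: root
  tasks:
    - include: tasks/bootstrap.yml
```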

If using physical machines, you can do some basic configuration in kickstart/preseed, so your base systems come up correctly at the end of the install process. You could even use Ansible inside the installation environment to do this.

The main difficulty I foresee, though, is that I normally want to use remote_user: "admin", but that user won't exist on the first run. (Many of the providers I use are LEB-type providers that don't support kickstart or anything like that… they just provision a VM from whatever image they have lying around :(.)

In this case, the first run would fail unless I manually changed remote_user for the entire playbook to 'root' and then used 'user' on every included playbook/role/task to set the user for each particular play.

Or is there a conditional way, using gathered facts or something, to attempt to log in via the normal remote_user, and if that fails, drop to root?

So, I have a playbook set up with remote_user: admin, and the remote server only allows 'root' until the admin user is set up. If I add a ping task as the first task in the playbook (with failed_when: false on the task, and gather_facts: no on the play), then I get the following:

```
PLAY [playbook] *****************************************************************

TASK: [Check if we can connect using ping module.] ****************************
fatal: [drupal] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue
```

Is there some way, in a playbook, to have a 'pre-pre-task', or a way to catch an SSH connection error and set a flag based on it? Basically, I don't want to fail on an SSH connection error, but instead attempt to run a separate play as root… something along those lines.
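Something like this sketch is the shape of what I'm imagining (untested; 'admin' is my normal remote_user, and it assumes a host-level ansible_ssh_user set via set_fact actually wins over the play-level remote_user, which I haven't verified):

```yaml
---
# Play 1: probe SSH from the control machine, without using Ansible's
# normal connection to the target at all.
- hosts: all
  gather_facts: no
  tasks:
    - name: Try SSHing in as the admin user.
      local_action: shell ssh -o BatchMode=yes -o ConnectTimeout=5 admin@{{ inventory_hostname }} 'exit'
      register: admin_probe
      ignore_errors: yes

    - name: Fall back to root for hosts where the probe failed.
      set_fact: ansible_ssh_user=root
      when: admin_probe.rc != 0

# Play 2 onward: normal configuration. Fresh hosts connect as root via
# the fact set above; everything else connects as admin.
- hosts: all
  remote_user: admin
  sudo: yes
  tasks:
    - include: tasks/bootstrap.yml
```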

Worst case, I'll just keep doing what I'm doing (a separate small playbook that configures the admin user and SSH security as root, kicked off by hand for each new server). But it would be great to be able to provision and re-run a playbook on any hosting provider (besides the ones with nice APIs or kickstart abilities) with one playbook :slight_smile:

-Jeff

So I got quite a bit further, but was ultimately stymied by the fact that I can't override a playbook/role variable or a global variable using the set_fact module (at least, not as of 1.5/1.6). What I had almost working was essentially the probe-and-fallback approach sketched above.

There has been a discussion about a possible set_global.

Facts are less secure (machines may be less trustworthy) and need to stay in a particular, lesser scope; that's why they don't override globals. A fact could control what software gets installed, etc.

That would be perfect, and I understand why it's not good for facts to override globals. Plus, my use case here is probably not ideal in any way (though I know of more than a few people who don't have the luxury of working exclusively with providers that allow kickstart configs or any kind of prebuilt images).

Is there an issue or some other discussion I could track for set_global? Someday (I promise!) I'll get some time to start contributing actual code to the project, rather than little docs fixes :stuck_out_tongue:

-Jeff

The discussion happens all over the list. This is part of that discussion.

Do you actually need to detect at all?
Your first play, which sets up the admin user, could run every time. If it's set up to be properly idempotent, would you need to test for anything?

I am facing a similar issue. I would like one play to set up a user, and a second play (in the same playbook) to start doing things as the created user. So far I have tried the following:

Do you actually need to detect at all?

If you want to be able to remove the original access (ssh as root, or perhaps a default user like ‘ubuntu’) then yes, I’d think you have to know whether the change has already been made or not. If you just want to add access, and not revoke the initial method, then perhaps no detection is necessary.

Anyway, I think I may have just gotten this sorted for my own setup, using a lot of the ideas here. I just did it today, so I'd call it pre-alpha. Any suggestions for improvement are welcome:

(We have a role for each IaaS provider we use, so these tasks come after the local_action call to the ec2 module in our 'ec2add' role. This seems to work with multi-instance calls (exact_count > 1) to the ec2 module…)

```yaml
- name: Attempt SSH as initialUser - succeeds only if user customization not yet done (also adds private IP(s) to known_hosts when it succeeds)
  shell: "ssh -o StrictHostKeyChecking=no -i …/ssh/{{ key_name }} {{ initialUser }}@{{ item.private_ip }} 'exit'"
  with_items: ec2.tagged_instances
  when: wait == "yes"
  ignore_errors: yes
  register: ssh_attempt

- name: Add instance(s) still requiring user customization to the 'host_users_customized_False' group
  local_action: add_host hostname={{ item.item.private_ip }} groupname="host_users_customized_False"
  with_items: ssh_attempt.results
  when: wait == "yes" and item.rc == 0

- name: Re-SSH as initialUser using private DNS name(s) instead of IP(s), to add them to known_hosts too (skipped if the first attempt failed)
  shell: "ssh -o StrictHostKeyChecking=no -i …/ssh/{{ key_name }} {{ initialUser }}@{{ item.item.private_dns_name }} 'exit'"
  with_items: ssh_attempt.results
  when: wait == "yes" and item.rc == 0
  ignore_errors: yes
```
Then in a later playbook I just target the group of hosts whose users aren't yet customized (intersected with a serverGroup-style EC2 tag group, to prevent friendly fire).
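In playbook terms, that later play looks something like this (group and file names are hypothetical; ':&' is Ansible's group-intersection syntax):

```yaml
- hosts: host_users_customized_False:&tag_serverGroup_webapp
  remote_user: "{{ initialUser }}"
  sudo: yes
  tasks:
    - include: tasks/customize_users.yml
```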

-Mark