My CA issues the certificates for my TLS-encrypted connections, for example for rsyslog, stunnel, or the Prometheus node exporter. I configure all these services with Ansible roles. However, I have a fundamental problem with the process, especially when a service runs rootless.
If my playbook obtains the certificate first, the user the service runs as does not exist yet. However, this user must become the owner of the certificate key files.
Conversely, if the service role runs first, it cannot complete, because it references a certificate that does not exist yet and fails at runtime.
I’m not sure I fully understand what you’re describing. The scenario you outline is a bit too generic for me to grasp the actual issue, and from what I gather, your post already contains the solution.
My understanding is that you want an Ansible playbook to install an application or service and then install CA-signed SSL certificates for it, but you’re running into a chicken-and-egg problem with certificate ownership, because the service-specific accounts don’t exist until the application is installed.
Without Ansible as an administrator, I would perform the process in three steps:
Install and configure the service.
Obtain and install the certificate.
Configure the certificate files for the service.
As far as I can tell, you’d follow the same steps in an Ansible playbook, which is why I’m confused about what your question is: it seems you already have the answer. You’d probably want to check whether a certificate is already installed before obtaining a new one, so that you don’t send the same request to your CA on every run and the playbook stays idempotent.
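A minimal sketch of such a check, assuming the certificate lives at the hypothetical path /etc/ssl/node_exporter/node.crt and that the myroles.x509 role (named later in this thread) performs the CA request:

```yaml
# Look for an already installed certificate (hypothetical path).
- name: Check whether a certificate is already installed
  ansible.builtin.stat:
    path: /etc/ssl/node_exporter/node.crt
  register: node_cert

# Only contact the CA when no certificate exists yet, so repeated runs
# do not submit the same request again.
- name: Request a certificate from the CA if none exists
  ansible.builtin.include_role:
    name: myroles.x509
  when: not node_cert.stat.exists
```

A stricter check could also look at the expiry date (for example with community.crypto.x509_certificate_info), but checking for existence alone already avoids re-requesting on every run.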
I don’t know how to implement this in the playbook. Until now, I have always listed all roles one after the other, so that each role runs exactly once:
The role x509 obtains a server certificate and assigns the key file to the group ssl-cert with permissions u=rw,g=r. The role node_exporter creates the user node-exp, configures the path to the server certificate, and starts the service. An error occurs because the user node-exp is not yet a member of the group ssl-cert and therefore cannot read the key.
One workaround: the node exporter is first configured without TLS encryption, then a certificate is obtained, and as a third step the node_exporter role is switched from unencrypted to TLS. But how does that work? My inventory holds only one set of variables, describing exactly one point in time.
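The setup described above might look roughly like this. The host group monitored is hypothetical; the role names are the ones used further down in this thread.

```yaml
- name: Monitoring exporters
  hosts: monitored            # hypothetical group name
  become: true
  roles:
    # Obtains the server certificate; the key file gets group ssl-cert
    # and mode u=rw,g=r.
    - role: myroles.x509
    # Creates the node-exp user, points node_exporter at the certificate
    # and starts the service. node-exp is not in ssl-cert, so the service
    # cannot read the key and fails.
    - role: prometheus.prometheus.node_exporter
```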
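At the exporter level, switching to TLS means handing node_exporter a web configuration file (the Prometheus exporter-toolkit format, passed via --web.config.file). A minimal TLS section, with hypothetical certificate paths, looks like this:

```yaml
# web-config.yml, read by node_exporter via --web.config.file
tls_server_config:
  cert_file: /etc/ssl/node_exporter/node.crt   # hypothetical path
  key_file: /etc/ssl/node_exporter/node.key    # must be readable by node-exp
```

As far as I know, the prometheus.prometheus.node_exporter role renders this file from its node_exporter_tls_server_config variable, so "unencrypted" and "TLS" differ only in whether that variable is empty; check the role's defaults to confirm.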
I would really appreciate your response, especially since I have run into this problem before in similar cases.
You can cheat a little by pre-creating the node-exp user (and group). Looking at the code of the prometheus.prometheus.node_exporter role, the user is created with this simple task:
If we resolve the variables and their default values, this becomes:
- name: "Create system user node-exp"
  ansible.builtin.user:
    name: "node-exp"
    system: true
    shell: "/usr/sbin/nologin"
    group: "node-exp"
    home: "/etc/node_exporter" # You should verify this, I'm not sure if I traced the value correctly
    create_home: false
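A slightly fuller sketch of the pre-creation creates the group first, because ansible.builtin.user expects the primary group to exist already. As an assumption beyond the answer above, it also adds node-exp to the ssl-cert group from the question, so the key file's u=rw,g=r mode is sufficient. It assumes the ssl-cert group already exists, and it leaves home to the role.

```yaml
# The primary group must exist before the user task references it.
- name: "Create system group node-exp"
  ansible.builtin.group:
    name: "node-exp"
    system: true

- name: "Create system user node-exp and add it to ssl-cert"
  ansible.builtin.user:
    name: "node-exp"
    system: true
    shell: "/usr/sbin/nologin"
    group: "node-exp"
    groups: "ssl-cert"   # lets node-exp read keys with mode u=rw,g=r
    append: true         # keep any other supplementary groups
    create_home: false
```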
Your myroles.x509 role can then change the ownership of the SSL private key and certificate to the node-exp user, which now exists. prometheus.prometheus.node_exporter will simply find that the user already exists and do nothing further (idempotence).
Thank you for your explanation, but I had already considered that idea. It’s just a cheat, though, and it doesn’t answer my question.
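Inside myroles.x509, the ownership change could be a single file task. In this sketch, x509_key_path and x509_key_owner are hypothetical role variables; without an owner, the key stays with root.

```yaml
# Inside myroles.x509 (hypothetical variable names).
- name: Hand the private key to the service user
  ansible.builtin.file:
    path: "{{ x509_key_path }}"
    owner: "{{ x509_key_owner | default('root') }}"
    group: "{{ x509_key_owner | default('root') }}"  # assumes a same-named group, as for node-exp
    mode: "0600"
```

The play would then pass x509_key_owner: node-exp to the role, for example as a role parameter. The certificate itself is public and could keep broader read permissions.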
Another idea was to use the ansible.builtin.include_role module (not tested). I fear there will be a ping-pong effect between the first and second include of the role:
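One way to write the three passes with include_role is sketched below. Variables set on an include task take precedence over inventory variables, which sidesteps the "one set of variables" problem. The ping-pong (TLS switched off and on again on every run) is avoided by running the plain-HTTP pass only while no certificate exists. Assumptions: node_exporter_tls_server_config is the role's TLS variable (check its defaults), x509_key_owner is a hypothetical myroles.x509 variable, and all paths and the host group are hypothetical.

```yaml
- name: Monitoring exporters, TLS added in a later pass
  hosts: monitored            # hypothetical group name
  become: true
  tasks:
    - name: Check whether a certificate is already installed
      ansible.builtin.stat:
        path: /etc/ssl/node_exporter/node.crt
      register: node_cert

    # Pass 1, first run only: install node_exporter without TLS; this creates node-exp.
    - name: Install node_exporter (plain HTTP)
      ansible.builtin.include_role:
        name: prometheus.prometheus.node_exporter
      vars:
        node_exporter_tls_server_config: {}
      when: not node_cert.stat.exists

    # Pass 2: obtain the certificate now that node-exp exists.
    - name: Obtain the server certificate
      ansible.builtin.include_role:
        name: myroles.x509
      vars:
        x509_key_owner: node-exp

    # Pass 3: same role again, now with TLS; on later runs this is the only pass.
    - name: Configure node_exporter with TLS
      ansible.builtin.include_role:
        name: prometheus.prometheus.node_exporter
      vars:
        node_exporter_tls_server_config:
          cert_file: /etc/ssl/node_exporter/node.crt
          key_file: /etc/ssl/node_exporter/node.key
```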
I use “include_role” exclusively, for exactly the reasons you’re running into: it gives greater control over when and how a role runs, with which vars, and so on. I didn’t even know you could use “roles:” to include them in bulk as you showed. I can see how that is problematic when you need to go “back” and reconfigure an installed app.
The roles keyword has been there since the beginning of Ansible. I don’t know whether it is officially deprecated, but IMHO it should be. Its order of execution always confuses users when it is combined with tasks, pre_tasks, and post_tasks. While that is merely confusing, the impossibility of running tasks between roles is (for me, at least) a reason to stay away from the roles keyword.
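To illustrate the ordering: whatever order the keys appear in the YAML, a play runs pre_tasks, then roles, then tasks, then post_tasks. In this sketch (hypothetical host group, debug messages only), tasks is written before roles but still runs after it, and import_role inside tasks lets a plain task run between two roles.

```yaml
- name: Order of execution within a play
  hosts: monitored              # hypothetical group name
  become: true
  tasks:                        # 3. runs after roles, despite appearing first
    - name: Install node_exporter
      ansible.builtin.import_role:
        name: prometheus.prometheus.node_exporter
    - name: A task between two roles
      ansible.builtin.debug:
        msg: "runs after node_exporter and before myroles.x509"
    - name: Obtain the certificate
      ansible.builtin.import_role:
        name: myroles.x509
  roles:                        # 2. runs after pre_tasks
    - role: some.other_role     # hypothetical role name
  pre_tasks:                    # 1. runs first
    - name: Pre task
      ansible.builtin.debug:
        msg: "pre_tasks"
  post_tasks:                   # 4. runs last
    - name: Post task
      ansible.builtin.debug:
        msg: "post_tasks"
```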