After playing around with Ansible Automation Platform and AWX for some time now, I decided to get under the hood and figure out how the platform operates at scale and how jobs are executed on remote nodes.
Ansible-runner
According to the ansible-runner documentation, "Ansible Runner is a tool and python library that helps when interfacing with Ansible directly or as part of another system whether that be through a container image interface, as a standalone tool, or as a Python module that can be imported. The goal is to provide a stable and consistent interface abstraction to Ansible. This allows Ansible to be embedded into other systems that don't want to manage the complexities of the interface on their own (such as CI/CD platforms, Jenkins, or other automated tooling)."
Ansible-runner is the underlying tool used by AWX and Ansible Automation Platform; it's how the platform(s) interface with Ansible directly. You can read more about ansible-runner on the ansible-runner doc site.
To get started, we are going to install the ansible-runner pip module per the documentation.
pip3 install ansible-runner
We will also need podman installed for the container images that ansible-runner will use later.
sudo dnf -y install podman
Once we have ansible-runner and podman installed, we are going to set up the project directory. The project directory is a bit different from a standard Ansible directory, but once you see how it's laid out and use it, it will make sense. The standard project directory for an ansible-runner project looks like this:
.
├── env
│   ├── envvars
│   ├── extravars
│   ├── passwords
│   ├── cmdline
│   ├── settings
│   └── ssh_key
├── inventory
│   └── hosts
└── project
    ├── test.yml
    └── roles
        └── testrole
            ├── defaults
            ├── handlers
            ├── meta
            ├── README.md
            ├── tasks
            ├── tests
            └── vars
For the sake of testing, I created a simple project structure:
.
├── env
│   └── settings
├── inventory
│   └── hosts
└── project
    └── site.yml
Looking at the simple folder structure, we see that we have an inventory, a playbook, and a settings file. The playbook is a simple debug message and the inventory is just localhost for the sake of testing, but what is the settings file? The settings file and its contents control the runner directly.
cat env/settings
process_isolation: true
container_image: quay.io/ansible/ansible-runner:latest
The first option in the settings file is process_isolation, which instructs ansible-runner to execute Ansible tasks inside a container environment. The second option is the container image, or execution environment, we want to run our Ansible tasks in.
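For reference, the simple layout above can be scaffolded with a short script. This is only a sketch: the runner-demo directory name is hypothetical, the inventory line assumes a local connection, and the playbook body is my reconstruction of the debug task whose output appears below.

```python
from pathlib import Path

# Hypothetical base directory for the minimal ansible-runner project layout.
base = Path("runner-demo")
(base / "env").mkdir(parents=True, exist_ok=True)
(base / "inventory").mkdir(exist_ok=True)
(base / "project").mkdir(exist_ok=True)

# env/settings controls the runner itself: run inside a container image.
(base / "env" / "settings").write_text(
    "process_isolation: true\n"
    "container_image: quay.io/ansible/ansible-runner:latest\n"
)

# localhost-only inventory for testing (assuming a local connection)
(base / "inventory" / "hosts").write_text("localhost ansible_connection=local\n")

# A simple debug playbook (my reconstruction of the one used in this post).
(base / "project" / "site.yml").write_text(
    "---\n"
    "- name: Print msg\n"
    "  hosts: localhost\n"
    "  gather_facts: false\n"
    "  tasks:\n"
    "    - name: Print a test message\n"
    "      ansible.builtin.debug:\n"
    "        msg: This is a test message\n"
)
```

With this layout in place, `ansible-runner run runner-demo --playbook site.yml` would pick up the settings, inventory, and playbook automatically.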
Once we have our settings in place, let's check a few things before we run our playbook. (Note: if you have other container images on your system, you may see them listed here.)
podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
We check and see there are no container images on the system, so now we are ready to run our test playbook using ansible-runner.
ansible-runner run . --playbook site.yml
PLAY [Print msg] ***************************************************************
TASK [Print a test message] ****************************************************
ok: [localhost] => {
"msg": "This is a test message"
}
PLAY RECAP *********************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Awesome, our job ran successfully. Now let's check back with podman:
podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/ansible/ansible-runner latest bec0dc171168 17 months ago 816 MB
Now we notice that the container image we specified in the env/settings file has been pulled, and our Ansible tasks were executed via that container image. But what is ansible-runner doing under the hood?
In the plumbing
To see what ansible-runner is doing under the hood, we are going to use the transmit and worker commands:
- transmit: Send a job to a remote ansible-runner process
- worker: Execute work streamed from a controlling instance
We are going to take our ansible-runner command from before and substitute run with transmit. If you run transmit by itself you get a jumbled mess of JSON, so to make it easier to read we are going to use the jq command as well.
ansible-runner transmit . --playbook site.yml | jq
{
"kwargs": {
"ident": "85f43d3e1c2e420ea0017ee2a93f7f61",
"binary": null,
"playbook": "site.yml",
"module": null,
"module_args": null,
"host_pattern": null,
"verbosity": null,
"quiet": false,
"rotate_artifacts": 0,
"json_mode": false,
"omit_event_data": false,
"only_failed_event_data": false,
"inventory": null,
"forks": null,
"project_dir": null,
"artifact_dir": null,
"roles_path": null,
"process_isolation": null,
"process_isolation_executable": null,
"process_isolation_path": null,
"process_isolation_hide_paths": null,
"process_isolation_show_paths": null,
"process_isolation_ro_paths": null,
"container_image": "quay.io/ansible/ansible-runner:devel",
"container_volume_mounts": null,
"container_options": null,
"directory_isolation_base_path": null,
"cmdline": null,
"limit": null,
"suppress_env_files": false
}
}
{
"zipfile": 36136
}
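The transmit stream itself is a sequence of JSON control messages plus a zip payload of the project directory. A small sketch of a reader for the JSON control lines (the function name and sample data are mine, not part of ansible-runner):

```python
import json

def read_control_messages(stream_lines):
    """Collect the JSON control messages from a transmit-style stream,
    skipping anything that doesn't decode as a JSON object (e.g. payload bytes)."""
    messages = []
    for line in stream_lines:
        try:
            obj = json.loads(line)
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # not a control message, skip
        if isinstance(obj, dict):
            messages.append(obj)
    return messages

# Synthetic example shaped like the two control messages shown above.
sample = [
    '{"kwargs": {"playbook": "site.yml", "ident": "85f43d..."}}',
    '{"zipfile": 36136}',
]
msgs = read_control_messages(sample)
for msg in msgs:
    if "kwargs" in msg:
        print("job args:", msg["kwargs"]["playbook"])
    elif "zipfile" in msg:
        print("project payload size:", msg["zipfile"], "bytes")
```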
Looking back at the transmit output, we can see the arguments used to control ansible-runner. Note that most options, including process_isolation and container_image, show their defaults here; the values from our env/settings file are applied when the job actually runs. If we take it a step further and pipe the transmit output to the ansible-runner worker command, we will see more information.
ansible-runner transmit . --playbook site.yml | ansible-runner worker | jq
{
"status": "starting",
"runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
"command": [
"podman",
"run",
"--rm",
"--tty",
"--interactive",
"--workdir",
"/runner/project",
"-v",
"/tmp/tmp3tda5n4n/:/runner/:Z",
"--env-file",
"/tmp/tmp3tda5n4n/artifacts/b47cec15743a4b03a69814beab9f2ca4/env.list",
"--quiet",
"--name",
"ansible_runner_b47cec15743a4b03a69814beab9f2ca4",
"quay.io/ansible/ansible-runner:latest",
"ansible-playbook",
"-i",
"/runner/inventory/hosts",
"site.yml"
],
"env": {
"ANSIBLE_UNSAFE_WRITES": "1",
"AWX_ISOLATED_DATA_DIR": "/runner/artifacts/b47cec15743a4b03a69814beab9f2ca4",
"ANSIBLE_CACHE_PLUGIN_CONNECTION": "/runner/artifacts/b47cec15743a4b03a69814beab9f2ca4/fact_cache",
"ANSIBLE_CALLBACK_PLUGINS": "/runner/artifacts/b47cec15743a4b03a69814beab9f2ca4/callback",
"ANSIBLE_STDOUT_CALLBACK": "awx_display",
"ANSIBLE_RETRY_FILES_ENABLED": "False",
"ANSIBLE_HOST_KEY_CHECKING": "False",
"ANSIBLE_CACHE_PLUGIN": "jsonfile",
"RUNNER_OMIT_EVENTS": "False",
"RUNNER_ONLY_FAILED_EVENTS": "False"
},
"cwd": "/runner/project"
}
{
"status": "running",
"runner_ident": "b47cec15743a4b03a69814beab9f2ca4"
}
{
"uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
"counter": 1,
"stdout": "",
"start_line": 0,
"end_line": 0,
"runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
"event": "playbook_on_start",
"pid": 20,
"created": "2023-09-22T18:29:48.741357",
"event_data": {
"playbook": "site.yml",
"playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
"uuid": "71184a63-51d8-4fea-aa42-911afd968e74"
}
}
{
"uuid": "faa46551-95a6-54c3-ce22-000000000006",
"counter": 2,
"stdout": "\r\nPLAY [Print msg] ***************************************************************",
"start_line": 0,
"end_line": 2,
"runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
"event": "playbook_on_play_start",
"pid": 20,
"created": "2023-09-22T18:29:48.743348",
"parent_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
"event_data": {
"playbook": "site.yml",
"playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
"play": "Print msg",
"play_uuid": "faa46551-95a6-54c3-ce22-000000000006",
"play_pattern": "localhost",
"name": "Print msg",
"pattern": "localhost",
"uuid": "faa46551-95a6-54c3-ce22-000000000006"
}
}
{
"uuid": "faa46551-95a6-54c3-ce22-000000000008",
"counter": 3,
"stdout": "\r\nTASK [Print a test message] ****************************************************",
"start_line": 2,
"end_line": 4,
"runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
"event": "playbook_on_task_start",
"pid": 20,
"created": "2023-09-22T18:29:48.750619",
"parent_uuid": "faa46551-95a6-54c3-ce22-000000000006",
"event_data": {
"playbook": "site.yml",
"playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
"play": "Print msg",
"play_uuid": "faa46551-95a6-54c3-ce22-000000000006",
"play_pattern": "localhost",
"task": "Print a test message",
"task_uuid": "faa46551-95a6-54c3-ce22-000000000008",
"task_action": "ansible.builtin.debug",
"resolved_action": "ansible.builtin.debug",
"task_args": "",
"task_path": "/runner/project/site.yml:5",
"name": "Print a test message",
"is_conditional": false,
"uuid": "faa46551-95a6-54c3-ce22-000000000008"
}
}
{
"uuid": "bf5eee5e-6f01-4ebb-b456-b26506721803",
"counter": 4,
"stdout": "",
"start_line": 4,
"end_line": 4,
"runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
"event": "runner_on_start",
"pid": 20,
"created": "2023-09-22T18:29:48.751339",
"parent_uuid": "faa46551-95a6-54c3-ce22-000000000008",
"event_data": {
"playbook": "site.yml",
"playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
"play": "Print msg",
"play_uuid": "faa46551-95a6-54c3-ce22-000000000006",
"play_pattern": "localhost",
"task": "Print a test message",
"task_uuid": "faa46551-95a6-54c3-ce22-000000000008",
"task_action": "ansible.builtin.debug",
"resolved_action": "ansible.builtin.debug",
"task_args": "",
"task_path": "/runner/project/site.yml:5",
"host": "localhost",
"uuid": "bf5eee5e-6f01-4ebb-b456-b26506721803"
}
}
{
"uuid": "79820614-a340-4ca1-9f88-508e214bb7a2",
"counter": 5,
"stdout": "\u001b[0;32mok: [localhost] => {\u001b[0m\r\n\u001b[0;32m \"msg\": \"This is a test message\"\u001b[0m\r\n\u001b[0;32m}\u001b[0m",
"start_line": 4,
"end_line": 7,
"runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
"event": "runner_on_ok",
"pid": 20,
"created": "2023-09-22T18:29:48.758284",
"parent_uuid": "faa46551-95a6-54c3-ce22-000000000008",
"event_data": {
"playbook": "site.yml",
"playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
"play": "Print msg",
"play_uuid": "faa46551-95a6-54c3-ce22-000000000006",
"play_pattern": "localhost",
"task": "Print a test message",
"task_uuid": "faa46551-95a6-54c3-ce22-000000000008",
"task_action": "ansible.builtin.debug",
"resolved_action": "ansible.builtin.debug",
"task_args": "",
"task_path": "/runner/project/site.yml:5",
"host": "localhost",
"remote_addr": "localhost",
"res": {
"msg": "This is a test message",
"_ansible_verbose_always": true,
"_ansible_no_log": false,
"changed": false
},
"start": "2023-09-22T18:29:48.751294",
"end": "2023-09-22T18:29:48.758153",
"duration": 0.006859,
"event_loop": null,
"uuid": "79820614-a340-4ca1-9f88-508e214bb7a2"
}
}
{
"uuid": "d0cd1845-6268-475a-a909-66024991f5d7",
"counter": 6,
"stdout": "\r\nPLAY RECAP *********************************************************************\r\n\u001b[0;32mlocalhost\u001b[0m : \u001b[0;32mok=1 \u001b[0m changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0 ",
"start_line": 7,
"end_line": 11,
"runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
"event": "playbook_on_stats",
"pid": 20,
"created": "2023-09-22T18:29:48.762944",
"parent_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
"event_data": {
"playbook": "site.yml",
"playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
"changed": {},
"dark": {},
"failures": {},
"ignored": {},
"ok": {
"localhost": 1
},
"processed": {
"localhost": 1
},
"rescued": {},
"skipped": {},
"artifact_data": {},
"uuid": "d0cd1845-6268-475a-a909-66024991f5d7"
}
}
{
"status": "successful",
"runner_ident": "b47cec15743a4b03a69814beab9f2ca4"
}
{
"zipfile": 29725
}
Now we can see the podman command that runner is executing, with all the env settings we saw before, and we also see the playbook run emitted as a stream of JSON events.
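Because the worker output is a stream of JSON events, it is easy to post-process. Here is a sketch that pulls a simple recap out of such a stream; the event names match the ones shown above, but the helper function and sample data are my own:

```python
import json

def summarize_events(lines):
    """Count per-host runner events by status from a worker-style JSON event stream."""
    counts = {"ok": 0, "failed": 0, "skipped": 0}
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON lines
        name = event.get("event", "")
        if name == "runner_on_ok":
            counts["ok"] += 1
        elif name in ("runner_on_failed", "runner_on_unreachable"):
            counts["failed"] += 1
        elif name == "runner_on_skipped":
            counts["skipped"] += 1
    return counts

# Synthetic stream shaped like the events above.
sample = [
    '{"event": "playbook_on_start"}',
    '{"event": "runner_on_ok", "event_data": {"host": "localhost"}}',
    '{"event": "playbook_on_stats"}',
]
print(summarize_events(sample))  # {'ok': 1, 'failed': 0, 'skipped': 0}
```

This is essentially what AWX's display layer does at a much larger scale: it consumes these events rather than scraping plain-text playbook output.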
So now that we know how ansible-runner works, let's use it with another tool that helps with clustering and scaling.
Receptor
The next tool we are going to look at is receptor. According to the docs, "Receptor is an overlay network intended to ease the distribution of work across a large and dispersed collection of workers. Receptor nodes establish peer-to-peer connections with each other via existing networks. Once connected, the receptor mesh provides datagram (UDP-like) and stream (TCP-like) capabilities to applications, as well as robust unit-of-work handling with resiliency against transient network failures."
Receptor is the tool that's going to allow us to scale out and create nodes from which we can run jobs. To get started we need to install a few packages first.
dnf install golang make git
This installs Go and make to compile the receptor binary, and git to clone the repository from GitHub.
To get started we need to clone the repository:
git clone https://github.com/ansible/receptor.git
Once you have a clone of the repository, change into the receptor directory and run the make command to build the receptor binary.
make receptor
Once the binary is built, let's copy it to /usr/local/bin so we can use the command no matter what directory we are in.
sudo cp receptor /usr/local/bin/
Now that we have the receptor binary, we also need the receptorctl tool:
pip install receptorctl
Now that we have all the tools we need for receptor installed, let's start setting up the service to create our mesh.
On the node that you are going to designate as the control node, we need to set up the config file:
(note: you may need to create the /etc/receptor folder, as it is not there by default)
vi /etc/receptor/receptor.conf
---
- node:
    id: controller
- log-level:
    level: Debug
- tcp-listener:
    port: 2222
- control-service:
    service: control
    filename: /tmp/controller.sock
- work-command:
    worktype: ansible-runner
    command: ansible-runner
    params: worker
    allowruntimeparams: true
So what we have set up in this config file is:
- The node id has been set to controller
- The logging level has been set to Debug
- We are listening for other nodes on port 2222
- The control-service allows the user to issue commands like "status" or "work submit" to a receptor node
- The work-command defines a type of work that can run on the node:
  - worktype: a user-defined name to give this work definition
  - command: the executable that is invoked when running this work
  - params: command-line options passed to this executable
Now that we have the controller node set up, let's set up the execution node. There are only a couple of changes from the controller config to the execution config:
vi /etc/receptor/receptor.conf
---
- node:
    id: execution
- log-level:
    level: Debug
- tcp-peer:
    address: $IP_OF_CONTROLLER:2222
- control-service:
    service: control
- work-command:
    worktype: ansible-runner
    command: ansible-runner
    params: worker
    allowruntimeparams: true
Now that we have the config files all set up, let's start the service.
(note: it might be best to start the receptor service in the background or in a tmux session)
receptor -c /etc/receptor/receptor.conf
Once we have the service running on both the controller node and the execution node, we can check whether both systems are able to communicate with one another:
receptorctl --socket /tmp/controller.sock status
Node ID: controller
Version: 1.4.1
System CPU Count: 2
System Memory MiB: 1763

Connection   Cost
execution    1

Known Node   Known Connections
controller   execution: 1
execution    controller: 1

Route        Via
execution    execution

Node         Service   Type     Last Seen             Tags
controller   control   Stream   2023-09-26 11:05:39   {'type': 'Control Service'}
execution    control   Stream   2023-09-26 11:05:24   {'type': 'Control Service'}

Node         Work Types
controller   ansible-runner
execution    ansible-runner
Excellent, now that both nodes can communicate we can move on to the final step.
(note: if nodes are not connecting to each other, check local and network firewalls to ensure traffic is not blocked)
It's all coming together
Now that we have ansible-runner running our jobs via execution environments and receptor connecting our nodes, let's put the two tools together. First, let's take our ansible-runner command from earlier and modify it to work with receptor.
ansible-runner transmit runner -p site.yml | receptorctl --socket /var/run/receptor/receptor.sock work submit -f --node execution -p - ansible-runner | ansible-runner process runner
You will notice we are now piping the ansible-runner transmit command to receptorctl. Let's break down what's going on with the receptorctl command:
- work: Commands related to unit-of-work processing
- submit: Submit a new unit of work
- -f, --follow: Remain attached to the job and print its results to stdout
- --node TEXT: Receptor node to run the work on
- -p, --payload TEXT: File containing unit of work data. Use - for stdin.
- WORKTYPE: The type of work to submit (ansible-runner, as defined in our receptor config)
So what we are doing is passing the ansible-runner transmit output to stdin of the receptorctl command and telling receptorctl that this workload is an ansible-runner workload, based on what we specified in the receptor config earlier. The ansible-runner process command then receives the output of the remote ansible-runner work and distributes the results.
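The shell one-liner above has three stages wired together with pipes. As a sketch, the same pipeline can be built programmatically; the helper below only constructs the argv lists (it does not run anything), and the socket path, node name, and worktype are the values from our config:

```python
import shlex

def build_pipeline(private_data_dir, playbook, socket, node, worktype):
    """Return the argv lists for the transmit | work submit | process pipeline,
    mirroring the shell one-liner above. Illustrative helper, not a runner API."""
    transmit = ["ansible-runner", "transmit", private_data_dir, "-p", playbook]
    submit = ["receptorctl", "--socket", socket, "work", "submit",
              "-f", "--node", node, "-p", "-", worktype]
    process = ["ansible-runner", "process", private_data_dir]
    return transmit, submit, process

t, s, p = build_pipeline("runner", "site.yml",
                         "/var/run/receptor/receptor.sock",
                         "execution", "ansible-runner")
# Reassemble the equivalent shell command line for display.
print(" | ".join(shlex.join(cmd) for cmd in (t, s, p)))
```

Chaining these with subprocess.Popen and stdout/stdin pipes would reproduce the shell pipeline; this is roughly how a controlling process can dispatch work to the mesh without a shell at all.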
If everything is set up correctly and your command runs successfully, you should see the ansible-runner:latest execution environment listed under podman images on your execution node, and the playbook output on your controller's terminal indicating everything was successful.
podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/ansible/ansible-runner latest bec0dc171168 17 months ago 816 MB
Notes
- Please read the ansible-runner documentation on the env directory; it might help clear up how AWX and AAP pass variables, SSH keys, passwords, etc. to execution environments.
- The more nodes you have, the more network or local firewall rules you might have to modify; keep this in mind in more secure environments.
- The work in this document reflects my own experimentation, not the AWX project or Ansible Automation Platform; some of these processes and steps might differ from, or be incorrect compared to, what the AWX project or Ansible Automation Platform actually do.