Ansible-runner and receptor: The hard way

After playing around with Ansible Automation Platform and AWX for some time now, I decided to get under the hood and figure out how the platform operates at scale and how jobs are executed on remote nodes.

Ansible-runner

According to the Ansible Runner documentation, “Ansible Runner is a tool and python library that helps when interfacing with Ansible directly or as part of another system whether that be through a container image interface, as a standalone tool, or as a Python module that can be imported. The goal is to provide a stable and consistent interface abstraction to Ansible. This allows Ansible to be embedded into other systems that don’t want to manage the complexities of the interface on their own (such as CI/CD platforms, Jenkins, or other automated tooling).”

Ansible Runner is the underlying tool used in AWX & Ansible Automation Platform; it is how the platform(s) interface with Ansible directly. You can read more about ansible-runner on the ansible-runner doc site.

To get started, we are going to install the ansible-runner pip module per the documentation.

pip3 install ansible-runner
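
A quick way to confirm the install worked is to ask for the version (the exact version number you see will differ):

ansible-runner --version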

We will also need podman installed for the container images that ansible-runner will use later.

sudo dnf -y install podman

Once we have ansible-runner & podman installed, we are going to set up the project directory. The project directory is a bit different from a standard Ansible project directory, but once you see how it is set up and used, it will make sense. The standard project directory for an ansible-runner project looks like this.

.
├── env
│   ├── envvars
│   ├── extravars
│   ├── passwords
│   ├── cmdline
│   ├── settings
│   └── ssh_key
├── inventory
│   └── hosts
└── project
    ├── test.yml
    └── roles
        └── testrole
            ├── defaults
            ├── handlers
            ├── meta
            ├── README.md
            ├── tasks
            ├── tests
            └── vars

For the sake of testing I created a simple project structure.

.
├── env
│   └── settings
├── inventory
│   └── hosts
└── project
    └── site.yml
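
For reference, here is roughly what my playbook and inventory contain (reconstructed for this post; gather_facts: false and the local connection setting are assumptions based on the run output, so your exact files may differ slightly):

cat project/site.yml
- name: Print msg
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Print a test message
      ansible.builtin.debug:
        msg: This is a test message

cat inventory/hosts
localhost ansible_connection=local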

Looking at this simple folder structure, we see that we have an inventory, a playbook, and a settings file. The playbook is a simple debug message and the inventory is just localhost for the sake of testing, but what is the settings file? The settings file and its contents are there to control the runner directly.

cat env/settings 
process_isolation: true
container_image: quay.io/ansible/ansible-runner:latest

The first option in the settings file is process_isolation, which instructs ansible-runner to execute the Ansible tasks inside a container environment. The second option is the container image, or execution environment, that we want to run our Ansible tasks in.
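
These two keys are not the only ones available; most of the keyword arguments you will see in the transmit output later (for example process_isolation_executable) can also be set here. A slightly expanded, optional variant might look like this (we stick with the two-line version for the rest of this walkthrough):

process_isolation: true
process_isolation_executable: podman
container_image: quay.io/ansible/ansible-runner:latest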

Once we have all our settings in place, let’s check a few things before we run our playbook. (Note: if you have other container images on your system, you may see them listed here.)

podman images
REPOSITORY  TAG         IMAGE ID    CREATED     SIZE

We check and there are no container images on the system, so now we are ready to run our test playbook using ansible-runner.

ansible-runner run . --playbook site.yml

PLAY [Print msg] ***************************************************************

TASK [Print a test message] ****************************************************
ok: [localhost] => {
    "msg": "This is a test message"
}

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

Awesome, our job ran successfully. Now let’s check back with podman:

podman images
REPOSITORY                      TAG         IMAGE ID      CREATED        SIZE
quay.io/ansible/ansible-runner  latest      bec0dc171168  17 months ago  816 MB

Now we notice that the container image we specified in the env/settings file has been pulled, and our Ansible tasks were executed inside that container image. But what is ansible-runner doing under the hood?

In the plumbing

To see what ansible-runner is doing under the hood, we are going to use the transmit and worker subcommands:

transmit            Send a job to a remote ansible-runner process
worker              Execute work streamed from a controlling instance

We are going to take our ansible-runner command from before and substitute run with transmit. If you run transmit by itself you get a jumbled mess of JSON, so to make it easier to read we are going to pipe the output through the jq command as well.

ansible-runner transmit . --playbook site.yml | jq
{
  "kwargs": {
    "ident": "85f43d3e1c2e420ea0017ee2a93f7f61",
    "binary": null,
    "playbook": "site.yml",
    "module": null,
    "module_args": null,
    "host_pattern": null,
    "verbosity": null,
    "quiet": false,
    "rotate_artifacts": 0,
    "json_mode": false,
    "omit_event_data": false,
    "only_failed_event_data": false,
    "inventory": null,
    "forks": null,
    "project_dir": null,
    "artifact_dir": null,
    "roles_path": null,
    "process_isolation": null,
    "process_isolation_executable": null,
    "process_isolation_path": null,
    "process_isolation_hide_paths": null,
    "process_isolation_show_paths": null,
    "process_isolation_ro_paths": null,
    "container_image": "quay.io/ansible/ansible-runner:devel",
    "container_volume_mounts": null,
    "container_options": null,
    "directory_isolation_base_path": null,
    "cmdline": null,
    "limit": null,
    "suppress_env_files": false
  }
}
{
  "zipfile": 36136
}

As we can see from the output, these are the keyword arguments that transmit sends along to control ansible-runner, followed by a zipfile payload that packages up our private data directory (including the env/settings file) for the remote side. If we take it a step further and pipe transmit into the ansible-runner worker command, we will see much more information.

ansible-runner transmit . --playbook site.yml | ansible-runner worker | jq
{
  "status": "starting",
  "runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
  "command": [
    "podman",
    "run",
    "--rm",
    "--tty",
    "--interactive",
    "--workdir",
    "/runner/project",
    "-v",
    "/tmp/tmp3tda5n4n/:/runner/:Z",
    "--env-file",
    "/tmp/tmp3tda5n4n/artifacts/b47cec15743a4b03a69814beab9f2ca4/env.list",
    "--quiet",
    "--name",
    "ansible_runner_b47cec15743a4b03a69814beab9f2ca4",
    "quay.io/ansible/ansible-runner:latest",
    "ansible-playbook",
    "-i",
    "/runner/inventory/hosts",
    "site.yml"
  ],
  "env": {
    "ANSIBLE_UNSAFE_WRITES": "1",
    "AWX_ISOLATED_DATA_DIR": "/runner/artifacts/b47cec15743a4b03a69814beab9f2ca4",
    "ANSIBLE_CACHE_PLUGIN_CONNECTION": "/runner/artifacts/b47cec15743a4b03a69814beab9f2ca4/fact_cache",
    "ANSIBLE_CALLBACK_PLUGINS": "/runner/artifacts/b47cec15743a4b03a69814beab9f2ca4/callback",
    "ANSIBLE_STDOUT_CALLBACK": "awx_display",
    "ANSIBLE_RETRY_FILES_ENABLED": "False",
    "ANSIBLE_HOST_KEY_CHECKING": "False",
    "ANSIBLE_CACHE_PLUGIN": "jsonfile",
    "RUNNER_OMIT_EVENTS": "False",
    "RUNNER_ONLY_FAILED_EVENTS": "False"
  },
  "cwd": "/runner/project"
}
{
  "status": "running",
  "runner_ident": "b47cec15743a4b03a69814beab9f2ca4"
}
{
  "uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
  "counter": 1,
  "stdout": "",
  "start_line": 0,
  "end_line": 0,
  "runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
  "event": "playbook_on_start",
  "pid": 20,
  "created": "2023-09-22T18:29:48.741357",
  "event_data": {
    "playbook": "site.yml",
    "playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
    "uuid": "71184a63-51d8-4fea-aa42-911afd968e74"
  }
}
{
  "uuid": "faa46551-95a6-54c3-ce22-000000000006",
  "counter": 2,
  "stdout": "\r\nPLAY [Print msg] ***************************************************************",
  "start_line": 0,
  "end_line": 2,
  "runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
  "event": "playbook_on_play_start",
  "pid": 20,
  "created": "2023-09-22T18:29:48.743348",
  "parent_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
  "event_data": {
    "playbook": "site.yml",
    "playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
    "play": "Print msg",
    "play_uuid": "faa46551-95a6-54c3-ce22-000000000006",
    "play_pattern": "localhost",
    "name": "Print msg",
    "pattern": "localhost",
    "uuid": "faa46551-95a6-54c3-ce22-000000000006"
  }
}
{
  "uuid": "faa46551-95a6-54c3-ce22-000000000008",
  "counter": 3,
  "stdout": "\r\nTASK [Print a test message] ****************************************************",
  "start_line": 2,
  "end_line": 4,
  "runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
  "event": "playbook_on_task_start",
  "pid": 20,
  "created": "2023-09-22T18:29:48.750619",
  "parent_uuid": "faa46551-95a6-54c3-ce22-000000000006",
  "event_data": {
    "playbook": "site.yml",
    "playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
    "play": "Print msg",
    "play_uuid": "faa46551-95a6-54c3-ce22-000000000006",
    "play_pattern": "localhost",
    "task": "Print a test message",
    "task_uuid": "faa46551-95a6-54c3-ce22-000000000008",
    "task_action": "ansible.builtin.debug",
    "resolved_action": "ansible.builtin.debug",
    "task_args": "",
    "task_path": "/runner/project/site.yml:5",
    "name": "Print a test message",
    "is_conditional": false,
    "uuid": "faa46551-95a6-54c3-ce22-000000000008"
  }
}
{
  "uuid": "bf5eee5e-6f01-4ebb-b456-b26506721803",
  "counter": 4,
  "stdout": "",
  "start_line": 4,
  "end_line": 4,
  "runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
  "event": "runner_on_start",
  "pid": 20,
  "created": "2023-09-22T18:29:48.751339",
  "parent_uuid": "faa46551-95a6-54c3-ce22-000000000008",
  "event_data": {
    "playbook": "site.yml",
    "playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
    "play": "Print msg",
    "play_uuid": "faa46551-95a6-54c3-ce22-000000000006",
    "play_pattern": "localhost",
    "task": "Print a test message",
    "task_uuid": "faa46551-95a6-54c3-ce22-000000000008",
    "task_action": "ansible.builtin.debug",
    "resolved_action": "ansible.builtin.debug",
    "task_args": "",
    "task_path": "/runner/project/site.yml:5",
    "host": "localhost",
    "uuid": "bf5eee5e-6f01-4ebb-b456-b26506721803"
  }
}
{
  "uuid": "79820614-a340-4ca1-9f88-508e214bb7a2",
  "counter": 5,
  "stdout": "\u001b[0;32mok: [localhost] => {\u001b[0m\r\n\u001b[0;32m    \"msg\": \"This is a test message\"\u001b[0m\r\n\u001b[0;32m}\u001b[0m",
  "start_line": 4,
  "end_line": 7,
  "runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
  "event": "runner_on_ok",
  "pid": 20,
  "created": "2023-09-22T18:29:48.758284",
  "parent_uuid": "faa46551-95a6-54c3-ce22-000000000008",
  "event_data": {
    "playbook": "site.yml",
    "playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
    "play": "Print msg",
    "play_uuid": "faa46551-95a6-54c3-ce22-000000000006",
    "play_pattern": "localhost",
    "task": "Print a test message",
    "task_uuid": "faa46551-95a6-54c3-ce22-000000000008",
    "task_action": "ansible.builtin.debug",
    "resolved_action": "ansible.builtin.debug",
    "task_args": "",
    "task_path": "/runner/project/site.yml:5",
    "host": "localhost",
    "remote_addr": "localhost",
    "res": {
      "msg": "This is a test message",
      "_ansible_verbose_always": true,
      "_ansible_no_log": false,
      "changed": false
    },
    "start": "2023-09-22T18:29:48.751294",
    "end": "2023-09-22T18:29:48.758153",
    "duration": 0.006859,
    "event_loop": null,
    "uuid": "79820614-a340-4ca1-9f88-508e214bb7a2"
  }
}
{
  "uuid": "d0cd1845-6268-475a-a909-66024991f5d7",
  "counter": 6,
  "stdout": "\r\nPLAY RECAP *********************************************************************\r\n\u001b[0;32mlocalhost\u001b[0m                  : \u001b[0;32mok=1   \u001b[0m changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   ",
  "start_line": 7,
  "end_line": 11,
  "runner_ident": "b47cec15743a4b03a69814beab9f2ca4",
  "event": "playbook_on_stats",
  "pid": 20,
  "created": "2023-09-22T18:29:48.762944",
  "parent_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
  "event_data": {
    "playbook": "site.yml",
    "playbook_uuid": "71184a63-51d8-4fea-aa42-911afd968e74",
    "changed": {},
    "dark": {},
    "failures": {},
    "ignored": {},
    "ok": {
      "localhost": 1
    },
    "processed": {
      "localhost": 1
    },
    "rescued": {},
    "skipped": {},
    "artifact_data": {},
    "uuid": "d0cd1845-6268-475a-a909-66024991f5d7"
  }
}
{
  "status": "successful",
  "runner_ident": "b47cec15743a4b03a69814beab9f2ca4"
}
{
  "zipfile": 29725
}

Now we can see the podman command that runner executes, with all of the environment settings from before, and we also see every event of the playbook run streamed as JSON.
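
There is also a third subcommand, process, which takes the event stream that worker emits and turns it back into the normal playbook output and artifacts on the originating side. Chaining all three locally should reproduce the run we did earlier (a sketch based on the ansible-runner remote execution docs):

ansible-runner transmit . --playbook site.yml | ansible-runner worker | ansible-runner process .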

Now that we have a better idea of how ansible-runner works, let’s use it with another tool, one that helps us cluster and scale.

Receptor

The next tool we are going to look at is receptor. According to the docs, β€œReceptor is an overlay network intended to ease the distribution of work across a large and dispersed collection of workers. Receptor nodes establish peer-to-peer connections with each other via existing networks. Once connected, the receptor mesh provides datagram (UDP-like) and stream (TCP-like) capabilities to applications, as well as robust unit-of-work handling with resiliency against transient network failures.”

Receptor is the tool that is going to allow us to scale out and create nodes from which we can run jobs. To get started we need to install a few packages first.

sudo dnf -y install golang make git

This installs the Go toolchain and make, which we need to compile the receptor binary, and git, which will allow us to clone the repository from GitHub.

To get started we need to clone the repository:

git clone https://github.com/ansible/receptor.git

Once you have a clone of the repository, change into the receptor directory and run the make command to build the receptor binary.

make receptor

Once the binary is built let’s copy it to /usr/local/bin so we can use the command no matter what directory we are in.
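
Assuming the make target drops the binary in the root of the cloned repository (the path may differ if the project’s Makefile changes), the copy is a one-liner:

sudo cp ./receptor /usr/local/bin/receptor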

Now that we have the receptor binary, we also need the receptorctl tool:

pip install receptorctl

Now that we have all the tools we need for receptor installed let’s start setting up the service to create our mesh.

On the node that you are going to designate as the control node, we need to set up the config file:
(note: you may need to create the /etc/receptor folder, as it is not there by default)
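
If the folder is missing, create it before editing the file:

sudo mkdir -p /etc/receptor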

vi /etc/receptor/receptor.conf
---
- node:
    id: controller

- log-level:
    level: Debug

- tcp-listener:
    port: 2222

- control-service:
    service: control
    filename: /tmp/controller.sock

- work-command:
    worktype: ansible-runner
    command: ansible-runner
    params: worker
    allowruntimeparams: true

So what we have set up in this config file is:

  • The node ID has been set to controller
  • The logging level has been set to Debug
  • We are listening for other nodes on port 2222
  • The control-service allows the user to issue commands like “status” or “work submit” to a receptor node.
  • work-command defines a type of work that can run on the node:
    • worktype is a user-defined name to give this work definition
    • command is the executable that is invoked when running this work
    • params are the command-line options passed to that executable
    • allowruntimeparams allows extra command-line parameters to be supplied when the work is submitted

Now that we have the controller node set up, let’s set up the execution node. There are only a couple of changes between the controller config and the execution config:

vi /etc/receptor/receptor.conf
---
- node:
    id: execution

- log-level:
    level: Debug

- tcp-peer:
    address: $IP_OF_CONTROLLER:2222

- control-service:
    service: control

- work-command:
    worktype: ansible-runner
    command: ansible-runner
    params: worker
    allowruntimeparams: true

Now that we have the config file all set up let’s start the service.
(note: It might be best to start the receptor service in the background or in a tmux session)

receptor -c /etc/receptor/receptor.conf
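
If you do want to push it into the background rather than dedicating a terminal to it, either of these works; the nohup variant assumes you can write to the chosen log path, and the tmux variant assumes tmux is installed:

nohup receptor -c /etc/receptor/receptor.conf > /var/log/receptor.log 2>&1 &

tmux new-session -d -s receptor 'receptor -c /etc/receptor/receptor.conf'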

Once we have the service running on both the controller node and the execution node, we can check whether the two systems are able to communicate with one another:

receptorctl --socket /tmp/controller.sock status
Node ID: controller
Version: 1.4.1
System CPU Count: 2
System Memory MiB: 1763

Connection   Cost
execution    1

Known Node   Known Connections
controller   execution: 1 
execution    controller: 1 

Route        Via
execution    execution

Node         Service   Type       Last Seen             Tags
controller   control   Stream     2023-09-26 11:05:39   {'type': 'Control Service'}
execution    control   Stream     2023-09-26 11:05:24   {'type': 'Control Service'}

Node         Work Types
controller   ansible-runner
execution    ansible-runner
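
Besides status, receptorctl can also ping and traceroute nodes across the mesh, which is handy when you are debugging connectivity (the output will vary with your topology):

receptorctl --socket /tmp/controller.sock ping execution
receptorctl --socket /tmp/controller.sock traceroute execution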

Excellent, now that both nodes can communicate we can move on to the final step.
(note: if nodes are not connecting to each other, please check local and network firewalls to ensure traffic is not blocked; see the example below)
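
For example, on a firewalld-based system (an assumption here, adjust for whatever firewall you run) opening the listener port on the controller would look something like this:

sudo firewall-cmd --permanent --add-port=2222/tcp
sudo firewall-cmd --reload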

It’s all coming together

Now that we have ansible-runner running our jobs via execution environments and receptor connecting our nodes, let’s put the two tools together. First, let’s take our ansible-runner command from earlier and modify it to work with receptor.

ansible-runner transmit . -p site.yml | receptorctl --socket /tmp/controller.sock work submit -f --node execution -p - ansible-runner | ansible-runner process .

You will notice that we are now piping the ansible-runner transmit command to receptorctl. Let’s break down what’s going on with the receptorctl command:

  • work Commands related to unit-of-work processing
    • submit Submit a new unit of work.
      • -f, --follow Remain attached to the job and print its results to stdout
      • --node TEXT Receptor node to run the work on.
      • -p, --payload TEXT File containing unit of work data. Use - for stdin.
      • WORKTYPE

What we are doing is passing the output of ansible-runner transmit to stdin of the receptorctl command and telling receptorctl that this workload is the ansible-runner work type we defined in the receptor config earlier. The ansible-runner process command then receives the output of the remote ansible-runner work and distributes the results.
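
Once jobs are flowing, a few other receptorctl work subcommands are useful for poking at units of work on the mesh; a quick sketch (the unit ID is a placeholder you would copy from the list output):

receptorctl --socket /tmp/controller.sock work list
receptorctl --socket /tmp/controller.sock work results <unit_id>
receptorctl --socket /tmp/controller.sock work release <unit_id>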

If everything is set up correctly and your command runs successfully, you should see the ansible-runner:latest execution environment listed under podman images on your execution node, and the playbook output should have been printed to your controller’s terminal, indicating everything was successful.

podman images
REPOSITORY                      TAG         IMAGE ID      CREATED        SIZE
quay.io/ansible/ansible-runner  latest      bec0dc171168  17 months ago  816 MB

Notes

  • Please read the ansible-runner documentation on the env section; it might help clear up how AWX and AAP pass variables, ssh keys, passwords, etc. to execution environments.
  • The more nodes you have, the more network or local firewall rules you might have to change or modify; please keep this in mind in more secure environments.
  • The work in this document is a reflection of my own experimentation and not of the AWX project or Ansible Automation Platform; some of these processes and steps might differ from, or be incorrect compared to, what the AWX project or Ansible Automation Platform are doing.

F.Y.I., precompiled receptor binaries for major OSes/platforms are published on the releases page on GitHub.

So if you are on Linux/macOS/Windows on x64/ARM, to install Receptor you can just download and extract the binary instead of compiling it from source code.


That’s an excellent description, thank you @aheath1992 – very interesting.

I’d also like to point out a wonderful article series with diagrams by @kurokobo. His blogs are in Japanese but translate well to English if desired: How Receptor is used in AWX and Automation Mesh is the 4th part which links to the previous three.
