AWX slower than ansible-playbook CLI

I’m working on migrating my company’s Ansible playbooks from our current setup, a static VM running ansible-playbook against static flat files, to AWX. I’ve managed to get everything working, except that playbook execution is much slower on AWX.

For example, for a particular playbook running against 9 hosts, I’ve recorded these run times:

  • CLI/ansible-playbook: 16m, 5m
  • AWX: 41m, 25m

This will be a blocker for migration to AWX, as some of our deployments go to hundreds of hosts and take several hours on the command line already, and I can’t afford to spend 3x-5x as much time.

I’ve noticed a number of “fixed cost” drivers of this, e.g. in some cases, AWX has to spin up an automation-task on the k8s cluster, and sometimes a new k8s node is required to do so. But putting that aside, even execution within the tasks takes consistently longer.

My leading hypothesis is that it’s not using persisted SSH connections correctly. We use a bastion host to connect between Ansible and our devices. In a recent playbook run, I looked on the end device’s /var/log/auth.log and noted:

  • for CLI/ansible-playbook: 3 instances of “Accepted password for ”
  • for AWX: 293 instances of “Accepted password for ”
  • In both cases, on the bastion, I do see hundreds of “Accepted publickey for ” so it seems like the connection between the bastion and the end device is the difference.
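
To make that comparison repeatable across runs, a small helper can count the auth-log entries on each hop (a sketch; the log path and patterns are the ones quoted above — note that RHEL-family hosts log to /var/log/secure instead of /var/log/auth.log):

```shell
# count_auth: count matching SSH auth events in a log file, as a rough
# measure of how many real connections were opened (vs. multiplexed).
count_auth() {
  # $1: auth log path  (Debian/Ubuntu: /var/log/auth.log, RHEL: /var/log/secure)
  # $2: pattern, e.g. 'Accepted password for' (end device)
  #     or 'Accepted publickey for' (bastion)
  grep -c "$2" "$1"
}
```

Run e.g. `count_auth /var/log/auth.log 'Accepted password for'` on the end device before and after each experiment.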

In my inventory, I have added the following:

ansible_ssh_common_args: '-o ProxyCommand="ssh -v -W %h:%p user@bastion.example.com -o StrictHostKeyChecking=no -o ControlMaster=auto -o ControlPersist=3600s"'
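
One detail worth checking here (an observation about OpenSSH behaviour, not something taken from the logs): `ControlMaster=auto` only reuses connections if a `ControlPath` is also set, and OpenSSH’s default `ControlPath` is `none`. The outer connection gets a path from Ansible (the `/runner/cp/...` socket visible in the verbose output), but the inner `ssh -W` to the bastion in this ProxyCommand does not, so the bastion hop may open a fresh connection every time. A sketch with an explicit socket (the `/runner/cp` location is an assumption based on where AWX’s runner places its own control sockets):

```yaml
# Inventory/group_vars sketch. Assumptions: user@bastion.example.com as above,
# and /runner/cp being writable inside the EE.
# Note: %h/%p inside a ProxyCommand are expanded by the *outer* ssh before the
# command runs, so a fixed socket name is used for the bastion hop here.
ansible_ssh_common_args: >-
  -o ProxyCommand="ssh -W %h:%p user@bastion.example.com
  -o StrictHostKeyChecking=no -o ControlMaster=auto -o ControlPersist=3600s
  -o ControlPath=/runner/cp/bastion-mux"
```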

and when I run with verbose logs, I see this:

SSH: EXEC sshpass -d12 ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="user"' -o ConnectTimeout=10 -o 'ProxyCommand=ssh -v -W %h:%p user@bastion.example.com -o StrictHostKeyChecking=no -o ControlMaster=auto -o ControlPersist=3600s' -o 'ControlPath="/runner/cp/7fd6ab9754"' 10.0.0.999 '/bin/sh -c '"'"'echo ~user && sleep 0'"'"''

(replaced IP user/hostname with examples)

So it seems like AWX wants to persist the connection, but it doesn’t seem to be working. Are there other settings I need to tweak to get SSH persistence to work? Currently using AWX 22.3.0. I could upgrade if that helps, but didn’t see anything in the release notes about SSH or persistence.

Something to check is that the ansible.cfg used within AWX (by default in your EE, or in your project’s directory, etc.) has pipelining enabled.

This will greatly speed up SSH connections (persistent or not) by preventing the creation of temporary files, including when privilege escalation is used.
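
The minimal change, as a sketch (where this file needs to live depends on where your EE actually reads ansible.cfg, e.g. /etc/ansible/ansible.cfg baked into the image, or an ansible.cfg at the project root):

```ini
# ansible.cfg — minimal pipelining sketch
[ssh_connection]
# Run modules through the already-open SSH session instead of
# transferring a temporary file first, saving round trips per task.
pipelining = True
```

Alternatively, the same toggle can be flipped with the ANSIBLE_PIPELINING environment variable, which you can set on the job template without rebuilding anything.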

In addition, for pipelining to work with privilege escalation via sudo, the remote servers need requiretty disabled in /etc/sudoers:

/etc/sudoers:
Defaults    !requiretty


Thanks! I was able to get this added, and it roughly halved the number of SSH connections being made.

Also, for posterity: if anyone else runs into this and needs to rebuild their execution environment to include an ansible.cfg using ansible-builder v3, I was able to structure it similarly to the comment in variable ansible_config not working · Issue #413 · ansible/ansible-builder · GitHub
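
For reference, the rough shape of that structure under the ansible-builder v3 schema (the file names and base image here are illustrative, not prescriptive — see the linked issue for the original discussion):

```yaml
# execution-environment.yml (ansible-builder v3) — illustrative sketch
version: 3
images:
  base_image:
    name: quay.io/ansible/awx-ee:latest
additional_build_files:
  # Copies a local ansible.cfg into the build context under _build/configs/
  - src: files/ansible.cfg
    dest: configs
additional_build_steps:
  prepend_final:
    # Place it where Ansible inside the EE will pick it up
    - ADD _build/configs/ansible.cfg /etc/ansible/ansible.cfg
```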

I think I have myself unblocked enough to proceed from here. I’ll cycle back later if I have time to try to reduce the reconnections further.
