Unable to find process isolation executable: podman on AAP/AWX


OK, I know I am a Red Hat employee, but I need help, and I assume other folks will hit this, so I would rather get help online, in the open, so everyone will benefit when someone helps me out :pray:

I routinely build images for AAP using the containerized installer, then use these images in labs and workshops, and everything installs fine. But SOMETIMES I get this error when trying to sync a project (through the infra.controller_configuration content collection, which uses the awx.awx content collection).

Worker output:
Unable to find process isolation executable: podman

If I log in to the web UI and manually sync the project, everything works fine… and there are no errors.

I have a feeling… (guess) that it has something to do with when I issue an SSL cert.

I will do something like this->

- name: Make sure Automation Controller is stopped
  become: true
  become_user: "{{ run_commands_user }}"
  containers.podman.podman_container:
    name: automation-controller-web
    state: stopped
  register: install_controller
  until: install_controller is not failed
  retries: 5

I then let Let’s Encrypt create a cert (it sets up an https website using the certbot package, retrieves the cert, then deletes the web service).

Then turn Automation controller back on later->

- name: Make sure Automation Controller is online before changing base URL
  become: true
  become_user: "{{ run_commands_user }}"
  containers.podman.podman_container:
    name: automation-controller-web
    state: started
  register: install_controller
  until: install_controller is not failed
  retries: 5
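A started container is not necessarily a ready controller, so one thing worth trying before any project sync is polling the API. This is only a sketch: `controller_hostname` is a hypothetical variable, `/api/v2/ping/` is the standard AWX/controller ping endpoint, and `validate_certs: false` is an assumption made because the cert is being swapped at this point. The ping only shows that the web service answers; it does not prove the task container can launch jobs.

```yaml
- name: Wait until the controller API answers
  ansible.builtin.uri:
    # controller_hostname is a hypothetical variable, not from my real playbook
    url: "https://{{ controller_hostname }}/api/v2/ping/"
    # assumption: the new cert may not be trusted yet
    validate_certs: false
    status_code: 200
  register: controller_ping
  until: controller_ping is succeeded
  retries: 30
  delay: 10
```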

I am curious… if I should be doing that with all these containers…

[ec2-user@ansible-1 ~]$ podman ps
CONTAINER ID  IMAGE                                                                        COMMAND               CREATED      STATUS         PORTS       NAMES
a5dd9747001b  registry.redhat.io/rhel8/postgresql-13:latest                                run-postgresql        6 weeks ago  Up 25 minutes              postgresql
462fc9c41346  registry.redhat.io/rhel8/redis-6:latest                                      run-redis             6 weeks ago  Up 25 minutes              redis
a9bed32cb9f5  registry.redhat.io/ansible-automation-platform-24/ee-supported-rhel8:latest  /usr/bin/receptor...  6 weeks ago  Up 25 minutes              receptor
714348cda22c  registry.redhat.io/ansible-automation-platform-24/controller-rhel8:latest    /usr/bin/launch_a...  6 weeks ago  Up 25 minutes              automation-controller-rsyslog
e606bb5e6231  registry.redhat.io/ansible-automation-platform-24/controller-rhel8:latest    /usr/bin/launch_a...  6 weeks ago  Up 25 minutes              automation-controller-task
d37e4e346fbe  registry.redhat.io/ansible-automation-platform-24/controller-rhel8:latest    /usr/bin/launch_a...  6 weeks ago  Up 21 minutes              automation-controller-web

So there are actually 6 containers… should I be restarting all of these rather than just the web one? Or is my hypothesis just off…
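If restarting everything turns out to be worth testing, a rough sketch would loop over the container names from the `podman ps` output above. The `controller_containers` variable and the stop/start order are my assumptions, not something the installer documents, and I have not confirmed that this avoids the error.

```yaml
- name: Stop all AAP containers
  become: true
  become_user: "{{ run_commands_user }}"
  containers.podman.podman_container:
    name: "{{ item }}"
    state: stopped
  # hypothetical list; names taken from the podman ps output
  loop: "{{ controller_containers }}"
  vars:
    controller_containers:
      - automation-controller-web
      - automation-controller-task
      - automation-controller-rsyslog
      - receptor
      - redis
      - postgresql

- name: Start all AAP containers again, database first
  become: true
  become_user: "{{ run_commands_user }}"
  containers.podman.podman_container:
    name: "{{ item }}"
    state: started
  # reverse of the stop order (assumed, not verified)
  loop:
    - postgresql
    - redis
    - receptor
    - automation-controller-rsyslog
    - automation-controller-task
    - automation-controller-web
```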

I have also tried block/rescue and until loops… but once it hits this podman error, I can’t seem to get past it unless I manually log in to the web UI. It is hair-pulling frustrating…

That error comes from ansible-runner. It only happens when runner cannot successfully execute the podman --version command, and it shouldn’t have anything to do with SSL certs. I suspect you might be hitting a worker node that somehow doesn’t have podman installed? Or perhaps a broken installation?
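A quick way to test that theory from the provisioning playbook is to run the same command as the user that owns the containers. This is a minimal sketch: it reuses the `run_commands_user` variable from the tasks above and checks the host only. In a containerized install the runner may execute inside one of the containers, so a passing host check does not rule out the problem there.

```yaml
- name: Check that podman runs for the user that owns the containers
  become: true
  become_user: "{{ run_commands_user }}"
  ansible.builtin.command: podman --version
  register: podman_version
  changed_when: false
  # do not abort the play; we only want to record the result
  failed_when: false

- name: Show what the check returned
  ansible.builtin.debug:
    msg: "rc={{ podman_version.rc | default('n/a') }} stdout={{ podman_version.stdout | default('') }}"
```

Running this right before the project sync, on a run where the sync fails, would show whether podman was missing or unusable at that moment.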

As soon as I log in to the web UI, the project syncs successfully, and I never see that error again… podman exists and works fine.

Sean, do you have a snippet that uses just the awx.awx collection that I can try to reproduce the problem with?

Also, is that worker output error message on the stdout page of the project sync in the UI?

I cannot reliably recreate this, but here is what it looked like the last time it happened, from the point of view of my control node provisioning the AWX host on AWS:

TASK [infra.controller_configuration.projects : Configure Controller Projects | Wait for finish the projects creation] *************************************
Wednesday 29 November 2023  15:40:59 -0800 (0:00:06.092)       0:25:29.196 ****
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (45 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (44 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (43 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (42 retries left).
failed: [nov29-student1-ansible-1] (item={'failed': 0, 'started': 1, 'finished': 0, 'ansible_job_id': 'j761021533636.17017', 'results_file': '/home/ec2-user/.ansible_async/j761021533636.17017', 'changed': False, '__controller_project_item': {'name': 'Automated Management', 'organization': 'Default', 'scm_update_on_launch': True, 'scm_update_cache_timeout': 3600, 'scm_type': 'git', 'scm_url': 'https://github.com/redhat-partner-tech/automated-satellite.git', 'scm_branch': 'aap2-24', 'default_environment': 'auto_satellite workshop execution environment'}, 'ansible_loop_var': '__controller_project_item'}) => {"__projects_job_async_results_item": {"__controller_project_item": {"default_environment": "auto_satellite workshop execution environment", "name": "Automated Management", "organization": "Default", "scm_branch": "aap2-24", "scm_type": "git", "scm_update_cache_timeout": 3600, "scm_update_on_launch": true, "scm_url": "https://github.com/redhat-partner-tech/automated-satellite.git"}, "ansible_job_id": "j761021533636.17017", "ansible_loop_var": "__controller_project_item", "changed": false, "failed": 0, "finished": 0, "results_file": "/home/ec2-user/.ansible_async/j761021533636.17017", "started": 1}, "ansible_job_id": "j761021533636.17017", "ansible_loop_var": "__projects_job_async_results_item", "attempts": 5, "changed": false, "finished": 1, "msg": "Project update failed", "results_file": "/home/ec2-user/.ansible_async/j761021533636.17017", "started": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (45 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (44 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (43 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (42 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (41 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (40 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (39 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (38 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (37 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (36 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (35 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (34 retries left).
changed: [nov29-student1-ansible-1] => (item={'failed': 0, 'started': 1, 'finished': 0, 'ansible_job_id': 'j671667510877.17151', 'results_file': '/home/ec2-user/.ansible_async/j671667510877.17151', 'changed': False, '__controller_project_item': {'name': 'Fact Scan', 'organization': 'Default', 'scm_type': 'git', 'scm_url': 'https://github.com/ansible/awx-facts-playbooks.git'}, 'ansible_loop_var': '__controller_project_item'})

The workshop uses this role from the infra.controller_configuration validated content collection->

 - infra.controller_configuration.projects

which is basically doing this->

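- name: Add project
  awx.awx.project:
    name: "Our Project"
    organization: "Default"
    scm_type: git
    scm_url: "https://github.com/redhat-partner-tech/automated-satellite.git"
    scm_branch: "aap2-24"
    # awx.awx.project takes default_environment (the name of an execution
    # environment defined in the controller), not execution_environment.
    # The workshop EE image is quay.io/s4v0/ee-automated-satellite-aap2:2.0.0
    default_environment: "auto_satellite workshop execution environment"

When we log in we see the podman error… but when we manually press sync in the web UI it just works… and no other problems are reported… it is very weird…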

We encountered this issue on containerized AAP 2.5 today and a reboot resolved it.