Unable to find process isolation executable: podman on AAP/AWX

OK, I know I am a Red Hat employee, but I need help, and I assume other folks will hit this, so I would rather get help online, in the open, so everyone will benefit when someone helps me out :pray:

I routinely build images for AAP using the containerized installer, then use these images in labs and workshops, and everything installs fine. But “SOMETIMES” I get this error when trying to sync a project (through the infra.controller content collection, which uses the awx.awx content collection).

Worker output:
Unable to find process isolation executable: podman

If I log in to the web UI and manually sync the project, everything works fine… and there are no errors.

I have a feeling… (guess) that it has something to do with when I issue an SSL cert.

I will do something like this->

- name: Make sure Automation Controller is stopped
  become: true
  become_user: "{{ run_commands_user }}"
  containers.podman.podman_container:
    name: automation-controller-web
    state: stopped
  register: install_controller
  until: install_controller is not failed
  retries: 5

I then let Let’s Encrypt create a cert (it sets up an https website using the certbot package, retrieves the cert, then deletes the web service).

Then turn Automation controller back on later->

- name: Make sure Automation Controller is online before changing base URL
  become: true
  become_user: "{{ run_commands_user }}"
  containers.podman.podman_container:
    name: automation-controller-web
    state: started
  register: install_controller
  until: install_controller is not failed
  retries: 5

I am curious… if I should be doing that with all these containers…

[ec2-user@ansible-1 ~]$ podman ps
CONTAINER ID  IMAGE                                                                        COMMAND               CREATED      STATUS         PORTS       NAMES
a5dd9747001b  registry.redhat.io/rhel8/postgresql-13:latest                                run-postgresql        6 weeks ago  Up 25 minutes              postgresql
462fc9c41346  registry.redhat.io/rhel8/redis-6:latest                                      run-redis             6 weeks ago  Up 25 minutes              redis
a9bed32cb9f5  registry.redhat.io/ansible-automation-platform-24/ee-supported-rhel8:latest  /usr/bin/receptor...  6 weeks ago  Up 25 minutes              receptor
714348cda22c  registry.redhat.io/ansible-automation-platform-24/controller-rhel8:latest    /usr/bin/launch_a...  6 weeks ago  Up 25 minutes              automation-controller-rsyslog
e606bb5e6231  registry.redhat.io/ansible-automation-platform-24/controller-rhel8:latest    /usr/bin/launch_a...  6 weeks ago  Up 25 minutes              automation-controller-task
d37e4e346fbe  registry.redhat.io/ansible-automation-platform-24/controller-rhel8:latest    /usr/bin/launch_a...  6 weeks ago  Up 21 minutes              automation-controller-web

So there are actually 6 containers… should I be restarting all of them rather than just the web one? Or is my hypothesis just off…
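
One way to test that hypothesis would be to cycle every container rather than only the web one, then wait for the API to answer before anything tries to sync a project. This is only a sketch, not a confirmed fix: the container names are taken from the podman ps output above, while `aap_containers`, `controller_hostname`, the stop/start order, and the use of the `/api/v2/ping/` endpoint as a readiness signal are assumptions for illustration.

```yaml
# Sketch only. aap_containers and controller_hostname are hypothetical
# variable names; the order below is an assumption, not the installer's.
- name: Define the containers in an assumed start order
  ansible.builtin.set_fact:
    aap_containers:
      - postgresql
      - redis
      - receptor
      - automation-controller-rsyslog
      - automation-controller-task
      - automation-controller-web

- name: Stop every container, last started first
  become: true
  become_user: "{{ run_commands_user }}"
  containers.podman.podman_container:
    name: "{{ item }}"
    state: stopped
  loop: "{{ aap_containers | reverse | list }}"

# The certificate would be issued here, while everything is stopped.

- name: Start every container again
  become: true
  become_user: "{{ run_commands_user }}"
  containers.podman.podman_container:
    name: "{{ item }}"
    state: started
  loop: "{{ aap_containers }}"

- name: Wait until the controller API answers
  ansible.builtin.uri:
    url: "https://{{ controller_hostname }}/api/v2/ping/"
    validate_certs: false  # assumption: the new cert may not be trusted yet
    status_code: 200
  register: controller_ping
  until: controller_ping is succeeded
  retries: 30
  delay: 10
```

A started container only shows that the process is up, so the final task keeps polling until the API itself responds. Whether the installer's own service management interferes with stopping containers this way is not something this sketch accounts for.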

I have also tried block/rescue and until loops… but once it hits this podman error, I can’t seem to get past it unless I manually log in to the web UI. It is hair-pulling frustrating…

That error comes from ansible-runner. That error is only going to happen when runner cannot successfully execute the podman --version command and shouldn’t have anything to do with SSL certs. I suspect you might be hitting a worker node that somehow doesn’t have podman installed? Or perhaps a broken installation?
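
To see what runner sees, the same check can be reproduced by hand. The tasks below are a sketch: the first runs `podman --version` on the host as the container user, the second runs it inside the `receptor` container (name taken from the podman ps output in the first post). Which of those environments runner actually executes in on this install is an assumption, so a pass in one does not rule out a failure in the other.

```yaml
# Sketch only: both tasks record the result instead of failing the play.
- name: Run the same check on the host as the container user
  become: true
  become_user: "{{ run_commands_user }}"
  ansible.builtin.command:
    argv:
      - podman
      - "--version"
  register: host_podman
  changed_when: false
  failed_when: false

- name: Run the same check inside the receptor container
  become: true
  become_user: "{{ run_commands_user }}"
  ansible.builtin.command:
    argv:
      - podman
      - exec
      - receptor
      - podman
      - "--version"
  register: receptor_podman
  changed_when: false
  failed_when: false

- name: Show both results
  ansible.builtin.debug:
    msg:
      - "host: rc={{ host_podman.rc | default('n/a') }} out={{ host_podman.stdout | default('') }}"
      - "receptor: rc={{ receptor_podman.rc | default('n/a') }} out={{ receptor_podman.stdout | default('') }} err={{ receptor_podman.stderr | default('') }}"
```

Running these right after a failed sync, and again after a manual sync from the web UI, would show whether the result of the check itself changes between the two states.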

As soon as I log in to the web UI, the project syncs successfully, and I never see that error again… podman is installed fine.

Sean, do you have a snippet that uses just the awx.awx collection that I can try to reproduce the problem with?

Also, is that worker output error message on the stdout page of the project sync in the UI?
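
One way to answer that without the UI is to pull the most recent failed project update from the API right after the failure and print its error fields. This is a sketch: `controller_hostname`, `controller_username`, and `controller_password` are assumed to be defined as variables, basic auth is assumed to be enabled, and which field carries the worker output on a given version is an assumption, so both `job_explanation` and `result_traceback` are printed.

```yaml
# Sketch only: the controller_* variables are assumed to be defined.
- name: Find the most recent failed project update
  ansible.builtin.uri:
    url: "https://{{ controller_hostname }}/api/v2/project_updates/?status=failed&order_by=-finished&page_size=1"
    url_username: "{{ controller_username }}"
    url_password: "{{ controller_password }}"
    force_basic_auth: true
    validate_certs: false
  register: failed_updates

- name: Fetch the full record for that update
  ansible.builtin.uri:
    url: "https://{{ controller_hostname }}{{ failed_updates.json.results[0].url }}"
    url_username: "{{ controller_username }}"
    url_password: "{{ controller_password }}"
    force_basic_auth: true
    validate_certs: false
  register: failed_update
  when: failed_updates.json.results | length > 0

- name: Print its error fields
  ansible.builtin.debug:
    msg:
      - "{{ failed_update.json.job_explanation | default('') }}"
      - "{{ failed_update.json.result_traceback | default('') }}"
  when: failed_updates.json.results | length > 0
```

The second request is there because the list view may not include every field of the detail view.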

I cannot reliably recreate this, but here is what it looked like the last time it happened, from the point of view of my control node provisioning the AWX host on AWS:

TASK [infra.controller_configuration.projects : Configure Controller Projects | Wait for finish the projects creation] *************************************
Wednesday 29 November 2023  15:40:59 -0800 (0:00:06.092)       0:25:29.196 ****
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (45 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (44 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (43 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (42 retries left).
failed: [nov29-student1-ansible-1] (item={'failed': 0, 'started': 1, 'finished': 0, 'ansible_job_id': 'j761021533636.17017', 'results_file': '/home/ec2-user/.ansible_async/j761021533636.17017', 'changed': False, '__controller_project_item': {'name': 'Automated Management', 'organization': 'Default', 'scm_update_on_launch': True, 'scm_update_cache_timeout': 3600, 'scm_type': 'git', 'scm_url': 'https://github.com/redhat-partner-tech/automated-satellite.git', 'scm_branch': 'aap2-24', 'default_environment': 'auto_satellite workshop execution environment'}, 'ansible_loop_var': '__controller_project_item'}) => {"__projects_job_async_results_item": {"__controller_project_item": {"default_environment": "auto_satellite workshop execution environment", "name": "Automated Management", "organization": "Default", "scm_branch": "aap2-24", "scm_type": "git", "scm_update_cache_timeout": 3600, "scm_update_on_launch": true, "scm_url": "https://github.com/redhat-partner-tech/automated-satellite.git"}, "ansible_job_id": "j761021533636.17017", "ansible_loop_var": "__controller_project_item", "changed": false, "failed": 0, "finished": 0, "results_file": "/home/ec2-user/.ansible_async/j761021533636.17017", "started": 1}, "ansible_job_id": "j761021533636.17017", "ansible_loop_var": "__projects_job_async_results_item", "attempts": 5, "changed": false, "finished": 1, "msg": "Project update failed", "results_file": "/home/ec2-user/.ansible_async/j761021533636.17017", "started": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (45 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (44 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (43 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (42 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (41 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (40 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (39 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (38 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (37 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (36 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (35 retries left).
FAILED - RETRYING: [nov29-student1-ansible-1]: Configure Controller Projects | Wait for finish the projects creation (34 retries left).
changed: [nov29-student1-ansible-1] => (item={'failed': 0, 'started': 1, 'finished': 0, 'ansible_job_id': 'j671667510877.17151', 'results_file': '/home/ec2-user/.ansible_async/j671667510877.17151', 'changed': False, '__controller_project_item': {'name': 'Fact Scan', 'organization': 'Default', 'scm_type': 'git', 'scm_url': 'https://github.com/ansible/awx-facts-playbooks.git'}, 'ansible_loop_var': '__controller_project_item'})

the workshop will use the infra.controller validated content collection role->

 - infra.controller_configuration.projects

which is basically doing this->

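- name: Add project
  awx.awx.project:
    name: "Our Project"
    organization: "Default"
    scm_type: git
    scm_url: "https://github.com/redhat-partner-tech/automated-satellite.git"
    # The module's parameter is default_environment. It takes the name of an
    # execution environment defined in the controller (as in the log above),
    # not an image reference such as quay.io/s4v0/ee-automated-satellite-aap2:2.0.0.
    default_environment: "auto_satellite workshop execution environment"
    scm_branch: 'aap2-24'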

When we log in we see the podman error… but when we manually press sync in the web UI it just works… and no other problems are reported… it is very weird…