Install of awx-operator from helm chart generates error loop in awx-manager container

I am attempting to deploy the awx-operator via its Helm chart (awx-operator-helm). I previously installed 2.9.0 and deployed AWX successfully, but that was prior to the move.

I am now getting the following error in awx-manager, irrespective of the version deployed:

2024-10-15T16:17:26.004798609+08:00 TASK [Verify imagePullSecrets] *************************************************
2024-10-15T16:17:26.004803763+08:00 task path: /opt/ansible/playbooks/awx.yml:10
2024-10-15T16:17:26.004808093+08:00 
2024-10-15T16:17:26.004821573+08:00 -------------------------------------------------------------------------------
2024-10-15T16:17:26.260766825+08:00 {"level":"error","ts":"2024-10-15T08:17:26Z","logger":"logging_event_handler","msg":"","name":"awxtest15","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"runner_on_failed","job":"7862226185777618437","EventData.Task":"Verify imagePullSecrets","EventData.TaskArgs":"","EventData.FailedTaskPath":"/opt/ansible/playbooks/awx.yml:10","error":"[playbook task failed]","stacktrace":"github.com/operator-framework/ansible-operator-plugins/internal/ansible/events.loggingEventHandler.Handle\n\tansible-operator-plugins/internal/ansible/events/log_events.go:111"}
2024-10-15T16:17:26.260864035+08:00 
2024-10-15T16:17:26.260898398+08:00 --------------------------- Ansible Task StdOut -------------------------------
2024-10-15T16:17:26.260902750+08:00 
2024-10-15T16:17:26.260907455+08:00  TASK [Verify imagePullSecrets] ******************************** 
2024-10-15T16:17:26.260919623+08:00 fatal: [localhost]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2024-10-15T16:17:26.260935467+08:00 

This error loops, and no further pods deploy.

To minimise possible confusion, I have been using the default values file with minimal changes (enabled: true, plus the name), which previously produced a working configuration:

AWX:
  enabled: true
  name: awxtest15
  postgres:
    dbName: Unset
    enabled: false
    host: Unset
    password: Unset
    port: 5678
    sslmode: prefer
    type: unmanaged
    username: admin
  spec:
    admin_user: admin

Further research suggests a Python issue.

The operator is being installed on an RKE node based on OL8.9.

Running the installer manually:

bash-4.4$ ansible-playbook run.yml -e @vars.yml -v
Using /etc/ansible/ansible.cfg as config file
[WARNING]: Found variable using reserved name: no_log

PLAY [localhost] ****************************************************************************************************************************************************************************

TASK [common : Get information about the cluster] *******************************************************************************************************************************************
ok: [localhost] => {"ansible_facts": {"api_groups": ["", "apiregistration.k8s.io", "apps", "events.k8s.io", "authentication.k8s.io", "authorization.k8s.io", "autoscaling", "batch", "certificates.k8s.io", "networking.k8s.io", "policy", "rbac.authorization.k8s.io", "storage.k8s.io", "admissionregistration.k8s.io", "apiextensions.k8s.io", "scheduling.k8s.io", "coordination.k8s.io", "node.k8s.io", "discovery.k8s.io", "flowcontrol.apiserver.k8s.io", "catalog.cattle.io", "crd.projectcalico.org", "helm.cattle.io", "k3s.cattle.io", "operator.tigera.io", "ui.cattle.io", "upgrade.cattle.io", "awx.ansible.com", "cns.vmware.com", "management.cattle.io", "metrics.k8s.io"]}, "changed": false}

TASK [common : Determine the cluster type] **************************************************************************************************************************************************
ok: [localhost] => {"ansible_facts": {"is_k8s": true, "is_openshift": false}, "changed": false}

TASK [common : debug] ***********************************************************************************************************************************************************************
ok: [localhost] => {
    "msg": "CLUSTER TYPE: is_openshift=False; is_k8s=True"
}

TASK [installer : Check for presence of old awx Deployment] *********************************************************************************************************************************
fatal: [localhost]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/libexec/platform-python"}, "changed": false, "msg": "Failed to import the required Python library (kubernetes) on awx-operator-controller-manager-6bd475f79-jh6nd's Python /usr/libexec/platform-python. Please read the module documentation and install it in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter"}

PLAY RECAP **********************************************************************************************************************************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

bash-4.4$ uname -a
Linux awx-operator-controller-manager-6bd475f79-jh6nd 5.15.0-105.125.6.2.2.el9uek.x86_64 #2 SMP Tue Sep 19 23:28:56 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux
bash-4.4$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.9 (Ootpa)
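
The failure above is Ansible's interpreter discovery settling on /usr/libexec/platform-python, which does not have the kubernetes library, while the image's other Python does. As a quick sanity check, a sketch like the following (interpreter paths taken from the logs; adjust for your image) shows which interpreter can actually import the library:

```shell
# Check which interpreters exist and whether each can import the kubernetes library.
# Paths are assumptions based on the error output above.
for py in /usr/libexec/platform-python /usr/bin/python3; do
  if [ -x "$py" ]; then
    if "$py" -c 'import kubernetes' >/dev/null 2>&1; then
      echo "$py: kubernetes importable"
    else
      echo "$py: kubernetes NOT importable"
    fi
  else
    echo "$py: not present"
  fi
done
```

If only one interpreter can import it, pointing ansible_python_interpreter at that interpreter is the usual workaround.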

If I set ansible_python_interpreter: '{{ ansible_playbook_python }}' in vars.yml, I get:

bash-4.4$ ansible-playbook run.yml -e @vars.yml -v
Using /etc/ansible/ansible.cfg as config file
[WARNING]: Found variable using reserved name: no_log

PLAY [localhost] ****************************************************************************************************************************************************************************

TASK [common : Get information about the cluster] *******************************************************************************************************************************************
ok: [localhost] => {"ansible_facts": {"api_groups": ["", "apiregistration.k8s.io", "apps", "events.k8s.io", "authentication.k8s.io", "authorization.k8s.io", "autoscaling", "batch", "certificates.k8s.io", "networking.k8s.io", "policy", "rbac.authorization.k8s.io", "storage.k8s.io", "admissionregistration.k8s.io", "apiextensions.k8s.io", "scheduling.k8s.io", "coordination.k8s.io", "node.k8s.io", "discovery.k8s.io", "flowcontrol.apiserver.k8s.io", "catalog.cattle.io", "crd.projectcalico.org", "helm.cattle.io", "k3s.cattle.io", "operator.tigera.io", "ui.cattle.io", "upgrade.cattle.io", "awx.ansible.com", "cns.vmware.com", "management.cattle.io", "metrics.k8s.io"]}, "changed": false}

TASK [common : Determine the cluster type] **************************************************************************************************************************************************
ok: [localhost] => {"ansible_facts": {"is_k8s": true, "is_openshift": false}, "changed": false}

TASK [common : debug] ***********************************************************************************************************************************************************************
ok: [localhost] => {
    "msg": "CLUSTER TYPE: is_openshift=False; is_k8s=True"
}

TASK [installer : Check for presence of old awx Deployment] *********************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "/bin/sh: line 1:  4158 Killed                  /usr/bin/python3 /opt/ansible/.ansible/tmp/ansible-tmp-1729214698.536406-4146-47837280803179/AnsiballZ_k8s_info.py\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 137}

PLAY RECAP **********************************************************************************************************************************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

bash-4.4$

This corresponds to the error generated when running via Helm.
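
Worth noting: rc=137 is the shell's encoding for a process killed by signal 9 (128 + 9 = SIGKILL), meaning something outside the module, such as the OOM killer or a security agent, terminated it. A minimal demonstration of the convention:

```shell
# A process killed by SIGKILL exits with status 128 + 9 = 137,
# matching the rc=137 Ansible reports above.
rc=0
sh -c 'kill -9 $$' || rc=$?
echo "exit code: $rc"   # prints: exit code: 137
```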

Any suggestions as to where to go from here?

I have exactly the same problem as of today.

AWX-Operator 2.19.1 is installed on RKE2; I started work today and saw:

[some_server]$ k get pods -n awx
NAME                                              READY   STATUS             RESTARTS      AGE
awx-operator-controller-manager-b8789c7f7-5sqwc   1/2     CrashLoopBackOff   4 (80s ago)   5m58s
--------------------------- Ansible Task StdOut -------------------------------

TASK [Verify imagePullSecrets] **************************************************
task path: /opt/ansible/playbooks/awx.yml:10

-------------------------------------------------------------------------------

and that is the last entry shown when following the log.

Anybody have any idea?

Running the playbook manually with more verbosity gives us a little more detail, but still nothing that suggests a cause or a solution:

bash-4.4$ ansible-playbook run.yml -e @vars.yml -v -v -v -v -v -v 
ansible-playbook [core 2.15.8]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/usr/share/ansible/openshift']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /opt/ansible/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible-playbook
  python version = 3.9.18 (main, Sep 22 2023, 17:58:34) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)] (/usr/bin/python3)
  jinja version = 3.1.3
  libyaml = True
Using /etc/ansible/ansible.cfg as config file

and more setup information, then it starts running things.

PLAY [localhost] ****************************************************************************************************************************************************************************

TASK [common : Get information about the cluster] *******************************************************************************************************************************************
task path: /opt/ansible/roles/common/tasks/main.yml:3
redirecting (type: lookup) ansible.builtin.k8s to kubernetes.core.k8s
ok: [localhost] => {
    "ansible_facts": {
        "api_groups": [
            "",
            "apiregistration.k8s.io",
            "apps",
            "events.k8s.io",
            "authentication.k8s.io",
            "authorization.k8s.io",
            "autoscaling",
            "batch",
            "certificates.k8s.io",
            "networking.k8s.io",
            "policy",
            "rbac.authorization.k8s.io",
            "storage.k8s.io",
            "admissionregistration.k8s.io",
            "apiextensions.k8s.io",
            "scheduling.k8s.io",
            "coordination.k8s.io",
            "node.k8s.io",
            "discovery.k8s.io",
            "flowcontrol.apiserver.k8s.io",
            "catalog.cattle.io",
            "crd.projectcalico.org",
            "helm.cattle.io",
            "k3s.cattle.io",
            "operator.tigera.io",
            "ui.cattle.io",
            "upgrade.cattle.io",
            "awx.ansible.com",
            "cns.vmware.com",
            "management.cattle.io",
            "metrics.k8s.io"
        ]
    },
    "changed": false
}

And then more tasks from the common role, then it hits the installer role, and things go pear-shaped:

TASK [installer : Check for presence of old awx Deployment] *********************************************************************************************************************************
task path: /opt/ansible/roles/installer/tasks/main.yml:2
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: ansible
<localhost> EXEC /bin/sh -c 'echo ~ansible && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /opt/ansible/.ansible/tmp `"&& mkdir "` echo /opt/ansible/.ansible/tmp/ansible-tmp-1729497208.756366-10930-68667324237398 `" && echo ansible-tmp-1729497208.756366-10930-68667324237398="` echo /opt/ansible/.ansible/tmp/ansible-tmp-1729497208.756366-10930-68667324237398 `" ) && sleep 0'
Including module_utils file ansible/__init__.py
Including module_utils file ansible/module_utils/__init__.py
Including module_utils file ansible/module_utils/basic.py
Including module_utils file ansible/module_utils/_text.py
Including module_utils file ansible/module_utils/common/_json_compat.py
Including module_utils file ansible/module_utils/common/__init__.py
Including module_utils file ansible/module_utils/common/_utils.py
Including module_utils file ansible/module_utils/common/arg_spec.py
Including module_utils file ansible/module_utils/common/file.py
Including module_utils file ansible/module_utils/common/locale.py
Including module_utils file ansible/module_utils/common/parameters.py
Including module_utils file ansible/module_utils/common/collections.py
Including module_utils file ansible/module_utils/common/process.py
Including module_utils file ansible/module_utils/common/sys_info.py
Including module_utils file ansible/module_utils/common/text/converters.py
Including module_utils file ansible/module_utils/common/text/__init__.py
Including module_utils file ansible/module_utils/common/text/formatters.py
Including module_utils file ansible/module_utils/common/validation.py
Including module_utils file ansible/module_utils/common/warnings.py
Including module_utils file ansible/module_utils/compat/selectors.py
Including module_utils file ansible/module_utils/compat/__init__.py
Including module_utils file ansible/module_utils/compat/_selectors2.py
Including module_utils file ansible/module_utils/compat/selinux.py
Including module_utils file ansible/module_utils/distro/__init__.py
Including module_utils file ansible/module_utils/distro/_distro.py
Including module_utils file ansible/module_utils/errors.py
Including module_utils file ansible/module_utils/parsing/convert_bool.py
Including module_utils file ansible/module_utils/parsing/__init__.py
Including module_utils file ansible/module_utils/pycompat24.py
Including module_utils file ansible/module_utils/six/__init__.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/ansiblemodule.py
Including module_utils file ansible_collections/__init__.py
Including module_utils file ansible_collections/kubernetes/__init__.py
Including module_utils file ansible_collections/kubernetes/core/__init__.py
Including module_utils file ansible_collections/kubernetes/core/plugins/__init__.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/__init__.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/args_common.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/k8s/client.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/client/discovery.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/client/__init__.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/client/resource.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/k8s/__init__.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/k8s/core.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/k8s/exceptions.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/k8s/service.py
Including module_utils file ansible/module_utils/common/dict_transformations.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/apply.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/common.py
Including module_utils file ansible/module_utils/urls.py
Including module_utils file ansible/module_utils/compat/typing.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/exceptions.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/hashes.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/k8s/waiter.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/k8sdynamicclient.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/version.py
Including module_utils file ansible_collections/kubernetes/core/plugins/module_utils/_version.py
Using module file /opt/ansible/.ansible/collections/ansible_collections/kubernetes/core/plugins/modules/k8s_info.py
<localhost> PUT /opt/ansible/.ansible/tmp/ansible-local-109200clhnr2p/tmptihkeltq TO /opt/ansible/.ansible/tmp/ansible-tmp-1729497208.756366-10930-68667324237398/AnsiballZ_k8s_info.py
<localhost> EXEC /bin/sh -c 'chmod u+x /opt/ansible/.ansible/tmp/ansible-tmp-1729497208.756366-10930-68667324237398/ /opt/ansible/.ansible/tmp/ansible-tmp-1729497208.756366-10930-68667324237398/AnsiballZ_k8s_info.py && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/bin/python3 /opt/ansible/.ansible/tmp/ansible-tmp-1729497208.756366-10930-68667324237398/AnsiballZ_k8s_info.py && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /opt/ansible/.ansible/tmp/ansible-tmp-1729497208.756366-10930-68667324237398/ > /dev/null 2>&1 && sleep 0'
fatal: [localhost]: FAILED! => {
    "changed": false,
    "invocation": {
        "api_version": "apps/v1",
        "kind": "Deployment",
        "module_args": {
            "api_version": "apps/v1",
            "kind": "Deployment",
            "name": "awx",
            "namespace": "awx"
        },
        "name": "awx",
        "namespace": "awx"
    },
    "module_stderr": "",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": -9
}

What is particularly odd is that both stderr and stdout are empty. The rc of -9 indicates the module process was killed by signal 9 (SIGKILL) before it could write anything.

Further, manually running that first module succeeds:

export PYTHONPATH=${PYTHONPATH}:/opt/ansible/.ansible/collections/
# Feed the module its arguments as JSON on stdin and run it directly:
echo '{ "ANSIBLE_MODULE_ARGS": {
  "api_version": "apps/v1",
  "kind": "Deployment",
  "name": "awx",
  "namespace": "awx"
} }' | python3 /opt/ansible/.ansible/collections/ansible_collections/kubernetes/core/plugins/modules/k8s_info.py

{"changed": false, "resources": [], "api_found": true, "invocation": {"module_args": {"api_version": "apps/v1", "kind": "Deployment", "name": "awx", "namespace": "awx", "wait": false, "wait_sleep": 5, "wait_timeout": 120, "label_selectors": [], "field_selectors": [], "kubeconfig": null, "context": null, "host": null, "api_key": null, "username": null, "password": null, "validate_certs": null, "ca_cert": null, "client_cert": null, "client_key": null, "proxy": null, "no_proxy": null, "proxy_headers": null, "persist_config": null, "impersonate_user": null, "impersonate_groups": null, "wait_condition": null, "hidden_fields": null}}}

Any suggestions? Anyone? Bueller?

Also logged on GitHub as issue #1978.

One minor correction: the host cluster is running OL9, not OL8.9.

As an experiment, I set up an OL8 cluster and re-ran everything, with the same result. I then tried falling back to the generic Red Hat kernel (instead of the UEK kernel), and still got the same result.

Eventually this was identified as being caused by the CrowdStrike Falcon agent. Kill the agent, and everything runs.
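
For anyone hitting the same wall, a quick check for the agent on a node might look like the sketch below. The Linux sensor typically runs as a process named falcon-sensor; that name is an assumption here, so verify it on your own hosts:

```shell
# Look for the CrowdStrike Falcon sensor process
# (process name "falcon-sensor" assumed; confirm on your hosts).
if pgrep -x falcon-sensor >/dev/null 2>&1; then
  echo "falcon-sensor is running"
else
  echo "falcon-sensor not found"
fi
```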