I previously had a working setup. While troubleshooting some issues, I accidentally deleted my namespace. (Yeah, big oops!) I have since sorted out my storage, and all pods now start and run except awx-web.
I am using awx-operator version 2.11.0, which runs AWX 23.7. I reviewed my logs and noted that they report that password authentication failed for user "awx". I presume this refers to the PostgreSQL DB. Can anyone confirm this?
I also validated that a secret is present and contains both a username and password. I noted that a DB data directory is present on both of my worker nodes in the cluster (gsil-kube02 and gsil-kube03). Is that correct?
Here is a snippet of my logs:
[root@gsil-kube01 ~]# kubectl logs awx-web-ffc587896-4d684 -n awx
...
Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/__init__.py", line 159, in manage
    if (connection.pg_version // 10000) < 12:
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/connection.py", line 15, in __getattr__
    return getattr(self._connections[self._alias], item)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/functional.py", line 57, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 436, in pg_version
    with self.temporary_connection():
  File "/usr/lib64/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 705, in temporary_connection
    with self.cursor() as cursor:
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 330, in cursor
    return self._cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 306, in _cursor
    self.ensure_connection()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 270, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 275, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/connection.py", line 728, in connect
    raise ex.with_traceback(None)
django.db.utils.OperationalError: connection failed: password authentication failed for user "awx"
2024-05-09 19:48:13,372 WARN exited: awx-cache-clear (exit status 1; not expected)
2024-05-09 19:48:13,372 WARN exited: awx-cache-clear (exit status 1; not expected)
2024-05-09 19:48:13,469 INFO gave up: awx-cache-clear entered FATAL state, too many start retries too quickly
2024-05-09 19:48:13,469 INFO gave up: awx-cache-clear entered FATAL state, too many start retries too quickly
2024-05-09 19:48:13,469 WARN exited: ws-heartbeat (exit status 1; not expected)
2024-05-09 19:48:13,469 WARN exited: ws-heartbeat (exit status 1; not expected)
2024-05-09 19:48:14,471 INFO gave up: ws-heartbeat entered FATAL state, too many start retries too quickly
2024-05-09 19:48:14,471 INFO gave up: ws-heartbeat entered FATAL state, too many start retries too quickly
Processing Event: ver:3.0 server:supervisor serial:0 pool:superwatcher poolserial:0 eventname:PROCESS_STATE_FATAL len:72
2024-05-09 19:48:14,471 WARN received SIGQUIT indicating exit request
2024-05-09 19:48:14,471 WARN received SIGQUIT indicating exit request
2024-05-09 19:48:14,471 INFO waiting for superwatcher, nginx, uwsgi, daphne to die
2024-05-09 19:48:14,471 INFO waiting for superwatcher, nginx, uwsgi, daphne to die
...brutally killing workers...
2024-05-09 19:48:14,541 INFO stopped: nginx (exit status 0)
2024-05-09 19:48:14,541 INFO stopped: nginx (exit status 0)
2024-05-09 19:48:14,680 WARNING [-] awx.conf.settings Database settings are not available, using defaults. error: connection failed: password authentication failed for user "awx"
2024-05-09 19:48:14,680 WARNING Database settings are not available, using defaults. error: connection failed: password authentication failed for user "awx"
...
Here is my setup:
[root@gsil-kube01 ~]# cat storage.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageClass.kubernetes.io/is-default-class: "true"
  name: local-storage
  namespace: awx
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate
#volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
  namespace: awx
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /var/lib/postgresql/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - gsil-kube01.idm.gsil.org
                - gsil-kube02.idm.gsil.org
                - gsil-kube03.idm.gsil.org
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-13-awx-postgres-13-0
  namespace: awx
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
[root@gsil-kube01 ~]# cat awxvalues.yaml
AWX:
  # enable use of awx-deploy template
  enabled: true
  name: awx
  spec:
    service_type: NodePort
    nodeport_port: 30080
    admin_user: admin
    hostname: awx.idm.gsil.org
    image: gsil-docker1.idm.gsil.org:5001/quay.io/ansible/awx
    image_version: 23.7.0
    init_container_image: gsil-docker1.idm.gsil.org:5001/quay.io/ansible/awx-ee
    init_container_image_version: latest
    ee_images:
      - name: AWX EE
        image: gsil-docker1.idm.gsil.org:5001/quay.io/ansible/awx-ee:latest
    postgres_image: gsil-docker1.idm.gsil.org:5001/postgres
    postgres_image_version: "13"
    control_plane_ee_image: gsil-docker1.idm.gsil.org:5001/quay.io/ansible/awx-ee:latest
    redis_image: gsil-docker1.idm.gsil.org:5001/redis
    redis_image_version: "7"
    ldap_cacert_secret: awx-custom-certs
    ldap_password_secret: awx-ldap-password
    bundle_cacert_secret: awx-custom-certs
    extra_settings:
      - <LDAP_STUFF_HERE>....
customVolumes:
  postgres:
    enabled: true
    hostPath: /var/lib/postgresql/data
    size: 2Gi
    storageClassName: local-storage
  projects:
    enabled: true
    hostPath: /opt/projects/data
    size: 5Gi
[root@gsil-kube01 ~]# kubectl get sc,pv,pvc -n awx
NAME                                        PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
storageclass.storage.k8s.io/local-storage   kubernetes.io/no-provisioner   Delete          Immediate           false                  70m

NAME                           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS    REASON   AGE
persistentvolume/postgres-pv   2Gi        RWX            Delete           Bound    awx/postgres-13-awx-postgres-13-0   local-storage            70m

NAME                                                  STATUS   VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS    AGE
persistentvolumeclaim/postgres-13-awx-postgres-13-0   Bound    postgres-pv   2Gi        RWX            local-storage   70m
[root@gsil-kube01 ~]# kubectl get secret -n awx
NAME                             TYPE                 DATA   AGE
awx-admin-password               Opaque               1      19m
awx-app-credentials              Opaque               3      18m
awx-broadcast-websocket          Opaque               1      19m
awx-custom-certs                 Opaque               1      24h
awx-ldap-password                Opaque               1      22m
awx-postgres-configuration       Opaque               6      18m
awx-receptor-ca                  kubernetes.io/tls    2      18m
awx-receptor-work-signing        Opaque               2      18m
awx-secret-key                   Opaque               1      19m
redhat-operators-pull-secret     Opaque               1      19m
sh.helm.release.v1.gsil-awx.v1   helm.sh/release.v1   1      19m
[root@gsil-kube01 ~]# kubectl edit secret -n awx awx-postgres-configuration
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  database: YXd4
  host: YXd4LXBvc3RncmVzLTEz
  password: NU9QdnBOWDJhcTR3SjFQQTJxYkRMUDVqaEpDN3dmcE4=
  port: NTQzMg==
  type: bWFuYWdlZA==
  username: YXd4
kind: Secret
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"v1","kind":"Secret","metadata":{"labels":{"app.kubernetes.io/component":"awx","app.kubernetes.io/managed-by":"awx-operator","app.kubernetes.io/operator-version":"2.11.0","app.kubernetes.io/part-of":"awx"},"name":"awx-postgres-configuration","namespace":"awx"},"stringData":{"database":"awx","host":"awx-postgres-13","password":"5OPvpNX2aq4wJ1PA2qbDLP5jhJC7wfpN","port":"5432","type":"managed","username":"awx"}}'
  creationTimestamp: "2024-05-10T11:43:41Z"
  labels:
    app.kubernetes.io/component: awx
    app.kubernetes.io/managed-by: awx-operator
    app.kubernetes.io/operator-version: 2.11.0
    app.kubernetes.io/part-of: awx
  name: awx-postgres-configuration
  namespace: awx
  ownerReferences:
  - apiVersion: awx.ansible.com/v1beta1
    kind: AWX
    name: awx
    uid: 79af9d9a-61de-42d5-8d38-6c2e39dd44d6
  resourceVersion: "17960391"
  uid: c584c1e6-9f9b-4e18-8673-c9b3f1877ce0
type: Opaque
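To double-check what the secret actually holds, the base64 values in it can be decoded. This is only a minimal sketch, assuming a `base64` binary that accepts `-d` (GNU coreutils or busybox); the `decode` helper name is my own, and the encoded strings are copied from the secret above:

```shell
#!/bin/sh
# Decode a single base64-encoded Secret value.
# (The helper name "decode" is illustrative, not part of kubectl or AWX.)
decode() {
    printf '%s' "$1" | base64 -d
}

# Encoded values copied from the awx-postgres-configuration secret.
db_user=$(decode 'YXd4')
db_name=$(decode 'YXd4')
db_host=$(decode 'YXd4LXBvc3RncmVzLTEz')
db_port=$(decode 'NTQzMg==')
db_type=$(decode 'bWFuYWdlZA==')

# Print the decoded connection settings on one line.
printf 'username=%s database=%s host=%s port=%s type=%s\n' \
    "$db_user" "$db_name" "$db_host" "$db_port" "$db_type"
```

The decoded values match the stringData in the last-applied-configuration annotation, so the secret looks internally consistent. What this does not tell me is whether the existing data directory on disk was initialized with the same password.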
[root@gsil-kube02 pgdata]# ls -lah
total 64K
drwx------. 19 systemd-coredump root 4.0K May 10 11:43 .
drwxr-xr-x. 3 root root 20 Mar 13 17:50 ..
drwx------. 6 systemd-coredump input 54 Mar 13 17:50 base
drwx------. 2 systemd-coredump input 4.0K May 10 11:44 global
drwx------. 2 systemd-coredump input 6 Mar 13 17:50 pg_commit_ts
drwx------. 2 systemd-coredump input 6 Mar 13 17:50 pg_dynshmem
-rw-------. 1 systemd-coredump input 4.8K Mar 13 17:50 pg_hba.conf
-rw-------. 1 systemd-coredump input 1.6K Mar 13 17:50 pg_ident.conf
drwx------. 4 systemd-coredump input 68 May 10 11:48 pg_logical
drwx------. 4 systemd-coredump input 36 Mar 13 17:50 pg_multixact
drwx------. 2 systemd-coredump input 6 Mar 13 17:50 pg_notify
drwx------. 2 systemd-coredump input 6 Mar 13 17:50 pg_replslot
drwx------. 2 systemd-coredump input 6 Mar 13 17:50 pg_serial
drwx------. 2 systemd-coredump input 6 Mar 13 17:50 pg_snapshots
drwx------. 2 systemd-coredump input 6 May 10 11:43 pg_stat
drwx------. 2 systemd-coredump input 84 May 10 11:58 pg_stat_tmp
drwx------. 2 systemd-coredump input 18 Mar 13 17:50 pg_subtrans
drwx------. 2 systemd-coredump input 6 Mar 13 17:50 pg_tblspc
drwx------. 2 systemd-coredump input 6 Mar 13 17:50 pg_twophase
-rw-------. 1 systemd-coredump input 3 Mar 13 17:50 PG_VERSION
drwx------. 3 systemd-coredump input 92 May 9 17:09 pg_wal
drwx------. 2 systemd-coredump input 18 Mar 13 17:50 pg_xact
-rw-------. 1 systemd-coredump input 88 Mar 13 17:50 postgresql.auto.conf
-rw-------. 1 systemd-coredump input 28K Mar 13 17:50 postgresql.conf
-rw-------. 1 systemd-coredump input 36 May 10 11:43 postmaster.opts
-rw-------. 1 systemd-coredump input 101 May 10 11:43 postmaster.pid
[root@gsil-kube02 pgdata]# pwd
/var/lib/postgresql/data/data/pgdata
[root@gsil-kube03 pgdata]# ls -lah
total 60K
drwx------. 19 systemd-coredump root 4.0K May 10 11:41 .
drwxr-xr-x. 3 root root 20 Feb 22 18:50 ..
drwx------. 6 systemd-coredump input 54 Mar 1 19:02 base
drwx------. 2 systemd-coredump input 4.0K May 9 19:47 global
drwx------. 2 systemd-coredump input 6 Mar 1 19:02 pg_commit_ts
drwx------. 2 systemd-coredump input 6 Mar 1 19:02 pg_dynshmem
-rw-------. 1 systemd-coredump input 4.8K Mar 1 19:02 pg_hba.conf
-rw-------. 1 systemd-coredump input 1.6K Mar 1 19:02 pg_ident.conf
drwx------. 4 systemd-coredump input 68 May 10 11:41 pg_logical
drwx------. 4 systemd-coredump input 36 Mar 1 19:02 pg_multixact
drwx------. 2 systemd-coredump input 6 May 9 19:46 pg_notify
drwx------. 2 systemd-coredump input 6 Mar 1 19:02 pg_replslot
drwx------. 2 systemd-coredump input 6 Mar 1 19:02 pg_serial
drwx------. 2 systemd-coredump input 6 Mar 1 19:02 pg_snapshots
drwx------. 2 systemd-coredump input 84 May 10 11:41 pg_stat
drwx------. 2 systemd-coredump input 6 May 10 11:41 pg_stat_tmp
drwx------. 2 systemd-coredump input 18 May 3 13:29 pg_subtrans
drwx------. 2 systemd-coredump input 6 Mar 1 19:02 pg_tblspc
drwx------. 2 systemd-coredump input 6 Mar 1 19:02 pg_twophase
-rw-------. 1 systemd-coredump input 3 Mar 1 19:02 PG_VERSION
drwx------. 3 systemd-coredump input 92 May 6 10:35 pg_wal
drwx------. 2 systemd-coredump input 18 Mar 1 19:02 pg_xact
-rw-------. 1 systemd-coredump input 88 Mar 1 19:02 postgresql.auto.conf
-rw-------. 1 systemd-coredump input 28K Mar 1 19:02 postgresql.conf
-rw-------. 1 systemd-coredump input 36 May 9 19:46 postmaster.opts
[root@gsil-kube03 pgdata]# pwd
/var/lib/postgresql/data/data/pgdata
What suggestions can anyone make to troubleshoot this issue? Thanks!