AWX Operator 2.5.1 New Deploy Never Completes (django.db.utils.OperationalError: connection is bad)

I have a fresh install of AWX (22.7.0), but I end up with the same failure as I did with 22.6:

TASK [Check if there are any super users defined.] ********************************
fatal: [localhost]: FAILED! => {"changed": true, "rc": 1, "return_code": 1, "stderr": "...", "stderr_lines": [...], "stdout": "", "stdout_lines": []}

stderr (stderr_lines repeats the same text):
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 270, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 275, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/connection.py", line 728, in connect
    raise ex.with_traceback(None)
psycopg.OperationalError: connection is bad: Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/__init__.py", line 184, in manage
    if (connection.pg_version // 10000) < 12:
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/connection.py", line 15, in __getattr__
    return getattr(self._connections[self._alias], item)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/functional.py", line 57, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 436, in pg_version
    with self.temporary_connection():
  File "/usr/lib64/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 705, in temporary_connection
    with self.cursor() as cursor:
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 330, in cursor
    return self._cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 306, in _cursor
    self.ensure_connection()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 270, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 275, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/connection.py", line 728, in connect
    raise ex.with_traceback(None)
django.db.utils.OperationalError: connection is bad: Name or service not known

Operator log event for the same failure:
{"level":"error","ts":"2023-08-16T04:24:15Z","logger":"logging_event_handler","msg":"","name":"awx","namespace":"awx-dev","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"runner_on_failed","job":"4754706633111121296","EventData.Task":"Check if there are any super users defined.","EventData.TaskArgs":"","EventData.FailedTaskPath":"/opt/ansible/roles/installer/tasks/initialize_django.yml:2","error":"[playbook task failed]","stacktrace":"github.com/operator-framework/operator-sdk/internal/ansible/events.loggingEventHandler.Handle\n\t/workspace/internal/ansible/events/log_events.go:111"}
…ignoring

Any suggestions?
Everything seems to be in the Running state.

Nick

I can see the following output repeating in both the task and web pods:

[wait-for-migrations] Waiting for database migrations…
[wait-for-migrations] Attempt 1 of 30

Nick

I cannot run any awx-manage command from either the task or the web pod.

I always get "django.db.utils.OperationalError: connection is bad: Name or service not known".
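
"Name or service not known" is the resolver's message for a failed hostname lookup, so the error occurs before any connection to Postgres is attempted. Below is a minimal sketch for checking name resolution from inside the task or web container. It assumes Python is available there (the traceback shows 3.9); `check_resolution` is a hypothetical helper name, not part of AWX.

```python
import socket


def check_resolution(host, port=5432):
    """Return (True, [addresses]) if host resolves, else (False, error text)."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # Same class of failure that psycopg surfaces as
        # "Name or service not known".
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the resolved address.
    return True, sorted({info[4][0] for info in infos})
```

Calling `check_resolution("awx-postgres-13")` with the database host from the Postgres secret should return `True` and a cluster IP on a healthy cluster. A `False` result points at cluster DNS or pod networking rather than at Postgres itself.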

When I changed the no_log setting so that task output is logged, it showed both tasks failing:
TASK [Check if there are any super users defined.] ********************************
fatal: [localhost]: FAILED! => {“changed”: true, “rc”: 1,
“django.db.utils.OperationalError: connection is bad: Name or service not known”],

TASK [Create super user via Django if it doesn’t exist.] ********************************
fatal: [localhost]: FAILED! => {“changed”: true, “failed_when_result”:
raise ex.with_traceback(None)", “django.db.utils.OperationalError: connection is bad: Name or service not known”], “stdout”: “”, “stdout_lines”: }

Nick

Hello,
If you are using an external database, could you please provide your Postgres configuration secret and your AWX spec? Please be sure to redact any sensitive information. This will help us assist you.
-AWX Team

I am not using an external database.

Here is my Postgres secret:

secretGenerator:
  - name: awx-postgres-configuration
    type: Opaque
    literals:
      - host=awx-postgres-13
      - port=5432
      - database=awx
      - username=awx
      - password=xxxxxxxxxxxxxxxxxxxxxxxxxxx
      - type=managed
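
The `host` value in this secret is a short Service name, so it resolves only through the cluster DNS search path of the pod's namespace. Below is a small sketch that parses the literals and builds the fully qualified Service name, which can be useful for testing resolution explicitly. The namespace `awx-dev` comes from the kubectl output in this thread. The cluster domain `cluster.local` is the Kubernetes default and is an assumption about this cluster. Both function names are hypothetical.

```python
def parse_literals(literals):
    """Turn kustomize-style 'key=value' literals into a dict."""
    return dict(item.split("=", 1) for item in literals)


def service_fqdn(host, namespace, cluster_domain="cluster.local"):
    """Build <service>.<namespace>.svc.<cluster-domain> from a short Service name."""
    if "." in host:
        # Already qualified (or an external host); leave it unchanged.
        return host
    return f"{host}.{namespace}.svc.{cluster_domain}"


literals = ["host=awx-postgres-13", "port=5432", "database=awx", "username=awx", "type=managed"]
config = parse_literals(literals)
print(service_fqdn(config["host"], namespace="awx-dev"))
# prints: awx-postgres-13.awx-dev.svc.cluster.local
```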

What is the health of your Postgres pod? Has it been restarting?

Can you please post the output of "kubectl get pods"?

AWX Team

kubectl get pods -n awx-dev
NAME                                               READY   STATUS    RESTARTS         AGE
awx-operator-controller-manager-564f8dc4fc-69rk2   2/2     Running   0                8d
awx-postgres-13-0                                  1/1     Running   0                8d
awx-task-6948b766fb-bnk8m                          4/4     Running   1144 (15m ago)   8d
awx-web-7c59966546-dgw7d                           3/3     Running   1144 (15m ago)   8d

I get the same error with a basic install using AWX Operator 2.5.3.

From the log:

kubectl -n awx logs deploy/ansible-awx-task -c ansible-awx-task
[wait-for-migrations] Waiting for database migrations…
[wait-for-migrations] Attempt 1 of 30
[wait-for-migrations] Waiting 0.5 seconds before next attempt
[wait-for-migrations] Attempt 2 of 30

I got it resolved myself: the problem was my pod-network-cidr setting.

Nickel
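
The thread does not say what exactly was wrong with the pod-network-cidr value. One common misconfiguration (an assumption here, not confirmed by the poster) is a pod CIDR that overlaps the node network or the service CIDR. That can break routing to the cluster DNS service and lead to lookup failures like the one above. Below is a minimal sketch using Python's standard `ipaddress` module. All CIDR values are hypothetical examples, and `find_overlaps` is an illustrative helper.

```python
import ipaddress
from itertools import combinations


def find_overlaps(named_cidrs):
    """Return (name_a, name_b) pairs whose networks overlap (same IP family only)."""
    networks = {
        name: ipaddress.ip_network(cidr, strict=False)
        for name, cidr in named_cidrs.items()
    }
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical values: substitute the ranges from your own cluster.
example = {
    "node network": "192.168.1.0/24",
    "pod-network-cidr": "192.168.0.0/16",
    "service-cidr": "10.96.0.0/12",
}
print(find_overlaps(example))
# prints: [('node network', 'pod-network-cidr')]
```

An empty list means that none of the ranges collide. Any pair that is reported is worth checking against the value passed to the cluster's pod-network-cidr option.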