AWX upgrade stuck

motorbass · July 5, 2024, 9:20am

Hi
As planned in my previous post, I attempted to upgrade my AWX setup using Helm: awx-operator from 2.9.0 to 2.19.0, and therefore AWX from 23.5.1 to 24.6.0.

Note: I’m a K8s beginner, so my question may well be a dumb one.

1st attempt
I upgraded using helm upgrade my-awx-operator awx-operator/awx-operator
I hit exactly this issue, where PostgreSQL 15 cannot create its data directory, but I wasn’t able to fix it myself. So I decided to do a Helm rollback first, then upgrade to a version lower than 2.19.0.

2nd attempt
I rolled back to 2.9.0 using helm rollback my-awx-operator, since 2.9.0 was the previous release.
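Helm records every upgrade and rollback as a numbered revision, so it can help to look at the history before rolling back. A minimal sketch, assuming the release lives in the ppr-awx namespace that appears later in this thread; the revision number below is a placeholder to replace with the one helm history shows for chart 2.9.0:

```shell
# Show each recorded revision of the release and the chart version it used
helm history my-awx-operator --namespace ppr-awx

# Placeholder: set this to the revision that ran chart 2.9.0
REVISION=1

# Roll back to that specific revision (without a number, helm uses the previous one)
helm rollback my-awx-operator "$REVISION" --namespace ppr-awx
```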

My current issue is that the awx-xxx-web pod fails; it seems it doesn’t know how to connect to the PostgreSQL database:

...brutally killing workers...
2024-07-05 09:12:35,115 INFO stopped: nginx (exit status 0)
2024-07-05 09:12:35,115 INFO stopped: nginx (exit status 0)
2024-07-05 09:12:35,119 WARNING  [-] awx.conf.settings Database settings are not available, using defaults. error: connection is bad: Name or service not known
2024-07-05 09:12:35,119 WARNING  Database settings are not available, using defaults. error: connection is bad: Name or service not known
2024-07-05 09:12:35,107 INFO     [-] daphne.server Killed 0 pending application instances
2024-07-05 09:12:35,107 INFO     Killed 0 pending application instances
2024-07-05 09:12:35,638 INFO stopped: daphne (exit status 0)
2024-07-05 09:12:35,638 INFO stopped: daphne (exit status 0)
worker 1 buried after 1 seconds
worker 2 buried after 1 seconds
worker 3 buried after 1 seconds
worker 4 buried after 1 seconds
worker 5 buried after 1 seconds
binary reloading uWSGI...

At the moment, the original postgres-13-0 pod is still up with its data (I’m able to connect to it using DBeaver).
The Kubernetes secret (postgres-configuration) is also still present, with the correct credentials.

It seems the web pod doesn’t know how to get this information, but I’m a bit stuck given my limited K8s knowledge…

Is there any advice, or anything I should look at, to make sure my web pod is configured correctly?

Best regards

Gael

kurokobo · July 5, 2024, 12:53pm

Have the hostnames included in postgres-configuration been rolled back to point to PSQL 13? Please check if they have been reverted to 13 after being changed to 15 during the upgrade.
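A minimal sketch for checking this, assuming the namespace ppr-awx and an AWX instance named awx-infra (inferred from the awx-infra-postgres-* hostnames), in which case the operator’s secret would be awx-infra-postgres-configuration. It decodes the stored host and lists the Services in the namespace; a host with no matching Service is consistent with the "Name or service not known" error in the log above:

```shell
# Assumed names: namespace ppr-awx, secret awx-infra-postgres-configuration
# Secret values are base64-encoded, so decode the "host" key before reading it
kubectl get secret awx-infra-postgres-configuration --namespace ppr-awx \
  -o jsonpath='{.data.host}' | base64 -d
echo

# List the Services that actually exist, to compare against the decoded host
kubectl get svc --namespace ppr-awx
```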

motorbass · July 5, 2024, 1:09pm

Hi
The hostname was awx-infra-postgres-15 (base64-encoded), so I just updated this value to awx-infra-postgres-13 (base64-encoded).
I also manually changed the annotations from awx-infra-postgres-15 to awx-infra-postgres-13.
Finally, it came back online and is working! Huge thanks!
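The same secret fix as a sketch, under assumed names (namespace ppr-awx, secret awx-infra-postgres-configuration). Secret values are stored base64-encoded, so the new host is encoded first and written back with a merge patch that only touches the host key. The annotation change mentioned above is not covered here:

```shell
# Assumed names: namespace ppr-awx, secret awx-infra-postgres-configuration
# printf (rather than echo) avoids encoding a trailing newline into the value
NEW_HOST=$(printf '%s' 'awx-infra-postgres-13' | base64)

# Replace only the "host" key; the credential keys are left untouched
kubectl patch secret awx-infra-postgres-configuration --namespace ppr-awx \
  --type merge -p "{\"data\":{\"host\":\"${NEW_HOST}\"}}"
```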

One more question: I saw on GitHub that there are issues/PRs about the upgrade from PostgreSQL 13 to 15. It looks like UID 26 needs write permission on the PV, right?

In my case, should I run the upgrade again and then temporarily mount the PV to change those permissions, or is there a workaround?

kurokobo · July 5, 2024, 1:53pm

Correct.

Add postgres_data_volume_init: true to your AWX’s spec. This will fix the permissions on the directory in the PV automatically.
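If the AWX resource was created with kubectl rather than through the Helm chart’s AWX template, one way to add the field is a merge patch on the custom resource. A sketch, assuming the resource is named awx-infra in namespace ppr-awx; if Helm does manage the resource, a later helm upgrade may overwrite a manual patch, so in that case the value belongs in the chart values instead:

```shell
# Assumed names: AWX resource awx-infra in namespace ppr-awx
kubectl patch awx awx-infra --namespace ppr-awx \
  --type merge -p '{"spec":{"postgres_data_volume_init":true}}'

# Confirm the field is now set in the resource's spec (prints "true")
kubectl get awx awx-infra --namespace ppr-awx \
  -o jsonpath='{.spec.postgres_data_volume_init}'
echo
```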

motorbass · July 5, 2024, 1:56pm

OK, I’ll try to find where I can add that in the Helm chart and let you know how it goes. Fingers crossed!

motorbass · July 5, 2024, 3:11pm

I just gave it a try and got the same error.

I used to install and upgrade AWX without custom values, using this command (the attempt that failed):

 helm upgrade my-awx-operator awx-operator/awx-operator --namespace ppr-awx --version 2.19.1

I simply added postgres_data_volume_init: true to AWX.spec in the following values.yaml:

AWX:

  # enable use of awx-deploy template
  enabled: true
  name: awx
  spec:
    admin_user: admin
    postgres_data_volume_init: true

  # configurations for external postgres instance
  postgres:
    enabled: false
    host: Unset
    port: 5678
    dbName: Unset
    username: admin
    # for secret management, pass in the password independently of this file
    # at the command line, use --set AWX.postgres.password
    password: Unset
    sslmode: prefer
    type: unmanaged

Then I upgraded using:

helm upgrade my-awx-operator awx-operator/awx-operator --namespace ppr-awx --version 2.19.1 -f .\values.yaml

I guess I set it in the wrong place?
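One way to see where the value actually ended up is to compare what Helm applied with the AWX resources present in the cluster. A sketch, assuming namespace ppr-awx. With AWX.enabled: true the chart renders an AWX resource named after AWX.name (awx in the values above), so it may be worth checking whether that is the same resource as the existing instance (apparently awx-infra, judging by the hostnames):

```shell
# Show the user-supplied values Helm stored for the release
helm get values my-awx-operator --namespace ppr-awx

# List every AWX resource with its postgres_data_volume_init value (empty if unset)
kubectl get awx --namespace ppr-awx \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.postgres_data_volume_init}{"\n"}{end}'
```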

EDIT:
I also tried adding the following, as seen here:

postgres_init_container_commands: |
  chown 26:0 /var/lib/pgsql/data
  chmod 700 /var/lib/pgsql/data

But I got the same error: mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied

motorbass · July 9, 2024, 2:04pm

@kurokobo since your advice didn’t work in my case, I finally found a workaround, as mentioned on GitHub here and here:

Create a temporary pod
Mount the existing postgres-15 PV in that pod
Change the directory permissions:

chown 26:0 data/
chmod 700 data/

Delete the temporary pod, then delete the failing postgres-15 pod
The pod is recreated automatically and then starts performing the migration. I had some errors in the logs that needed to be fixed with kubectl apply --server-side -k "github.com/ansible/awx-operator/config/crd?ref=2.19.1", as mentioned in the same issue (1907) from the beginning of this post.
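The temporary-pod workaround above can be sketched as follows. Assumptions: namespace ppr-awx, a PVC named postgres-15-awx-infra-postgres-15-0 (confirm with kubectl get pvc), a busybox helper image, a hypothetical pod name pv-fix, and that the post’s data/ refers to the volume’s mount point. With a ReadWriteOnce volume, the helper pod may need to be scheduled on the same node as the failing PostgreSQL pod:

```shell
# Assumed names: namespace ppr-awx, PVC postgres-15-awx-infra-postgres-15-0
kubectl apply --namespace ppr-awx -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pv-fix              # hypothetical name for the temporary pod
spec:
  restartPolicy: Never
  containers:
    - name: shell
      image: busybox
      command: ["sleep", "3600"]
      securityContext:
        runAsUser: 0        # root is needed to change ownership
      volumeMounts:
        - name: pgdata
          mountPath: /data  # the PV root appears as /data
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: postgres-15-awx-infra-postgres-15-0
EOF

kubectl wait --namespace ppr-awx --for=condition=Ready pod/pv-fix --timeout=120s

# Same commands as in the post: owner UID 26, group 0, mode 700
kubectl exec --namespace ppr-awx pv-fix -- sh -c 'cd / && chown 26:0 data/ && chmod 700 data/'

# Remove the helper, then the failing pod so its StatefulSet recreates it
kubectl delete pod pv-fix --namespace ppr-awx
kubectl delete pod awx-infra-postgres-15-0 --namespace ppr-awx
```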

Conclusion: everything went very fast, and the DB migration from PostgreSQL 13 to PostgreSQL 15 seems to have run smoothly.
I just need to delete the PV/PVC for the old PostgreSQL 13 instance and I’m 100% done.
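Cleaning up the old volume could look like the following sketch, assuming namespace ppr-awx and a PVC named postgres-13-awx-infra-postgres-13-0 (check the real name first). Kubernetes keeps a claim in Terminating until no pod uses it, and whether the bound PV is deleted along with it depends on its reclaim policy:

```shell
# List claims to confirm the name of the old PostgreSQL 13 one
kubectl get pvc --namespace ppr-awx

# Delete the old claim (assumed name)
kubectl delete pvc postgres-13-awx-infra-postgres-13-0 --namespace ppr-awx

# A PV with reclaim policy Retain stays behind as Released; delete it by name if so
kubectl get pv
```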

Thanks again for your help! (Here, but also every time I read your answers on GitHub.)

kurokobo · July 10, 2024, 3:00pm

@motorbass
Sorry for the delayed response.

If postgres_data_volume_init is set to true, then the chmod and chown commands you manually executed should have been performed automatically, so I’m not sure why that didn’t resolve the issue.

However, even if you add parameters and perform a helm upgrade, it might take some time for the AWX Operator to actually start a new reconciliation loop using those new parameters.
Also, depending on the failed tasks, there might be cases where condition checks are not correctly performed in the next loop, and that could be the reason why the parameters were not properly applied.
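To see whether a new reconciliation loop has started and which tasks fail, the operator logs can be followed. A sketch, assuming the default deployment and container names of an awx-operator install (awx-operator-controller-manager and awx-manager) in namespace ppr-awx:

```shell
# Follow the operator's reconciliation output (names assumed from a default install)
kubectl logs -f deployment/awx-operator-controller-manager \
  -c awx-manager --namespace ppr-awx
```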

In any case, I’m glad it got resolved! Thank you for updating this topic!

motorbass · July 10, 2024, 3:22pm

Yeah, I really do think it should have worked, since according to many GitHub issues it resolves all the related problems.
Anyway, I managed to perform the migration, and even though it wasn’t 100% automated, the result is here!
Thanks again for your precious help and advice!

system · August 9, 2024, 3:23pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
