I currently have AWX 23.8.1 running with an external Postgres (Crunchy operator) based on the v14.10 image, and I would like to migrate/clone it to a new instance using another external Postgres (CNPG operator). Since I want to test this, I need to keep the existing AWX + Crunchy (NS = awx) running while I test the new AWX + CNPG (NS = awx2)…
I have tried to bootstrap the CNPG cluster from the live DB (the import is based on pg_dump and pg_restore), but it has failed so far… so now I’m looking at the migration guide to achieve the same thing (I also considered awxbackup/awxrestore, but that feels even more complicated).
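For reference, the CNPG bootstrap attempt looked roughly like this (a sketch; cluster, secret and source host names are placeholders):

```yaml
# sketch of a CNPG import-based bootstrap; names are placeholders
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: awx2-postgres
  namespace: awx2
spec:
  instances: 1
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      import:
        type: microservice          # imports a single database via pg_dump/pg_restore
        databases:
          - awx
        source:
          externalCluster: crunchy-awx
  externalClusters:
    - name: crunchy-awx
      connectionParameters:
        host: pg-cluster-primary.postgres.svc
        user: awx
        dbname: awx
      password:
        name: crunchy-awx-credentials   # placeholder secret holding the source password
        key: password
```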
My questions:
First, is my idea possible?
I see we can specify secret_key and old_postgres_conf in the new AWX deployment, but what if they live in a different namespace?
Is it OK to go from Crunchy 14.0 to CNPG 16.3 from the AWX standpoint?
Does the migration include all the secrets for the mesh and execution nodes, so that the new AWX (CNPG) will reach out to the existing instances?
I apologize for all the questions, but despite all my attempts (from both the Postgres and AWX sides) I haven’t been able to do what I’m trying to do, and it’s becoming frustrating.
Using the migration, can I clone an existing deployment (based on an external Crunchy Postgres) into a new one based on an external CNPG?
I would use the Crunchy details for old_postgres_conf and the CNPG ones for postgres_config, together with the old secret_key and broadcast_key, and it would stand up a new instance with the same secrets?
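In other words, something like this (the secret names are made up, but the spec keys are the operator’s actual parameters):

```yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx2
spec:
  # new external CNPG database
  postgres_configuration_secret: awx-cnpg-postgres-configuration
  # old external Crunchy database to migrate from
  old_postgres_configuration_secret: awx-crunchy-postgres-configuration
  # reused from the running instance
  secret_key_secret: awx-secret-key
  broadcast_websocket_secret: awx-broadcast-websocket
```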
I tried cloning the DB and creating a new instance with the existing secrets, but the deployment keeps failing after multiple DB migration attempts.
We have a complex topology of execution nodes, and I was wondering whether they would stay locked to the old/running instance if I stand up a new clone?
I was hoping to migrate them manually from one instance to the other once the clone is confirmed working…
IIRC, the migration does a pg_dump | pg_restore from one active database connection to the other. Going from one external Postgres to another should be fine. Cloning the DB can be problematic if there are technical differences between the Postgres server reading the DB and the one that created it.
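Conceptually it boils down to something like this, though I’m not claiming it’s the operator’s exact invocation (hosts, user and database names are placeholders):

```bash
# dump from the old server and restore into the new one in a single pipe;
# put an entry for each host in ~/.pgpass (or set PGPASSWORD) for auth
pg_dump -Fc -h pg-cluster-primary.postgres.svc -U awx awx \
  | pg_restore -h awx2-postgres-rw.awx2.svc -U awx -d awx --no-owner
```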
Your execution nodes will stay with the old instance. In your new instance, you may not need to “create” your nodes, but you will need to download the install bundles again and reprovision the nodes to connect to the new instance.
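Roughly, for each node (assuming the install bundle still ships an inventory and a receptor playbook, as recent AWX versions do):

```bash
# download the node's install bundle from the new instance (UI or API), then:
tar -xzf node1_install_bundle.tar.gz
cd node1_install_bundle
ansible-playbook -i inventory.yml install_receptor.yml
```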
Could you confirm that with the migration option both instances can run in parallel (in different NS)?
One more (last?) question: the old secret and old-postgres-config live in another namespace (awx) and I want to deploy into a new namespace (awx2). When I create the AWX object for the migration and apply it in the awx2 NS (where awx-operator runs as well, because I read it’s namespace-scoped), will it pick up the secrets (key, broadcast and old-postgres) from the old awx NS?
No, I don’t think it works across namespaces. What you can do is copy the secrets from the old namespace to the new one (the same thing you would do if you were migrating from one cluster to another).
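One way to copy a secret across namespaces (the secret name is an example; repeat for the secret key, broadcast and old-postgres-configuration secrets):

```bash
# strip the namespace-bound metadata, then re-apply in the new namespace
kubectl get secret awx-secret-key -n awx -o json \
  | jq 'del(.metadata.namespace, .metadata.resourceVersion,
            .metadata.uid, .metadata.creationTimestamp,
            .metadata.ownerReferences)' \
  | kubectl apply -n awx2 -f -
```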
I think that should be fine. AWX recently moved from the docker postgres 13 image to the sclorg postgresql 15 image, and that upgrade process uses the same pg_dump | pg_restore method.
I have an issue with my deployment and the operator is giving me the following error:
The task includes an option with an undefined variable.
The error was: list object has no element 0. list object has no element 0
The error appears to be in '/opt/ansible/roles/installer/tasks/migrate_data.yml': line 31, column 3, but may be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be: - name: Set the resource pod name as a variable.
Sounds like it can’t find the old Postgres pod (unmanaged), but it’s the same config as the running prod AWX (pg-cluster-primary.postgres.svc)…?
I have checked the old-postgres-configuration secret and it’s identical to the one in prod (which I copied); same for all the predefined secrets, which were copied from the running instance.
Here’s my AWX object; all the secrets are created as part of the manifest before the AWX resource itself (copied from the other namespace):
Now that I’m looking at it, the only thing I can think of is the operator version: I run v2.12.1 (AWX 23.8.1) in prod, while the new one (for the migration) is running the latest operator, v2.17.0.
Could this be the issue?
Edit: I just tried with operator v2.12.1 and I get the same error as before.
Apparently, the migration assumes that the old configuration secret refers to a managed (in-cluster) instance and tries to look up its database pod, which is why the lookup returns an empty list, even though specifying only a current config secret implies an unmanaged instance. I apologize, I didn’t think it worked like that.
What I think you can do instead is migrate the data yourself directly from Crunchy to CNPG, and then spin up a new awx-cnpg instance with only the secrets it needs for the CNPG database (i.e. leave out old_postgres_configuration_secret). I have a detailed post about migrating data between two pods after a botched upgrade. You’ll have to fiddle with the target hosts, usernames, and passwords, but I think it should work; I had the luxury of being able to use the environment variables and reuse the same credentials between hosts.
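The gist of it, adapted to your setup (pod, namespace and credential details are placeholders you’ll need to adjust):

```bash
# dump from the Crunchy primary pod to a local file;
# auth details depend on your setup (PGPASSWORD, .pgpass, or peer auth inside the pod)
kubectl exec -n postgres pg-cluster-primary-0 -- \
  pg_dump -Fc -U awx awx > awx.dump

# restore into the CNPG primary (the cluster's "-rw" service points at it)
kubectl exec -i -n awx2 awx2-postgres-1 -- \
  pg_restore -U awx -d awx --no-owner < awx.dump
```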
Sorry for the late reply but it took me time to get back to this.
So what I did was build a playbook to back up/restore data from/to an external Postgres (using pg_dump and pg_restore), as well as all the secrets, configmaps and the AWX CRD: heavily inspired by the existing roles, but it works with an external Postgres this time.
I can then restore the DB automatically and manually recreate my AWX CRD using the secrets and config backups to redeploy AWX as a clone! (I ended up reusing only the secret-key and admin-password for my new instance.)
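For anyone curious, the heart of the backup side is just a couple of tasks along these lines (a trimmed sketch; the connection variables and secret names are placeholders):

```yaml
# trimmed sketch of the backup tasks; connection variables are placeholders
- name: Dump the AWX database from the external Postgres
  ansible.builtin.command: >-
    pg_dump -Fc -h {{ pg_host }} -U {{ pg_user }} -d {{ pg_database }}
    -f /backups/awx.dump
  environment:
    PGPASSWORD: "{{ pg_password }}"

- name: Back up the AWX secrets from the namespace
  kubernetes.core.k8s_info:
    kind: Secret
    namespace: awx
    name: "{{ item }}"
  loop:
    - awx-secret-key
    - awx-admin-password
  register: awx_secret_backups
```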