Fresh install of AWX using AWX-Operator v0.19.0 on EKS with an external RDS instance

I have an old AWX v6.1.0.0 deployment running on an EKS cluster, backed by a PostgreSQL 11 RDS instance, that I’m attempting to migrate to AWX-Operator v0.19.0 running AWX v20.0.1 with a PostgreSQL 12 RDS instance. This is basically a fresh install as far as AWX-Operator is concerned, but with a database migration.

I’ve set up my awx.yaml like this:


---
apiVersion: v1
kind: Secret
metadata:
  name: <Resource Name>-secret-key
  namespace: <Namespace>
stringData:
  secret_key: <old secret>
type: Opaque

---
apiVersion: v1
kind: Secret
metadata:
  name: <Resource Name>-old-postgres-configuration
  namespace: <Namespace>
stringData:
  host: old-rds-identifer.region.rds.amazonaws.com
  port: "5432"
  database: awx
  username: awx
  password: <old password>
  sslmode: prefer
  type: unmanaged
type: Opaque

---
apiVersion: v1
kind: Secret
metadata:
  name: <Resource Name>-postgres-configuration
  namespace: <Namespace>
stringData:
  host: new-rds-identifier.region.rds.amazonaws.com
  port: "5432"
  database: awx
  username: awx
  password: <new password>
  sslmode: prefer
  type: unmanaged
type: Opaque

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: <Resource Name>
  namespace: <Namespace>
spec:
  service_type: ClusterIP
  ingress_type: ingress
  hostname: awx.example.com
  old_postgres_configuration_secret: <Resource Name>-old-postgres-configuration
  postgres_configuration_secret: <Resource Name>-postgres-configuration
  secret_key_secret: <Resource Name>-secret-key

If I run kubectl apply -f awx.yaml and then watch the awx-manager container logs in the awx-operator-controller-manager-* pod, I can still see it trying to use the <resource name>-postgres-0 pod for the database. It just keeps emitting Cache miss log entries for that pod, which doesn’t exist.
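For reference, this is roughly how I’m watching it (the deployment and container names below are from my cluster and are placeholders here):

# Follow the operator's reconcile loop (container name is awx-manager)
kubectl -n <Namespace> logs -f deployment/awx-operator-controller-manager -c awx-manager

# Check whether a managed postgres StatefulSet/pod was ever created
kubectl -n <Namespace> get statefulset,pod | grep postgres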

Has anyone else managed to 1) get AWX-Operator 0.19.0 to use RDS without the integrated PostgreSQL pod, and 2) successfully migrate from an older database instance? Am I missing something or doing something wrong here?

Not sure if I’m 100% understanding, but here’s my take.

Moving data from an old RDS to a new RDS is not something the operator will manage for you.

Attempting to upgrade from v6.1.0.0 → v20.0.1 is a big leap. I’d do incremental upgrades until you max out on what PostgreSQL 11 will handle. Then do a pg_dump and pg_restore onto your new RDS 12 instance (roughly as sketched below). Then switch to using that RDS and continue upgrading until you get to your desired version.
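The dump/restore step would look roughly like this (endpoints are placeholders, and the awx database must already exist on the target instance):

# Dump from the old PostgreSQL 11 instance in custom format
pg_dump -Fc -h <old-rds-endpoint> -U awx -d awx -f awx.dump

# Restore into the new PostgreSQL 12 instance
pg_restore -h <new-rds-endpoint> -U awx -d awx --no-owner --clean --if-exists awx.dump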

Hi,

I have got it working. First off, it might be worth checking that the -postgres-configuration secret in the cluster has actually been updated to reflect the change from managed to unmanaged - it sounds like it may have persisted the original deployment’s DB config. I manually deleted that secret before I deployed my unmanaged config and it worked (ish). Outside of that I had some other issues that ended up being related to SSL (debugged by trying different values for sslmode).
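In case it helps, that check/delete looked roughly like this for me (namespace and secret names are placeholders, double-check before deleting anything):

# Confirm whether the in-cluster secret still reflects the managed config
kubectl -n <Namespace> get secret <Resource Name>-postgres-configuration -o yaml

# If it does, delete it and re-apply the unmanaged config
kubectl -n <Namespace> delete secret <Resource Name>-postgres-configuration
kubectl apply -f awx.yaml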

Hope that is useful

I should mention mine did not involve any migration of an old DB.

Thanks for the responses. A couple of things that may be a factor: the old AWX pod does not appear to have been deployed via the AWX-Operator. I wasn’t the one who deployed it; I just inherited its management. I’m attempting to move to the AWX-Operator deployment while upgrading, as I’d done a POC in a separate namespace with the operator-deployed version, but that POC didn’t use an external RDS database and just used the awx-postgres-0 pod. I am attempting to upgrade within the same namespace the existing old version is deployed in. It is deployed as a single StatefulSet with a pod containing awx-web, awx-celery, awx-rabbit, and awx-memcached containers. It also only had the awx-secrets k8s secret, so there was no previous -postgres-configuration secret. Using the fields from the awx-secrets secret I set up the -old-postgres-configuration and -secret-key secrets. I then deployed the new RDS instance and set up the -postgres-configuration secret.

I do understand 6.1.0.0 → 20.0.1 is a huge leap. This was deployed and not touched because it worked, but right now it is not working properly, and upgrading to a new version was considered the right course of action rather than trying to troubleshoot and fix a very old version. For several weeks now the schedules have not been executing properly; I’m unsure if it’s related, but this seemed to start around the time the node group for the cluster was updated.

Making smaller incremental upgrades given that gap does make sense. Would there be any recommendations as to which intermediate versions to upgrade through? Both comments also seem to indicate it may be best to go external RDS → integrated Postgres pod → external RDS, if I’m understanding correctly? Ideally we want it on the external RDS, as that fits our backup and DR processes better than the integrated Postgres pod, but if that approach would help facilitate the upgrade it could be used short term.

TIA

Another option I wanted to propose: try the export/import functionality of the awx CLI (https://docs.ansible.com/ansible-tower/latest/html/towercli/examples.html#import-export). You could try this with your 6.1.0 instance as is, or upgrade it to something a bit more recent but still pre-operator.

Then attempt to import the data into the new/fresh 20.0.1 with the new RDS.

You will lose things like past jobs and events, but I presume your primary concern is transferring things like projects, job templates, etc.
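If you go that route, it would look roughly like this (untested against 6.1.0; hosts and credentials are placeholders, and the available resource flags can vary by awxkit version):

pip install awxkit

# Export resources from the old instance
awx --conf.host https://old-awx.example.com --conf.username admin --conf.password '<old pass>' \
  export --organizations --projects --credentials --job_templates > awx-export.json

# Import into the fresh 20.0.1 instance backed by the new RDS
awx --conf.host https://awx.example.com --conf.username admin --conf.password '<new pass>' \
  import < awx-export.json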


I’m not advocating for going to the integrated Postgres pod. What I was trying to say is that you’ll need to manage upgrading the RDS on your own. I’ve just never personally done a major version upgrade of RDS, so maybe I was sending mixed messages about how one actually accomplishes that. It looks like you can do it in-situ; they just can’t guarantee your data will be compatible: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.PostgreSQL.html#USER_UpgradeDBInstance.PostgreSQL.MajorVersion
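For the in-situ route, it looks like it would be roughly this with the AWS CLI (instance identifier and target version are placeholders; take a snapshot first):

aws rds modify-db-instance \
  --db-instance-identifier <old-rds-identifier> \
  --engine-version 12.10 \
  --allow-major-version-upgrade \
  --apply-immediately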

I had taken a look at roles/installer/tasks/migrate_data.yml and it looked to me like there was no chance of it properly handling the pg_restore into an external DB, as it only provides the username and database. I’m attempting to do this upgrade in parallel, leaving the old AWX deployment and RDS as they are and deploying a new AWX instance with a new RDS. There is a desire to keep all the data on past jobs, etc., but if it comes down to it I may be able to get approval to lose it. I’m definitely trying to keep all existing projects, templates, credentials, etc. intact and not have to reconfigure everything from scratch.

I’ve attempted the pg_dump/pg_restore between the old and new RDS instances, but it seemed to get held up at the end while executing the CREATE INDEX main_jobevent_job_id_uuid_3df694c5_idx ON public.main_jobevent USING btree (job_id, uuid); statement. It kept saying skipping analyze of "main_jobevent" --- lock not available, but checking pg_locks didn’t indicate any locks against the table. My thought was to get the data into the new RDS instance, then simply define the -postgres-configuration secret to point to it and let AWX perform the necessary application DB migrations. At this point I’m uncertain whether what I have is safe to proceed with, given the failed pg_restore attempt. I’m restoring the most recent old RDS snapshot, then going to attempt the upgrade from 11.13 → 12.10, and then do the pg_dump/pg_restore from that restored RDS instance to the new RDS instance, which should be 12.10 → 12.10.
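For what it’s worth, this is roughly how I was checking for locks while it sat on that index (endpoint is a placeholder):

psql -h <new-rds-endpoint> -U awx -d awx \
  -c "SELECT pid, mode, granted FROM pg_locks WHERE relation = 'main_jobevent'::regclass;"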

Handling the RDS migration myself and then just setting the -postgres-configuration secret to point to the new RDS instance seems to have worked. It took several hours for the database migration to complete because there was quite a lot of job history data, so the indexing took quite some time.
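In case it helps anyone later, I kept an eye on the application-side migration roughly like this (the deployment/container names match my install and may differ between operator versions):

# Follow the task container, which applies the Django schema migrations on startup
kubectl -n <Namespace> logs -f deployment/<Resource Name> -c <Resource Name>-task

# Afterwards, confirm nothing is left unapplied
kubectl -n <Namespace> exec deployment/<Resource Name> -c <Resource Name>-task -- awx-manage showmigrations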

Now I’m just dealing with getting our existing Ansible roles to run properly, since they’ve been running on the old 6.1.0.0 AWX system with Ansible 2.8 for so long.

awesome