ATTENTION - do NOT upgrade AWX operator to 2.13.0

We’ve discovered a bug that would cause you to lose your data.

If you already upgraded to 2.13.0 and didn’t make a backup - do NOT stop / restart your postgres 15 container.

The database data lives only in that container, not in the mounted volume. We’ll update this post with instructions for copying that data out to preserve it.
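One rough way to see what "not in the mounted volume" means in practice is to ask which mount point backs the database directory. This is only a sketch, not something from the AWX team: `mount_point_for` is a hypothetical helper that reads `/proc/mounts`-style lines and prints the longest mount point containing a given directory. The directory to check is an assumption (a later reply in this thread mentions /var/lib/pgsql/data for the sclorg image). If the answer is `/`, the directory sits on the container's own filesystem rather than on a mounted volume.

```bash
#!/bin/sh
# Hypothetical helper: read /proc/mounts-style lines on stdin and print the
# longest mount point that contains the directory given as $1.
# The function body is a subshell so its variables stay local.
mount_point_for() (
  dir="${1%/}"
  best="/"
  while read -r _dev mp _rest; do
    # The root mount contains everything; it is the fallback answer.
    [ "$mp" = "/" ] && continue
    case "$dir/" in
      "$mp"/*)
        # Keep the most specific (longest) matching mount point.
        if [ "${#mp}" -gt "${#best}" ]; then best="$mp"; fi
        ;;
    esac
  done
  printf '%s\n' "$best"
)

# Example (pod name is from this thread; the data directory is an assumption):
#   kubectl exec awx-postgres-15-0 -- cat /proc/mounts | mount_point_for /var/lib/pgsql/data
```

This only reads mount information, so it does not stop or restart anything in the pod.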

We’ve deleted the 2.13.0 quay tag so upgrading won’t be possible (even though the 2.13.0 operator exists on operator hub).

!!! Instructions for users who have successfully upgraded to awx-operator v2.13.0 without an existing backup !!!

  1. !!! DO NOT RESTART awx-postgres-15-0 pod !!!
  2. take a backup with AWXBackup
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: awxbackup-2-13-0
spec:
  deployment_name: awx
  3. upgrade from 2.13.0 to 2.13.1 (will be released today)
    when you upgrade the operator to 2.13.1, the operator will fail to upgrade awx (don’t panic)

  4. wait for postgres to be restarted by the operator

  5. restore from backup with

apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: awxrestore-2-13-0
spec:
  deployment_name: awx
  backup_name: awxbackup-2-13-0

After a successful upgrade to 2.13.1 and a restore from the backup, awx should be functional again.
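If you would rather script the two manifests than paste them by hand, here is a minimal sketch. The helper names `awx_backup_manifest` and `awx_restore_manifest` are made up for illustration, and the `awx` namespace in the usage comment is an assumption; the manifest fields are the ones shown above.

```bash
#!/bin/sh
# Hypothetical helpers that print the manifests from the steps above.
# $1 = object name, $2 = deployment_name
awx_backup_manifest() {
  cat <<EOF
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: $1
spec:
  deployment_name: $2
EOF
}

# $1 = object name, $2 = deployment_name, $3 = backup_name
awx_restore_manifest() {
  cat <<EOF
apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: $1
spec:
  deployment_name: $2
  backup_name: $3
EOF
}

# Usage (the namespace "awx" is an assumption, adjust to your install):
#   awx_backup_manifest awxbackup-2-13-0 awx | kubectl apply -n awx -f -
#   ... upgrade the operator to 2.13.1 and wait for postgres to restart ...
#   awx_restore_manifest awxrestore-2-13-0 awx awxbackup-2-13-0 | kubectl apply -n awx -f -
```

Make sure the backup has actually finished before upgrading the operator; the restore is only as good as the backup it points at.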

2.13.1 released: Release 2.13.1 · ansible/awx-operator · GitHub

Does this include deployments with an externally managed postgres db? We run ours in RDS.

External DB should be fine

Well, I had a fun day today. I had problems upgrading from 2.12.1 to 2.13.1, not necessarily because of a bug, but more because of an oversight.

I had set postgres_image: to the redhat sclorg image, which was still pinned to 13 after the operator upgrade. This led to the operator creating the new postgresql-15 statefulset using the postgresql-13 image, performing the pg_dump/pg_restore successfully and leaving me with a new operator on a new but old db.
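A quick sanity check that would have caught this is comparing the version the server reports against the version you expect. The sketch below is my own illustration, not part of the operator: `check_pg_major` is a hypothetical helper that takes the expected major version and a `server_version` string. The psql invocation in the usage comment assumes the awx user and database names used elsewhere in this thread and that local connections inside the pod are allowed.

```bash
#!/bin/sh
# Hypothetical helper: compare an expected PostgreSQL major version ($1)
# with a server_version string ($2) such as "13.14" or "15.6".
check_pg_major() {
  expected="$1"
  actual="${2%%.*}"   # keep everything before the first dot
  if [ "$actual" = "$expected" ]; then
    echo "ok: server reports PostgreSQL $actual"
  else
    echo "mismatch: expected PostgreSQL $expected, server reports $actual" >&2
    return 1
  fi
}

# Usage (pod, user and database names are taken from this thread):
#   check_pg_major 15 "$(kubectl exec awx-postgres-15-0 -- psql -U awx -d awx -Atc 'SHOW server_version')"
```

Checking the running server rather than the image name avoids guessing from tags, since the statefulset name said 15 while the image was still 13.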

Thankfully I had taken a volume snapshot of my PVCs and done an AWX Backup (first time ever! lol).

In order to upgrade to postgresql-15 after the fact, I had to

  1. scale down the operator.
  2. restore the old awx-postgres-13
    a. restore the PVC from a snapshot (I might have deleted it myself during troubleshooting)
    b. restore the old awx-postgres-13 statefulset, which I did by copying the new 15 one and replacing all “-15” with “-13”
    c. restore the awx-postgres-13 service similarly by copying and replacing the *-15 one
  3. reinitialize the postgresql-15 db
    a. scale down the statefulset
    b. delete and recreate the PVC
    c. update the awx-postgres-15 statefulset (and the AWX CR) with the correct postgresql-15 image
    d. scale up the statefulset
  4. open a terminal in the new awx-postgres-15-0 and migrate the data
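For the copy-and-rename part of step 2, a small filter can do the “-15” to “-13” replacement on exported YAML. This is a sketch with a hypothetical helper name (`pg15_to_pg13`), and it is deliberately narrower than a blanket replace: it only touches “-15” when it directly follows “postgres” or “postgresql”, so something like a creationTimestamp of 2024-03-15 is left alone. The output file name in the usage comment is made up.

```bash
#!/bin/sh
# Hypothetical filter: rewrite "postgres-15" / "postgresql-15" to the "-13"
# equivalent in YAML read from stdin, leaving other "-15" strings untouched.
pg15_to_pg13() {
  sed -E 's/(postgres(ql)?)-15/\1-13/g'
}

# Usage (statefulset name is from this thread; the file name is an assumption):
#   kubectl get statefulset awx-postgres-15 -o yaml | pg15_to_pg13 > awx-postgres-13-statefulset.yaml
```

The exported YAML still carries cluster-generated fields (status, uid, resourceVersion and so on), so expect to clean those up by hand before applying the copy.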

Since my postgres settings were otherwise the default, I was able to migrate the data by creating a bash file in the new postgres container:

/var/lib/pgsql/data/migrate.sh

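#!/bin/bash
# Dump from the old awx-postgres-13 service (custom format) and restore into the local database.
pg_dump="pg_dump -h awx-postgres-13 -U awx -d awx -p 5432 -F custom"
pg_restore="pg_restore -U awx -d awx"

# On exit: remove the keepalive file, kill the loop's child processes (the sleep),
# wait for the loop to finish, and exit with the status seen when the trap fired.
function end_keepalive {
  rc=$?
  rm -f "$1"
  kill $(cat /proc/$2/task/$2/children 2>/dev/null) 2>/dev/null || true
  wait $2 || true
  exit $rc
}
# Print a progress message every 60 seconds for as long as the keepalive file exists.
keepalive_file="$(mktemp)"
while [[ -f "$keepalive_file" ]]; do
  echo 'Migrating data to new PostgreSQL 15 Database...'
  sleep 60
done &
keepalive_pid=$!
trap 'end_keepalive "$keepalive_file" "$keepalive_pid"' EXIT SIGINT SIGTERM
echo keepalive_pid: $keepalive_pid
# Abort if either side of the pipe fails.
set -e -o pipefail
PGPASSWORD="$POSTGRES_PASSWORD" $pg_dump | PGPASSWORD="$POSTGRES_PASSWORD" $pg_restore
set +e +o pipefail
echo 'Successful'

To create and run it:

cd /var/lib/pgsql/data
vi migrate.sh
chmod +x migrate.sh
./migrate.sh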

After migrating the data, it was safe to remove the *-13 statefulset/service/pvc, and scale up the operator.
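A note on why the `set -e -o pipefail` line in migrate.sh matters: without pipefail, a pipeline's exit status is that of its last command, so a failing pg_dump could go unnoticed as long as pg_restore exits cleanly. The sketch below shows the difference with stand-ins (`false` playing a failing dump, `cat` playing a restore that succeeds). `pipeline_status` is a hypothetical helper, and it runs each case in a child bash so the option does not leak into your shell.

```bash
#!/bin/sh
# Hypothetical demo: print the exit status of "false | cat" with pipefail
# switched on or off. "false" stands in for a failing dump, "cat" for a
# restore that exits successfully.
pipeline_status() {
  case "$1" in
    on)  bash -c 'set -o pipefail; false | cat; echo "$?"' ;;
    off) bash -c 'false | cat; echo "$?"' ;;
  esac
}

# pipeline_status off   # prints 0: the failure on the left side is hidden
# pipeline_status on    # prints 1: the failure is reported
```

With `set -e` on top of that, the script stops at the failed pipeline instead of going on to print 'Successful'.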


Now that I know better, for the next instance I still need to upgrade, I’m going to temporarily comment out the postgres_image parameters so that the operator uses the defaults for the upgrade, and switch back to the redhat sclorg image afterwards. Failing that, I’ll follow the steps I laid out above.

Edit: I tried to use the default postgres images, but forgot that the default docker postgres image is incompatible with the sclorg one that I’m using and ran into permission errors. So, I followed the steps I outlined above from the previous attempt and everything’s happy. :)