ATTENTION - do NOT upgrade AWX operator to 2.13.0

Well, I had a fun day today. I had problems upgrading from 2.12.1 to 2.13.1, not so much because of a bug as because of an oversight.

I had set postgres_image to the Red Hat sclorg image, which was still pinned to PostgreSQL 13 after the operator upgrade. As a result, the operator created the new postgresql-15 statefulset with the postgresql-13 image, ran the pg_dump/pg_restore successfully, and left me with a new operator on a new, but still PostgreSQL 13, database.
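In hindsight, the mismatch is easy to catch by asking the new pod which PostgreSQL binary it actually runs, instead of trusting the "-15" in the statefulset name. A minimal sketch, assuming namespace awx, pod awx-postgres-15-0, and that the postgres binary is on the container's PATH (all placeholders to adjust):

```bash
#!/bin/bash
# Sketch: confirm the major version of the PostgreSQL binary inside a pod.
# NS and POD are placeholders; override them to match your install.
NS="${NS:-awx}"
POD="${POD:-awx-postgres-15-0}"

# Reads `postgres --version` output on stdin, e.g. "postgres (PostgreSQL) 13.14",
# and prints only the major version ("13"). Prints nothing if it can't parse it.
pg_major() {
  sed -nE 's/.*\(PostgreSQL\) ([0-9]+).*/\1/p'
}

# Returns non-zero (with a warning) if the pod's major version differs from $1.
check_pod_pg_major() {
  local expected="$1" actual
  actual="$(kubectl -n "$NS" exec "$POD" -- postgres --version | pg_major)"
  if [[ "$actual" != "$expected" ]]; then
    echo "WARNING: $POD runs PostgreSQL ${actual:-unknown}, expected $expected" >&2
    return 1
  fi
  echo "$POD runs PostgreSQL $actual"
}

# Usage: check_pod_pg_major 15
```

Running check_pod_pg_major 15 right after the operator upgrade would have flagged the PostgreSQL 13 image immediately.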

Thankfully, I had taken a volume snapshot of my PVCs and done an AWX Backup (first time ever! lol).

To get onto postgresql-15 after the fact, I had to:

  1. scale down the operator.
  2. restore the old awx-postgres-13 database
    a. restore the PVC from a snapshot (I might have deleted it myself during troubleshooting)
    b. restore the old awx-postgres-13 statefulset, which I did by copying the new -15 one and replacing all “-15” with “-13”
    c. restore the awx-postgres-13 service the same way, by copying the -15 one and replacing “-15” with “-13”
  3. reinitialize the postgresql-15 db
    a. scale down the statefulset
    b. delete and recreate the PVC
    c. update the awx-postgres-15 statefulset (and the AWX CR) with the correct postgresql-15 image
    d. scale up the statefulset
  4. open a terminal in the new awx-postgres-15-0 pod and migrate the data
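The steps above can be sketched as kubectl commands. This is a hedged, untested outline rather than an exact record: it assumes namespace awx, an AWX CR named awx, the operator Deployment awx-operator-controller-manager, PVC names following the StatefulSet <template>-<statefulset>-<ordinal> pattern, a made-up VolumeSnapshot name, a CSI driver with snapshot support, and jq on the workstation. By default it only prints the commands (set DRY_RUN=0 to execute), a deliberate choice given that step 3 deletes a PVC.

```bash
#!/bin/bash
# Dry-run sketch of steps 1-3: prints each command unless DRY_RUN=0.
# Every name is a placeholder for a typical install; check yours first with
#   kubectl -n awx get deploy,sts,svc,pvc,volumesnapshot
set -euo pipefail
NS="${NS:-awx}"
AWX_NAME="${AWX_NAME:-awx}"                                            # AWX CR name (assumed)
OPERATOR_DEPLOY="${OPERATOR_DEPLOY:-awx-operator-controller-manager}"  # assumed
SNAPSHOT="${SNAPSHOT:-awx-postgres-13-snapshot}"  # hypothetical VolumeSnapshot name
PVC13="${PVC13:-postgres-13-awx-postgres-13-0}"   # assumed <template>-<sts>-<ordinal>
PVC15="${PVC15:-postgres-15-awx-postgres-15-0}"   # assumed, same pattern
PVC_SIZE="${PVC_SIZE:-8Gi}"                       # must be >= the snapshot size
PG15_IMAGE="${PG15_IMAGE:-quay.io/sclorg/postgresql-15-c9s:latest}"

dry() { [[ "${DRY_RUN:-1}" == 1 ]]; }
run() { if dry; then printf '+ %s\n' "$*"; else "$@"; fi; }
apply_stdin() {
  if dry; then echo '+ kubectl apply -f - <<EOF'; cat; echo 'EOF'
  else kubectl -n "$NS" apply -f -; fi
}

# Step 2a: a PVC populated from the pre-upgrade VolumeSnapshot.
# Add spec.storageClassName if the snapshot's class is not the default.
pvc_from_snapshot() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $PVC13
  namespace: $NS
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: $PVC_SIZE
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: $SNAPSHOT
EOF
}

# Steps 2b/2c: export the -15 object, drop server-populated fields (and a
# non-headless Service's allocated IP), then rename "-15" to "-13" everywhere.
# The blind sed mirrors the manual find/replace; review its output first.
clone_as_13() {
  kubectl -n "$NS" get "$1" awx-postgres-15 -o json \
    | jq 'del(.status, .metadata.resourceVersion, .metadata.uid,
              .metadata.creationTimestamp, .metadata.managedFields)
          | if .kind == "Service" and .spec.clusterIP != "None"
            then del(.spec.clusterIP, .spec.clusterIPs) else . end' \
    | sed 's/-15/-13/g'
}

main() {
  run kubectl -n "$NS" scale deployment "$OPERATOR_DEPLOY" --replicas=0   # step 1
  pvc_from_snapshot | apply_stdin                                         # step 2a
  for kind in statefulset service; do                                     # steps 2b, 2c
    if dry; then echo "+ clone_as_13 $kind | kubectl apply -f -"
    else clone_as_13 "$kind" | apply_stdin; fi
  done
  run kubectl -n "$NS" scale statefulset awx-postgres-15 --replicas=0     # step 3a
  # Ignore an error here if the pod is already gone.
  run kubectl -n "$NS" wait --for=delete pod/awx-postgres-15-0 --timeout=300s || true
  run kubectl -n "$NS" delete pvc "$PVC15"                                # step 3b
  # Step 3c (assumes postgres is the first container in the pod spec), plus
  # the AWX CR so the operator keeps this image once it is scaled back up.
  run kubectl -n "$NS" patch statefulset awx-postgres-15 --type=json \
    -p "[{\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/image\",\"value\":\"$PG15_IMAGE\"}]"
  run kubectl -n "$NS" patch awx "$AWX_NAME" --type=merge \
    -p "{\"spec\":{\"postgres_image\":\"${PG15_IMAGE%:*}\",\"postgres_image_version\":\"${PG15_IMAGE##*:}\"}}"
  # Step 3d: with its PVC gone, the StatefulSet controller creates a fresh,
  # empty one from the volumeClaimTemplate when the pod is scheduled again.
  run kubectl -n "$NS" scale statefulset awx-postgres-15 --replicas=1
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then main; fi
```

Review the printed plan, and the output of clone_as_13 statefulset, before running it for real with DRY_RUN=0.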

Since my postgres settings were otherwise the default, I was able to migrate the data by creating the following bash script as /var/lib/pgsql/data/migrate.sh in the new postgres container (awx-postgres-15-0):

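```bash
#!/bin/bash
# Stream a dump of the old PostgreSQL 13 database straight into this
# PostgreSQL 15 instance. Runs inside the awx-postgres-15-0 container.
pg_dump=(pg_dump -h awx-postgres-13 -U awx -d awx -p 5432 -F custom)
pg_restore=(pg_restore -U awx -d awx)

# On exit: stop the keepalive loop (and its sleep child), keep the exit code.
function end_keepalive {
  rc=$?
  rm -f "$1"
  # Left unquoted on purpose: the children file may list several PIDs.
  kill $(cat "/proc/$2/task/$2/children" 2>/dev/null) 2>/dev/null || true
  wait "$2" || true
  exit $rc
}

# Print a progress line every minute (keeps an idle terminal session alive).
keepalive_file="$(mktemp)"
while [[ -f "$keepalive_file" ]]; do
  echo 'Migrating data to new PostgreSQL 15 Database...'
  sleep 60
done &
keepalive_pid=$!
trap 'end_keepalive "$keepalive_file" "$keepalive_pid"' EXIT SIGINT SIGTERM
echo "keepalive_pid: $keepalive_pid"

# Fail the whole pipeline if either pg_dump or pg_restore fails.
set -e -o pipefail
PGPASSWORD="$POSTGRES_PASSWORD" "${pg_dump[@]}" | PGPASSWORD="$POSTGRES_PASSWORD" "${pg_restore[@]}"
set +e +o pipefail
echo 'Successful'
```

Then, from the terminal in awx-postgres-15-0:

```bash
cd /var/lib/pgsql/data
vi migrate.sh        # paste in the script above
chmod +x migrate.sh
./migrate.sh
```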
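Instead of pasting the script with vi inside the pod, the same script could be streamed from a workstation, followed by a coarse sanity check that both databases have the same number of tables. A minimal sketch, assuming namespace awx, the script saved locally as ./migrate.sh, and the awx user/database names and POSTGRES_PASSWORD variable used in the script; a matching table count is only a rough signal, not proof of a complete restore:

```bash
#!/bin/bash
# Sketch: stream migrate.sh into the new pod instead of editing it with vi,
# then compare a cheap fingerprint of the old and new databases.
# NS, POD and the local path ./migrate.sh are placeholders.
set -euo pipefail
NS="${NS:-awx}"
POD="${POD:-awx-postgres-15-0}"
SQL="SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';"

# `bash -s` runs the script read from stdin inside the container, so it sees
# the container's environment (including POSTGRES_PASSWORD).
run_migration() {
  kubectl -n "$NS" exec -i "$POD" -- bash -s < ./migrate.sh
}

# Prints the public table count; extra args (e.g. -h awx-postgres-13) choose
# the server, otherwise psql uses the local socket of the new instance.
table_count() {
  kubectl -n "$NS" exec -i "$POD" -- \
    bash -c 'PGPASSWORD="$POSTGRES_PASSWORD" psql -U awx -d awx -tA "$@"' _ "$@" <<< "$SQL"
}

# Pure helper: succeeds only when both counts are non-empty and equal.
counts_match() {
  [[ -n "$1" && "$1" == "$2" ]]
}

# Usage:
#   run_migration
#   old="$(table_count -h awx-postgres-13)"; new="$(table_count)"
#   counts_match "$old" "$new" && echo "table counts match ($new)"
```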

After migrating the data, it was safe to remove the -13 statefulset, service, and PVC, and to scale the operator back up.
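The cleanup can also be written out as commands. A minimal sketch, assuming namespace awx, the operator Deployment awx-operator-controller-manager running one replica, and a PVC name following the usual StatefulSet <template>-<statefulset>-<ordinal> pattern (all assumptions to verify with kubectl get). It prints the commands unless DRY_RUN=0; the statefulset is deleted first so its pod releases the PVC before the PVC itself is removed.

```bash
#!/bin/bash
# Dry-run sketch of the cleanup: prints commands unless DRY_RUN=0.
set -euo pipefail
NS="${NS:-awx}"
OPERATOR_DEPLOY="${OPERATOR_DEPLOY:-awx-operator-controller-manager}"  # assumed
PVC13="${PVC13:-postgres-13-awx-postgres-13-0}"                        # assumed

run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then printf '+ %s\n' "$*"; else "$@"; fi; }

cleanup() {
  run kubectl -n "$NS" delete statefulset awx-postgres-13
  run kubectl -n "$NS" delete service awx-postgres-13
  run kubectl -n "$NS" delete pvc "$PVC13"
  # Assumes the operator normally runs a single replica.
  run kubectl -n "$NS" scale deployment "$OPERATOR_DEPLOY" --replicas=1
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then cleanup; fi
```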


Now that I know better, for the next instance I still need to upgrade, I’m going to temporarily comment out the postgres_image parameters so that the operator uses its defaults for the upgrade, and then switch back to the Red Hat sclorg image afterwards; otherwise, I’ll follow the steps I laid out above.

Edit: I tried using the default postgres images, but forgot that the default Docker postgres image is incompatible with the sclorg one I’m using, and ran into permission errors. So I followed the steps outlined above from the previous attempt, and everything’s happy. :slight_smile:
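Whichever route you take, it helps to look at what the AWX CR actually pins before bumping the operator. A minimal sketch, assuming the CR is named awx in namespace awx and that the major version can be read from the image name (sclorg style, e.g. postgresql-13-c9s) or from the tag (Docker style, e.g. postgres:15); both rules are heuristics of this sketch, not anything the operator defines:

```bash
#!/bin/bash
# Sketch: before upgrading the operator, check which PostgreSQL major version
# the AWX CR pins via postgres_image / postgres_image_version.
# NS and AWX_NAME are placeholders.
set -euo pipefail
NS="${NS:-awx}"
AWX_NAME="${AWX_NAME:-awx}"

# Heuristic: take the major version from a "postgresql-NN" image name, else
# from a numeric tag such as ":15" or ":13.14-alpine"; prints "unknown" otherwise.
pg_major_from_image() {
  local ref="$1"
  if [[ "$ref" =~ postgresql-([0-9]+) ]]; then
    echo "${BASH_REMATCH[1]}"
  elif [[ "$ref" =~ :([0-9]+)[^/]*$ ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "unknown"
  fi
}

# Warns (non-zero exit) when postgres_image is set to something other than $1.
check_cr_pin() {
  local want="$1" image tag major
  image="$(kubectl -n "$NS" get awx "$AWX_NAME" -o jsonpath='{.spec.postgres_image}')"
  tag="$(kubectl -n "$NS" get awx "$AWX_NAME" -o jsonpath='{.spec.postgres_image_version}')"
  if [[ -z "$image" ]]; then
    echo "postgres_image is not set; the operator will use its default"
    return 0
  fi
  major="$(pg_major_from_image "$image${tag:+:$tag}")"
  if [[ "$major" != "$want" ]]; then
    echo "WARNING: postgres_image pins PostgreSQL $major ($image${tag:+:$tag}), not $want" >&2
    return 1
  fi
  echo "postgres_image pins PostgreSQL $major"
}

# Usage: check_cr_pin 15
```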
