Well, I had a fun day today. I had problems upgrading from 2.12.1 to 2.13.1, not because of a bug but because of an oversight on my part.
I had set postgres_image to the redhat sclorg image, which was still pinned to 13 after the operator upgrade. This led the operator to create the new postgresql-15 statefulset using the postgresql-13 image, run the pg_dump/pg_restore successfully, and leave me with a new operator on a "new" database that was still running the old version.
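For anyone who wants to catch this kind of mismatch early, here is a minimal sketch of a check that compares the image a statefulset actually runs against the major version you expect. The namespace awx, the helper names, and the image string in the usage comment are my own placeholders, not something taken from my manifests, and the match is a plain substring test.

```shell
# Assumed namespace; adjust to your deployment.
NAMESPACE="awx"

# Print the image of the first container in a statefulset.
statefulset_image() {
  kubectl -n "$NAMESPACE" get statefulset "$1" \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# Succeed if the image reference mentions the expected major version,
# either as "postgresql-<major>" in the name or as a ":<major>" tag.
image_matches_version() {
  case "$1" in
    *"postgresql-$2"*|*":$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (hypothetical):
#   image_matches_version "$(statefulset_image awx-postgres-15)" 15 \
#     || echo "awx-postgres-15 is not running a 15 image"
```

Running this against the *-15 statefulset right after the upgrade would have flagged the problem before I started troubleshooting.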
Thankfully I had taken a volume snapshot of my PVCs and done an AWX backup (first time ever! lol).
In order to upgrade to postgresql-15 after the fact, I had to:
- scale down the operator.
- restore the old awx-postgres-13
  a. restore the PVC from a snapshot (I might have deleted it myself during troubleshooting)
  b. restore the old awx-postgres-13 statefulset, by copying the new 15 one and replacing all “-15” with “-13”
  c. restore the awx-postgres-13 service similarly, by copying and replacing the *-15 one
- reinitialize the postgresql-15 db
  a. scale down the statefulset
  b. delete and recreate the PVC
  c. update the awx-postgres-15 statefulset (and the AWX CR) with the correct postgresql-15 image
  d. scale up the statefulset
- open a terminal in the new awx-postgres-15-0 and migrate the data
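The scaling and copy/rename steps above can be sketched roughly like this. The namespace awx and the operator deployment name awx-operator-controller-manager are assumptions on my part (check yours with kubectl get deployments), and the helper function names are made up for illustration.

```shell
# Assumed names; adjust to your deployment.
NAMESPACE="awx"
OPERATOR_DEPLOYMENT="awx-operator-controller-manager"

# Scale the operator so it does not reconcile while you work (0 = down, 1 = up).
scale_operator() {
  kubectl -n "$NAMESPACE" scale deployment "$OPERATOR_DEPLOYMENT" --replicas="$1"
}

# Scale a postgres statefulset, e.g. scale_statefulset awx-postgres-15 0
scale_statefulset() {
  kubectl -n "$NAMESPACE" scale statefulset "$1" --replicas="$2"
}

# Rewrite a manifest read on stdin, replacing every "-15" with "-13".
rename_15_to_13() {
  sed 's/-15/-13/g'
}

# Usage (hypothetical):
#   scale_operator 0
#   kubectl -n "$NAMESPACE" get statefulset awx-postgres-15 -o yaml \
#     | rename_15_to_13 > awx-postgres-13.yaml
#   # review the file, then: kubectl -n "$NAMESPACE" apply -f awx-postgres-13.yaml
#   scale_statefulset awx-postgres-15 0
```

Note that the sed replaces every "-15" in the manifest, including any that appear in the image reference, so the exported YAML is worth reading (and stripping of status, uid and resourceVersion fields) before applying it.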
Since my postgres settings were otherwise the default, I was able to migrate the data by creating a bash file in the new postgres container:
/var/lib/pgsql/data/migrate.sh
#!/bin/bash
# Dump from the old awx-postgres-13 service and restore into the local PostgreSQL 15 server.
pg_dump="pg_dump -h awx-postgres-13 -U awx -d awx -p 5432 -F custom"
pg_restore="pg_restore -U awx -d awx"

# Stop the keepalive loop and exit with the status of the last command.
function end_keepalive {
  rc=$?
  rm -f "$1"
  kill $(cat /proc/$2/task/$2/children 2>/dev/null) 2>/dev/null || true
  wait "$2" || true
  exit $rc
}

# Print a message every 60 seconds so a long, otherwise silent run still produces output.
keepalive_file="$(mktemp)"
while [[ -f "$keepalive_file" ]]; do
  echo 'Migrating data to new PostgreSQL 15 Database...'
  sleep 60
done &
keepalive_pid=$!
trap 'end_keepalive "$keepalive_file" "$keepalive_pid"' EXIT SIGINT SIGTERM
echo "keepalive_pid: $keepalive_pid"

# Abort if either side of the dump/restore pipeline fails.
set -e -o pipefail
PGPASSWORD="$POSTGRES_PASSWORD" $pg_dump | PGPASSWORD="$POSTGRES_PASSWORD" $pg_restore
set +e +o pipefail
echo 'Successful'
I created the file with vi, made it executable, and ran it:
cd /var/lib/pgsql/data
vi migrate.sh
chmod +x migrate.sh
./migrate.sh
After migrating the data, it was safe to remove the *-13 statefulset/service/pvc, and scale up the operator.
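For the cleanup, something along these lines should do it. This is only a sketch: the namespace, the operator deployment name, and especially the old PVC name are assumptions (confirm with kubectl get pvc before deleting anything), and the function names are made up for illustration.

```shell
# Assumed names; adjust to your deployment.
NAMESPACE="awx"
OPERATOR_DEPLOYMENT="awx-operator-controller-manager"
OLD_PVC="postgres-13-awx-postgres-13-0"   # assumed; check with: kubectl get pvc

# Remove the temporary *-13 resources; stops at the first failure.
cleanup_old_postgres() {
  kubectl -n "$NAMESPACE" delete statefulset awx-postgres-13 &&
  kubectl -n "$NAMESPACE" delete service awx-postgres-13 &&
  kubectl -n "$NAMESPACE" delete pvc "$OLD_PVC"
}

# Bring the operator back so it resumes reconciling the AWX CR.
scale_operator_up() {
  kubectl -n "$NAMESPACE" scale deployment "$OPERATOR_DEPLOYMENT" --replicas=1
}

# Usage (hypothetical):
#   cleanup_old_postgres && scale_operator_up
```

I would only run the cleanup after confirming AWX works against the 15 database, and keep the volume snapshot around for a while regardless.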
Now that I know better, for the next instance I still need to upgrade, I’m going to follow the steps I laid out above. (My original plan was to temporarily comment out the postgres_image parameters so that the operator uses the defaults for the upgrade, and switch back to the redhat sclorg image afterwards.)
Edit: I tried to use the default postgres images, but forgot that the default docker postgres image is incompatible with the sclorg one that I’m using and ran into permission errors. So, I followed the steps I outlined above from the previous attempt and everything’s happy.