Re-import postgres database

Pablo_Ramos · September 15, 2022, 8:44pm

Hi everyone

I recently upgraded my AWX Operator instance from 0.22.0 to 0.28.0, which included a switch from postgres 9 to postgres 13.
The migration wasn’t seamless: I had some problems with the new PV, so I had to restart the pod and the migration process a few times. I think this might have caused some corruption in the database.

Some tasks seem to randomly fail with this error message:
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 82, in _execute
    return self.cursor.execute(sql)
psycopg2.errors.InternalError: unexpected data beyond EOF in block 166 of relation base/16384/2664
HINT: This has been seen to occur with buggy kernels; consider updating your system.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 481, in run
    self.pre_run_hook(self.instance, private_data_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 1287, in pre_run_hook
    super(RunProjectUpdate, self).pre_run_hook(instance, private_data_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 417, in pre_run_hook
    create_partition(instance.event_class._meta.db_table, start=instance.created)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/utils/common.py", line 1163, in create_partition
    cursor.execute(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 82, in _execute
    return self.cursor.execute(sql)
django.db.utils.InternalError: unexpected data beyond EOF in block 166 of relation base/16384/2664
HINT: This has been seen to occur with buggy kernels; consider updating your system.

My question is: is there some kind of check I can run on the database? Or is it possible to re-import the database? I have backups from before the migration.
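
One check that may help narrow this down is finding out which relation the path in the error refers to. In `base/16384/2664`, the first number is the database OID and the second is the relation's filenode. The sketch below pulls those numbers out of the error text and builds two lookup queries using `pg_database` and PostgreSQL's `pg_filenode_relation()` function. The helper names (`parse_eof_error`, `lookup_queries`) are made up for illustration, the tablespace argument `0` assumes the default tablespace (which the `base/` prefix suggests), and the second query has to be run while connected to the affected database.

```python
import re
from typing import List, NamedTuple, Optional

# Matches the PostgreSQL message quoted in the traceback above.
# \s+ tolerates the line wrap between "of" and "relation".
_EOF_ERROR = re.compile(
    r"unexpected data beyond EOF in block\s+(?P<block>\d+)\s+"
    r"of\s+relation\s+base/(?P<db_oid>\d+)/(?P<filenode>\d+)"
)


class EofError(NamedTuple):
    block: int      # block number reported in the error
    db_oid: int     # OID of the database (directory under base/)
    filenode: int   # filenode of the affected table or index


def parse_eof_error(message: str) -> Optional[EofError]:
    """Extract block, database OID and filenode, or None if no match."""
    m = _EOF_ERROR.search(message)
    if m is None:
        return None
    return EofError(int(m["block"]), int(m["db_oid"]), int(m["filenode"]))


def lookup_queries(err: EofError) -> List[str]:
    """Build SQL to identify the database and the relation (hypothetical helper)."""
    return [
        # Which database does the OID belong to?
        f"SELECT datname FROM pg_database WHERE oid = {err.db_oid};",
        # Which relation owns this filenode? 0 = default tablespace (assumption).
        f"SELECT pg_filenode_relation(0, {err.filenode});",
    ]


if __name__ == "__main__":
    text = "unexpected data beyond EOF in block 166 of relation base/16384/2664"
    err = parse_eof_error(text)
    if err is not None:
        for query in lookup_queries(err):
            print(query)
```

If the relation turns out to be an index rather than a table, rebuilding it with `REINDEX` might be enough, since an index can be regenerated from the table data. That would not address whatever is causing the storage-level error in the first place.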

AWX_Project · September 16, 2022, 5:21pm

You might attempt a restore from your 0.22.0 backup by creating an AWXRestore object; see this documentation: https://github.com/ansible/awx-operator/blob/devel/roles/restore/README.md

We also have a tool, “awx-manage check_db”, that might help debug DB connectivity issues.
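
As a rough illustration of what such a restore object looks like, the sketch below builds an AWXRestore manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The `spec` field names (`deployment_name`, `backup_name`) are written from memory of the linked README and should be checked against it for your operator version. All values (`restore1`, `awx`, `awxbackup-example`) are placeholders, and `backup_name` is assumed to name an existing AWXBackup object in the same namespace.

```python
import json
from typing import Any, Dict


def awx_restore_manifest(
    name: str, namespace: str, deployment_name: str, backup_name: str
) -> Dict[str, Any]:
    """Build an AWXRestore manifest (field names assumed from the restore README)."""
    return {
        "apiVersion": "awx.ansible.com/v1beta1",
        "kind": "AWXRestore",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Name of the AWX deployment to restore into.
            "deployment_name": deployment_name,
            # Name of an existing AWXBackup object (assumption: same namespace).
            "backup_name": backup_name,
        },
    }


if __name__ == "__main__":
    manifest = awx_restore_manifest(
        name="restore1",                  # placeholder
        namespace="awx",                  # placeholder
        deployment_name="awx",            # placeholder
        backup_name="awxbackup-example",  # placeholder
    )
    # Save the output to a file and apply it with: kubectl apply -f <file>
    print(json.dumps(manifest, indent=2))
```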

Do different tasks hit this error, or the same one (at random times)? Do you ever see a different stack trace when hitting an error?

AWX Team

Pablo_Ramos · September 16, 2022, 6:31pm

It’s always the same task that fails, which is a “Source Control Update” task. Then the actual task fails because it needs this update before running.
The stack trace seems to be the same; the only thing that changes is the block number, e.g. “block 173 of relation base/16384/2664”.
“awx-manage check_db” seems to return no errors. These errors don’t happen all the time, so it is fair to say that connectivity with the DB works fine most of the time.

Thanks for the link to the Restore role, I wasn’t aware of it! I’ll try to do a restore in another namespace and check for errors

AWX_Project · September 21, 2022, 6:37pm

Hi!

Did running the restore role help resolve the issue? Also, it might be worth looking at the postgres pod logs around the time that the project update fails to see if anything stands out as problematic.

AWX Team

Pablo_Ramos · September 22, 2022, 1:05pm

We haven’t been able to run the restore role because we weren’t making backups using the backup role. We were only doing a pg_dumpall on the postgres pod.
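
For reference, a `pg_dumpall` script is normally loaded back with `psql`, so a re-import without the restore role could look roughly like the sketch below. It builds a `kubectl exec` command that runs `psql` inside the postgres pod and streams the dump file to it on stdin. The pod name, namespace, database user, and file name are placeholders, the helper names are made up for illustration, and the sketch assumes the target is a fresh, empty postgres instance rather than the live one.

```python
import shlex
import subprocess
from typing import List


def restore_command(pod: str, namespace: str, db_user: str) -> List[str]:
    """Build a kubectl command that runs psql in the pod, reading SQL from stdin."""
    return [
        "kubectl", "exec", "-i", pod, "-n", namespace, "--",
        # pg_dumpall output is plain SQL; connect to the "postgres" database
        # and let the script create roles and databases from there.
        "psql", "-U", db_user, "-d", "postgres",
    ]


def restore_from_dump(dump_path: str, pod: str, namespace: str, db_user: str) -> None:
    """Stream a pg_dumpall file into psql (hypothetical helper, not an AWX tool)."""
    with open(dump_path, "rb") as dump:
        subprocess.run(restore_command(pod, namespace, db_user), stdin=dump, check=True)


if __name__ == "__main__":
    # Placeholders: substitute your own pod, namespace and user.
    cmd = restore_command("awx-postgres-13-0", "awx-restore-test", "awx")
    print(shlex.join(cmd))
    # To run it for real:
    # restore_from_dump("dumpall.sql", "awx-postgres-13-0", "awx-restore-test", "awx")
```

Trying this in a separate namespace first keeps the existing instance untouched while the restored data is checked for errors.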

We are running the storage on GlusterFS, which led me to this KB article: https://access.redhat.com/solutions/3673761. We applied the group recommended there, but we’re still seeing the same error, just not as often. For now the workaround has been to disable inventory sync before each task.
Sadly, we don’t have enough log retention to see the output of the pods during the migration.
