AWX postgres container stuck in Restarting status

Hi!

I installed AWX a couple of weeks back and have been using it without issues since then. Yesterday I got an “Internal Server Error” from the GUI, and then I noticed that the postgres container had crashed and was stuck in “Restarting” status.

The only output from “docker logs postgres” is a repeating:

initdb: directory “/var/lib/postgresql/data” exists but is not empty
If you want to create a new database system, either remove or empty
the directory “/var/lib/postgresql/data” or run initdb
with an argument other than “/var/lib/postgresql/data”.
initdb: directory “/var/lib/postgresql/data” exists but is not empty
If you want to create a new database system, either remove or empty
the directory “/var/lib/postgresql/data” or run initdb
with an argument other than “/var/lib/postgresql/data”.
The files belonging to this database system will be owned by user “postgres”.
This user must also own the server process.
The database cluster will be initialized with locale “en_US.utf8”.
The default database encoding has accordingly been set to “UTF8”.
The default text search configuration will be set to “english”.
Data page checksums are disabled.

While troubleshooting, I updated AWX from GitHub and re-ran the installer, but the issue persists.

I also tried moving /tmp/pgdocker out of the way and then re-running the installer to see if it would properly generate an empty DB and run from that. That works, but it means I’m starting from scratch. And there’s nothing saying this won’t happen again, so I’d rather not start over before pinpointing the issue.

Any ideas on how I could troubleshoot this further?

Thanks!

We’ve had this reported a couple of times and I have yet to reproduce it locally (though I have no doubt of its severity). I need to figure out why the postgres container is insisting on re-running its init utility and then failing to start up when data already exists there.
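
One thing that may be worth checking while the container is in this state: as far as I know, the official postgres image's entrypoint only runs initdb when it does not find a non-empty PG_VERSION file in its data directory, so a data directory that still has files but has lost PG_VERSION would produce this kind of "exists but is not empty" loop. Below is a minimal sketch for checking the host side, assuming the data lives in /tmp/pgdocker as mentioned above. The function name check_pgdata and its status strings are made up for illustration.

```shell
#!/bin/sh
# Sketch: classify the state of a postgres data directory on the host.
# Assumption: the entrypoint treats a missing or empty PG_VERSION as
# "not initialized" and runs initdb in that case.
check_pgdata() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing-dir"                 # directory does not exist at all
  elif [ -s "$dir/PG_VERSION" ]; then
    echo "initialized"                 # looks like an initialized cluster
  elif [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "not-empty-no-version"        # files present but PG_VERSION gone
  else
    echo "empty"                       # initdb would succeed here
  fi
}

# Default path is the one from this thread; pass another as the first argument.
check_pgdata "${1:-/tmp/pgdocker}"
```

If it reports not-empty-no-version, something on the host has removed files from the data directory since it was initialized. A periodic cleanup of /tmp would be one candidate to look at.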

Hi,

FYI, I had the same issue when I updated my environment, and I had to delete the /tmp/pgdocker folder to get it to work.
Since it was a lab environment, I didn’t look into the issue further…

This is repeatedly happening to our dev instances. They will be working fine, and then all of a sudden you will see “A server error has occurred” when trying to access the UI. Every time, it has turned out that the postgres container is stuck in “Restarting” status.

Any troubleshooting advice? I have been unsuccessful in recovering the container once it gets in this state. As mentioned above, deleting /tmp/pgdocker means AWX re-initializes and you start from scratch. We’ve been having to roll back to stable snapshots each time this happens.

Thanks,
Stephen

+1

I ended up rebuilding my DEV on OpenShift rather than docker to resolve this because I couldn’t figure out the issue on docker.

–Tony

I’m going to see about wrapping this up today, here’s the issue: https://github.com/ansible/awx/issues/438

I did a full install after issue 438 was closed and merged, but I am still seeing this issue.

`
[root@awx-staging /]# docker ps |grep postgres
eadae4f62e7a postgres:9.6 "docker-entrypoint.sh" 2 weeks ago Restarting (1) 4 minutes ago 5432/tcp postgres
`

`
[root@awx-staging /]# docker logs postgres
initdb: directory "/var/lib/postgresql/data/pgdata" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/var/lib/postgresql/data/pgdata" or run initdb
with an argument other than "/var/lib/postgresql/data/pgdata".
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
`

Any suggestions? Thanks!
-Stephen

You’ll want to make sure you clear the pg container, the images, and probably clear the postgres data directory.
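
Roughly, that could look like the following. This is only a sketch: it assumes the container name (postgres) and image (postgres:9.6) shown in the docker ps output above and the /tmp/pgdocker data directory mentioned earlier in the thread, and it moves the data directory aside rather than deleting it. The DRY_RUN switch and the run helper are illustrative additions; by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: remove the postgres container and image, and move the data
# directory out of the way. Names and paths are taken from this thread
# and may differ on your host.

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

clear_awx_postgres() {
  container="${PG_CONTAINER:-postgres}"
  image="${PG_IMAGE:-postgres:9.6}"
  data_dir="${PG_DATA_DIR:-/tmp/pgdocker}"

  run docker rm -f "$container"          # force-remove, even while restarting
  run docker rmi "$image"                # drop the image so it is pulled again
  run mv "$data_dir" "$data_dir.bak"     # keep the old data instead of deleting
}

clear_awx_postgres
```

Set DRY_RUN=0 to actually execute the commands, then re-run the AWX installer. Note that AWX will start from an empty database afterwards, as described earlier in the thread.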

Sorry, my last message probably wasn’t very clear.

I started with a clean CentOS 7 install and deployed AWX after issue 438 was closed. Previously, this issue would surface within a week: the UI would show “A server error has occurred” and, upon investigation, the postgres container would be stuck restarting. This most recent install lasted over 2 weeks, but the same issue is back. Clearing the pg container, data, etc. obviously wipes all data and settings, which is quite obnoxious.

Anyone else having this issue still?

Thanks,
Stephen

You’ll continue to have the problem if you are using your existing pg container and data. I left an outline of a migration path in the PR linked from that issue’s closure.