AWX Postgres Ansible Setup Error

I am setting up AWX in a separate environment from our production one. I have set up the new K8s cluster, copied over the config values from AWX in production, and changed hostnames and private repo references as needed.

When trying to bring up AWX, I can see my Operator Controller Manager launches as expected. Then the Postgres pod tries to launch but never finishes starting; kubectl get po -n awx shows a status of Pending.

When reviewing logs from the controller manager with kubectl logs -f deploy/awx-operator-controller-manager -n awx, I see the following error message: Kind=PodList err-Index with name field:status.phase does not exist

What does this error mean? I have been struggling with this for a while. When I built AWX in our production environment I didn’t encounter this issue. I am starting to believe something is different between the two environments but I am not quite sure what that is. Your guidance is appreciated. Thanks!

Hi @jeremytourville,

Could you give some more details on your deployment method (K8s YAML, Helm, etc…) and your K8s cluster setup (KIND, K3s, OpenShift, bare metal, cloud, etc…)? Also, do you have any overrides in the config that you copied over from your production cluster?

Long-running GH issue that seems to be related: After upgrade “Kind=PodList err-Index with name field:status.phase does not exist” · Issue #1022 · ansible/awx-operator

Best regards,

Joe

After researching more and figuring out what to search for…

In Kubernetes, the status.phase field within a Pod’s status is a high-level summary of the Pod’s lifecycle state. It indicates whether the Pod is Pending, Running, Succeeded, Failed, or Unknown. The status.phase is not a comprehensive rollup of all container states, but rather a simplified overview.
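For reference, you can filter on this field directly from kubectl; it is the same field the operator's cache is complaining it has no index for:

kubectl get pods -n awx --field-selector=status.phase=Pending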

It sounds like the Ansible job is unable to determine a status from gathered facts. Now, I am trying to see what info is actually captured in the facts.
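A quick way to see what the pod itself reports for that field (pod name as it appears later in this thread):

kubectl get pod awx-postgres-13-0 -n awx -o jsonpath='{.status.phase}'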

This is how I am running my install:

kubectl apply -f storage.yaml
helm install -n awx /root/awx-operator/ -f awxvalues.yaml --generate-name
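Since --generate-name picks the release name, I confirm it afterwards with:

helm list -n awx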

Here is info about my cluster and helm:

[root@gdev-kube01 awx-operator]# helm version
version.BuildInfo{Version:"v3.14.0", GitCommit:"3fc9f4b2638e76f26739cd77c7017139be81d0ea", GitTreeState:"clean", GoVersion:"go1.21.5"}

[root@gdev-kube01 ~]# kubectl cluster-info
Kubernetes control plane is running at https://x.x.8.37:6443
CoreDNS is running at https://x.x.8.37:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

[root@gdev-kube01 ~]# kubectl get nodes --show-labels
NAME                    STATUS   ROLES           AGE   VERSION    LABELS
gdev-kube01.gdev.org   Ready    control-plane   18d   v1.28.15   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=gdev-kube01.gdev.org,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=
gdev-kube02.gdev.org   Ready    <none>          18d   v1.28.15   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=gdev-kube02.gdev.org,kubernetes.io/os=linux,node-for=psql
gdev-kube03.gdev.org   Ready    <none>          18d   v1.28.15   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=gdev-kube03.gdev.org,kubernetes.io/os=linux

[root@gdev-kube01 awx-operator]# cat Chart.yaml
apiVersion: v2
appVersion: 2.11.0
description: A Helm chart for the AWX Operator
name: awx-operator
type: application
version: 2.11.0

[root@gdev-kube01 ~]# cat storage.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate
#volumeBindingMode: WaitForFirstConsumer


---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /var/lib/postgresql/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - gdev-kube01.gdev.org
          - gdev-kube02.gdev.org
          - gdev-kube03.gdev.org

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-13-awx-postgres-13-0
  namespace: awx
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi

[root@gdev-kube01 awx-operator]# kubectl get sc,pv
NAME                                        PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
storageclass.storage.k8s.io/local-storage   kubernetes.io/no-provisioner   Delete          Immediate           false                  3d21h

NAME                           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS    REASON   AGE
persistentvolume/postgres-pv   2Gi        RWX            Delete           Bound    awx/postgres-13-awx-postgres-13-0   local-storage            3d21h

[root@gdev-kube01 awx-operator]# kubectl get pvc -n awx
NAME                            STATUS   VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS    AGE
postgres-13-awx-postgres-13-0   Bound    postgres-pv   2Gi        RWX            local-storage   3d21h


[root@gdev-kube01 awx-operator]# kubectl describe pvc/postgres-13-awx-postgres-13-0 -n awx
Name:          postgres-13-awx-postgres-13-0
Namespace:     awx
StorageClass:  local-storage
Status:        Bound
Volume:        postgres-pv
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      2Gi
Access Modes:  RWX
VolumeMode:    Filesystem
Used By:       awx-postgres-13-0
Events:        <none>

[root@gdev-kube01 awx-operator]# kubectl describe pv/postgres-pv
Name:              postgres-pv
Labels:            <none>
Annotations:       pv.kubernetes.io/bound-by-controller: yes
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      local-storage
Status:            Bound
Claim:             awx/postgres-13-awx-postgres-13-0
Reclaim Policy:    Delete
Access Modes:      RWX
VolumeMode:        Filesystem
Capacity:          2Gi
Node Affinity:
  Required Terms:
    Term 0:        kubernetes.io/hostname in [gdev-kube01.gdev.org, gdev-kube02.gdev.org, gdev-kube03.gdev.org]
Message:
Source:
    Type:  LocalVolume (a persistent volume backed by local storage on a node)
    Path:  /var/lib/postgresql/data
Events:    <none>


[root@gdev-kube01 ~]# cat awxvalues.yaml
AWX:
  # enable use of awx-deploy template
  enabled: true
  name: awx
  spec:
    replicas: 2
    service_type: NodePort
    nodeport_port: 30080
    admin_user: admin
    hostname: awx.gdev.org
    image: gdev-podman1.gdev.org:5001/quay.io/ansible/awx
    image_version: 23.7.0
    init_container_image: gdev-podman1.gdev.org:5001/quay.io/ansible/awx-ee
    init_container_image_version: latest
    ee_images:
    - name: AWX EE
      image: gdev-podman1.gdev.org:5001/quay.io/ansible/awx-ee:23.7.0
    ee_extra_env: |
      - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
        value: enabled
    postgres_image: gdev-podman1.gdev.org:5001/postgres
    postgres_image_version: "13"
    postgres_selector: |
      nodefor: psql
    control_plane_ee_image: gdev-podman1.gdev.org:5001/quay.io/ansible/awx-ee:23.7.0
    redis_image: gdev-podman1.gdev.org:5001/redis
    redis_image_version: "7"
customVolumes:
  postgres:
    enabled: true
    hostPath: /var/lib/postgresql
    size: 2Gi
    storageClassName: local-storage
  projects:
    enabled: true
    hostPath: /opt/projects/data
    size: 5Gi
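One thing worth noting about the values above: as I understand the operator, postgres_selector is applied as the Postgres pod's nodeSelector, so the pod only schedules if some node carries a matching label. A quick sanity check:

kubectl get nodes -l nodefor=psql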



[root@gdev-kube01 awx-operator]# kubectl get all -A
NAMESPACE      NAME                                                  READY   STATUS    RESTARTS      AGE
awx            pod/awx-operator-controller-manager-d69fb7c58-lp4c8   2/2     Running   0             3d21h
awx            pod/awx-postgres-13-0                                 0/1     Pending   0             3d17h
kube-flannel   pod/kube-flannel-ds-72gx9                             1/1     Running   0             18d
kube-flannel   pod/kube-flannel-ds-qhn29                             1/1     Running   0             18d
kube-flannel   pod/kube-flannel-ds-v4zpm                             1/1     Running   0             18d
kube-system    pod/coredns-7ddf588b8-fsjpd                           1/1     Running   0             18d
kube-system    pod/coredns-7ddf588b8-pvtft                           1/1     Running   0             18d
kube-system    pod/etcd-gdev-kube01.gdev.org                        1/1     Running   1             18d
kube-system    pod/kube-apiserver-gdev-kube01.gdev.org              1/1     Running   1             18d
kube-system    pod/kube-controller-manager-gdev-kube01.gdev.org     1/1     Running   2 (11d ago)   18d
kube-system    pod/kube-proxy-4tf5z                                  1/1     Running   0             18d
kube-system    pod/kube-proxy-594p9                                  1/1     Running   0             18d
kube-system    pod/kube-proxy-85n5f                                  1/1     Running   0             18d
kube-system    pod/kube-scheduler-gdev-kube01.gdev.org              1/1     Running   2 (11d ago)   18d

NAMESPACE     NAME                                                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
awx           service/awx-operator-controller-manager-metrics-service   ClusterIP   x.x.173.84   <none>        8443/TCP                 3d21h
awx           service/awx-postgres-13                                   ClusterIP   None           <none>        5432/TCP                 3d21h
default       service/kubernetes                                        ClusterIP   x.x.0.1      <none>        443/TCP                  18d
kube-system   service/kube-dns                                          ClusterIP   x.x.0.10     <none>        53/UDP,53/TCP,9153/TCP   18d

NAMESPACE      NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-flannel   daemonset.apps/kube-flannel-ds   3         3         3       3            3           <none>                   18d
kube-system    daemonset.apps/kube-proxy        3         3         3       3            3           kubernetes.io/os=linux   18d

NAMESPACE     NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
awx           deployment.apps/awx-operator-controller-manager   1/1     1            1           3d21h
kube-system   deployment.apps/coredns                           2/2     2            2           18d

NAMESPACE     NAME                                                        DESIRED   CURRENT   READY   AGE
awx           replicaset.apps/awx-operator-controller-manager-d69fb7c58   1         1         1       3d21h
kube-system   replicaset.apps/coredns-7ddf588b8                           2         2         2       18d

NAMESPACE   NAME                               READY   AGE
awx         statefulset.apps/awx-postgres-13   0/1     3d21h

Here is what I am seeing in the log:

[root@gdev-kube01 ~]# kubectl logs -f deploy/awx-operator-controller-manager -n awx

TASK [installer : Get the postgres pod information] ****************************
task path: /opt/ansible/roles/installer/tasks/database_configuration.yml:196

-------------------------------------------------------------------------------
{"level":"info","ts":"2025-04-21T14:29:53Z","logger":"logging_event_handler","msg":"[playbook task start]","name":"awx","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"5421913118183412794","EventData.Name":"installer : Get the postgres pod information"}
{"level":"info","ts":"2025-04-21T14:29:53Z","logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}
{"level":"info","ts":"2025-04-21T14:29:53Z","logger":"logging_event_handler","msg":"[playbook task start]","name":"awx","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"5421913118183412794","EventData.Name":"installer : Wait for Database to initialize if managed DB"}

--------------------------- Ansible Task StdOut -------------------------------

TASK [installer : Wait for Database to initialize if managed DB] ***************
task path: /opt/ansible/roles/installer/tasks/database_configuration.yml:206

-------------------------------------------------------------------------------
{"level":"info","ts":"2025-04-21T14:29:54Z","logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}

The operator switched from docker.io/library/postgres:13, which runs as root (0), to quay.io/sclorg/postgresql-15-c9s, which runs as postgres (26). You could be observing that the postgres user cannot write to the data volume, which could be confirmed by exec’ing into the pod.
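For example, something along these lines once the pod is up (a sketch; the data path depends on which image you run):

kubectl exec -it awx-postgres-13-0 -n awx -- id
kubectl exec -it awx-postgres-13-0 -n awx -- ls -ld /var/lib/postgresql/data   # /var/lib/pgsql/data on the sclorg image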

Acknowledged; this all depends on which version of the operator you run in your environment. I am still running a slightly older version, which uses root (0).

Again, I acknowledge this should be correct, but the pod never gets to a running state, so I can’t exec into it.

I believe the main point of your post is to check the permissions on the folder structure. I have done that: I duplicated the folder structure from my production environment and made sure permissions were set the same at each folder level.

Here is what I am observing in my production environment which is working. My folder structure is as follows:

700 root:root              /var/lib/postgresql
755 root:root              /var/lib/postgresql/data
755 root:root              /var/lib/postgresql/data/data
700 systemd-coredump:root  /var/lib/postgresql/data/data/pgdata

systemd-coredump maps to UID 999 in /etc/passwd, so the equivalent command is chown 999:0 /var/lib/postgresql/data/data/pgdata.
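In other words, reproducing that layout on the node boils down to this sketch (paths and ownership taken from the listing above):

mkdir -p /var/lib/postgresql/data/data/pgdata
chmod 700 /var/lib/postgresql
chmod 755 /var/lib/postgresql/data /var/lib/postgresql/data/data
chown 999:0 /var/lib/postgresql/data/data/pgdata
chmod 700 /var/lib/postgresql/data/data/pgdata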

Despite having checked the folder permissions at each level, I still run into the error.

Running kubectl describe sts/awx-postgres-13 -n awx shows no events.
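Since the pod is stuck in Pending, the scheduling events should be on the Pod itself rather than the StatefulSet, so the next things to check are:

kubectl describe po/awx-postgres-13-0 -n awx
kubectl get events -n awx --sort-by=.lastTimestamp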

Other things I have tried:

  • I deleted the helm deployment by running helm delete awx-operator-<number> and then running helm install again to redeploy. This made no difference.
  • I even tried resetting my cluster (this is the only app “running” in it) and redeploying the app in case of some sort of etcd corruption. :thinking:

OK, I figured out the issue:

kubectl describe po/awx-postgres-13-0 -n awx showed me that there was a scheduling issue with the pod: no nodes were available for scheduling. This was because my awxvalues.yaml defined postgres_selector as nodefor: psql, but my node label was set wrong:

gdev-kube02.gdev.org Ready <none> 18d v1.28.15 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=gdev-kube02.gdev.org,kubernetes.io/os=linux,node-for=psql

It should have been:
gdev-kube02.gdev.org Ready <none> 18d v1.28.15 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=gdev-kube02.gdev.org,kubernetes.io/os=linux,nodefor=psql

Syntax makes all the difference! Once I fixed the label, the PostgreSQL pod spun up as expected.
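For anyone hitting the same thing, relabeling the node is a one-liner (the trailing dash on node-for- removes the bad key):

kubectl label node gdev-kube02.gdev.org node-for- nodefor=psql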
