Migrating AWX from PostgreSQL 13 to 15

Greetings,

Over the past few months we have been trying to upgrade AWX to a newer version, and with it PostgreSQL. However, we are running into problems with the migration from postgres-13 to postgres-15.

This is our current awx.yml:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  # Avoid hiding logs
  no_log: false
  ingress_type: ingress
  ingress_tls_secret: awx-secret-tls
  # hostname: {{ inventory_hostname }}
  hostname: {{ dnsname }}
  admin_user: admin
  # Password depends on: roles/awx/templates/admin.yml.j2
  admin_password_secret: awx-admin-password
  secret_key_secret: awx-secret-key
  
  old_postgres_configuration_secret: awx-old-postgres-configuration
  postgres_configuration_secret: awx-postgres-configuration
  
  # STORAGE
  # -------
  # Claim storage for Postgres
  postgres_storage_class: awx-postgres-volume
  # Claim storage for Projects
  projects_persistence: True
  projects_existing_claim: awx-projects-claim

  # CPU and memory
  # --------------
  # Tasks
  task_resource_requirements:
    requests:
      cpu: 1000m
      memory: {{ task_resource_min }}Mi
    limits:
      cpu: 2000m
      memory: {{ task_resource_max }}Mi
  # Web
  web_resource_requirements:
    requests:
      cpu: 1000m
      memory: {{ web_resource_min }}Mi
    limits:
      cpu: 2000m
      memory: {{ web_resource_max }}Mi
  # Execution Environment
  ee_resource_requirements:
    requests:
      cpu: 600m
      memory: {{ ee_resource_min }}Mi
    limits:
      cpu: 1200m
      memory: {{ ee_resource_max }}Mi
  # Postgres
  postgres_resource_requirements:
    requests:
      cpu: 600m
      memory: {{ postgres_resource_min }}Mi
    limits:
      cpu: 2000m
      memory: {{ postgres_resource_max }}Mi

  # Adding reconnect in hope that logs won't get truncated
  # https://github.com/ansible/awx/issues/14057
  ee_extra_env: |
    - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
      value: enabled
  # CPU request/limit:
  # https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
  # "For CPU resource units, the quantity expression 0.1 is equivalent to the expression 100m.
  # CPU resource is always specified as an absolute amount of resource, never as a relative amount."
  ### In our case, 2000m equals 20% CPU time

  # No devel...
  development_mode: False
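The CPU comment in awx.yml can be checked with a little arithmetic. This is a small sketch that converts the CPU quantities used above to cores and to a share of one node. The node size is an assumption: 2000m is 2 cores, so "20% CPU time" only holds on a 10-core node, and NODE_CORES should be set to whatever the real nodes have. The parser deliberately handles only the plain-decimal and millicore ("m") forms used in this file, not every Kubernetes quantity suffix.

```python
def cpu_to_cores(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "2000m" or "0.1" to cores.

    Only plain decimals and the millicore suffix "m" are handled here.
    """
    if quantity.endswith("m"):
        # 1000m == 1 core
        return int(quantity[:-1]) / 1000
    return float(quantity)


def percent_of_node(quantity: str, node_cores: int) -> float:
    """Share of a node's total CPU that `quantity` represents, in percent."""
    return cpu_to_cores(quantity) * 100 / node_cores


if __name__ == "__main__":
    # Assumption: a 10-core node, which is what "2000m equals 20%" implies.
    NODE_CORES = 10
    for q in ("600m", "1000m", "1200m", "2000m"):
        print(f"{q}: {cpu_to_cores(q)} cores, "
              f"{percent_of_node(q, NODE_CORES):.0f}% of a {NODE_CORES}-core node")
```

The same function also confirms the quoted Kubernetes rule that 0.1 and 100m describe the same amount of CPU.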

Note that the following two lines are set:

old_postgres_configuration_secret: awx-old-postgres-configuration
postgres_configuration_secret: awx-postgres-configuration

old_pg.yml (awx-old-postgres-configuration):

---
apiVersion: v1
kind: Secret
metadata:
  name: awx-old-postgres-configuration
  namespace: awx
stringData:
  host: awx-postgres-13-0
  port: '5432'
  database: awx
  username: awx
  password: {{ pg_pass }}
type: Opaque

pg.yml (awx-postgres-configuration):

---
apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: awx-postgres-15-0
  port: '5432'
  database: awx
  username: awx
  password: {{ pg_pass }}
type: Opaque

Note that, apart from the secret name and the host, the two Postgres secrets are identical.
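Both secrets refer to the database by a bare host name (awx-postgres-13-0 and awx-postgres-15-0), and the pg_dump error quoted below is a name-resolution failure. A minimal way to check whether those names resolve is a getaddrinfo lookup, roughly the lookup libpq performs before it connects. This is a hedged sketch: it assumes Python 3 is available in a pod in the awx namespace (cluster DNS answers differently inside and outside the cluster), and the SECRETS dictionary is hand-copied from the two secrets above rather than read from the Kubernetes API.

```python
import socket

# Hosts and ports hand-copied from the two secrets' stringData.
# Passwords are irrelevant for a name lookup and are left out.
SECRETS = {
    "awx-old-postgres-configuration": {"host": "awx-postgres-13-0", "port": "5432"},
    "awx-postgres-configuration": {"host": "awx-postgres-15-0", "port": "5432"},
}


def resolves(host: str, port: str) -> bool:
    """Return True if `host` can be translated to an address.

    False corresponds to the "could not translate host name ... to address"
    failure that pg_dump reports.
    """
    try:
        socket.getaddrinfo(host, int(port), proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    return True


def report(secrets: dict) -> None:
    """Print whether the host configured in each secret resolves."""
    for name, cfg in secrets.items():
        ok = resolves(cfg["host"], cfg["port"])
        print(f"{name}: {cfg['host']}:{cfg['port']} "
              f"{'resolves' if ok else 'does NOT resolve'}")
```

Running report(SECRETS) from inside the cluster prints one line per secret.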

Now to the issue.

When the pods start, we can see that awx-controller, postgres-13-0, and postgres-15-0 have all started, and the awx-controller logs clearly state that a “migration” is in progress. After a few minutes, the postgres-13-0 pod is removed, leaving only awx-controller and postgres-15-0.

awx-controller then logs the following:

pg_restore: error: input file is too short (read 0, expected 5)\\nTerminated\\n\", \"stderr_lines\": [\"pg_dump: error: could not translate host name \\\"awx-postgres-13-0\\\" to address: Name or service not known\", \"pg_restore: error: input file is too short (read 0, expected 5)\", \"Terminated\"], \"stdout\": \"keepalive_pid: 269\\nMigrating data from old database...\\n\", \"stdout_lines\": [\"keepalive_pid: 269\", \"Migrating data from old database...\"]}\n\r\nPLAY RECAP *********************************************************************\r\nlocalhost : ok=63 changed=0 unreachable=0 failed=1 skipped=26 rescued=0 ignored=0 \n","job":"8243886739850538834","name":"awx","namespace":"awx","error":"exit status 2","stacktrace":"github.com/operator-framework/ansible-operator-plugins/internal/ansible/runner.(*runner).Run.func1\n\tansible-operator-plugins/internal/ansible/runner/runner.go:269"}

This all makes sense when you consider that postgres-13-0 has been terminated.
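The log suggests the operator pipes pg_dump output from the old database straight into pg_restore on the new one. That would also explain the second error: when pg_dump cannot connect it writes nothing, and pg_restore fails on the empty stream with "input file is too short (read 0, expected 5)". The sketch below is a rough, hypothetical manual equivalent of that step, not the operator's actual code. It assumes the PostgreSQL client tools are installed, that both hosts are reachable, and that both databases share one password, as the two secrets above do; OLD, NEW, dump_cmd, restore_cmd and migrate are illustrative names.

```python
import os
import subprocess
from typing import List

# Hypothetical connection settings mirroring the two secrets.
OLD = {"host": "awx-postgres-13-0", "port": "5432", "database": "awx", "username": "awx"}
NEW = {"host": "awx-postgres-15-0", "port": "5432", "database": "awx", "username": "awx"}


def dump_cmd(cfg: dict) -> List[str]:
    # -Fc writes a custom-format archive, which pg_restore can read from stdin.
    return ["pg_dump", "-h", cfg["host"], "-p", cfg["port"],
            "-U", cfg["username"], "-d", cfg["database"], "-Fc"]


def restore_cmd(cfg: dict) -> List[str]:
    # With no input file argument, pg_restore reads the archive from stdin.
    return ["pg_restore", "-h", cfg["host"], "-p", cfg["port"],
            "-U", cfg["username"], "-d", cfg["database"]]


def migrate(old: dict, new: dict, password: str) -> None:
    env = {**os.environ, "PGPASSWORD": password}
    dump = subprocess.Popen(dump_cmd(old), stdout=subprocess.PIPE, env=env)
    # If pg_dump fails before writing anything, pg_restore sees an empty
    # stream and reports "input file is too short (read 0, expected 5)".
    restore = subprocess.Popen(restore_cmd(new), stdin=dump.stdout, env=env)
    dump.stdout.close()  # lets pg_dump get SIGPIPE if pg_restore exits early
    restore_rc = restore.wait()
    dump_rc = dump.wait()
    # Check pg_dump first so a connection failure on the old host is
    # reported as such, not only as the downstream pg_restore error.
    if dump_rc != 0:
        raise RuntimeError(f"pg_dump failed with exit code {dump_rc}")
    if restore_rc != 0:
        raise RuntimeError(f"pg_restore failed with exit code {restore_rc}")
```

Calling migrate(OLD, NEW, password=...) runs the pipe. Checking each exit code separately is the main design choice: it separates "the old database was unreachable" from "the restore itself failed".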

We added a StorageClass because postgres-15-0 initially did not start at all: postgres-13-0 was already using the PersistentVolume that had been created, so postgres-15-0 could not use it. With the StorageClass in place, postgres-13-0 and postgres-15-0 could both run at the same time.

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: awx-postgres-volume
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain

We don't know whether this was the right approach.

If anyone has an idea of what is going wrong, we would greatly appreciate the help.
