Greetings,
Over the past few months we have been trying to migrate AWX to a newer version, and Postgres along with it. However, we are running into problems with the migration from postgres-13 to postgres-15.
This is our current awx.yml:
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  # Avoid hiding logs
  no_log: false
  ingress_type: ingress
  ingress_tls_secret: awx-secret-tls
  # hostname: {{ inventory_hostname }}
  hostname: {{ dnsname }}
  admin_user: admin
  # Password depends on: roles/awx/templates/admin.yml.j2
  admin_password_secret: awx-admin-password
  secret_key_secret: awx-secret-key
  old_postgres_configuration_secret: awx-old-postgres-configuration
  postgres_configuration_secret: awx-postgres-configuration
  # STORAGE
  # -------
  # Claim storage for Postgres
  postgres_storage_class: awx-postgres-volume
  # Claim storage for Projects
  projects_persistence: True
  projects_existing_claim: awx-projects-claim
  # CPU and memory
  # --------------
  # Tasks
  task_resource_requirements:
    requests:
      cpu: 1000m
      memory: {{ task_resource_min }}Mi
    limits:
      cpu: 2000m
      memory: {{ task_resource_max }}Mi
  # Web
  web_resource_requirements:
    requests:
      cpu: 1000m
      memory: {{ web_resource_min }}Mi
    limits:
      cpu: 2000m
      memory: {{ web_resource_max }}Mi
  # Execution Environment
  ee_resource_requirements:
    requests:
      cpu: 600m
      memory: {{ ee_resource_min }}Mi
    limits:
      cpu: 1200m
      memory: {{ ee_resource_max }}Mi
  # Postgres
  postgres_resource_requirements:
    requests:
      cpu: 600m
      memory: {{ postgres_resource_min }}Mi
    limits:
      cpu: 2000m
      memory: {{ postgres_resource_max }}Mi
  # Adding reconnect in hope that logs won't get truncated
  # https://github.com/ansible/awx/issues/14057
  ee_extra_env: |
    - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
      value: enabled
  # CPU request/limit:
  # https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
  # "For CPU resource units, the quantity expression 0.1 is equivalent to the expression 100m.
  # CPU resource is always specified as an absolute amount of resource, never as a relative amount."
  ### In our case, 2000m equals 20% CPU time
  # No devel...
  development_mode: False
Note that the following two lines are set:
old_postgres_configuration_secret: awx-old-postgres-configuration
postgres_configuration_secret: awx-postgres-configuration
old_pg.yml (awx-old-postgres-configuration):
---
apiVersion: v1
kind: Secret
metadata:
  name: awx-old-postgres-configuration
  namespace: awx
stringData:
  host: awx-postgres-13-0
  port: '5432'
  database: awx
  username: awx
  password: {{ pg_pass }}
type: Opaque
pg.yml (awx-postgres-configuration):
---
apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: awx-postgres-15-0
  port: '5432'
  database: awx
  username: awx
  password: {{ pg_pass }}
type: Opaque
Note that the two pg files are equivalent apart from the secret name and the host.
Now to the issue.
When the pods start, we can see that awx-controller, postgres-13-0, and postgres-15-0 have all started, and the awx-controller logs state that a “migration” is ongoing. After a few minutes the postgres-13-0 pod is removed, leaving only awx-controller and postgres-15-0.
The awx-controller then logs the following:
pg_restore: error: input file is too short (read 0, expected 5)\\nTerminated\\n\", \"stderr_lines\": [\"pg_dump: error: could not translate host name \\\"awx-postgres-13-0\\\" to address: Name or service not known\", \"pg_restore: error: input file is too short (read 0, expected 5)\", \"Terminated\"], \"stdout\": \"keepalive_pid: 269\\nMigrating data from old database...\\n\", \"stdout_lines\": [\"keepalive_pid: 269\", \"Migrating data from old database...\"]}\n\r\nPLAY RECAP *********************************************************************\r\nlocalhost : ok=63 changed=0 unreachable=0 failed=1 skipped=26 rescued=0 ignored=0 \n","job":"8243886739850538834","name":"awx","namespace":"awx","error":"exit status 2","stacktrace":"github.com/operator-framework/ansible-operator-plugins/internal/ansible/runner.(*runner).Run.func1\n\tansible-operator-plugins/internal/ansible/runner/runner.go:269"}
This all makes sense when you consider that postgres-13-0 has been terminated.
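The `could not translate host name` part of the log is a name-resolution failure rather than a refused connection, so one thing that may be worth checking is whether the `host` value in the old secret is a name the cluster DNS can resolve. A bare pod name is generally not resolvable from other pods on its own. Below is an untested sketch of the old secret using a Service-style name instead; the Service name `awx-postgres-13` is an assumption and has not been confirmed against this cluster.
```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: awx-old-postgres-configuration
  namespace: awx
stringData:
  # Assumption: a Service named awx-postgres-13 exists in the awx namespace.
  # The original value here was the pod name awx-postgres-13-0.
  host: awx-postgres-13.awx.svc
  port: '5432'
  database: awx
  username: awx
  password: {{ pg_pass }}
type: Opaque
```
This would not help if the old database pod is removed before the dump finishes, since the name would then have nothing to point at.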
We added a StorageClass because postgres-15-0 initially did not start at all: postgres-13-0 was already using the PersistentVolume that had been created, so postgres-15-0 could not use it. With the StorageClass in place, postgres-13-0 and postgres-15-0 can both be up at the same time.
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: awx-postgres-volume
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
We don’t know whether this was the right approach.
If anyone has an idea of what is wrong, we would greatly appreciate the help.